Thanks, Gideon and Cheng. Those are good ways to reduce the core cycle count.
Another option is to zero the buffer with a zero-overhead hardware loop, using the assembly code below:
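P0.H = HI(_sImage);                          // P0 = address of sImage (upper 16 bits)
P0.L = LO(_sImage);                          // P0 = address of sImage (lower 16 bits)
P1 = 3*1024*4;                               // loop count: 12288 stores
R0 = 0;                                      // value to write
LSETUP( ZeroInitLoop , ZeroInitLoop ) LC0 = P1;   // single-instruction hardware loop
ZeroInitLoop: W[P0++] = R0;                  // store 16 bits of zero, advance P0 by 2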
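
For comparison, here is a rough C sketch of what the loop above does. It assumes sImage is an array of 16-bit elements with 3*1024*4 entries, which is what the W[P0++] store and the loop count suggest; the actual declaration is not shown in this thread, and the names IMAGE_ELEMS and zero_init16 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed element count: the loop count 3*1024*4 from the assembly. */
#define IMAGE_ELEMS (3 * 1024 * 4)

/* Hypothetical stand-in for sImage; the real declaration may differ. */
static int16_t sImage[IMAGE_ELEMS];

/* Writes `count` 16-bit zeros, one per iteration, advancing the
 * pointer after each store, in the same way as W[P0++] = R0. */
static void zero_init16(int16_t *dst, size_t count)
{
    assert(dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        *dst++ = 0;
    }
}
```

Calling zero_init16(sImage, IMAGE_ELEMS) clears the whole array under these assumptions. If the elements are actually 32 bits wide, the count or the store width in the assembly would need to change to match.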
Regards,
Punarva