This is indeed pretty odd.
Here's what I would experiment with:
Try introducing short pauses between the words being pulled by the ARM (and possibly clear the SMLS bit).
What is the slave select line doing? (You are not showing it in the scope trace.)
According to the documentation, slave select acts as an SPI reset of sorts, and the line needs to return high between words if CPHASE=0. So I would try having the ARM drive the slave select line high between words.
It looks like you might not be doing that, since the scope trace shows four 32-bit transfers back to back.
This would be my number one suspicion of why you are having trouble.
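To make that concrete, here is a rough sketch of what I mean on the ARM (master) side, in C. I don't know which driver you are using, so cs_set, xfer32 and delay_us are hypothetical hooks that you would map onto your own GPIO/SPI calls, and the stub_* functions are only stand-ins so the sketch compiles on a PC. The only point is the ordering: slave select goes low, one 32-bit word is clocked, slave select goes high again, then a short pause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks: map these onto your ARM's GPIO/SPI driver. */
typedef struct {
    void (*cs_set)(int level);        /* drive slave select: 0 = low (active), 1 = high */
    uint32_t (*xfer32)(uint32_t tx);  /* clock one 32-bit word, return the word received */
    void (*delay_us)(unsigned us);    /* short pause between words */
} spi_master_ops;

/* Read n words, returning slave select high after every single word. */
static void spi_read_words(const spi_master_ops *ops, uint32_t *rx,
                           size_t n, unsigned gap_us)
{
    assert(ops != NULL && rx != NULL);
    for (size_t i = 0; i < n; i++) {
        ops->cs_set(0);              /* select the slave */
        rx[i] = ops->xfer32(0);      /* dummy TX word, we only care about RX */
        ops->cs_set(1);              /* deselect: line returns high between words */
        ops->delay_us(gap_us);       /* give the slave a moment before the next word */
    }
}

/* --- Stand-in stubs so the sketch builds on a PC; not real driver code --- */
static int stub_cs_level = 1;              /* slave select idles high */
static unsigned stub_cs_rising_edges = 0;  /* counts low-to-high transitions */
static unsigned stub_words_clocked = 0;

static void stub_cs_set(int level)
{
    if (level && !stub_cs_level)
        stub_cs_rising_edges++;
    stub_cs_level = level;
}

static uint32_t stub_xfer32(uint32_t tx)
{
    (void)tx;
    return 0xA0000000u | stub_words_clocked++;  /* fake, recognisable RX data */
}

static void stub_delay_us(unsigned us) { (void)us; }

static const spi_master_ops stub_ops = { stub_cs_set, stub_xfer32, stub_delay_us };
```

Calling spi_read_words(&stub_ops, rx, 4, 5) with the stubs gives four separate select/deselect cycles instead of one long 128-bit burst, which is what I would expect to see on the slave select line in your scope trace once the ARM is doing this.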
Finally, you enable interrupt generation for the DMA. Do you have handlers installed for those interrupts?
Klaus
P.S. There is no need to flush the FIFOs manually, as you do in your code, if you disable and then re-enable the SPI. Of course, it doesn't hurt either.