This is really up to you.
The first chunk (384 32-bit words) is pulled in via DMA, which accepts the data faster than you could supply it, so no flow control is needed for that part.
After that, the boot loader is executed; this is the code contained in that first block.
You can write the boot kernel any way you like and add flow control via FLAG pins. This might be needed if, for example, your boot loader takes a long time to initialize large amounts of external memory.
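
To make the two phases concrete, here is a rough host-side sketch in C. It is only an illustration under assumptions: `boot_link_t`, `send_word`, `dsp_busy` and `boot_send_image` are made-up names, not part of any vendor API, and the "busy" polarity of the FLAG pin is whatever you define in your own boot kernel. The `sim_*` functions just fake a DSP so the logic can be tried on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the first chunk that is loaded by DMA (from the post above). */
#define BOOT_KERNEL_WORDS 384u

/* Hypothetical host-side hooks; replace with your own hardware access. */
typedef struct {
    void (*send_word)(uint32_t word, void *ctx); /* push one 32-bit word to the DSP */
    int  (*dsp_busy)(void *ctx);                 /* nonzero while the FLAG pin says "wait" */
    void *ctx;                                   /* passed back to both hooks */
} boot_link_t;

/* Sends a boot image. Returns the number of words sent, or 0 if the
 * image is too short to even contain the boot kernel. */
size_t boot_send_image(const boot_link_t *link,
                       const uint32_t *image, size_t n_words)
{
    size_t i;

    if (n_words < BOOT_KERNEL_WORDS) {
        return 0;
    }

    /* Phase 1: the first chunk goes in via DMA, so no handshake is used. */
    for (i = 0; i < BOOT_KERNEL_WORDS; i++) {
        link->send_word(image[i], link->ctx);
    }

    /* Phase 2: the boot kernel is now running and may need time
     * (e.g. external memory init), so wait on its FLAG pin first. */
    for (; i < n_words; i++) {
        while (link->dsp_busy(link->ctx)) {
            /* spin; a real host would likely add a timeout here */
        }
        link->send_word(image[i], link->ctx);
    }
    return n_words;
}

/* --- Simulated DSP, for trying the logic without hardware --- */
typedef struct {
    size_t words_sent;      /* words received so far */
    size_t busy_polls_left; /* how many more polls will report "busy" */
    size_t early_polls;     /* FLAG polls seen before the kernel was loaded */
} sim_dsp_t;

static void sim_send_word(uint32_t word, void *ctx)
{
    sim_dsp_t *s = (sim_dsp_t *)ctx;
    (void)word; /* the simulation only counts words */
    s->words_sent++;
}

static int sim_dsp_busy(void *ctx)
{
    sim_dsp_t *s = (sim_dsp_t *)ctx;
    if (s->words_sent < BOOT_KERNEL_WORDS) {
        s->early_polls++; /* should never happen with the sender above */
    }
    if (s->busy_polls_left > 0) {
        s->busy_polls_left--;
        return 1;
    }
    return 0;
}
```

On real hardware you would point `send_word` and `dsp_busy` at your own port and pin access routines; the split into two loops is the only part that follows directly from the boot sequence described above.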
Klaus