Knowing how much stack your program uses is useful, because you have to allocate enough for the maximum depth to ensure that overflow and corruption do not occur.
Some architectures have a trap mechanism that lets you set the bounds of the stack, so that any push/pop type operation outside this area triggers a handler.
The ARM7TDMI does not have this, does not have dedicated stack instructions, and has multiple stack pointers to complicate the situation.
The common approach is to initialise the stack area with a pattern, run the program through all conditions and see how much of the “pattern” has been consumed. This gives an indication of what amount of stack space is required for each stack. I think some of the compiler toolchains can do this automatically for you.
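As a rough illustration of the pattern approach, here is a minimal sketch in C. The names `STACK_FILL`, `stack_paint` and `stack_words_used` and the fill value are my own assumptions, not taken from any particular toolchain. It assumes a full-descending stack, so the untouched words are the ones at the lowest addresses of the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill value; any pattern the program is unlikely to store works. */
#define STACK_FILL 0xDEADBEEFu

/* Paint a stack region with the fill pattern.
 * base is the LOWEST address of the region, words is its size in 32-bit words. */
void stack_paint(uint32_t *base, size_t words)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Return the high-water mark in words.
 * The stack grows down from base + words, so scan upwards from base
 * and count how many words still hold the pattern. */
size_t stack_words_used(const uint32_t *base, size_t words)
{
    assert(base != NULL);
    size_t untouched = 0;
    while (untouched < words && base[untouched] == STACK_FILL) {
        untouched++;
    }
    return words - untouched;
}
```

On a real target the painting has to happen in the startup code before the stack is in use (or only below the current stack pointer), and you would repeat it for each mode's stack. The result is only a lower bound: it reflects the conditions you actually exercised, and a stored value that happens to equal the pattern at the boundary will make the count slightly low.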
They can also give some indication of the stack depth for the main thread through analysis of the code.
This is difficult to do for nested interrupts. I don’t have much to suggest here, but one interesting suggestion concerns where the stacks are placed. One traditional approach is to place variables at the bottom of the SRAM and the stacks at the top. The ARM uses what is known as a full-descending stack, which means it grows downwards. If it grows down too far, it will corrupt non-stack variables.
The suggestion is to reverse this and put the IRQ stack at the bottom of SRAM. Then if an overflow happens, the stack runs off the bottom of SRAM and triggers an abort exception (assuming the device aborts on accesses to unmapped addresses), which at least prevents corruption. I haven’t given it much more thought than that, but thought it an interesting approach.
Sorry we can't be more help.