On a multiprocessor machine, you should write the code in array (vector) notation [like A(B:X,C:Y), for instance] wherever possible. This was certainly true of the Crays (I took a class years ago) and should also be true of more modern architectures.
If your compiler is designed to optimize code for your machine's architecture (it should be), array notation tells it up front that the operation applies to a whole block of elements, which lets it arrange the calculations to make best use of the processors available to you. Read your system's Fortran documentation for more details.
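As a rough illustration, here is a minimal sketch of the same update written both ways. The array names, sizes, and section bounds (r1, r2, c1, c2) are made up for the example; they stand in for the B:X and C:Y bounds in the notation above. Whether the array form actually runs faster depends on your compiler and its optimization flags, so treat this as something to experiment with, not a guarantee.

```fortran
program array_section_demo
  implicit none
  integer, parameter :: n = 6
  ! Hypothetical section bounds, chosen only for illustration
  integer, parameter :: r1 = 2, r2 = 5, c1 = 3, c2 = 4
  real :: a(n, n), b(n, n)
  integer :: i, j

  a = 1.0
  b = 1.0

  ! Explicit loops: the compiler must work out for itself that the
  ! iterations are independent before it can vectorize or parallelize.
  do j = c1, c2
     do i = r1, r2
        a(i, j) = 2.0 * a(i, j) + 1.0
     end do
  end do

  ! The same update in array-section notation: one statement that
  ! applies to the whole block of elements.
  b(r1:r2, c1:c2) = 2.0 * b(r1:r2, c1:c2) + 1.0

  ! Both forms should give identical results.
  if (any(a /= b)) error stop "loop and array-section results differ"
  print *, "loop and array-section results match"
end program array_section_demo
```

Any Fortran 90 or later compiler should accept the array-section statement; the `error stop` line assumes Fortran 2008 and can be replaced with a plain `stop` on older compilers.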
Pat O'Connell