In the following article, you will see that one feature of the Visual C++ 7.0 (Visual C++ .NET) compiler is that it can take advantage of specific features of the Intel Pentium 4 and AMD Athlon.
OK, but I don't see anything there saying that the compiler chooses optimisation flags automatically; you must decide manually which kind of optimisation to use:
On a Pentium 4 or AMD Athlon machine, the /G7 /arch:SSE2 version runs about 10% faster. This code cannot be run on a machine without the appropriate chip.
The compiler cannot automatically generate code that will not run on certain machines. Otherwise, compiling on an Athlon could produce code that does not run on a Celeron.
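As a minimal sketch (not taken from the article), a program can check at run time whether the processor reports SSE2 before relying on code built with /arch:SSE2. The function name cpu_has_sse2 is hypothetical; it reads CPUID leaf 1, where bit 26 of EDX is the SSE2 flag, using MSVC's __cpuid or GCC/Clang's __get_cpuid, and simply returns 0 on other compilers or non-x86 targets.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_X86_CPUID_MSVC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID_GNU 1
#endif

/* Hypothetical helper: returns 1 if CPUID leaf 1 reports SSE2
   (EDX bit 26), 0 if it does not or if CPUID is unavailable. */
int cpu_has_sse2(void)
{
#if defined(HAVE_X86_CPUID_MSVC)
    int regs[4]; /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return (regs[3] >> 26) & 1;
#elif defined(HAVE_X86_CPUID_GNU)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0; /* leaf 1 not supported */
    return (int)((edx >> 26) & 1);
#else
    return 0; /* unknown compiler or non-x86 target */
#endif
}
```

A program could use such a check to choose between an SSE2 build and a plain build of its hot routines; this is exactly the decision the compiler will not make for you.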
So my suggestion is that dwcasey try different optimisation options; perhaps some of them will suit the Xeon better.
In addition, timing a loop is not a very good way of measuring performance, because the software may be sharing the processor's time with other running applications/services, and these running applications/services may differ between the two machines.
Yes, of course it measures the performance of the whole system (hardware, OS, and running environment), but dwcasey said his application uses 97% of processor time. What do you think: is that pure process running time, or does it also include time spent switching between processes?
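One rough way to approach that question is to time the same piece of work with two clocks: the CPU time charged to the process and the elapsed wall-clock time. If the CPU time is well below the wall time, the process spent time waiting or sharing the processor. This is a minimal sketch assuming a C11 library (for timespec_get); the names timing and measure are hypothetical. Note that Microsoft's C runtime implements clock() as elapsed wall-clock time rather than CPU time, so on Windows GetProcessTimes would be needed for the CPU figure.

```c
#include <time.h>

/* Hypothetical result type: both measurements for one run. */
typedef struct {
    double cpu_seconds;  /* processor time used by this process (ISO C clock()) */
    double wall_seconds; /* elapsed real time */
} timing;

static double wall_now(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC); /* C11 wall-clock time */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs work() once and reports CPU time versus wall-clock time. */
timing measure(void (*work)(void))
{
    timing t;
    clock_t c0 = clock();
    double w0 = wall_now();
    work();
    t.cpu_seconds = (double)(clock() - c0) / CLOCKS_PER_SEC;
    t.wall_seconds = wall_now() - w0;
    return t;
}
```

Running the same measure() call on both machines gives a ratio of CPU time to wall time that can be compared directly, independently of how fast each processor is.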
In addition, you can measure the execution time of the different kinds of calculation your application uses by putting calls to different functions in the loop body. Perhaps, for instance, the floating-point unit is slow on your server; floating-point calculation is probably an unusual job for servers. Keep in mind that the compiler may optimise such a test loop, for example:
for(ii=0; ii<1000; ii++) rr=sqrt(123.45);
could be optimised to
rr=sqrt(123.45);
for(ii=0; ii<1000; ii++);
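or, more likely, to
rr=11.11; /* sqrt(123.45) is about 11.1108, computed at compile time */
for(ii=0; ii<1000; ii++);
and the optimiser may then drop the empty loop altogether, so the timing would tell you nothing about the floating-point unit.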
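Here is a minimal sketch of a test loop the compiler cannot reduce in this way, assuming standard C only. The input is read from a volatile variable and each result is stored into a volatile sink, so sqrt(123.45) cannot be folded into a constant and the loop cannot be discarded as dead code. The names time_calls, bench_input and bench_sink are hypothetical; passing a function pointer lets the same loop time different kinds of calculation.

```c
#include <time.h>

/* volatile: the compiler must re-read the input and store every result,
   so neither constant folding nor dead-loop removal is possible. */
static volatile double bench_input = 123.45;
static volatile double bench_sink;

/* Hypothetical helper: calls f(bench_input) `iterations` times and
   returns the elapsed clock() time in seconds. */
double time_calls(double (*f)(double), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        bench_sink = f(bench_input);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, time_calls(sqrt, 10000000L) (with <math.h>, linked with -lm on most Unix-like systems) can be compared against a trivial function that returns x * 0.5. Both measurements include the same call overhead, so the difference between them is what reflects the cost of sqrt on each machine.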