Hello John.
Which compiler did you use to produce this code?
It seems like your 'swap_xor()' is indeed more
efficient than the 'swap_mov()' your compiler generates.
The interesting thing is: when I replace the code in my two functions with the inline code from your previous post, the swap_mov() method is... faster. I am surprised, but maybe it has something to do with the Pentium's pipelined processing, caching, or something similar. If you have time, could you please run my program on your machine and post the results? It would be interesting to know.
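For anyone following along, here is a minimal C sketch of the two variants being compared. The function bodies are my assumption of what such functions typically look like, not the exact code from either post. One possible factor in the timing: the three XORs form a dependency chain, each needing the result of the one before it, which may limit how well a pipelined CPU can overlap them. I have not verified that this explains the numbers.

```c
/* Hypothetical reconstruction: swap through a temporary.
   A compiler typically turns this into plain mov instructions. */
void swap_mov(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Hypothetical reconstruction: swap with three XORs, no temporary.
   Each XOR needs the result of the previous one. */
void swap_xor(int *a, int *b)
{
    if (a == b)      /* x ^ x would zero the value, so skip self-swaps */
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Both functions take the addresses of two ints, e.g. swap_mov(&x, &y), and leave the two values exchanged. Timing either one in a loop only means something if the compiler is not allowed to optimize the calls away.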
regards,
-- ralf