Of course, the other side of the cache issue is that a problem very often needs several bits of code that differ in only a single instruction. Drawing lines with Bresenham's algorithm is an example: the code depends on whether the line is steeper or shallower than 45 degrees, whether it goes up or down, and so on. You have three options:
(1) Put every possible instruction in the middle of your loop, with a branch in front of them to decide which one applies on this run. This is short, but slower than necessary.
(2) Modify the instruction once, and once only, before the loop starts. This is the traditional self-modifying approach: short and quick.
(3) Use multiple copies of nearly identical code, differing in only one byte. This looks attractive in these days of cheap memory, but it vastly increases program size and the amount of code you need to get into the cache, and cache misses caused by calls to functions that aren't currently cached are very expensive. Yes, Bresenham's line-drawing code is tiny compared to modern caches, but imagine nested versions of code with multiple options at each level: the copies multiply. Perhaps every programmer on the team thought multiple copies of his code were fine... Just look at code bloat in general. In throwing away this sort of self-modifying code, we've thrown out a very valuable baby with some undoubtedly undesirable bathwater.
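C can't portably rewrite its own machine code, so option (2) can't be shown literally here. As a stand-in, the sketch below contrasts option (1), an all-octant Bresenham loop that re-tests the line's direction on every step, with a version that makes that decision once before the loop by storing the step (+1 or -1) in a variable. That is the data-only cousin of patching an INC into a DEC: the loop body is fixed before it starts. The names `line_branchy`, `line_preset` and the `Point` type are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* A pixel coordinate (hypothetical helper type). */
typedef struct { int x, y; } Point;

/* Option (1): one loop that re-tests the direction on every step.
   Writes the line's pixels into out (capacity cap), returns the count. */
size_t line_branchy(int x0, int y0, int x1, int y1, Point *out, size_t cap)
{
    int dx = abs(x1 - x0), dy = -abs(y1 - y0);
    int err = dx + dy;                  /* combined error term for both axes */
    int right = x1 > x0, down = y1 > y0;
    size_t n = 0;
    for (;;) {
        assert(n < cap);
        out[n++] = (Point){x0, y0};
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) {                 /* time to step in x */
            err += dy;
            if (right) x0++; else x0--; /* direction branch, every pixel */
        }
        if (e2 <= dx) {                 /* time to step in y */
            err += dx;
            if (down) y0++; else y0--;  /* direction branch, every pixel */
        }
    }
    return n;
}

/* Portable stand-in for option (2): the direction is decided once,
   before the loop, and stored in sx/sy; the loop body never re-tests it. */
size_t line_preset(int x0, int y0, int x1, int y1, Point *out, size_t cap)
{
    int dx = abs(x1 - x0), dy = -abs(y1 - y0);
    int sx = x0 < x1 ? 1 : -1;          /* chosen once, like patching INC/DEC */
    int sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    size_t n = 0;
    for (;;) {
        assert(n < cap);
        out[n++] = (Point){x0, y0};
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
    return n;
}
```

Both versions keep a single copy of the loop body, so neither pays the code-bloat cost of option (3); they differ only in whether the direction test runs once per line or once per pixel.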
Incidentally, this sort of self-modifying code is maintenance-friendly too. If you want to change your algorithm, there's only one version to change. You never end up in the situation where you've improved the line drawing and now all lines drawn upwards are great, but those drawn downwards at less than 45 degrees suddenly aren't, because you forgot to update every version.