Why is self-modifying code bad?

sweetflame · Oct 18, 2004

Everywhere I look, I'm told that self-modifying code is bad, but nobody says why. Can someone please explain to me why it is so bad? Thanks!

rwong · Oct 19, 2004

Hey flame

You certainly don't need self-modifying code nowadays, because resources such as memory and processor speed are plentiful.
Malicious code such as viruses and other malware often uses self-modifying code to fool the software we use to detect it. Many modern operating systems can raise an alarm when they see this kind of behavior, or keep code and data in separate memory spaces, which is safer. But if you feel you have an application (not a virus) that would benefit from self-modifying code, go ahead.

regards,

Rick

lionelhill · Oct 20, 2004

At least historically, self-modifying code has played an important role. Before instruction sets became so rich in index registers and indirect addressing, self-modification was sometimes the best way to do indirect addressing.
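To make the indirect-addressing trick concrete, here is a minimal sketch on a made-up accumulator machine with no index register. The machine, its opcodes, and the `opcode * 100 + address` encoding are all invented for illustration and do not model any real processor. The loop sums an array by loading its own ADD instruction as a number, adding one to it, and storing it back, so the instruction points at the next element on the following pass.

```python
# Toy accumulator machine (hypothetical): every memory cell is a plain
# integer, and an instruction is just the number opcode * 100 + address.
HALT, LOAD, ADD, STORE, JNZ, SUB = 0, 1, 2, 3, 4, 5

def run(mem):
    """Execute mem in place starting at address 0 and return it."""
    pc, acc = 0, 0
    while True:
        op, addr = divmod(mem[pc], 100)
        pc += 1
        if op == HALT:
            return mem
        elif op == LOAD:
            acc = mem[addr]
        elif op == ADD:
            acc += mem[addr]
        elif op == SUB:
            acc -= mem[addr]
        elif op == STORE:
            mem[addr] = acc
        elif op == JNZ:
            if acc != 0:
                pc = addr
        else:
            raise ValueError(f"bad opcode {op}")

SUM, ONE, COUNT, ARRAY = 11, 12, 13, 14   # data addresses
program = [
    LOAD * 100 + SUM,     # 0: acc = running total
    ADD * 100 + ARRAY,    # 1: add one element (operand is patched below)
    STORE * 100 + SUM,    # 2: save the total
    LOAD * 100 + 1,       # 3: read instruction 1 as if it were data
    ADD * 100 + ONE,      # 4: bump its address field by one
    STORE * 100 + 1,      # 5: write the modified instruction back
    LOAD * 100 + COUNT,   # 6: count -= 1
    SUB * 100 + ONE,      # 7
    STORE * 100 + COUNT,  # 8
    JNZ * 100 + 0,        # 9: loop while count != 0
    HALT * 100,           # 10
    0, 1, 3,              # 11..13: sum, the constant 1, element count
    10, 20, 30,           # 14..16: the array
]

final = run(list(program))
print(final[SUM])         # prints 60
```

After the run, cell 1 no longer holds the instruction that was loaded: it has been stepped past the end of the array, which is also why such a program has to be reloaded or re-patched before it can be run a second time.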

The problems with it are:

(1) Those programmers and computer science lecturers who are a bit low on talent find it hard to understand.

(2) Any operating system that won't allow code to be changed will not run self-modifying code. (But my biased view is that operating systems working like this have thrown away one of the greatest realisations in computing history: that code and data are actually both numbers and freely mixable when it's useful to do so).

(3) Very obscure point: under certain incredibly narrow and strange circumstances, on some processors, you could get an ambiguity about what happens, or at least a loss of processing speed. The example I'm thinking of is the prefetch queue issue: if an instruction changes something that follows it at once, the next thing may already be in the processor's prefetch queue; the processor may already be handling it. Therefore either the processor has to carry on regardless (and execute something other than the logical intent of the code), or clear its entire "production line" and start again (slow).
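The prefetch issue in point (3) can be sketched with a toy model. Everything here is an assumption made for illustration: the three instruction names, the `prefetch_depth` and `flush_on_code_write` parameters, and the queue behaviour are invented and are not taken from any particular processor. The program's first instruction overwrites the second one, and the result depends on whether the second was already fetched.

```python
from collections import deque

def run(program, prefetch_depth, flush_on_code_write):
    """Run a toy program; return (accumulator, number of queue flushes)."""
    mem = list(program)
    queue = deque()          # (address, instruction) pairs already fetched
    fetch_pc = 0
    acc = None
    flushes = 0
    while True:
        # Fetching runs ahead of execution by up to prefetch_depth entries.
        while len(queue) < prefetch_depth + 1 and fetch_pc < len(mem):
            queue.append((fetch_pc, mem[fetch_pc]))
            fetch_pc += 1
        addr, instr = queue.popleft()
        op = instr[0]
        if op == "HALT":
            return acc, flushes
        elif op == "SET":
            acc = instr[1]
        elif op == "POKE":   # overwrite the instruction at a given address
            target, new_instr = instr[1], instr[2]
            mem[target] = new_instr
            stale = any(a == target for a, _ in queue)
            if flush_on_code_write and stale:
                # Throw away everything fetched so far and start again.
                queue.clear()
                fetch_pc = addr + 1
                flushes += 1

program = [
    ("POKE", 1, ("SET", "new")),   # modifies the very next instruction
    ("SET", "old"),
    ("HALT",),
]

print(run(program, 0, False))   # ('new', 0): no prefetch, change is seen
print(run(program, 2, False))   # ('old', 0): stale copy executes
print(run(program, 2, True))    # ('new', 1): correct, at the cost of a flush
```

The second and third calls are the two outcomes described above: carry on regardless and execute something other than what the code now says, or clear the queue and pay for the refetch.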

Having said all that, self-modifying code has been used successfully all over the place, and has saved huge amounts of memory over the years. I have a decided sympathy for it, but perhaps, like valves and the genuine bell in a teletext printer, it belongs to computing's picturesque past.

Vorlath · Nov 19, 2004

1. If the instruction being modified is already in the pipeline, the changed instruction won't get executed or there will be a hefty stall to reload the pipeline and cache. It'll all depend on how the processor is designed.

2. Most 32-bit CPUs now have protection mechanisms or paging that differentiate between code and data. Code is usually read-only, and if you try to modify it, you'll get a GPF (general protection fault) or some other exception.

3. Most CPUs have caches split in halves, one for code and one for data. Caches for code are usually load-only and not writeable. On such a design, if you try to change an instruction, the CPU may simply keep executing the old one, because the code cache won't get updated.

4. Behavior may be undefined depending on what the CPU designers decide in order to speed up execution. Self-modifying code results in unnecessary CPU design complexities.

A programmer is a device for converting Coke into software.

lionelhill · Nov 27, 2004

Of course, the other side of the cache issue is that very often a problem needs several different bits of code that differ only in a single instruction (for example, drawing lines with Bresenham's algorithm, where the code depends on whether the line is steeper or shallower than 45 degrees, going up or down, etc.). You have 3 options:
(1) Put all possible instructions in the middle of your loop, with a branch before them to decide which is relevant for this execution. This is slower than necessary (but short).
(2) Modify the instruction (once, and once only, before the loop starts). This is traditional, self-modifying, short, and quick.
(3) Use multiple copies of nearly identical code, differing in only 1 byte. This looks good in these days of cheap memory, but it vastly increases program size, and vastly increases the amount of stuff you need to get into the cache. Cache misses caused by calls to functions that aren't currently in cache are very expensive. Yes, Bresenham's line-drawing method is very short compared to modern caches, but just imagine if you have nested versions of code with multiple options at each level. Perhaps every programmer in the team thought multiple copies of his code were good... Just look at code bloat anyway. In throwing away this sort of self-modifying code we've lost a very valuable baby with some undoubtedly undesirable bathwater.

Incidentally, this sort of self-modifying code is maintenance friendly too. If you want to change your algorithm, you've only got one version to change. You never get the situation that you improved the line-drawing and now all lines drawn upwards are great, but ones drawn downwards at less than 45 degrees suddenly aren't, because you forgot to modify all the versions.
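The structure of options (1) and (2) can be sketched in a high-level language, with one loud caveat: nothing below modifies any code. Choosing a function once before the loop is only a portable stand-in for patching the instruction, and the function names are invented for this sketch. Timings in Python say nothing about the machine-code case discussed above; the point is only where the steep-or-shallow decision is made, and that there is a single copy of the stepping logic to maintain.

```python
def _setup(x0, y0, x1, y1):
    """Shared setup: swap axes for steep lines so the loop always steps in x."""
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:
        x0, y0, x1, y1 = y0, x0, y1, x1
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    return steep, x0, y0, dx, dy, sx, sy

def line_branch_in_loop(x0, y0, x1, y1):
    """Option (1): the steep/shallow test is repeated for every pixel."""
    steep, x, y, dx, dy, sx, sy = _setup(x0, y0, x1, y1)
    err = dx // 2
    points = []
    for _ in range(dx + 1):
        if steep:
            points.append((y, x))
        else:
            points.append((x, y))
        err -= dy
        if err < 0:
            y += sy
            err += dx
        x += sx
    return points

def line_select_once(x0, y0, x1, y1):
    """Stand-in for option (2): pick the differing step once, before the loop."""
    steep, x, y, dx, dy, sx, sy = _setup(x0, y0, x1, y1)
    if steep:
        plot = lambda a, b: (b, a)
    else:
        plot = lambda a, b: (a, b)
    err = dx // 2
    points = []
    for _ in range(dx + 1):
        points.append(plot(x, y))   # no per-pixel decision here
        err -= dy
        if err < 0:
            y += sy
            err += dx
        x += sx
    return points

print(line_select_once(0, 0, 4, 2))   # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
print(line_select_once(0, 0, 1, 3))   # [(0, 0), (0, 1), (1, 2), (1, 3)]
```

Option (3) would correspond to writing a separate, fully specialised copy of the loop for each case, which is exactly the duplication the post warns about.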
