Video gamers know about cheat codes, but assembly language programmers are often in search of undocumented instructions. One way to find them is to map out all of a CPU’s opcodes and where there are holes, try those values, and see what happens. Not good enough for [Ken Shirriff]. He prefers examining the CPU’s microcode and deducing what each part of it does.
Microcode is a feature of many modern CPUs. The CPU runs several “microcode” instructions to process a single opcode. For the Intel 8086, there are 512 micro instructions, each with 21 bits. Each instruction has two parts: a part that moves a source to a destination and another that performs some other operation, such as an ALU operation. [Ken] explains it all in the post, including several hidden registers you can’t see, but the microcode can.
Some of the undocumented instructions are probably not useful. They are either impractical or duplicate a function you can already do another way. Not all of the instructions are there for technical reasons. For example, opcode D6, commonly known as SALC for “Set AL to Carry”, seems to exist only as a trap for anyone making a carbon copy of Intel’s microcode. When other companies like NEC made 8086 clones, having an undocumented instruction would strongly suggest they just copied Intel’s intellectual property (in NECs case, they didn’t).
Other cases happen where an instruction just doesn’t make sense. For example, you can pop all segment registers, and though it is not documented, you can deduce that POP CS should be opcode 0F. The problem is there is no sane reason to pop CS off the stack. The instruction works; it just isn’t useful. The opcodes from 60-6F are conditional jumps that are no different from the instructions at 70-7F because of decoding. There is no reason to document both identical instruction ranges.
The plot thickens when you go to two-byte instructions. You’ll find plenty of instructions of dubious value. You don’t hear much about undocumented instructions anymore. Why? Because modern CPUs have enough circuitry to dedicate some to detecting illegal instructions and halting the CPU. But the 8086 was squeezed too tight to allow for such a luxury. Good thing for people like us who enjoy solving puzzles.
You can still get a modern CPU to tell you more about instructions even if it won’t run them. Even the 80286 had some secret opcodes.
Very interesting, I love [Ken Shirriff]’s articles. I notice now that c9h is a return instruction on the 8086 as well as the z80!
Oops I’ve given away my misspent youth!
That’s probably because both the Z80 and the 8086 are derivatives of the 8080, which also had 0C9H as the opcode for return.
Cloning early CPU carried risks. If they copied 100% exact, they risked lawsuit for unauthorized copies. But if they made a compatible clone and left out undocumented opcode, games that used it wouldn’t work at all.
Getting license to make clones were the only way to avoid pitfalls.
another technique is to get a major anti-monopoly organization on your side. which obviously has its own downsides but i believe some of the s/360 clones were formally blessed by government, even with formal requirements against IBM to avoid this kind of undocumented shenanigan
“But if they made a compatible clone and left out undocumented opcode, games that used it wouldn’t work at all.”
And that’s a problem, really. Because, un-documented op-codes weren’t necessarily a secret. Some un-documented op-codes were an “open secret”, so to say. Let ‘s just think of the famous Z80. There were whole books covering programming with the un-documented op-codes..
What you do is less important than how you do it in legal cases.
The early IBM PCs where cloned 100% perfectly by using the Cleanroom design / Chinese wall technique.
Team A reverse engineers the product and documents in detail how it works.
Team B takes that document and creates a product.
There is no direct communication between (members of) team A and team B, which means is was no “direct” copy of the product.
https://en.wikipedia.org/wiki/Clean_room_design
heh “assembly language programmers are often in search of undocumented instructions”
i beg to differ :)
I agree. I have enough ‘instructions’ to deal with/learn rather than go after ‘undocumented ones’ :) . Some people just have to a go a bit further though … just for fun!
And yet, programs were/are written with undocumented instructions.
I’m not sure that the idea of popping all the segments including CS is that odd at the micro level. Might it be part of IRET?
Right but from the end user it would be pretty unmanageable since you pop CS you’re going to wind up in a new segment. I’m sure all of those things have some purpose or some reason like the duplicate decoding, but it’s hard to imagine what end programmer wants to pop CS off the stack. You’d have to coordinate code between multiple segments so that the pop made sense
I can see it being useful in code obfuscation for copy protection.
i’m going to describe “coordinate code between multiple segments” with the phrase “just a linking problem” :P
and already i can envision the ldscript to accomplish that. of course, linkers weren’t that complicated back then. sometimes i get sad thinking about all the space DOS left for our exploration, which never got explored simply because it didn’t last very long in the grand scheme of things. silly as that is, since good riddance to 16 bit addressing.
The problem is that you replace CS with whatever is on the stack, but not the IP within the current segment so execution would continue at the next IP in a different segment. It would take some carefully structured code to make use of that.
And also since instructions are pre-fetched, an undefined number of instructions will still be executed from the old CS after POP CS is run, as this instruction does not clear the cache.
So it doesn’t quite do what it’s supposed to.
It could be fenced. Also this would not make sense as the effective address being executed has changed so the cache is irrelevant, the pipeline would be cleared like on a branch misprediction or self modifying code. Unless you have tested and examined this behavior, I would think it’s far more likely the pipeline behavior works properly. Prefetch has nothing to do with cache when the effective address has changed. Sure the cache in the page the CS now points to if it was indeed cached could be stale but that’s not what you imply. So it sounds like you’ve little idea how things work here as you’ve utterly mixed concepts.
I wonder what would happen if this was done with a descriptor use for ‘big real’ mode to get that descriptor into CS.
The pedant in me rankles a bit when things that are really unintended implementation artifacts are called “undocumented instructions”, implying there was some design and intent behind the behaviour. While there are instructions that were left out of the documentation for some reason or another, that’s obviously not true for a lot of them, especially in old CPUs.
HCF: Halt and Catch Fire
Ken’s articles are always amazing. This one reminds me of “No More Secrets” by Groepaz ( https://csdb.dk/release/?id=226987 ) doing a similar thing for the 6510 CPU. Mind you, illegal opcodes on that CPU are part of most modern coding on Commodore platforms!
SALC is absolutely useful especially for size coding but that usefulness is of course limited.
SBB AL, AL does the same thing if modifying the carry flag isn’t a problem, just being slightly larger.
Reminds me of the great talk, “Breaking the x86 Instruction Set”. Worth a watch.
https://www.youtube.com/watch?v=KrksBdWcZgQ