Disassembly Required

If you really want to hack software, you are going to face a time when you have to take apart someone’s machine code. If you aren’t very organized, it might even be your own — source code does get lost. If you want to impress everyone, you’ll just read through the hex code (well, the really tough old birds will read it in binary). That was hard to do even when CPUs only had a handful of instructions.

A more practical approach is to use a tool called a disassembler. This is nothing more than a program that converts numeric machine code into symbolic instructions. The devil, of course, is in the details. Real programs are messy. The disassembler can’t always figure out the difference between code and data, for example. The transition points between data and code can also be tricky.

When Not to Use

If you are coding your own program in assembly, a disassembler isn’t usually necessary. A disassembler can’t recover things like variable names, some function names, and, of course, comments. If you use a high-level language and you want to check your compiler output, you can easily have the compiler provide assembly language output (with GCC, the -S option does this).

The real value of a disassembler is when you don’t have the source code. But it isn’t easy, especially for anything nontrivial. Be prepared to do a lot of detective work in most cases.

An Online Tool

Exactly what tool you use will depend on what CPU architecture you want to work with. However, there is a very interesting online tool that can handle a lot of different architectures. In the old days, a disassembler just generated a lot of output in a file or a printout. This online version, by contrast, does a lot of smart analysis and provides hyperlinked cross-references. Even better, you can interactively give it some hints about the subject code and it will improve the results. You can even collaborate with others, which would be really handy when working on a large project.

How’s It Work?

Just to get a taste of how the tool works, have a look at this simple program:

#include <stdio.h>
#include <stdlib.h>

void do_it(void)
{
 printf("Howdy Hackaday!\n");
}

int main(int argc, char *argv[])
{
 char *p=malloc(100);
 do_it();
 free(p);
 return 0;
}

I compiled this to an executable using GCC under Cygwin. Of course, this is cheating because we already know too much about the code for it to be a fair test. In addition, the disassembler can pull information out of the executable file that helps it do things like segregate code and data. Don’t forget, if we really wanted to see what the compiler was generating, we could just ask it. If you want a more realistic example, the web site has a menu where you can pick several examples, but they are much more complex.

On the Web

Once you have the executable in hand, you can upload it to the disassembler using the File menu (use the Upload item, of course). Because the PE file format that Windows uses carries some of this information, the disassembler knows about some symbols and segments. A button bar on the left side selects what appears in the left-hand navigation pane. The top button shows symbols, and if you click on main and make sure the top right-hand selector is set to disassembly, you’ll see your main function (see below).

[Screenshot: the main function in the disassembly view]

The other left-hand panes let you pick strings, do searches, or identify data items. Along the top to the right, you can choose to see a call graph, a hex dump, the file sections (populated because this was a structured file), and information about the file itself.

Everything that makes sense is linked. If you click on the call to do_it, for example, the view will jump to that part of the code. That doesn’t always seem to work on data though. Here’s the do_it function:

[Screenshot: the do_it function in the disassembly view]

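If you click on puts, you’ll jump to the code, but look at the lea instruction ahead of it that loads the string to print. No link. (The call is to puts rather than printf because GCC substitutes puts when the format string has no conversions and ends in a newline.)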

You can skim through the strings or do a search. However, you can also note the address and the section (.rdata). Clicking on the sections display lets you jump to .rdata directly. From there you can find the address quickly and see the string you expect.

By right clicking on the screen (or using keyboard commands) you can add comments, define variables and functions, or tell the tool what area is code vs data. In this case, some of that is done for us, but if you spend time you can document the disassembly very nicely. For example, here’s one of the samples provided:

[Screenshot: one of the site's sample disassemblies, with annotations and jump arrows]

The arrows showing the jumps are a nice touch.

Going Forward

You could do worse than to take the tutorial on the Help menu. The tool claims to support 60 CPUs, but to find the list you need to open the configuration menu for the “live view” where you can just type in hex codes or load a binary file. They do have quite a list including x86, ARM, AVR, VAX, System 390, MIPS, PPC, and even the Z80. I was sorry not to see the 1802, but I can still disassemble its code by myself.

The next time you want to peek inside some binary code, this web site is a useful tool. Just the fact that it has so many CPUs is worth something. I’m not likely to have a VAX disassembler handy, much less one with so many analysis and collaboration tools.

28 thoughts on “Disassembly Required”

    1. I did that with 6502. Knew all the HEX opcodes and calculated CPU cycles on paper to optimize simple “parallel” processing stuff. Sigh. Those were fun times. Nowadays “all software is bugged” is a LAW, not a cheap excuse by people who haven’t done their homework.

      1. Al, got any suggestions for an 1802 Disassembler? I’ve got a Sage 1802 computer that I need to interface with, and I’ve never worked with the 1802 before. I have plenty of experience with the usual suspects: PIC, Z80, 6502, 6800, 68K, x86, etc.

    2. I did a lot of 6502 by hand. Taught myself after copy protection messed up my 1541 alignment so I could bypass copy protection. I’d disassemble and step through the code and JMP around the un-needed portions.

  1. I did disassemble some ZX Spectrum games but quite recently and not back then in the golden Z-80 years. Modern computers make it a lot easier and I’ve also picked up some programming skills in the meantime. It can be very instructive to discover how they did things with the limited resources available.

      1. IDA Pro 5.0 is available as freeware, but that’s limited by architecture (32 bit x86 only..though on Github there’s a mod out there which opens it up for full functionality re: 64 bit execs). The most recent version of IDA is always available as crippleware on their site (no save functionality etc).

        Right now, I’m just too ‘sunken cost’ into the IDA platform that learning a new system, finding alternative plugins that interface with WinDBG and/or IDAPython and/or gdb and/or VMware (if I need kernel level ring-0 dbg’ing), would take so much time. I was pretty much boxed in though since I’ve been using IDA Pro since the SoftICE 4 days. You really didn’t have much else as an option. R2 is built with modularity in mind, so building that bridge between static and dynamic analysis isn’t going to be nearly as cumbersome as the old process of — IDA + your set of IDA plugins -> do static analysis -> export SoftICE or Olly compatible file -> import it into your debugger so you actually have symbols/annotations -> rinse/repeat.

        BinaryNinja is competing for the “IDA but cheaper” spot. R2, as H3g3mon mentioned, is probably your safest bet. https://recon.cx/2015/slides/recon2015-04-jeffrey-crowell-julien-voisin-Radare2-building-a-new-IDA.pdf The rev-eng ecosystem is really healthy right now and most people are piling up There are other attempts (https://github.com/das-labor/panopticon got some press recently) but if I had to start again from scratch I’d use R2

  2. @Al Williams
    No way! I’m good friends with an individual who wrote a ROM monitor for the RCA Studio II (1802 based), where the user would have a hex input through its keypad. I never knew that was one route to cheap computing in the mid-to-late ’70s.

  3. I don’t like that cloud thing going on. Who knows where that binary goes; some stuff needs to stay in-house. And I’m not talking about intelligence firms or private security firms that make antivirus. I mean hackers who work on stuff like hardware they want to reverse engineer without anybody knowing about it (yet). So IDA and radare-like programs it stays.
    I’ve looked at that program. Now, if it was open source and I could pull an image and make some plugins, it would have gotten my brains. Now I just don’t care (yet).
