If you really want to hack software, you are going to face a time when you have to take apart someone’s machine code. If you aren’t very organized, it might even be your own — source code does get lost. If you want to impress everyone, you’ll just read through the hex code (well, the really tough old birds will read it in binary). That was hard to do even when CPUs only had a handful of instructions.
A more practical approach is to use a tool called a disassembler. This is nothing more than a program that converts numeric machine code into symbolic instructions. The devil, of course, is in the details. Real programs are messy. The disassembler can’t always figure out the difference between code and data, for example. The transition points between data and code can also be tricky.
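To make that concrete, here is a minimal sketch of the simplest possible disassembler: a table lookup run over every byte in order (a “linear sweep”). It is an illustration only, not how any real tool is built. The names mnemonic_for and disassemble are hypothetical, and the table covers just six one-byte x86 opcodes; real instructions are variable length and need far more decoding logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: map a few one-byte x86 opcodes to mnemonics.
   Returns NULL for any byte outside this tiny illustrative table. */
const char *mnemonic_for(unsigned char op)
{
    switch (op) {
    case 0x55: return "push ebp";
    case 0x5D: return "pop ebp";
    case 0x90: return "nop";
    case 0xC3: return "ret";
    case 0xC9: return "leave";
    case 0xCC: return "int3";
    default:   return NULL;
    }
}

/* Linear sweep: walk the buffer one byte at a time and print either the
   mnemonic or a "db" (define byte) line for bytes the table does not know.
   Returns how many bytes decoded as instructions. */
size_t disassemble(const unsigned char *buf, size_t len, FILE *out)
{
    size_t decoded = 0;

    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        const char *m = mnemonic_for(buf[i]);
        if (m != NULL) {
            fprintf(out, "%04zx: %02x    %s\n", i, buf[i], m);
            decoded++;
        } else {
            fprintf(out, "%04zx: %02x    db 0x%02x\n", i, buf[i], buf[i]);
        }
    }
    return decoded;
}
```

Calling disassemble on the bytes 0x55, 0x5D, 0xC3 prints a plausible function prologue and return. Calling it on the two ASCII characters “U]” prints push ebp and pop ebp as well, because those characters happen to be 0x55 and 0x5D. The sweep has no way to know the bytes were text, which is the code-versus-data problem in miniature.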
When Not to Use
If you are coding your own program in assembly, a disassembler isn’t usually necessary. The disassembly can’t recover things like variable names, some function names, and — of course — comments. If you use a high-level language and you want to check your compiler output, you can easily have the compiler provide assembly language output (see below).
The real value of a disassembler is when you don’t have the source code. But it isn’t easy, especially for anything nontrivial. Be prepared to do a lot of detective work in most cases.
An Online Tool
Exactly what tool you use will depend on what CPU architecture you want to work with. However, there is a very interesting online tool that can handle a lot of different architectures. In the old days, a disassembler just generated a lot of output in a file or a printout. But this online version does a lot of smart analysis and provides hyperlinked cross-references. Even better, you can interactively give it some hints about the subject code and it will improve the results. You can even collaborate with others, which would be really handy when working on a large project.
How’s It Work?
Just to get a taste of how the tool works, have a look at this simple program:
#include <stdio.h>
#include <stdlib.h>

void do_it(void)
{
    printf("Howdy Hackaday!\n");
}

int main(int argc, char *argv[])
{
    char *p = malloc(100);
    do_it();
    free(p);
    return 0;
}
I compiled this to an executable using GCC under Cygwin. Of course, this is cheating because we already know too much about the code for it to be a fair test. In addition, the disassembler can pull information out of the executable file that helps it do things like segregate code and data. Don’t forget, if we really wanted to see what the compiler was generating, we could just ask it: GCC’s -S option writes assembly language instead of an object file. If you want a more realistic example, the web site has a menu where you can pick several examples, but they are much more complex.
On the Web
Once you have the executable in hand, you can upload it to the disassembler using the File menu (use the Upload item, of course). Because the PE file format Windows uses carries symbol and section information, the disassembler already knows about some symbols and segments. A button bar on the left selects what the left-hand navigation pane shows. The top button shows symbols, and if you click on main and make sure the top right-hand selector is set to disassembly, you’ll see your main function (see below).
The other left hand panes let you pick strings, do searches, or identify data items. Along the top to the right you can pick to see a call graph, a hex dump, the file sections (populated because this was a structured file), and information about the file itself.
Everything that makes sense is linked. If you click on the call to do_it, for example, the view will jump to that part of the code. That doesn’t always seem to work on data though. Here’s the do_it function:
If you click on puts, you’ll jump to the code, but look at the lea instruction ahead of it that loads the string to print. No link.
You can skim through the strings or do a search. However, you can also note the address and the section (.rdata). Clicking on the sections display lets you jump to .rdata directly. From there you can find the address quickly and see the string you expect.
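As a rough idea of where a strings list like that can come from, here is a minimal sketch that scans a buffer for runs of printable characters, much as the Unix strings utility does. This is an assumption for illustration, not a description of how the web site builds its list. The function name find_strings is hypothetical, and the offsets it prints are positions in the buffer, not the virtual addresses a disassembler would show.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: report every run of at least min_len printable
   characters in buf. Prints the buffer offset and the text of each run
   and returns the number of runs found. */
size_t find_strings(const unsigned char *buf, size_t len,
                    size_t min_len, FILE *out)
{
    size_t count = 0;
    size_t start = 0;   /* offset where the current run began */

    assert(buf != NULL && out != NULL && min_len > 0);
    for (size_t i = 0; i <= len; i++) {
        /* The end of the buffer terminates a run like any other
           non-printable byte. */
        int printable = (i < len) && isprint(buf[i]);
        if (printable)
            continue;
        if (i - start >= min_len) {
            fprintf(out, "%08zx: %.*s\n", start,
                    (int)(i - start), (const char *)(buf + start));
            count++;
        }
        start = i + 1;  /* the next run can begin after this byte */
    }
    return count;
}
```

Run over the bytes of the sample program’s .rdata section with a minimum length of 4, a scan like this would pick out “Howdy Hackaday!” and stop at the newline, since a newline is not a printable character. It would also report any instruction bytes that happen to look like text, so the results still need a human eye.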
By right clicking on the screen (or using keyboard commands) you can add comments, define variables and functions, or tell the tool what area is code vs data. In this case, some of that is done for us, but if you spend time you can document the disassembly very nicely. For example, here’s one of the samples provided:
The arrows showing the jumps are a nice touch.
Going Forward
You could do worse than to take the tutorial on the Help menu. The tool claims to support 60 CPUs, but to find the list you need to open the configuration menu for the “live view” where you can just type in hex codes or load a binary file. They do have quite a list including x86, ARM, AVR, VAX, System 390, MIPS, PPC, and even the Z80. I was sorry not to see the 1802, but I can still disassemble its code by myself.
The next time you want to peek inside some binary code, this web site is a useful tool. Just the fact that it has so many CPUs is worth something. I’m not likely to have a VAX disassembler handy, much less one with so many analysis and collaboration tools.
Some blind people have such sensitive finers they can feel the bits in a flash chip directly.
Fingers that is, sorry for ty typo.
:)
Error correction mechanism came into effect and I read “fingers” in the first comment :)
The promise of an online tool is more powerful hardware and software that can democratize the development process.
I have used IDA for donkey’s years, will be fun to play with something else
The website itself is some horrible glitzy overdone thing that sends me running and screaming. It seems engineered to tease me and prevent me from actually getting to the nuts and bolts.
Yeah, just went there and within three minutes I left.
Click for ‘more details’ and it takes you back to the top of the page. What a waste of time!!! I was hoping to find what architectures it supported under the banner “and many more”, but I guess that is all too hard.
I’ll stick to off-line tools thanks.
About four decades ago, I didn’t know anything about software tools, and I assembled/disassembled Z80 code “by heart”, using pen and paper only. You had to be a real hacker to deal with microprocessors.
I did that with 6502. Knew all the HEX opcodes and calculated CPU cycles on paper to optimize simple “parallel” processing stuff. Sigh. Those were fun times. Nowadays “all software is bugged” is a LAW, not a cheap excuse by people who haven’t done their homework.
I can still do that for the 1802.
Al, got any suggestions for an 1802 Disassembler? I’ve got a Sage 1802 computer that I need to interface with, and I’ve never worked with the 1802 before. I have plenty of experience with the usual suspects: PIC, Z80, 6502, 6800, 68K, x86, etc.
Well, I wrote one that runs on the 1802. You can find it in the COSMAC Elf Yahoo User’s Group. But if you want something on a PC, there’s plenty:
http://myweb.tiscali.co.uk/pclare/DASMx/
https://groups.yahoo.com/neo/groups/cosmacelf/files/Al%20Williams/
The minicom loader has gone on to be famous in other applications ;-)
I did a lot of 6502 by hand. Taught myself after copy protection messed up my 1541 alignment so I could bypass copy protection. I’d disassemble and step through the code and JMP around the un-needed portions.
Any reason why one listing shows the code in AT&T format and the other in Intel/Microsoft format (MASM)?
I did disassemble some ZX Spectrum games but quite recently and not back then in the golden Z-80 years. Modern computers make it a lot easier and I’ve also picked up some programming skills in the meantime. It can be very instructive to discover how they did things with the limited resources available.
I’m seeing better and better web tools develop that I would give my female-child producing testes to have, like JavaScript PIC (dis)assemblers.
There is a saying here: Windows is open source, you just need to learn x86 ASM
“The disassembly can’t recover things like variable names, some function names, and — of course — comments. ”
dsassm02 disagrees with you.
would like to try, but is missing intel 8051 arch :( back to ida…
IDA is incredibly expensive. Is there anything in the middle price-wise? Or is there an IDA not for profit license?
radare2 is worth a look. It’s open source. But I haven’t used IDA myself much so I can’t compare.
IDA Pro 5.0 is available as freeware, but that’s limited by architecture (32 bit x86 only..though on Github there’s a mod out there which opens it up for full functionality re: 64 bit execs). The most recent version of IDA is always available as crippleware on their site (no save functionality etc).
Right now, I’m just too ‘sunken cost’ into the IDA platform that learning a new system, finding alternative plugins that interface with WinDBG and/or IDAPython and/or gdb and/or VMware (if I need kernel level ring-0 dbg’ing), would take so much time. I was pretty much boxed in though since I’ve been using IDA Pro since the SoftICE 4 days. You really didn’t have much else as an option. R2 is built with modularity in mind, so building that bridge between static and dynamic analysis isn’t going to be nearly as cumbersome as the old process of — IDA + your set of IDA plugins -> do static analysis -> export SoftICE or Olly compatible file -> import it into your debugger so you actually have symbols/annotations -> rinse/repeat.
BinaryNinja is competing for the “IDA but cheaper” spot. R2, as H3g3mon mentioned, is probably your safest bet. https://recon.cx/2015/slides/recon2015-04-jeffrey-crowell-julien-voisin-Radare2-building-a-new-IDA.pdf The rev-eng ecosystem is really healthy right now and most people are piling up There are other attempts (https://github.com/das-labor/panopticon got some press recently) but if I had to start again from scratch I’d use R2
There was a free version of IDA yonks ago in an attempt to minimise the rampant pirating of the full version, but that didn’t work and was pulled.
http://binary.ninja/ was released recently; haven’t tried it yet but it seems to be in the same niche. A lot cheaper, too.
Hopper disassembler seems more affordable – but it’s limited to x86/64 and arm. Oh, and it runs natively on Linux and OS-X only.
@Al Williams
No way! I’m good friends with an individual who wrote a ROM monitor for the RCA Studio II (1802 based), where the user would have a hex input through its keypad. I never knew that was one route to cheap computing in the mid-late 70’s.
I don’t like that cloud thing going on. Who knows where that binary goes; some stuff needs to stay in-house. And I’m not talking about intelligence firms or private security firms that make antivirus. I mean hackers who work on stuff like hardware they want to reverse engineer without anybody knowing about it (yet). So I’m staying with programs like IDA and radare.
I’ve looked at that program. Now if it was open source and I could pull an image and make some plugins, it would have gotten my brains. Now I just don’t care (yet).
wow reverse engineering code is so easy im going to figure out how the internet is coded! thanks for the awesome tutorial! ps… what architecture is the internet?
Government bribes and donations
If you want the source code see these guys http://www.gchq.gov.uk