Assembly Language For Real

We all probably know that for ultimate control and maximum performance, you need assembly language. No matter how good your compiler is, you’ll almost always be able to do better by using your human smarts to map your problem onto a computer’s architecture. Programming in assembly for PCs, though, is a little tricky. A lot of information about PC assembly language dates from when assembly was more common, but it also covers old modes that, while still available, aren’t the best answer for the latest processors. [Gpfault] has launched a series on 64-bit x86 assembly that tries to remedy that, especially if you are working under Windows.

So far there are three entries. The first covers setting up your toolchain and creating a simple program that does almost nothing. But it is a start.

The second entry talks more about FASM and how to use macros and other features to simplify your programming. In particular, he shows macros that can wrap details like PE tables and calling convention protocols to make things easier. You wind up with a working Hello World program.

The third entry starts work on a fantasy CPU emulator, QBX. This isn’t a bad idea since emulating a CPU forces you to use many of the host computer instructions and doesn’t require any special knowledge other than what you probably have if you are trying to learn assembly language, anyway.

Of course, if you are writing boot code, you need to know all that old-fashioned legacy stuff. We liked [Ben Jojo’s] tutorial for that. If Linux is more your jam, we have an introduction for that, too.

Header: AMD Ryzen x86-64 processor, Fritzchens Fritz / CC0

48 thoughts on “Assembly Language For Real”

  1. Takes me back a bit, I remember writing an assembly I/O routine to allow a PDP 11/23 to talk to a somewhat specialized $1.0M Watkins Johnson radio receiver. The receiver received and reported its frequency in plain binary in multiple words on a parallel interface bus. Debugging an error in binary numbers as large as 18*10^9 proved to be a PITFA. Also those were the days when you had to squeeze your code into 256K of memory so some skills in memory link mapping were required. As I discovered after a long day of pulling my hair out, if you accidentally removed a module name from the linker command line while forgetting to remove the accompanying comma the DEC linker was quite happy to link in a chunk of blank memory for your program to fall into.
    Later on I needed some tools for microwave analysis work and wrote a matrix + FFT library using the MASM assembler. I used the BIOS INT 15h / AH=87h call to run the code in extended memory on the 80286/80287 as a way to avoid the 640K memory limit. Came with a memory garbage collection system that was fairly efficient. And so much faster than the code you got from the Fortran compilers back then.
    Yes, assembler is a useful tool if all else fails!

  2. “No matter how good your compiler is, you’ll almost always be able to do better by using your human smarts to map your problem onto a computer’s architecture.”

    As someone who did a lot of x86 assembler when it was just 808x, I will say that on a modern CISC processor, you might be hard pressed to beat a GOOD C compiler, like Intel’s. The “x86” is probably 100x more complicated than it was 40 years ago with all sorts of timing variability and state dependencies. Maybe you could if you were an obsessive wizard but your productivity would be degraded, even with a big library of macros.

    Even assembly has become fat. On a PC, I wrote .COM programs in assembler that people wanted, that did useful things, and that were nine bytes long (OK, I cheated: I called the BIOS). I know, code/data segregation and security make that size impossible today.

    I wonder what the smallest possible size is today for “hello, world” on a PC?

    1. I agree. Writing assembly that is more efficient than a good compiler’s output is virtually impossible nowadays. Assembly is useful in edge cases where you need to access a specific register or piece of hardware, but coding efficiency and speed is rarely a good enough reason. Also, assembler tends to make your code less readable and less portable.

    2. yeah i second this. compilers do a pretty good job and even assembly isn’t what it used to be. i recently disassembled a .com file i made for DOS back in the day, it was pretty eye-opening.

      i want to expand a little on why compiler-generated code is usually better. if you’ve got a talented and determined programmer looking at a small inner loop, i think he will typically be able to do a better job if he’s working in assembly than in C. but your program is going to be a lot larger than just that small inner loop, and no matter how talented your programmer is, he won’t be able to keep up that level of effort over 10,000 lines of assembler code. but he *will* be able to keep up a relatively high effort over 1,000 lines of C code.

      the point is, if you have to generate a lot of lines of assembly code, you are forced to make a bunch of performance compromises just to make the code manageable (and even so, it isn’t really manageable). the equivalent C code will generally be *MUCH* more performant, as well as (of course) more manageable.

      i’ve had the privilege of interacting with a few large ASM projects and it is really amazing how poorly performing they are. it’s just impossible to keep up that level of effort across such a large volume of bloated unmanageable code.

      as an aside, i work with a fancy macro assembler and a lot of times large assembly projects make heavy use of that macro assembler. the result is that it can take a lot longer to compile assembly code than C code!!

      1. “if you’ve got a talented and determined programmer looking at a small inner loop, i think he will typically be able to do a better job if he’s working in assembly than in C”

        Then he will leave for another job or retire, and you will not find anyone to maintain that software anymore.

    3. ” I will say that on a modern CISC processor, you might be hard pressed to beat a GOOD C compiler, like Intel’s.”

      OK. There’s a bit of silliness here, especially for x86. x86 processors do so much magic behind the scenes to translate x86 into a more-optimized instruction stream that really, “x86 assembly” isn’t actually “assembly” anymore at all, in the sense that “assembly” is supposed to be ‘what the CPU executes’. And because the CPU vendors are *translating* the instructions (via microcode, register renaming, instruction reordering, branch prediction) they *don’t optimize the CPU* for hand-coded patterns. They optimize it for the code a compiler puts out.

      So really, you’re just saying “on an x86 system, you’d be hard pressed to get the same performance by hand-writing instructions for Intel’s black box rather than using Intel’s black box to generate instructions for Intel’s *other* black box.” Not particularly surprising.

      It’s also silly because if you’re writing for a CPU (not MCU, so like, GHz-ish ARM or x86) you’re likely running it on an operating system, likely with a bunch of interfacing libraries. You’ve *already given up* performance for convenience at that point. You’re not comparing C and assembly at that point anymore.

      But that’s the key: assembly *is not a programming language*. Not in the way we think of them anymore. You can’t write “assembly libraries” that are easily interoperable because there’s no “assembly language calling convention” or “assembly language function linking” or anything like that. Those are things that a high-level language *gives* you. So when you say “you’d be hard pressed to beat a GOOD C compiler” – even on a non-x86, where you *can* actually program in “assembly” – you’re really saying “unless you write the entire freaking system from scratch yourself, you’re hard pressed to beat a C compiler.” Again. Not particularly surprising.

      Just to explain what I mean, I’ve actually worked with a Python tool that ‘translates’ C code into assembly for a softcore processor. You can’t use temporary variables, or complex expressions. Functions can’t take parameters or return anything. You’ve got 32 global variables, and that’s it, plus goofy syntax to view those variables as multi-register objects. RAM/IO access is via functions.

      It still *looks* like C, and it translates exactly to the processor (duh), but you can’t write libraries that way. You have to start agreeing on *some* minimal calling convention, and memory organization. And as soon as you do that, you do give up performance.

      That’s my point. If you hardcoded the entire thing in assembly, start to finish, bare metal, of *course* you’d beat any operating system implementation. Duh. You’d win on startup time alone by huge amounts. But that’s not the metric we actually judge things by.

    4. Yeah, this is what I was thinking too.

      On ARM, sure, I can do better by hand if I want to spend the time and lose the maintainability. But on x86? No way! That’s not a reasonable assumption in any way, shape, or form, and it ignores the reasons that CISC even still exists.

    5. “I wonder what the smallest possible size is today for “hello, world” on a PC?” I remember back in the late 80’s, I believe, someone wrote a binary using only opcodes whose bytes are printable 7-bit ASCII characters, making it transmittable as ASCII text. One could copy/paste the ASCII text from a BBS, save it as a COM file, and run it. It worked! I was impressed. :-P This was years before uuencoding, mind you. Is uuencode even used anymore?

      1. “years before uuencoding” Just a correction, as this forum picks nits. :-P I meant to say years before uuencode was ported to DOS. I of course realize it’s been with unix since the early 80’s thus unix-to-unix encoding.

  3. I was just thinking about learning a bit of assembly, then this interesting post popped up. I read the first tutorial and good grief, what a tangled mess assembly is. I’m not sure I want to know assembly that bad.

      1. Assembly on any superscalar/out of order processor is fairly pointless. The actual assembly you pass isn’t “really” what gets executed, so you basically need to perform code flow analysis to figure out if what you’re doing is actually faster. Which… means you basically need a compiler.

    1. The approach taken here is in no way representative of how the very valuable subject of Assembly Language should be taught. What it IS representative of is how to discourage a complete neophyte from learning any Assembly Language, for any processor.
      [It most certainly does NOT help that the author of this particular approach decided to START with the Assembly Language of–arguably–one of the most arcane and difficult processors one could possibly choose to use for an introduction to this valuable subject. Starting with the most complex example is no way to teach a rudimentary subject.]

      Be well aware that any Assembly Language is totally, completely tied to a particular processor, or CPU. Starting out by learning Assembly for a less-difficult, far-less-complex processor should, perhaps, be one of your choices.

      Any time you find that a particular approach to teaching Assembly Language first requires you to learn high-level ‘aids’ to “help you”, stop. Look elsewhere. [This is precisely the reason that so many books purporting to teach ‘Raspberry Pi Assembly Language’ were such utter failures and disasters. When was the last time you saw ANY mention of a stand-alone, complete RPi Assembly Language program?]

      The worst that happens when this approach is taken is shown, writ bold, here: people are thoroughly discouraged, and most will probably never make another attempt to learn the subject, for ANY processor, because the one thing they’ve already been taught (by this technique) is how hard Assembly Language is.

      The learning of any subject can be made either relatively easy or relatively hard, depending only on the approach taken as to how it is taught.

      1. Why are we pretending that “assembly language” is a thing?

        x86 assembly is not ARM assembly, which is not ARM Thumb, which is not RISCV assembly, which is not PIC assembly, which is not AVR assembly, which is not 8051 assembly.

        I’m completely fluent in the assembly language of several processors, some enough that I don’t even need mnemonics. Doesn’t help me in the slightest understanding x86 or ARM assembly. Hell even if we forced all of them to start using similar syntax it wouldn’t help, because the entire reason to *use* assembly is to use all the features of the processor.

        Why do we bother calling them the same thing? Why don’t we just call it “x86 language”, “ARM language”, “ARM Thumb language”. That’s what they *are*, after all. Pretending that they have *any* relationship to each other just leads to total silliness like this.

    1. On some platforms, but this is about x86 ASM.

      Your ASM instructions get ignored by this processor, and it does other stuff it wants to instead. ASM is no more “real” than anything else on these processors.

      If you want to write code that is “hard” and “real” you better have a processor that honors your instructions!

      I can write embedded Ruby on RISC that is more “hard” and more “real” than ASM on CISC. If I turn off compiler optimizations I can even know exactly how many cycles each line of Ruby will take. You can’t know that using ASM on CISC.

      1. It’s not “RISC” vs “CISC.” This isn’t the 1990s.

        Any superscalar/out-of-order processor throws away your exact instructions and does what it wants, so long as the end result is architecturally the same. That’s the entire point. You say “move r7 to r3”, it says “ehh…. don’t need to ACTUALLY do that” and ignores it. As soon as a processor is superscalar/out of order, you will have no idea how long each instruction will take without doing *some* code flow analysis.

        It gets super bad with branch prediction, obviously, but the entire point of a superscalar processor is that it can look at *groups* of instructions and say “yeah, no, I’ll do this instead, that’s fine.”

  4. While I agree that x86 ASM is not as common today (especially when most programs rely on JIT), ASM for microcontrollers is still highly relevant. Of course, not for everything, but for special cases. Like when I tried to get gcc to remove enough fat that a 16 MHz AVR (obligatory in that case) could meet the timing specification for the WS2812 (and meet it strictly). Stripping out most of the stack/frame calls while interleaving some memory and arithmetic instructions in critical places did the trick, but it would be impossible with pure C (again, in this case – you should always optimize for the specific conditions after profiling and consulting Mr. Knuth).

  5. When Covid first hit and I was unable to do anything else, as we were under lockdown, I wrote (in 8085 assembly) an entire operating system for a 1970’s single board computer (the Rigel computer), since nobody had ever done that before. All that it had was a machine language monitor in ROM. So I expanded the RAM using modern memory, expanded the ROM region using flash memory, wrote the operating system, and flashed it in. I then wrote a C compiler for the computer (C98 version) and wrote the first C program ever compiled and run on that SBC. Thinking how awesome this was (since I hadn’t even powered that computer up in over 30 years), I went out on the net to see if anyone else had a Rigel and wanted the software. Nothing. No interest. :-/ But it was a fun exercise in retro computing at least. I don’t normally deal much with assembly these days. :-P

  6. If you don’t write in assembler, then you are not writing software; you are using other people’s software, in various different orders of execution, to make it do what you want, in much the same way as you use an Excel spreadsheet.

      1. Not wanting to repeat myself, but the line between RISC and CISC, and whether or not modern x86 processors are RISC under the hood, is actually not as clear-cut as some might think.

        Nor is one superior over the other, it all depends on a lot of factors.

        Here is a comment I made on another article explaining the situation in a bit more depth:

    1. Now find out that the CPU runs firmware that’s underneath the published interface, and that your ASM is also just software.

      And that when you go even deeper, and get to the hardware, you’re just selecting which circuits you want, when, much the same as when you use a spreadsheet.

      And then there are more layers that are like that, where at each layer you’re simply selecting between choices provided by other engineers in the past. This is what “on the shoulders of giants” means.

      1. Standing on the shoulders of giants is indeed something that most people will need to do.

        For an example. I have been developing an architecture for a while, and it uses fractions instead of floating point. But when handling fractions, it is rather nice if one had an instruction for finding the greatest common divisor between two numbers.

        Now, I don’t want to fiddle with trying to find out an efficient solution to the problem. Instead, I just use the binary GCD algorithm (the Stein’s algorithm). Saves me a ton of time trying to figure it out myself. And the algorithm is proven to be very efficient to the point where even a more efficient algorithm isn’t going to be a major improvement. (And a lot of the bit operations in Stein’s algorithm can be done in a single cycle with dedicated hardware, greatly improving its performance. (And the work needed to implement Stein’s algorithm in hardware is to me rather trivial, figuring out the maths behind the algorithm is though “black magic” as far as I am concerned.))

        By using an off the shelf solution, one can instead focus on the larger things within one’s project, and thereby more efficiently use one’s time. It is a collaboration, though, at times some people just “pick” a thing thinking it is great even if it might not be good at all for one’s application. (Like always using floating point regardless of what one is doing with it…)

        So obviously one shouldn’t be careless when choosing what off the shelf stuff one uses in one’s project, but in general, there is nothing wrong with building on a foundation made by others.

    2. “If you don’t write in assembler then you are not writing software” Those of us with grey hair can say things like “assembler is for lazy folks,” as we started out programming micros by hex entry on a keypad, byte by byte. Or at least I did anyway, on a KIM-1, my first 6502. Most early micros were poorly supported. When assemblers became available, that made life a whole lot easier. Imagine counting a branch forward or backward in hex and getting it off by just one byte. That would usually cause a crash. Early micros were VERY frustrating, so I’m not embarrassed using a modern macro assembler, especially where macros are useful (embedded controllers). It makes assembly almost as easy as any early higher-level language. :-)

  7. “By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality.” –Donald Knuth

    By “…machine-oriented language…”, Dr. Knuth means, precisely, exactly, Assembly Language.


    Donald Knuth is, arguably, one of the premier, world-class computer scientists of the 20th and 21st centuries.

  8. “[A] simple program that does almost nothing”? That’s right in my wheelhouse! What I was put on Earth to do. Haven’t played with assembly since the ’80s.



    …for all you people who really want to learn Assembly Language:

    The simplest test of the quality of the instruction you’re being offered is this: it shouldn’t cost you anything!
    That’s right: nothing.
    Every manufacturer that offers assembly-language-programmable devices also offers an Assembler program for those devices for free, or for some nominal sum very close to ‘free’ (these ‘Assemblers’ are also referred to as ‘Macro Assemblers’, but since all of them support the ‘macro’ capability, a manufacturer may or may not use this adjective).

    No, and I do mean NO, Assembler (the program) REQUIRES you to generate your code with an expensive add-on usually referred to as an “IDE” (Integrated Development Environment). You WILL be told by some that you absolutely can NOT get by without purchasing one; that you absolutely ‘NEED’ an IDE. Run away. As fast as you possibly can. Find someone who is really an expert to get your advice from (you just encountered one way to sort out the experts from the other types).

    Any Assembler worth its salt will happily accept, and generate PERFECT machine code from, source written with any text editor or word processor which has been set to save its output as pure text, or ‘ASCII’. That is the only crucial part: again, make certain that the program you’re using is set to save its output in ‘pure text’, or ASCII, format.
    This is the only way I have ever written Assembly Language programs in all the years I’ve been doing it. Never had a problem. Never will.

    Have a lot of fun; Assembly Language programming IS a lot of fun, to say nothing of being extremely satisfying.
