A Literate Assembly Language

May 8, 2023

A recent edition of [Babbage’s] The Chip Letter discusses the obscurity of assembly language. He points out, and I think correctly, that assembly language is more often read than written, yet nearly all assembly languages are hampered by obscurity left over from the days when punched cards had 80 columns and a six-letter symbol was all you could manage in the limited memory space of the computer. For example, without looking it up, what does the ARM instruction FJCVTZS do? The instruction’s full name is Floating-point Javascript Convert to Signed Fixed-point Rounding Towards Zero. Not super helpful.

But it did occur to me that nothing is stopping you from writing a literate assembler that is made to be easier to read. First, most C compilers will accept some sort of asm statement, and you could probably manage that with compile-time string construction and macros. However, I think there is a better possibility.

Reuse, Recycle

Since I sometimes develop new CPU architectures, I have a universal cross assembler that is, honestly, an ugly hack, but it works quite well. I’ve talked about it before, but if you don’t want to read the whole post about it, it uses some simple tricks to convert standard-looking assembly language formats into C code that is then compiled. Executing the resulting program outputs the desired machine language into a desired file format. It is very easy to set up, and in the middle, there’s a nice C program that emits machine code. It is not much more readable than the raw assembly, but you shouldn’t have to see it. But what if we started the process there and made the format readable?

At the heart of the system is a C program that lives in soloasm.c. It handles command line options and output file generation. It calls an external function, genasm, with a single integer argument. When that argument is 1, the assembler is in its first pass, and you only need to fill in label values with real numbers. When it is 2, you actually fill in the array that holds the code.

That array is defined in the __solo_info structure (soloasm.h). It includes the size of the memory, a pointer to the code, the processor’s word size, the beginning and end addresses, and an error flag. Normally, the system converts your assembly language input into a bunch of function calls it writes inside the genasm function. But in this case, I want to reuse soloasm.c to create a literate assembly language.
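To make the moving parts concrete, here is a rough sketch of that arrangement. It is not the code from soloasm.h: the AsmInfo type, its field names, and the assemble() driver are all made up for illustration, and the generator here takes the structure as a parameter where the real genasm takes only the pass number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the structure in soloasm.h.
// The field names are illustrative, not the project's real ones.
struct AsmInfo {
    std::vector<uint8_t> code;  // memory image being built
    unsigned wordBits = 8;      // processor word size in bits
    unsigned begin = 0;         // first address used
    unsigned end = 0;           // last address used
    bool error = false;         // set when something goes wrong
    explicit AsmInfo(std::size_t memSize) : code(memSize, 0) {}
};

// Generator signature assumed for this sketch: pass is 1 or 2.
using GenFn = void (*)(AsmInfo &info, int pass);

// Run the generator twice: pass 1 settles label values,
// pass 2 fills in the code array.
inline bool assemble(AsmInfo &info, GenFn gen) {
    assert(gen != nullptr);
    for (int pass = 1; pass <= 2; ++pass) {
        gen(info, pass);
    }
    return !info.error;
}
```

A generator written against this sketch would check the pass number and only write into info.code when it is 2.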

Modernize

I wrote all this a long time ago, but I wanted the creation of literate assembly to be easier, so I decided to do a low-effort conversion to C++. This allows you to use nice data structures for the symbol table, for example. However, I didn’t use all the C++ features I could have, simply in the interest of time.

The base class is reasonably agnostic about the processor, and, as an example, I’ve provided a literate RCA 1802 assembler. Just a proof of concept, so I could probably name the instructions a bit more consistently, and there is plenty of room for other improvements, but it gets my point across.

Here’s an excerpt of a blinking light program written for the 1802 using the standard assembler syntax:

ORG 0
Main:
   LDI HIGH(R3Go)
   PHI R3
   LDI LOW(R3Go)
   PLO R3
   SEP R3
R3Go: LDI HIGH(Delay)
   PHI R9
   LDI LOW(Delay)
   PLO R9
   LDI HIGH(Stack)
   PHI R7
   LDI LOW(Stack)
   PLO R7
   SEX R7
   LDI 0
   STR R7
Loop: OUT 4
. . .
   NOP
   BR DELAY1

   ORG $F0
Stack: DB 0
   END Main

Now here is the exact same program written for the literate assembler:

// Simple 1802 Literate Program
#include "lit1802.h"

#define ON 1
#define OFF 0

#define DELAYPC 9 // delay subroutine
#define DELAYR 8 // delay count register
#define MAINPC 3 // Main routine PC
#define RX 7 // RX value
#define DELAYVAL 0xFF // time to delay (0-255)

void Program(void)
{
   Origin(0x0);
// Blinky light program
// Main:
Define_Label("Main");
// Force R3 as PC just in case
   Load_R_Label(MAINPC,"R3Go");
   Set_PC_To_Register(MAINPC);
// Here we are P=3
// R3Go:
Define_Label("R3Go");
// Set R9 to delay routine (default PC=0)
   Load_R_Label(DELAYPC,"Delay");
// Set RX=7 at memory 00F0
   Load_R_Label(RX,"Stack");
   Set_X_To_Register(RX);
   Load_D_Imm(0);
   Store_D_To_Reg_Address(RX);

// Loop:
Define_Label("Loop");
   Output_Mem_RX_Incr(4); // write count to LED
. . . 
   NOP(10);
   Branch(Label("Delay1")); // note... could define BRANCH as _BRANCH and then #define Branch(l) _BRANCH(Label(l)) if you like...

Location(0xF0); // storage for RX
// Stack:
Define_Label("Stack");
   Byte();
   End_Program(Label("Main")); // End of program
}

Well, admittedly, there are comments and symbols, but still. You can download both files if you want to compare. You can also find the entire project online.

Under the Hood

The idea is simple. Each function simply populates an array with the byte or bytes necessary. Admittedly, the 1802 is pretty simple. It would be harder to do this for a modern processor with many instructions and complex modes. But not impossible.
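As a minimal sketch of what "populating an array" looks like, here are a few emitters using standard 1802 opcodes (LDI is 0xF8, PHI is 0xBN, PLO is 0xAN, SEP is 0xDN, SEX is 0xEN, NOP is 0xC4). The Emitter struct and RegOp helper are invented for this example and the real lit1802.h is organized differently; the function names simply follow the ones used in the listing above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative emitter, not the project's actual class.
struct Emitter {
    std::vector<uint8_t> mem = std::vector<uint8_t>(65536, 0);  // 64K image
    unsigned pc = 0;                                            // next address to fill

    void Origin(unsigned addr) { assert(addr < mem.size()); pc = addr; }
    void Byte(uint8_t b = 0) { assert(pc < mem.size()); mem[pc++] = b; }

    // One-byte instructions carry the register number in the low nibble.
    void RegOp(uint8_t base, uint8_t reg) {
        assert(reg < 16);
        Byte(static_cast<uint8_t>(base | reg));
    }

    void Load_D_Imm(uint8_t v)         { Byte(0xF8); Byte(v); }  // LDI
    void Put_High_Register(uint8_t r)  { RegOp(0xB0, r); }       // PHI
    void Put_Low_Register(uint8_t r)   { RegOp(0xA0, r); }       // PLO
    void Set_PC_To_Register(uint8_t r) { RegOp(0xD0, r); }       // SEP
    void Set_X_To_Register(uint8_t r)  { RegOp(0xE0, r); }       // SEX

    // Optional count, so NOP(10) emits ten NOPs.
    void NOP(unsigned count = 1) { while (count--) Byte(0xC4); }
};
```

Calling Load_D_Imm(0x12) followed by Put_High_Register(3) on a fresh Emitter leaves the bytes F8 12 B3 at addresses 0 through 2.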

You can do lots of things to make life easier, both while programming and while setting up instructions. For example, if you wanted 100 NOP instructions, you could write:

for (int i = 0 ; i < 100 ; i++) NOP();

On the other hand, NOP has an optional argument that will do it for you. You can freely use the C++ compiler and the macro preprocessor to make your life easier. For example, a common task on the 1802 is putting a constant value like a label into a register. The lit1802.h file has a helper function to make this easy:


void Load_R_Label(uint8_t reg,const std::string s)
{
  Load_D_Imm(HIGH(s));
  Put_High_Register(reg);
  Load_D_Imm(LOW(s));
  Put_Low_Register(reg);
}

Obviously, you can change the names to suit or have as many aliases as you want. Don’t forget that function call overhead, like calling Load_R_Label, is paid on the host while the program is being assembled, not by the 1802. You wind up with the same machine code either way.
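A quick way to convince yourself of that is to emit the same sequence both ways and compare the bytes. This sketch uses free functions that append to a vector, which is a simplification rather than how lit1802.h works; Load_R_Value and LoadReg16 are hypothetical names, and the helper takes a plain 16-bit value instead of a label.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Code = std::vector<uint8_t>;

// Stand-in primitives (illustrative): each appends raw 1802 bytes.
inline void Load_D_Imm(Code &c, uint8_t v) {
    c.push_back(0xF8);  // LDI
    c.push_back(v);
}
inline void Put_High_Register(Code &c, uint8_t r) {
    assert(r < 16);
    c.push_back(static_cast<uint8_t>(0xB0 | r));  // PHI
}
inline void Put_Low_Register(Code &c, uint8_t r) {
    assert(r < 16);
    c.push_back(static_cast<uint8_t>(0xA0 | r));  // PLO
}

// Helper in the spirit of Load_R_Label, taking a plain value.
inline void Load_R_Value(Code &c, uint8_t reg, uint16_t value) {
    Load_D_Imm(c, static_cast<uint8_t>(value >> 8));
    Put_High_Register(c, reg);
    Load_D_Imm(c, static_cast<uint8_t>(value & 0xFF));
    Put_Low_Register(c, reg);
}

// An alias is just another name for the same emitter.
inline void LoadReg16(Code &c, uint8_t reg, uint16_t value) {
    Load_R_Value(c, reg, value);
}
```

Whether you call the helper, its alias, or the four primitives by hand, the vector ends up holding the same six bytes.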

The assembler is two-pass. The first pass only defines labels. The second pass generates real code. This would make it hard, for example, to create a smart jump instruction that used a branch when the target was near and a long jump when it was far unless you don’t mind padding the branch with a NOP, which would not save space but might save execution time.
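Here is a sketch of that padded approach, under a few assumptions. The TwoPass struct and Smart_Jump are invented for illustration. The same-page test assumes a short branch lands in the page that holds its operand byte. An unknown forward label is treated as address 0 on the first pass, which is harmless only because both forms are three bytes long.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative two-pass emitter, not the project's actual class.
struct TwoPass {
    std::vector<uint8_t> mem = std::vector<uint8_t>(65536, 0);
    std::map<std::string, unsigned> labels;
    unsigned pc = 0;
    int pass = 1;

    void Begin(int p) { pass = p; pc = 0; }
    void Origin(unsigned a) { assert(a < mem.size()); pc = a; }

    // Pass 1 only advances the address; pass 2 also stores the byte.
    void Emit(unsigned b) {
        assert(pc < mem.size());
        if (pass == 2) mem[pc] = static_cast<uint8_t>(b);
        ++pc;
    }

    void Define_Label(const std::string &n) { if (pass == 1) labels[n] = pc; }

    unsigned Label(const std::string &n) const {
        auto it = labels.find(n);
        return it == labels.end() ? 0 : it->second;  // unknown on pass 1
    }

    // Always three bytes, so label addresses never shift between passes.
    void Smart_Jump(const std::string &target) {
        unsigned t = Label(target);
        bool samePage = ((pc + 1) >> 8) == (t >> 8);  // assumed rule for BR
        if (samePage) {
            Emit(0x30); Emit(t & 0xFF); Emit(0xC4);   // BR plus a NOP pad
        } else {
            Emit(0xC0); Emit(t >> 8); Emit(t & 0xFF); // LBR
        }
    }
};
```

Because the size is fixed at three bytes, the choice between the two forms can wait until the second pass, when every label is known.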

There would be other complications for a modern processor. For example, you would not want to allocate the entire memory space, and you would probably want to generate relocatable output. But this is truly a proof-of-concept. None of those things are impossible; they are just more work.

Bottom Line

I’ve written and read dozens of assembly languages for years, so I’m pretty comfortable with the status quo and I’m unlikely to use litasm myself. However, I did think [Babbage’s] point was well made. If you want to make assembly more readable, there are benefits and this shows it doesn’t have to be that hard to do. You could also write a litasm disassembler to convert object code into this kind of format.
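For what it is worth, the disassembler direction could start out as small as the sketch below. It only knows a handful of 1802 opcodes, and the pairing of opcodes with literate names is inferred from the two listings earlier in the post. The Disassemble function is hypothetical, and the fallback of printing Byte(...) with an argument is an assumption about how raw data would be written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Turn a few 1802 opcodes back into literate calls.
// Coverage is deliberately tiny; everything else becomes Byte(...).
inline std::string Disassemble(const std::vector<uint8_t> &code) {
    std::string out;
    char line[64];
    for (std::size_t i = 0; i < code.size(); ++i) {
        unsigned op = code[i];
        unsigned reg = op & 0x0F;
        int n = 0;
        if (op == 0xF8 && i + 1 < code.size()) {
            unsigned imm = code[++i];  // LDI takes one operand byte
            n = std::snprintf(line, sizeof line, "Load_D_Imm(0x%02X);\n", imm);
        } else if ((op & 0xF0) == 0xB0) {
            n = std::snprintf(line, sizeof line, "Put_High_Register(%u);\n", reg);
        } else if ((op & 0xF0) == 0xA0) {
            n = std::snprintf(line, sizeof line, "Put_Low_Register(%u);\n", reg);
        } else if ((op & 0xF0) == 0xD0) {
            n = std::snprintf(line, sizeof line, "Set_PC_To_Register(%u);\n", reg);
        } else if ((op & 0xF0) == 0xE0) {
            n = std::snprintf(line, sizeof line, "Set_X_To_Register(%u);\n", reg);
        } else if (op == 0xC4) {
            n = std::snprintf(line, sizeof line, "NOP();\n");
        } else {
            n = std::snprintf(line, sizeof line, "Byte(0x%02X);\n", op);
        }
        assert(n > 0 && n < static_cast<int>(sizeof line));  // no truncation
        out += line;
    }
    return out;
}
```

A real tool would also need labels and some way to tell data from code, which is the hard part of any disassembler.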

Want to know more about the Universal Assembler? If you’d rather tackle practical x86-64 assembly, we know a good place to start.

80 thoughts on “A Literate Assembly Language”

Jan says:

May 8, 2023 at 10:32 am

Interesting idea, but I stick to the original form, less typing. Although in practice programming is mostly about thinking what to type instead of typing. Unless it’s a movie, then people are able to program as if they are writing a novel and everything always works the first time… seriously?!?! Oh… and in many movies programmers wear sweatshirts with hoodies in a badly lit environment with loud upbeat music.

Personally I’d prefer this:
ORG 0
Main:

Instead of this:
Origin(0x0);
// Blinky light program
// Main:
Define_Label("Main");

But in the end it comes down to what you are used to, so the second option might be practical for newbies who are not used to anything yet.

The problem with any language is that many programmers still fail to enter some decent commenting. It isn’t fair that the assembly example given doesn’t contain a single remark. Don’t get me wrong, but how are you expected to understand that without reading every single instruction and executing the code in your head, that’s pure madness.

The bottom line is, use what you like. The problem with adding another standard to an existing set of standards is that there is just one extra standard and the problem mostly remains or a new problem pops up.

1. TranSister says:
  
  May 8, 2023 at 12:44 pm
  
  Do you program professionally?
  
2. Alan says:
  
  May 8, 2023 at 3:23 pm
  
  If you are looking for improved understanding, why start with an assembler? Surely a DISASSEMBLER would make more sense.
  
  Raw code, no comments, and a printout for you to puzzle through. If the proposal is valid, the disassembled code is easier to read – and those tasked with reverse engineering old code rejoice.
  
  1. Chris Maple says:
    
    May 8, 2023 at 5:24 pm
    
    With code for processors like the Z80, it’s not unusual to find a data table in the middle of executable code. Such tables are capable of completely confusing disassemblers.
    
    Having an assembly language source code for translation into another assembly language often provides clues that make understanding easier: comments, structure, meaningful variable names. Disassemblers provide none of that.
    
  2. mould says:
    
    May 13, 2023 at 12:25 am
    
    Compilers can produce some very cryptic code with lots of “clever tricks”. Disassemblers are good for seeing how compilers work but they’re not a good way to learn assembly language.
    
Sam Ochi says:

May 8, 2023 at 10:51 am

I believe most definitely that a literate assembly language makes sense. Even if it is very verbose, it also forces one to remember what it does. Certainly, the short version can exist along side it as well. It is sort of like a magazine article which has definitions of what it is discussing or explaining which contains within it the short translations or synonyms along side it. In fact, one should be able with a click to eliminate the synonyms/translations and still get the same assembly language instead of a long novel of the original.

1. Ostracus says:
  
  May 8, 2023 at 11:15 am
  
  People complained about LISP’s parentheses. I doubt they want more verbosity in their programming. That and carpal tunnel.
  
  1. Peter Sanders says:
    
    May 8, 2023 at 2:29 pm
    
    COBOL for a truly verbose language.
    
    1. ian 42 says:
      
      May 8, 2023 at 4:00 pm
      
      COBOL really is a verbose assembly language.. Go look at it!
      
jpa says:

May 8, 2023 at 10:57 am

I feel short instructions are faster to read, and there are a lot of lines to read when working at asm level. Typically 5-10x more than in C.

Maybe a simple tooltip is all that is needed? Hover over an instruction and it shows its description. Not sure if this exists already.

1. Greg A says:
  
  May 8, 2023 at 11:41 am
  
  yeah and
  
  even more than faster to read, i think they’re faster to look up. i know some people really learn a single assembly language but most of the time if i’m writing asm, i’ve got the architecture reference manual open. the chapter listing the instructions is usually sorted alphabetic by mnemonic, and golly i want to be able to look it up by the same mnemonic i tell the assembler! if i didn’t care about nailing the details like “does this instruction set the condition code?” and “how many cycles is a conditional branch if it’s taken?” then i probably wouldn’t be writing in asm in the first place. i need the reference!
  
  i actually get really stymied writing ARM assembler with gas for example, because if you forget the “.syntax unified” at the top of your source then the mnemonics don’t exactly match the ARM manuals and it’s easy to get confused. on the other hand, sometimes the assembler will get wise on me, and like replace a “ldr r0,=3” with “mov r0,#3” because it figured out that the literal is small enough to use an immediate instead (i don’t think that’s a real example but i’ve seen stuff like that). which is slick and good but still ground-shifting-under-your-feet-ish.
  
aki009 says:

May 8, 2023 at 10:58 am

Great. All the limitations of assembler with triple the opportunities for typos.

But I’m sure there’ll be some corner cases where this is better.

1. Shannon says:
  
  May 9, 2023 at 11:17 am
  
  There might be more opportunities for typos but there would be fewer opportunities to be missed by automatic tools and code review.
  
jeremy says:

May 8, 2023 at 10:59 am

If there was a point to your comment, you utterly failed to communicate it !

1. Sam Ochi says:
  
  May 8, 2023 at 11:24 am
  
  Actually, this is where one can use AI (the one that talks to you…) to translate assembly language into something one can read and understand.
  
  1. 𐂀 𐂅 says:
    
    May 8, 2023 at 3:04 pm
    
    I tried that. The results are rather verbose and you just shift the cognitive load to consuming long repetitive sentences rather than recalling the meaning of mnemonics, so I’d suggest that Al’s “show everything as a well-named function” approach is not such a bad idea.
    
  2. Herr Brain says:
    
    May 8, 2023 at 6:38 pm
    
    That defeats the entire purpose of an assembly language. I actually had a fairly lengthy discussion with the author in the original article. Short version: an assembly language is intended to have a near 1:1 relationship with machine code. Aside macro expansion and creature comforts such as labels rather than hard-coded addresses, the assembly code should generally try to match the underlying architecture.
    
    1. Marc says:
      
      May 8, 2023 at 10:34 pm
      
      That can of course be achieved without the cryptic mnemonics.
      C isn’t a half bad abstraction – “c+=1” would translate to “add c,1” whereas “++c” would be “inc c”.
      
glaskows says:

May 8, 2023 at 11:26 am

I would just have two columns, on the left the normal assembly and on the right with more readable mnemonics.

The thing with assembly is if you need to use it, you will end up memorizing a big chunk of the instructions and will check the rest from time to time. It’s like reading the Matrix: after a while you don’t see the code anymore.

1. Al Williams says:
  
  May 8, 2023 at 2:20 pm
  
  That was sort of what I was thinking with disassembly. You could convert back and forth between concise and verbose mode pretty easily.
  
Greg A says:

May 8, 2023 at 11:31 am

wow. i mean. wow. this is perverse! i’m impressed! from the headline i was expecting like

add r0,r5 ; add the contents of register 5 to register 0, storing the result in register 0

add r0,r3 ; you might expect this comment to say what r3 is but instead it says r0 += r3

it is a real struggle to make good comments in any language but that “direct restatement with no value add” pattern is especially prevalent in ASM code. but no, that’s not what the article’s about! wow!

i never thought of writing a machine language generator in C this way, and i have used a lot of assembly languages and written a few assemblers from scratch. and i have made a few very trivial “compilers” that convert an input language into assembler (but then it passes that out to ‘gas’ or ‘picasm’ or whatever instead of making machine code itself).

this reminds me more than anything of FORTH, where it’s common to define words that append machine code to the dictionary HERE position.

it also kind of reminds me of the code generator phase of a compiler. but every compiler i’ve worked on generates text assembly language, not machine code. but we still have the thing where you’re spelling out assembler idioms in C with function calls for labels and literals and so on, and struggling to decide the correct abstraction, a generic load_number_into_reg() vs. spelling out the idiom in some target-specific way.

i think this post will be on my mind for many days. i’m not even sure there’s a “there” there, but it’s just one of those subtly different ways to think about a problem that can get under your skin

some guy says:

May 8, 2023 at 11:35 am

I am afraid I have to agree that this is mostly horrible. It reminds me of people who use multiple nested preprocessor macros in C and define those macros in dozens of different files across a project. It is a mess to follow/read/understand.
As others have said, if you use ASM you will at some point memorize the mnemonics and if there is one weird mnemonic in some (disassembled or written by somebody else) code you can look it up. Speaking about disassembled code, most of the instructions are quite easy to guess with some basic knowledge of ASM in general and some knowledge about the specific architecture, even if it is only a small Wikipedia entry. Of course, if you *really* want to dig into some code and/or write your own ASM you will have to read/learn a lot (those ARM documentation PDFs that are hundreds or thousands of pages long…), but that’s how stuff works…

Al Williams says:

May 8, 2023 at 11:40 am

Rather than reply to specifics let me just say that — as I mentioned at the end — I probably won’t actually use this myself. I’ve been writing 3 letter opcodes for 40+ years and I’m too old to change. But not everyone feels the same and my point is if you agree with the original poster then you can certainly do it.

Dave says:

May 8, 2023 at 12:38 pm

It’s an assembler source DSL?

1. Dave says:
  
  May 8, 2023 at 12:39 pm
  
  (Not source, I guess.)
  
Abd Alrahman Alnablse says:

May 8, 2023 at 1:26 pm

Could this be used to make learning to code easier?
Like a starting point.

scatterbrained2 says:

May 8, 2023 at 1:46 pm

I would have to say that if you count all the compiled languages, assembler is FAR more often written than read, so this seems like scratching someone’s personal itch. As long as I’m not forced to use this new fangled syntax, go ahead and gild that lily.

ytm0ytm says:

May 8, 2023 at 1:48 pm

This doesn’t seem to bring anything useful. Assembler programs are long and narrow so it would seem that making them shorter and wider would be better than making them longer and wider.

GEOS for C64/128 had a set of macros (google for ‘geosmac.inc’) for 6502 that made it feel as if it was a quasi 16-bit CPU. The program could be “read” like C or FORTH.

There was “LoadW r0, imm” that was a shortcut for “LDA #<imm : STA r0 : LDA #>imm : STA r0+1”. That is one line instead of four. A comment would then explain what the meaning of the “imm” constant is.
This was useful also for byte-size operations: “CmpBI r0, imm” is one line for “compare byte to immediate value” instead of “LDA r0 : CMP #imm”.

gbo says:

May 8, 2023 at 1:57 pm

Have you considered WebAssembly as your literate assembler? I would be curious about HDL implementations of (a subset of) a WebAssembly CPU, and how to modify LLVM (which seems to be able to compile to WebAssembly) so that you could compile to a smaller subset of WebAssembly.

1. Pete Cockerell says:
  
  May 8, 2023 at 4:16 pm
  
  Actually, this is how I’ve been using AssemblyScript. It lets you express control structures, assignments etc using TypeScript syntax, but you can drop down to generating individual Wasm instructions when necessary.
  
James Brakefield says:

May 8, 2023 at 2:34 pm

I do this when I need to generate a binary for a new ISA. Distinct subroutines named for each op-code and a subroutine to assign label addresses. First pass to evaluate labels, second pass to generate code. So an overall subroutine to collect all the op-code and label calls which is then called twice.

Michael Black says:

May 8, 2023 at 2:54 pm

I seem to recall Randy Hyde inventing a new set of mnemonics for the 6502. And I recall people complaining about the Zilog mnemonics.

1. Paul LeBlanc says:
  
  May 8, 2023 at 5:09 pm
  
  My recollection of first using Z80 mnemonics instead of 8080 was one of sheer relief that my code all of a sudden made sense, even when I re-visited it a month later. A LD was always a LD, regardless of source and destination – the 8080 had at least 8 different mnemonics.
  
  1. Michael Black says:
    
    May 8, 2023 at 5:16 pm
    
    I never used either. But I remember some complaints. But maybe it was about change rather than absolute.
    
Gary Didio says:

May 8, 2023 at 3:08 pm

Why so much negativity? Kudos to Al for trying something new. Perhaps those of you that are being so negative and trashing Al’s work should be so bold as to post your own ideas for others to comment.

Keep up the great work, Al!

1. oikos says:
  
  May 8, 2023 at 4:49 pm
  
  Agreed, not sure why everyone is so down on this idea.
  I personally don’t find the code any easier to read, probably the opposite actually, but you can at least appreciate the effort, right.
  
2. Andrew says:
  
  May 8, 2023 at 9:09 pm
  
  By the time I got to your comment Gary, I was thinking the same thing. There sure is no lack of experts in the peanut gallery here. It’s a shame they never get around to publishing their brilliance for all to critique like Al has, time and time again.
  
combinatorylogic says:

May 8, 2023 at 3:18 pm

The assembly snippets in Turing papers were quite interesting, using algebraic notation, not opcodes.

That sort of inspired me to experiment with what kind of syntax is suitable for even a very primitive assembly. E.g, here: https://github.com/combinatorylogic/soc/blob/master/backends/tiny1/sw/ice.s

Joshua says:

May 8, 2023 at 3:19 pm

Or just use machine language? It has the advantage of not having any connections to English language.

BT says:

May 8, 2023 at 3:51 pm

“FJCVTZS” is the fault of the mnemonic designer, not assembly as such. ld, mv, jz etc are perfectly clear once you know the basics.

Try writing your assembler mnemonics in lower case, it works wonders for readability. THAT is the thing “hampered by obscurity left over from the days when punched cards had 80 columns and a six-letter symbol was all you could manage” – these days we can manage lower case.

Try experimenting with indentation. I have found indenting loops in assembler is tricky (in my code anyway) because of multiple exit points/destinations, but indenting (just one space) after a push and outdenting after a pop works well for me. If you find yourself writing an indented ret or jumping to a different indentation level you’ve caught a bug before even the first assemble.

James Newton says:

May 8, 2023 at 3:59 pm

The good:
– It’s very much easier to read and understand
– With a good code editor and a keyword file, predictive typing could make writing the code easier
– It’s pretty easy to implement.

The bad:
– “Why use 1 letter when I can use 10?”
– No improvement in the lines per instruction count. e.g. one line, one instruction.

I would strongly recommend looking at other systems which are very low level, but terse and readable. For example, why not adopt a destination, operation, source standard format and use that? e.g.

Main:
R3.H = D = R3Go.H // Generates LDI HIGH(R3Go), PHI R3
R3.L = D = R3Go.L // Generates LDI LOW(R3Go), PLO R3
P = 3 // Generates SEP R3
R3Go:
R9.H = D = Delay.H // LDI HIGH(Delay), PHI R9
R9.L = D = Delay.L // LDI LOW(Delay), PLO R9
R7.H = D = Stack.H // LDI HIGH(Stack), PHI R7
R7.L = D = Stack.L // LDI LOW(Stack), PLO R7
X = 7 // SEX R7
R7.M = D = 0 // LDI 0, STR R7
Loop: OUT 4
. . .
P = P //NOP
P = DELAY1 //BR DELAY1

M = $F0
Stack:
M = + 1

Shorter (you can cut out the ” = D = ” part if you are willing to assume the user knows that assignments blow through the D register). And it wouldn’t be hard to break the high and low stuff out. E.g. it could be

Main:
R3 = R3Go // Generates LDI HIGH(R3Go), PHI R3, LDI LOW(R3Go), PLO R3
P = 3 // Generates SEP R3

1. Al Williams says:
  
  May 8, 2023 at 4:10 pm
  
  Actually, some of the “macros” (I didn’t do many, but more are possible) do expand one line to multiple instructions. Same for things like NOP that have a count in them. So the last “bad” I think is inaccurate and that could be expanded upon. For example, an SCRT call instruction, but I didn’t want to get that far down the 1802 rabbit hole….
  
  1. James Newton says:
    
    May 8, 2023 at 5:19 pm
    
    Sorry I missed that. I think I should add an item to the “good” list as well: You can easily write more and more complicated “macros” based on the existing ones. e.g. it’s a very extensible system.
    
    I still /really/ like the destination, operation, source pattern.
    http://techref.massmind.org/techref/idea/minimalcontroller.htm
    
WestfW says:

May 8, 2023 at 4:04 pm

> Floating-point Javascript Convert to Signed Fixed-point Rounding Towards Zero
That’s because ARM is a RISC CPU. :-)

The big piece missing from writing assembly language these days is a community with coding and style standards. Back when major projects from major vendors were written in assembly language, there was likely to be a set of standard macros for things like calling conventions, register names, stack frame manipulation, program structure, bitfield definition, and all sorts of stuff. New names for the basic opcodes were probably NOT among them.

I’ve used some assemblers with single-character opcodes. Brevity isn’t all it’s cracked up to be, either.

Can Still Remember Z-80 Machine Instructions in Hex says:

May 8, 2023 at 4:26 pm

You might want to take a look at the mnemonics for PDP-11 assembler, with all of its indirect reference and execution modes (a truly orthogonal and symmetric instruction set). The addressing shorthand really made sense, and the mnemonics, while terse, were readable.

It would be really straightforward to define readable macros for obfuscated mnemonics, and let the preprocessor do your work for you.

Brave of you to show examples from the dreaded RCA1802. Yes, CMOS, yes, good for aerospace, but absolutely dreadful instruction set (I programmed in it long enough to find that it had either a call but no return instruction, or the other way around – yecch!).

1. Al Williams says:
  
  May 8, 2023 at 5:28 pm
  
  Well I can do *most* Z80 codes in my head still, too, but not all of them. The 1802 was actually pretty easy to learn but a lot more like RISC where everything took a few opcodes. Most people used SCRT. There is a RET instruction, but no real call. But for small fast code you didn’t need it. You simply changed the program counter like I did in the example. Every register could be a program counter.
  
2. Michael Black says:
  
  May 8, 2023 at 5:51 pm
  
  Back then, I think people chose a CPU for hardware reasons. The 1802 really appealed to the home builder. Just a bit of effort and you can start programming. They didn’t look over CPU datasheets to find the best opcode set or mnemonics. I chose the 6502 because of the article in Byte in 1975, reinforced when I was a KIM-1 in 1979. I barely used the 8080, actually the 8085, so the mnemonics always looked odd. I didn’t use Intel until I got a used Pentium in 2001 to run Linux
  
  1. Joshua says:
    
    May 8, 2023 at 9:34 pm
    
    NEC used different mnemonics from Intel at the time, maybe they look less odd to you. See the NEC V20/V30 documents. These chips also had an i8080 emulation mode, so the documents cover i8080 mnemonics, too.
    
3. Steve Jordan says:
  
  May 9, 2023 at 1:04 am
  
  The PDP-11 assembler code was indeed very easy to learn and to read. Moving from that to the Intel and Zilog instructions was a little painful, even a step backwards.
  
Keshlam says:

May 8, 2023 at 4:29 pm

Funny; I always thought of C itself as a “literate assembly language”, or close to. And on modern processors, after all the microcode/parallelism/speculative execution, it isn’t clear that true assembler is still a human-compatible task without a highly specialized development environment that exposes those interactions.

I’ve written my share of on-the-metal code. I expect to do more of that, for specific tasks. But when even measurement applications may have a processor that will run Linux fast enough to keep up with the real world, it’s entirely too easy to throw hardware at a problem to save software development costs.

1. Joshua says:
  
  May 8, 2023 at 9:37 pm
  
  “Funny; I always thought of C itself as a “literate assembly language”, or close to.”
  
  That’s because C is a so-called “Super Assembler”, rather than a true high-level language.
  A true blue high-level language would be BASIC and Pascal, for example.
  
2. combinatorylogic says:
  
  May 9, 2023 at 12:42 am
  
  Tightly constrained MCUs are still a thing and will always be. You won’t put a full Linux capable SoC where Cortex-M0 is enough. Any mass produced device will eat up all development cost savings in no time.
  
Justin says:

May 8, 2023 at 5:19 pm

I’ve written almost this exact same thing. For my custom CPU, I needed a macro assembler. The CPU was a one-instruction CPU, so it needed a lot of macros to do anything useful. It was easier to write it in C than write a real assembler. Instead of macros, I could just write C functions. And you can build the disassembler right into it as it’s assembling. It was far easier than making your own real assembler. With decent coding software, it’s no slower to write C than assembly.

1. Joshua says:
  
  May 8, 2023 at 9:55 pm
  
  Because C is a weird assembler in disguise, it’s not a real high-level language. Its inability to prevent memory leaks is just one of its design flaws.
  
2. James Newton says:
  
  May 11, 2023 at 11:11 am
  
  The key there is the difficulty of adapting C to a new CPU. I don’t see a lot of tutorials on how to do that. I know it’s possible, and I know (more or less) how to do it, but I don’t see it as being easy to do, or easy to find out how to do.
  
jouka says:

May 8, 2023 at 5:52 pm

Assembly isn’t hard to read or understand, it’s incredibly simple. It’s just tedious when doing high level tasks. For low-level stuff, it’s often more concise and easier to understand than equivalent high-level, which gets bogged down working around types and syntax.

Martin Usher says:

May 8, 2023 at 6:02 pm

You don’t need anything special to do this. Most assemblers are macro assemblers which allow you to define one or more instructions as a user defined name. Macros can take parameters and invariably have string and arithmetic capability. Since assemblers allow you to plant bytes/words/etc directly you can even define a completely new instruction set.

Most programmers don’t bother, though. Instructions are instructions, so as long as you have commented the code well enough that someone can find out what it’s supposed to be doing, that’s all we need. (For most modern processors I’ll usually set up a stack frame and work in ‘C’ for nearly everything.)

Spud says:

May 8, 2023 at 6:18 pm

My favorite aspect of assembly programming is just how lean and mean it is.

I like your project and believe it will be useful to some people, but for me it is gonna be a pass, due to a bit too much stuff mixed in there.

1. val says:
  
  May 8, 2023 at 8:14 pm
  
  Horrible when compared to pure asm
  
Who says:

May 8, 2023 at 6:57 pm

“a six-letter symbol was all you could manage in the limited memory space of the computer. For example, without looking it up, what does the ARM instruction FJCVTZS do?”

What does it do? It takes up SEVEN letters worth of space.

Anonymous says:

May 8, 2023 at 9:23 pm

This just makes everything worse! High level languages should be more like assembly, not the other way around!

1. Joshua says:
  
  May 8, 2023 at 9:44 pm
  
  High-level languages feature hardware abstraction. Not sure if making them low-level really is desirable. If needed, peek & poke and DATA fields can already be mixed into the source code of high-level languages. Otherwise we might end up with something broken like the C “language”, that macro assembler on sugar.
  
James Reed Feeney says:

May 8, 2023 at 9:38 pm

Assembly, or machine language, is so %#&! hard. Many compilers today are fairly efficient and produce fast code. BUT, sometimes there’s a critical timing or speed problem that can only be solved with heavily tweaked code. At this point, when you reach a blockade that you just can’t get around, Assembly Language starts to become the only way. Maybe it’s an FFT calculator, or some other numerically intensive function, or maybe a device driver that might require a descent into Assembly. Actual Assembly Language programming can sometimes be done by writing the difficult code in C++ or C#, taking the Assembly output of the compiler, massaging the code directly in a text editor, and squeezing the last CPU cycle out of it. This can make Assembly Language a bit more palatable.

1. Joshua says:
  
  May 8, 2023 at 9:48 pm
  
  I beg to differ. I really recommend checking out 8052AH BASIC.
  It’s an ancient, yet sophisticated BASIC interpreter with interrupt support and other nifty features. Not a toy, at all. Don’t get me wrong, assembler is neat, but sometimes BASIC is superior.
  
Daniel says:

May 8, 2023 at 10:15 pm

Have a look at the assembly language for the Blackfin architecture or its predecessor SHARC (or the yet older ADSP-2100 family). They make extensive use of mathematical syntax to denote assignments and use “if” constructs for conditional execution that resemble higher level languages.

fonz says:

May 9, 2023 at 1:07 am

look at ADSP21xx assembly

Andros says:

May 9, 2023 at 1:45 am

If we continue on this path, we will end up creating COBOL… again.

David Given says:

May 9, 2023 at 11:19 am

I’ve been working in 6502 code recently (I’ve done a CP/M port…) and one of the first things I did was put together a set of macros for structured programming. This allows zero-cost loops, procedures, conditionals etc.

```
.zproc skip_command_spaces
    ldx command_ptr
    .zloop
        lda command_buffer, x
        .zbreak eq
        cmp #' '
        .zbreak ne
        inx
    .zendloop
    stx command_ptr
    rts
.zendproc
```

(from https://github.com/davidgiven/cpm65/blob/master/apps/bedit.asm)

It made things _so much easier_. I don’t have to think about temporary labels; I get proper indentation; I get scopes which actually make sense so labels can be named sensibly without having to worry about conflicts; etc.

But, I’m not sure how much more abstraction I want. In the above code, each .zbreak turns into a bne or beq and the .zendloop turns into a jmp. I could save a byte by replacing the loop with a .zrepeat / .zuntil eq; I know that X can never be zero, because of an invariant elsewhere (the command buffer length limitation). So, rather than use a three-byte jmp, I can use a two-byte beq, which is both smaller and faster. When writing machine code, this is the kind of detail that’s important, and you don’t want it hidden. There are also places where the scoping gets in the way, such as when you want to jump directly into the middle of a routine. You know, the kind of awfulness that machine code is so famous for.

(Although I will note that if compiling for the 65c02, it’d be entirely reasonable to throw an assembler switch and have the .zendloop turn into a bra instruction. That’s the kind of place where telling the assembler what you mean rather than what you want actually helps.)
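
The assembler-switch idea can be sketched in C, in the spirit of the article's C-based assembler. This is not how the cpm65 macros are implemented; `zloop`, `zendloop`, `emit`, the `cpu` enum, and the 0x0200 load address are hypothetical names and values chosen for illustration. The opcodes are the standard ones: `JMP` absolute is 0x4C (three bytes) and the 65C02's `BRA` is 0x80 (two bytes).

```c
#include <assert.h>
#include <stdint.h>

enum cpu { CPU_6502, CPU_65C02 };

static uint8_t  code[256];      /* assembled output                     */
static int      len;            /* bytes emitted so far                 */
static uint16_t org = 0x0200;   /* assumed load address of code[0]      */

static void emit(uint8_t b)
{
    assert(len < (int)sizeof code);
    code[len++] = b;
}

/* Address the next emitted byte will occupy. */
static uint16_t here(void)
{
    return (uint16_t)(org + len);
}

/* Loop start: remember where the top of the loop is. */
static uint16_t zloop(void)
{
    return here();
}

/* Loop end: say "go back to the top" and let the assembler pick
 * the encoding. On a 65C02 a short backward hop becomes BRA rel;
 * otherwise it is a plain JMP abs. */
static void zendloop(enum cpu cpu, uint16_t top)
{
    /* Branch offsets are relative to the byte after the 2-byte branch. */
    int rel = (int)top - ((int)here() + 2);

    if (cpu == CPU_65C02 && rel >= -128) {
        emit(0x80);                     /* BRA rel */
        emit((uint8_t)rel);
    } else {
        emit(0x4C);                     /* JMP abs, little-endian target */
        emit((uint8_t)(top & 0xFF));
        emit((uint8_t)(top >> 8));
    }
}
```

With a one-byte loop body such as `inx` (0xE8), the 6502 path produces four bytes in total and the 65C02 path three, which is the one-byte saving discussed above, obtained without changing the source.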

1. James Newton says:
  
  May 11, 2023 at 11:08 am
  
  Nice! I did something like that for the SX processor (a PIC knock-off from Parallax Inc. that was used in the BASIC Stamp).
  
  http://techref.massmind.org/Techref/scenix/keymacs.src#
  
Anders says:

May 10, 2023 at 4:41 am

I’m sorry, but this looks horrible. Just let a C compiler do the assembly for you.

1. James Newton says:
  
  May 11, 2023 at 10:58 am
  
  You really miss the point here. This works when there is no C compiler, and it gives you far better control than a C compiler would ever give you. It may not be pretty, but it’s better than assembly, and more precise than C.
  
rino mardo says:

May 10, 2023 at 7:12 am

this is why we can’t have good things. leave the readable low-level source code to C. if one can’t hack assembly language then it is not for them!

1. James Newton says:
  
  May 11, 2023 at 10:57 am
  
  Such a poor attitude. Very sad to see people react like this.
  
  1. 1a2s3d4f says:
    
    May 11, 2023 at 11:03 am
    
    He is perfectly right. If you cannot handle assembly language, then for sure this is not for you, which, btw, is also perfectly OK; it is not for everyone.
    
JCS says:

May 10, 2023 at 11:40 am

I think there was a similar thing for the 8086 called TERSE… but closer to BASIC…
here (thanks internet archive) https://web.archive.org/web/20230501173230/http://www.terse.com/

There was also C-- (thanks wikipedia) https://en.wikipedia.org/wiki/C--
those were the days….

1. James Newton says:
  
  May 11, 2023 at 10:56 am
  
  I really enjoyed Terse! I wish it hadn’t been lost, i.e. that the author had open sourced it before shutting down.
  
Landon Dyer says:

May 13, 2023 at 2:50 pm

Back in the 1980s, the Atari coin-op division used a bunch of “structured programming” macros for their 6502 assembly work. It had things like while loops, if-then-else and procedures.

It made for interesting reading, but I’m not sure it was an improvement over simply writing well-commented assembly to begin with.

[I never used those macros, since I was writing cartridges and had to know where every single byte of ROM was being spent]

WestfW says:

May 13, 2023 at 5:30 pm

I did write a set of “structured programming” macros for the GNU Assembler: if/else/elsif/endif and do/while/until/break. I believe that they’re written in such a way that you could add support for any CPU that is supported by GAS, and any CPU that you can coerce GAS into supporting.
https://github.com/WestfW/structured_gas
It was easier than expected, given GAS’s support of local labels.

monsonite says:

July 25, 2023 at 3:04 am

Assembly language serves a purpose, it is the stepping stone to all high level languages. Learning assembly for a given processor is about as intimate as computing gets. You might like a HLL and “cut and paste” coding, but there is no better feeling than writing 50 bytes of asm, and knowing that you hand crafted that code.

The first thing you do is to write a monitor or interpreter. A famous example is Steve Wozniak’s WozMon, which fit into a 256-byte PROM. With 1K bytes, you can write a fairly useful interpreter.

However, if you choose C++ (Arduino) it takes 22 K bytes just to flash an LED on an STM32 mcu.

After 58 laps around the sun, I have witnessed many aspects of computing.

Assembly language is the entry point; everything else is built on top of it. But for every layer placed on top of raw assembly, you are looking at a 10x speed reduction.

20 years ago, I wrote 16Fxx PIC assembly code to send and receive DTMF and V23 modem tones. You are not going to do that on a PIC 16F writing in C!

1. Andrew Wasson says:
  
  July 25, 2023 at 11:07 pm
  
  Pretty good assessment. I write super high level abstracted code these days and I can move and manipulate mountains of data but the code is really huge, the framework it sits on is huge and it takes loads of computing power to run it.
  
  In my spare time I write ASM and sometimes just op code machine language, which is what I started with back in the day. It’s not practical and it doesn’t pay the bills but it informs the way I approach writing high level code and it’s fun.
  
