Full-Blown Cross-Assembler…in A Bash Script

Have you ever dreamed of making a bash script that assembles Intel 8080 machine code? [Chris Smith] did exactly that when he created xa.sh, a cross-assembler written entirely in Bourne shell script.

Assembly language goes in, a binary comes out.

The script exists in part as a celebration of the power inherent in a standard Unix shell with quite ordinary POSIX-compliant command line tools like awk, sed, and printf. But [Chris] admits that mostly he found the whole project amusing.

It’s designed in a way that adding support for 6502 and 6809 machine code would be easy, assuming 8080 support isn’t already funny enough on its own.

It’s not particularly efficient, and it has some quirks, most of which involve syntax handling (hexadecimal notation should stick to 0 or 0x prefixes instead of $ to avoid shell misinterpretations), but it works.
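
To see why a $ prefix is risky in a shell-based tool, here is a minimal sketch of one way the misinterpretation can happen. It is not taken from xa.sh; the variable names are made up for illustration, and whether the script hits exactly this path is an assumption.

```shell
#!/bin/sh
# Hypothetical demonstration, not code from xa.sh.
# Inside double quotes the shell reads $FF as "the value of variable FF".
unset FF

dollar_form="MVI A, $FF"     # $FF expands to nothing because FF is unset
prefix_form="MVI A, 0xFF"    # no $ involved, so the text survives intact
literal_form='MVI A, $FF'    # single quotes also suppress expansion

printf '[%s]\n' "$dollar_form"
printf '[%s]\n' "$prefix_form"
printf '[%s]\n' "$literal_form"
```

The first line loses its operand entirely, which is the kind of silent damage that a 0 or 0x prefix sidesteps.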

Want to give it a try? It’s a shell script, so pull a copy and just make it executable. As long as the usual command-line tools exist (meaning your system is from sometime in the last thirty-odd years), it should run just fine as-is.

An ambitious bash script like this one recalls how our own Al Williams shared ways to make better bash scripts by treating bash just a bit more like the full-blown programming language it qualifies as.

6 thoughts on “Full-Blown Cross-Assembler…in A Bash Script”

  1. It’s bourne shell assembler. It’s all scripting, and it’s a little bit for everybody. I wanna get, you know, all different operands. You know: the XCHG, LDA, SUB, RNC opcodes. I also wanna bring the ♂DAD♂DI♂. The main theme is every opcode is getting assembled whether they like it or not. That’s the main theme and awk hex.

  2. I like how simple and clean it is!

    Two downsides…first, it’s sensitive to whitespace, which is actually not uncommon in assemblers, historically speaking. But you have to do “MVI A, 1” instead of “MVI A,1”. The other is that its two-pass label resolution isn’t “fully featured” in my mind…real assemblers have a step between the two passes that resolves the address relationships. Like gas supports the syntax “.word 1 DUP(LABEL1-LABEL2)”, where the size of a region determines how many copies of the word it will emit. Which, my criticism is really just to say, it’s pretty impressive that the two-pass structure is fully there and quite simple. The only thing “special” is that the first pass sets the label addresses. It doesn’t even skip emitting the code, the driver just redirects it to /dev/null.

    Something that tickled me is he used dc to do some basic arithmetic. I also use dc heavily in my shell scripts but for this operation, the shell $((..)) would have worked just as well. Especially since for the base conversion, he was already using printf and could have used %02X to convert to hex. That’s not a criticism, just a reflection of how very idiomatic we all are when writing shell code!

    I enjoyed reading something so simple…though if i were to make an assembler with this much simplicity, i’d probably use forth. Or for just slightly more complexity, you can write a truly full-featured assembler in C. heh

    1. Yes, the space-sensitivity is a sticking point with me as well, but I’ll be honest, fixing it — while it’s probably possible — would complicate things a bit too much. And I’ll admit I considered doing something other than dc for that math, but I’ll be honest, dc has been around since 1971 or so. $(( is a much newer innovation, I think. I wanted to go as far back as possible. No new utilities, no GNU utilities, no shell extensions, where it’s reasonable to do it. I think at the moment, there are some POSIX-specific things upon which it relies, but that still gets us compatible with most things straight out of the box, back to the early 90s.

      You’re right, though; I could have just printf’d in a different numeric base and that may have even been slightly more efficient. Mostly, that part is just down to the way things fell together. I was initially trying to avoid doing printf at all, again, in favor of something maybe a bit less modern, and the conversion of the numeric bases was implemented before I decided to give up for the moment and use printf for output. One thing I thought about doing to write out binary data on really old Unix was to write the actual binary into a shell variable or something, a whole string of 0x0 to 0xFF, and maybe cut each character out when I wanted to write it. I think that could work and work well, but it would look like a terrible mess.

      As for the label handling, that’s absolutely at the bare minimum level right now (and error-checking is bad for it, besides), and it’s something I’ve considered improving, though I had not considered what you suggested. Not until now, anyway. One other thing I did consider throwing in was local labels. Of course, that greatly complicates the preprocessing. If we’re talking about just doing some math on the label addresses, well, maybe that could be done with just another eval.

  3. This reminds me of Henry Spencer’s “Amazing Awk Assembler,” a cross-assembler written entirely in awk and sed back in the mid 1980s. Unusably slow, but impressive that it worked at all.

  4. Beginning in late 1983, I worked as a student assistant to a scientist who designed the controlling nodes (ECB-Bus systems with Intel 8085 CPUs) for hardware connected to the particle accelerator in Bonn. He wrote a Z80 (syntax) cross assembler as a set of macros to the VAX/VMS macro assembler, which I used for my tasks.
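
The two-pass structure discussed in the comments, where the first pass only records label addresses and its output goes to /dev/null, can be sketched in a few lines of POSIX shell. This is a toy under stated assumptions, not xa.sh itself: the three-instruction mini-syntax, the function name assemble, and the label_ variable prefix are all hypothetical. It uses $((..)) and printf %02X for the arithmetic and hex conversion, as one commenter suggested, rather than dc.

```shell
#!/bin/sh
# Toy two-pass sketch. The mini-syntax and every name here are
# hypothetical; this is not how xa.sh is actually written.
src='JMP end
NOP
end: HLT'

assemble() {
    addr=0
    while read -r first rest; do
        case $first in
            *:) eval "label_${first%:}=$addr"   # record the label's address
                set -- $rest ;;
            *)  set -- $first $rest ;;
        esac
        case $1 in
            NOP) printf '00\n'; addr=$((addr + 1)) ;;
            HLT) printf '76\n'; addr=$((addr + 1)) ;;
            JMP) eval "target=\${label_$2:-0}"  # 0 until the label is known
                 # 8080 addresses are emitted low byte first
                 printf 'C3 %02X %02X\n' $((target % 256)) $((target / 256))
                 addr=$((addr + 3)) ;;
        esac
    done <<EOF
$src
EOF
}

assemble >/dev/null     # pass 1: output discarded, label addresses kept
listing=$(assemble)     # pass 2: forward reference to "end" now resolves
printf '%s\n' "$listing"
```

Both passes run the same code; the only difference is where the output goes. The sketch prints hex text rather than raw bytes to keep it readable.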
