The first computer I personally owned had 256 bytes of memory. Bytes. The processor in my mouse and keyboard both have more memory than that. Lots more. Granted, 256 bytes was a bit extreme, but even the embedded systems I was building as part of my job back then generally had a small fraction of the 64K bytes of memory they could address.
Some people are probably glad they don’t have to worry about things like that anymore. Me, I kind of miss it. It was often like a puzzle trying to squeeze ten more bytes out of an EPROM to get a bug fix or a new feature put in. I though with the 1K challenge underway, I might share some of the tricks we used in those days to work around the small memory problems.
I Can’t Help You…
Unfortunately, optimization at this level very often requires specific solutions and no one can help you unless they know exactly what you are doing. Often, the best ways to save memory involve either changing your algorithm or using CPU-specific features. For example, if you are storing a lot of strings but don’t need a lot of characters, perhaps you can pack your strings. On the right CPU, you might spend two bytes writing:
This presumably clears hypothetical register B and usually takes two bytes of encoding (one byte for the MOV with a bitfield that indicates the B register, and another byte for the literal zero). However, if available, an XOR instruction probably takes one byte:
This also zeros register B since the XOR of anything with itself is zero. A subtract could do the same thing, if that’s available.
These are all specific to your project and platform, though. So for those sort of things you are on your own.
But, Some Things are Universal
One of the easiest ways to change your program involve subroutines. Most CPUs have some facility for call and return (although my 256 byte computer didn’t, oddly). The first rule is to use them. If you have four or five instructions being repeated, that’s a good candidate for a subroutine. Common sense, right? But there are a few other tricks involving subroutines to keep in mind.
The first idea is to look for items you can move into a subroutine, even if it requires a little tweak somewhere in the code. To illustrate this (and some other ideas) I went to this online assembly language simulator. It has a simple assembly language that is similar to real CPUs. The ability to be able to step through the code in your web browser is nice, too. Of course, you’ll have to apply the concepts to your specific CPU.
Below is a simple program to read a four-character ASCII string that represents a decimal number and puts it in a 16-bit register. So the string “255” will turn into 00FF in the register.
; Simple example ; Convert decimal # at istring into 16-bits in B:C JMP start start: CALL convert ; more interesting things happen here HLT convert: MOV B,0 MOV C,0 ; answer in B:C MOV D, istring ; Point to var CALL decdig ; convert each digit INC D CALL decdig INC D CALL decdig INC D CALL decdig RET shift16: MOV A,0 SHL C,1 JNC shift16a INC A shift16a: SHL B,1 ADD B,A ; output carry is meaningless RET decdig: CALL shift16; *2 PUSH B PUSH C CALL shift16; *4 CALL shift16 ; *8 POP A ADD C,A ; 8X+2X=10X! JNC decdig0 INC B decdig0: POP A ADD B,A MOV A,[D] SUB A,'0' ADD C,A JNC xret INC B xret: RET istring: DB "1924" ; input DB 0 ; String terminator end:
This assembles into 89 bytes. Not much, but we can do better. The first thing to notice is that the D register holds the address of the string. The program loads it at the start and after each call to the digit conversion subroutine, there is an increment to point to the next character. One obvious way to save some space is to factor the increment instruction into the subroutine. You can put it towards the end, or–if it makes more sense–load the address minus one (most assemblers can figure that out at assembly time). In this case, it doesn’t really matter, but either way will get you to 85 bytes. I put the POP D instruction after the xret label and the convert routine now looks like this:
convert: MOV B,0 MOV C,0 ; answer in B:C MOV D, istring ; Point to var CALL decdig ; convert each digit CALL decdig CALL decdig CALL decdig RET
Notice those last two lines. You can shave a byte by changing the final CALL into a JMP. The return at the end of decdig will go back to the original caller. You can save even more space by rearranging the code so that decdig appears right at that spot:
CALL decdig CALL decdig CALL decdig decdig: ...
You can apply this even more. Notice that you have 4 calls to decdig which could be considered two groups of two. So you could write:
CALL decdig2 decdig2: CALL decdig decdig: ....
You might have to think about that for a minute. The first call executes two operations and the return goes back to decdig2 where you do two more operations. At the end, the return goes back to the original caller.
That’s not very readable, I’ll admit. Comments are free, so you should use them if you aren’t writing a blog post around the code. In particular, any time code depends on being in a particular location, you ought to document it:
CALL decdig2: ; **** FALL THROUGH decdig2:
This prevents you from accidentally moving it or inserting code and breaking things. Now we are at 80 bytes. You can pull the same trick with the shift16 subroutine, but since it is only called once, there’s no real savings in this case.
Saving 9 bytes doesn’t seem like much, but it is about 10% of the original code. Sometimes that’s all you need. You could probably optimize the algorithm, too.
On some CPUs, you can do conditional return (that is, like return on zero or RZ). But on other processors, you can only do a branch or jump on a condition. This leads to code like this:
JNZ nextinst RET nextinst:
However, you probably have a return somewhere in your code already (like the xret label in the example program). That means you can write:
There you’ve saved another byte. Like pennies, they all add up.
What’s your favorite space-saving trick? Share it in the comments unless you want to save it as your secret sauce for the 1K challenge. You have until January 5th to squeeze out those extra bytes.