Dirty Tricks For 6502 Programming

August 20, 2019

We know the 6502 isn’t exactly the CPU of choice for today’s high-performance software, but with the little CPU having appeared in so many classic computers (the Apple, the KIM-1, and the Commodores, to name a few), we have a real soft spot for it. [Janne] has a post detailing the eight best entries in a Commodore 64 coding competition. The goal was to draw an X on the screen using the smallest program possible. [Janne] got 56 bytes, but two entrants clocked in at 34 bytes.

In addition to the results, [Janne] also exposes the tricks people used to get these tiny programs done. Just looking at the solution in C and then 6502 assembly is instructive. Naturally, one trick is to use the existing ROM code to do tasks such as clearing the screen. But that’s just the starting point.

Some of the efficiencies are good practice on any CPU. For example, converting multiplication in a loop into a running total is always a good idea unless you have hardware multiplication that is as cheap as an addition. Some of the tricks are a bit more specific. For example, it was more efficient to draw the figure at the bottom of the screen and scroll than it was to draw each part at a specific X and Y position.
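As a rough illustration of the running-total idea, here is a minimal Python sketch (not taken from any contest entry). It computes the offset of each row on a 40-column text screen two ways: with a multiplication per row, and by adding the row width to a running total. The function names and the `SCREEN_WIDTH` constant are made up for this example.

```python
SCREEN_WIDTH = 40  # the C64 text screen is 40 columns wide


def offsets_with_multiply(rows):
    """Offset of the start of each row, using one multiplication per row."""
    return [row * SCREEN_WIDTH for row in range(rows)]


def offsets_with_running_total(rows):
    """Same offsets, but each one is the previous offset plus the row width."""
    offsets = []
    offset = 0
    for _ in range(rows):
        offsets.append(offset)
        offset += SCREEN_WIDTH  # one addition replaces the multiplication
    return offsets


# Both approaches produce identical results for a 25-row screen.
print(offsets_with_multiply(25) == offsets_with_running_total(25))
```

The 6502 has no multiply instruction, so the second form is a much better fit for it than the first.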

You might think some of these tricks aren’t really dirty, but then you’ll see self-modifying code. A legitimate hack, but always messy. There are also some special tricks used to get the C64 to load the machine code without going through BASIC first.
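For anyone who hasn’t run into self-modifying code, here is a small Python sketch of the idea. It is only a toy: the four opcodes are invented for this example and are not 6502 instructions. Code and data share one list, and the loop fills a buffer by bumping the address operand of its own STORE instruction on each pass, much like the classic trick of incrementing the operand of a store instruction in place.

```python
# Toy opcodes, invented for this sketch (these are not 6502 opcodes).
HALT, STORE, INC, DJNZ = 0x00, 0x01, 0x02, 0x03


def run(mem, acc):
    """Execute the program held in mem; code and data share the same list."""
    pc = 0
    while True:
        op = mem[pc]
        if op == HALT:
            return mem
        if op == STORE:      # STORE addr: mem[addr] = acc
            mem[mem[pc + 1]] = acc
            pc += 2
        elif op == INC:      # INC addr: mem[addr] += 1
            mem[mem[pc + 1]] += 1
            pc += 2
        elif op == DJNZ:     # DJNZ addr, target: decrement, jump if not zero
            counter = mem[pc + 1]
            mem[counter] -= 1
            pc = mem[pc + 2] if mem[counter] != 0 else pc + 3
        else:
            raise ValueError(f"unknown opcode {op} at address {pc}")


PROGRAM = [
    STORE, 16,    # 0: write acc to address 16 (the operand lives at address 1)
    INC, 1,       # 2: patch the STORE operand so it targets the next cell
    DJNZ, 9, 0,   # 4: decrement the counter at address 9, loop while nonzero
    HALT,         # 7: done
    0,            # 8: padding
    4,            # 9: loop counter
] + [0] * 6 + [0] * 4   # 10-15: padding, 16-19: output buffer

memory = run(list(PROGRAM), ord("X"))
print(memory[16:20], memory[1])
```

After the run, the four buffer cells hold the character code and the STORE operand at address 1 has moved from 16 to 20. The program saved itself an index register, at the cost of no longer matching its own listing, which is exactly why the technique is messy.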

You might think 34 bytes would be the smallest possible program. You’d be wrong. After the contest, everyone had a look at all the entries and several people were able to come in even smaller — in one case, 29 bytes.

We wish we could have found the PRG files so you could run them on an emulated C64, but we were too lazy to build them from source. If you don’t have a C64, you could always pull out an FPGA. You can even build a new one. Seriously.

30 thoughts on “Dirty Tricks For 6502 Programming”

paul says:

August 20, 2019 at 8:28 pm

Reading about small programs reminds me of the smallest program I’ve ever used.
It was for x86 / DOS and it waited for a key input and returned the key as the error level.
Its intended use was to enhance batch files, so unlike just writing an X for fun, it was an actually useful utility.

I don’t remember its exact size, but it was a .com file and I think it was 3 asm instructions.

1. DrTiTus says:
  
  August 20, 2019 at 8:49 pm
  
  At a guess, I’d say:
  
  mov ah, 07h
  int 21h
  mov ah, 4ch
  int 21h
  
  1. Alex Rossie says:
    
    August 21, 2019 at 9:51 am
    
    Function 7 doesn’t let you redirect output; sounds like paul wanted that.
    
2. iconjack says:
  
  August 20, 2019 at 9:20 pm
  
  MOV AH,7 ; get key code in AL
  INT 21H
  MOV AH, 4CH ; exit with AL = return code
  INT 21H
  
3. Alex Rossie says:
  
  August 21, 2019 at 5:11 am
  
  It’s been a very long time, was it as simple as?
  
  mov ah, 1
  int 21h
  int 20h
  
  1. Alex Rossie says:
    
    August 21, 2019 at 5:49 am
    
    Looks like int 20h returns without setting an errorlevel, but int 21h also provides function 4Ch, which should work. It was probably:
    
    mov ah, 1
    int 21h
    mov ah, 4ch
    int 21h
    
    Which is 8 bytes: B401 CD21 B44C CD21
    
    But I can’t get it working in dosbox so can’t confirm.
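The byte count is easy to double-check by assembling the four instructions by hand. The Python sketch below does just that; the helper names `mov_ah` and `dos_int` are made up for this example, and it only confirms the encoding, not that the program behaves as intended under DOS.

```python
def mov_ah(imm8):
    """Encode `mov ah, imm8` (opcode B4 followed by the immediate byte)."""
    return bytes([0xB4, imm8])


def dos_int(number):
    """Encode `int number` (opcode CD followed by the interrupt number)."""
    return bytes([0xCD, number])


program = (
    mov_ah(0x01)      # DOS function 01h: read a key, result in AL
    + dos_int(0x21)
    + mov_ah(0x4C)    # DOS function 4Ch: exit with AL as the return code
    + dos_int(0x21)
)

print(len(program), program.hex().upper())
```

Since a .com file is just raw machine code, writing `program` to a file with a .com extension should in principle give something DOS can load, though that part remains untested here.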
    
  2. Comedicles says:
    
    August 21, 2019 at 8:30 am
    
    This is disgusting! A MOV instruction in comments about 6502.
    
tetsuoii says:

August 20, 2019 at 9:43 pm

The world relies on hyper-optimized code. Video compression, 3D rendering, network protocols, search algorithms. Nothing works without tightly packed bits executed by brutally optimized loops.

Performance by default. No more bloat.

1. Ostracus says:
  
  August 20, 2019 at 10:49 pm
  
  Time vs money.
  Modern hardware is both more efficient and capable.
  
  1. tekkieneet says:
    
    August 20, 2019 at 10:58 pm
    
    The compilers for the Arm are better too. There are plenty of CPU registers, and they do away with specialized registers and extending them. As a result, the compiler does a lot less register shuffling and spends more time on generating better code.
    
2. Daid says:
  
  August 20, 2019 at 11:20 pm
  
  No. Not performance by default. Maintainability by default. Some other idiot has to work with your code as well. Performance is only important for certain bits of code.
  
  I’ve 15 years of embedded software experience. And I teach everyone the 3 basic rules of code optimization:
  1) Do not optimize.
  2) Do not optimize yet.
  3) Profile the code.
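Rule 3 might look something like the sketch below. This is a desktop Python illustration using the standard `cProfile` and `pstats` modules, not an embedded workflow, and the two sum-of-squares functions are invented purely as something to measure. It profiles a loop, then compares it against an algorithmic replacement rather than a micro-optimized version of the same loop.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Add up 0^2 + 1^2 + ... + (n-1)^2 one term at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same result from the closed form (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def profile(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


result, report = profile(slow_sum_of_squares, 100_000)
print(report)
print(result == fast_sum_of_squares(100_000))
```

The report shows where the time actually goes before anything gets rewritten, which is the whole point of the rule.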
  
  1. Shannon says:
    
    August 21, 2019 at 3:57 am
    
    I second this!
    I’ve worked with distributed systems on extremely capable machines, and I’ve worked with resource-constrained embedded, real-time systems. 99 times out of 100, the best performance gains come not from cleverly hyper-optimising the existing code but from structural or algorithmic changes. Where there has been cleverly optimised code, it has almost always needed to be tweaked eventually, which ruins the optimisations, and it takes ages to decipher and understand the implications.
    
    1. Douglas Coulter says:
      
      August 21, 2019 at 9:31 am
      
      Agree with all of the above…but there’s a but.
      Every time I do something tricky, it gets a paragraph of comments, and those are specially marked with something unlikely to be in real code, like @@@.
      If everyone did this, there’d be little or no argument. I’m retired these days, but still, coming back to years-old (or less…) code, someone else may as well have written it, so I do this in self-defense. I did start the practice as a working consultant, however. The customers really liked that style, and the only issues were when their own software guys ignored what I wrote, explaining things like “I don’t have the ready check here or use a semaphore because, unless you change something else in a stupid way, we’ll always be ready when we get here”.
      
      And in fact, it’s the code that doesn’t need to be there (or the wires/components, if hardware) due to careful design that is the hardest to figure out for the next guy (even if it’s you).
      
      Those paragraphs are really worth it, and don’t take up space on the target.
      
      We all seem to have no problem spewing words online, just do some of that in comment blocks!
      
  2. MattyD says:
    
    August 21, 2019 at 9:48 am
    
    Could you provide any good resources for embedded software coding? I’d appreciate it as I’d like to try my hand at it. Thank you in advance.
    
  3. Nerd Ralph says:
    
    August 21, 2019 at 12:29 pm
    
    I strongly disagree. Lazy, unoptimized cut/paste coding is a pain to maintain. Simpler, smaller code is easier to maintain.
    
  4. Andrew Kingdom says:
    
    September 26, 2019 at 1:29 pm
    
    Agree (mostly) that you don’t want to prematurely optimise. That said, an experienced coder should know what approach will be better optimised. Code should be self-documenting where possible, or contain clear comments.
    
    1. Dave says:
      
      September 26, 2019 at 5:54 pm
      
      …then you can enter into the realm of highly optimised self-modifying code, without a shred of a single comment. Heaps of fun trying to work out what the purpose and the end result is when tracing or disassembling.
      
3. NMI80 says:
  
  August 21, 2019 at 1:39 am
  
  Smallest code doesn’t mean fastest code…
  It’s always a matter of balance between both parameters… for the skilled ones…
  Day-to-day wannabe programmers are just reusing bricks they do not understand on top of other bricks their partners do not understand … the current result of the software industry is awful ;-)
  
Jeff Martin says:

August 20, 2019 at 9:43 pm

It’s great to study and consider different possibilities whenever practical. I’ve learned many useful tricks from other talented programmers that help me shrink my code to fit into tighter spaces- microcontrollers especially. It really does take challenges like this to stretch your mind and break out of the box.

tekkieneet says:

August 20, 2019 at 10:49 pm

I did some work with the 8051 and did some crazy things. I was able to implement JTAG, which is sort of like SPI. Normally it is very painful because the CPU can only do stuff with the accumulator, so handling both MOSI and MISO (TDI, TDO) would require a lot of shuffling, shifting, bit testing and branches. I unrolled the loops and did away with the bit shifting, replacing the accumulator with bit variables. My 8051 assembly code was pretty fast compared to some of the sloppy AVR JTAG C code I have seen. :)

I wouldn’t bother with 8-bit processors. Their days are long gone. I prefer to start with the cheaper and more capable Arm chips. There’s nothing like writing bare-metal code to get to know the modern-day peripherals well, using multiple DMA channels and IRQs to shift the load. I no longer need to go into assembly. The few times I looked at the Arm compiler-generated code, I was impressed.

1. thedryparn says:
  
  August 21, 2019 at 12:35 am
  
  8-bit is still relevant. For quick and small projects they are unbeatable. They are easy to implement both in hardware and software and require a small amount of components to implement.
  
  1. Ostracus says:
    
    August 21, 2019 at 7:59 am
    
    Plus much like ARM, MIPS, and other such, it may be embedded as a core of a bigger chip.
    
2. rubypanther says:
  
  August 21, 2019 at 11:22 am
  
  8 bit are not gone at all. I do a lot with AVR, and if you’re building a product you can go a ways cheaper at 8bit than 16 or 32.
  
  8051 is still popular, if you’re going even smaller than AVR.
  
  Mouser has an 8051 from Silicon Labs in a QFN-20 package. (New product! You don’t see that on things long gone.) For 32 cents. UART, 15 ADC channels, 4 16b timers. For some use cases that is actually beefier than the cheapest AVRs.
  
  For about 2 cents you can get an 8051 from China (also recent designs) but don’t ask about documentation. I think Dave at EEVblog did a video about them.
  
  1. tekkieneet says:
    
    August 21, 2019 at 1:08 pm
    
    > I do a lot with AVR
    Arm are cheaper and have better specs than AVR. :P I played with the Silicon Labs part for a project, but I am not impressed as the 8051 code density is bad and their Eclipse tool isn’t that hot either.
    
    I have played with various Arm chips from other vendors and have no problems switching families or vendors. Can’t say the same for the 8-bit world.
    
    Right now my chips of choice are the $0.20 STM8F003, $0.36 STM32F030 and $1 STM32F103. These are commodity parts that show up on Aliexpress, and they all use the same hardware debugger clone that costs $2.
    
RandysView says:

August 20, 2019 at 11:27 pm

The software world is very bimodal. On one hand, and as another commenter pointed out, we have hyper-optimized loops for realtime conversion (mostly). On the other we have hyper layered code, sometimes running through not one but two interpreters, and whose overhead we simply rely on the massive performance of modern processors to hide. But, in the middle are almost limitless opportunities for competitive advantage by simply optimizing a handful of critical loops in current code.

One excellent example is an extremely high quality electronic piano. This program uses massive samples for each piano (well over 40 GB per piano) to give not only the played note, but also the harmonics and resonances of the original instrument. Even on a very fast laptop processor (i7) it required over 35% of the CPU to continually mix and remix an enormous resonance profile. And all this to get a single sample output at 96 kHz. The most critical loop was this resonance creation and mixing loop, and it took 80% of the processing time. After recoding it in assembly, once for each processor family supported, the program suddenly took less than 5% of the CPU and was able to run on extremely cheap hardware (as long as the data bandwidth was sufficient). This opened up the possibility of many pianos on the same hardware, and of dedicated boxes for the piano and other functions (mixers, analysis, video integration, etc.). The final loop was about 60 instructions, unrolled once.

This is all the legacy of coding in assembly, starting with the 6502. I think it’s just as valuable today, even though the opportunity is missed in too many cases.

M S says:

August 21, 2019 at 6:07 am

If you want performance, and security, code in JAVA. If you want security holes, poor performance, code in ASSembly or C/C++.

Java is fast. It’s the fastest, nothing is faster than java, nothing.

It gets faster with each iteration, at just a small cost of memory and cpu cycles.

I mean, who doesn’t have a core-i7 with 16Gb of RAM as the minimum nowadays?

With just a few GB of helper libraries, and few GB for compatibility libraries, and another few GB for versions of the interpreter and byte code compiler, you can have “hello world” compile and run in just minutes!

Programmer time is expensive, user time is free! and so is user hardware!

1. Shannon says:
  
  August 21, 2019 at 7:12 am
  
  Uh huh, sure, uh huh.
  
  https://media.giphy.com/media/ZgqJGwh2tLj5C/giphy.gif
  
2. DED says:
  
  August 21, 2019 at 10:29 am
  
  Nice trolling!
  
Bushy says:

August 27, 2019 at 11:48 pm

My first attempt with Z80 code on the VZ200 got 44 bytes. Still learning. But it did actually assemble correctly and produced an accurate X on the first attempt.

1. Dave says:
  
  October 21, 2019 at 2:30 am
  
  …I managed to get it down to 34 bytes. And a fellow VZ enthusiast got it down to 26 bytes using some cool ideas.
  