Ask Hackaday Answered: The Tale of the Top-Octave Generator

We got a question from [DC Darsen], who apparently has a broken electronic organ from the mid-70s that needs a new top-octave generator. A top-octave generator is essentially an IC with twelve or thirteen logic counters or dividers on-board that produces an octave’s worth of notes for the cheesy organ in question, and then a string of divide-by-two logic counters divide these down to cover the rest of the keyboard. With the sound board making every pitch all the time, the keyboard is just a simple set of switches that let the sound through or not. Easy-peasy, as long as you have a working TOG.

I bravely, and/or naïvely, said that I could whip one up on an AVR-based Arduino, tried, and failed. The timing requirements were just too tight for the obvious approach, so I turned it over to the Hackaday community because I had this nagging feeling that surely someone could rise to the challenge.

The community delivered! Or, particularly, [Ag Primatic]. With a clever approach to the problem, some assembly language programming, and an optional Arduino crystalectomy, [AP]’s solution is rock-solid and glitch-free, and you could build one right now if you wanted to. We expect a proliferation of cheesy synth sounds will result. This is some tight code. Hat tip!

Squeezing Cycles Out of a Microcontroller

Let’s take a look at [AP]’s code. The approach that [AP] used is tremendously useful whenever you have a microcontroller that has to do many things at once, on a rigid schedule, and there’s not enough CPU time between the smallest time increments to do much. Maybe you’d like to control twelve servo motors with no glitching? Or drive many LEDs with binary code modulation instead of primitive pulse-width modulation? Then you’re going to want to read on.

There are two additional tricks that [AP] uses: one to fake cycles with a non-integer number of counts, and one to make the AVR’s ISR timing absolutely jitter-free. Finally, [Ag] ended up writing everything in AVR assembly language to make the timing work out, but was nice enough to also include a C listing. So if you’d like to get your feet wet with assembly, this is a good start.

In short, if you’re doing anything with hard timing requirements on limited microcontroller resources, especially an AVR, read on!

Taking Time to Think

The goal of the top-octave generator is to take an input clock and divide it down into twelve simultaneous sub-clocks that all run independently of each other. Just to be clear, this means updating between zero and twelve GPIO pins at a frequency of 1 MHz or so — updating every twenty clock cycles at the AVR’s maximum CPU speed. If you thought you could loop through twelve counters and decide which pins to flip in twenty cycles, you’d be mistaken.

But recognizing the problem is the first step to solving it. Although the tightest schedule might require flipping one pin exactly twenty clocks after flipping another, most of the time there are more cycles between pin updates — hundreds up to a few thousand. So the solution is to recognize when there is time to think, and use this time to pre-calculate a buffer full of next states.

[Ag]’s solution uses a few different loops that run exactly 20, 40, and 60 cycles each — the longer versions being just the 20-cycle one padded out with NOPs. These loops run inside an interrupt-service routine (ISR). When there are 80 or more cycles of thinking time until the next scheduled pin change, control is returned to the main loop and the next interrupt is set to re-enter the tight loops at the next necessary update time.

All the fast loop has to do is read two bytes, write them out to the GPIO pins, increment the pointer to the next row of data, and figure out if it needs to stall for 20 or 40 additional cycles, or set the ISR timer for longer delays and return to calculations. And this it can do in just twelve of the twenty cycles! Slick.


Taking a step back from the particulars of the top-octave generator, this is a classic problem and a classic solution. It’s worth your time to internalize it, because you’ll run into this situation any time you have real-time constraints. The problem is that on average there’s more than enough time to complete the calculations, but that in the worst cases it’s impossible. So you split the problem in two parts: one that runs as fast as possible, and one that does the calculations that the fast section will need. And connecting together fast and slow processes is exactly why computer science gave us the buffer.

In [AP]’s code, this buffer is a table where each entry has two bytes for the state of the twelve GPIO pins, and one byte to store the number of clock cycles to delay until the next update. One other byte is left empty, yielding 64 entries or 256 bytes for the whole table. Why 256 bytes? Because the AVR has an 8-bit unsigned integer, wrapping around from the end of the table back to the beginning is automatic, saving a few cycles of wasteful if statements.

But even with this fast/slow division of labor, there is not much time left over for doing the pre-calculation. Sounding the highest C on a piano keyboard (4186 Hz) with a 20 MHz CPU clock requires toggling a GPIO pin every 2,390 cycles, so that’s the most time that the CPU will ever see. When the virtual oscillators are out of phase, this can be a lot shorter. By running the AVR at its full 20 MHz, and coding everything in assembly, [AP] can run the calculations fast enough to support twelve oscillators. At 16 MHz, there’s only time for ten, so every small optimization counts.

Some Optimization Required

Perhaps one of the cleverest optimizations that [AP] made is the one that makes this possible at all. The original top-octave chips divide down a 2 MHz square wave by a set of carefully chosen integer divisors. Running the AVR equivalent at 2 MHz resolution would mean just ten clocks per update and [AP]’s fast routine needed twelve, so the update rate would have to be halved. But that means that some odd divisors on the original IC would end up non-integral in the AVR code. For example, the highest C is reproduced in silicon as 2 MHz / 239, so to pull this off at 1 MHz requires counting up to 119.5 on an integer CPU. How to cope?

You could imagine counting to 119 half of the time, and 120 the other. Nobody will notice the tiny difference in duty cycle, and the pitch will still be spot on. The C programmer in me would want to code something like this:

uint8_t counter[12] = { 0, ... };
uint8_t counter_top[12] = { 119, ... };
uint8_t is_counter_fractional[12] = { 1, 0, ... };
uint8_t is_low_this_time[12] = { 0, ... };

// and then loop
for ( i=0 ; i<12; ++i){
  if ( counter[i] == 0 ){
    if ( is_counter_fractional[i] ){
      if ( is_low_this_time[i] ){
        counter[i] = counter_top[i];
    is_low_this_time = 0;
      else {
        counter[i] = counter_top[i] + 1;
    is_low_this_time = 1;

That will work, but the ifs costs evaluation time. Instead, [AP] did the equivalent of this:

uint8_t counter[12] = { 0, ... };
uint8_t counter_top[12] = { 119, ... };
uint8_t phase[12] = { 1, 0, ... };

for ( i=0 ; i<12; ++i){
  if ( counter[i] == 0 ){
    counter[i] = counter_top[i] + phase[i];
    counter_top[i] = counter[i];
    phase[i] = -phase[i];

What’s particularly clever about this construction is that it doesn’t need to distinguish between the integer and non-integer delays in code. Alternately adding and subtracting one from the non-integer values gets us around the “half” problem, while setting the phase variable to 0 means that the integer-valued divisors run unmodified, with no ifs.

The final optimization shows just how far [AP] went to make this AVR top-octave generator work like the real IC. When setting the timer to re-enter the fast loop in the ISR, there’s the possibility for one cycle’s worth of jitter. Because AVR instructions run in either one or two clock cycles, it’s possible that a two-cycle instruction could be running when the ISR timer comes due. Depending on luck, then, the interrupt will run four or five clocks later: see the section “Interrupt Response Time” in the AVR data sheet for details.

In a prologue to the ISR, [AP]’s code double-checks the hardware timer to see if it has entered on a one-cycle instruction, and adds in an extra NOP to compensate. This makes the resulting oscillator nearly jitter free, pushing out a possible source of 50 ns (one cycle at 20 MHz) slop. I don’t think you’d be able to hear this jitter, but the result surely looks pretty on the oscilloscope, and this might be a useful trick to know if you’re ever doing ultra-precise timing with ISRs.

The Proof of the Pudding

Naturally, I had to test out this code on real hardware. The first step was to pull a random Arduino-clone out of the closet and flash it in. Because “Arduinos” run at 16 MHz with the stock crystal, the result is that a nominal 440 Hz concert A plays more like a slightly sharp F, a musical third down. It sounds fine on its own, but you won’t be able to play along with any other instruments that are tuned to 440 Hz.

[AP]’s preferred solution is to run the AVR chip at 20 MHz. Since the hardware requirements are very modest, you could use a $0.50 ATTiny816 coupled with a 20 MHz crystal and you’d have a top-octave generator for literal pocket change — certainly cheaper than buying even an Arduino clone. I tested it out with an ATMega48P and a 20 MHz crystal on a breadboard because it’s what I had. Or you could perform crystalectomy on your Arduino to get it running at full speed.

We went back and forth via e-mail about all the other (firmware) options. [AP] had tried them all. You could trim the ISR down to 16 cycles and run at 16 MHz, but then there’s only enough CPU time in the main loop to support ten notes, two shy of a full octave. You could try other update rates than 1 MHz, but the divisors end up being pretty wonky. To quote [AP] from our e-mail discussion on the topic:

“After playing with the divider values from the original top octave generator IC and trying different base frequencies, it appears that the 2 MHz update rate is a “sweet spot” for getting reasonable integer divisors with < +/-2 cents of error. The original designers of the chip must have done the same calculations.”

To make a full organ out of this setup, you’ll also need twelve binary counter chips to divide down each note to fill up the lower registers of the keyboard, but these are easy to design with and cost only a few tens of cents apiece. All in all, thanks to [AP]’s extremely clever coding, you can build a fully-polyphonic noisemaker that spits out 96 simultaneous pitches, all in tune, for under $10. That’s pretty amazing.

And of course, I’ve already built a small device based on this code, but that’s a topic for another post. To be continued.

39 thoughts on “Ask Hackaday Answered: The Tale of the Top-Octave Generator

  1. The artwork is great, as it usually is. But… can anyone explain to me what is going on there? Did Howard the Duck play an electric organ? Is there some cultural reference there that I am ignorant about? Help me, I am curious!


  2. The illustration is great, as usual. But.. What is going on there?

    Was Howard the Duck into playing electric organs?
    Is this some other cultural reference that I am ignorant of, maybe a perfomer who dresses/dressed up as a duck?

    Help me, I’m curious!


    1. Some (many) faster CPUs have an instruction cache (and cache misses can ruin your day), and/or interrupt latency can be problematically huge.

      Starting with a 20MHz CPU where every instruction takes guaranteed fixed time is sometimes an easier solution.

  3. I seem to recall that 2MHz was slightly under the desired frequency. Don Lancaster certainly covered this in one of his articles about electronic music in the seventies, probably in both his TTL and CMOS Cookbooks.

    I did build programmable divider to play with thIs, using an ft-243 crystal. But then top octave generators arrived, and no need to build the equivalent.


    1. Hi Michael,

      I played around with different base clock frequencies, and 2 MHz seemed to work best in terms of pitch accuracy:

      Note Divider Actual Freq Ideal Freq Offset
      Hz Hz cents
      C8 239 8368.2 8372.0 -0.8
      B7 253 7905.1 7902.1 0.7
      A#7 268 7462.7 7458.6 0.9
      A7 284 7042.3 7040.0 0.6
      G#7 301 6644.5 6644.9 -0.1
      G7 319 6269.6 6271.9 -0.6
      F#7 338 5917.2 5919.9 -0.8
      F7 358 5586.6 5587.7 -0.3
      E7 379 5277.0 5274.0 1
      D#7 402 4975.1 4978.0 -1
      D7 426 4694.8 4698.6 -1.4
      C#7 451 4434.6 4434.9 -0.1

      Worst case error was +/- 1 cent.

      If you have a better base clock frequency, I’d love to try it.

      Ag Primatic

    2. You have a great memory, Michael! I searched through the back issues of Popular Electronics, and found the articles by Don Lancaster from 1974(!) that you may have been referring to.

      This article describes the equal tempered scale and the first mention of 8-bit and 9-bit divider values (called Series A and Series B in Table II in the article):
      page 39

      This article describes building a tone generator with a 1.94204 MHz clock, which I’m assuming is the “under 2MHz clock” you were recalling:
      page 50

      If you divide the 1.94204 MHz clock by 2, the ratios fit into 8 bits. The downside is that the pitch accuracy is +2.5/-2.8 cents, not as good as the 2 MHz version accuracy of +1/-1.4 cents, but probably still acceptable.

      Also, I’m not sure where to source the 1.94204 MHz crystal (or its multiples) from. It doesn’t seem to be a standard value.

      Ag Primatic

      P.S. Don has an error in one of his 8-bit divider values. The value for F7 should be 174, not 176. Using 176, the pitch error would be -22 cents! Maybe I’ll send a correction to Popular Electronics….

  4. Exceptional Hack! I tried for a while to do this without an ISR using a similar jump table between nearly-identical portions of code. However, this is a MUCH cleaner solution that doesn’t take a bunch of hard-coded unrolled loops. Eventually I ran out of time and wrote this off. However, it’s incredibly refreshing to see that someone else had a better approach that I could learn from. Kudos!

    One minor optimization that you can do (if you’re so inclined) is that you can use 2 of the 8-bit hardware PWMs in the processor to generate 2 of the odd divider clocks (/239 and /253 are ideal since they fit in a single byte). From the summary in the article this means that you’re down to creating the 10 remaining pins via your software methods. That means you can get the same perfectly stable results with the stock 16MHz oscillator and no hardware changes to get all 12 signals (2 via PWM and 10 via software).

    Great Job! I’m also glad I wasn’t the only one to lose some sleep over this as well!

    1. I also tried initially to do this without an ISR. It was a painful exercise in clock cycle counting and trying to balance all paths through the code to be equal. The ISR made it so much cleaner, and it only costs a few extra clock cycles to enter and exit.
      I may be inclined to take you up on your optimization so we can all use a stock 16 MHz Arduino. Great idea!

      Ag Primatic

      1. So I coded up your optimization of running 2 of the outputs directly from the unused timers and the other 10 via software, and it worked! With a 16 MHz clock, I can get all 12 outputs now.

        Thanks for your suggestion, EMControl!

        Ag Primatic

  5. On a Propeller, generating 12 symmetrical clocks would be a matter of starting 6 cogs and letting each cog starting both of its Numerically Controlled Oscillators with different parameters. Once the clocks are running, it won’t even need to execute any code anymore. Just saying.


      1. I think a $0.50 ATTINY816 with a 20 MHz crystal is the lowest cost solution for a top octave generator. All the other suggestions seem to be much higher in cost.

        Mentioned so far:
        Teensy 3.2 $19.80
        Teensy 3.6 $29.25
        Propeller $4.47

        Plus, the ATTINY816 is available in a 20-pin SOIC package (tiny, but still hand-solderable), and it’s 5V compatible.

        1. Slightly disingenuous. You’re comparing the price of discrete components (without the PCB that will be needed) with that of development boards. How much does a low-end ARM chip (from Microchip, Atmel, or STM, for example) cost? I can understand if you don’t want to bother with the research, but then don’t quote apples-and-oranges prices.

          1. Propeller is a $4.47 IC in 10K quantities, not a board. So that’s an apples to apples comparison.
            Teensy 3.2 uses a MK20DX256VLH7 processor that’s $4.85 in 10K quantities.
            Teensy 3.6 uses a MK66FX1M0 processor that is $10.59 in 10K quantities.

            You are welcome to provide data on your suggestions. Your target is $0.50.

          2. I didn’t say that there was a cheaper solution, and I doubt that there is. I was only contesting your 40:1 to 60:1 ratio between an ATtiny16 and an ARM.

          1. Here’s what I see as a problem with the PIC16 architecture in this application. Even though the PIC clock can run up to 32 MHz, the minimum instruction cycle time is 125ns. This is equivalent to an 8 MHz AVR processor. I doubt that this algorithm will function with only 8 instructions in the inner loop, even before trying to code it in PIC assembly.

          2. That PIC would be perfect. After initialization, zero code overhead.

            Ag Primatic – Those peripherals can be setup to take an external input pin and divide it down by a user selectable scaler with a user selectable offset. So it doesn’t matter if your main system clock is operating at 32kHz.

            I would love a hackaday highlight or competition where projects use microcontroller peripherals instead of throwing clock cycles at problems.

          3. 10 HW PWMs is very close to 12, but I’m sure that some people would miss the remaining two notes per octave… Are you suggesting implementing the other two in firmware? Have done it it?

            Price isn’t super-critical here, but Ag’s solution is definitely cheap and cheerful.

          4. Hmm. From the data sheet, here’s the list of timing-related digital peripherals for the PIC16F15344:

            • Complementary Waveform Generator (CWG)
            • Two Capture/Compare/PWM (CCP) module
            • Four 10-Bit PWMs
            • Numerically Controlled Oscillator (NCO)

            From which set of peripherals would you get the 12 divided-down frequency outputs with zero code overhead?

  6. Equal Tempered is not is in tune with it self. Other temperaments should be easier to compute in fewer steps. The irrational number difference of ET tuning is a mess. If you are going this far you should have several temperaments available just like most modern keyboards and organs. Then there is Arabic tuning etc.

    If this temperament thing is puzzling look it up and hear the difference. has something like 3000 tuning scales of the whole musical world.

    1. This is an issue with all synthesizers that derive their pitches from a single oscillator. The bottom line is, the higher the frequency you’re deriving the pitches from, the closer to the irrational ideal you can get.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.