Developed On Hackaday: Sometimes, All You Need Is A Few Flags

The development of the Hackaday community offline password keeper has been going on for a little less than a year now. Since July, our beta testers have been hard at work giving us constant suggestions about features they'd like to see implemented and improvements the development team could make. This has led to more than 1,100 GitHub commits and ten thousand lines of code. As you can guess, our little 8-bit microcontroller's flash memory was filling up quickly.

One of our contributors, [Miguel], recently discovered one compiler flag and one linker flag that together saved us around 3KB of flash storage on our 26KB firmware, with little added processing overhead. Hold on to your hats, this write-up is going to get technical…

Many coders from all around the globe work on the Mooltipass firmware at the same time. Each one is assigned a dedicated folder, depending on the functionality they want to implement. Naturally, the code they produce is split into many C functions according to the task at hand. This adds up to many function calls, which GCC usually compiles to the CALL assembler instruction.

This instruction is two 16-bit words long and encodes the 22-bit absolute address of the function to call. As a result, each function call uses a total of 4 flash bytes, not counting argument passing. However, the AVR instruction set also offers a relative way to call functions: the RCALL instruction. It encodes a 12-bit signed offset (roughly 2K words in either direction) between the program counter and the function to call. This shrinks a function call to 2 bytes and saves one clock cycle. The -mrelax flag enables linker relaxation. It saved us 1KB by having the linker replace CALL with RCALL wherever the target was within range.
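To make the range limit concrete, here is a small host-side C sketch that models the check a linker has to make before relaxing a CALL into an RCALL. It is not part of the Mooltipass code or of the linker itself. It works in word addresses and follows the documented RCALL encoding, where the target is PC + k + 1 for a 12-bit signed k (-2048 to 2047). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RCALL encodes a 12-bit signed word offset k; the call target is PC + k + 1. */
#define RCALL_K_MIN (-2048L)
#define RCALL_K_MAX 2047L

/* Instruction sizes in bytes: CALL is two 16-bit words, RCALL is one. */
#define CALL_BYTES  4u
#define RCALL_BYTES 2u

/* Hypothetical helper: true if a call at word address pc can reach the
   word address target with RCALL instead of CALL. */
static bool rcall_reachable(uint32_t pc, uint32_t target)
{
    int32_t k = (int32_t)target - (int32_t)(pc + 1u);
    return k >= RCALL_K_MIN && k <= RCALL_K_MAX;
}

/* Hypothetical helper: bytes saved when relaxed_calls CALLs become RCALLs. */
static uint32_t relaxation_savings(uint32_t relaxed_calls)
{
    return relaxed_calls * (CALL_BYTES - RCALL_BYTES);
}
```

Read the other way round, relaxation_savings(512) is 1024. If all of the 1KB saving came from calls, it would mean roughly 512 call sites were shortened.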

Finally, the -mcall-prologues compiler flag freed 2KB of flash storage. Normally, each function has its own inline prologue that saves registers and sets up the stack, plus a matching epilogue that restores them. With this flag, GCC instead makes every function use shared prologue/epilogue subroutines. Put simply, the AVR stack and registers are prepared in the same way for every function, by one common piece of code rather than by a separate copy inside each function. This costs a little execution time but saves a lot of code space.
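The trade-off can be sketched with a rough size model. The only hard fact it relies on is that each AVR PUSH or POP occupies one 16-bit word. Two inputs are hypothetical parameters rather than measured avr-gcc figures: the per-function call sequence and the size of the shared save/restore code. The model also ignores frame-pointer setup.

```c
#include <stdint.h>

/* Each AVR PUSH or POP instruction occupies one 16-bit word. */
#define PUSH_POP_BYTES 2u

/* Inline prologues/epilogues: every function pushes, and later pops,
   each call-saved register it uses. */
static uint32_t inline_save_bytes(uint32_t n_funcs, uint32_t regs_per_func)
{
    return n_funcs * regs_per_func * 2u * PUSH_POP_BYTES;
}

/* Shared prologues/epilogues (the -mcall-prologues idea): one copy of the
   save/restore code (shared_bytes, hypothetical) plus a short branch
   sequence inside each function (call_seq_bytes, also hypothetical). */
static uint32_t shared_save_bytes(uint32_t n_funcs, uint32_t shared_bytes,
                                  uint32_t call_seq_bytes)
{
    return shared_bytes + n_funcs * call_seq_bytes;
}
```

Take 100 functions that each save 8 registers. The inline version costs inline_save_bytes(100, 8), which is 3200 bytes. Now assume a 72-byte shared routine, enough to push and pop all 18 call-saved registers (r2 to r17, r28 and r29) once, plus an 8-byte sequence per function. The shared version then costs 72 + 800 = 872 bytes. The extra branches into and out of the shared routines account for the execution-time cost.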

More space-saving techniques can be found by clicking this link. Want to stay informed about the Mooltipass launch date? Subscribe to our official Google Group!

26 thoughts on “Developed On Hackaday: Sometimes, All You Need Is A Few Flags”

  1. I'm not an expert in these things, but I wonder if your development has been going on for an unusually long time. With indie projects, I hear it's very fast, since you need the funds quickly to sustain the overall effort. Perhaps your beta would already have been released as a product if it had been developed by someone working on it alone. Or maybe it's a part-time job. Anyway, just curious. Neat that you're writing about developing it.

    Any particular reason to optimize for your MCU rather than starting with a bigger one?

    1. Well, there's an enormous amount of development on the software side of things as well (Chrome plugin, Python scripts for bundle generation), and now that we have beta testers we're changing a few things too.
      All of us are developing in our spare time.
      The main reason we stick with this MCU is Arduino compatibility. Anyway, so far it has proven good enough for our use.

  2. I saved lots of space by changing:
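    #include <msp430.h>   /* defines P1OUT, the MSP430 port 1 output register */
    #include <stdbool.h>  /* bool in C99 */

    void setLED(bool on) {
        if (on) {
            P1OUT |= 1;
        } else {
            P1OUT &= ~1;
        }
    }

    Into two functions:

    void setLEDOn(void) {
        P1OUT |= 1;
    }

    void setLEDOff(void) {
        P1OUT &= ~1;
    }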

    The first one typically isn't inlined, and every call has to pass an argument. The second pair was always inlined. This depends on how setLED is used, of course; in my code it always set the LED to a fixed state, either true or false. Sometimes more is less!

      1. I just looked at the GitHub repository, and it looks like you're using the Arduino IDE.
        Arduino builds libraries in a funky way that can lead to problems with LTO.
        http://nerdralph.blogspot.ca/2014/07/gcc-lto-call-graph-generation.html

        Using Ino (inotool.org) would give you more control over the build process. It would also make it easier to use GCC 4.9.1 than trying to integrate it with the Arduino IDE. I'm pretty sure the 1.5 beta Arduino nightly builds are still on 4.8.1 (possibly with a couple of patches).

      1. Yeah, I was referring more to the PC-relative nature of the code generation versus relaxation, which isn't what the article was about. I'd assume the non-working code is just too far away for a relative jump, which should produce a warning anyway. I don't see any specifics in the GCC manual otherwise, though there might be some related bugs, or more pragmas/attributes that I haven't seen.

  3. If no one minds a bit of code golf (ha ha): if I were tight on code/data space, here are some of the things I'd consider once premature optimisation had been dealt with. Each should be applied case by case, with careful testing to see whether it helps or hurts, not changed globally with no checking (as in internet coding). I only glanced over bits of the code over coffee, so treat this as 80/20; these are also general tips.

    Instead of storing a bunch of true/false flags in 8 bits each, use bitfields where possible.
    If you have lots of temporary buffers, and re-entrancy or race conditions aren't going to be a problem, consider a global buffer with indirect accesses. The benefit is twofold: it reduces stack usage and, if implemented correctly, can reduce code size, for instance if you're able to use naked functions with no stack setup; you may also gain some relative addressing.
    Move single globals into structs; that will give you relative addressing.
    volatile can be costly: the compiler must perform every access to a volatile object exactly as written, so many optimisations stop as soon as one is involved. Keep that in mind.
    static isn't const: if data is read-only, make sure it's const (unless you've got more RAM than code space). The same goes for pointers and arrays of pointers; people often forget to make both levels of a pointer to pointer const.
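Three of these tips can be sketched in portable C. All names below are hypothetical and do not come from the Mooltipass code.

- The bitfield struct packs eight flags into one byte. Using uint8_t as a bitfield type is implementation-defined, but GCC accepts it.
- The grouped globals are accessed through a pointer. This lets the compiler use base-plus-displacement loads and stores, which are one word each on most AVR cores (LDD/STD), instead of two-word absolute LDS/STS. Whether avr-gcc actually does so depends on the code, so check the listing.
- The volatile counter is read once into a local. This avoids repeated loads, and it also means the value could be slightly stale if an interrupt updates it in the meantime.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bool per flag: typically one byte each. */
struct flags_bool {
    bool usb_connected, card_inserted, screen_on, locked;
    bool dirty, beep, led, debug;
};

/* Bitfields: eight 1-bit flags packed into a single byte with GCC. */
struct flags_bits {
    uint8_t usb_connected : 1, card_inserted : 1, screen_on : 1, locked : 1;
    uint8_t dirty : 1, beep : 1, led : 1, debug : 1;
};

/* Former single globals grouped into one struct. */
struct app_state {
    uint8_t  menu_index;
    uint8_t  brightness;
    uint16_t timeout_s;
    struct flags_bits flags;
};
static struct app_state g_state;

/* Accessing members through one base pointer allows small displacements
   instead of a full absolute address per access. */
static void reset_state(struct app_state *s)
{
    s->menu_index = 0;
    s->brightness = 128;
    s->timeout_s = 60;
    s->flags.locked = 1;
}

/* Hypothetical counter that an interrupt handler would update. */
static volatile uint8_t g_tick_count;

/* Writing g_tick_count twice in the expression would force two loads;
   copying it once gives one load and one consistent value. */
static bool in_window(uint8_t lo, uint8_t hi)
{
    uint8_t now = g_tick_count;
    return now >= lo && now <= hi;
}
```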

    1. Hey Charliex,

      We actually talked about using a global buffer, but for the moment we prefer temporary buffers, as we have a dedicated function monitoring stack usage and we prefer code clarity :).
      Would you have some literature on the benefits of const vars in function calls? That’s quite an interesting topic.

      1. "const vars" is a misnomer; I'm talking about read-only data (I mistyped it as "ready only" in my previous post).
        So if you have a data struct that is filled in ahead of time and is read-only, just marking it static only changes its scope; it doesn't move it out of RAM. Again, it comes down to which is more expensive for you, RAM or ROM.

        So: static const vs. static for predefined read-only data arrays, and for pointer arrays, const T * const ptr; vs. const T *ptr;
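The following short sketch illustrates the difference; all names are hypothetical.

- In the first table, only the strings are read-only. The array of pointers itself is writable, so it has to live in RAM.
- Adding the second const makes the whole table read-only.

On AVR specifically, avr-gcc still copies const data into RAM at startup unless it is also placed in flash. That requires PROGMEM (from <avr/pgmspace.h>) or the __flash qualifier, so this portable version only shows the const part.

```c
#include <stddef.h>
#include <string.h>

/* Strings are const, but the pointer array is writable: it must be in RAM. */
static const char *menu_names_rw[] = { "Login", "Settings", "About" };

/* Both the strings and the pointer array are const: the whole table is
   read-only. menu_names_ro[0] = "Exit"; would not compile. */
static const char *const menu_names_ro[] = { "Login", "Settings", "About" };

#define MENU_COUNT (sizeof menu_names_ro / sizeof menu_names_ro[0])

/* Returns the index of a menu entry by name, or -1 if it is not found. */
static int menu_index_of(const char *name)
{
    for (size_t i = 0; i < MENU_COUNT; i++) {
        if (strcmp(menu_names_ro[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```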

        I do disagree about code clarity with a shared buffer vs. local temporaries. You can keep the code readable: instead of a fixed local automatic array, it's just an aliased pointer, and the accesses all look the same. It's also just as simple to add overwrite guards to a global buffer as to an automatic one on the stack. The global buffer is also more forgiving: if you smash the stack, you kill execution and have limited recovery.

        You can still use monitoring, but a global buffer also makes you consider how much automatic storage you need. Embedded devices obviously run in limited space, so you want all your data to have fixed sizes. Since that's the case, you can preallocate it in the global buffer; you can't go over the maximum anyway.

        It does make it harder to ensure re-entrancy or thread safety. If it's all simple single-threaded code, though, that's not an issue, and adding code guards takes care of the rest.
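Here is a minimal sketch of the shared-buffer approach described in this reply. It assumes single-threaded code that never touches the buffer from an interrupt handler. The buffer size, the guard value, and all function names are hypothetical. Callers work through an aliased pointer, so their code looks the same as with a local array. A guard byte just past the usable area detects overruns, and an in-use flag rejects accidental nested use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 64u    /* hypothetical: sized for the largest user */
#define GUARD_BYTE   0xA5u  /* hypothetical canary value */

/* One extra byte at the end holds the guard. */
static uint8_t g_scratch[SCRATCH_SIZE + 1u];
static bool g_scratch_in_use;

/* Hands out the shared buffer, or NULL if it is already taken. */
static uint8_t *scratch_acquire(void)
{
    if (g_scratch_in_use) {
        return NULL;
    }
    g_scratch_in_use = true;
    g_scratch[SCRATCH_SIZE] = GUARD_BYTE;
    return g_scratch;
}

/* Releases the buffer; returns false if the guard byte was overwritten. */
static bool scratch_release(void)
{
    bool intact = (g_scratch[SCRATCH_SIZE] == GUARD_BYTE);
    g_scratch_in_use = false;
    return intact;
}

/* Example user that would otherwise declare a local array on the stack. */
static bool reverse_copy(const uint8_t *src, uint8_t *dst, size_t len)
{
    if (len > SCRATCH_SIZE) {
        return false;
    }
    uint8_t *buf = scratch_acquire();
    if (buf == NULL) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        buf[i] = src[len - 1u - i];   /* same syntax as with a local array */
    }
    memcpy(dst, buf, len);
    return scratch_release();
}
```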

        Most if not all of this (including relaxation) is covered in the avr-gcc notes:

        http://www.atmel.com/images/doc8453.pdf

        http://www.tty1.net/blog/2008/avr-gcc-optimisations_en.html

          1. The anim.c code is a work-in-progress and does not represent the rest of the code. Those static structures will become media files in the SPI flash.

            The AVR aligns to 8 bits.

  4. Could you store each function just once in the code, then use much shorter tokens that point to the location of each function? For any function used two or more times, that would save space.
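Below is a minimal C sketch of this token idea, assuming all the routines share one signature. Each routine's address is stored once in a table, and a one-byte token indexes into it. A sequence of tokens is then run by looking each one up. Since a direct RCALL on AVR is already 2 bytes, this would mainly pay off when long call sequences are stored as data, at one byte per call. The table lookup also adds execution time. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state the routines operate on. */
struct vm {
    int16_t acc;
};

typedef void (*op_fn)(struct vm *);

static void op_inc(struct vm *m)    { m->acc += 1; }
static void op_double(struct vm *m) { m->acc *= 2; }
static void op_clear(struct vm *m)  { m->acc = 0; }

/* Token values are indices into op_table. */
enum { TOK_INC, TOK_DOUBLE, TOK_CLEAR, TOK_COUNT };

/* Each routine's address is stored exactly once. */
static op_fn const op_table[TOK_COUNT] = { op_inc, op_double, op_clear };

/* Runs a sequence of one-byte tokens; unknown tokens are skipped. */
static void run_tokens(struct vm *m, const uint8_t *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tokens[i] < TOK_COUNT) {
            op_table[tokens[i]](m);
        }
    }
}
```

For example, running { TOK_INC, TOK_INC, TOK_DOUBLE, TOK_INC } from an accumulator of 0 leaves it at 5.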

    Texas Instruments used that method in their Extended BASIC.

  5. If the project is already hitting the limits of the MCU's resources while you are still developing the stock firmware, you are going to make updates and fixes difficult.

    Also, since the device is meant to be open and hackable, you will all but block those who want to add their own features to the stock firmware, because there is no flash space left for them to use.

    Definitely need a bigger MCU.
