Getting The Lead Out Of The Arduino Runtime


Ah, the Arduino.

Love it or hate it, there’s no denying that part of its accessibility comes at the expense of speed and efficiency. We honestly like the platform as well as all of the others out there, because we believe that everything has its proper place and purpose. The crew over at Make, Hack, Void think that the Arduino dev boards are well and good, but that the core of the Arduino runtime could use some improvement.

They have taken it upon themselves to dig deep into the code and make some of the improvements that many advanced Arduino users have been clamoring for. Their MHVLib is an efficiency oriented runtime library which works on all AVR microcontrollers, whether they be standalone uCs or Arduino-branded hardware.

They have changed the way that the Arduino handles pin and port information, as well as how objects and buffers are allocated in memory. Their code still relies on an Arduino-style bootloader, though they recommend Optiboot since it’s about a quarter of the size of the Arduino version.

There’s a complete list of what has been implemented available on their site, and you can grab the code via their Git repository if you want to give it a try yourself.

25 thoughts on “Getting The Lead Out Of The Arduino Runtime”

  1. Teensyduino has provided highly optimized functions on Arduino for well over a year.

    I’ll take a look and see how this compares. I’m trying to git clone it right now, but their server isn’t responding.

    Maybe I really ought to create a benchmarking library (eg, use Timer1 to measure actual elapsed cycles) and publish some hard numbers.
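A minimal sketch of the arithmetic such a Timer1-based benchmark might rely on, written so that it also compiles on a desktop. The names `elapsed_ticks`, `ticks_to_cycles` and `measure_ticks` are hypothetical. On an AVR, the `read_counter` callable would presumably return the 16-bit `TCNT1` register, and the prescaler would be whatever the timer had been configured with.

```cpp
#include <cassert>
#include <cstdint>

// Ticks elapsed between two reads of a free-running 16-bit counter.
// Unsigned 16-bit subtraction wraps modulo 65536, so one counter
// overflow between the two reads is handled automatically.
inline uint16_t elapsed_ticks(uint16_t start, uint16_t end) {
    return static_cast<uint16_t>(end - start);
}

// Convert timer ticks to CPU cycles for a given timer prescaler
// (1, 8, 64, 256 or 1024 on most AVR 16-bit timers).
inline uint32_t ticks_to_cycles(uint16_t ticks, uint16_t prescaler) {
    assert(prescaler != 0);
    return static_cast<uint32_t>(ticks) * prescaler;
}

// Run `work` and report how many ticks passed according to `read_counter`.
// The result is only meaningful if fewer than 65536 ticks elapse, and it
// includes the overhead of the two counter reads.
template <typename ReadCounter, typename Work>
uint16_t measure_ticks(ReadCounter read_counter, Work work) {
    const uint16_t start = read_counter();
    work();
    return elapsed_ticks(start, read_counter());
}
```

With a prescaler of 1, the tick count is the cycle count directly, which is probably what you want for timing something as short as a single pin write.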

  2. There is a project called “arduino lite”. I’ve browsed the source code and found that it is almost impossible to improve the speed of IO manipulation because they used a lookup table.

  3. It’s not impossible!

    If you have a Teensy board, you can quickly confirm Teensyduino’s digitalWrite is at least twice as fast as Arduino’s. I replaced the lookup tables with a computed jump and a hand-optimized assembly implementation. For non-const cases, it costs a few hundred extra bytes in code size, but the speedup is quite impressive.

    For const input cases, digitalWrite in Teensyduino is implemented as a single instruction! Ben Combee deserves credit for the idea, which he posted in early 2009. I wrote the first working implementation in November 2009, and it’s been in Teensyduino ever since.

    It’s also been sitting, unused, since November 2009 in Arduino’s issue tracker!

    http://code.google.com/p/arduino/issues/detail?id=140

    Teensyduino speeds up the other common functions too, like digitalRead, pinMode, etc.

    It is possible. Much of it depends on difficult inline assembly programming, but difficult != impossible.

  4. Looking over the MHV code, it is indeed very interesting. It does NOT implement the Arduino API, which the Hack-a-Day summary seems to imply, but rather a very interesting alternate API that’s been designed from the ground up to be efficient for the AVR hardware.

    For example, instead of digitalWrite(pin, LOW), you use mhv_pinOff(MHV_PIN *pin). The MHV_PIN is either a struct in RAM with pointers in the non-const case (getting around the PROGMEM table lookups), or a crafty macro which should optimize to a single instruction.

    There’s a huge amount of work in this thing to provide many optimized APIs. It’s very interesting and appears to have had a lot of very careful thought put into efficiency. I’ll study it more tomorrow.
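To illustrate the non-const case described above, here is a rough sketch of a pin descriptor that lives in RAM and carries a pointer to its port register, so no table lookup is needed at write time. This is not MHVLib's actual `MHV_PIN` definition: the `Pin` struct, `pin_on`, `pin_off` and the `fake_port` variable are all hypothetical, and `fake_port` merely stands in for a real register such as `PORTB` so the code runs on a desktop.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped output register (assumption: on an AVR
// you would point at the real register, e.g. &PORTB).
volatile uint8_t fake_port = 0;

// Hypothetical pin descriptor: the register address and bit mask are
// resolved once, when the struct is filled in, not on every write.
struct Pin {
    volatile uint8_t *port;  // output register this pin belongs to
    uint8_t mask;            // single bit selecting the pin
};

// Drive the pin high: one load, one OR, one store.
inline void pin_on(const Pin *pin) {
    assert(pin != nullptr);
    *pin->port = static_cast<uint8_t>(*pin->port | pin->mask);
}

// Drive the pin low: one load, one AND, one store.
inline void pin_off(const Pin *pin) {
    assert(pin != nullptr);
    *pin->port = static_cast<uint8_t>(*pin->port & ~pin->mask);
}
```

The trade-off in this sketch is a few bytes of RAM per pin in exchange for skipping the flash lookup on every call.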

  5. You can do a lot with the built in functions if you abuse them a bit, like twiddling the ADC manually so you can do stuff while it converts, while still using the other functions as usual.

  6. “At the expense of bloat and inefficiency”: English is not my first language, but that made me read it twice. Isn’t it at the expense of efficiency and whatever the opposite of bloat is (leanness?)

  7. FYI, we don’t depend on the bootloader for MHVlib, you can develop quite happily by uploading via your favorite AVR programmer.

    Next upcoming feature is USB Keyboard emulation via the V-USB library. I’m testing this on the Sparkfun AVR Stick, which does not have a bootloader :)

    1. The normal Arduino code doesn’t depend on the bootloader either. I run regular Arduino sketches all the time without the bootloader. Unless I’m making a board I anticipate will be used by others, I typically can’t be bothered to route lines to the serial port along with the cap and whatnot for the reset line. I just compile and drop in the hex file from the Arduino IDE’s temp directory via AVR Studio.

      I’m not sure why the article states that.

  8. I dunno. I’d be happier if someone had taken the Arduino APIs and put in all the speedups that various people have made over the last few years (and that have been mostly-rejected by the Arduino team.) I’m not really happy with the current trend that implies that “ease of use” requires a C++ class for each piece of functionality :-(

    Doesn’t Atmel have their own “easy to use library” of functions for AVR (AVR Software Framework, or something?) Though I’m not sure I understand why “an experienced developer” wanting higher efficiency needs a library to do digital IO…

    Although Arduino has been shipping with optiboot as the bootloader since the Uno first came out.

    1. The Arduino libraries also provide hardware access in a chip independent way, avr-libc doesn’t. It comes at a speed cost. I’d rather use avr-libc with register access myself.

  9. Why can’t someone take the time and write a preprocessor for the regular Arduino API.
    It could take the easy to use C++ classes and convert them into C code (since there isn’t any dynamic creation of class instances – which would be nuts on an avr anyway – it should be doable). In addition to that, statements like digitalWrite(5, LOW) could be converted to PORTD &= ~(1<<5) (I have honestly no idea which pin the digital arduino pin 5 is actually mapped to, this is just an example).
    If you try to create a new API IMHO, this is destined to fail, because people aren’t going to rewrite their code only for a bit faster execution time (at least most of them won’t).
    If they only have to run a preprocessor and afterwards avr-gcc over their code, that would be way easier.

    1. I did exactly that! Well, exactly as in functionally the same. You don’t need to write your own preprocessor, because gcc’s C preprocessor together with some of gcc’s awesome builtin functionality is perfectly capable of doing the work.

      All you need is macros or inline functions that turn digitalWrite(pin, LOW) and digitalWrite(pin, HIGH) into single instructions, as long as “pin” is a const to the compiler.

      It’s been sitting unused in issue #140 for almost 2 years.

      http://code.google.com/p/arduino/issues/detail?id=140

      Admittedly, this is a bit trickier on Arduino Mega’s pins which aren’t mapped to I/O registers in the bit addressable space. For those specific pins, a 3 instruction sequence is generated, and it needs to be protected from interrupts (see issue 146)

      Alvaro Lopes did some additional work to deal with those specific pins on Mega, which has also sat unused. At one point it was committed, but then it was reverted.

      These sorts of optimizations have been written. They do work. I’ve published this stuff in Teensyduino for 2 years, and many people have used it very successfully on Teensy boards… usually blissfully unaware that their I/O is happening far faster than it would on an Arduino.
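The idea of resolving a constant pin at compile time can be sketched as below. This is not the code from issue #140 or from Teensyduino: it swaps in a C++ template parameter to force the pin to be a compile-time constant, and every name here (`fast_digital_write`, `slow_digital_write`, `sim_portb`, `sim_portd`) is hypothetical. The pin mapping assumes an Uno-style layout (digital 0 to 7 on port D, 8 to 13 on port B), and the registers are simulated variables so the sketch runs on a desktop.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the PORTB and PORTD output registers (assumption).
volatile uint8_t sim_portb = 0;
volatile uint8_t sim_portd = 0;

// Assumed Uno-style mapping: pins 0-7 -> port D, pins 8-13 -> port B.
constexpr bool on_port_d(uint8_t pin) { return pin < 8; }
constexpr uint8_t pin_mask(uint8_t pin) {
    return static_cast<uint8_t>(1u << (pin < 8 ? pin : pin - 8));
}

// Constant-pin version: port and mask are known at compile time, so an
// optimizing compiler can reduce the body to a single bit set or clear
// on a real register in the bit-addressable I/O space.
template <uint8_t pin>
inline void fast_digital_write(bool high) {
    static_assert(pin < 14, "pin out of range for this mapping");
    volatile uint8_t &port = on_port_d(pin) ? sim_portd : sim_portb;
    constexpr uint8_t mask = pin_mask(pin);
    port = high ? static_cast<uint8_t>(port | mask)
                : static_cast<uint8_t>(port & ~mask);
}

// Variable-pin fallback: the same logic, but port and mask must be
// worked out at run time on every call.
inline void slow_digital_write(uint8_t pin, bool high) {
    assert(pin < 14);
    volatile uint8_t &port = on_port_d(pin) ? sim_portd : sim_portb;
    const uint8_t mask = pin_mask(pin);
    port = high ? static_cast<uint8_t>(port | mask)
                : static_cast<uint8_t>(port & ~mask);
}
```

Both functions produce the same register contents; the difference is only in how much of the work the compiler can do ahead of time. On pins outside the bit-addressable range, the read-modify-write would additionally need protection from interrupts, as the comment above notes.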

      1. Sounds great, but does it also convert classes?? C++ is not really meant to be used on an 8bit µC. I bet you could at least save some memory converting the class structures to standard C.
        For a better performance it might also be useful to make some functions inline functions / macros.

      2. I have spent many long hours analyzing disassembly of the generated code. The C++ classes are about the same as using C where a pointer to a struct is passed as the first arg. There is very little, perhaps nothing at all, to be gained by automated translation from C++ to C.

        Huge gains are possible by carefully optimizing code itself. The Arduino code has lots of code that’s designed with many goals other than performance. Sometimes macros and inlining can help. Every situation is different. Usually the biggest gains involve substantial redesign of the code.

        The serial ports (eg, HardwareSerial class), as an example, are horribly inefficient because the class stores lots of pointers and variables to allow it to work with all the various ports. If it were redesigned for a separate class per port, most of that pointer dereferencing could turn into compile-time constants. This idea is in the issue tracker somewhere. I don’t recall anyone submitting an actual patch, but if they did, it would likely go unused.

        As a serial port benchmark, Teensyduino supports only a single serial port, since there’s just one UART in the chips Teensyduino supports. The code I provide compiles to a very efficient implementation. I can say that with confidence because I have done the work to carefully analyze the generated code and tweak the C++ side to get a nearly optimal generated assembly output.

        Every optimization is different. But I can tell you, from MANY long hours of carefully analyzing disassembly and crafting many highly effective optimizations to Arduino, that C++ itself is not a substantial problem. The compiler actually is pretty good, given the register and calling conventions and language semantics it must obey. The two ways to make large gains are better code design, or substantial use of hand-crafted assembly (enough to get around the calling conventions and semantics that restrict the compiler).
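The "separate class per port" idea mentioned above can be sketched roughly as follows. This is neither Arduino's `HardwareSerial` nor Teensyduino's implementation: `GenericSerial`, `FixedSerial`, `sim_udr0` and `sim_udr1` are hypothetical names, the data registers are simulated variables, and a real driver would also wait for the transmit buffer to be free before writing.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for two UART data registers (assumption).
volatile uint8_t sim_udr0 = 0;
volatile uint8_t sim_udr1 = 0;

// One class for every port: the register address is stored in the
// object and dereferenced at run time on each write.
class GenericSerial {
public:
    explicit GenericSerial(volatile uint8_t *data_reg) : data_reg_(data_reg) {
        assert(data_reg != nullptr);
    }
    void write(uint8_t byte) { *data_reg_ = byte; }

private:
    volatile uint8_t *data_reg_;
};

// One type per port: the register is a template argument, so its
// address is a compile-time constant and the object stores nothing.
template <volatile uint8_t &DataReg>
struct FixedSerial {
    static void write(uint8_t byte) { DataReg = byte; }
};

using Serial0 = FixedSerial<sim_udr0>;
using Serial1 = FixedSerial<sim_udr1>;
```

In this sketch `Serial0::write(b)` can compile down to a direct store to a fixed address, while `GenericSerial::write` first has to load the pointer from the object.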

      3. There is a “fastDigitalWrite” function out there, which is implemented as a huge macro. It will optimize the static write cases to a single instruction.

        But avr-gcc isn’t the best compiler out there for avr. I managed to shave off 20-30% code by converting to asm a few times already.

        C++ and the Arduino libraries don’t make it better. There is a reason the Arduinos always use the chips with as much flash as possible.

  10. Paul,

    I love the work you’ve done on the Teensy development, and it looks like you know what you’re talking about.

    If you have feedback or other improvements for MHVLib, please ping me, I’d love to hear what you have to say. I’d be interested in adding support for your Teensy boards too, so if you feel like taking a few minutes to cobble together a new MHV_io_AT*.h, I’m receptive :)

    I’m also thinking of writing a MHVLib/VUSB library capable of talking to your HID Listen program to enable debugging on the MHVBoard without the serial port. Have you considered adding the ability to send data back to the board, to turn it into a generic console app?
