Fixing Arduino’s Serial Latency Issues


[Paul] wrote in to tell us about some interesting serial latency issues he helped nail down and fix on the Arduino.

It seems that [Michu] was having some problems with controlling his Rainbowduino project we featured earlier this year, and he couldn’t quite figure out why he was experiencing such huge delays when sending and receiving data.

Searching online for answers turned up very little, and since [Michu] was using Processing, the pair designed a set of tests to see what kind of latency was being introduced by Java. Pitting an Arduino Uno and an Arduino from 2009 against a Teensy 2.0, the tests gauged the latency of native data transfers versus transfers facilitated by Java via the rxtx library it uses for serial communications.
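
A round-trip test of this kind can be sketched in a few lines. The following is a minimal illustration, not the pair’s actual benchmark: `measure_round_trip_ms` and `fake_echo_device` are hypothetical names, and the fake device simply sleeps and echoes so the sketch runs without hardware. In a real test, `exchange` would write the payload to the serial port and block until the same number of bytes came back.

```python
import time


def measure_round_trip_ms(exchange, payload, trials=100):
    """Time `exchange(payload)` repeatedly; return (min, mean, max) in ms.

    `exchange` is any callable that sends the payload and returns the reply.
    """
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        reply = exchange(payload)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if len(reply) != len(payload):
            raise RuntimeError("reply length does not match payload length")
        samples.append(elapsed_ms)
    return min(samples), sum(samples) / len(samples), max(samples)


def fake_echo_device(delay_s):
    """Hypothetical stand-in for a board: waits delay_s, then echoes the data."""
    def exchange(payload):
        time.sleep(delay_s)
        return bytes(payload)
    return exchange


if __name__ == "__main__":
    lo, mean, hi = measure_round_trip_ms(fake_echo_device(0.002), b"x" * 16, trials=20)
    print(f"min {lo:.2f} ms, mean {mean:.2f} ms, max {hi:.2f} ms")
```

Reporting the minimum alongside the mean is useful here, because a fixed delay in the software path shows up as a floor that no sample ever drops below.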

The results were pretty stunning. While both of the Arduinos lagged behind the Teensy by a long shot, their latency values under Java were always 20ms at a minimum – something didn’t add up. [Michu] poked around in the rxtx code and found a mystery 20ms delay programmed into the serial library. It made no sense to him, so he changed the delay to 2ms and saw a drastic increase in performance when transferring less than 128 bytes of data.

The pair’s fix doesn’t seem to affect latency when larger amounts of data (>1kB) are being transferred, but it makes a world of difference when manipulating smaller chunks of data.
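
Why a fixed sleep hurts small transfers so much can be seen with a simplified model. The sketch below assumes the reader checks for data, sleeps for a fixed interval when nothing is there, and then checks again. That is an assumption about how such a polling loop behaves, not a copy of the rxtx code, and `observed_latency_us` is a hypothetical helper. The reply times are illustrative values, not measurements.

```python
def observed_latency_us(device_ready_us, poll_sleep_us):
    """Time until a sleep-and-poll reader first sees a reply ready at device_ready_us."""
    elapsed_us = 0
    # The reader only looks again after each full sleep, so the reply is
    # noticed at the first multiple of poll_sleep_us at or after its arrival.
    while elapsed_us < device_ready_us:
        elapsed_us += poll_sleep_us
    return elapsed_us


if __name__ == "__main__":
    for ready_us in (1_000, 4_000, 90_000):
        slow = observed_latency_us(ready_us, 20_000)  # 20 ms sleep
        fast = observed_latency_us(ready_us, 2_000)   # 2 ms sleep
        print(f"reply ready after {ready_us} us: "
              f"20 ms sleep -> {slow} us, 2 ms sleep -> {fast} us")
```

In this model a reply that is ready after 1 ms is still not seen until 20 ms have passed, while a transfer that takes about 90 ms is only rounded up to 100 ms, so the fixed sleep matters far less as transfers get longer.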

For the sake of disclosure, it should be noted that [Paul’s] company produces the Teensy mcu.

42 thoughts on “Fixing Arduino’s Serial Latency Issues”

  1. This could not have come at a better time. I have been fighting a serial issue that I could not track down for a few days this week. I never thought to look into the library, since it is such a basic function I assumed it should be solved.

    I have mixed feelings about the “easiness” of the arduino or similar platforms. It seems like I get to a prototype faster, but the “more difficult” C or other development systems prove much easier to augment or debug and the development time ends up being a wash.

  2. So, forgive my ignorance, but I believe I am experiencing some latency issues with my MaxSonar Arduino range finder setup. Could this be to blame? Or is this topic limited to Arduinos with a computer interface?

  3. Very often serial receive routines work with a buffer, e.g. 128 bytes as above.
    If it’s full you will get a data-received interrupt, and nobody thinks about this.
    If you receive less than the buffer size there will be a timeout, like the 20 ms above.

    So if you are looking for only a few bytes, reduce the buffer size or see if it’s possible to check for every RX char.

    It’s the same problem when you use a USB to RS232 adaptor on a PC … the USB typically transmits when its buffer is full, like 2k of data, or when a timeout occurs … that’s why it stinks if you want to e.g. control a CNC or something timing-dependent.

    Wiljan

  4. I would bet the delay was put there to solve a bug that may have cropped up on a different platform. Java may be cross platform, but the implementation varies significantly from one platform to the next, in my experience.

    For real work, I wouldn’t use the IDE anyway. I’ve crashed the serial console in the IDE dozens of times. On windows, Putty works reliably and does so for literally months at a time (only gets shutdown for patch install/reboot cycles).

  5. Hi, Paul here… who wrote the native benchmark… Michu wrote the Java stuff and did most of the legwork on this project, so he really deserves most of the credit for all these results.

    Anti-Java rhetoric is really misplaced. It’s terrible native C coding in the widely used RXTX library at fault. This has been present for years, and this lib is very widely used, yet until now nobody seems to have actually tracked down the cause of the bad latency.

    The native tests do show a number of interesting results, probably the most shocking would be that Uno’s latency is actually a step backwards from Duemilanove.

    Well, of course Teensy is much faster, but that would be expected since it’s native USB emulating a USB-serial converter, not slow serial going through one. For anyone who thinks everything from the Arduino IDE is slow, the code running on Teensy in those tests is built by the Arduino IDE.

  6. These kind of delays are common in communications, and serve a useful purpose.

    Look up Wikipedia’s article on “Nagle’s Algorithm”, which describes a similar delay used in TCP/IP communications.

    Once you’ve read and understood that, realize that serial communications over USB is also packetized; and then the purpose of the delay will no longer be a mystery.

  7. So this really has absolutely nothing to do with the arduino, and everything to do with finding a bug in the rxtx library which is used by many java apps, correct?

    I’ve used rxtx in some custom java apps to talk to microcontrollers, but didn’t need the speed, so didn’t notice any latency.

  8. @andrew – that usleep(20000) code is inside the RXTX library, which is what Java (either Processing or Arduino IDE, or any Java program) uses when talking to serial devices.

    Any Java program (this testing was on Mac OS-X) using serial ports will experience that 20 ms delay, even when hardware like Teensy can respond in only 1 ms, or Arduino in 2 ms to 4 ms.

  9. @goldscott – yes, the biggest news here is a bug in RXTX. The measurements also show (absent the RXTX bug) considerable differences in response times between Arduino Duemilanove, Arduino Uno, and Teensy. Of course Teensy is much faster, because it’s native USB.

    Perhaps surprising though is how Uno is actually slower than Duemilanove for small data sizes. One of the improvements in Uno is an 8u2 microcontroller which can allow for better latency. This was specifically mentioned as an advantage when Uno was released. But as it is currently implemented, it’s actually worse. Not as shocking news as a “fixme” 20ms delay lurking for many years inside a very widely used library, but perhaps still newsworthy?

    When you actually benchmark these things carefully, these surprising results of less-than-optimal implementations turn up.

  10. Well there are lots of reasons one might put something like that in, and 20ms lies well below the usual 40ms which marks about where you’ll notice something is “laggy”. My assumption is that this was placed in to prevent over-polling on the device. A lot of USB devices will try to avoid interrupting the processor on every single byte received and instead wait until they filled (or partially/half filled) a buffer.

    I know that the FTDI driver in linux actually also features a 20ms time-out (I’ve forgotten the exact amount of time) but we had to change some kernel parameters when building from source to make the delay more negligible on our submarine.

  11. @Paul

    Thanks Paul. The serial latencies are interesting. Are all the Atmel parts running at the same frequency? Same baud rate? How are serial comms handled, interrupts or polling?

    I’m asking because I don’t have any experience with the arduino “engine.”

    Also, is the rxtx software still being actively maintained?

  12. I ran into this a little, when setting up my Maker Faire project. I have about 460 watts of LEDs in a ceiling, configured as 128 24bit RGB pixels all controlled with realtime music visuals in Processing (thanks for the earlier post Hackaday, I’ve been getting plenty of good Processing effects). Processing serializes the pixel color data and sends it to an Arduino Uno, which manages some gamma adjustment and the actual shift register output. I had naively started by sending out individual bytes, but quickly found it to be too slow. That was the 20ms being added to each byte, apparently. Serializing everything into a char array, then using Processing’s ability to send an array out the serial port as a unit, solved the issue. I guess there’s still a 20ms delay at the beginning of the transmission, but it’s not preventing me from getting smooth performance at 25fps.

  13. Packaging “asynchronous” data bytes into bigger chunks (USB messages, DMA buffers, user buffers, etc) for more efficient handling (balancing throughput vs latency, and trying to keep resource contention down) turns out to be a really difficult problem. I have what I consider an ideal implementation in mind, but I haven’t seen a UART vendor implement it :-(

    That there is a delay in rxtx for USB-originated traffic is a bit shocking. I would have thought that the “problem” was one for the USB/Serial chip to solve.

  14. To remove latency you need to do ‘harm’ somewhere else, like in too many interrupts that need to be handled (and require more power on low-power devices), it’s always a weighing what you need and can get away with.

  15. The quoted code snippet is not enough to clarify the purpose of that delay (and I’m really not going to look up the actual library source right now) – but I get the feeling that it either should be there as it is (in which case you’re just making nasty unpleasantness more likely to happen by shortening it), or (more likely) it shouldn’t be there at all and it just masks some other shoddy programming?

    That being said, using a timeout to report received serial data IS the standard practice, as far as I know. The buffer trigger levels (and the exact timeout value) should be configurable (including the “one byte” threshold, when every single byte gets immediately reported = zero latency) in any decent serial library; but other than that, unless you really want an interrupt for every single byte – not really efficient for massive transfers – how else do you expect to handle a non-full buffer?

    My point: the lib should be configurable, and the user should know to optimize for the kind of traffic he expects to conduct – sparse or bulk. There’s a compromise to be made either way. And yes, hardware-to-hardware is a different beast – it doesn’t need to navigate endless message queues to deliver byte by byte, like a PC does if you set it up with zero latency.

  16. The “ideal uart” I had in mind will terminate a buffer whenever it is full, whenever there was an appropriate termination character, whenever there was a timeout, AND WHENEVER the application layer is prepared to read it. It is this last part that is missing; apps will ask hasSerialDataP(…) and the API will say NO, because it’s all being buffered below the API layer. But it shouldn’t. If there are partially complete buffers available, they should be passed immediately upward through the API; after all, if the higher level code is asking, it is presumably able to do something with any data that exists. Having it go off for a scheduler delay and try again is pretty inefficient if it could have done something already. This automagically throttles the process to the load of the running system. If the system is lightly loaded, it might “pull” data from buffers quite frequently. If the system is more heavily loaded, it might only need to see data that is buffered in the normal places.

    SOMEONE should look at the RxTx library in more detail. My first thought is that this delay was intended for some specific serial hardware (probably no longer in existence) and might not apply to USB/Serial devices.

  17. So what are the code changes that are actually needed? there are 3x usleep(20000); in the code snippet, they say they’re getting 20ms delay, so which one of them needs changing?

  18. I still don’t get how to fix the Serial library, sorry, I’m a bit stupid at “simple” things… It’s just that I’m learning so many things at once that my brain is nearly exploding!! ;-)
