Debugging With Serial Print At 5333333 Baud

Debugging with printf is something [StorePeter] has always found super handy, and as a result he’s always been interested in tweaking the process for improvements. This kind of debugging usually has microcontrollers sending messages over a serial port, but in embedded development there isn’t always a hardware UART, or it might already be in use. His preferred method of avoiding those problems is to use a USB to Serial adapter and bit-bang the serial on the microcontroller side. It was during this process that it occurred to [StorePeter] that there was a lot of streamlining he could be doing, and thanks to serial terminal programs that support arbitrary baud rates, he’s reliably sending debug messages over serial at 5.3 Mbit/sec, or 5333333 Baud. His code is available for download from his site, and works perfectly in the Arduino IDE.

The whole thing consists of some simple, easily ported code that implements bare-minimum bit-banged serial communication. It is output only, with no feedback, and timing consists of just sending bits as quickly as the CPU can handle, leaving it up to the USB Serial adapter and the rest of the world to handle whatever that speed turns out to be. On a 16 MHz AVR, transmitting one bit can be done in three instructions (three clock cycles), which comes out to 16 MHz / 3, or about 5333333 baud (roughly 5.3 Mbit/sec). Set a terminal program to 5333333 baud, and you can get a “Hello world” in about 20 microseconds, compared to about 1 millisecond at 115200 baud.
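To make the framing and the baud arithmetic concrete, here is a minimal host-side sketch. It is not [StorePeter]'s code: the names `frame_8n1` and `bitbang_baud` are hypothetical, and the 8N1 framing (one start bit, eight data bits sent least significant first, one stop bit) is an assumption based on what a typical USB to Serial adapter expects. On a real AVR, each level would be written to a port pin instead of an array.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not the original code. */
#define FRAME_BITS 10 /* assumed 8N1: 1 start + 8 data + 1 stop */

/* Expand one byte into the ten line levels of an 8N1 frame. */
void frame_8n1(uint8_t byte, uint8_t levels[FRAME_BITS])
{
    levels[0] = 0;                        /* start bit: line goes low */
    for (int i = 0; i < 8; i++)
        levels[1 + i] = (byte >> i) & 1;  /* data bits, LSB first */
    levels[9] = 1;                        /* stop bit: line returns to idle high */
}

/* Bit rate when every bit costs a fixed number of CPU cycles. */
uint32_t bitbang_baud(uint32_t f_cpu_hz, uint32_t cycles_per_bit)
{
    return f_cpu_hz / cycles_per_bit;
}
```

Calling `bitbang_baud(16000000, 3)` gives 5333333, which is where the odd-looking baud rate comes from: it is simply whatever rate falls out of the CPU clock and the cost per bit.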

He’s got additional tips on using serial print debugging as a process, and he’s done a followup where he stress-tested the reliability of a 5.3 Mbit/sec serial stream from an ATmega2560 at 16 MHz in his 3D printer, and found no missed packets. That certainly covers using printf as a debugger, so how about a method of using the debugger as printf?

28 thoughts on “Debugging With Serial Print At 5333333 Baud”

    1. Suppose you want a trace of the parameters or functions executed, and in parallel there is a _hard_ real-time requirement for the function.

      Then if you slow it down, the real-time requirement may not be met and, by the definition of hard real-time, a catastrophic thing will happen.

      In all other cases packet loss or a slowdown may be acceptable, even at low speeds. Obviously one can also use better compression: printf-style tracing is not efficient, because the fmt part could already be known to the receiver and so could be replaced with a reference and compressed.
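The idea of replacing the format string with a reference can be sketched as follows. This is only an illustration under assumptions: the wire format (a one-byte format ID followed by one little-endian 32-bit argument), the table `trace_formats`, and the function names are all hypothetical and do not come from the article or the commenter.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Target side (hypothetical wire format): 1-byte format ID followed by
   one 32-bit argument, little-endian. Returns the bytes written. */
size_t trace_encode(uint8_t fmt_id, uint32_t arg, uint8_t out[5])
{
    out[0] = fmt_id;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(arg >> (8 * i));
    return 5;
}

/* Receiver side: the format strings live here, not on the target.
   The table is assumed to be shared with the firmware at build time. */
static const char *const trace_formats[] = {
    "adc=%lu\n",
    "state=%lu\n",
};

/* Turn one 5-byte record back into text.
   Returns the text length, or -1 for an unknown format ID. */
int trace_decode(const uint8_t in[5], char *text, size_t text_len)
{
    size_t count = sizeof trace_formats / sizeof trace_formats[0];
    if (in[0] >= count)
        return -1;

    uint32_t arg = 0;
    for (int i = 0; i < 4; i++)
        arg |= (uint32_t)in[1 + i] << (8 * i);

    return snprintf(text, text_len, trace_formats[in[0]], (unsigned long)arg);
}
```

With this layout every record costs five bytes on the wire regardless of how long the format string is, and the target never runs any string formatting at all.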

      1. Yup. I have built some pretty elaborate systems for debugging real-time systems, including cranking the baud rate up, using something like the FT245 or FT240X for parallel output, and writing macros to dump specific data types in conjunction with comments in the code that are extracted by a trace script and used to translate the byte stream into the formats given in the extracted comments. I haven’t had an opportunity yet to properly extend these tricks to work with Cortex SWD, but they should work even better given a chip with an ETB. The tricky part I haven’t looked into yet is how to extract the SWD stream from whichever manuf’s IDE I’m stuck using…

        1. Don’t bother with all the elaborate systems. Just waggle a PIO instead, and then use that (or several waggles) to narrow down the bug location. This takes close-to-zero time (thus not impacting the RT). Then and only then, you can use printf (or breakpoint/debugger) at leisure.

          1. I do a ridiculous amount of that too. Both subsystems together (especially with a scope decoding serial in sync with the pin traces). The complexity of that particular project makes one or the other on their own insufficient.

      2. printf debugging is a compromise: it’s easy and only needs a USB/serial adapter to monitor, but it takes some time to compute the string and send the output.
        Using dedicated software to decode an error number and display it is way faster, but is so boring to write.
        I also tend to use a “custom” light printf (no float, obviously), sent to the serial port as fast as possible (1 Mbaud minimum). Then I can keep all the debug output without impacting real-time requirements (like hardware and device read/write).

        1. I know, it has all been done before. I made a version of what you are showing in 1981. It looked almost exactly like yours, but in i8085 assembler.

          The point I was trying to make was: simplify the embedded side as much as possible and have the USB-UART handle the complexity of speed and buffering.

          I believed this was an original idea. I am aware that people wrote assembler UART routines before 1981, so I am not claiming fame for that.

          I really do not want to steal your fame or glory; I was just hoping this could be an inspiration to someone else. AND “I was impressed/surprised by my findings”.

          Even if I had read your two postings, I do not see the point; on the contrary, you might have copied my unpublished work from 1981 ;-) (sorry, couldn’t help it ;=)

      1. It is not the high baud rate that makes this great, it is the low overhead. If you save/restore the necessary registers, you could use _dprintc() from within an interrupt routine. When did you last get away with printing from an interrupt routine? Believe me, I have seen many try ;-(

        1. To print from an interrupt routine, I usually use delayed-processing printing. The print function itself just saves the pointer to the (const char) string, plus 2x 32-bit arguments. Do the printf processing later in main. Works a treat, and does not delay the ISR even if your UART is 9600 baud (but you may run out of buffer space).
          Gets easier if you have an RTOS under you, too, as the printing can be done by a low-priority thread.
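The delayed-processing approach described above might look roughly like this. It is a sketch under assumptions, not the commenter's code: the names `log_defer` and `log_flush_one`, the eight-slot ring, and the convention that every format string consumes its arguments with `%lu` are all hypothetical. It also assumes a single producer (the ISR) and a single consumer (the main loop).

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_SLOTS 8 /* hypothetical ring size; one slot always stays empty */

struct log_entry {
    const char *fmt; /* pointer to a constant format string */
    uint32_t a, b;   /* two raw 32-bit arguments */
};

static struct log_entry log_ring[LOG_SLOTS];
static volatile uint8_t log_head; /* written only by the producer (ISR) */
static volatile uint8_t log_tail; /* written only by the consumer (main) */

/* ISR side: store the pointer and arguments, do no formatting.
   Returns 1 on success, 0 if the ring is full and the entry is dropped. */
int log_defer(const char *fmt, uint32_t a, uint32_t b)
{
    uint8_t next = (uint8_t)((log_head + 1) % LOG_SLOTS);
    if (next == log_tail)
        return 0;
    log_ring[log_head].fmt = fmt;
    log_ring[log_head].a = a;
    log_ring[log_head].b = b;
    log_head = next; /* publish only after the slot is filled */
    return 1;
}

/* Main-loop side: format one pending entry into out.
   Returns the text length, or 0 if nothing is pending. */
int log_flush_one(char *out, size_t out_len)
{
    if (log_tail == log_head)
        return 0;
    struct log_entry e = log_ring[log_tail];
    log_tail = (uint8_t)((log_tail + 1) % LOG_SLOTS);
    return snprintf(out, out_len, e.fmt, (unsigned long)e.a, (unsigned long)e.b);
}
```

The dropped-entry return value reflects the "you may run out of buffer space" caveat: when the ring is full the ISR gives up immediately rather than waiting, so the cost inside the interrupt stays bounded.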

  1. Uhm, the ATmega2560 can run its four UARTs at up to fosc/2 without any complicated generated bit banging code. So you could fart out 32 Mbaud @ 16 MHz if you like, assuming you have software to reassemble the four parallel lines into a single one. Certainly at least 8 Mbaud.

  2. You can also sometimes use an SPI port to generate UART-compatible data, and SPI ports tend to run faster than UART peripherals. The FTDI high-speed (FTx232H) converters can do 4, 6 and 12 Mbaud (possibly also 8, can’t remember the divider constraints offhand).
    As mentioned above, for debug output it’s not about the total throughput, but being able to chuck debug data out with minimal impact on the timing of the code.
    Of course serial decode on a scope is also very useful for looking at debug data in real time, and its relationship to other events – I did a video on this a while ago : https://www.youtube.com/watch?v=EdfHzpEKtZQ

  3. Of course an important question is: what is the actual throughput? It’s nice to be able to generate characters fast but if there’s a lot of time between characters, it might not help you that much.

    As it happens, I just created a serial output module for the Propeller (http://obex.parallax.com/object/870) which can work at up to 8 Mbps, but the actual throughput for a nul-terminated string is “only” about 500,000 characters per second. And printing a signed decimal number takes about 224 microseconds, worst-case, including binary to BCD conversion.

    ===Jac

  4. Hi to all. Nice code from StorePeter. I think the most important things are: 1) no overhead 2) no UART/SPI etc. involved 3) Tx of a char is done when the code is done 4) leave it in your code forever, not just while debugging; then it has “no” impact on timing because it’s part of the application 5) if possible don’t convert and send floats etc., just single chars. You still have 256 to choose among.

  5. Oh, good old ’90s-style debugging.
    Now we use tools like Segger RTT, ETM and so on.
    And please don’t use strings if you need real-time debug. Any call to printf kills it anyway, and strings eat a lot of space for little to no information; use binary compressed traces.

  6. I agree with mac. Segger RTT is light-years ahead of this.
    Why? The overhead to do the serial out will kill a hard real-time system. 20 us? I just missed my BLE transmit window, TYVM.
    1 or 2 us is more tolerable.

    RTT uses background memory access to read and write the data, so the tx has zero overhead. Format conversion still costs, though, so be careful.

  7. Unless I am mistaken, 1 data bit != 1 baud in serial communication. So while the difference isn’t visible in your “Hello world” times (a rounding error, if you will), it felt like you were implying a 5.3 Mbps data rate and ignoring stop bits and optional parity bits.
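The gap between baud rate and payload rate is easy to work out. The sketch below assumes 8N1 framing (10 bits on the wire per 8-bit character) and characters sent back to back with no idle time between them; neither assumption is stated in the article, and the function names are hypothetical.

```c
#include <stdint.h>

/* Characters per second for a given baud rate and frame length in bits
   (10 for the assumed 8N1 framing: start + 8 data + stop). */
uint32_t chars_per_second(uint32_t baud, uint32_t bits_per_frame)
{
    return baud / bits_per_frame;
}

/* Time in nanoseconds to send n_chars back to back, with no gaps. */
uint64_t send_time_ns(uint32_t baud, uint32_t bits_per_frame, uint32_t n_chars)
{
    return (uint64_t)n_chars * bits_per_frame * 1000000000ull / baud;
}
```

At 5333333 baud with 10-bit frames this gives 533333 characters per second, or about 4.27 Mbit/sec of actual payload rather than 5.3. The 11 characters of “Hello world” take 20625 ns at 5333333 baud and 954861 ns at 115200 baud, which matches the article's "about 20 microseconds" and "1 millisecond" because those figures already include the framing bits.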

  8. My ISRs (AVR) take less than a microsecond, so this would ruin my program. Interesting idea, but still not good enough for some programs integrating with existing hardware. As it is, I monitor and log data in real time using the LA on a scope to analyze later.
