Finding USB Bugs The Hard Way

Sometimes debugging just doesn’t go the way you want it to. When USB problems arise, you can usually use a protocol analyzer to find the issue causing trouble. For [Paul Stoffregen], it was only the first step in a long process to find the culprit.

Protocol Analyzer

The complaint came from a customer whose 2-port USB hub wasn't working with their Teensy 3.6. The hub had been tested on Linux, Mac, and Windows, and every other USB hub worked on the Teensy, so it made sense to look for what was different about the Teensy. These turned out not to be the most helpful assumptions for finding the bug.

Any protocol analyzer can be used, for instance the Beagle USB 480 from Total Phase. It sits inline on the USB connection, passing the traffic through while copying everything that flows in both directions and sending that copy to a PC.

Normally, the analyzer has a small buffer memory and must sustain a fast data flow. Unfortunately, this can occasionally cause a software lockup. The verbose debug printing showed that the hub's USB descriptors had been read, and they revealed that the faulty hub was a Multi-TT (multiple transaction translator) hub, while most others have a single TT.
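
The TT arrangement a hub offers is advertised in its standard 18-byte device descriptor. The sketch below is an illustration only, not code from the Teensy USB host library: it assumes the USB 2.0 hub class values for `bDeviceProtocol` (0 for a full-speed hub, 1 for a high-speed hub with a single TT, 2 for a high-speed hub with multiple TTs), and the helpers `classify_hub` and `make_device_descriptor` are hypothetical names with placeholder VID/PID values.

```python
import struct

HUB_CLASS = 0x09  # bDeviceClass value for hubs

# bDeviceProtocol values for hubs, as assumed from the USB 2.0 hub class definition
HUB_PROTOCOLS = {
    0x00: "full-speed hub (no transaction translator)",
    0x01: "high-speed hub, single TT",
    0x02: "high-speed hub, multiple TTs",
}


def classify_hub(device_descriptor: bytes) -> str:
    """Report the TT arrangement advertised in an 18-byte device descriptor."""
    if len(device_descriptor) < 18 or device_descriptor[1] != 0x01:
        raise ValueError("not a USB device descriptor")
    b_device_class = device_descriptor[4]
    b_device_protocol = device_descriptor[6]
    if b_device_class != HUB_CLASS:
        return "not a hub"
    return HUB_PROTOCOLS.get(
        b_device_protocol, f"hub with unknown protocol {b_device_protocol:#04x}"
    )


def make_device_descriptor(device_class: int, protocol: int) -> bytes:
    """Build a minimal device descriptor for testing (hypothetical helper)."""
    return struct.pack(
        "<BBHBBBBHHHBBBB",
        18, 0x01,          # bLength, bDescriptorType (DEVICE)
        0x0200,            # bcdUSB 2.00
        device_class, 0x00, protocol,  # class, subclass, protocol
        64,                # bMaxPacketSize0
        0x0000, 0x0000,    # idVendor, idProduct (placeholders)
        0x0100,            # bcdDevice
        0, 0, 0,           # iManufacturer, iProduct, iSerialNumber
        1,                 # bNumConfigurations
    )


if __name__ == "__main__":
    print(classify_hub(make_device_descriptor(0x09, 0x02)))
    print(classify_hub(make_device_descriptor(0x09, 0x01)))
```

On a real system the descriptor bytes would come from a GET_DESCRIPTOR request or an analyzer capture rather than from `make_device_descriptor`.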

Fixing Software Lockup

Since the rest of the descriptor data was still needed, fixing the software lockup came next. Adding a panic function to the code, a breakpoint of sorts that shut off power to the USB host port, and then stepping through the program revealed that the 2-port hub was read correctly at first, but something went wrong afterwards.

The issue involved USB split transactions, which are used only between a high-speed USB host and a hub, when the hub relays traffic to a full-speed or low-speed device. A split transaction is carried out with tokens and begins with a start-split (SSPLIT) token.
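
To make the SPLIT token more concrete, here is a minimal sketch that packs and unpacks the 4-byte token. It assumes the USB 2.0 field layout after the PID (Hub Addr 7 bits, SC 1 bit, Port 7 bits, S 1 bit, E 1 bit, ET 2 bits, then CRC5), numbered from the least significant bit because USB sends fields LSb first, and it assumes the ET encoding 0 = control, 1 = isochronous, 2 = bulk, 3 = interrupt. `SplitToken` and `crc5` are hypothetical helpers for illustration, not part of any Teensy or analyzer software.

```python
from dataclasses import dataclass

SPLIT_PID = 0x78  # PID 1000b in the low nibble, its complement in the high nibble

ENDPOINT_TYPES = {0: "control", 1: "isochronous", 2: "bulk", 3: "interrupt"}


def crc5(value: int, nbits: int) -> int:
    """USB token CRC5 (x^5 + x^2 + 1), feeding bits LSb first."""
    crc = 0x1F
    for i in range(nbits):
        bit = (value >> i) & 1
        if (crc ^ bit) & 1:
            crc = (crc >> 1) ^ 0x14  # 0x14 is the bit-reversed form of 0x05
        else:
            crc >>= 1
    return crc ^ 0x1F


@dataclass
class SplitToken:
    hub_addr: int   # 7 bits: address of the high-speed hub
    complete: bool  # SC bit: False = start-split, True = complete-split
    port: int       # 7 bits: hub port the full/low-speed device is on
    s: int          # speed bit for control/interrupt (1 = low speed)
    e: int          # end bit, only meaningful for isochronous OUT
    et: int         # 2-bit endpoint type, see ENDPOINT_TYPES

    def encode(self) -> bytes:
        fields = (
            (self.hub_addr & 0x7F)
            | (int(self.complete) << 7)
            | ((self.port & 0x7F) << 8)
            | ((self.s & 1) << 15)
            | ((self.e & 1) << 16)
            | ((self.et & 0x3) << 17)
        )
        word = fields | (crc5(fields, 19) << 19)  # CRC5 covers the 19 field bits
        return bytes([SPLIT_PID]) + word.to_bytes(3, "little")

    @staticmethod
    def decode(packet: bytes) -> "SplitToken":
        if len(packet) != 4 or packet[0] != SPLIT_PID:
            raise ValueError("not a SPLIT token")
        word = int.from_bytes(packet[1:], "little")
        fields = word & 0x7FFFF
        if crc5(fields, 19) != word >> 19:
            raise ValueError("CRC5 mismatch")
        return SplitToken(
            hub_addr=fields & 0x7F,
            complete=bool((fields >> 7) & 1),
            port=(fields >> 8) & 0x7F,
            s=(fields >> 15) & 1,
            e=(fields >> 16) & 1,
            et=(fields >> 17) & 0x3,
        )


if __name__ == "__main__":
    # Start-split for a low-speed control transfer behind port 1 of hub address 2
    setup = SplitToken(hub_addr=2, complete=False, port=1, s=1, e=0, et=0)
    packet = setup.encode()
    print(packet.hex(), ENDPOINT_TYPES[SplitToken.decode(packet).et])
```

Feeding `decode` the four bytes of a captured SPLIT token would show the ET field directly, which is where a wrong endpoint type would become visible on the wire.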

As it turns out, the issue was that the tokens weren’t being sent in the correct order. The other hubs seemed to handle this nevertheless. After a fix to the Teensy's C++ USB host code, which had not been filling in the data structure used to access the controller's registers properly, the hub worked. The hub had been rejecting the bad token, which was what caused the trouble in the first place.
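
Not filling in a register-related data structure properly can be as small as a read-modify-write with the wrong mask, which silently clears neighboring configuration bits. The sketch below illustrates only that general class of bug. The 32-bit field layout (device address, endpoint number, speed, max packet size, and a "control endpoint" flag at bit 27) is hypothetical, loosely modeled on an EHCI queue head, and is not taken from the Teensy code.

```python
# Hypothetical 32-bit endpoint configuration word (illustration only).
# Each field is (bit offset, width in bits).
FIELDS = {
    "dev_addr":   (0, 7),
    "ep_num":     (8, 4),
    "ep_speed":   (12, 2),
    "max_packet": (16, 11),  # bits 16..26
    "control_ep": (27, 1),   # bit 27: "this is a control endpoint"
}


def get_field(word: int, name: str) -> int:
    """Extract one field from the configuration word."""
    shift, width = FIELDS[name]
    return (word >> shift) & ((1 << width) - 1)


def set_field(word: int, name: str, value: int) -> int:
    """Correct read-modify-write: only the named field's bits change."""
    shift, width = FIELDS[name]
    mask = ((1 << width) - 1) << shift
    return (word & ~mask & 0xFFFFFFFF) | ((value << shift) & mask)


def set_max_packet_buggy(word: int, value: int) -> int:
    """Deliberately wrong: the 12-bit mask also clears bit 27 (control_ep)."""
    wrong_mask = 0xFFF << 16  # one bit too wide
    return (word & ~wrong_mask & 0xFFFFFFFF) | ((value << 16) & wrong_mask)


def describe(word: int) -> str:
    return ", ".join(f"{name}={get_field(word, name)}" for name in FIELDS)


if __name__ == "__main__":
    # Endpoint 0 of a device at address 5, starting with an 8-byte max packet,
    # then updated to 64 bytes once the device descriptor has been read.
    word = 0
    word = set_field(word, "dev_addr", 5)
    word = set_field(word, "max_packet", 8)
    word = set_field(word, "control_ep", 1)
    print("before: ", describe(word))
    print("correct:", describe(set_field(word, "max_packet", 64)))
    print("buggy:  ", describe(set_max_packet_buggy(word, 64)))
```

With the buggy update, `max_packet` still reads back as 64, so checking the field that was just written passes while the control flag has quietly dropped to 0. That is why this kind of bug can stay hidden until some downstream device reacts to the consequence.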

All in all, while this must have been a head-scratching experience, at least it gives us some insight into the low-level design of USB communication.

16 thoughts on “Finding USB Bugs The Hard Way”

  1. >As it turns out, the issue was that the tokens weren’t being sent in the correct order.

    Did the article change after you posted this? Because this is not what's in the linked writeup.
    The issue was a bug in Kinetis K66 USB stack clobbering some config bits and hardcoding Start Split Endpoint Type into Bulk. Nothing to do with correct order of tokens being sent.

      1. This made me laugh! Because I read the article, and it felt like “A thingie wasn’t working. We looked at it and couldn’t figure it out. We looked at it again and something was wrong. It works now.”

        The actual write-up is much better. Seems like another case of the Robustness Principle allowing incorrect code to work long enough to let your guard down.

  2. I’d like to run an inline analyser trapping on the sequences that gain access to the USB (memory) stick CPU such as for those devices that claim lower flash size than the devices’ die, as some of the ‘spare’ space is exploited by fishers for personal data off your main system storage then uploaded when ‘virus scans’ operate. From what I read it’s getting far more common, eg I’ve come across an associate who has a very good copy of a 16GByte memstick which is actually 128GByte which oddly too has far more byte traffic than normal to save 16Gbyte, ugh !
    Thanks for post :-)

  3. The bug was software running on the Teensy? I guess a protocol analyzer is a fine debugging tool, but I would personally have done all my debugging within the teensy itself. Sounds like a pain to diagnose either way…I’ve never tried to get inside a USB protocol stack, and I’m glad of it.

    Anyways, I had a very different USB problem. I use USB to send G-code from my PC to my 3D printer (with a crappy atmega arduino), and if a fan turned on or off elsewhere in the house, sometimes that connection would die. It required a manual re-start, usually ruining the print and even posing a potential fire risk because the hotend stalls out, still heated, in the middle of a puddle of PLA. I monitor my prints!

    I switched out the cheap laptop-brick 12V supply on the printer, thinking it was letting noise through. Used a nice modern ATX power supply because it was sitting around. Same problem. Still shooting in the dark, I remembered that my workbench is set up with a bunch of power circuits, so every other wall outlet is on a different AC phase. I put the printer on the same power strip as the PC, and the problem went away. *shrug*

    1. >The bug was software running on the Teensy

      yes, usb stack running on Kinetis chip

      >I would personally have done all my debugging within the teensy itself

      you would never find this bug. Everything looked perfectly fine on the Teensy side. The bug was overwriting a couple of bits of hardware configuration very deep down. Those overwritten bits caused Teensy to send Bulk transfers instead of Control transfers in a _very specific configuration_ of Full speed USB 2.0 hub with Low speed keyboard at the end. You need a Control endpoint to enumerate a device; no Control endpoint means you can't even see that something is attached. The quirky nature of the bug stemmed from some (most?) USB 2.0 hubs being idiotproof and fixing buggy transfers on the fly (or more likely being less intelligent with hardcoded functionality overwriting bad bits). The one USB hub giving trouble and exposing this bug worked 100% to the spec and passed everything as is without making any assumptions.

      It was a matter of looking at the wire and noticing something synonymous to a web browser trying to connect to port 80 of a DNS server (instead of port 53) with bug being deep inside network card firmware, looking at browser or even network card driver source code would never find it.

      As far as I can think the only other way of finding and solving this without USB protocol analyzer would be using your own USB low speed device emulating the Keyboard – this way you could see both sides of the conversation, what you intended to send and what the device actually received.

      As for cheaper open source Beagle alternatives, the Travis Goodspeed-descended Facedancer design seems to be your best bet: https://www.devalias.net/devalias/2018/05/13/usb-reverse-engineering-down-the-rabbit-hole/

  4. I’m not sure why TI calls the EUR2000 Beagle480 a “low cost solution”.
    Is it not simply some Beaglebone (or other Linux SBC) with a std USB port in “promiscuous” mode. Maybe it uses the PRU’s, but I doubt it as those are too slow for 480Mb/s.
    I have not looked further into it, as tools like that are simply above my budget.

    I have successfully captured some low-speed (1.5 Mb/s) USB with a EUR 6 Cypress CY7-based logic analyser board and Sigrok / PulseView, and have looked briefly at PulseView's software decoders, which give an amazing amount of detail about the USB signalling (T-states, bit stuffing, checksums, keep-alive and other sorts of packets, and much more).

    Looking at USB gives a very “hands on” approach for anyone starting to study low level USB on a low budget.

    1. I think you are confusing the Beagle USB 480 (made by Total Phase) with the BeagleBone, which is an embedded development board / SBC.

      There is a significant difference in hardware here. The Beagle USB 480 uses what looks like an ASIC, and is a completely different product from a completely different manufacturer.

    2. Even though both use the name “Beagle”, the Beagle 480 protocol analyzer from Total Phase is not at all related to the BeagleBoard from Texas Instruments. Beagle 480 came first, in 2007. The first BeagleBoard was “introduced” in 2008 (when they actually were sold is harder to find). The first low-cost BeagleBone Black with PRUs came in 2013.

      Internally, the Beagle 480 uses a USB PHY connected to a FPGA and DRAM buffer chip. I believe a Cypress FX2 chip is used for the connectivity to the machine monitoring the USB traffic. There’s no processor similar to TI’s Sitara. The FPGA does all the heavy lifting.

      In 2010 a man using the name “Bushing” launched a Kickstarter campaign to create OpenVizsla, an open source USB analyzer with essentially the same hardware as Beagle 480. That was in the earliest days of Kickstarter. The campaign is reputed to be the oldest one that appeared to fail and then finally did deliver, years late. Apparently a business partner spent all the money. Bushing worked tirelessly for years to complete the project without funding and finally did manage to deliver working hardware with the help of several corporate sponsorships. The OpenVizsla circuit board is covered with sponsor logos, sort of like NASCAR. Very little work was ever done on the software side. Sadly, Bushing passed away not long after completing the project.

      Very recently, Kate Temkin has been working on the software side and developing a GreatFET shield called Rhododendron, which is meant to become a relatively low-cost 480 Mbit/sec USB analyzer. Kate’s design discards the large DRAM buffer, which saves cost. Rumor has it NewAE Tech may be working on a USB add-on for ChipWhisperer, which will work with Kate’s software.

      The future for open source and eventually lower cost USB analyzers is looking bright, mostly thanks to Kate picking up the software side all these years after OpenVizsla. Who knows, maybe someone will even someday figure out how to connect a USB PHY to the BeagleBone’s PRUs?
