What Could Go Wrong? I2C Edition

I should really like I2C more than I do. In principle, it’s a brilliant protocol, and in comparison to asynchronous serial and SPI, it’s very well defined and clearly standardized. On paper, up to 127 devices can be connected together using just two wires (and ground). There’s an allowance for multiple clock-masters on the same bus, and a way for slaves to signal the master to wait. It sounds perfect.

In reality, the tradeoff for using only two wires is a significantly complicated signalling and addressing system that brings both pitfalls and opportunities for debugging. Although I2C does reduce the number of signal wires you need, it gets dangerous when you have more than a handful of devices on the same pair of wires, and you’re lucky when they all conform to the same standard. I’ve never seen twenty devices on a bus, much less 127.

But still, I2C has its place. I2C was designed to connect up a bunch of slower, cheaper devices without using a lot of copper real estate compared to its closest rival protocol: SPI. If you need to connect a few cheap temperature sensors to a microcontroller (and their bus addresses don’t clash) I2C is a great choice. So here’s a guide to making it work when it’s not working.

Physical Layer

Image from Sparkfun's I2C Tutorial: https://learn.sparkfun.com/tutorials/i2c

Let’s start off by looking at the wires, because that’s where a surprising number of glitches and complications can creep in. Whereas SPI gets by with the minimal amount of protocol overhead but pays the price in a florescence of wires, I2C only requires you to lay down two tracks: one for the clock (SCL) and one for data (SDA). There is a price for this simplicity when interfacing systems that run at different voltages.

I2C devices can also be constructed with a single transistor per line, because the two lines are pulled up by an external resistor (or resistors). This sounds good, but can cause problems with high-speed signals and high-capacitance lines.

So let’s get down to details.

Pullup Resistor vs. Parasitic Capacitance

The problem with relying on pullup resistors is “parasitic capacitance”. While we normally think of a capacitor as being made of two large conductive plates with a dielectric (or air) between them, the same charge-storage capacity exists between two parallel wires as well. And this means that there’s some small capacitance between the I2C signal lines and the PCB’s ground plane, or any other adjacent signals for that matter.

This parasitic capacitance means that the voltage level on the signal line (data or clock) can’t change instantaneously, but some finite current needs to be pumped into the wire to charge up this accidental capacitor. With SPI and asynchronous serial, this is not much of a concern because the high and low voltage levels are both driven by transistors on board the chips in question. Parasitic capacitance matters a lot in I2C because the devices pull down the signal lines, but they’re pulled back up by resistors (or a constant-current source in higher-performance designs).

This means that, while a high-to-low transition can be next to instantaneous, a low-to-high transition will always take some time as the line charges back up. This means that the signal lines can’t be too long, to keep parasitic capacitance at bay. 400 picofarads is the maximum, according to the standard. The optimal choice of pullup resistor varies with line capacitance, desired speed, and the strength of the transistors in the various devices, but 4.7 kOhms is normal.
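To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the usual RC charging curve, the I2C specification’s convention of measuring rise time from 30% to 70% of the supply, its 1000 ns and 300 ns rise-time limits for 100 kHz and 400 kHz operation, and a device that can sink 3 mA while holding the line below 0.4 V. The function names are made up for illustration.

```python
import math

# An RC-charged line takes R * C * ln(0.7 / 0.3) seconds to climb
# from 30% to 70% of VCC, which is roughly 0.847 * R * C.
RISE_FACTOR = math.log(0.7 / 0.3)

def rise_time(r_ohms, c_farads):
    """Estimated 30%-to-70% rise time in seconds."""
    return RISE_FACTOR * r_ohms * c_farads

def max_pullup(t_rise_max, c_farads):
    """Largest pullup (ohms) that still meets a rise-time limit."""
    return t_rise_max / (RISE_FACTOR * c_farads)

def min_pullup(vcc, v_low_max=0.4, i_sink=3e-3):
    """Smallest pullup (ohms) a device sinking i_sink can still pull low."""
    return (vcc - v_low_max) / i_sink

if __name__ == "__main__":
    for c in (50e-12, 100e-12, 400e-12):
        print(f"4.7k with {c * 1e12:.0f} pF: {rise_time(4700, c) * 1e9:.0f} ns")
    print(f"max pullup for 300 ns at 100 pF: {max_pullup(300e-9, 100e-12):.0f} ohms")
    print(f"min pullup at 3.3 V: {min_pullup(3.3):.0f} ohms")
```

Under these assumptions, 4.7 kOhms on a 100 pF bus rises in about 400 ns, which is comfortable at 100 kHz but too slow for 400 kHz, and at the full 400 pF it takes about 1.6 microseconds. Measure your own bus before trusting any of this.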

Here are some up-close shots of an I2C conversation. (We’ll discuss the protocol below.) Note that the downward transitions are nearly instantaneous, but it takes some time to rise back up to VCC. If these transitions needed to be faster, a smaller resistor could be used for the pullup, with increased power consumption in the bus being the negative side-effect.

See those tiny up and down spikes on the data line that occur in time with the clock (and vice-versa)? That’s the effect of another parasitic capacitance that couples the clock and data lines together. It’s not enough to change any of the digital values, and it decays back down pretty quickly, but in a more extreme form this could cause problems too.

Keeping the lines short and minimally coupled to each other and their surroundings is the cure. If you’re running other data lines over/under the I2C lines, arrange for them to cross seldom, and at 90-degree angles. A long wire with a weak resistive pullup is almost as good as an antenna.

Bidirectional — Voltage Level Conversion

The data line (SDA) is necessarily bidirectional: both the master and the slave ICs have to read its voltage. But when you’ve got chips that work at different voltages, the line has to be pulled up to a high voltage on one side, and a lower voltage on the other.

Fortunately, there’s an application note for that (PDF). The standard solution is to use one MOSFET per line and abuse the intrinsic body diodes to pass signals through from the high side to the low side. It’s a nice hack, and you’ll enjoy reading it. You can buy these simple converters pre-made from most hobbyist-friendly shops, or directly from the Far East.

The Protocol

So far, I2C sounds only a little bit tricky, but the truly tricky bits all take place at the protocol level. To start out, the data line is always set up when the clock line is low, and can be read out any time the clock line is high. (This definition eliminates the phase/polarity issue with SPI, and the voltage-inversion problems with async serial.) There are two exceptions to this rule: if the data line transitions high to low with the clock high, it’s a start signal, and if data transitions high during a high clock, it’s a stop. Every transaction starts with a start signal and ends with a stop. This, in principle, lets every device on the line know when a conversation is underway, and tells them not to begin a conversation of their own until they see a stop.
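The start/stop rule is easy to express in code. This is only a sketch: it assumes you already have a list of (clock, data) logic levels captured faster than the bus clock, for example exported from a logic analyzer, and the function name is hypothetical.

```python
def find_conditions(samples):
    """Return (index, "START" or "STOP") for every start/stop condition.

    samples is a list of (scl, sda) pairs, each 0 or 1.
    A data edge while the clock stays high is a condition:
    falling data is a start, rising data is a stop.
    """
    events = []
    for i in range(1, len(samples)):
        scl_prev, sda_prev = samples[i - 1]
        scl, sda = samples[i]
        if scl_prev == 1 and scl == 1 and sda_prev != sda:
            events.append((i, "START" if sda == 0 else "STOP"))
    return events

# Idle bus, a start, one bit clocked (data only changes while the
# clock is low), then a stop.
trace = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1),
         (0, 1), (0, 0), (1, 0), (1, 1)]
print(find_conditions(trace))
```

Any data edge that happens while the clock is low is ordinary bit setup and is ignored.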

Image Marcin Floryan. https://en.wikipedia.org/wiki/I%C2%B2C#/media/File:I2C_data_transfer.svg

Because all of the devices share the same two wires, they need addresses. Additionally, the slave devices need to know whether they’re supposed to send or receive data on the single wire. The first byte of any transaction, after the start signal, is a seven-bit address and a single direction bit that tells the slave whether the master wants to read from it or write to it. For instance, with an LM75 temperature sensor, you can read the two-byte temperature value by sending the device’s address (0x48 in this example) followed by a final bit set to one, which means “read”. The master then sends two bytes worth of clocks, during which it receives the data, and then sends the stop signal. Well, with one more complication.
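Packing the address and direction bit into that first byte is a one-liner. A minimal sketch, with a made-up helper name:

```python
def address_byte(addr7, read):
    """First byte on the wire: seven address bits, then the direction bit."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("address must fit in seven bits")
    return (addr7 << 1) | (1 if read else 0)

print(hex(address_byte(0x48, read=True)))   # LM75 at 0x48, reading
print(hex(address_byte(0x48, read=False)))  # same chip, writing
```

For the LM75 at 0x48 this gives 0x91 for a read and 0x90 for a write. Some datasheets and libraries quote the already-shifted value instead of the seven-bit one, so check which form you’ve been handed.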

Transaction detail from Maxim's LM75 datasheet

Every byte sent on the I2C line is acknowledged (positively or negatively) by the party (slave or master) that just received the last byte. Its purpose is for the device to say “OK, I got that, go on”. So when the master is reading bytes from the slave, the master acknowledges each one before the next byte is sent. If the master is writing bytes to the slave, it’s the slave that has to acknowledge. A low voltage level, pulling the data line down, is the positive ACK signal, and it means that everything’s alright.

Things are more complicated with the negative-acknowledge (NACK) signal. Depending on context, it can mean “I’m not ready” or “that didn’t make sense” or even “I’m done receiving now”. Because of this multitude of possible meanings, you have to read the datasheet carefully, especially when reading data from slaves, because the master needs to know when to ACK and when to NACK.

For instance, with the LM75 temperature sensor in its easiest mode, the master sends the address and a read bit, and then reads two bytes back. Since the master sent the address, the slave acknowledges to say it’s ready. Then the slave sends a byte, which the master acknowledges with an ACK. Then the slave sends the second byte, to which the master responds with a NACK and a stop signal to say it’s done.

Master: Start Signal
Master: Address + Read Bit
Slave:  ACK
Slave:  First byte of data
Master: ACK
Slave:  Second / last byte of data
Master: NACK
Master: Stop Signal

Wrapping up the protocol, there’s also multi-part communications. If the master needs to send a byte to the slave, and then read some bytes back, for example, it needs to first send the address with a write signal and then re-send the address with the read signal. To signal the change of mode, the master sends the so-called “repeated start” signal, which is just a regular start signal when there’s been no preceding stop. Again, here’s an example with the LM75, first telling the LM75 that it wishes to read the high temperature setting, and then reading the two bytes back.

Master: Start Signal
Master: Address + Write Bit
Slave:  ACK
Master: Set the high-temperature register for reading
Slave:  ACK
Master: (Re-) Start Signal
Master: Address + Read Bit
Slave:  ACK
Slave:  First byte of data
Master: ACK
Slave:  Second / last byte of data
Master: NACK
Master: Stop Signal

Note that in both of these examples, when the master is reading an arbitrary amount of data from the slave, the last byte is “acknowledged” with a NACK. This doesn’t mean that it missed a byte, but instead signals the slave that it won’t be requesting any more data. It’s an “all done!” signal. If you’re having trouble with the second transmission after power-on, have a look that you’re NACKing at the end, otherwise the slave can get confused.
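The two listings above follow one pattern, which can be written down as a small generator. This sketch doesn’t talk to any hardware: it just lists who says what, assumes a slave that always ACKs, and uses a hypothetical function name.

```python
def read_transaction(n_bytes, set_register_first=False):
    """List the bus events for a master reading n_bytes from a slave."""
    if n_bytes < 1:
        raise ValueError("need to read at least one byte")
    events = ["Master: Start Signal"]
    if set_register_first:
        # Write phase: select a register, then switch direction
        # with a repeated start instead of a stop.
        events += [
            "Master: Address + Write Bit",
            "Slave:  ACK",
            "Master: Register to read",
            "Slave:  ACK",
            "Master: (Re-) Start Signal",
        ]
    events += ["Master: Address + Read Bit", "Slave:  ACK"]
    for i in range(n_bytes):
        is_last = i == n_bytes - 1
        events.append(f"Slave:  Data byte {i + 1}")
        # ACK every byte except the final one, which gets the NACK.
        events.append("Master: NACK" if is_last else "Master: ACK")
    events.append("Master: Stop Signal")
    return events

print("\n".join(read_transaction(2, set_register_first=True)))
```

However many bytes you ask for, exactly one NACK comes out, right before the stop.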

Debugging Help

This whole business with ACK and NACK can get tricky, and it’s definitely one source of bugs. On the other hand, the acknowledge bit provides a bit of information (tee-hee!) about how the communication is going.

Bus Pinging

The first thing to do when debugging an I2C system at the protocol level is to make sure that the slave device is there and listening. Since the slave is supposed to respond with an ACK after hearing its address on the data line, this can be used to scope it out. This should be your first stop in debugging an I2C setup: if the slave doesn’t even acknowledge its address, you’ve probably got wiring troubles.

Because there are only 127 possible addresses on a regular I2C bus, this can be generalized to check all the devices on the bus. An I2C bus scan simply goes through each address, and prints out which addresses lead to an ACK. This is a great check to verify that each device’s address is what you think it should be. A bus scan and verification probably belongs in the power-on self-test of any serious I2C system. Modern I2C chips also have a chip ID field that can be read out. Use ’em if you got ’em.
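A bus scan boils down to a loop. In this sketch, `probe` stands in for whatever your platform offers to send a start, an address byte and a stop, and report whether an ACK came back. It and the `FakeBus` test double are assumptions, not a real library API. The default range skips the addresses at either end that the specification reserves.

```python
def scan_bus(probe, first=0x08, last=0x77):
    """Return every seven-bit address in [first, last] that ACKs."""
    return [addr for addr in range(first, last + 1) if probe(addr)]

def self_test(probe, expected):
    """Compare the scan against the addresses we think are fitted."""
    found = set(scan_bus(probe))
    missing = sorted(set(expected) - found)
    unexpected = sorted(found - set(expected))
    return missing, unexpected

class FakeBus:
    """Stand-in for real hardware: ACKs only the listed addresses."""
    def __init__(self, present):
        self.present = set(present)

    def probe(self, addr):
        return addr in self.present

bus = FakeBus({0x48, 0x51})
print([hex(a) for a in scan_bus(bus.probe)])
print(self_test(bus.probe, expected=[0x48, 0x49]))
```

The self-test returns two lists: devices that should have answered but didn’t, and devices that answered without being invited.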

Check ACKs

I’ll be the first to admit, I usually throw away the ACK bits when all is working well. When an I2C setup is behaving, it just wastes CPU cycles to test it with every byte. When things aren’t working, on the other hand, the ACKs are a treasure. Tracing through a non-working multi-byte communication by reading the ACKs lets you know when things went south.

If you’re using an AVR chip, have a look at the datasheet for their “two-wire interface” (which is I2C, but doesn’t infringe on Philips/NXP’s trademark name). The hardware I2C module has a status code for almost every step along the way, including testing for expected ACKs and more. I’m sure other chips do something similar. If you’re having troubles, check the status code after each byte and compare them with the signal diagrams in the device’s datasheet. You’ll get it straightened out.
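As an illustration of that kind of step-by-step checking, here is a sketch that compares a log of status codes against what a healthy transaction should produce. The table lists the master-mode codes as they appear in the ATmega two-wire interface documentation (status register with the prescaler bits masked off). Treat it as an assumption and confirm against the datasheet for your part. The function name is made up.

```python
# Master-mode status codes (status register & 0xF8).
TWI_STATUS = {
    0x08: "START sent",
    0x10: "repeated START sent",
    0x18: "SLA+W sent, ACK received",
    0x20: "SLA+W sent, NACK received",
    0x28: "data sent, ACK received",
    0x30: "data sent, NACK received",
    0x38: "arbitration lost",
    0x40: "SLA+R sent, ACK received",
    0x48: "SLA+R sent, NACK received",
    0x50: "data received, ACK returned",
    0x58: "data received, NACK returned",
}

# Set a register, repeated start, then read two bytes.
EXPECTED = [0x08, 0x18, 0x28, 0x10, 0x40, 0x50, 0x58]

def first_mismatch(observed, expected=EXPECTED):
    """Return (step, description) for the first wrong status, else None."""
    for step, (got, want) in enumerate(zip(observed, expected)):
        if got != want:
            return step, TWI_STATUS.get(got, f"unknown status 0x{got:02X}")
    return None

print(first_mismatch([0x08, 0x18, 0x28, 0x10, 0x48]))
```

In the example log, everything is fine until the slave fails to acknowledge its address after the repeated start, which narrows the search considerably.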

Odds and Ends

I2C, as described so far, is pretty complicated, with a lot of hoops to jump through and opportunities for bugs to creep in. It gets worse.

Address Collisions

While it sounds cool to have an address space that’s able to accommodate up to 127 devices, the reality is that any I2C bus will probably only have a handful connected up. There are two reasons for this. The first is that each device added to the bus brings along a little parasitic capacitance. The second is that device addresses aren’t fully flexible, and they can even conflict across different ICs.

Three Address Pins: Maxim's LM75 Datasheet

To take the LM75 example again, all of the chips have a hard-coded base address of hexadecimal 0x48 when they come from the factory. They also have three pins that, when grounded or pulled up to the supply voltage, let one specify up to eight possible addresses by adding the binary value of the three lines to 0x48. This means that you can only have eight LM75s on your bus at a time. Period. Not 127.
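The address arithmetic is simple enough to spell out. A sketch, assuming the three pins are named A2, A1 and A0 with A2 the most significant:

```python
LM75_BASE = 0x48

def lm75_address(a2, a1, a0):
    """Seven-bit address for a given strapping of the three address pins."""
    for pin in (a2, a1, a0):
        if pin not in (0, 1):
            raise ValueError("each pin is either 0 (grounded) or 1 (pulled up)")
    return LM75_BASE + ((a2 << 2) | (a1 << 1) | a0)

all_addresses = [lm75_address(a2, a1, a0)
                 for a2 in (0, 1) for a1 in (0, 1) for a0 in (0, 1)]
print([hex(a) for a in all_addresses])
```

That enumerates 0x48 through 0x4F, and nothing else.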

And you can’t use different chips if their addresses overlap, either. NXP’s PCF8563 RTC chip has a read and write address, both of which conflict with two of the eight addresses selectable for a 24C32 EEPROM, for instance. Many years ago, Adafruit started keeping track of these addresses and making a master list. Unfortunately, it looks like they gave up a long time ago as well. You’re on your own to read the datasheets and avoid address collisions.
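If you keep a table of what’s on your bus, checking for clashes can be automated. This sketch assumes you’ve already pulled the seven-bit addresses out of the datasheets yourself. The values below are my reading of them (the PCF8563 fixed at 0x51, a 24C32 strapped with A0 high landing on the same spot) and should be verified.

```python
from collections import defaultdict

def find_collisions(devices):
    """devices maps a name to its seven-bit address; return the clashes."""
    by_address = defaultdict(list)
    for name, addr in devices.items():
        by_address[addr].append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}

bus_plan = {
    "PCF8563 RTC": 0x51,
    "24C32 EEPROM (A0 high)": 0x51,
    "LM75 sensor": 0x48,
}
for addr, names in find_collisions(bus_plan).items():
    print(f"0x{addr:02X} is claimed by: {', '.join(names)}")
```

Here the fix is a different strapping on the EEPROM, since the RTC’s address can’t be moved.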

A possible solution to the addressing problem is extended 10-bit addressing, which was adopted as a standard in 1992, but I still don’t see all that many devices using it. (Maybe it’s me.) The scheme uses a fixed 5-bit prefix (0b11110) followed by two address bits and the read/write bit in the first byte, and then a second byte with the remaining eight bits of the address. 1,024 addresses should be enough for the entire universe of I2C devices, right?
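The byte layout described above looks like this in code. It’s a sketch of the packing only, not of the full 10-bit transaction sequence, and the function name is hypothetical.

```python
def ten_bit_address_bytes(addr10, read=False):
    """Split a 10-bit address into the two bytes sent on the wire."""
    if not 0 <= addr10 <= 0x3FF:
        raise ValueError("address must fit in ten bits")
    # 0b11110 prefix, the top two address bits, then the direction bit.
    first = 0b11110000 | ((addr10 >> 8) << 1) | (1 if read else 0)
    # The remaining eight address bits.
    second = addr10 & 0xFF
    return first, second

first, second = ten_bit_address_bytes(0x2A5)
print(f"{first:#010b} {second:#010b}")
```

For address 0x2A5 with the write bit, the two bytes are 0xF4 and 0xA5.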

Multi-mastering and Clock Stretching

This is where things get really hairy. Because of the pull-down, float-up behavior of the I2C signal lines, it’s electrically possible for two devices to send opposite signals at the same time. (The low wins.) This, in principle, can be used to extend functionality in two different directions. If a device expects to be sending a high signal on either SCL or SDA, but the line is being pulled low by another device, something is afoot.

Clock-Stretching

If a slave is pulling the clock down, it’s called “clock-stretching” and is a signal to the master to pause until the slave is ready. In principle, the master shouldn’t send until it notices the slave has released the line, and it floats back up. Of course this relies on the master checking for a low signal when it thinks it’s letting the line float high. If you’re using a reputable microcontroller’s I2C hardware, you’re probably in good shape — it will check the clock line. If you’re using some back-alley bit-banging I2C routines that won’t pause when requested, you’re going to lose data.
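For the bit-bangers, the missing step usually looks something like the sketch below: after letting the clock float, read it back and wait until it really is high. The two callables stand in for your own pin-access routines, the poll limit is an arbitrary choice, and the `StretchingSlave` class only exists to exercise the logic without hardware.

```python
def release_scl_and_wait(release_scl, read_scl, max_polls=1000):
    """Let the clock float, then poll until it actually reads high.

    Returns the number of polls that saw the line still held low.
    """
    release_scl()
    for polls in range(max_polls):
        if read_scl():
            return polls
    raise TimeoutError("clock held low: stretching slave or stuck bus")

class StretchingSlave:
    """Fake clock line that stays low for a fixed number of reads."""
    def __init__(self, hold_polls):
        self.remaining = hold_polls

    def release(self):
        pass  # the master lets go, but the slave may still hold on

    def read(self):
        if self.remaining > 0:
            self.remaining -= 1
            return 0
        return 1

slave = StretchingSlave(hold_polls=3)
print(release_scl_and_wait(slave.release, slave.read))
```

The timeout matters as much as the wait: without it, one misbehaving slave hangs your firmware along with the bus.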

A more subtle problem can occur when the line simply stalls due to clock stretching. In principle, any slave device can stretch the clock by pulling it low, and they can do so for an indefinite amount of time, and the master must respect this and not send new data until the slave releases the clock line. This can wreak havoc with data on the bus that needs to get out on a schedule, in the best of cases. In the worst case, the entire bus can be effectively DOS’ed simply by pulling the clock line low.

Devices like EEPROMs or flash memories are notorious clock stretchers. They often write data in pages, and writing the data takes a finite amount of time. So they’ll take in 64 bytes (for instance) as fast as you can transfer them, but then they’ll stall when they hit a page boundary and need to write out. Watch out for these chips! Other EEPROMs have double buffers that essentially sidestep this problem, reading into one while they write out the other.
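One way to stay in control is to break long writes at the page boundaries yourself, so you know exactly where the chip will go busy. A sketch, using the 64-byte page from the example above as a default. Your chip’s page size is in its datasheet, and the function name is made up.

```python
def split_at_page_boundaries(start, length, page_size=64):
    """Split a write into (address, byte_count) chunks that never cross a page."""
    chunks = []
    addr, remaining = start, length
    while remaining > 0:
        room_in_page = page_size - (addr % page_size)
        count = min(room_in_page, remaining)
        chunks.append((addr, count))
        addr += count
        remaining -= count
    return chunks

print(split_at_page_boundaries(60, 10))
print(split_at_page_boundaries(0, 130))
```

A ten-byte write starting at address 60 becomes four bytes to finish the first page and six bytes at the start of the next, with a pause in between.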

Multi-master Arbitration

The I2C standard allows for multiple clock-masters to take turns on the bus. Ideally, every potential master is keeping track of start and stop signals. When the line is free, it can start its transmission. But what happens when two masters decide to start at once?

They “arbitrate”. If a master tests the line when it’s supposed to be high, and finds it to be low, it’s a sign that the line isn’t free after all. All devices that currently aren’t pulling the line down should stop their transmission. The idea is that before the address is finished, all the conflicting masters should have backed down. Needless to say, mixing devices that are and aren’t multi-master capable is a recipe for disaster.
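Arbitration is easier to see in a simulation than on a scope. This sketch models the data line as a wired-AND of whatever the contending masters send, most significant bit first, and drops any master that sent a one but read back a zero. It covers a single byte only, and the function name is hypothetical.

```python
def arbitrate(byte_values):
    """Simulate one byte of arbitration between several masters.

    Returns (surviving master indices, byte that appeared on the bus).
    """
    contenders = set(range(len(byte_values)))
    bus_byte = 0
    for bit in range(7, -1, -1):
        sent = {i: (byte_values[i] >> bit) & 1 for i in contenders}
        level = min(sent.values())  # any zero pulls the line low
        bus_byte = (bus_byte << 1) | level
        # Masters that sent a one but see a zero back off.
        contenders = {i for i in contenders if sent[i] == level}
    return sorted(contenders), bus_byte

survivors, seen = arbitrate([0x91, 0x90])
print(survivors, hex(seen))
```

With one master sending 0x91 and another 0x90, the two agree until the very last bit, where the zero wins and the first master backs off. If both send identical bytes, both survive and the contest carries on into the following bytes.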

Solutions

Call me a coward if you will, but my solution to multi-mastering and clock-stretching issues is to avoid them. And that means using multiple I2C bus networks. By far the simplest way to do so is to buy a microcontroller that has more than one hardware I2C controller built in.

Adding a second I2C bus can also help isolate a rogue clock-stretcher that blocks other, time-critical, transmissions. Reserve one bus for the low-latency transmissions, and segregate the clock-stretchers to their own bus. Or if you’re running a mix of chips — some that only run at 100 kHz while others run at 400 kHz or higher — running a fast bus and slow bus will allow you to get the maximum speed out without confusing the slow ones. More busses means more flexibility.

But if you really need to go overboard and control a bazillion I2C devices with the same address on a single bus, the solution is an I2C multiplexer. These are special chips that let you run, say, 16 devices with the same address from a single I2C port by selecting among them using some more lines from the CPU. This is a special situation, for special snowflakes.

Troubleshooting Checklist

So as you can see, there’s a lot going on with I2C, and a lot to go wrong. I2C debugging does not compare with raindrops on roses or whiskers on kittens.

  • Go make yourself a calming tea or other beverage before you start.
  • Double-check wiring. Are SCL and SDA mixed up? Are voltage levels right/same?
  • Are you running the bus at the right speed for the slowest slave? And are the pullups strong enough? Verify the speed and waveform shapes with a scope.
  • Can you get the initial ACK from your target device? Do a bus scan.
  • Still failing? Pull all but your target device off the bus and get the initial ACK working. Add other devices back on one by one. This helps find bus-hogs.
  • Once you get the initial ACK working, use the ACKs to step through the rest of the transaction, verifying one byte at a time.
  • Are you setting the read/write mode correctly for each transaction?
  • Are all parties ACKing and NACKing when they should? Sometimes you’ll see a glitch between the master releasing the line and the slave asserting an ACK. This is normal, and actually a sign that all’s well.
  • If you switch read/write directions, are you sending a restart and the address again? Does the device support this?
  • Double-check address clashes in the datasheets. Scan the bus and make sure that you see the right number of devices.
  • Are you multi-mastering the bus? Can you avoid doing so easily?
  • If you need to troubleshoot a truly tricky problem, put different-value resistors on the output of all chips on the bus. The low voltage values will be slightly different for each chip, allowing you to see on a scope which chip is talking at any given time, and diagnose when they’re talking over each other.

Good luck with I2C! And let us all know in the comments if you’ve got any other specific troubles or solutions. We got some great responses for the other two articles.

39 thoughts on “What Could Go Wrong? I2C Edition”

  1. One place you are likely to find 20 or more devices on one i2c bus is server motherboards with 16 (or 32) slots of ram. Each dimm has an i2c chip with the timing parameters, and often shares an i2c bus with any fan controllers present on the board. Though newer chipsets are more likely to have multiple i2c host controllers present.

    1. I have a server that supports 32 memory Dimms. and there is a i2c management chip on the motherboard (Sun Server) for the 20 fans in the case. The ram is broken up to each processor (4 processors) as well as it’s own management chipset per processor.

      It’s an amazing piece of work, but even the big guys avoid a lot of i2c devices on a single bus

  2. One way to resolve the address collision problem, is to use an I2C parallel output part, and assign some of the output bits to the select inputs of a mux. Since the clock is unidirectional, you can run it through a plain old mux. Now, it does take an extra transmission to select (by sending a mux select byte to the parallel register device) which identically addressed part you want to talk to, but that’s better than not being able to talk to it at all.

    1. Reading the above, perhaps it’s not clear that each identical device gets its clock input connected to one of the mux outputs. The selected device sees a clock, all the other devices see nothing.

      1. Yes, clock can be bidirectional. I don’t think I’ve ever dared to do that. Best I’ve done is two masters at one end…then the mux trick still works as long as it’s between the masters and the slaves.

        You could probably make the mux trick work bidirectionally if you had to, but it was originally designed to discriminate between multiple identically addressed slaves.

  3. “The standard solution is to use one MOSFET per line and abuse the intrinsic body diodes to pass signals through from the high side to the low side.”

    It’s actually more clever than that. The MOSFET conducts as long as either side (drain or source) is at 0V, and doesn’t conduct when both sides are at the gate voltage or higher. (e.g. look at the GTL2000 datasheet)

    Anyway, I’m curious: do you have anything to say about some of the more esoteric communication protocols such as 1wire?

      1. There are keyfobs used in some electronic locks that does. Dallas Semiconductor as I remember. But they were bought by Maxim. Got a sample ‘key’. It looks like a button cell.
        They even have it in a programmable version – so basically a 1-wire EEPROM
        I haven’t used it unfortunately..

    1. Word. I should have said this in the (already-epic) article. The new super-fast-speed I2C flavors require these, or even push-pull drivers like SPI, on the line to make them work.

      At some point, the extra value of I2C — that it works with dirt-cheap hardware and few lines — gets lost in all of the improvements. (That said, if you _need_ fast I2C, you need it.)

      I stick around 100kHz. Then things “just work” every time, 75% of the time.

  4. One thing that wasn’t mentioned – SMBus devices. They are compatible with I2C but unlike I2C, SMBus devices have limits for both minimal and maximal speed, I2C devices only for maximal speed! If you’re sending clock pulses too slowly, SMBus device will reset itself and you have to start over. Learned this lesson with TC74 temperature sensor after few hours of tearing my hair in despair

    1. Was about to write a post about this. This can cause you some grief on older micros.

      Also some linux kernel gpio based drivers can also run out of spec too, sometimes you just need to hook up a scope to see what’s going on, rather than pulling your hair out.

  5. I’ve done over 42 I2C devices connected to a raspberry pi. Parasitic capacitance will definitely get you, so you need to use a rise-time accelerator like the LTC4311 instead of just pull-up resistors.

  6. Interestingly (to me, anyhow) a “well-known operating-system”‘s open-source drivers for its GPUs’ EDID/DDC (I2C) busses is bit-banged… One would think these GPUs to have inbuilt I2C master-controllers, no? Just load a couple registers and let it go… right? Instead, bit-banging (and *blocking* delays, to boot!)
    There are lots of weird workarounds in there for monitors known to have flakey EDID signals.

  7. Be careful with level shifters, especially at the higher speeds. In some situations, they will stretch the clock via their turn around delay. I recently worked on a project with 1MBPS (fast plus) mode. The master communication ports were 3.3V parts and the slave was a 2.5V part. Some Master devices would run the full 1MBPS and some would only run 800KBPS. It turned out that the slower ones were watching for clock stretching and the level shifters were delaying the clock rise. The slave device was absolutely able to handle the 1MBPS, so no data was lost in either configuration. I spent a long time figuring out why a brand new, expensive com device would not run at the advertised speed.
    The level shifters can also break the arbitration mechanism, because they don’t pull all the way to ground when they are driving on one direction.

  8. My job is easier with i2c devices since I debug them with a digital analyzer of some sort and i2c protocol interpreter. You figure out immediately what’s wrong when you can see address, data and acknowledge bit.
    A frequent trap is a bit shifting of 1 bit in the i2c address (7-bit), 50% of datasheets give you a left aligned address and Murphy does the rest…
    The protocol analyzer worth every cent and pay for itself on the first debug session.

    1. Absolutely agree! Would also like to share some of my experience – in too many cases the digital signals we need to analyse are low speed, so a cheap PC based LA is one of the best investments for hobby and even many tasks in professional engineering (I hope measurement aficionados don’t kill me now). We have some 600EUR PC based, 20 channel LA for our embedded sw department and use it much more often compared to the Tek oscilloscopes with their protocol interpreters. The cheap LA parses more protocols at 8-10 times lower cost, does not block a certified measurement instrument for trivial tasks and analysis can be done more conveniently. And while talking about highest value tools per dollar – LAs and oscilloscopes can’t match the humble bus pirate.

    2. Thanks! I should have mentioned the bit-shift address problem. I actually like to think of the address as being the already-bit-shifted number, and never store the other one anywhere b/c it’s never used.

      But then (for this very article!) I used a new I2C bus-scan routine that returns just the 7-bit number. I had to remember to shift it over again. I’ve done this so many times, that I forgot to even mention it as a gotcha. (Man, I2C is a mess.)

      1. Call me a coward if you will but my solution to the I2C mess is to avoid it.
        And to add insult to injury the FAE insist on pronouncing it
        “I-squaaared-C” with a smug dragged out emphasis on the “squaared” part.

    3. +1 A protocol analyzer will help you with a lot of problems. Also, you can store the captured data and parse it for easier reading. I did a lot of PMBus work which is a variant of SMBus that rides on top of the I2C physical layer with slightly different specs. Parsing the data stream for our parts made it a lot easier to tell what was happening to them and why they did what they did. The Total Phase Systems Beagle analyzer has worked well for me for many years.

  9. that a great article!!

    i wish i had such a lecture BEFORE adventuring in a fairly spreaded I2C bus with a dozen of slaves.. the costant current mirror in place of the “resistor pull-ups” could really do magic.

    my two cents here is about the “SDA stuck” corner case, that i rarely see covered in the “I2C horror stories”.

    BTW it’s a pretty serious situation as it deserves an explicit bullet in official I2C doc, see paragraph 3.1.16 of http://www.nxp.com/documents/user_manual/UM10204.pdf :

    “If the data line (SDA) is stuck LOW, the master should send nine clock pulses. The device
    that held the bus LOW should release it sometime within those nine clocks”

    i made a review of the linux kernel I2C subsystem, and indeed it’s definitely present as “library” BUT it’s supported “OOB” only in some “driver/HW” combos”. these are surely the SOCs more “ruggedized” against this pitfall, IMHO!
