Sparkfun Ships 2000 MicroViews Without Bootloaders


Everyone has a bad day, right? Monday was a particularly bad day for the folks at Sparkfun. Customer support tickets started piling up, leading to the discovery that they had shipped out as many as 1,934 MicroViews without bootloaders.

MicroView is the tiny OLED-enabled, Arduino-based microcontroller system which had a wildly successful Kickstarter campaign earlier this year. [Marcus Schappi], the project creator, partnered up with SparkFun to get the MicroViews manufactured and shipped out to backers. This wasn’t a decision made on a whim: Sparkfun had proven themselves by fulfilling over 11,000 Makey Makey boards to backers of that campaign.

Rather than downplay the issue, Sparkfun CEO [Nathan Seidle] has taken to the company blog to explain what happened, how it happened, and what they’re going to do to make it right for their customers. This positions them as the subject of our Fail of the Week column where we commiserate instead of criticize.

First things first, anyone who receives an affected MicroView is getting a second working unit shipped out by the beginning of November. Furthermore, the bootloaderless units can be brought to life relatively easily. [Nate] provided a hex file with the correct bootloader. Anyone with an Atmel AVR In-System Programming (ISP) programmer and a steady hand can bring their MicroView to life. Several users have already done just that. The bootloader only has to be flashed via ISP once. After that, the MicroView will communicate via USB to a host PC. Sparkfun will publish a full tutorial in a few weeks.
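For readers wondering what the ISP step might look like in practice, here is a rough sketch that builds an avrdude command line for writing a hex file to flash. Several things here are assumptions rather than details from Sparkfun’s post: the file name `microview_bootloader.hex` is a placeholder, `usbtiny` stands in for whichever programmer you own, and `m328p` assumes the MicroView’s chip is an ATmega328P. Fuse settings are not covered, so treat this as a starting point and follow Sparkfun’s tutorial once it is published.

```python
import shlex


def avrdude_flash_command(hex_path, programmer="usbtiny", part="m328p"):
    """Build an avrdude argument list that writes an Intel HEX file to flash.

    programmer and part are assumptions; substitute the values that match
    your own ISP programmer and target chip.
    """
    return [
        "avrdude",
        "-c", programmer,                    # programmer type
        "-p", part,                          # target AVR part
        "-U", "flash:w:%s:i" % hex_path,     # write flash from an Intel HEX file
    ]


if __name__ == "__main__":
    # Placeholder file name; use the hex file Sparkfun actually provides.
    print(shlex.join(avrdude_flash_command("microview_bootloader.hex")))
```

The sketch only prints the command so it can be inspected before anything touches the hardware.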


So what went wrong? The crux of the problem is a common one in manufacturing: an incomplete production test. For many of their products, Sparkfun loads a single hex file containing the production test and the Optiboot bootloader. The test code proves out the functionality of the device, and the bootloader allows the customer to flash the device with their own sketches. The problem is the bootloader normally connects to a PC host via USB. Enumerating a USB connection can take up to 30 seconds. That’s way too slow for volume production.

Sparkfun opted to skip the bootloader test, since all the pins used to load firmware were electrically tested by their production test code. This has all worked fine for years – until now. The production team made a change to the test code on July 18th. The new hex file was released without the bootloader. The production test ran fine, and since no one was testing the bootloader, the problem wasn’t caught until it was out in the wild.
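A missing bootloader like this can, in principle, be caught before the image ever reaches the line by checking that the release hex file contains data in the bootloader region. Below is a minimal sketch of that idea. It assumes the file is in Intel HEX format and that the bootloader sits in the top 512 bytes of a 32 KB flash (0x7E00 to 0x7FFF, the usual Optiboot placement on an ATmega328P). The function names are hypothetical and this is not Sparkfun’s actual tooling.

```python
BOOT_START = 0x7E00  # assumed start of the bootloader section
BOOT_END = 0x7FFF    # assumed end of flash on a 32 KB part


def parse_intel_hex(text):
    """Return {address: byte} for every data record in an Intel HEX string."""
    memory = {}
    base = 0  # upper address bits set by extended address records
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError("not an Intel HEX record: %r" % line)
        raw = bytes.fromhex(line[1:])
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError("bad record length: %r" % line)
        if sum(raw) & 0xFF != 0:
            raise ValueError("checksum mismatch: %r" % line)
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:    # data record
            for offset, value in enumerate(data):
                memory[base + addr + offset] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = ((data[0] << 8) | data[1]) << 4
        elif rtype == 0x04:  # extended linear address
            base = ((data[0] << 8) | data[1]) << 16
    return memory


def has_bootloader(text, start=BOOT_START, end=BOOT_END):
    """True if any non-erased (non-0xFF) byte falls inside the bootloader range."""
    memory = parse_intel_hex(text)
    return any(start <= a <= end and b != 0xFF for a, b in memory.items())
```

A check along these lines only shows that something occupies the bootloader area, not that the bootloader works, so it would complement a functional test on the fixture rather than replace it.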

The Sparkfun crew are taking several steps to make sure this never happens again. They’re using a second ATmega chip on their test fixture to verify the bootloader without the slow PC enumeration step. Sparkfun will also avoid changing firmware during a production run. If firmware has to change, they’re planning to beta test before going live on the production line. Finally, Sparkfun is changing the way they approach large scale production. In [Nathan’s] own words:

Moving from low volume to mid-volume production requires a very different approach. SparkFun has made this type of mistake before (faulty firmware on a device) but it was on a smaller scale and we were agile enough to fix the problem before it became too large. As we started producing very large production runs we did not realize quality control and testing would need very different thinking. This was a painful lesson to learn but these checks and balances are needed. If it didn’t happen on Microview it would have happened on a larger production run someday in the future.

Everyone has bad days, and this isn’t the first time Sparkfun has lost money due to a mistake. However, they’re doing the right thing by attacking it head on and fixing not only the immediate issue but the underlying thought process which allowed the problem to arise.

47 thoughts on “Sparkfun Ships 2000 MicroViews Without Bootloaders”

      1. The thermostat’s hardware exposes the sys_boot5, which, if driven high, has the device boot from USB. That was “mistake” 1. Secondly, if pushing the button on the thermostat for about 10 seconds or so, the device reboots with sys_boot5 being high as well. I believe this was a leftover from some application note/development version. Not exposing the sys_boot pins from the BGA to the user [anywhere on PCB] and fixing that button press issue would have somewhat secured the device [still does not defend against replacing the NAND chip].

        In any case, the folks at Nest Labs have released the means of “unlocking” the device without having to resort to USB booting. See their site for details.

  1. Dat feelin’ when you spot the mistake when it’s too late.. I think most of us have been there at one point or other, it’s hmmm, maybe, but oh wait, nope, uh oh.. realisation kicking in , hair stands up on back of neck and oh crap, floor opens up beneath you.

    Great way to deal with it though. kudos indeed.

  2. Kickstarter – put your money on useless plastic crap. Sparkf**k – waste rest of your pennies on overpriced electronic crap. But then is it really bad if someone makes money on those pathetic makers (not to be confused with hackers)?

  3. I’m actually very pleased, I ordered a microview without an idea for a project, but now I’ll have two microviews because the first project is ‘burn the bootloader and get it working’.

    I have a few friends and colleagues that didn’t know about this, they spent time trying to figure out why their sketches didn’t work. The email to let people know the problem is titled “Project Update #15: MicroView: Chip-sized Arduino with built-in OLED Display! by Geek Ammo”
    Instead of “Microview not working? Read This NOW!”.
    Which is probably how they missed it.

    I also get two free extra NFC rings from another kickstarter (vouchers for the replacements still to come) due to manufacturing difficulties.
    Kickstarter issues are working out very well for me!

    1. Have to hand it to Sparkfun:
      – They delivered on time, the product is very good quality and packaged well.
      – The price is pretty low
      – They handled this problem really well.

      I think I’d want to use them if I did a crowdfunding thingy!

  4. I looked at the widget earlier on. Naturally at the time I had nothing to use it for so I did not buy one. Oddly enough I normally get great service from them. The one time when a package did not contain an expected ordered item they were interested in what was in there.

  5. Isn’t it pretty standard in medium to large production to pull every Nth product and fully test it? Don’t get me wrong, I think spark fun has done the correct thing here, stepped up, admitted, and set about proposed solutions to fix the problem. If they randomly pulled 1:1000 of the units off the production line for testing, this would have been caught. Probably before any faulty product had been shipped. In this case that would have been 11 devices. One or two of those devices would have shown up as faulty.

    Again kudos to SF for actually owning up to the problem!

  6. Respect to Sparkfun, they’re dealing with the problem in the most possible user friendly and honest way. If they had a shop in the EU I’d immediately support them by buying something, but shipping rates from the US are a turn off.
    Well… they have a distributor a few km from me and I already purchased from that one, not the same thing though.

  7. Definitely Kudos to SparkFun. I hope this won’t hurt them financially too much. I think most people would have been satisfied with a tutorial and credit instead of a completely new unit. Well done!

  8. Hindsight is 20/20, but instead of completely eliminating a 30 second test, they could have performed this test on every 50th unit or so.

    Even if it weren’t for the firmware issue, this is still important. There could be a quality control issue with any part of the process. It may affect some percentage of units, rather than all. You can’t test the first unit in a run, and assume that because nothing has changed, the rest will work too. At any point during the run, did you start using a new reel of components? Then something could have changed, if that reel is counterfeit, recycled, or contains components that failed factory checks and made it into the market anyway.

    1. And all this is what Sparkfun is owning up to…they need to change the way they think about production for runs this size.

      What is awesome about all this is how they are handling it. How many companies out there would just ship out new units at their expense without complaint? How many would also give away the fix and TELL their customers how to implement it? The rest of the manufacturing world could learn a lesson from Sparkfun’s mistake and more from their response to that mistake. As far as mistakes go, this one will work out to the satisfaction of all parties involved and generate tons of good PR for Sparkfun.

  9. Sparkfun made big failure when they sent almost 2000 units without testing a single one but … you just need an AVR ISP programmer to flash bootloader yourself and fix the problem.
    Don’t get me wrong, but my opinion is that person who is trying to make something with an AVR based board, and does not have that $3 programmer in his/her toolbox is much bigger failure than Sparkfun’s mistake.

    1. It would probably be cheaper for Sparkfun to just ship a cheap ISP programmer to everyone that got a “defective” unit. That way everyone would have the tools to move on from “sketches” and learn real embedded programming.

    2. >Don’t get me wrong, but my opinion is that person who is trying to make something with an AVR based board, and does not have that $3 programmer in his/her toolbox is much bigger failure than Sparkfun’s mistake.

      Don’t get me wrong, but my opinion is that person who is trying to make something with an AVR based board, and does not have that $3 programmer in his/her toolbox is just a “maker”.

      FIXED

    3. “without testing a single one”

      They tested every single one. That’s what the production test rig is for. The tests just didn’t cover the specific issue with the bootloader, and they’ve upgraded the test rig to include it.

  10. conspiracy theory: item was shipped faulty (mind you: NOT DEFECTIVE, but easily fixable!) on purpose, company steps up immediately, gets coverage on popular blogs… customers and non-customers alike give heaps of kudos, company becomes “the good guys” -> marketing win mucho?

      1. I don’t subscribe to the theory. But if it were true, the money was not lost on shipping, it was spent on advertising. Cheap for a good viral PR campaign probably. And it also seems to have generated plenty of goodwill according to the comments here.

          1. Might be because I am not a native english speaker that it comes across as trolling – I did not intend to at least.

            I said that _if_ the conspiracy was true, and i do not believe it is. That was not meant in sarcasm or irony.

            But in the theoretical situation that it is. then the expenses on mailing people a second part would be comparable to having spend the money on some other ad-campaign. And that it would not be expensive compared to for example a viral video campaign (purely my opinion).

            I do notice I got the part about it succeeded at creating goodwill to sound like it was part of the conspiracy theory, what i did mean was that regardless of the reasons – they are currently enjoying a lot of goodwill as evidenced by the positive comments about their reaction.

            Please do not take it any more serious than it was meant to be. It was really nothing more than a slightly silly tangent.

  11. Ah, a missing bootloader isn’t THAT bad on an AVR. I mean, you need a programmer anyway so what’s the big deal? They could make a simple script for users to run to flash the bootloader, should be no harder than doing it to a normal arduino.

    Either way, it is good publicity. Beats Apple’s response of “You are using it wrong”

  12. “The problem is the bootloader normally connects to a PC host via USB. Enumerating a USB connection can take up to 30 seconds. That’s way too slow for volume production.”

    I humbly suggest not using a Windows PC for USB enumeration tests. Udev under Linux (the default on most modern distros) never takes more than a few hundred milliseconds to enumerate. There are no excuses for not fully testing your production stuff.
