Everyone has a bad day, right? Monday was a particularly bad day for the folks at SparkFun. Customer support tickets started piling up, leading to the discovery that they had shipped out as many as 1,934 MicroViews without bootloaders.
The MicroView is the tiny OLED-enabled, Arduino-based microcontroller system that had a wildly successful Kickstarter campaign earlier this year. [Marcus Schappi], the project creator, partnered up with SparkFun to get the MicroViews manufactured and shipped out to backers. This wasn’t a decision made on a whim: SparkFun had proven themselves by fulfilling over 11,000 Makey Makey boards to backers of that campaign.
Rather than downplay the issue, SparkFun CEO [Nathan Seidle] has taken to the company blog to explain what happened, how it happened, and what they’re going to do to make it right for their customers. This positions them as the subject of our Fail of the Week column, where we commiserate instead of criticize.
First things first, anyone who receives an affected MicroView is getting a second working unit shipped out by the beginning of November. Furthermore, the bootloaderless units can be brought to life relatively easily. [Nate] provided a hex file with the correct bootloader. Anyone with an Atmel AVR In-System Programming (ISP) programmer and a steady hand can bring their MicroView to life. Several users have already done just that. The bootloader only has to be flashed via ISP once. After that, the MicroView will communicate via USB to a host PC. SparkFun will publish a full tutorial in a few weeks.
So what went wrong? The crux of the problem is a common one in manufacturing: an incomplete production test. For many of their products, SparkFun loads a single hex file containing the production test and the Optiboot bootloader. The test code proves out the functionality of the device, and the bootloader allows the customer to flash the device with their own sketches. The problem is the bootloader normally connects to a PC host via USB. Enumerating a USB connection can take up to 30 seconds. That’s way too slow for volume production.
SparkFun opted to skip the bootloader test, since all the pins used to load firmware were electrically tested by their production test code. This has all worked fine for years – until now. The production team made a change to the test code on July 18th. The new hex file was released without the bootloader. The production test ran fine, and since no one was testing the bootloader, the problem wasn’t caught until it was out in the wild.
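One cheap guard against this kind of release mistake is to inspect the hex file itself before it ever reaches the line. The Python sketch below is only an illustration, not SparkFun's actual process. It assumes an ATmega328P-style layout in which a 512-byte Optiboot section sits at byte addresses 0x7E00 to 0x7FFF, and the function names and sample records are made up for this example.

```python
def parse_intel_hex(text):
    """Return {absolute_address: byte} for every data byte in an Intel HEX string."""
    memory = {}
    base = 0  # upper address bits, set by extended-address records
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        record = bytes.fromhex(line[1:])
        # All bytes of a valid record, checksum included, sum to 0 modulo 256.
        if sum(record) & 0xFF != 0:
            raise ValueError(f"bad checksum: {line!r}")
        count = record[0]
        offset = (record[1] << 8) | record[2]
        rtype = record[3]
        data = record[4:4 + count]
        if rtype == 0x00:      # data record
            for i, value in enumerate(data):
                memory[base + offset + i] = value
        elif rtype == 0x01:    # end-of-file record
            break
        elif rtype == 0x02:    # extended segment address
            base = ((data[0] << 8) | data[1]) << 4
        elif rtype == 0x04:    # extended linear address
            base = ((data[0] << 8) | data[1]) << 16
    return memory


def has_bootloader(text, start=0x7E00, end=0x8000):
    """True if any programmed (non-0xFF) byte falls in the assumed bootloader region."""
    memory = parse_intel_hex(text)
    return any(start <= addr < end and value != 0xFF
               for addr, value in memory.items())


# Hand-built sample images (hypothetical): two bytes of application code at
# 0x0000, with and without two bytes placed at the assumed bootloader start.
APP_ONLY = ":020000000C945E\n:00000001FF\n"
WITH_BOOTLOADER = ":020000000C945E\n:027E000011244B\n:00000001FF\n"

if __name__ == "__main__":
    print("app only:", has_bootloader(APP_ONLY))
    print("with bootloader:", has_bootloader(WITH_BOOTLOADER))
```

A check like this only confirms that something occupies the bootloader region. It does not prove the bootloader works, so it would complement a functional test on the fixture rather than replace one.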
The SparkFun crew are taking several steps to make sure this never happens again. They’re using a second ATmega chip on their test fixture to verify the bootloader without the slow PC enumeration step. SparkFun will also avoid changing firmware during a production run. If firmware has to change, they’re planning to beta test before going live on the production line. Finally, SparkFun is changing the way they approach large scale production. In [Nathan’s] own words:
Moving from low volume to mid-volume production requires a very different approach. SparkFun has made this type of mistake before (faulty firmware on a device) but it was on a smaller scale and we were agile enough to fix the problem before it became too large. As we started producing very large production runs we did not realize quality control and testing would need very different thinking. This was a painful lesson to learn but these checks and balances are needed. If it didn’t happen on Microview it would have happened on a larger production run someday in the future.
Everyone has bad days, and this isn’t the first time SparkFun has lost money due to a mistake. However, they’re doing the right thing by attacking it head on and fixing not only the immediate issue but also the underlying thought process which allowed the problem to arise.
Kudos to Sparkfun. Everyone makes mistakes, but it’s good that they are stepping up, admitting the problems, and making the fixes.
Good thing that ST uC (others?) has DFU in ROM so that fw can be upgraded just by asserting a pin (BOOT0) and DFU will appear on PC.
Handy when you need to upgrade fw without a programmer, and it can help tackle this kind of problem.
You sir are an acronym junkie. IMHO…….
Not disabling this feature is exactly how the Nest thermostat got rooted :)
The ST first-stage bootloader isn’t disable-able – it’s in ROM. But if the chip is protected all you can do with it is wipe and reflash the chip.
The thermostat’s hardware exposes the sys_boot5 pin, which, if driven high, has the device boot from USB. That was “mistake” 1. Secondly, if you push the button on the thermostat for about 10 seconds or so, the device reboots with sys_boot5 being high as well. I believe this was a leftover from some application note/development version. Not exposing the sys_boot pins from the BGA to the user [anywhere on PCB] and fixing that button press issue would have somewhat secured the device [still does not defend against replacing the NAND chip].
In any case, the folks at Nest Labs have released the means of “unlocking” the device without having to resort to USB booting. See their site for details.
Well, plus an epoxy/underfill. It’s not that hard to access a BGA pin that’s not routed, given enough patience. If you only need to do it once, it’s not bad.
Dat feelin’ when you spot the mistake when it’s too late.. I think most of us have been there at one point or other, it’s hmmm, maybe, but oh wait, nope, uh oh.. realisation kicking in , hair stands up on back of neck and oh crap, floor opens up beneath you.
Great way to deal with it though. kudos indeed.
Kickstarter – put your money on useless plastic crap. Sparkf**k – waste rest of your pennies on overpriced electronic crap. But then is it really bad if someone makes money on those pathetic makers (not to be confused with hackers)?
Hackers use any and all tools at their disposal. Show us a link to something you’ve done that didn’t use anyone else’s pre-made parts.
Step 1, mine raw copper ore. Step 2, build refinery. Step 3, wire drawing machine factory.
Typical maker, letting stars do all the heavy lifting. Real hackers fuse their own heavy elements from hydrogen.
“If you want to make an apple pie from scratch, you must first create the universe.” —Carl Sagan
People, people…don’t feed the trolls…
I’m actually very pleased, I ordered a microview without an idea for a project, but now I’ll have two microviews because the first project is ‘burn the bootloader and get it working’.
I have a few friends and colleagues that didn’t know about this, they spent time trying to figure out why their sketches didn’t work. The email to let people know the problem is titled “Project Update #15: MicroView: Chip-sized Arduino with built-in OLED Display! by Geek Ammo”
Instead of “Microview not working? Read This NOW!”.
Which is probably how they missed it.
I also get two free extra NFC rings from another kickstarter (vouchers for the replacements still to come) due to manufacturing difficulties.
Kickstarter issues are working out very well for me!
Have to hand it to Sparkfun:
– They delivered on time, the product is very good quality and packaged well.
– The price is pretty low
– They handled this problem really well.
I think I’d want to use them if I did a crowdfunding thingy!
I looked at the widget earlier on. Naturally at the time I had nothing to use it for so I did not buy one. Oddly enough I normally get great service from them. The one time when my package did not contain an expected ordered item they were interested in what was in there.
affected not effected
Isn’t it pretty standard in medium to large production to pull every Nth product and fully test it? Don’t get me wrong, I think spark fun has done the correct thing here, stepped up, admitted, and set about proposed solutions to fix the problem. If they randomly pulled 1:1000 of the units off the production line for testing, this would have been caught. Probably before any faulty product had been shipped. In this case that would have been 11 devices. One or two of those devices would have shown up as faulty.
Again kudos to SF for actually owning up to the problem!
1 in a thousand units from 1934 units produced would be 1.934 devices (or rather 1 or 2) pulled not 11.
I guess ‘N’ was too high.
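The sampling arithmetic in this exchange is easy to check. Here is a minimal Python sketch, assuming a simple "pull every Nth unit" scheme on a run where every unit carries the same fault. The function names are invented for illustration.

```python
def units_pulled(run_size: int, every_n: int) -> int:
    """How many units get a full test when every `every_n`-th unit is pulled."""
    if every_n <= 0:
        raise ValueError("every_n must be positive")
    return run_size // every_n


def units_built_before_first_pull(run_size: int, every_n: int) -> int:
    """Units already built before the first sampled unit is tested.

    If `every_n` is larger than the run, nothing is ever pulled and the
    whole run is built untested.
    """
    if every_n <= 0:
        raise ValueError("every_n must be positive")
    return min(every_n - 1, run_size)


if __name__ == "__main__":
    for run_size, every_n in [(1934, 1000), (11000, 1000), (1934, 50)]:
        print(f"run of {run_size}, 1 in {every_n}: "
              f"{units_pulled(run_size, every_n)} pulled, "
              f"{units_built_before_first_pull(run_size, every_n)} built before the first pull")
```

With a fault present on every unit, the first pulled unit reveals it, so the sampling interval mostly decides how many bad units exist by the time anyone notices.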
Respect to Sparkfun, they’re dealing with the problem in the most user-friendly and honest way possible. If they had a shop in the EU I’d immediately support them by buying something, but shipping rates from the US are a turn off.
Well… they have a distributor a few km from me and I already purchased from that one, not the same thing though.
Definitely Kudos to SparkFun. I hope this won’t hurt them financially too much. I think most people would have been satisfied with a tutorial and credit instead of a completely new unit. Well done!
Very cool that they are open about the error, and fixing it. This could become a trend.
What, a vendor that owns up to their mistakes and takes care of their customers. Must not have any MBA on staff.
^ this comment … +1
Hindsight is 20/20, but instead of completely eliminating a 30 second test, they could have performed this test on every 50th unit or so.
Even if it weren’t for the firmware issue, this is still important. There could be a quality control issue with any part of the process. It may affect some percentage of units, rather than all. You can’t test the first unit in a run, and assume that because nothing has changed, the rest will work too. At any point during the run, did you start using a new reel of components? Then something could have changed, if that reel is counterfeit, recycled, or contains components that failed factory checks and made it into the market anyway.
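When a fault only affects some fraction of units, periodic sampling catches it with a probability rather than a certainty. As a rough sketch, assuming each unit is defective independently with the same probability (a simplification, since a bad reel would produce clustered failures), the chance that at least one sampled unit is bad is 1 - (1 - p)^k for k samples. The function name and the 5% defect rate below are hypothetical.

```python
def detection_probability(defect_rate: float, samples: int) -> float:
    """Chance that at least one of `samples` independently drawn units is defective."""
    if not 0.0 <= defect_rate <= 1.0:
        raise ValueError("defect_rate must be between 0 and 1")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    # Probability that every sampled unit is good, subtracted from 1.
    return 1.0 - (1.0 - defect_rate) ** samples


if __name__ == "__main__":
    run_size = 1934          # units in the run discussed in the article
    every_n = 50             # "every 50th unit or so", from the comment above
    samples = run_size // every_n
    rate = 0.05              # hypothetical: 5% of units affected
    print(f"{samples} samples, {rate:.0%} defect rate: "
          f"{detection_probability(rate, samples):.1%} chance of catching it")
```

The same function shows why sampling alone is a weak defense against rare faults: lowering the defect rate or the number of samples drops the detection probability quickly.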
Yes, this is the crux of moving from PVT to MP.
And all this is what Sparkfun is owning up to…they need to change the way they think about production for runs this size.
What is awesome about all this is how they are handling it. How many companies out there would just ship out new units at their expense without complaint? How many would also give away the fix and TELL their customers how to implement it? The rest of the manufacturing world could learn a lesson from Sparkfun’s mistake and more from their response to that mistake. As far as mistakes go, this one will work out to the satisfaction of all parties involved and generate tons of good PR for Sparkfun.
Dang! And I didn’t buy one of those. If you need somebody to load the boot loader for you, I can do it.
Sparkfun made big failure when they sent almost 2000 units without testing a single one but … you just need an AVR ISP programmer to flash bootloader yourself and fix the problem.
Don’t get me wrong, but my opinion is that person who is trying to make something with an AVR based board, and does not have that $3 programmer in his/her toolbox is much bigger failure than Sparkfun’s mistake.
It would probably be cheaper for Sparkfun to just ship a cheap ISP programmer to everyone that got a “defective” unit. That way everyone would have the tools to move on from “sketches” and learn real embedded programming.
This thing most likely doesn’t have a standard ISP since it’s so small, so at least half the people wouldn’t be able to flash it anyway.
http://makezine.com/2014/08/21/how-to-fix-your-broken-microview/
Looks like a 10 minute job for anyone with basic electronics skills.
ardweenies can’t solder. everybody knows that.
Honestly this is a great opportunity for hackerspaces to help out folks who aren’t up to cracking the case open themselves. It’s a chance to bring in and educate new members.
>Don’t get me wrong, but my opinion is that person who is trying to make something with an AVR based board, and does not have that $3 programmer in his/her toolbox is much bigger failure than Sparkfun’s mistake.
Don’t get me wrong, but my opinion is that person who is trying to make something with an AVR based board, and does not have that $3 programmer in his/her toolbox is just a “maker”.
FIXED
“without testing a single one”
They tested every single one. That’s what the production test rig is for. The tests just didn’t cover the specific issue with the bootloader, and they’ve upgraded the test rig to include it.
Perfect solution to a possible PR disaster, and I’m sure it will cost them less in the short term and in the long run, and gain them more loyal customers.
Received a MV this week. Within 4 or 5 hours of posting the problem, received a personal email from Marcus. Totally Class Act!! Go Sparkfun and Marcus!
conspiracy theory: item was shipped faulty (mind you: NOT DEFECTIVE, but easily fixable!) on purpose, company steps up immediately, gets coverage on popular blogs… customers and non-customers alike give heaps of kudos, company becomes “the good guys” -> marketing win mucho?
Pretty stupid way to go about it – the money they lost shipping out replacement units could have bought a bunch of advertising instead.
I don’t subscribe to the theory, but if it were true, the money was not lost on shipping, it was spent on advertising. Cheap for a good viral PR campaign probably. And it also seems to have generated plenty of goodwill according to the comments here.
You’re not saying it was true, you’re just asking questions…about how 9/11 was a hologram.
FYI: this is called “concern trolling”
Might be because I am not a native english speaker that it comes across as trolling – I did not intend to at least.
I said that _if_ the conspiracy was true, and i do not believe it is. That was not meant in sarcasm or irony.
But in the theoretical situation that it is, the expenses of mailing people a second part would be comparable to having spent the money on some other ad campaign. And it would not be expensive compared to, for example, a viral video campaign (purely my opinion).
I do notice I got the part about it succeeded at creating goodwill to sound like it was part of the conspiracy theory, what i did mean was that regardless of the reasons – they are currently enjoying a lot of goodwill as evidenced by the positive comments about their reaction.
Please do not take it any more serious than it was meant to be. It was really nothing more than a slightly silly tangent.
Ah, a missing bootloader isn’t THAT bad on an AVR. I mean, you need a programmer anyway so what’s the big deal? They could make a simple script for users to run to flash the bootloader, should be no harder than doing it to a normal arduino.
Either way, it is good publicity. Beats Apple’s response of “You are using it wrong”
“The problem is the bootloader normally connects to a PC host via USB. Enumerating a USB connection can take up to 30 seconds. That’s way too slow for volume production.”
I humbly suggest not using a Windows PC for USB enumeration tests. Udev under Linux (the default on most modern distros) never takes more than a few hundred milliseconds to enumerate. There are no excuses for not fully testing your production stuff.