First, it was the WiFi router: my ancient WRT54G that had given me nearly two decades of service. Something finally gave out in the 2.4 GHz circuitry, and it would WiFi no more. Before my tears could dry, our thermometer went on the fritz. It’s one of those outdoor jobbies that transmits the temperature to an indoor receiver. After that, the remote for our office lights stopped working, but it was long overdue for a battery change.
Meanwhile, my wife had ordered a new outdoor thermometer, and it too was having trouble keeping a link. Quality control these days! Then, my DIY coffee roaster fired up once without any provocation. This thing has worked quasi-reliably for ten years, and I know the hardware and firmware as if I had built them myself – there was no way one of my own tremendously sophisticated creations would be faulty. (That’s a joke, folks.) And then the last straw: the batteries in the office light remote tested good.
We definitely had a poltergeist, a radio poltergeist. And the root cause would turn out to be one of those old chestnuts from the early days of CMOS ICs – never leave an input floating that should have a defined logic level. Let me explain.
The WRT54G was the hub of my own home automation system, an accretion of ESP8266 and other devices that all happily speak MQTT to each other. When it went down, none of the little WiFi nodes could boot up right. One of them, described by yours truly in this video, is an ESP8266 connected to a 433 MHz radio transmitter. Now it gets interesting – the thermometers and the coffee roaster and the office lights all run on 433 MHz.
Here’s how it went down. The WiFi-to-433 bridge failed to connect to the WiFi and errored out before the part of the code where it initialized GPIO pins. The 433 MHz transmitter was powered, but its digital input was left flopping in the breeze, causing it to spit out random data all the time, with a pretty decent antenna. This jammed everything in the house, and apparently even once came up with the command to turn on the coffee roaster, entirely by chance. Anyway, unplugging the bridge fixed everything.
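The ordering problem can be sketched in a few lines. This is a plain-Python simulation, not the bridge's actual firmware (which isn't shown here); `FakePin`, `connect_wifi`, and the two `boot_*` functions are hypothetical names made up for illustration.

```python
class FakePin:
    """Stand-in for a GPIO line; it is 'floating' until something drives it."""

    def __init__(self):
        self.state = "floating"

    def drive_low(self):
        self.state = "low"


class WifiError(Exception):
    pass


def connect_wifi(available):
    # Hypothetical network setup that fails when the router is dead.
    if not available:
        raise WifiError("no access point")


def boot_network_first(tx_data, wifi_available):
    """The failing order: network setup runs before the pin is configured."""
    try:
        connect_wifi(wifi_available)
        tx_data.drive_low()
    except WifiError:
        pass  # errored out, so the GPIO init above was never reached
    return tx_data.state


def boot_pins_first(tx_data, wifi_available):
    """The safer order: park the transmitter input, then try the network."""
    tx_data.drive_low()
    try:
        connect_wifi(wifi_available)
    except WifiError:
        pass
    return tx_data.state


print(boot_network_first(FakePin(), wifi_available=False))  # floating
print(boot_pins_first(FakePin(), wifi_available=False))     # low
```

Even with the second ordering, the pin still floats between power-on and the first line of code, which is the gap a physical pull-down covers.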
This was a fun one to troubleshoot, if only because it crossed so many different devices at different times, some homebrew and some commercial, and all on different control systems. Until I put it together that everything on 433 MHz was failing, I hadn’t even thought of it as one event. And then it turns out to be a digital electronics classic – the dangling input!
Anyway, hope you enjoyed the ride. And spill some copper for the humble pull-down resistor.
Interesting…
Thanks!
I would like to get back to working with SDR/Spectrum analysis to see RF signals in the home.
Ok, this one I definitely enjoyed. Some great reminders here.
Pull-ups or a software fix? Initialize those GPIOs before trying to set up the WiFi.
This is always something to think about with any kind of control system: what is the after-reset state of the various signals? Is it acceptable? Is it OK even for the short duration before software gets running and has a chance to configure things? Often it is, but if not, then pull-ups and perhaps inverters may be in order.
Yes, configuring IOs should always be the first thing a micro does.
Every pin shall have a default state.
However, even this can take a few tens of µs, and this can be enough for some applications to be dangerous. This is where external pull-ups/downs are good to have. (Though they are more or less always good to have. But good luck telling that to the cost-cutting higher-ups.)
Couple of things:
First, instead of relying on CPU initialization and/or a pull-up resistor to stabilize the data input to the RF module, the CPU should have to actively turn on DC power to the RF module under program control, only after the data is in a sane state.
Second, watchdog timers are your friend in RF applications. Use two timers. In the old days, we used the two halves of a 74LS123, but you could use two NE555s or a CMOS 556 or whatever. One timer should be tied to a CPU interrupt input and, say, set to trigger an interrupt routine every 100 ms. The interrupt routine should generate an output tied to reset both timers. The second timer should have a time constant of, say, 200 ms. Under normal circumstances, as long as the CPU and firmware retain sanity, the second timer will never time out. The second timer should trigger a power cycle of the system. It is tempting to just hit the reset pin of the CPU, but it is better to add circuitry which can do a system power cycle. CPUs can and do get confused enough that they ignore their reset pins.
Third, a spectrum analyzer would have been handy in troubleshooting the situation. One RF device failing is a fluke, but when the second and third devices go south in short order, you’re probably looking at an EMI/EMC issue. You could use an RTL dongle as an SDR with a PC spectrum display, but I’d recommend getting hold of the TinySA, which is a hand-held stand-alone device that is much handier for pinpointing miscreant RF sources than schlepping a laptop and dongle around. The TinySA is under a hundred bucks and a very handy tool to have in your RF sleuthing toolbox.
Yes, being able to toggle downstream devices is a good solution.
And same for watchdog timers.
Likewise, one can rely on external logic if the downstream device is more independent in its operation, but spurious inputs can still be devastating.
However, triggering interrupts for resetting the watchdog timer isn’t ideal for ensuring that the processor works.
The interrupt can trigger nicely at the given interval and reset the watchdog, but the main code can be stuck in a loop being completely unresponsive while the thing it controls is dangerous as hell.
The main program loop should have a somewhat known cycle time (preferably constant), and the watchdog timeout should be a bit longer than the longest cycle time. (Usually twice is safe, but more can be needed.) And then our watchdog timer gets reset by the main loop. Even this isn’t foolproof, but at least we know our main code isn’t stuck, and if we have other safety checks in software, these should at least still work unless they have a bug.
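To make the difference between the two feeding strategies in this thread concrete, here is a small simulation in plain Python. Everything in it is an illustrative assumption: the `Watchdog` class, the `simulate` function, the 100 ms cycle, and the 200 ms timeout (twice the cycle, as suggested above) model the idea only, not any particular chip's watchdog peripheral.

```python
class Watchdog:
    """Trips if it has not been fed for longer than timeout_ms."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_fed_ms = 0

    def feed(self, now_ms):
        self.last_fed_ms = now_ms

    def expired(self, now_ms):
        return now_ms - self.last_fed_ms > self.timeout_ms


def simulate(feed_from, hang_at_ms, run_ms=1000, cycle_ms=100):
    """Step a 1 ms clock and return when the watchdog trips, or None.

    The main loop finishes a cycle every cycle_ms until it hangs at
    hang_at_ms. The timer interrupt keeps firing every cycle_ms regardless,
    because a stuck main loop does not stop interrupts.
    """
    wd = Watchdog(timeout_ms=2 * cycle_ms)
    for now in range(1, run_ms + 1):
        main_alive = now < hang_at_ms
        if now % cycle_ms == 0:
            if feed_from == "interrupt":
                wd.feed(now)  # fed even though the main loop may be stuck
            elif feed_from == "main" and main_alive:
                wd.feed(now)  # fed only while the main loop still cycles
        if wd.expired(now):
            return now
    return None


print(simulate("interrupt", hang_at_ms=350))  # None: the hang goes unnoticed
print(simulate("main", hang_at_ms=350))       # 501: last fed at 300 ms
```

In this model the interrupt-fed watchdog never notices the hang, while the loop-fed one trips just over 200 ms after the last completed cycle.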
Definitely should be configuring IO before networking. That would have nipped the problem in the bud. Lesson learned.
But the lesson that I already knew — initializing pin states physically rather than in code — is definitely the right thing to do when there are e.g. big motors or anything with “power” attached.
Especially with something like an ESP8266 or RP2040, where the chip takes flash from an external SPI device, startup is very very fast on a human scale, but not instant on an electronic scale. I bet it’s milliseconds rather than microseconds. (Why is my oscilloscope staring at me like that?)
Your scope clearly wants you to program a simple piece of software that initializes a pulled-up pin as a low output, to see how long it takes from power-on to pulling down. And then check which bank is fastest to respond, how it gets affected by toggling more inputs and outputs, etc.
But order of operations is important in embedded projects. It is important to reach “default state” quickly. And if the micro itself isn’t fast enough, then external components will be required.
But yes, if the thing under control can be dangerous, more care has to be taken.
Back in the 1990’s a friend moved into a new home. One of his TV channels was a mess, mostly horizontal lines. He knew I was an “RF guy” and asked me to help. Found that his Stanley garage door opener (installed by the prior owner) was the source.
Turns out the garage opener’s LM7812 VReg power supply was designed wrong. It was missing the stability cap on the input pin. No place on the PCB for it; the engineer simply ignored the requirement. Added a 0.47 µF cap and fixed it.
And a couple years ago all my 433 MHz devices went off-line. Car fobs stopped working when near my house, the home security alarm wouldn’t arm, and my lawn sprinkler went into rain mode even though it was sunny weather.
Using an RF network analyzer I found the culprit. It was caused by a low battery that powered a La Crosse temperature/humidity sensor that was mounted outside under the roof eaves. A new battery fixed it, no more RF mess.
Long story short, RF goblins are lurking all around us. Sometimes they get angry and come out to haunt you.
That reminds me of an experience with a 7812, an East German model in a TO-3 case. Decades ago I tried it in a sloppy test circuit, not yet knowing how important these caps are. When I tried it, my soldering iron’s temperature display (an LED bar) went out and the iron started heating. I even noticed that it displayed a higher-than-setpoint temperature after I tested this regulator.
It turned out that the regulator was happily oscillating in the shortwave region at 2-3 MHz. That disturbed the temperature measurement circuit and caused the soldering station to heat.
Interesting that the architecture is open loop yet capable of turning on something that could possibly start a fire.
From what I’ve seen, most 433 MHz systems are unidirectional, so they cannot just do a simple “handshake”… For me this story is also a good reminder not to rely on just the basic “minimum-size command message + parity bit” in homemade wireless systems (although I’m not saying that’s what OP was doing, their original roaster post talking about WiFi control), especially when they control a device on the mains or a heating element ^^! Without going for full rolling-code encryption, I guess just adding a CRC check and, for some commands, requiring a second confirmation message (one that refers uniquely to the command message and has to be sent in a timely manner) would add a bit of safety.
But, if I had a remote-controlled coffee roaster, I wouldn’t mind going all the way to encryption and rolling key, just to avoid being spoofed and messed with by a neighborhood nemesis who doesn’t like my amateur coffee-roasting experimental fragrances. =)
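The CRC-plus-confirmation idea from this comment could look roughly like the sketch below. It is only an illustration: the frame layout, the `C`/`K` type bytes, the 2-second window, and the `ROAST_ON` command are all invented here and have nothing to do with the roaster's real protocol. The CRC is the 16-bit CRC-CCITT from Python's standard `binascii.crc_hqx`.

```python
import binascii
import struct


def make_frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-16 so that random noise rarely validates."""
    return payload + struct.pack(">H", binascii.crc_hqx(payload, 0xFFFF))


def parse_frame(frame: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    if len(frame) < 3:
        return None
    payload = frame[:-2]
    (crc,) = struct.unpack(">H", frame[-2:])
    if binascii.crc_hqx(payload, 0xFFFF) != crc:
        return None
    return payload


class Receiver:
    CONFIRM_WINDOW_S = 2.0  # assumed value, purely for illustration

    def __init__(self):
        self.pending = None  # (command, CRC bytes of its frame, arrival time)

    def on_frame(self, frame: bytes, now_s: float):
        """Return a command only once a matching, timely confirmation arrives."""
        payload = parse_frame(frame)
        if payload is None:
            return None
        if payload[:1] == b"C":  # command frame: remember it, do not act yet
            self.pending = (payload[1:], frame[-2:], now_s)
            return None
        if payload[:1] == b"K" and self.pending is not None:
            command, tag, t0 = self.pending
            self.pending = None
            # The confirmation must quote the command frame's CRC and be on time.
            if payload[1:] == tag and now_s - t0 <= self.CONFIRM_WINDOW_S:
                return command
        return None


rx = Receiver()
cmd = make_frame(b"C" + b"ROAST_ON")
confirm = make_frame(b"K" + cmd[-2:])
print(rx.on_frame(cmd, now_s=0.0))      # None: waiting for confirmation
print(rx.on_frame(confirm, now_s=0.5))  # b'ROAST_ON'
```

A 16-bit CRC alone lets a random frame through about once in 65,536 tries, so against a transmitter spewing noise around the clock, the confirmation step is what does most of the work. None of this stops deliberate spoofing; that still needs the rolling code.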
Had a somewhat similar experience with my home automation setup, caused at the time by an RF-enabled window contact that went haywire due to a low battery… It can be very tricky to find the culprit.
That’s why I build most of my stuff. So I can pull down/up unused pins.
They talked about “unintentional emissions” in school when I was 11 or 12 or so. I couldn’t have imagined one ending up all around the house, though. How humiliating and awkward.
How is this in any way humiliating?
No one tell him.
I guess nobody had that conversation with you…
The comment is almost asking about itself.
I have long debated if a microcontroller should have its IO pins in a default state. Either dedicated EEPROM cells per IO pin, or fuses. So that we can define default state ourselves and have it be there the very instant the micro gets power.
Now, often the micro boots fast enough for this to not be an issue. But sometimes it is an issue to have a pin floating for even a few tens of µs.
Now, if something can fail dangerously, then it is our responsibility to design it safe to begin with. Pull-ups/downs are a trivial solution; sometimes a logic chip can help. But these add to our BOM cost, which is why cost-cutting companies tend to “miss” these safety-critical components. A micro where we can define default state would remove this need. (And if the micro fails, then it can likewise be dangerous regardless of pull-ups/downs.)
Indeed, when developing firmware for a microcontroller, one of the first things I do is to initialize pins to their normal, idle states. But that sometimes isn’t fast enough, or simply doesn’t occur if the controller is held in its reset state, or is being programmed in-circuit (all I/O pins are inputs, and flop in the breeze).
Since I also design the boards, I consider what would happen to the remaining circuits if the MCU were to not be installed. Inputs to the MCU can generally be ignored, unless they also affect something else. Outputs from the MCU almost always need to be pulled to a known, safe, state.
This is NOT a case of unintentional emissions; it is a case of intentional emissions which are unwelcome. Unintentional emissions are what happens when a switching circuit or something which is NOT intended to transmit any radio emissions does so via an accidental antenna structure in the wiring/boards. This device contained a 433 MHz transmitter (either a module, or a chip with associated circuitry designed to deliberately make a 433 MHz signal), so the emission was perfectly intentional; it was just being turned on when it wasn’t wanted.
I _knew_ someone was going to come in and ham-lawyer me on this one.
Of course you’re right. But now you have to come up with a better title. :)
Unintentionally transmitted data, or short, unintentional transmissions?
I guess unintentional intentional emissions didn’t quite roll off the tongue right.
The funny ham radio story in our area was about a water pump in a sunken boat at the harbor that was still powered from land. It would turn on every once in a while when contact was made, and cause spurious emissions that jammed GPS in a 5 mile radius.
Back in the days of dial-up internet, my speeds would significantly drop every night just after dusk for about an hour. No one else in the neighbourhood had this problem, just me. The local phone company was baffled too. By chance I was coming home at night and saw that our streetlight (on the same pole as our telephone connections) was out, and it suddenly flickered on. Found out that the ballast was failing and, sure enough, while it was trying to light, it sent out all kinds of RF that disrupted the copper phone wires. Had the streetlight fixed, phone problems solved. Weird that it never seemed to have caused audio interference.
I have received AM radio on a telephone with corroded contacts on the plug making a rudimentary diode.
That’s how dental work becomes an AM radio too.
A friend had an idiot neighbor with a CB and linear. You could hear him talk on the microwave (as well as everything else with a speaker).
Uncle Charlie didn’t care. Not one bit.
Ultimately ninjaed a pin through his coax, tightened him right up, would have liked to see the smoke get out.
Back in the dialup era, I was troubleshooting a client’s brand new PC. The modem would not connect, but the one in his old computer would. By chance, a telco guy showed up to replace a broken phone jack. I asked him to bring in his line tester to check the phone line. Wasted more time arguing with him about the need for testing it because the modem wouldn’t connect than it took him to just get the tester from his truck and plug the damn thing in.
TADA! Voltage way too high and something else wonky! The problem wasn’t in the apartment but in the distribution box for the whole building. Not bad enough to affect voice calls or many older 33.6K modems but played merry hell with 56K.
So he had to call in to get someone to come out and actually fix the problem because it was out of specification. The telco still didn’t care that the problem caused malfunctions in some non-voice equipment, they were only fixing it to meet the government regulations.
Once it was repaired the new PC could get online, and I suspect the voice call quality and internet for everyone in the building was improved.
Years ago I worked for a small ISP and CLEC telco (Competitive Local Exchange Carrier). We had a sudden outage in a bunch of fiber lines. Turned out that a small squirrel had squeezed itself down the plastic line cover on a wooden pole near the office. Once it got to the junction at the bottom of the pole it couldn’t turn around, so it chewed the large fiber bundle to pieces. The new hire, on his third day, had been told he’d never have to touch anything with fiber. He took the cover off the box and got a very angry squirrel surprise. Then he was tasked with learning how to correctly trim and splice a large fiber optic cable. Part of the repair included putting something in the end of those line covers on the poles to keep out the wildlife.
This, folks, is how civilization will come to an end someday.
Some little embedded thing will fail, with unpredictable side effects, which will cascade to something important.
I might even still be alive to see it, on January 19, 2038.
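For anyone wondering where that date comes from: it is the last second a signed 32-bit Unix timestamp can represent. A quick Python check follows; the function name `last_int32_second` is just made up for this sketch.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def last_int32_second():
    """Last moment representable as a signed 32-bit count of seconds since 1970."""
    return datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)


print(last_int32_second().isoformat())  # 2038-01-19T03:14:07+00:00
```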
According to ALF, a boating accident may have caused the destruction of MelMac.
Having a hand-held spectrum analyzer would have greatly simplified troubleshooting a problem like this. The unit called TinySA is super convenient because it is palm-sized with an LCD display. They’re very affordable and readily available on your favorite shopping web site. Anyone who works with RF should have one.
I thought about pulling out the trusty RTL-SDR dongle, but by the time I had gotten to thinking it could be a jamming problem, I already knew the frequency it was jamming on, and that led me straight to the culprit.
Still might be fun to see what the “data” looks like.
What a lovely – and typical – story! Thanks for sharing, it is so nice to see I’m not alone :-) I have seen such “unrelated failures” many times, of course not only relating to 433, and I am sure many other times I might have cured something but missed that true source of troubles.
If there’s something strange…
The Official Motto of the State of California.
I had issues with the ESP8266 itself. Dunno if it was because of the old SDK or whatever, but it would go into a frenzy after a few hours or days of work and slow my WiFi down to a crawl. I could measure my laptop speed in kilobytes (WLAN to LAN) even with it being closer to the access point.
This was back when the ESP8266 had first shown up here as the next big thing, with the only issue being the lack of documentation, and boy, were you guys right.
I was taken by how many commented problems resolved to issues with batteries.