Arduino Watchdog Has Bite And Doesn’t Need Treats

My dog Jasper isn’t much of a watchdog: he’s too interested in sleeping and chasing my cats to keep an eye on things. Fortunately, [Vadim] has come up with a more reliable alternative with this simple Arduino watchdog. It’s designed to work with crypto coin mining rigs, but it could be easily adapted for other high-uptime uses, such as file servers or doomsday weapons.

The way it works is simple: a small program on the watched computer sends a command over the serial port: a polite “hello”. The Arduino watchdog picks this up and responds with an equally polite “HELLO”. That starts the watchdog running. A simple Java program on the watched computer then sends a ping every five seconds over the serial port to let the watchdog know it is still running okay.

If the watchdog doesn’t receive this ping, it uses a reed relay wired into the reset pins of the computer to trigger a reset. It then waits for the watched computer to say hello, starting the process again.
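The handshake-then-ping cycle described above can be sketched as a small state machine. This is a simulation in Python rather than [Vadim]'s actual firmware: the `Watchdog` class, the `"ping"` message text, and the 15-second timeout are assumptions made for illustration (the article only gives the "hello"/"HELLO" exchange and the five-second ping interval).

```python
WAITING = "waiting"  # idle until the host says hello
ARMED = "armed"      # expecting regular pings


class Watchdog:
    """Hypothetical model of the watchdog logic; not the real firmware."""

    def __init__(self, timeout_s=15.0, reset=lambda: None):
        self.timeout_s = timeout_s  # assumed value: three missed 5 s pings
        self.reset = reset          # stands in for pulsing the reed relay
        self.state = WAITING
        self.last_ping = None

    def on_message(self, msg, now):
        """Handle one serial message; return the reply, if any."""
        if self.state == WAITING:
            if msg == "hello":
                self.state = ARMED
                self.last_ping = now
                return "HELLO"
            return None  # pings are ignored until the handshake is done
        if msg == "ping":  # assumed ping text
            self.last_ping = now
        return None

    def tick(self, now):
        """Call periodically; returns True if a reset was triggered."""
        if self.state == ARMED and now - self.last_ping > self.timeout_s:
            self.reset()
            self.state = WAITING  # wait for the rebooted host to say hello
            return True
        return False


resets = []
wd = Watchdog(reset=lambda: resets.append("reset"))
reply = wd.on_message("hello", now=0.0)  # handshake arms the watchdog
wd.on_message("ping", now=5.0)           # host is alive
fired = wd.tick(now=30.0)                # 25 s of silence: reset fires
```

Passing the time in as `now` keeps the sketch easy to test. On real hardware the same logic would read a clock such as `millis()` inside the main loop.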

[Vadim] includes a demo video where the system resets an unreliable crypto mining rig. It does have limitations, of course: if the mining program crashes without taking down the entire computer, the watchdog won’t be triggered, and it won’t work if the problem requires a full hard power reset rather than a soft reset. It’s a neat little build that could be easily modified to handle all these issues, though, and you don’t need to keep feeding it treats to keep its attention, unlike Jasper.

17 thoughts on “Arduino Watchdog Has Bite And Doesn’t Need Treats”

  1. Clever. It’s automated. We could have used that.

    Years ago I worked at a place with a remote PC running ads (think community classifieds) on a public TV network. Every now and then the computer would freeze and either get stuck on one ad or display nothing. The computer was in the transmission tower. Miles away. Uploading ads was done remotely via modem – dial the computer, upload new ads, hang up. So a clever daughter card / watchdog was installed – if the modem didn’t answer after a bunch of rings, it was assumed the computer was hung, and the board would activate the reset switch rebooting the computer.

    So, whenever someone called complaining the ads weren’t working, we’d just call the computer, let it ring a bunch of times and a few minutes later the ads would start cycling again.

    1. Should copy NASA and use 5 Arduinos. Have 1 stay on backup and 4 working together. If something goes wrong and all 4 don’t agree in sync, disable the offending one, and run with 3. If something goes wrong again and all 3 don’t agree in sync, disable the offending one, and run with 2. If something goes wrong again and the 2 don’t agree in sync, disable both and activate the backup.

      AFAIK NASA never needed the 5th backup

  2. It’s not necessary to use a relay to isolate the reset line. It’s pulled up to Vcc on the motherboard, and all you need to do is ground it for a moment.

    The Arduino (or other reset-generating device) can be happily powered from the PC’s 5v line, or 5v-standby line, which means its ground will be at the same potential as the rest of the PC. Then just use a single transistor as a low-side switch to pull the reset line to ground. (Might even be able to do it with a GPIO pin itself; I don’t know how much current you need to sink to do the pulling-down.)

    I opted not to implement a watchdog but a remote “magic packet” reset, here:

    This saves you from needing a relay and its associated coil snubber diode, although in TFA it looks like they’re not using a snubber, counting on the GPIO’s own protection diodes to save it. That’s risky business but the coil on that relay is pretty small so it might be fine. Also it hopefully doesn’t switch too often, so pulses should be rare!

      1. A lot of people have heard about the inductive kickback and recommend diodes. If you don’t know what you’re doing, it’s a safe approach that will keep things running even if the diode wasn’t necessary.

        There are micro steppers that are meant to be driven directly from GPIO pins: They take max 20mA. There are bunches of respected people who recommend using freewheeling diodes or even a stepper driver. After a discussion on a forum I decided to put things to a test. I promised I would do 1 million steps of the stepper but ended up killing the experiment at 5 million. Cheap Chinese Arduino survived, cheap Chinese stepper survived.

        So…. Those freewheeling diodes in the AVR processors can take 10mA continuous (Alas, that’s not in the datasheet!). They can also take a certain amount of instantaneous energy (check the datasheet’s ESD specs for the number; you need the 0.5 C U^2 formula). If you can do the math and remain below “long term max power” and below “instantaneous power dump” then things are fine. If the math is too hard for you, you need to pay the tax: An occasional diode that might not have been necessary.

        One of the reasons that 10mA is not in the datasheet is that you can blow things up if you’re not careful. When that 10mA is coming from, say, a 12V power supply nearby, and you have two or three pins sinking that 10mA, you can easily start feeding the 5V rail with more current than is being consumed…. That’s when you get into trouble.

        But if you drive something between 5V and your GPIO, the kickback current cannot raise the 5V line: It is consuming as much as it is kicking back. And if you’re driving something between 5V and your GPIO, when the load is supposed to be off, drive the pin HIGH. The diode won’t even come into play! (In the old days this configuration with pin high = device off was common: drive-low strength was WAY more than drive-high. On an AVR the difference is small. Actually, either way is fine nowadays.)
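The "do the math" step in the comment above might look like the following rough sketch. It compares the energy stored in an ESD test capacitor (the 0.5 C U^2 formula from the comment) with the energy stored in a coil at switch-off (0.5 L I^2). Every number here is a placeholder assumption, not a datasheet value: substitute the ESD rating of your own part and the measured inductance and current of your own relay or stepper coil.

```python
def capacitor_energy_joules(capacitance_f, voltage_v):
    """Energy stored in a capacitor: E = 0.5 * C * U^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def inductor_energy_joules(inductance_h, current_a):
    """Energy stored in a coil carrying a current: E = 0.5 * L * I^2."""
    return 0.5 * inductance_h * current_a ** 2


# ASSUMED placeholder values, for illustration only.
ESD_CAPACITANCE_F = 100e-12  # assumed ESD test capacitor
ESD_VOLTAGE_V = 2000.0       # assumed ESD rating; check your datasheet
COIL_INDUCTANCE_H = 0.1      # assumed coil inductance; measure yours
COIL_CURRENT_A = 0.010       # assumed coil current at switch-off

esd_energy = capacitor_energy_joules(ESD_CAPACITANCE_F, ESD_VOLTAGE_V)
coil_energy = inductor_energy_joules(COIL_INDUCTANCE_H, COIL_CURRENT_A)

print(f"ESD test energy: {esd_energy * 1e6:.1f} uJ")
print(f"Coil energy:     {coil_energy * 1e6:.1f} uJ")
print("Coil energy below ESD test energy:", coil_energy < esd_energy)
```

This only covers the "instantaneous" half of the commenter's rule of thumb. The continuous-current limit still has to be checked separately, and a diode remains the safe default when the numbers are unknown.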

        1. Yup, automotive gauge-clusters with MCU-driven needles often use those tiny steppers directly from GPIO, they work great. If you’re making a million of something, it’s worth doing the math AND the experiments to confirm that the specs weren’t fiction.

          But for a one-off, especially with chips that might be gray-market, and specs that aren’t in the datasheet, and for hardware that’s specifically designed to pull you out of a pickle when you’re not present to handle the situation personally? Yeah, the diode is cheap insurance.

  3. Also, you might add a software watchdog on the computer itself, in case a program fails but not the entire computer. Like services on Windows? Pretty simple to implement, and on Unix I think there might be some excellent ones.

    1. That’s an excellent idea. Say, have the watchdog program actually check that some other part of the machine is functioning (some output file is being regularly touched, the network interface is still up, whatever) as a condition of petting the dog. If any of those checks fail or hang, the watchdog doesn’t get its attention, and eventually reboots the machine.
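The idea in this reply, petting the dog only when health checks pass, can be sketched in a few lines. The function names (`file_is_fresh`, `should_pet`), the file path, and the 60-second age limit are hypothetical; the actual serial write is left as a comment because the ping format belongs to the original project.

```python
import os
import time


def file_is_fresh(path, max_age_s, now=None):
    """True if `path` exists and was modified within the last `max_age_s` seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) <= max_age_s
    except OSError:
        return False  # a missing or unreadable file counts as a failed check


def should_pet(checks):
    """Run every check; any False result or exception means 'do not ping'."""
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            return False
    return True


# Hypothetical check list: the path and age limit are placeholders.
checks = [lambda: file_is_fresh("/tmp/miner.heartbeat", max_age_s=60.0)]

if should_pet(checks):
    pass  # send the ping over the serial port here
```

A check that hangs needs no special handling in this layout: the loop never reaches the ping, so the hardware watchdog times out and resets the machine on its own.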
