Making your projects more reliable

Lead Image © Oleksiy Tsupe, 123RF.com

Patting the Dog

A watchdog timer is a great way of improving reliability for little cost in small, inexpensive computers such as the Raspberry Pi and Arduino.

Computers sometimes lose their way. A power glitch, RFI (radio frequency interference), hanging peripherals, or just plain bad programming can make your small computer hang, causing your applications to fail. It happens all the time. For example, how often do you have to reboot your PC? Not very often, perhaps, but once in a while your Mac or PC will freeze, requiring you to cycle the computer's power.

Raspberry Pis will sometimes freeze because a task does not free up sockets or consumes other system resources. Arduinos will sometimes freeze because of brownouts on the power line or a short power interruption. They might also freeze because they run out of system resources such as RAM, stack space, or both, which are very limited resources in an Arduino. Sometimes even programmers make mistakes.

With small computers, you can give your device a chance to recover from faults by using what is called a watchdog timer (WDT). A WDT is an electronic timer used to detect and recover from computer malfunctions (Figure 1).

Figure 1: Typical watchdog timer and computer setup.

If the computer fails to reset the timer on the WDT (also called "patting the dog") before the timer expires, the WDT signal is used either to initiate corrective actions or simply to reboot the computer. WDTs are critical for remote systems such as the Mars Exploration Rovers and other space probes that are not physically accessible to human operators; without a WDT, a hung system could become permanently disabled. SwitchDoc Labs uses an external WDT in the Project Curacao system [1] [2], because it is 3,500 miles away and inaccessible most of the time.

Smaller projects can also use WDTs. Just because a computer is close by doesn't mean it is convenient to reboot, and sometimes you just want a reliable system. Although WDTs can initiate actions other than a reboot, I'll just examine the reboot scenario here.

WDTs have two characteristics: how long the timer lasts before it times out (Wto) and what happens when the timer times out (Wact). Watchdog timers come in two major types: internal and external. To begin, I'll look at the internal timers in the Raspberry Pi and Arduino.
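To make the two characteristics concrete, here is a minimal sketch that simulates a watchdog in plain shell. Nothing in it touches real hardware: the function name simulate_wdt and its arguments are hypothetical, Wto is given in seconds, and Wact is reduced to printing a message.

```shell
#!/bin/sh
# simulate_wdt WTO PAT_TIME...
# Hypothetical simulation, not a real watchdog driver.
#   WTO       timeout in seconds (Wto)
#   PAT_TIME  moments, in seconds since start and in ascending order,
#             at which the computer pats the dog
# Prints the simulated Wact: the time at which the WDT would fire,
# or a note that every pat arrived in time.
simulate_wdt() {
    wto=$1
    shift
    last_pat=0                          # the timer starts counting at t=0
    for t in "$@"; do
        if [ $(( $t - $last_pat )) -ge "$wto" ]; then
            # The pat came too late: the timer expired first.
            echo "WDT fires at t=$(( $last_pat + $wto ))s: reboot"
            return 0
        fi
        last_pat=$t                     # pat arrived in time, timer restarts
    done
    echo "all pats in time, last at t=${last_pat}s"
}

simulate_wdt 16 4 8 12 16   # prints: all pats in time, last at t=16s
simulate_wdt 16 4 8 30 34   # prints: WDT fires at t=24s: reboot
```

In the second call, the computer "hangs" between t=8 and t=30, so the 16-second timer that was restarted at t=8 expires at t=24.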

Rasp Pi Internal Watchdog

The BCM2835 system on a chip that powers the Raspberry Pi has an on-board WDT with a 20-bit counter that counts down every 16µs, for a maximum Wto of about 16 seconds (2^20 x 16µs is roughly 16.8 seconds). This means you have to write to the internal WDT at least once every 16 seconds or the WDT will fire. To load the internal watchdog kernel module, run:

sudo modprobe bcm2708_wdog

Now run lsmod and look for the bcm2708_wdog line:

Module                  Size  Used by
bcm2708_wdog            3537  0

This verifies that the watchdog module was loaded successfully. Now modify /etc/modules to load the module on boot by running:

echo bcm2708_wdog | sudo tee -a /etc/modules

Then, use the watchdog(8) daemon to pat the dog:

sudo apt-get install watchdog chkconfig
sudo chkconfig watchdog on
sudo /etc/init.d/watchdog start

The watchdog daemon requires configuration on the Raspberry Pi. You need to modify /etc/watchdog.conf to contain the following lines only:

watchdog-device = /dev/watchdog
watchdog-timeout = 14
realtime = yes
priority = 1
interval = 4

The last line pats the dog every four seconds. Finally, enter:

sudo /etc/init.d/watchdog restart

at the command line to set up the internal Raspberry Pi watchdog.
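If you are curious what patting the dog looks like at the device level, the following is a minimal sketch, not what watchdog(8) actually does internally. It relies on the generic Linux watchdog device interface, in which opening the device arms the timer and any write to it counts as a pat. The function name pat_dog and its arguments are my own invention, and I have not verified how the bcm2708_wdog driver handles the final "magic close" character.

```shell
#!/bin/sh
# pat_dog DEVICE INTERVAL COUNT
# Hypothetical helper: keeps DEVICE open and writes one byte to it
# COUNT times, pausing INTERVAL seconds between writes.
pat_dog() {
    device=$1
    interval=$2
    count=$3

    # Open the device once on file descriptor 3 and keep it open.
    # On a real watchdog device, opening it arms the timer.
    exec 3>"$device"

    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3                  # any write counts as a pat
        i=$(( $i + 1 ))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"           # must stay below the WDT timeout
        fi
    done

    # Drivers that support "magic close" disarm the timer only if a 'V'
    # is written just before the device is closed (assumption: the
    # driver in use supports it and was not built with "no way out").
    printf 'V' >&3
    exec 3>&-
}
```

You can try the function safely on a scratch file first, for example pat_dog /tmp/fake_wdt 1 3, which leaves "...V" in the file. Pointing it at /dev/watchdog requires root, and the device can normally be held open by only one process, so the watchdog daemon would have to be stopped first. Be aware that if the loop dies before the final write, the Raspberry Pi should reboot, which is exactly the point.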

Testing the Rasp Pi Watchdog

Now, to test the internal watchdog, create a file called forkbomb.sh. Put the following commands in the file,

#!/bin/bash
swapoff -a
:(){ :|:& };:

and execute it with:

sudo bash -x forkbomb.sh

The fork bomb works as follows: The script defines a function named ":" whose body calls the function twice, joined by a pipe, and puts that pipeline in the background; the final ":" then invokes the function for the first time. Each new call spawns even more calls to ":". This leads rapidly to an explosive use of system resources, slowing response to a halt and killing the ability of the Raspberry Pi to pat the WDT. If you don't turn swap off, the fork bomb has to fill that also, which makes the bomb much, much slower. See also the "Raspberry Pi Watchdog Problems" box.
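The one-liner is easier to read with the function renamed. The sketch below spells it out under the hypothetical name bomb and deliberately never calls it, so the listing is harmless to run as shown. The helper procs_after is also my own addition and models only an idealized case in which every call is replaced by exactly two new ones.

```shell
#!/bin/sh
# Readable form of  :(){ :|:& };:  with ":" renamed to "bomb".
# The function is only defined here, never called.
bomb() {
    # Start two more copies of this function joined by a pipe, and
    # background the pipeline (&) so the caller returns immediately.
    bomb | bomb &
}
# bomb    # the one-liner ends by calling the function; left commented out

# Idealized growth: if every call is replaced by two new ones, the
# number of live processes doubles with each generation.
procs_after() {
    generations=$1
    procs=1
    while [ "$generations" -gt 0 ]; do
        procs=$(( $procs * 2 ))
        generations=$(( $generations - 1 ))
    done
    echo "$procs"
}

procs_after 10    # prints 1024
procs_after 20    # prints 1048576
```

Twenty doublings already exceed a million processes in this model, which is far more than the Raspberry Pi can schedule, so the system stops responding long before that point.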

Raspberry Pi Watchdog Problems

The internal watchdog has several problems.

1. The internal watchdog does NOT power cycle the system: It reboots the Raspberry Pi. This means it does not restart in all conditions, especially in low-power/brownout conditions often experienced with solar-powered systems.

2. If the Raspberry Pi takes longer to boot up than 14 seconds, the watchdog can fire, which puts the Raspberry Pi in an infinite bootup sequence. This can happen; I have done it.

3. If you halt the Raspberry Pi (sudo shutdown -h now), the Raspberry Pi will never reboot. If your program does this by accident, you are finished.

4. I have found the internal watchdog to be unreliable. I never could track it down, but it feels like some kind of conflict between userspace and kernel space.

5. In some situations the Pi becomes unresponsive, but the heartbeat might still occur (e.g., high load situations).

6. The internal watchdog is not completely independent of the Raspberry Pi. Theoretically, this should not matter, but the Raspberry Pi running Linux is a complex system.
