It is easy to think that a Linux shell like Bash is just a way to enter commands at a terminal. But, in fact, it is also a powerful programming language, as we’ve seen from projects ranging from web servers to simple utilities that make dangerous commands safer. As with most programming languages, though, there are multiple layers of complexity. You can spend a little time and get by, or you can invest more time, learn about the language and, hopefully, write more robust programs.
Signals
If you are running a Linux program, even a shell script, it is subject to receiving a signal under certain conditions. For example, SIGINT, or signal 2, is what your program receives when you press Control+C. There are plenty of other signals, though. A very few signals, like signal 9 (SIGKILL), will terminate your program no questions asked, and you can’t stop it. But most of the other signals can be caught. You can either ignore them or take some action.
Some signals come from the system. Here’s a list of common signals and their numbers:
- 1 – SIGHUP (Hangup)
- 2 – SIGINT (Interrupt; Control+C)
- 3 – SIGQUIT (Quit)
- 4 – SIGILL (Illegal instruction)
- 9 – SIGKILL (Kill)
If you want to see a long list, try trap -l from the command prompt. My system lists 64 different signal names.
You can use the kill or killall command to send signals to processes:
kill -1 4234
killall -9 emacs
kill -SIGHUP 3152
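To see what happens on the receiving end, here is a minimal sketch that signals a process the script starts itself. The `sleep 30` target is only a stand-in chosen for illustration, not something from the examples above. The sketch relies on the shell convention that a child killed by a signal reports an exit status of 128 plus the signal number.

```shell
#!/bin/bash
# Start a disposable background process to act as the signal target.
# (sleep 30 is just a stand-in; any process you own would do.)
sleep 30 &
pid=$!

# Send SIGTERM (signal 15) by name; "kill -15 $pid" is equivalent.
kill -TERM "$pid"

# wait reports 128 + signal number when the child died from a signal.
status=0
wait "$pid" || status=$?
echo "sleep exited with status $status"

# kill -l translates such an exit status back into a signal name.
signame=$(kill -l "$status")
echo "that corresponds to SIG$signame"
```

Numbers and names are interchangeable on the kill command line, but names are easier to read and do not depend on the platform's numbering.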
In addition to the standard signals, Bash has a few special ones, too. Here’s a list, but you should check out the Bash manual under trap to get the details:
- EXIT – When the shell exits
- ERR – When an error occurs (see the Bash manual for specifics)
- RETURN – When a shell function or sourced script finishes
- DEBUG – Before each command executes
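As a rough illustration of the EXIT trap, the sketch below uses it to clean up a temporary file. The function name `cleanup_demo` and the variable `scratch` are made up for this example. The function body runs in a subshell (note the parentheses), so the trap fires when that subshell finishes rather than when the whole script ends.

```shell
#!/bin/bash
# Hypothetical helper: the body is a subshell, so its EXIT trap
# runs as soon as the subshell is done.
cleanup_demo() (
    scratch=$(mktemp)
    # Remove the scratch file however the subshell ends.
    trap 'rm -f "$scratch"' EXIT
    echo "working data" > "$scratch"
    # Report the path so the caller can check on it afterwards.
    echo "$scratch"
)

path=$(cleanup_demo)

if [ -e "$path" ]; then
    echo "$path is still there"
else
    echo "$path was removed by the EXIT trap"
fi
```

The same pattern works at the top level of a script, where the trap runs when the script itself exits.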
What Happens?
Most of the time, when your program or script gets a signal it doesn’t handle, it will terminate. There are a few exceptions, and it depends on other things. For example, running your program under nohup will protect it from SIGHUP.
In a shell script, you can use the trap command to “catch” a signal or a list of signals. You have three options:
- Provide no action, which resets the signal to the default handler
- Provide an empty action (e.g., ""), which sets the program to ignore the signal
- Provide a bit of code to run if the signal occurs
For example, to ignore SIGQUIT and SIGHUP, you could write:
trap "" SIGQUIT SIGHUP
Or if you aren’t in the mood to ignore, you could write:
trap "echo Bye; exit" SIGINT
To return to the default, use:
trap SIGINT
Simple, right? Try this:
#!/bin/bash
trap "echo ; echo Bye ; echo ; exit" SIGINT
while true
do
  sleep 1
done
Run that and then press Control+C.
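If you would rather not sit at the keyboard, the three forms of trap can be exercised by having a script signal itself. This is only a sketch: it uses SIGUSR1 so nothing important gets interrupted, writes the names without the SIG prefix (Bash accepts both spellings), and uses "trap - USR1", which is the portable way to restore the default. The variable names are invented for the example.

```shell
#!/bin/bash
caught=no

# 1. Empty action: the signal is ignored, so the script survives.
trap "" USR1
kill -USR1 $$
echo "survived an ignored USR1"

# 2. Code as the action: it runs when the signal arrives.
trap 'caught=yes' USR1
kill -USR1 $$
echo "handler ran: $caught"

# 3. Back to the default disposition.
trap - USR1

# The default action for USR1 is to terminate. Show that in a child
# shell so this script lives on to report the result.
status=0
sh -c 'kill -USR1 $$; echo "not reached"' || status=$?
signame=$(kill -l "$status")
echo "child shell was terminated by SIG$signame"
```

Step 3 resets the trap before starting the child on purpose: an ignored signal stays ignored in child processes, while a trapped one goes back to the default there.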
Easy, But…
That’s simple enough, but there is a slight inconvenience. If you trap more than one signal with the same code, you have no simple way to figure out which signal caused the trap. It would be nice if you could have a trap function that serviced a bunch of different traps that could understand which signal occurred using a case or if statement, for example.
This isn’t built into Bash, but you can do it with a little work. In fact, I wrote trappist to do just that for you. Here’s how it works: You include the trappist.sh file in your script and then write a function called trappist_trap. It will take a single argument that tells you what signal fired. If you don’t provide one, a dumb default will be there that you can override later.
You can call trappist_init in several ways. If you don’t provide any arguments, then all signals you can catch will direct to your trap function. If you like, you can pass an @ as the first argument, followed by a list of signal names with a + or - in front of them. Like this:
trappist_init @ +SIGINT -SIGQUIT -SIGHUP
The order of the signals doesn’t matter. This command line catches all signals, but uses the default handler for SIGINT and ignores SIGQUIT and SIGHUP. You can also omit the @ sign if you like.
Another way to call the init function is with an equal sign:
trappist_init = SIGINT SIGQUIT +SIGHUP -SIGUSR1
In this case, only SIGINT and SIGQUIT will go to the trap function. SIGHUP will get the default handler and SIGUSR1 will be ignored.
A typical trap function might look like this:
function trappist_trap() {
  case $1 in
    SIGALRM)
      TRAP_DOWNCT=3   # After 10 seconds go back to 3
      echo ^C reset
      ;;
. . .
Internals
The script is pretty easy to figure out. At the heart is a loop that adds traps to the system, one at a time, with arguments attached. The only two tricky things are how the script tries to detect your trap handler and that, if you don’t have one, it uses eval to create a simple function for you.
The actual setup turns into:
trap "trappist_trap $t" $t
This line takes a signal named in t, traps it, and causes the correct signal name to pass as an argument. After that, it is pretty easy to see how things work.
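The same idea can be sketched without the library. The following is not trappist's actual code, just an illustration of the pattern: a loop installs one trap per signal and bakes the signal name into the command string. The handler name `on_signal`, the variable `last_signal`, and the choice of signals are all assumptions made for the example.

```shell
#!/bin/bash
last_signal=none

# Hypothetical handler: receives the signal name as its first argument.
on_signal() {
    last_signal=$1
    case $1 in
        SIGUSR1) echo "got $1: reloading" ;;
        SIGUSR2) echo "got $1: dumping state" ;;
        *)       echo "got $1" ;;
    esac
}

# One trap per signal. The double quotes expand $sig right now, so each
# trap carries its own signal name as a fixed argument.
for sig in USR1 USR2 HUP; do
    trap "on_signal SIG$sig" "$sig"
done

# Exercise one of the traps by signalling ourselves.
kill -USR2 $$
echo "last signal seen: $last_signal"
```

Single quotes around the trap command would defer the expansion of $sig until the trap fires, at which point every trap would report whatever value the variable held last.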
If you think about it, the signals are a lot like interrupts, although some of them don’t fire right away. In other words, only a few of the signals mentioned occur immediately. However, by default, each “interrupt” has an entry in a vector table. Trappist populates the table to push everything to a single “interrupt service routine.”
Note that trappist wouldn’t be necessary if there was a way for the script to figure out the signal. You could write:
trap trappist_trap SIGQUIT SIGINT SIGHUP …
You would then have to figure out the signal in the trap function. Of course, if you want to treat all signals the same, you don’t have to worry about that.
We’ve talked about some of the ins and outs of stopping hangups before. We’ve also looked at scripting with binary files.
thanQ ;)
So is there any way to send a signal, to stop the process no questions asked, to a program when your system is too frozen up to open a mate-system-monitor window and let you look up the process ID number of the dodgy program which is slowing your whole system? Any way to send signals to processes by name rather than process number?
Look at the man page for ‘killall’
Of course there is. Several actually… Try something like “kill $(pidof program name)” or “pkill program name”.
pkill
Google for a man page.
There’s also killall (again, man is your friend). However, if you are totally frozen out, it is hard to get any of this going.
“killall” followed by the name of the process will do it.
kill -9 is not guaranteed to work as a normal user.
Sometimes a process you own can be hung so bad that it has stopped communicating completely and not even a normal kill is enough. It lurks as a zombie process that is undead and unkillable by normal means. In this situation it is necessary to roll out the big guns….
sudo kill -9 {PID}
For the process in question, you might as well have pulled the power cord, and it has the same consequences. Any files open for writing by the zombie process may now be corrupt and need repair … if the program is even capable of fixing it…
As I understand it, you can’t kill zombie processes because they are already dead.
A zombie process isn’t really a process anymore. The process has already terminated and freed almost all of its memory and its process status is set to ‘Z’ (for Zombie, of course). At this point, all that’s left of the process is a small bit of memory that contains information about the now-dead-and-gone process (that is, the exit code and usage stats). Its process table slot will _not_ be released until the current parent process calls some variant of the ‘wait()’ function to reap that information. Thus, no attempts to ‘kill’ a zombie process directly will be successful (even ‘kill -9’ aka ‘kill -KILL’) because the zombie process is _already dead_!
Normally, a process slot doesn’t remain in the ‘Z’ state for long after a child has died, but sometimes poorly written parent processes don’t check for dead children or ignore the SIGCHLD signal and will accumulate a bunch of zombie children. (I’ve seen parents with thousands of zombie children.)
Since ‘kill’-ing zombies doesn’t do anything, you must instead either convince the parent process to reap its dead children or you must kill the _parent_ process itself. (If the parent process is stuck for some reason, sending it an ‘abort’ signal (i.e., ‘kill -ABRT’) usually works since the SIGABRT signal is often not caught or ignored, but SIGKILL will always work as a last resort.) Finally, when a parent process of a zombie dies, the zombie becomes an orphan and is inherited by the primordial ‘init’ process. The ‘init’ process in turn will _always_ reap a zombie process and free up its slot in the process table (‘init’ just ignores the exit code and other info).
There’s a typo for the `trap` cmd, it’s `trap -l`, no space.
Not sure how that happened. I’m going to blame WordPress. Thanks — will fix.
Bash is fine as a shell, but it really sucks as a programming language. There are any number of better alternatives. Don’t waste time doing anything that resembles bash programming.
My thought as well. On one occasion I started to write a program in bash, felt frustrated, and decided instead to learn perl, which I’d never used before. Instant success. Only regret is that I probably should have used python instead.
Linux is by far the best system , and all programs were free and that is what they don’t like …