Linux-Fu: One At A Time, Please! Critical Sections In Bash Scripts

You normally think of a critical section — that is, a piece of a program that excludes other programs from using a resource — as a pretty advanced technique. You certainly don’t often think of them as part of shell scripting, but it turns out they are surprisingly useful for certain scripts. Most often, a critical section is protecting some system resource like a shared memory location, but there are cases where a shell script needs similar protection. Luckily, it is really easy to add critical sections to shell scripts, and I’ll show you how.

Sometimes Scripts Need to Be Selfish

One very common case is where you want a script to run exactly one time. If the same script runs again while the original is active, you want to exit after possibly printing a message. Another common case is when you are updating some file and you need undisturbed access while making the change.

That was actually the case that got me thinking about this. I have a script — one that may be the subject of a future Linux-Fu — that provides dynamic DNS by altering a configuration file for the DNS server. If two copies of the script run at the same time, it is important that only one of them makes modifications. The second copy can run after the first is totally complete.

Atomic Files

The problem is one of atomicity. You could, for example, check that a temporary file with an obscure name doesn’t exist and, if not, create it to claim the resource. But what if someone else checks at the same time? You both note the file isn’t there and then you both create the file, thinking you are running alone. No good.
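Here is that naive approach spelled out, just to make the race obvious (the lock file name is made up):

# BROKEN: the test and the create are two separate steps, so two
# copies of the script can both see "no file" and both barge ahead
if [ ! -e /tmp/mylock ]; then
    touch /tmp/mylock
    # ...critical section, possibly running twice at once...
    rm -f /tmp/mylock
fi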

It turns out files are still the answer, but locking the file with flock is the right way to do it. There are two general options: you can have flock execute a command for you, in which case it acquires the lock and releases it when the command completes, or you can use flock inside your own script to protect a block of code.

The Easy Case

The idea behind flock is that it will put a lock on a file. The file can be open for reading or writing — it doesn’t really matter. You can get a shared lock or — the default — ask for an exclusive lock. You can only get an exclusive lock if there are no other locks on the file.

What file do you use? It depends. For a script, sometimes it is worth using the script file itself as the lock. That certainly makes it unambiguous, although if someone copies the script to a new name, look out. Another answer is to use a temporary file. Most systems will have a /var/lock directory for just this purpose.

Consider a case where you have a script that works with a file. One option to the script deletes the file, but you don’t want to do that if another instance of the script is using the file at the time. You might do something like this:

flock "$0" rm "$SHAREDFILE"

This gets an exclusive lock. Other parts of the script might look like this:

flock -s "$0" awk -f script.awk "$SHAREDFILE"

The -s means the lock is shared, so anyone else asking for a shared lock will get it, but the exclusive lock for the rm will block until the file — which in this case is the shell script itself — is unlocked.

Unblocking

Of course, sometimes you don’t want to wait forever. You can use the -n option to tell flock not to block. Or use -w to wait for a specified number of seconds (which does not have to be an integer). By default, if the lock doesn’t work, flock will return a 1, but you can change that by using -E to select a different code, since the command you run may also return a 1 for some reason.
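For example (critical_command here is just a stand-in for whatever you want to run under the lock):

# Give up immediately if the lock is already held
flock -n "$0" critical_command || echo "Lock busy; giving up"

# Wait up to 2.5 seconds; return 42 instead of 1 if the lock fails
flock -w 2.5 -E 42 "$0" critical_command
if [ $? -eq 42 ]; then
    echo "Timed out waiting for the lock"
fi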

Critical Section Blocks

Sometimes you don’t want to run a single command. You want to lock up an entire portion of a script. You can do that, too. The flock command can take a numeric file descriptor. The file can be open for reading or writing, but it must exist. If you use the script file, that’s a sure bet that it exists, of course.

You can use all the same options for blocking and timeouts, but you have to open up a file descriptor using bash constructs. There are several ways to do that — and I’ve made a repo of examples which I’ll reference below — but I usually just use a redirect in a subshell. You can also use exec to get the same effect.
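Here is a sketch of both styles, using the script itself as the lock file:

# A redirect on a subshell: fd 99 exists only inside the parentheses
(
    flock 99
    # ...critical section...
) 99<"$0"

# exec: fd 99 stays open for the rest of the script
exec 99<"$0"
flock 99
# ...critical section...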

Have a look at this script. It is a bit contrived, but it prepares a log file and then calls itself twice to create two different entries in that log file. There’s no critical section protecting the log, so you’ll see after the script completes how the output from the two subprocesses mixes together.
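The linked script isn’t reproduced here, but a minimal sketch of the unprotected version might look like this (the log file path is an assumption):

#!/bin/bash
LOGFILE=/tmp/cs-demo.log

if [ -n "$1" ]; then
    # child invocation: write a few entries to the shared log
    echo "Here is a log entry from $1" >>"$LOGFILE"
    ls /etc >>"$LOGFILE"
    echo "That is all from $1" >>"$LOGFILE"
    exit 0
fi

: >"$LOGFILE"     # prepare an empty log
"$0" A &          # launch two copies of ourselves...
"$0" B &
wait              # ...and watch their output interleave
cat "$LOGFILE"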

You can use the easy method by just having flock lock the script file before calling the subprocesses. That’s the approach in cs.sh that you can see here. However, in a script like this it is usually more effective to use the block with a numeric file handle. Here’s the same example using that style of flock. The subprocesses look like this:

( flock 99
echo "Here is a log entry from A along with a directory of /etc" >>"$LOGFILE"
ls /etc >>"$LOGFILE"
echo "That is all from A" >>"$LOGFILE" ) 99<"$LOCKFILE"
exit 0

Of course, this is a silly example. It would be just as easy in this case to run process A, wait for it to complete, and then run process B. But that’s not always the case and in a complex script, you may have work that both processes can execute in parallel, only waiting for specific critical sections. In that case, running the processes in series would be less efficient.

Singleton

A very common use case for the critical section block is to stop a script from running more than one instance at any given time.

#!/bin/bash
LOCKFILE="$0"
( if flock -n 99
then
  echo "Running task"
  sleep 20
  echo "Done"
  exit 0
else
  echo "You can only run one copy of this script at a time!"
  exit 1
fi ) 99<"$LOCKFILE"

This pattern is useful if you have a program that you plan to run periodically using something like cron. You might run it every minute, but — on occasion — the program might take more than a minute to complete. A flock barrier can prevent a large number of copies from running all at once.
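For instance, a crontab entry along these lines (the paths are hypothetical) runs the job every minute but quietly skips a run if the previous one is still holding the lock:

* * * * * flock -n /var/lock/mytask.lock /usr/local/bin/mytask.sh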

In all of these cases, closing the file automatically releases the lock, so you don’t have to let go of it explicitly. If you ever do need to release it early, the -u option is your friend.
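A quick sketch of an early release, using the exec style from earlier:

exec 99<"$0"   # open a descriptor on the script itself
flock 99       # take the exclusive lock
# ...critical work...
flock -u 99    # release early; the code below runs unlocked
# ...non-critical work...
exec 99<&-     # closing the descriptor would release it anyway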

A Few Notes

It turns out flock isn’t part of the POSIX standard, but it is pretty common. Before Linux kernel 2.6.12, flock didn’t work correctly over NFS, either. If you use /var/lock or /tmp, those are very unlikely to be NFS-mounted anyway. If, for some reason, you are on a system without flock, mkdir is not assured to be atomic, but it usually is, so that could be another option, since it also returns a status indicating whether or not it created the directory.

This locking technique is one of those things you won’t need every day. But when you do need it, it is invaluable.

The opposite of running things one at a time is running them in parallel, but that’s a topic for another day. As for locks, you normally don’t need to unlock a lock file, since flock takes care of that when the file closes. If you want to be super defensive, though, maybe consider cleaning up anything that needs cleaning using a trap.
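A defensive skeleton might pair flock with a trap like this (the temporary file is just an example):

#!/bin/bash
TMPFILE=$(mktemp)             # scratch file to clean up
trap 'rm -f "$TMPFILE"' EXIT  # fires on normal exit and most signals
(
    flock 99
    # ...critical section that writes to "$TMPFILE"...
) 99<"$0"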

12 thoughts on “Linux-Fu: One At A Time, Please! Critical Sections In Bash Scripts”

    1. They [the lock files] do not just go away. They also don’t go away if the script/subshell runs successfully. It’s a little dangerous to attempt to perform self-cleanup and rm the lock file within the script itself, so it’s better to just leave it.

  1. I use mkdir; it will atomically test and create, and return a status. You only have to be sure to remove the lock when the script ends, or when the critical section is done.

    if mkdir /tmp/mylock ; then
        do things
        rmdir /tmp/mylock
    else
        echo Someone has my lock
    fi

  2. Turn on noclobber in the shell – historically this uses O_EXCL – simple. Or use ln – if the target exists, it fails. For the sake of sanity, do not use file locking; seriously, do not habitually reach for locking as your go-to for this. Nothing on Unix spreads faster than badly written shell scripts. Please, I implore you, do not release more into the wild.
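    For reference, the noclobber idiom being described usually looks something like this (the lock path is made up):

    # noclobber makes ">" fail if the file already exists (O_EXCL underneath)
    if ( set -o noclobber; echo "$$" > /tmp/mylock ) 2>/dev/null; then
        # ...critical work...
        rm -f /tmp/mylock
    else
        echo "Someone has my lock"
    fi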

  3. Bash is a quirky programming language with no debugger.

    You’re much better off using a language with a good interface to external commands, such as perl. In perl, putting something in backtick quotes executes the command, and you can save the output in a variable – just like in bash.

    The difference is that perl can be debugged, so you can stop at lines and print out the variables. You can type in a new expression to see if the output is what you *thought* you wanted, you can single step, and you can terminate when a problem is detected before the rest of the script runs and buggers everything up.

    I’ve just now added an alias to my personal .bashrc file that needed to do something complicated with pieces of a filename, and the ‘awk’ interface was quirky and required a few rounds of debugging… which is nigh impossible in bash.

    I had to resort to copy/pasting the line into a new window just to see what the results were without damaging the directory contents. One by one, for just about each line.

    You’re much better off writing commands in something else. I hear python is good for commands as well.

  4. If you do system level scripting eventually you learn to do some kind of locking. Usually after something that was supposed to only run once a night at midnight takes longer than a day to run and you wind up with a machine that will not let you log in and is saying out of memory or just responding like the electrons are moving through jello.

    I tend to be primitive: in *nix I check for a file in /tmp, create it if it is not there, and delete it at the end of the script. The advantage of this is that many *nixes clear /tmp upon reboot, so if the system restarts you are up and going without any manual intervention. It is still a good idea, though, to make the system send you an email if the script does not run because it is still locked. With locking it should not run on top of itself and take the system down, but you will still want to know that it is not running so you can have a look at it.

  5. “You normally think of a critical section — that is, a piece of a program that excludes other programs from using a resource”

    I thought a “critical section” was an area of code that gets run a lot of times such as within a large loop. It’s an area that must be optimized because it runs so much.

    1. Nope, technically a critical section is a segment of code (a section) that has to be performed as an atomic action, but due to accessing shared state with other processes, cannot guarantee atomicity without some form of intervention (semaphore, lock, mutex, etc).
