Linux Fu: Watch That Filesystem

The UNIX Way™ is to cobble together different, single-purpose programs to get the effect you want, for instance in a Bash script that you run by typing its name at the command line. But sometimes you want the system to react to events on its own, without your intervention. For example, you might like to watch a directory and kick off some program automatically when a file appears from a completed FTP transaction, without having to sit there and refresh the directory yourself.

The simple but ugly way to do this just scans the directory periodically. Here’s a really dumb shell script:

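#!/bin/bash
# Make * expand to nothing (instead of a literal "*") in an empty directory
shopt -s nullglob
while true
 do
   for I in *
    do
     # Skip subdirectories; only handle regular files
     if [ -f "$I" ]
     then
       cat -- "$I"; rm -- "$I"
     fi
   done
 sleep 10
done

Just for an example, I dump the file to the console and remove it, but in real life, you’d do something more interesting. This is really not a good script because it wakes up every ten seconds whether or not anything has changed, and it just isn’t a very elegant solution. (You might be tempted to loop over the output of ls instead, since a plain for I in * expands to a literal * in an empty directory, but parsing ls breaks on file names that contain spaces. Setting nullglob makes the empty-directory case expand to nothing instead.)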

Increase Elegance

Honestly, you want something more elegant, right? Modern kernels (2.6.13 and later) provide filesystem notifications through an interface called inotify. You can use these calls programmatically via the sys/inotify.h header file. There is also a set of command-line programs you can install, usually packaged as inotify-tools.

One of those tools is inotifywait and it makes for a nicer script. For example:

#!/bin/bash
while true
 do
   # Block until a file is closed after writing or moved into this directory
   if FN=$(inotifywait -e close_write,moved_to --format %f .)
   then
    cat -- "$FN"
    rm -- "$FN"
   fi
 done

That’s better, I think. It doesn’t wake up periodically, only when something has changed. I figure any sane program putting something in the directory will either open the file for writing and close it, or it will move it into place. Either way will work, and %f tells the command to report just the file name. There are other events you can wait for as well, of course, such as create, modify, and delete.

If you are wondering why the move case is necessary, think about how most text editors and network download software work. Usually, a new file doesn’t get its final name until it is complete. For example, Chrome will download the file test.txt as test.txt.crdownload or something like that. Only when the file is done will it rename (move) the file to test.txt.
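Because a downloader may also close its temporary file before renaming it, inotifywait can report a close_write on the temporary name too, and the script would then process a half-finished download. A minimal sketch of a name filter the script could apply before acting on FN, assuming the suffix list below (.crdownload is Chrome's, per the example above; .part and .tmp are guesses you should adjust for your own software):

```bash
#!/bin/bash
# Succeed (status 0) if a name looks like a finished file, fail (status 1)
# if it looks like an in-progress download. The suffix list is an
# assumption: .crdownload is Chrome's, .part and .tmp are common elsewhere.
is_complete_name() {
  case "$1" in
    *.crdownload | *.part | *.tmp) return 1 ;;
    *) return 0 ;;
  esac
}
```

In the loop, the test would become if FN=$(inotifywait -e close_write,moved_to --format %f .) && is_complete_name "$FN", so temporary names are skipped and the file is picked up again when the moved_to event for its final name arrives.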

If you want to try the command without a script so you can see the effect, just open up two terminal windows.

In one terminal, issue the inotifywait command. Don’t forget the period at the end, which tells it to monitor the current directory. Then, in the other terminal, create a file in the same directory. The name of the file will appear in the first terminal and the program will exit. The script just takes advantage of this behavior: it sets the FN variable, takes action, and then relaunches inotifywait. You can ask the program not to quit with the -m (--monitor) option, by the way, but that makes scripting a little more difficult. However, it also removes the problem of missing a file that arrives while you are still processing the previous one.
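Monitor mode is less awkward than it sounds if you pipe inotifywait into a while read loop. Here is a minimal sketch, assuming one path per line (file names containing newlines would confuse it); the function names are hypothetical, and cat plus rm stand in for real work as in the scripts above. Because it takes several directories, one instance can also cover more than one location:

```bash
#!/bin/bash
# Sketch: keep one inotifywait running in monitor mode (-m) instead of
# relaunching it after every event, so files that arrive while another
# one is being processed are queued instead of missed.

# Handle one finished file; cat and rm stand in for real work.
handle_file() {
  cat -- "$1"
  rm -f -- "$1"
}

# Read one path per line on stdin and handle each one that is a regular file.
process_events() {
  local path
  while IFS= read -r path; do
    if [[ -f $path ]]; then
      handle_file "$path"
    fi
  done
}

# Watch every directory given as an argument. %w%f prints the watched
# directory followed by the file name, so each line is a usable path.
# -q suppresses the "Setting up watches" messages.
watch_dirs() {
  inotifywait -m -q -e close_write,moved_to --format '%w%f' "$@" | process_events
}

# Start watching only when run directly with at least one directory argument.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  watch_dirs "$@"
fi
```

Saved under a name of your choosing (say, watchdirs.sh), it would be run as ./watchdirs.sh ~/Downloads/hexfiles, and it keeps running until you interrupt it.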

The other command-line tool, inotifywatch, also listens for file change events, but it watches for a certain amount of time and then gives you a summary of changes. I won’t talk about it any further; if you think you need that capability, you can read the man page.

A New Cron

The script is still less than ideal, though. Presumably, a system might have lots of different directories it wants to monitor. You really don’t want to repeat this script, or a variation of it, for each case.

There is another program for that, called incron (you will almost surely have to install this one). The incron program is like cron but instead of time-based events, the events are based on file notifications. Once you install it, you will probably have to change /etc/incron.allow and /etc/incron.deny if you want to actually use it, especially as a normal user.

Suppose you want to run a script when a file appears in the hexfiles directory. You can use the command incrontab -e to edit your incron table. The format is very picky (it wants spaces, not tabs, for example). Here’s a line from the file that will do the job:

/home/alw/Downloads/hexfiles IN_CLOSE_WRITE,IN_MOVED_TO /home/alw/bin/program_cpu $@/$#

The $@/$# at the end provides a full path to the file affected: $@ is the watched directory and $# is the name of the file that triggered the event. You can also grab the event flags as text ($%) or as a number ($&). You can monitor all the usual events and also set options to do things like not dereference symbolic links. You can find it all in the incron man pages.
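Passing the event flags lets one handler treat moves and writes differently. A minimal sketch of a handler, a hypothetical stand-in for program_cpu, assuming the command field of the incrontab line is changed to /home/alw/bin/program_cpu $@/$# $% so the script receives the full path first and the textual flags second. It also assumes the text form is a comma-separated list of names such as IN_CLOSE_WRITE; echo stands in for the real work:

```bash
#!/bin/bash
# Hypothetical handler, assuming incron runs it as: handler $@/$# $%
# $1 = full path of the affected file, $2 = event flags as text.
handle_incron_event() {
  local path=$1 flags=${2:-}
  # Only act on hex files; ignore anything else that lands in the directory.
  case "$path" in
    *.hex) ;;
    *) return 0 ;;
  esac
  # Wrap the flags in commas so each name can be matched as a whole word.
  case ",$flags," in
    *,IN_MOVED_TO,*) echo "moved in: $path" ;;
    *,IN_CLOSE_WRITE,*) echo "written: $path" ;;
    *) echo "other event ($flags): $path" ;;
  esac
}

# Run only when executed directly with arguments (as incron would do).
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  handle_incron_event "$@"
fi
```

You can exercise it by hand before wiring it into incron, for example ./program_cpu ~/Downloads/hexfiles/test.hex IN_MOVED_TO, which takes the "moved in" branch.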

GUI

I’m not a big fan of GUI editors, but I know I’m in the minority. If you like, there’s a Java-based incrontab editor available. There isn’t much documentation, but you can import your incrontab (if it exists) from /var/spool/incron/your_user_id. It offers a form that builds the incron table line for you.

You can find the system files in /etc/incron.d, usually. All the locations can be set by the /etc/incron.conf file, so if you aren’t sure where to look or you want to change the location for the table files, start there.
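If you would rather not read the file by eye, here is a small sketch that pulls one setting out of incron.conf. It assumes simple key = value lines with # comments, and the key names mentioned below (system_table_dir, user_table_dir) are how I understand incron.conf(5) to name them, so verify them against the man page on your system:

```bash
#!/bin/bash
# Print the value of one "key = value" setting from an incron.conf-style
# file (default /etc/incron.conf). Returns 1 if the file is unreadable or
# the key is absent. Lines starting with '#' never match a key.
incron_setting() {
  local key=$1 conf=${2:-/etc/incron.conf} k v
  [[ -r $conf ]] || return 1
  # "|| [[ -n $k ]]" also handles a last line without a trailing newline.
  while IFS='=' read -r k v || [[ -n $k ]]; do
    k=${k//[[:space:]]/}                # drop whitespace around the key
    [[ $k == "$key" ]] || continue
    v=${v#"${v%%[![:space:]]*}"}        # trim leading whitespace
    v=${v%"${v##*[![:space:]]}"}        # trim trailing whitespace
    printf '%s\n' "$v"
    return 0
  done < "$conf"
  return 1
}
```

For example, incron_setting user_table_dir should print the directory holding the per-user tables (the /var/spool/incron location mentioned above, if your configuration sets it explicitly); if the key is missing, the built-in default applies.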

Go Forth and Watch

Using incron is quite elegant. A system program does all the waiting and our script only runs when necessary. It is easy to look and see all the things you have notifications set for. You can do a lot with these tools, and not just in the embedded space. How are you going to use them?

29 thoughts on “Linux Fu: Watch That Filesystem”

  1. It looks like something converted your backticks to single quotes in the inotifywait example. It’s better form to use $(command) instead of backticks, and it won’t get mutilated by your CMS.

      1. Real Admins don’t care about systemd vs. initd. Real Admins have their own init script (often perl) as an easier and more flexible alternative to all those tetchy little standard files in /etc. All your static IPs, mount points, etc, everything in one handy file.

        There are actual uses for such a hack, but there are really good reasons not to, especially on multiuser, production systems that others will have to support.

      2. There is nothing to discuss anymore. Every major Linux distro uses systemd now. And incron has been unmaintained for years. There are alternatives to incron and systemd-path, but most of them are not in the repos of your distro.

    1. I know I should not feed the troll.
      But you know some people are having fun with init scripts and are running without systemD. I use OpenRC, so if any solution involves installing systemD and sh*tloads of dependencies, it is NOT a solution. Especially for an embedded system.

  2. For the love of $DEITY, don’t parse the output of ls. In case anyone sees the first script and thinks to themselves “it’s dirty, but it works”, you’re wrong. Read this: https://mywiki.wooledge.org/ParsingLs . Long story short: you can easily fix the globbing behaviour in an empty directory by running `shopt -s nullglob` at the beginning of your script. BUT DON’T PARSE LS, EVER!

  3. “I’m not a big fan of GUI editors,”
    +1

    “but I know I’m in the minority.”
    …sssssssssht… don’t mention it too often or all the hipsters will crowd shell providers and IRC!

  4. Scripts that scan a directory for new files need to have time/date checking on them and some other type of “GO” indicator, or secondary target directories. Example: I want to move a transaction file to a disaster mirror and apply the transaction log to keep it current. The big problem is that you need to make sure you aren’t writing to the file and are done with it totally. This can be solved by using two directories and renaming the file into the target directory when it is ready; then you know it’s safe to use. Also, recursion can occur if you run the same job on multiple machines for workload balancing. I set up a whole system for a similar case where a record comes down from ERP software and you need to apply it to a reporting SQL database, and of course that gets applied to the disaster backup. It can be a real pain, especially for things like applying updates, where you need to take down jobs and start them in the correct order. Timing is obviously a big factor: you need to write it so it can be entered at any point in the process without backfiring.

    1. Yes that’s what I do! I’m surprised it took so long to come up.

      Before I found ‘watch’ (which is not posix but gnu so not available by default on systems like macOS), I used to use a simple bash loop with a delay.

  5. The comments here are my idea of positive diversity. Thanks everyone, it is very helpful to have all that information and different perspectives, put so logically.

  6. fswatch on Linux, macOS, BSD, and Windows would be a portable way to implement this same functionality. Just replace inotifywait with fswatch and modify the arguments appropriately in your script. fswatch is in ports for BSD and macOS, available via Debian and Ubuntu repos, and you’ll have to use Google to find a Windows build. fswatch will use the operating system’s native filesystem monitor: inotify on Linux (inotify is way too slow for rapidly changing files), FSEvents on macOS (no known limitations), kqueue on BSD (limited number of files it can watch), ReadDirectoryChangesW on Windows (only reads entire directories; you have to figure out which files changed on your own), “File Events Notification API ” on Solaris (no known limitations), and “poll” which stats in a loop on any POSIX compatible system (usable but non-performant). i.e., if rapid file change notifications on lots of files are what you need then Solaris is likely the best option. But then that is the kind of workload it was designed for.

    1. I’m noticing that inotifywait doesn’t seem to notice if I move/copy files to the directory over sftp. It seems to work with everything else I’ve used, though. :/

  7. My main beef with inotify is that it doesn’t have any provisions for a global wildcard. It should of course be root only and security-restricted, but Crashplan and other backup applications that try to keep up with file changes are currently forced to request a watch on every single file in the system. This mandates that the backup program trudge through the entire filesystem upon boot to initiate the watches, then maintain them along with the associated kernel overhead (which is significant, when watching millions of files).

  8. inotify is quite handy. I employed it several years ago when hired to track down an active intrusion on a university server which had entirely too many users (and professors who delegated their accounts to assistants, and very likely did not change credentials between), and isolating the host to fix things wasn’t an option. They had WordPress too, which was a major pathway. By the time I came onboard, the intruders had many backdoors on the system, and all it took was one of them to springboard more. As I’d identify backdoors and disable them, I’d add them to an inotify process I’d written so that I could more directly associate inbound traffic and see where else they’d head after finding something missing.

    1. “Whack-a-mole”-ing a compromised WordPress install is a super PITA. I tried that one time, but ended up just wiping it, reverting to a backup (thankfully), and then updating.
