Linux Fu: The Linux Shuffle

Computers are known to be precise and — usually — repeatable. That’s why it is so hard to get something that seems random out of them. Yet random things are great for games, encryption, and multimedia. Who wants the same order of a playlist or slide show every time?

It is very hard to get truly random numbers, but for a lot of cases, it isn’t that important. Even better, if you are programming or using a scripting language, there are lots of things that you can use to get a degree of randomness that is sufficient for many purposes.

The Root of Random

In your device directory are two quasi-files you might not have noticed before. The /dev/random and /dev/urandom files will output as many random bytes as you might want to read. Why are there two? The kernel grabs noisy data from different places. For example, it might read crypto hardware or measure time intervals between disk accesses. These numbers are not easy to predict and make a good source of difficult-to-guess numbers. However, for a certain number of random bits, you need a certain amount of random noise. The /dev/random device file fills with these environmental random bits, and if it needs more random measurements to complete the request, it will block until it gets them. The /dev/urandom file, on the other hand, will provide an “unlimited” number of bytes; it works by periodically re-seeding a pseudo-random number generator with environmental randomness.

If you program in any normal language, it is easy to just open either of these files and read the number of bytes you want. In normal shell scripting, it is easy, too. For example:

head -c 3 /dev/random | od -t x1 -A none

This command will give you three hex bytes. If you prefer, you could change the x1 to u1 to get unsigned decimal numbers, or to any other format od supports.
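As a small sketch of that idea, the following reads two bytes and prints them as one unsigned decimal number. It assumes /dev/urandom rather than /dev/random so the read never blocks, and the variable name n is only for illustration.

```shell
# Read 2 bytes and print them as a single unsigned 16-bit decimal (0..65535).
# -A n suppresses the offset column, -N 2 limits the read to two bytes.
n=$(od -A n -N 2 -t u2 /dev/urandom | tr -d ' ')
echo "$n"

# Reduce to 1..100 (slightly biased, since 65536 is not a multiple of 100).
echo $(( n % 100 + 1 ))
```

The modulo step is fine for playlists and games, but the small bias makes it a poor choice for anything security related.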

Better Shell

Of course, the shell knows you want to do this. Bash keeps $RANDOM updated and you can read from it if you prefer:

for i in {1..5}
do echo $RANDOM
done

This will print five pseudo-random integers, each between 0 and 32767, and the values differ on each run.
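Two common follow-ups are squeezing $RANDOM into a smaller range and making the sequence repeatable. Bash treats an assignment to RANDOM as a seed, so a minimal sketch might look like this; the names roll, first, and second are only for illustration, and the modulo trick is slightly biased for ranges that do not divide 32768 evenly.

```shell
# Simulate a six-sided die: RANDOM is 0..32767, so map it onto 1..6.
roll=$(( RANDOM % 6 + 1 ))
echo "You rolled $roll"

# Assigning to RANDOM seeds the generator, so the sequence repeats.
RANDOM=42
first=$RANDOM
RANDOM=42
second=$RANDOM
[ "$first" = "$second" ] && echo "same seed, same number"
```

A fixed seed like this is handy when a test needs the same "random" data on every run.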

Better Still

This is easy, but we can still do better. After all, suppose you have a bunch of sayings in a file, one per line. Even with a random number, you’d need to skip the lines and worry about how many lines are in the file total. There’s a better way: the shuf command.

This command seems simple at first but is actually quite powerful. The bare command reads a file, or standard input, and permutes it based on a random number. There are options to feed it your own source of random numbers if you care.
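GNU shuf takes its own source of random bytes through the --random-source option. One use, sketched below, is a repeatable shuffle: save some random bytes to a file once and reuse them. The 1024-byte size and the name seedfile are assumptions for illustration; shuf only needs enough bytes to finish the job.

```shell
# Save a block of random bytes to act as a reusable "seed" file.
seedfile=$(mktemp)
head -c 1024 /dev/urandom > "$seedfile"

# Shuffling the same input with the same byte source gives the same order.
a=$(seq 1 10 | shuf --random-source="$seedfile")
b=$(seq 1 10 | shuf --random-source="$seedfile")
[ "$a" = "$b" ] && echo "identical shuffles"

rm -f "$seedfile"
```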

Sometimes you don’t want all the items in the file. For example, when picking a single quote from a file or choosing the next random song, you only need one item. The -n option limits the output to the given number of lines. If you want to shuffle a range of numbers instead of a file, you can use the -i option. For example:

shuf -n 1 -i1-10

This command will give you a single random number between 1 and 10. Very easy!
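By default, shuf produces a permutation, so asking for several numbers never returns the same one twice. Reasonably recent GNU versions also have a -r (--repeat) option that draws with replacement, which is closer to rolling dice. A quick sketch of the difference, with illustrative variable names:

```shell
# Five independent rolls of a six-sided die; values may repeat.
rolls=$(shuf -r -n 5 -i 1-6)
echo "$rolls"

# Without -r the output is a permutation: each number appears exactly once.
perm=$(shuf -i 1-6)
echo "$perm"
```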

Back to picking a random quote from a file, that’s as easy as:

shuf -n1 input_file.txt

Combined with a list of files, this can pick random files easily, too:

ls *.mp3 | shuf -n 1
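Piping ls works for simple names, but it can trip over file names that contain newlines. A possible alternative is the -e option, which treats each command-line argument as an input line so the shell glob supplies the list directly. The sketch below builds a throwaway directory with made-up file names so it can run anywhere.

```shell
# Build a throwaway directory with a few fake tracks (names are illustrative).
dir=$(mktemp -d)
touch "$dir/first song.mp3" "$dir/second.mp3" "$dir/third.mp3"

# -e shuffles the arguments themselves, so the shell glob supplies the list.
pick=$(shuf -n 1 -e -- "$dir"/*.mp3)
echo "Now playing: $pick"

rm -rf "$dir"
```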

When to Choose Which

Note that the shuf command is part of the GNU Core Utilities, so some machines won’t have it. In BSD, the jot command is somewhat similar. For a more portable script, it would probably be wise to check that shuf exists, maybe look for jot, and if you find neither, try to see if $RANDOM changes. You could process the raw number with awk. Absent that, you could check for /dev/urandom and /dev/random, which would also require some processing.
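That fallback chain could be sketched roughly as follows. The function name rand_1_to_n is hypothetical, shell arithmetic stands in for awk when processing the raw number, and the jot invocation assumes the usual BSD form (jot -r reps begin end). Treat it as a starting point rather than a tested portability layer.

```shell
# rand_1_to_n N: print a random integer between 1 and N (N up to 32767).
# Hypothetical helper; tries shuf, then jot, then $RANDOM, then /dev/urandom.
rand_1_to_n() {
    max=$1   # plain variable: "local" is not part of POSIX sh
    if command -v shuf >/dev/null 2>&1; then
        shuf -n 1 -i "1-$max"
    elif command -v jot >/dev/null 2>&1; then
        jot -r 1 1 "$max"
    elif [ "${RANDOM:-}" != "${RANDOM:-}" ]; then
        # Two reads differ, so the shell really maintains $RANDOM.
        echo $(( RANDOM % max + 1 ))
    elif [ -r /dev/urandom ]; then
        raw=$(od -A n -N 2 -t u2 /dev/urandom | tr -d ' ')
        echo $(( raw % max + 1 ))
    else
        echo "no random source found" >&2
        return 1
    fi
}

rand_1_to_n 10
```

The last two branches use a modulo, so they are slightly biased for most values of N; that is usually acceptable for picking a quote or a song.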

With these tools, you can write delightfully unpredictable scripts. (Of course, some of our scripts are less than delightfully unpredictable, too. But we can’t blame /dev/urandom for that.)

If you want to dig deep into /dev/random, check out Elliot’s writeup of the Linux entropy collecting system.

35 thoughts on “Linux Fu: The Linux Shuffle”

  1. Interestingly, unlike shuf, mp3 player style shuffle is not actually random. If it were, you would get repeated songs more than you’d expect.

    This is a really cool little one liner, but I’ve usually regretted most of the bash scripts I’ve written.

    I’m sure it makes sense if you’re a sysadmin, but for home use, I think Python is just so much nicer than bash.

    The main thing I like about bash is that it’s really convenient when you have a logicless list of commands to run, but when you start doing heavier programming, the syntax gets a bit annoying.

    1. The used language depends on the life expectancy of the script.
      If it is less than a year, today’s hot language is good
      If it is less than five years bash shouldn’t break in that time
      If it is more than five years, sh and C89 (no external libs allowed). The C source should be readable from the script for recompilation.

      If you need to go to an older system, Linux sed and awk, for example, are quite luxurious.

      Also writing some stdin to stdout processing software is better than invoking tens of simple programs. Just keeping the processing simple and wrapping it with sh will give flexibility. Also you get srand(), rand() for not so random stuff and arc4random() for more random stuff on newer systems.

      The only more irritating scripting problem than old script breaking by bit rot is old perl script breaking.

      1. Python should work for the foreseeable future with minimal maintenance, but yeah, it’s not perfect.

        I’m really surprised how casually most languages take backwards compatibility. All this Agile RefactorMercilessly doesn’t encourage stable anything.

        There should really be a decent high level language (Preferably just a fork of an existing one) that everyone just agrees they will leave alone.

        There are tons of file formats we can still read from the 90s, but for some reason OOP scripting languages seem to always want to shuffle it up constantly.

        Archival Oriented Programming would be pretty awesome.

        1. There already is: Python 2. It is no longer supported and therefore frozen and therefore left alone. Python 2 is not going away anytime soon simply because it is no longer updated.

          1. I can easily imagine quite a few distros ditching 2 in the next few years, but it should be pretty safe.

            Still, something like Wren+a very basic embedded style GUI might be a better choice. A decent modern language you can just compile yourself and include directly in the project, if you want to be really sure it’s going to stay around.

    2. “Interestingly, unlike shuf, mp3 player style shuffle is not actually random. If it were, you would get repeated songs more than you’d expect.”

      I don’t remember well but it was either early Winamp or Foobar that when playing “random” mp3 at some point was repeating constantly the same songs.

      1. A random shuffle does not give you repeats of songs back to back. It’s a permutation of the songs in the playlist; each one gets played once but the order is a random permutation. It has nothing to do with “truly random” or not.

  2. Do NOT use /dev/random for embedded systems. They do not generate enough entropy to keep your task from blocking and you do not need “randomer numbers”. A PRNG seeded with HWRNG and system entropy is more than unpredictable enough. I don’t care that GPG and Systemd do it, most kernel developers disagree with those decisions and they are debating changing this behavior because crypto operations in early boot keep hanging machines.

    1. Question: I’ve been intrigued with creating tiny Linux instances, like the Business Card Linux project. Can you pick a random source, urandom/random/etc. with something like BuildRoot?

      1. Generally you always get both and software gets whatever it asks for. If you have a supported CPU or TPM with HWRNG instructions setting rng_core.default_quality to something sensible like 700 or 1000 will cause the kernel to credit entropy from them and not block on random or uninitialized urandom.

        1. That’s neat, I wasn’t aware of that. I’ve been reading up on how Linux and micro-controllers handle proper “random”. Do it internally to a SoC or leverage some type of external entropy source.

    1. Here’s a good reason to use something besides urandom:

      You are writing a long-running unit test that requires random data, and you want the same set of random numbers each time so that your test yields the same result every time and is suitable for CI development. In this case you need to be able to seed the random number generator with the same seed every time, which is not supported with urandom.

      If you think this is some sort of degenerate special case then your tests are probably not very good.

      I’m sorry but most people tend to run their unit tests after the machine boots up

  3. “Who wants the same order of a playlist or slide show every time?”

    I do. Random playlists always seem to cause jarring mood changes and I can’t imagine how hard it would be to write a talk for a random slide show.

    1. Super annoying how every player wants to be “helpful” in tracking down every piece of music on any of your drives and finds an mp3 alert sound in an application’s directory and slips that in between your actual music. It’s getting so that I want to set up a different user for each genre of music and lock down the search to their own home/user directory.

      1. If it’s so short that it needs to be restarted daily then the songs would get old regardless of the order. Plus retail music is meant to be bad enough to annoy customers on the first play through. That’s why order doesn’t matter; it’s all the same rotten tripe and none has any mood other than anxiety that you might not get out of the store before the next track starts. Thankfully most retail has gotten rid of their music; silence is better.

    1. A good question is: Is it in POSIX? Which as far as I know the answer is no. And I don’t think it’s one of the almost-POSIX things like ampersand redirection either.

      Notably, it’s not in Busybox’s ash, so it’s probably unavailable in an initramfs or in resource constrained embedded environment. If you do a lot of system integration work, learning to use POSIX isms over bash or ksh isms is a useful skill because of this. Reducing the number of constructs in the language also makes it more orthogonal which may make scripts require less knowledge to read, even though it makes things a little more unpleasant to write.

    1. I’ve often wondered, and your query led me to actually looking for the answer.

      “Fu” derives from a Taoist concept of returning, particularly returning to basics. There is some implication in the Taoist concept of this being cyclical, which does not seem to apply to computerese. The computer use of “Fu” might also carry an implication of skill.

      In addition to these Linux-Fu articles which focus on basic Linux text commands, you might also consider GIMP’s script-fu, a scripting language for GIMP plugins.

          1. Etymology does not govern meaning, so that wouldn’t help. Knowing it is a loan-word used to mean “skill” is already to know the meaning. If the intent had been different, it wouldn’t affect the current meaning in this context.

  4. I am soooo new and sooo old. I started writing programs on a green-screen Commodore 64, then the Trash 80. Sorry, TRS-80. I can get by well enough if I read it. Prob is windows 7 isCapt. I do not understand how to load Bionic Puppy Linux. I’ve tried several ways... nuttin. Also issues with the IPv6 No Access Available. I’ve reset config, winsock, twice. I see the date back on oymt and drove it back a week, which seemed to fix it. But now it’s been a week and it’s doing it again. Any help is appreciated. Ty.
    Zippydoo
