What Is Entropy And How Do I Get More Of It?

Let’s start off with one of my favorite quotes from John von Neumann: “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number — there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method.”

What von Neumann is getting at is that the “pseudo” in pseudorandom number generator (PRNG) is really a synonym for “not at all”. Granted, if you drop into the middle of a good PRNG’s sequence, guessing the next number is nearly impossible. But if you know, or can guess, the seed that started the PRNG off, you know all past and future values nearly instantly; it’s a purely deterministic mathematical function. This shouldn’t be taken as a rant against PRNGs, but merely as a reminder that when you use one, the numbers it spits out are only as un-guessable as the seed. And while “un-guessability” isn’t a well-defined mathematical concept, or even a real word, entropy is.

That’s why entropy matters to you. Almost anything that your computer wants to keep secret will require the generation of a secret random number at some point, and any series of “random” numbers that a computer generates will have only as much entropy, and thus un-guessability, as the seed used. So how does a computer, a deterministic machine, harvest entropy for that seed in the first place? And how can you make sure you’ve got enough? And did you know that your Raspberry Pi can be turned into a heavy-duty source of entropy? Read on!

Everything You Needed to Know About Entropy…

But first, to hammer von Neumann’s point home, here’s an algorithm for a PRNG that will pass any statistical test you throw at it: start a counter at one, hash the counter value, then increment the counter, append it to the previous hash, and hash the result again. The result is a hash chain, a “random”-looking series of numbers that is entirely predictable — that’s the “pseudo” in PRNG. If I knew you were doing this, and that you’d only used up 100 or so numbers so far, it wouldn’t take me long to do the same thing and figure out where you were in the series. I’d have a much harder time cracking your series if I only knew that you’d started somewhere between 1 and 1,000,000 instead of at 1.
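In case it helps to see it concretely, here is a minimal sketch of that counter-and-hash PRNG in Python (the function name and the choice of SHA-256 are mine, nothing canonical):

import hashlib

def hash_chain_prng(seed, count):
    # Entirely deterministic: anyone who knows (or guesses) the seed
    # can reproduce every past and future value.
    counter = seed
    state = hashlib.sha256(str(counter).encode()).digest()
    for _ in range(count):
        yield int.from_bytes(state, "big")
        counter += 1
        # Append the new counter value to the previous hash, hash again.
        state = hashlib.sha256(state + str(counter).encode()).digest()

for value in hash_chain_prng(seed=1, count=3):
    print(hex(value))

The output will pass statistical tests just fine; its only secret is the seed.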

Claude Shannon’s concept of entropy is essentially this: count up the minimum number of yes/no questions it would take to figure out whatever your secret is. That number is the same as the base-2 logarithm of the number of possible secrets that could have been drawn, or equivalently the number of digits it would take to write out the number of possible secrets in binary.

Let’s say you had started your counter in the hash chain at 1, and had used no more than 128 values so far. There are only 128 possible states that your PRNG machine can be in. Distinguishing among 128 equally likely possibilities takes seven bits, \log_2 128 = 7 : if I started by asking whether the number was greater than 64, and subdivided the remaining interval with each question, it would take me only seven guesses.
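If you want to convince yourself of the seven-question claim, here is a quick binary-search sketch in Python:

def questions_needed(secret, lo=1, hi=128):
    # Halve the remaining interval with each yes/no question.
    count = 0
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1
        if secret > mid:   # "Is it greater than mid?"
            lo = mid + 1
        else:
            hi = mid
    return count

print(max(questions_needed(s) for s in range(1, 129)))  # prints 7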

Entropy is a measure of the number of possible choices from which our secret value could have been drawn. It’s a way to measure hardness-to-guess and the strength of passwords, and it’s what people mean when they say that some procedure generates “kinda random” versus “very random” numbers. A random number between 1 and 1,000,000 has just under 20 bits of entropy. (The English language has around 1,000,000 words — you should never lose at “twenty questions”, or use a single word as a password.) A good long-term password should probably have in excess of 128 bits of entropy.
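That arithmetic is a one-liner in any language; in Python, for instance (the six-word figure at the end is my own illustration, not from any standard):

import math

def entropy_bits(num_possibilities):
    # Shannon entropy of a uniform draw: log2 of the number of choices.
    return math.log2(num_possibilities)

print(entropy_bits(1_000_000))      # ~19.93 bits: one random English word
print(entropy_bits(2 ** 128))       # 128.0 bits: a good long-term password
print(6 * entropy_bits(1_000_000))  # ~119.6 bits: six random words, nearly there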

Quick quiz: If you generate a 256-bit random number from a good cryptographic PRNG, it will have 256 bits of entropy, right? Wrong! If you didn’t pick a seed completely randomly from at least 2^256 possibilities, your number will have fewer than 256 bits of entropy. So let’s go harvest some entropy.
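To make the quiz concrete, here is a sketch of how cheaply an attacker grinds through a small seed space: a 256-bit token whose seed was a one-second-resolution timestamp. Python's Mersenne Twister stands in for the PRNG here; the principle is the same for any deterministic generator.

import random, time

seed = int(time.time())                       # tiny seed space!
token = random.Random(seed).getrandbits(256)  # 256 bits wide, few bits of entropy

# The attacker just tries every plausible seed -- here, a day's worth.
now = int(time.time())
for guess in range(now - 86_400, now + 1):
    if random.Random(guess).getrandbits(256) == token:
        print("seed recovered:", guess)
        break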

The Deep End of the Entropy Pool

Turn on your computer, or fire up a microcontroller. How much uncertainty is there in the system? It’s running exactly the same BIOS as last time you booted up, and probably all of the same OS and higher code. A computer is a deterministic machine running predictable code. Maybe the wall-clock time is different, but that’s pretty easily guessable. This means that the most reliable source of randomness exists just where any sysadmin would tell you it does — between the chair and the keyboard.

Any Linux system, on a single-board computer or on your high-powered workstation, pulls entropy from the last few digits in the timestamp of interrupts fired off by keyboard, mouse, disk drive, or other hardware events. Assuming you can’t press keys repeatably with microsecond accuracy, this is probably pretty random. These different sources of new data get hashed together with old data, and the amount of entropy in the pot increases slightly with each stirring.

Estimating how much additional entropy is added to the pot each time is difficult, bordering on impossible, but the Linux kernel has been shown to make a conservative underestimate. (This used to be the way it was done — anyone know if this is still in place?) Functionally, however, each keystroke or mouse update adds about one or two bits of entropy to the pool. These add up over time, and the system also keeps track of how many bytes you pull out. The result is like a gas gauge for entropy.
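As a toy illustration of the gas-gauge idea (emphatically not the kernel's actual code), imagine stirring interrupt timestamps into a hashed pool and crediting a conservative bit or so per event:

import hashlib, time

class ToyEntropyPool:
    def __init__(self):
        self.pool = b"\x00" * 32
        self.estimate_bits = 0  # the "gas gauge"

    def stir(self, timestamp_ns, credit_bits=1):
        # Hash the new event data together with the old pool contents.
        data = timestamp_ns.to_bytes(8, "little")
        self.pool = hashlib.sha256(self.pool + data).digest()
        # Credit a conservative estimate, capped at the pool width.
        self.estimate_bits = min(self.estimate_bits + credit_bits, 256)

pool = ToyEntropyPool()
pool.stir(time.monotonic_ns())  # e.g. a keyboard interrupt arriving
print(pool.estimate_bits)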

What’s My Entropy?

Anyway, it’s easy to see how much entropy you have available, and you can learn a lot by watching it go. Type cat /proc/sys/kernel/random/entropy_avail to see how many bits of entropy your computer has stored up right now. If all’s well, it should be somewhere near 3000 but below the maximum value of 4096 bits — anything greater than 256 is probably fine unless you’re going to be generating long SSH keys right now.

Now let’s watch entropy accumulation in real time. Type:

watch -n 1 cat /proc/sys/kernel/random/entropy_avail

to see your entropy estimate every second. Don’t do anything at first and you’ll see the value increasing slowly, if at all. That’s probably disk access or other system interrupts. Now type or move your mouse and watch the numbers jump. For instance, I had 2886 bits at the start of this sentence, but now I’ve got 3078, just from typing.

Now let’s see if we can use some of that entropy. The first and easiest way to burn up your entropy pool is just to dump whatever’s in /dev/random, the kernel’s random number generator. Type:

cat /dev/random > random_bits.bin or hexdump /dev/random

in another window and watch your available entropy drop to zero as random values are output to the terminal. Press Ctrl-C to stop the madness, and watch how moving your mouse or typing on the keyboard builds the entropy back up. That’s it.

You should probably never do this in practice. Indeed, the whole point of gathering up entropy was to seed a good PRNG. Once that is done, there’s little point in pulling more entropy out of the pool unless you have reason to believe that your system is compromised. This brings us to /dev/urandom.

/dev/urandom

Now open up a browser to https://www.hackaday.com and watch the entropy available. The entropy probably won’t fall, even though I asserted that the SSL connection would require a secret random number to be generated. That’s because any modern browser uses /dev/urandom instead of /dev/random for its random numbers. urandom is a PRNG that’s periodically re-seeded from the system’s entropy pools when they contain enough estimated entropy.

Since the output of /dev/urandom is a PRNG, it’s only as good as the seed, but since the seed is drawn from the system entropy pool once it’s got more than 256 bits of entropy (as of kernel 4.8), you could hardly ask for more. And since this PRNG is re-seeded periodically when the entropy pool has enough in it, it’s very hard to exploit even if your computer is taken over by baddies.

/dev/random versus /dev/urandom is a bit of a holy war in the history of the Linux kernel, where the main difference to the end-user is that /dev/random will wait to deliver numbers until it has enough entropy, while /dev/urandom is a free-running PRNG that’s already been seeded from /dev/random, ideally with enough entropy. The one time that the difference between the two is really crucial is at boot-up, before there is enough entropy in the system to properly seed the PRNG in /dev/urandom. The rest of the time, just use /dev/urandom.
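In practice you rarely open these device files yourself; most languages wrap /dev/urandom (or the getrandom() syscall) for you. In Python, for example:

import os, secrets

key = os.urandom(32)           # 32 bytes straight from the kernel's CSPRNG
token = secrets.token_hex(16)  # same source, convenience-wrapped
print(key.hex(), token)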

Raspberry Pi and the HW RNG

But consider the fate of a standalone, headless server (or a microcontroller for that matter) with no human typing or mousing around, and no spinning iron drive providing mechanical irregularity. Where does it get entropy after it starts up? What if an attacker, or bad luck, forces periodic reboots? This is a real problem.

The easiest answer is a built-in hardware random number generator. Once the purview of the paranoid, on-chip hardware for generating random bits is now relatively common. Recent Intel chipsets support the RDRAND instruction natively, and even the Raspberry Pis have a hardware RNG onboard. Let’s test it out!

As above, you can check on your RPi’s available entropy with cat /proc/sys/kernel/random/entropy_avail, but let’s do something more dramatic. Try to get 1024 bytes of randomness from /dev/random, which only buffers up 4096 bits, or 512 bytes. Type sudo dd if=/dev/random of=/dev/null bs=1024 count=1 iflag=fullblock and wait a while. Then press Ctrl-C because you’re going to be waiting for a long time, even if you’re wiggling the mouse. Human entropy is good entropy, but it takes a long time to get 512 bytes’ worth. This is the situation on every reboot. You shouldn’t be generating SSH keys for a while.

Now let’s get the hardware RNG contributing to system entropy: sudo apt-get install rng-tools. You can tweak configurations in the /etc/default/rng-tools file. In particular, for a RPi2, you might need to uncomment the HRNGDEVICE=/dev/hwrng line. Otherwise, you should be good to go. Retry that command to dump 1024 bytes of randomness: I get 135 kB/s throughput on my RPi3 — it’s done in eight milliseconds.

Even with the hardware random number generator in place, the system will mix hardware and human entropy, which is great news if you don’t fully trust the silicon black box. In the config file mentioned above, you can tweak the relative percentages of entropy from different sources. Cool. And remember, the main point of something like this in a headless system is just making sure that it has sufficient entropy to seed the PRNG in /dev/urandom before programs start asking it for random numbers.

You can test the quality of the random numbers that come out of the hardware RNG easily enough: rngtest -c 1000 < /dev/hwrng and waiting 45 seconds or so will run the FIPS 140-2 test suite on 2,000,000 bits of random data. (You will need to be root to access /dev/hwrng directly like this.)

 
root@raspberrypi:/home/pi# rngtest -c 1000 < /dev/hwrng
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
rngtest: starting FIPS tests...
rngtest: bits received from input: 20000032
rngtest: FIPS 140-2 successes: 1000
rngtest: FIPS 140-2 failures: 0
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=476.082; avg=899.493; max=907.881)Kibits/s
rngtest: FIPS tests speed: (min=7.080; avg=14.217; max=14.352)Mibits/s
rngtest: Program run time: 23057351 microseconds

It surely looks like statistically random data. Of course, so does a counter fed into a hash chain. But more to the point, it verifies that the hardware RNG is up and running, is not obviously broken, and is spitting out over 1 kB of random numbers every second. That should get the Pi up over the initial 256-bit hurdle at startup within a millisecond. What to do with all of the leftover entropy? If you’ve got a Raspberry Pi on your home network, you’ve got an entropy server for any other computers that might be running out. Or post the data to an MQTT topic on your home network, and you’ve got your own freaky numbers station. Need a one-time pad? dd if=/dev/hwrng of=hwrng-test-data.bin bs=1024 count=1024 will dump 1 MB of sweet randomy goodness to a file.
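If you want to try the entropy-server idea, here is a toy sketch using only the Python standard library. There is no authentication or encryption, the port number is arbitrary, and it assumes /dev/hwrng is readable (run it as root), so treat it as LAN entertainment rather than infrastructure:

import socketserver

class EntropyHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Hand each client 64 bytes from the hardware RNG.
        with open("/dev/hwrng", "rb") as rng:
            self.request.sendall(rng.read(64))

with socketserver.TCPServer(("0.0.0.0", 4711), EntropyHandler) as server:
    server.serve_forever()

A client can grab bytes with something as simple as nc raspberrypi 4711 > seed.bin.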

Takeaways

Pseudorandom number generators are deterministic functions, and they can only spit out random numbers with as much entropy as their seed. Getting 256 bits of entropy can be difficult, especially at boot time or on a headless computer with no human interaction. Almost all of the time, however, the Linux kernel has your back. It’s watching what you do, and adding your human randomness into the mix, and that’s probably the best source of randomness there is. This is what should be (and is) seeding PRNGs on your system.

For devices like the Raspberry Pi, there is also a hardware random number generator that can be used to accumulate entropy to partially make up for lack of human interaction. If you’re doing anything security-relevant on your Pi, install rng-tools and get it running!

55 thoughts on “What Is Entropy And How Do I Get More Of It?”

  1. “Getting 256 bits of entropy can be difficult, especially at boot time or on a headless computer with no human interaction.”

    If it’s a PC, maybe stick every ancient HDD you’ve got in it: between the times to spin up, the thermal recalibrations, and the re-seeks due to flaky sectors, I think you’ll get a lot of entropy. Don’t keep any data on them, of course, but maybe write non-critical logs to them or something so they get accessed.

    1. I have often thought that you could produce a great deal of entropy by using the headlines from a number of sites that have frequent updates. News sites, blogs, Trump’s twitter feed, maybe a few others. Then the time of day on a remote server, adding in some network latency.

      I have not tried, but it seems plausible to me

      1. Weather maps might be a nice one, if you filter for the high-frequency information. Or pictures of the Earth from space, lots of clouds all being chaotic, and all sorts of levels of magnification.

  2. Without direct access to the pre-hashed raw data that is, or is not, being de-skewed prior to being hashed to generate the “1 kB of random numbers every second” for /dev/hwrng on the RPi, you have no idea whether the actual quality of the randomness is cryptographically good or not. Another issue is that you don’t know the exact source of randomness; you could assume it is one or more avalanche noise sources, or Johnson–Nyquist noise from a resistor, or sampled ring oscillators.

    So as an additional source of randomness – yea thumbs up, as a 100% trusted single source of all randomness – NO thank you.

    1. 99.9% agree. :)

      Software entropy is auditable but slow. HW entropy is fast, but you just have to trust the manufacturers. The Linux kernel folks (particularly Theodore Ts’o) have been adamant about the auditability. Post-Snowden, post EC-RNG, post-whatever-else, I think that stance is probably justified.

  3. Elliot Williams – Very good article again Elliot!

    Sorry for being so fastidious or pedantic, but when you say ENTROPY you need to add an adjective to differentiate it from Thermodynamic Entropy, the original meaning of the word, circa 1865 by Clausius. You are obviously referring to Shannon Entropy or Information Entropy (circa 1948).

    Sorry – but people use that word too cavalierly when they are not talking about Thermodynamics.

    1. I think both uses of the word entropy are equally common. Statistics has a weird habit of poaching terms from mechanical engineering whenever the formulas look similar, e.g. a “probability mass function.”
      I agree this was a great article!

    2. haha – you don’t think that the fact this is an article about computer generated random _numbers_ gives 100% of the audience enough context to know the difference?

      Maybe when I just said “computer” I actually meant “someone that computes”, I should have been more specific…

      1. Mark – I was not concerned with the audience per se, only Elliot Williams. We “nerds” tend to poach words a lot. When we go into private-sector jobs and start using our poached words on a DIFFERENT audience (that pays our paychecks?) we start to look benighted, like with the word ENTROPY. Imagine sitting at an engineering job with PhDs and executives around the conference table while we spout off our high-tech ideas with all of our poached words. Well, eyes start to roll and glaze over, and the rest is history…

        I know you feel it is a common assumption to use poached words/terms in a close-knit group like HaD; however, we are an eclectic bunch with PhDs, engineers, etc. So in good practice we should modify our lexicon of poached words and at least have a sentence in the beginning, or a footnote, explaining what we really mean, i.e. Information Entropy. Not all of us here are nerds and geeks. We even have kids trying to learn something new. At least let them know the correct way to say something before poaching words.

        FYI – At one time in the recent past, COMPUTERS referred to humans, often women, doing calculations in high-tech jobs (*high tech being a relative term for the day). But now, you are correct, it is a common assumption that we are talking about an electronic box that does it. One day we will realize that A.I. is a poached term too, and understand what human intelligence really is, and that A.I. in its present form is nowhere near equivalent to human brains. Faster and higher volume, yes. But our human neural network is much more sophisticated than any machine could ever achieve.

    3. Actually they are the same thing (barring a Boltzmann factor which gives the thermodynamic entropy proper units). In equilibrium thermodynamics one can count microstates and assign a probability distribution to the system. Then one can calculate the Gibbs entropy (which can be shown to be equivalent to Clausius’s definition, see: http://bayes.wustl.edu/etj/articles/gibbs.vs.boltzmann.pdf ), which is mathematically equivalent to Shannon’s entropy. The interpretation of the entropy of a thermodynamic system is the following: if we want to know the state of the system, we either have to keep track of every particle or we can assign probabilities to microstates. Through measurement we gain information about the system, which constrains the possible microstates and their respective probabilities.

  4. FYI – You do not need solely PSEUDO random numbers to communicate with someone in a surreptitious electronic fashion. That method can be called asynchronous. However, you can use a truly random sequence if you do it in a SYNCHRONOUS method (IOW – both parties use the same live source of RNs). But if an interceptor has access to the same source it becomes compromised – a bit. For instance: if two or more parties are in the same local geographical area, linked by RF, LASER, or landline, using programmable voice scramblers, star charts to choose a particular star, and all are equipped with telescopes with photocell detectors on the eyepiece, the atmospheric effect that causes star twinkle can be converted into a true-nature RNG. However, it fails when the parties are too far apart, due to different atmospheric effects elsewhere on the planet.

    In WW2 the military used a highly secret and successful secure voice communication system that used synchronous true-RNs recorded on phonograph records that all parties had to start at the exact same time and keep in synchronization. It was called Project X, Green Hornet, or SIGSALY. It allowed POTUS to secretly talk to the UK PM via HF radio in real time, while all the Nazis heard was an undecipherable buzzing sound like hornets. It was more secure than ENIGMA. The true-RNs were just recordings of the noise from large mercury-vapor rectifying vacuum tubes (aka valves in the UK).

    http://www.upl.co/uploads/spy-vs-spy1509655259.jpg

    1. SIGSALY was revolutionary, but it suffered the problem of all OTP based crypto systems – the need to securely distribute the pads.

      For solving the problem of two heads-of-state communicating securely over an insecure medium, that wasn’t that big of a deal (one courier delivering a tamper-evident-sealed pouch full of records every so often), but it’s not something that could have been used in the same contexts that Enigma was (for example).

        1. Nick Sayer (@nwsayer) – I’m not sure they needed any couriers. I think when the field ops guys installed these behemoths, they brought with them a cache of sealed phonograph discs to choose from. It was like choosing from a whole filing cabinet of OTPs. Each one had a number stamped on the seal. They would probably transmit that number in the clear, or they would use pre-scheduled OTP lists. The Nazis or the Japanese could do nothing physical about it, as the SIGSALY locations were under heavy guard by armed soldiers. I don’t think there were any handcuffed couriers for Axis spies to nab.

        The Nazis tried to break SIGSALY but it was too hardened. You could only try to attack it in the wild; there was no way to bribe/coerce someone to get you a disc. Even if you could get one, it would be worthless, as they used a series of encryption methods that stagger our minds even today: how could a 1930s-40s mindset think it all through? Of course people from HUT-SIX (Alan Turing et al) helped.

        Enigma was broken due to lazy Nazi end-users, theft of equipment, the Bombe, and a new technology (to them) called Colossus. SIGSALY was a totally different beast. You have to Google it and try and understand the technology used. Your brain will hurt after awhile and smoke will come out your ears. Makes you appreciate the math brainiacs of that day. Imagine what we could do if we had them in the private sector and not in Snowden’s old BLACK VAULT division in Maryland.

        1. RW ver 0.0.1 – Polaris’ (or any bright star’s – except Sol’s) “twinkle” would be different, let’s say, over Washington DC and simultaneously over Moscow. Why? Because the Earth’s atmosphere (including pollution), with its cold fronts and warm air, is what causes the twinkle effect. So my system depicted above would not work. However, if the two operatives were in the same town, city, county, or state, the twinkle effect would be pretty much the same at both locations. You might use a LASER to assist in lens deformations to compensate for atmospherics the way astrophysicists do, but that’s a little much.

        I get your Wind Talker reference. It was a good idea for the time. However, you’d have to be prepared to stop him from being captured by the enemy. I guess you know what his handler had to do in case that was about to happen? The WT knew it too. :-(

        I also see the futility you are thinking of in having two operatives in such close proximity to each other. However, just as the American poet Edgar Allan Poe once said, the best place to commit a crime is right in front of the police station, or hiding in plain sight. Can you imagine how much data you could convey to a partner in just such a configuration with just short-range IR or RF communication toys made by Hasbro or Ohio Arts? Or ASL, just don’t face each other. Or putting up a QR code on a wall which brings you to a one-time-existing encrypted website, or which is just the OTP-encrypted text itself? Or using an IR LASER FSO COMM system to aim at a centralized LASER portal for 2-way comm with an unknown partner at an unknown location.

        My latest idea I’ve been toying with is using a LASER FSO audio comm system modulated with FreeDV. Talk about a hardened intercept target. The steady PCM signal won’t kill the LASER diode either the way AM modulation tends to do sometimes. I might put that into .IO after I build a working prototype. :-/

  5. Meanwhile, a few years back, in the security department of a large games machine corporation:


    int get_random_number()
    {
        return 3; // Dice roll, guaranteed randomness
    }

    Good write-up,
    The way I thought of randomizing the random seed and other randomification (via noise/entropy) was to check for a change in an unknown variable (IRQ, async data, i.e. the start of a UART frame, against a “when” attribute).
    Or as though Schrödinger, instead of opening the box to see whether the cat exploded or not, simply threw matches at various boxes and waited for the key “thud” sounds; the gap between when a guess started and when the thud happened would be somewhat random…

    So couldn’t the bit-wise entropy be seen as Schrödinger’s bits*?
    (*Innuendo not intended, no pun about remnants of cat intended either)

  6. My Orthrus project has a need for random numbers to generate the keys. The latest incarnation of it uses the SAMS70N19, which has a TRNG in it. The only problem is that there’s very little documentation on it beyond how to use it. There’s no particular assurance available that it doesn’t suffer from side channel attacks either to try and determine what values might have been recently generated or perhaps to try and influence the generation of new values. The chip isn’t specifically designed to be a security chip, per se, so one wonders how hard they worked to try and prevent those sorts of shenanigans.

  7. Dude what are you doing!? You’re gonna kill the HAD-store sales on 1TP books!

    Nice article, but my only quibble is that, having raised the difficulty of obtaining entropy on a microcontroller, you don’t propose any solution for us low-rent hackers. Sure, the Pi is relatively easy, but what about the ardweenies who want *quality* random blinkenleds, or us cheap f**ks using STM32 blue-pill boards, which have enough grunt to do real crypto but can’t produce an unpredictable key without external help.

        1. Thanks for the link, very interesting, never heard of that particular technique before. So they are indirectly measuring the count of electrons overshooting/undershooting when charging/discharging capacitors towards a fixed midpoint voltage. And the primary source of the fluctuations is Johnson–Nyquist noise in the resistors (although there would be a mishmash of other noise sources contributing – shot noise, cross-talk)

        1. Very interested in this tack — would like to build a multi-source entropy machine to rule them all.

          But I’m not so sure that these devices aren’t running a PRNG internally. They are not specific about it in the datasheet, and say things like each random number output is guaranteed to be unique — which suggests a PRNG with a large period? Or am I reading too much into their vagueness?

          Of course, if you bought two or three of these chips, never let their ID #’s out, and combined them, it would be nearly foolproof. (Famous last words!) The chips are just a buck each, and support I2C. Should be doable.

          1. I’m pretty sure they *are* running a PRNG internally, but that’s OK for most of us as long as it’s seeded from a valid HW random source. I note there are some differences in phrasing on the ATECC508 vs ATSHA204A datasheets. The former claims to be compatible but the description provided is a lot less reassuring. I haven’t done any testing on their “random” outputs.

            I don’t think anyone actually needs huge wads of true entropy for cryptography, just a stream of externally-unpredictable+uninfluenceable bits based off a big-enough seed. For that purpose, a PRNG is fine (hence: urandom is totally fine) until the hash it’s based on gets broken and even then, there are ways of mitigating against hash breaks by discarding most of the PRNG outputs.

    1. On micros: much harder. Touché.

      Notable exception: the ESP8266 has a HW RNG on board. As always, HW black box, but it passes the sniff tests.

      Voja did this excellent project where he looked at the (random) state that RAM comes up in when powered on. https://hackaday.com/2015/06/29/true-random-number-generator-for-a-true-hacker/ The trick is that only some bits in memory are unstable, and you have to find them. Per-chip.

      Or make a ring oscillator out of some logic chips. But you run into the very real possibility of it syncing up with the micro through power rail glitches.

      Or any of the transistor- or zener-noise analog projects and run it into an ADC. Those also require clean power, but are a lot less sensitive than the ring osc method. Ironically, zener noise is minimized at around 3.3V, so it’s a horrible fit for microcontrollers…

      Having watched the ADC pins wiggle in the breeze, and seen the 50 Hz (here) power-line frequency show up prominently, I’m scared of using them for more than a couple of bits every once in a while — they’re too correlated at a fairly low frequency. Other people have used this method.

      So yeah. Sorry. That’d be a great “Ask Hackaday” piece!

      1. Of course, as soon as I post that I don’t know of a good HW RNG solution for the STM32F103 chip… http://www.gniibe.org/FST-01/fst-01.html and http://www.gniibe.org/memo/development/gnuk/rng/neug.html

        Basically the scheme is to sample four ADC channels that should be tied to relatively fixed values — Analog VCC, temp, and two rails pulled up to (digital) VCC. The idea is that there is some intrinsic measurement noise, and that maybe it should be worth a few bits per sample. These are then repeated, pooled, and hashed.

        I have no way of evaluating this without building one, but it’s doable in a half-hour or so. I’d love to see the pre-treatment bitstream to get an idea of how much entropy there actually is there. Worth doing!

        1. I have tried something like this – measuring Vcc on an Arduino (I think). There is often not a lot of entropy available from measuring Vcc or internal temperature.

          Some STM32s have a built-in HW RNG peripheral but not the ubiquitous/cheap F103C8 :(

          Using CRC32 as a whitener is probably reasonable (and hey, free!) but not, I suspect, as a CSPRNG, which is what its role boils down to if your hardware RNG goes monostable.

  8. Use a non-ideal random input to shuffle a predefined array. Shouldn’t the unpredictability approach perfection the longer the array is shuffled? The input would need to be a true random source, such as the LSBs from an A/D converter digitizing resistor or zener noise.

    1. The details at this level change often enough that I’m not even sure any more. Last I knew, it was SHA-1, but since that’s a bit long in the tooth, I’d imagine they’re on to something better. But it’s also hard to exploit, and the kernel really just needs whatever function to be good at mixing.

      FWIW, entropy used to be simply XORed in, but there was concern that an advanced attacker could manipulate one of the entropy streams in order to affect the outcome. So then it was switched to SHA to make that significantly harder. It’s still possible to brute-force a few bits (http://blog.cr.yp.to/20140205-entropy.html) if you control one of the streams, and this could be valuable for exfiltrating data if you’re rootkitted, for instance.

      It’s there in the kernel source somewhere… It is all in principle knowable, but I don’t know.

      There’s also been a large change in /dev/urandom lately — I think it’s using the ChaCha20 cipher now, which is pretty much the new hotness.

      Anyone?

  9. Good article. I just wanted to add my foray into PRNGs. It might prove useful to people.

    TL;DR: Just because a number is deemed random enough, it does not mean every bit of the number is held to the same standard. I.e. the number might be random, but (number % 2) might not be.

    I was porting a maze generation algorithm from Linux GCC to Microsoft VS. It was one of the simpler ones in the project: all it did was populate the maze with walls everywhere, then break walls with a 50% chance, solve the maze, and check the solution length. If the solution length was too short, the maze was deemed too easy and rejected, another maze was generated, and so on.

    On Linux, this went through approximately 20-30 maze generations before it got an agreeable one. The maze was pretty small (16×16 was typical), so this was over in the blink of an eye.

    I ported the code to Microsoft VS, and got it to compile. Then I was greeted with the program ‘freezing’. I went through debugging, and eventually found out that an agreeable maze was never found (over 100 K mazes were generated but rejected). I checked the throughput and it was generating in excess of 10K mazes per second, but none of them met the required length.

    The 50% chance of breaking a wall was done by checking if a number was even or odd (num % 2). The person who wrote the algorithm surmised that the chance of a PRNG number being even or odd was more or less random (as I did too). This was before C++11 standardized the PRNG algorithms (if you specified one). MS VS uses a linear congruential engine, unlike what GCC used (if you look it up, a linear congruential engine is a super easy way to generate pseudorandom numbers: 3 operations per number).

    I replaced the code, checking whether the generated number was greater than half of the maximum possible number instead of taking it modulo 2, and it worked the same on Linux and Windows. Apparently the individual bits of a pseudorandom number generator cannot be assumed to be random, and modulo 2 checks only the LSB, as the sketch below shows.
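    For illustration, a bare-bones LCG in Python (glibc-style constants, returning its full state; not exactly any particular library's rand(), but it shows the trap) has a low bit that simply alternates:

    # The LSB of a power-of-two-modulus LCG has period 2.
    def lcg(seed):
        state = seed
        while True:
            state = (1103515245 * state + 12345) % 2**31
            yield state

    gen = lcg(42)
    print([next(gen) % 2 for _ in range(12)])           # strictly alternating 0s and 1s
    gen = lcg(42)
    print([int(next(gen) > 2**30) for _ in range(12)])  # midpoint test: far more coin-like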
