In the embedded world, twiddling a few bits is expected behavior. Firmware is far enough down the stack that the author may care about the number of bits and bytes used, or needs to work with registers directly to make the machine dance. Usually these operations are confined to the typical shifting and masking but sometimes a problem calls for more exotic solutions. If you need to descend down these dark depths you invariably come across the classic Bit Twiddling Hacks collected by [Sean Eron Anderson]. Here be dragons.
Bit Twiddling Hacks is exactly as described; a page full of snippets and suggestions for how to perform all manner of bit math in convenient or efficient ways. To our surprise upon reading the disclaimer at the top of the page, [Sean] observes that so many people have used the contents of the page that it’s effectively all been thoroughly tested. Considering how esoteric some of the snippets are we’d love to know how the darkest corners found use.
The page contains a variety of nifty tricks. Interview content like counting set bits makes an early appearance. There’s more esoteric content like this trick for interleaving the bits in two u16’s into a single u32, or rounding up to the next power of two by casting floats. This author has only been forced to turn to Bit Twiddling Hacks on one occasion: to sign extend the output from an unfortunately designed sensor with unusual length registers.
Next time you need to perform an operation with bitmatch, check out Bit Twiddling Hacks. Have you ever needed it in production? How did you use it? We’d love to hear about it in the comments.
I feel like a huge number of these are “barrel shifters are awesome.” Which of course means if you’re on an architecture that doesn’t have one, many of those trucks will actually be pretty bad.
Two words ;-)
“Hacker’s Delight” A great read from back in the day when there weren’t that many clock cycles to spare.
thanks!
If you scour the Java JDK source code you find many references to HD (“Hacker’s Delight”) pages. Although the mentioned website contains many even more obscure yet useful tricks like conditionally doing something without branches.
Yep, sounds like something I would love to read but totally unreadable on a phone.
Yep, one of the other great things we had back then were books… ;-)
An e-Reader can be a good alternative. At best that A4 sized one of a coworker. But that was really expensive, I think around € 700,-
Books were fine for their time, but they also have drawbacks: They can be heavy and lack a proper search function.
Excercise is good for you. Bookmarks…where did that name and concept originate?
A bookmark isn’t much like a search function. A good index might be enough, but not guaranteed.
Mosaic, wasn’t it? (c:
A similar page that I have found useful on a couple of occasions: http://aggregate.org/MAGIC/
Counting leading/trailing zeros are useful when doing fixed-point math.
So basically a C tutorial on doing the stuff that anyone who uses assembler does all the time.
Not the bit twiddling I expected.
You’re looking for Pornhub. It might help you lighten up a bit
Using an assembly language doesn’t give you magic knowledge, you need to learn these things as much as someone using C.
It does if you’re using a Control Data Cyber and you know about the “count bits” instruction :-)
// now an entry in my Storehouse of Useless Knowledge.
If you worked in COMPASS (the assembler for the CDC), you’d get plenty of bit-twiddling practice packing an unpacking 6-bit characters (12-bit if they were lower case) from those 60-bit words. You’d also experience 1’s complement arithmetic, with its distinct positive and negative zeros. You’d practice writing code where some registers can be used for fetching, and different ones for storing, but there is neither a fetch nor store instruction. You’d understand why it is sometimes useful to set an address register to the contents of itself!
It almost seems like they were trying to break all the rules, until you realize the machine was developed before the rules of modern byte-addressable machines became so widespread.
Been there, done that, got the brain-damage.
There’s also the free book (PDF) by Jörg Arndt with a chapter called ‘Bit wizardry’:
Matters Computational: Ideas, Algorithms, Source Code, https://www.jjj.de/fxt/fxtbook.pdf
in the early 1990s I discovered what was later named the “Carry Rippler”: a bit trick to iterate over all “subsets” of one-bits in a word.
void subsets(int v)
{
int u=0;
do printf(“%d\n”,u); while(u=u-v&v);
}
https://www.chessprogramming.org/Traversing_Subsets_of_a_Set
I once used it in a program to generate all addresses in a sparse table.
It is also used in the Stockfish chess engine: https://en.wikipedia.org/wiki/Stockfish_(chess)
Very useful, thanks for sharing this.
Counting set bits was an opcode on the old CDC Cyber 73 I first learned on. I never understood what it was for until much later, when I learned that NSA code breakers used it for cryptanalysis.
Heh. Ninja’d.
I doff my hat (with 5081 card tucked in the band) to you, fellow CDC Assembler programmer!
UMASS/Amherst, 1974, Cyber74
That’s a good read if you want your brain to hurt (or the one of your coworkers), but it remembers me this phrase “Do not optimize too early” or something. This does not mean “Do not optimize at all and use 100 frameworks”!
But if you like (or need to…) writing code for low-computing-power microcontrollers those things from the link might be useful.
Except for a low-computing-power microcontroller, half of those tricks result in *worse* code. Many for instance, don’t have a barrel shifter, so the “compute the sign of an integer” or “absolute value without branching” tricks are incredibly bad if the compiler’s stupid, although even in those cases if you cast it as a rotate rather than a shift down, it’d be faster. That is, instead of (mask = (v >> 7)) you do (mask = (v <>7) ). Sometimes the compiler’s smart enough to recognize some of these tricks. Most of the time not so much.
It’s unfortunate, because those tricks seem to be optimized for x86 – and optimization at this level for x86 has very little benefit outside of a few applications. And sadly there’s not a lot of explanation for many of them.
Okay, the rotate syntax gets borked here. Rotates are “or together shift down by 7 and shift up by 1”.
Yeah, you are probably right. I was thinking about AVR and indeed, no barrel shifter there.
We haven’t used anywhere nearly all of those but we’ve used a chunk, particularly ways of calculating parity very quickly for generating packets for communicating with the chips we designed. (Fun: programming a bitbang SPI interface using a digital pattern generator on automated test equipment.) I separately figured out how to calculate absolute values without branching while playing Human Resource Machine, a cute programming game on Steam.
Check out 7 Billion Humans. More of the same fun!
Exapunks is also fun.
There is a nice way to rotate 2D bitmaps without iterating over all pixels. I implemented it once 9 years ago for a b/w LCD display, but I can’t find the code. It’s similar to the algorithm used by OpenSSL for the initial and final permutation of DES:
https://github.com/openssl/openssl/blob/master/crypto/des/des_local.h#L160
PSA: if you do any gpu programming in any form, this page is incredibly useful! Especially the conditionally do something without branching. These aren’t dragons, they’re magic tunes to be used properly!
My fave trick that I didn’t see there but expected to: endian-swapping. Say you want byte k from a word of The Wrong Endianness, you can XOR the address with (byte address granularity – 1), i.e. 3 for 32-bit or 7 for 64-bit:
int input;
int *ip=&input;
unsigned char *cpi=(unsigned char *) ip;
unsigned char ch=cpi[k ^ 3];
There’s probably a better way but this has always amused me.