Unionize Your Variables – An Introduction To Advanced Data Types In C

March 2, 2018

Programming C without variables is like, well, programming C without variables. They are so essential to the language that it doesn’t even require an analogy here. We can declare and use them as wildly as we please, but it often makes sense to have a little bit more structure, and combine data that belongs together in a common collection. Arrays are a good start to bundle data of the same type, especially when there is no specific meaning of the array’s index other than the value’s position, but as soon as you want a more meaningful association of each value, arrays will become limiting. And they’re useless if you want to combine different data types together. Luckily, C provides us with proper alternatives out of the box.

This write-up will introduce structures and unions in C, how to declare and use them, and how unions can be (ab)used as an alternative approach for pointer and bitwise operations.

Structs

Before we dive into unions, though, we will start this off with a more common joint variable type: the struct. A struct is a collection of an arbitrary number of variables of any data type, including other structs, wrapped together as a data type of its own. Let’s say we want to store three 16-bit integers representing the values of a temperature, humidity, and light sensor.

Yes, we could use an array, but then we always have to remember which index represents what value, while with a struct, we can give each value its own identifier. To ensure we end up with an unsigned 16-bit integer variable regardless of the underlying system, we’ll be using the C standard library’s type definitions from stdint.h.

#include <stdint.h>

struct sensor_data {
    uint16_t temperature;
    uint16_t humidity;
    uint16_t brightness;
};

We now have a new data type that contains three integers, stored one after another in memory in the order they were declared. Let’s declare a variable of this new type and assign a value to each of the struct’s fields.

struct sensor_data data;

data.temperature = 123;
data.humidity    = 456;
data.brightness  = 789;

Alternatively, the struct can be initialized directly while declaring it. C offers two different ways to do so: pretending it was an array, or using designated initializers (introduced in C99). Treating it like an array assigns each value to the fields in the same order as the struct was defined. Designated initializers assign each value to a field by name, in any order. Once initialized, we can access each individual field the same way we just assigned values to it.

struct sensor_data array_style = {
    123, /* temperature */
    456, /* humidity    */
    789  /* brightness  */
};

struct sensor_data designated_initializers = {
    .humidity    = 456,
    .temperature = 123,
    .brightness  = 789
};

printf("Temperature: %d\n", array_style.temperature);
printf("Humidity:    %d\n", array_style.humidity);
printf("Brightness:  %d\n", array_style.brightness);

Notice how the fields in the designated initializer are not in their original order. We could even leave out individual fields entirely, in which case they are initialized to zero rather than left with garbage values. This allows us to modify the struct itself later on without worrying much about adjusting every place it was used before, unless of course we rename or remove a field.
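To see the zero-filling in action, here is a minimal sketch that names only two of the three fields in a designated initializer. The struct is the same sensor_data type as above; the function make_partial_reading is made up for illustration.

```c
#include <stdint.h>

struct sensor_data {
    uint16_t temperature;
    uint16_t humidity;
    uint16_t brightness;
};

/* Builds a reading with only two fields named. brightness is left out
   of the designated initializer, so C initializes it to zero. */
struct sensor_data make_partial_reading(void)
{
    struct sensor_data reading = {
        .humidity    = 456,
        .temperature = 123,
    };
    return reading;
}
```

Calling make_partial_reading() gives a struct with temperature 123, humidity 456, and brightness 0.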

Bitfields

The bitfield is a special-case struct that lets us split up a portion of an integer into its own variable of arbitrary bit length. To stick with the sensor data example, let’s assume each sensor value is read by an analog-to-digital converter (ADC) with 10-bit resolution.

Storing the results in 16-bit integers will therefore waste 6 bits for each value, which is more than one third. Using bitfields will let us use a single 32-bit integer and split it up into three 10-bit variables instead, leaving only 2 bits unused altogether.

struct sensor_data_bitfield {
    uint32_t temperature:10;
    uint32_t humidity:10;
    uint32_t brightness:10;
};

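We could also add a 2-bit wide fourth field to use the remaining space at no extra cost. And this is pretty much all there is to know about bitfields. Other than adding the bit length, bitfields are still just structs, and are handled like any other regular struct. Some caution is required, though: the C standard leaves a lot of their details to the compiler, such as the order in which fields are placed inside the storage unit, what happens when a field would straddle a storage unit boundary, and whether a type like uint32_t is allowed as a bitfield type at all (only int, signed int, unsigned int and _Bool are guaranteed).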
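A quick sketch of the bitfield version with the optional 2-bit fourth field added. The field name flags and the helper store_temperature are hypothetical; the helper just shows that an unsigned 10-bit field keeps only the low 10 bits of whatever is assigned to it. With GCC or Clang the whole struct typically fits into 4 bytes, but that layout is up to the compiler.

```c
#include <stdint.h>

/* Three 10-bit ADC readings plus a hypothetical 2-bit field that uses up
   the remaining bits of the 32-bit storage unit. */
struct sensor_data_bitfield {
    uint32_t temperature:10;
    uint32_t humidity:10;
    uint32_t brightness:10;
    uint32_t flags:2;
};

/* Stores a raw value in the 10-bit temperature field and reads it back.
   An unsigned bitfield keeps only its low 10 bits, so anything above
   1023 wraps around modulo 1024. */
unsigned store_temperature(unsigned raw)
{
    struct sensor_data_bitfield data = {0};
    data.temperature = raw;
    return data.temperature;
}
```

So store_temperature(789) returns 789, while store_temperature(1025) returns 1: the ADC range fits, anything larger silently loses its upper bits.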

Unions

Which brings us to today’s often overlooked topic, the union. From the outside, they look and behave just like a struct, and are in fact declared, initialized and accessed the exact same way. So to turn our struct sensor_data into a union, we simply have to change the keyword and we are done.

union sensor_data {
    uint16_t temperature;
    uint16_t humidity;
    uint16_t brightness;
};

However, unlike a struct, the fields inside a union are not arranged in sequential order in memory, but are all located at the same address. So if a struct sensor_data variable starts at memory address 0x1000, the temperature field will be located at 0x1000, the humidity field at 0x1002, and the brightness field at address 0x1004, since each uint16_t takes up two bytes. With a union, all three fields will be located at address 0x1000.
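The addresses can be checked with offsetof from stddef.h, which yields each field’s distance in bytes from the start of its struct or union. A sketch, assuming the two types get distinct tags (sensor_struct and sensor_union are made-up names), since struct and union tags share one namespace in C and can’t both be called sensor_data in the same file:

```c
#include <stddef.h>
#include <stdint.h>

struct sensor_struct {
    uint16_t temperature;
    uint16_t humidity;
    uint16_t brightness;
};

union sensor_union {
    uint16_t temperature;
    uint16_t humidity;
    uint16_t brightness;
};

/* Byte offsets of temperature, humidity, and brightness, in that order. */
static const size_t struct_offsets[] = {
    offsetof(struct sensor_struct, temperature),  /* 0 */
    offsetof(struct sensor_struct, humidity),     /* 2 on common platforms */
    offsetof(struct sensor_struct, brightness),   /* 4 on common platforms */
};

/* Every member of a union starts at offset 0. */
static const size_t union_offsets[] = {
    offsetof(union sensor_union, temperature),
    offsetof(union sensor_union, humidity),
    offsetof(union sensor_union, brightness),
};
```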

What this means in practice is easily shown once we assign values to all the fields like we did in the struct example earlier.

union sensor_data data;

data.temperature = 123;
data.humidity    = 456;
data.brightness  = 789;

printf("Temperature: %d\n", data.temperature);

Unlike the struct example, the value printed here won’t be the assigned value 123, but 789 instead. Since every field in the union shares the exact same memory location, any time one of the fields gets assigned a value, the previously assigned values of all other fields are overwritten. For this reason, it rarely makes sense to have fields with the same data type inside a union; it is more useful to mix different types together. Note that the data type sizes don’t need to match, so it’s no problem to have a union with, for example, a 32-bit and an 8-bit integer. Reading the 8-bit field after writing the 32-bit one simply gives you the byte stored at the union’s starting address, which is the least significant byte on a little-endian machine. The union itself is as large as its biggest field (plus any padding the compiler needs for alignment), so with a 32-bit and an 8-bit integer, the union will be 4 bytes in size.
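Here is a small sketch of such a mixed-size union; the names mixed and first_byte_of are hypothetical. Which byte the 8-bit field sees depends on endianness, so the comment covers both cases.

```c
#include <stdint.h>

/* A 32-bit and an 8-bit field sharing the same storage. */
union mixed {
    uint32_t word;
    uint8_t  byte;
};

/* Writes a 32-bit value, then reads back the 8-bit field, which sees only
   the first byte in memory: the least significant byte (0x78 for
   0x12345678) on a little-endian machine, the most significant (0x12)
   on a big-endian one. */
uint8_t first_byte_of(uint32_t value)
{
    union mixed m;
    m.word = value;
    return m.byte;
}
```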

Using Unions

A union essentially gives one memory location different names and correspondingly different sizes. That might seem like a strange concept, but let’s see how that can be used to easily access different single bytes within a longer data type.

union data_bytes {
    uint32_t data;
    uint8_t bytes[4];
};

Here we have a 32-bit integer overlapping with an array of four 8-bit integers. If we assign a value to the 32-bit data field and read a single location from the bytes array, we can effectively extract each individual byte from the data field.

union data_bytes db;
db.data = 0x12345678;
printf("0x%02x\n", db.bytes[1]);

The actual output will depend on whether your processor architecture is little-endian or big-endian. Little-endian architectures will interpret the array index 1 as the integer’s second least significant byte 0x56, while big-endian architectures will interpret it as the integer’s second most significant byte 0x34.
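The same trick can be turned around to find out which kind of machine you are on. A minimal sketch (is_little_endian is a made-up name, and exotic mixed-endian layouts are ignored): store a known pattern and check which byte lands at index 0.

```c
#include <stdint.h>

/* Returns 1 on a little-endian machine and 0 on a big-endian one, by
   storing a known 32-bit pattern and checking which byte ends up at the
   lowest address (index 0 of the overlapping array). */
int is_little_endian(void)
{
    union {
        uint32_t data;
        uint8_t  bytes[4];
    } probe = { .data = 0x12345678 };

    return probe.bytes[0] == 0x78;
}
```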

The same principle used to extract a byte works also the other way around, and we can use unions to concatenate integers. Let’s consider a real world example involving the ATmega328’s analog-to-digital converter. The ADC has a 10-bit resolution, and looking at its registers, the converted value is stored in two separate 8-bit registers — ADCL and ADCH for the lower and higher byte respectively. A struct with two fields named after those two registers seems like a good choice for this, and since we also want the whole 10-bit value of the conversion, we’ll use the struct together with a 16-bit integer inside a union.

union adc_data {
    struct {
        uint8_t adcl;
        uint8_t adch;
    };
    uint16_t value;
};

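As you can see, the struct has neither a type name nor a field name of its own. Such an anonymous struct, standardized in C11 and supported as an extension by many compilers before that, lets us access the fields inside the struct as if they were part of the union itself. Note that this only works out because the ATmega328 is little-endian: adcl, the first struct member, sits at the lower address, which is where the low byte of value is stored.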

union adc_data adc;

adc.adcl = ADCL;  /* the ATmega328 datasheet requires reading ADCL first */
adc.adch = ADCH;

printf("0x%04x\n", adc.value);

Note that accessing the struct fields anonymously will only work as long as there are no name conflicts. If there are duplicate field names, the struct itself will require a field name. Once the struct has its own identifier, we can also add a type name to the struct itself, which lets us use it also outside the union.

union adc_data {
    struct register_map {
        uint8_t adcl;
        uint8_t adch;
    } registers;
    uint16_t value;
};

union adc_data adc;
struct register_map adc_registers;

adc.registers.adcl = ADCL;  /* read ADCL first, then ADCH */
adc.registers.adch = ADCH;
printf("0x%04x\n", adc.value);

Once the register values are stored in the struct fields, we can read the full value from the 16-bit `value` field. Of course, it doesn’t require a union to combine those two register values, we could also just use bitwise shifting and an OR operation:

uint8_t low = ADCL;  /* read ADCL first: the evaluation order of the operands of | is unspecified */
printf("0x%04x\n", (ADCH << 8) | low);
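Since ADCL and ADCH only exist on the AVR, here is a host-side sketch that passes made-up register values in as plain arguments instead. The helper names are hypothetical. It contrasts the union with the shift-based version: the shift gives the same result everywhere, while the union only matches it on a little-endian machine like the ATmega328.

```c
#include <stdint.h>

/* Mirrors the union from above; the anonymous struct needs C11
   (or a compiler extension in older modes). */
union adc_data {
    struct {
        uint8_t adcl;
        uint8_t adch;
    };
    uint16_t value;
};

/* Combines the two register bytes through the union. Correct only on a
   little-endian target, where adcl (the first member) ends up as the
   low byte of value. */
uint16_t combine_with_union(uint8_t low, uint8_t high)
{
    union adc_data adc;
    adc.adcl = low;
    adc.adch = high;
    return adc.value;
}

/* Combines the bytes with a shift and an OR; independent of endianness. */
uint16_t combine_with_shift(uint8_t low, uint8_t high)
{
    return (uint16_t)((high << 8) | low);
}
```

With low = 0x23 and high = 0x01, both functions return 0x0123 on a little-endian machine.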

Truth be told, there is actually nothing unique about unions. In whichever way you are using them, you could achieve the same with either bitwise operations or pointer casts. But that equivalence is exactly what makes them interesting.

Shortcuts with Unions

Let’s have another look at the previous byte-extraction example and see what other options we have to get a single byte out of an integer. As we remember, we had a union with a 32-bit integer and an array of four 8-bit integers:

union data_bytes {
    uint32_t data;
    uint8_t bytes[4];
};

The most common way to extract parts of any value is combining bitwise shifts with an AND operation. However, in this particular case, we can also cast the address of the 32-bit value to a pointer to 8-bit values. Well, let’s just implement all of these options and see what that will look like.

uint32_t value = 0x12345678;
union data_bytes db;
db.data = value;

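// shift one byte to the right and extract the LSB: 0x56 on any architecture
printf("0x%02x\n", (unsigned) ((value >> 8) & 0xff));
// cast to uint8_t pointer, access it as an array: the byte at memory offset 1
printf("0x%02x\n", ((uint8_t *) &value)[1]);
// cast to uint8_t pointer, access via pointer arithmetic: the same byte
printf("0x%02x\n", *(((uint8_t *) &value) + 1));
// simply take the union field: the same byte again
printf("0x%02x\n", db.bytes[1]);
// note: the last three print 0x56 on little-endian but 0x34 on big-endian machines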

Taking a closer look at the pointer casts, we are basically telling the compiler that whatever is located at the memory address of the 32-bit value is in fact a collection of 8-bit values. Applying the same terminology to the union declaration, we are telling it that whatever is located at the union’s memory address is either one 32-bit value or four 8-bit values, just like with the cast. The difference is that with a union, we are very explicit about which of those two types we want every time we access the value. In a sense, unions provide a shortcut to data type conversions, while at the same time documenting in the type itself which interpretations of the data are intended. You could say that unions are to pointers what enums are to a bunch of preprocessor constants.
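One subtlety in the four variants above: the shift counts bytes by significance, while the pointer casts and the union count them by position in memory, so they only agree on little-endian machines. The sketch below keeps the two notions apart. The function names are made up, and byte_in_memory uses memcpy, which is the standard way to get at an object’s bytes and also works for types other than integers, such as float.

```c
#include <stdint.h>
#include <string.h>

/* Byte n (0..3) of a 32-bit value as it is laid out in memory, i.e. what
   ((uint8_t *) &value)[n] or the union's bytes[n] return. */
uint8_t byte_in_memory(uint32_t value, unsigned n)
{
    uint8_t bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    return bytes[n];
}

/* Byte n (0..3) counted from the least significant end, which gives the
   same answer on every architecture. */
uint8_t byte_by_significance(uint32_t value, unsigned n)
{
    return (uint8_t)((value >> (8 * n)) & 0xff);
}
```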

Looking into floating point numbers

Let’s have another example and explore floating-point numbers, IEEE 754 single-precision floating-point numbers to be precise — also known as a float. If you ever wondered what a float looks like to a CPU, just make it think it’s an integer. Obviously not in a “cast an int to float to remove the fraction part” way, but in a “raw IEEE 754 binary32 format” way.

union float_inspection {
    float floatval;
    uint32_t intval;
} fi;

float f = 65.65625;
fi.floatval = f;

printf("0x%08x\n", fi.intval);
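// ..or then again with pointers, although unlike the union read above
// (which C explicitly allows), this breaks the strict aliasing rule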
printf("0x%08x\n", *((uint32_t *) &f));

Both will output 0x42835000, which won’t tell us much without thoroughly studying the binary32 format: a combination of a sign bit, an 8-bit exponent, and a 23-bit fraction. Recalling the concept of a bitfield, we can extend the union with a struct to help us take the binary32 format apart. For completeness, the same data is also extracted with bitwise operations as a non-union alternative.

union float_inspection {
    float floatval;
    uint32_t intval;
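    // assumes the compiler allocates bitfields starting at the least
    // significant bit, as GCC does on little-endian targets
    struct {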
        uint32_t fraction:23;
        uint32_t exponent:8;
        uint32_t sign:1;
    };
} fi;

float f = 65.65625;
uint32_t i;
memcpy(&i, &f, sizeof i);  // needs <string.h>; avoids the strict aliasing problem of *(uint32_t *) &f
fi.floatval = f;

printf("%d %d 0x%x\n", fi.sign, fi.exponent, fi.fraction);
printf("%d %d 0x%x\n", (i >> 31), ((i >> 23) & 0xff), (i & 0x7fffff));

I’ll leave it to you to decide which option is clearer to read and easier to maintain. Either way, the output will give us a sign value 0, exponent 133, and the fraction 0x35000. Following the format’s definition, we can construct the initial floating point number 65.65625 back from it. So if you ever end up analyzing some raw data dump or binary blob and come across a floating point value, now you know how to use a union to find out what number it represents.
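For a normal (non-zero, finite, non-subnormal) binary32 value, the definition is value = (-1)^sign × (1 + fraction / 2^23) × 2^(exponent - 127). With sign 0, exponent 133 and fraction 0x35000 (217088), that is (1 + 217088 / 8388608) × 2^6 = 1.02587890625 × 64 = 65.65625. The same calculation as a small C sketch; binary32_from_fields is a made-up name, and zero, subnormals, infinities and NaNs are deliberately not handled:

```c
#include <stdint.h>

/* Rebuilds a normal binary32 value from its three fields:
   (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127).
   Zero, subnormals, infinities and NaNs (exponent 0 or 255) are not handled. */
double binary32_from_fields(uint32_t sign, uint32_t exponent, uint32_t fraction)
{
    /* implicit leading 1 plus the 23 fraction bits */
    double value = 1.0 + (double) fraction / 8388608.0;  /* 2^23 */

    /* remove the exponent bias of 127 by repeated doubling or halving,
       which is exact for powers of two within double's range */
    for (int e = (int) exponent - 127; e > 0; e--)
        value *= 2.0;
    for (int e = (int) exponent - 127; e < 0; e++)
        value /= 2.0;

    return sign ? -value : value;
}
```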

That’s All Folks

There are two more things to worry about when using unions to peer inside other data types: endianness and alignment. Most computers and microcontrollers are little-endian, but watch out for big-endian architectures such as the Motorola 68k and AVR32. For performance reasons, different processors also like to align data on 2-byte or 4-byte boundaries, so the compiler may insert padding between struct fields of different sizes: a uint32_t following a uint8_t, for example, will typically start four bytes after it rather than one. In GCC, you can use the packed attribute to remove this padding (and the aligned attribute to request a specific alignment), but unaligned accesses can be slower on some processors, and accessing a packed field through a pointer can even fault on some; the details are beyond the scope of this article.
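A small sketch to see the padding on your own machine: offsetof reports where each field actually starts. The struct names are made up, and the packed variant relies on the GCC/Clang __attribute__((packed)) extension, so it is only compiled when __GNUC__ is defined.

```c
#include <stddef.h>
#include <stdint.h>

/* A uint8_t followed by a uint32_t: on typical 32- and 64-bit ABIs the
   uint32_t must sit on a 4-byte boundary, so 3 padding bytes follow 'tag'. */
struct mixed_record {
    uint8_t  tag;
    uint32_t value;
};

/* Two uint8_t fields only need 1-byte alignment, so no padding here. */
struct byte_pair {
    uint8_t first;
    uint8_t second;
};

#if defined(__GNUC__)
/* GCC/Clang extension: drop the padding, leaving 'value' unaligned. */
struct __attribute__((packed)) packed_record {
    uint8_t  tag;
    uint32_t value;
};
#endif

/* Number of padding bytes the compiler placed between 'tag' and 'value'. */
size_t padding_after_tag(void)
{
    return offsetof(struct mixed_record, value) - sizeof(uint8_t);
}
```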

This concludes our expedition into structs and unions. Hopefully this gave you some new insights and ideas on how to arrange your variables, and some convenient alternatives to handle them. Let us know if you can think of other ways to make use of all this, and in what peculiar ways you have used or come across unions before.

61 thoughts on “Unionize Your Variables – An Introduction To Advanced Data Types In C”

Roman says:

March 2, 2018 at 10:19 am

The third code snippet, printing in the end: aren’t those “0x” nonsense?

1. Erik Johnson says:
  
  March 2, 2018 at 11:02 am
  
  Not really, the first 2 vars are printed in decimal, then that last one is in hex, so to keep the output clear they prefix the hex with 0x
  
2. Melvin says:
  
  March 2, 2018 at 11:06 am
  
  That’s correct given that the printf is not formatting the data as hex
  
3. Sven Gregori says:
  
  March 2, 2018 at 11:13 am
  
  Whoops, you are right, I had a hex number printed there in an earlier draft and forgot to properly adjust it.
  Fixed now – thanks for pointing it out.
  
F says:

March 2, 2018 at 10:19 am

Using a nameless struct inside a union is a nifty trick.

sbrk says:

March 2, 2018 at 10:50 am

Unions were designed to save space, using the same memory to store two or more different types of data. They were not meant to be used to extract bytes, nibbles, or bits, nor to implicitly cast data. Using unions to do so is unsafe, as the compiler has wide latitude to implement the actual storage however it wants to.

Mis-using unions in these ways is non-portable, and will likely result in entertaining hours of bug hunting.

1. jpa says:
  
  March 2, 2018 at 10:56 am
  
  Most platforms have a quite rigid ABI specification, so the compiler doesn’t have much freedom on how to lay it out. But yeah, the C standard itself makes no guarantees here.
  
  1. sbrk says:
    
    March 2, 2018 at 11:15 am
    
    I’ve been bitten, more than once, by these tempting features.
    
    As follow-on advice: use the native register size [unsigned] int whenever you’re not dealing with a value range which is precisely the 2^sizeof(short) or 2^sizeof(char). Range-checking at the start of your API will spend many fewer cycles than the generated machine code and bus cycles to manipulate less-than-register-sized values. You might think you’re saving space, but it just isn’t worth it at today’s memory sizes.
    
    1. Sandro says:
      
      November 15, 2023 at 4:24 pm
      
      Unsigned int is not guaranteed to be the native register size, just FYI.
      
2. Joel says:
  
  March 2, 2018 at 11:24 am
  
  If you understand why padding from struct alignment borks portability inside a union, then it can be useful.
  Most C/C++ people will also avoid using direct in-line assembly as well, and when necessary wrap an abstraction with a meaningful comment explaining why it was done. Almost every modern mcu I have used will have a specific subset of special macros to handle platform specific eccentricities.
  
  The gcc tool suite may sometimes generate unoptimized binaries on some platforms (sometimes this feature is useful too), but it does support most modern processors rather consistently. I really am thankful industry decided to embrace an unofficial standard compiler after 35 years of pain-in-class compilers. =)
  
3. bullestock says:
  
  March 2, 2018 at 1:05 pm
  
  And yet this trick has been used both on Windows (https://msdn.microsoft.com/en-us/library/windows/desktop/ms738571(v=vs.85).aspx) and older BSD versions.
  
4. Julian Skidmore says:
  
  March 3, 2018 at 10:02 am
  
  Agreed.
  
snarkysparky says:

March 2, 2018 at 10:52 am

Passing a pointer to the struct to get lots of info where it’s needed is a fine and dandy trick

Adam says:

March 2, 2018 at 10:53 am

It’s important to remember that unions in C++ don’t follow the C convention. After writing to one field of a union, accessing the other fields is undefined behavior (although most compilers will implement them the same way as C).

MattAtHazmat says:

March 2, 2018 at 10:58 am

Be super careful when using bitfields- when a bitfield overlaps a storage unit boundary, the behavior is implementation defined AKA: NOT PORTABLE.

default_ex says:

March 2, 2018 at 11:40 am

For those using C# you can achieve a union using explicit struct layout. Mark the struct with a StructLayout attribute with the parameter LayoutKind.Explicit and then apply FieldOffset attributes to each field. The FieldOffset attribute accepts an argument for how many bytes from the start of the structure to offset to that field. No bit-level alignment like C/C++, but even byte-level alignment is very powerful. Just take care with your order of initialization, since the C# rule that every field of a struct must be initialized is still enforced for an explicit struct.

Breettull says:

March 2, 2018 at 11:41 am

This seems like a good way to make nightmares for anyone trying to port your code to another platform some day (including possibly yourself!)

1. F says:
  
  March 2, 2018 at 1:55 pm
  
  Not just another platform, I’ve had terrible problems with structure alignment on Solaris, when trying to use gcc to build some code that simply refused to compile correctly with Sun CC. Small test programs can save you hours of head scratching when you’re trying to debug this stuff.
  
Lord Nothing says:

March 2, 2018 at 12:13 pm

i do this with a lot of serialization code but you have to be careful transporting them between different architectures. like transporting a struct from an mcu to a computer did require bit shifting at one end to decode the data correctly.

mime says:

March 2, 2018 at 1:56 pm

I’ve never seen bitfields in C.. Can anyone give an example which IDE for microcontrollers (PIC,AVR,MSP430) supports that?

1. Moryc says:
  
  March 2, 2018 at 11:50 pm
  
  I use bit fields in a struct with XC8 in MPLAB X. It’s a great way to hold status flags that are less than 8 bits wide. This is also the way compiler makes it possible to use the names of individual bit fields of registers…
  
HarveyBallWanger says:

March 2, 2018 at 2:15 pm

As a non C programmer this looks hideous. Nonetheless the “here’s a possibility” sort of article is always fun to read.

1. F says:
  
  March 2, 2018 at 3:45 pm
  
  If you’re writing in some other language and you need to unpack structures that are passed back and forth from C code, you will most certainly have to learn how to do this stuff.
  
smeeg says:

March 2, 2018 at 2:16 pm

You can learn a lot about a man by the way he pronounces “unionized”.

1. Tim Trzepacz says:
  
  March 2, 2018 at 3:17 pm
  
  Un-Ionized?
  Union-Ized?
  Uni-Oni-Zed?
  Unio-Nized?
  
  1. Rodney McKay says:
    
    March 7, 2018 at 3:29 am
    
    onion-ized
    
2. Jonathan says:
  
  March 3, 2018 at 7:25 am
  
  And by the way he {mis}spells it :)
  
pardobsso says:

March 2, 2018 at 3:05 pm

Nice article.
One thing (already mentioned here) that from time to time bites me is how structs are packed. This one from Eric Raymond covers many odd cases in more detail: The Lost Art of C Structure Packing

Tim Trzepacz says:

March 2, 2018 at 3:26 pm

A few comments:
A lot of folks don’t like bitfields because there is no specification on how the compiler will pack them, so your code might not work correctly on other systems. As stated “some caution is required”.
They are also disliked because folks expect the compiler to make less efficient code to deal with them than they can code by hand. Maybe that concern has dissipated in this era of super-high speed processors, but for microcontrollers I imagine it is still important.
Finally, as specified they only work with int-sized items. Some compilers might work beyond the spec to allow packing into unsigned chars, short ints, etc., but you can’t count on it.
Many programmers avoid them and stick with doing explicit bitwise operations so that they know their code will always work.
I like bitfields myself, as they make code much more readable, but I have to be very aware of how I am using them if I ever intend to port the code to any other platform.

Also, thanks for letting me know about “designated initializers” in C. I learned well before C99 and did not know that they were a thing!

jibé says:

March 2, 2018 at 4:35 pm

This reminds me the REDEFINES clause in Cobol, but in C it seems to be there only to confuse coders and generate bugs.
Maybe because C is just assembly language badly written. Just joking…
Example in Cobol:
05 A PICTURE 9999.
05 B REDEFINES A PICTURE 9V999.
05 C REDEFINES A PICTURE 99V99.

This is clear to read and at least not compiler-dependent!

1. Jonathan says:
  
  March 3, 2018 at 7:30 am
  
  Perhaps, but it does depend on whether you’re writing on a blackboard or stuffing your face :)
  
ian says:

March 2, 2018 at 11:05 pm

I’ve only been programming in C for a bit over 40 years (I still have an original copy of K&R on the shelf) :-), and I’d generally say that if you are using the ‘union’ feature, either a) you don’t know what you are doing, b) your program is crap, or c) both.

If you want to treat a block of memory as a block of memory, use a block of memory. If you want to use a type variable, use one. If you want to convert it from one to the other do so deliberately so you get it right in the context that you are using.

The ‘union’ approach will just lead to bugs, side effects, and hard to maintain code.

1. Moryc says:
  
  March 3, 2018 at 12:26 am
  
  Please, Wise One, enlighten me, how I should correctly solve a following problem:
  
  I’m writing for an 8-bit micro, so memory is accessed in single bytes. I have three short ints that hold 3 16-bit wide calibration values set by the user. I want to save them to internal EEPROM, but I can only write 8 bits at a time, and I have to read them in the same order upon reboot. So I packed them in a struct and unionized it with an array of 6 chars. I then can write the first char from the array to the first EEPROM address, then the second to the second, etc. I read it in the same order.
  
  O, Wise One, show me the correct way to do it. Enlighten me. Show me the way…
  
  1. Redhatter (VK4MSL) says:
    
    March 3, 2018 at 2:31 am
    
    I hope WordPress doesn’t mangle this…
    
    struct config_t {
        uint16_t cal_x;
        uint16_t cal_y;
        uint16_t cal_z;
    };
    
    int eeprom_write(size_t addr, uint8_t byte);
    
    int save(const struct config_t* const cfg)
    {
        const uint8_t* ptr = (const uint8_t*)cfg;
        size_t sz = sizeof(struct config_t);
        size_t addr = 0x12340000; /* EEPROM address */
        while (sz) {
            int res = eeprom_write(addr, *ptr);
            if (res < 0)
                return res;
            ptr++;
            addr++;
            sz--;
        }
        return sz;
    }
    
    1. ian says:
      
      March 3, 2018 at 4:00 am
      
      nah, too complex. Too many variables as well.
      
      First problem is that you can only write one byte at a time. So you write a routine (you could do eeprom bounds checking if required etc) – and assuming you aren’t doing more than 254 byte writes, which you probably aren’t on a constrained system.
      So a simple one would be (and I agree this could be much more optimized if speed was a problem)
      
      bool write_eeprom_block(size_t addr, uint8_t data[] , uint8_t length)
      {
      for(uint8_t i=0;i<length;++i) eeprom_write(addr+i, data[i]);
      return true;
      }
      
      (given it does need error checking, better to have it return a true/false even at this stage).
      
      Then simply call it
      
      write_eeprom_block(addr, (uint8_t*) &cfg, sizeof(cfg));
      
      no unions to be found, works on ANY structure.
      
      1. Moryc says:
        
        March 3, 2018 at 7:44 am
        
        So you replaced a union with explicit type conversion, which does exactly the same thing but without the word “union”. And with pointers, which should be avoided at all costs. Besides, this is a rather application-specific thing and won’t be ported to other platforms without considerable rewriting…
        
  2. Redhatter (VK4MSL) says:
    
    March 3, 2018 at 2:32 pm
    
    Quoting Moryc: “So you replaced a union with explicit type conversion, which does exactly the same thing but without word ‘union’. And with pointers, which should be avoided at all costs. Besides this is rather application-specific thing and won’t be ported to other platforms without considerable rewriting…”
    
    Indeed… using a union is no safer in this case. Plus, I really don’t see how you can write a non-trivial application without using pointers. Your entry point typically has the prototype int main(int argc, char** argv); whoops, there’s a pointer right there!
    
    Unless you live by declaring everything statically in one place where it’s all globally accessible (ugh!), you’re going to have pointers.
    
    1. ian says:
      
      March 3, 2018 at 4:16 pm
      
      yes, pointers are much better (properly used) than union is.
      union has side effects. It is that simple. You update what looks like a variable, and another variable changes.
      You can do that with pointers too, but reasonable programmers don’t, i.e. they don’t have two different pointers to the same object unless there is some type of management code..
      And even then, dereferencing a pointer is clearly changing something in memory..
      
      Unless you have a highly specific reason to use unions, I don’t think you should..
      
      1. Redhatter (VK4MSL) says:
        
        March 4, 2018 at 5:09 am
        
        About my only use case for unions is emulating the “Variant” type in VisualBASIC; something like:
        struct variant_t {
            union {
                void* ptr;
                uint64_t uint;
                int64_t sint;   /* renamed: int is a keyword and can't be a member name */
                double dbl;
                uint8_t byte[8];
            } value;
            uint32_t size;
            uint8_t type;
            uint8_t flags;
        };
        
        That’d allow for a number of common C data types, with a struct member that defines which of those union members is relevant and how to interpret it. i.e. if type == TYPE_CHAR; then ptr should be considered a char*. If something is horrendously big; you might use a bit in flags to indicate that size is measured in 16-bit words or 32-bit long words. If you’ve got a tiny string; you’d stuff it in byte and the type would be set accordingly.
        
        Yes, the struct is 16-bytes long, but it’d let you handle just about anything.
        
        Having said this, it’s rare that such a beast is needed.
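The pattern described above is usually called a tagged union. Here is a trimmed-down sketch of it; the type names, the three tags and the helper functions are all hypothetical, chosen just to show that the tag decides which union member is read.

```c
#include <stdint.h>

/* Hypothetical type tags; a real design would define whatever set it needs. */
enum variant_type { VARIANT_UINT, VARIANT_INT, VARIANT_DOUBLE };

/* 'type' records which union member currently holds a meaningful value. */
struct variant {
    enum variant_type type;
    union {
        uint64_t u;
        int64_t  i;
        double   d;
    } value;
};

struct variant variant_from_double(double d)
{
    struct variant v = { .type = VARIANT_DOUBLE, .value.d = d };
    return v;
}

/* Converts whatever is stored to a double, reading only the member
   that the tag says is active. */
double variant_as_double(const struct variant *v)
{
    switch (v->type) {
    case VARIANT_UINT: return (double) v->value.u;
    case VARIANT_INT:  return (double) v->value.i;
    default:           return v->value.d;
    }
}
```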
        
2. Chip says:
  
  March 3, 2018 at 9:41 am
  “generally” may be true, as the vast majority of programmers do not “generally” deal with hardware registers or have memory constraints that limit how much storage can be utilized. I, too, have 40 yrs experience, with most of that dealing with hardware manipulation (CPU, PCI, etc.). Your comment about code being crap for using unions and structs is just plain wrong. Yes, there are appropriate times and places to use unions and structs, but to diss them wholesale is bogus; I use these constructs as necessary, not on a whim.
  
  Here is a snippet from the Second Edition of The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie
  
  http://www2.cs.uregina.ca/~hilder/cs430-833/Reference%20Materials/The%20C%20Programming%20Language.pdf
  
  6.9 Bit-fields
  
  When storage space is at a premium, it may be necessary to pack several objects into a single machine word; one common use is a set of single-bit flags in applications like compiler symbol tables. Externally-imposed data formats, such as interfaces to hardware devices, also often require the ability to get at pieces of a word.
  
  Here is the definition of Intel’s IA32_MC7_STATUS register:
```
typedef union
{
    uint64_t Uint64;
    struct
    {
        uint64_t MCACOD          :16;    // bits 15:0
        uint64_t MscodDataRdErr  : 1;    // bit  16
        uint64_t RSVD_17_17      : 1;    // bit  17
        uint64_t MscodPtlWrErr   : 1;    // bit  18
        uint64_t MscodFullWrErr  : 1;    // bit  19
        uint64_t MscodBgfErr     : 1;    // bit  20
        uint64_t MscodTimeout    : 1;    // bit  21
        uint64_t MscodParErr     : 1;    // bit  22
        uint64_t MscodBucket1Err : 1;    // bit  23
        uint64_t MscodDdrType    : 2;    // bits 25:24
        uint64_t RSVD_31_26      : 6;    // bits 31:26
        uint64_t OTHER_INFO      : 6;    // bits 37:32
        uint64_t CORR_ERR_CNT    :15;    // bits 52:38
        uint64_t CORR_ERR_STATUS : 2;    // bits 54:53
        uint64_t AR              : 1;    // bit  55
        uint64_t S               : 1;    // bit  56
        uint64_t PCC             : 1;    // bit  57
        uint64_t ADDRV           : 1;    // bit  58
        uint64_t MISCV           : 1;    // bit  59
        uint64_t EN              : 1;    // bit  60
        uint64_t UC              : 1;    // bit  61
        uint64_t OVERFLOW        : 1;    // bit  62
        uint64_t VALID           : 1;    // bit  63
    } Bits;
} IA32_MC7_STATUS;
```
  Using the above named fields is far easier to read and maintain, and removes worry about which bits are being manipulated. A former coworker was not aware of bit fields in C, and thus used masks to access bits in registers, sometimes incorrectly.
  1. ian says:
    
    March 3, 2018 at 4:28 pm
    
    I think you missed a bit of what I said – I in no way said that structs were bad! They are good! Use them everywhere!
    
    In your example, I see why you have used a union. The function expects a uint64_t, but the value really isn't one; it is a packed bit field. And you (I assume) are never going to change Uint64 manually; you are just going to pass it to a function that you don't control. If memory is very tight, I can see why you would do that.
    
    Still, it would be better to encapsulate it and pass the address of the structure instead; that would be much clearer and less prone to errors.
    
    Though my viewpoint is significantly biased, because:
    1) I write code that some other idiot is going to have to maintain or change in the future.
    2) Sometimes that idiot is me, as I am still supporting code I wrote 30 years ago…
    
    So things that reduce the chance of errors are good!
    
    1. Moryc says:
      
      March 4, 2018 at 1:35 pm
      
      In the world of PIC programming with XC8, every register access uses a struct of register bit fields unionized with the register name. So, for example, I can access a single bit of a port using PORTAbits.RA0 or write to the entire port with PORTA. So if the compiler and IDE use unions all the time, why is it evil to use them in my code?
      
      Also, to improve the readability of my code, I use meaningful names and lots of comments. Anyone who says that code should be its own documentation is a moron…
      
  2. Matt says:
    
    March 4, 2018 at 9:16 am
    
    This is precisely what isn’t portable.
    
Chris says:

March 2, 2018 at 11:12 pm

I use unions for embedded code on microcontrollers with small amounts of RAM. I declare a global array of unions, each element of which can be a single uint32_t, two uint16_t variables, or four uint8_t variables (I generally avoid floating point in my embedded code). They make convenient scratch-pad variables without having to declare them within each function.

1. ian says:
  
  March 2, 2018 at 11:34 pm
  
  You won't be having any floating point code if you're worried about 8 bytes of RAM for non-overlapping scratch-pad variables! I must admit that it's only with very recent embedded chips (the last year or so) that I have ever been tempted to use floating point, and to be honest I still haven't found anywhere I want to. The floating point library is just too big!
  So I get what you are doing, but the way I do it is to have a blob of memory that every function knows it can use, but whose contents it can't rely on after calling anything else or returning. Then you don't have to worry about accidentally writing to one variable while whacking another (the union way).
  
  1. Megol says:
    
    March 3, 2018 at 7:57 am
    
    Floating point is useful for some data. Writing FP code is trivial in most cases (with non-IEEE floats) and takes little space.
    
    And how exactly is your solution better than the union? Just use the union, knowing the data isn't reliable after calling something else: no problem, and some advantage (nicer code).
    
  2. Chris says:
    
    March 3, 2018 at 4:16 pm
    
    Everyone's needs are a little different. To me, the primary advantage of floating point numbers is their ability to hold very large or very small numbers, but I don't need that for most of my code. I'd rather use a (u)int32_t with fixed-point arithmetic to make the math less computationally intensive.
    
    As for the union, I'm using it to reinterpret the “blob” of memory you refer to as whatever variable type I need. It makes the code very clean. It's always clear from inspection how I am using the memory, and I don't have typecasts sprinkled all over the place.
    
2. Julian Skidmore says:
  
  March 4, 2018 at 12:34 am
  
  “Make convenient scratch-pad variables without having to declare them within each function”
  
  So you're doing by hand what the compiler does? On a proper CPU, a function's local variable allocation is always ideal: exactly the right amount of space is allocated as it's needed, per function, and only the space needed for the live variables in the entire program or thread is occupied at any time.
  
  On an 8-bit PIC, the compiler attempts to do this by creating a statically allocated stack, in which variables from different functions at the same call level occupy the same RAM addresses: they are part of a ‘union’. The only difference with doing it by hand is that you're likely to make more mistakes.
  
msat says:

March 3, 2018 at 12:04 am

I’m a C n00b and just came across unions and this name.variable stuff in some code I was looking at, which I didn’t understand and didn’t yet bother to look into, and then BAM!: HaD has a post on this very topic just in time! The comments have also been helpful for things I should look out for if I intend to write code that uses them.

1. Redhatter (VK4MSL) says:
  
  March 3, 2018 at 2:33 am
  
  Yep, putting that secret remote desktop client on your computer sure did help. :-P
  
  1. Ken N says:
    
    March 3, 2018 at 7:25 am
    
    I guess the tracker on my PC isn’t working correctly; I could have used some help on structs about a month ago.
    
    Anyway, this was a fairly meaty and interesting article. It has sort of scared me off unions, though I am now aware of them, and in future I will likely come across a problem where they will be useful.
    
    1. Redhatter (VK4MSL) says:
      
      March 3, 2018 at 2:51 pm
      
      Hey, we can’t be watching *everybody’s* PC; there’s only so much monitor space around here!
      
2. Chris says:
  
  March 7, 2018 at 6:39 pm
  
  One great thing about structs is using them to pass data between functions. Sometimes, especially when you need to update some sort of state variable, you don't know at the outset everything that you need to include in it. Structs (and to a lesser extent unions) to the rescue! Simply pass a struct or union to your function (or better yet, a pointer to one, since a struct passed by value is copied in full, typically onto the stack). If you find that you forgot something, just add it to your struct definition and it is automatically passed along for the ride without having to modify your function definition (i.e. the parameter list stays the same); you just access the new members of the struct.
  
Stajp says:

March 4, 2018 at 3:29 am

The only case where I would use unions is when one function is a dispatcher for others, depending on the input data, without the need for void pointers and guessing the data type/format. For example, signalling between different layers of software.
Create a union of structs that all share the same first few members (e.g. signalId, senderId, timestamp) and differ completely in the rest. Now the dispatcher can check those first values (because the structures begin identically, the values are always at the same memory addresses, and C explicitly permits inspecting such a common initial sequence through any member of the union), interpret the rest as the correct structure, and call the needed function. Memory use is minimal and the type cast is always correct…

Jake Brodsky says:

March 5, 2018 at 6:28 am

Most of you are writing as if the one, the only, and the primary concern of ALL C-language programming is to write portable code.

But sometimes, it doesn’t matter. I know, that sounds like sacrilege these days. But let’s get real. Sometimes you need throwaway interface code that does something very specific on a very tight platform.

Even when C was first written, it was obvious that there would have to be platform-specific things that would need to be rewritten. The goal of writing tight, memory efficient code was still very much a concern. That’s why the union keyword exists. That’s why C has all those wonderful ways you can shoot yourself in the foot with pointers.

Yes, like pointer math, unions are dangerous tools that can be used and abused very badly. But sometimes, you need them.

If speed and memory efficiency are secondary concerns and portability is the primary concern, then go use whatever high level language you want. The C programming language is probably not the language you should be using.

And if you REALLY don’t care about portability but want ultimate speed and memory efficiency, macro assembly language is rapidly becoming a lost art.

1. Fred says:
  
  March 6, 2018 at 2:37 pm
  
  Hear hear!
  
  I write code for very memory-constrained microcontrollers, and sometimes I have to write assembler just to fit it all in. But when I don’t, I use C, and portability is a long way down my list of priorities. Tight (and readable) code is at the top of my list. And as you say, pointers & unions are two ways of assisting with that goal.
  
2. Chris says:
  
  March 7, 2018 at 6:40 pm
  
  This ^
  
Rafael says:

August 19, 2020 at 10:55 am

This code:

```
union adc_data {
    struct {
        uint8_t adcl;
        uint8_t adch;
    }   /* <-- here */
    uint16_t value;
};
```

Is missing a ; there.

1. Sven Gregori says:
  
  August 19, 2020 at 11:06 am
  
  Now that’s some determination!
  But yes, you’re right – thanks for that, fixed it.
  
Todd Kroeger says:

September 23, 2021 at 1:43 am

Thank you Hackaday! I did a search to verify that an anonymous structure inside a union worked that way, but it took something like 20 nearly worthless links before I got to this one. Why didn’t I just search HaD first?

luqmaan s says:

September 6, 2023 at 12:02 am

Fantastic introduction to advanced data types in C! I appreciate how you explained structs, unions, and bitfields clearly with practical examples. It’s an insightful read for anyone diving into C programming. Thank you!
