The X Macro: A Historic Preprocessor Hack

If we told you that a C preprocessor hack dated back to 1968, you’d be within your rights to remind us that C didn’t exist in 1968. However, assemblers with preprocessors did, and where there is a preprocessor, there is an opportunity to do clever things. One of those things is the so-called X macro, which saw a lot of use in DEC System 10 code but probably dates back even earlier. You can still use it today if you like, even though there are, of course, other arguably better ways to get the same result. However, the X macro can be very efficient, and you may well run into it in some code, too.

Background

Preprocessing used to be a staple of programming. The idea is that code is manipulated purely at the text level before it is compiled. These days, languages with a preprocessor usually handle it as part of the compiler, but you can also use an external preprocessor like m4 for more sophisticated uses.

Modern languages tend to provide other ways to accomplish many of the tasks handled by the preprocessor. For example, if you have a constant you want to set at compile time, you could say:

int X = 32;
y = X;

But then you’ve created a real variable, along with the overhead that might entail. A smart compiler might optimize it away for you, but you can make sure by writing:

#define X 32
y = X;

A modern compiler would prefer you to write:

const int X=32;
y = X;

But there are still some common uses for the preprocessor, like including header files. You can also make more sophisticated macros with arguments so you don’t incur a function call penalty, although the modern approach would be to mark those functions as inline.
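
As a minimal sketch of that trade-off (the names here are made up for illustration), a function-like macro and its inline-function equivalent might look like this:

#include <stdio.h>

// Classic function-like macro: pure text substitution, no call overhead,
// but no type checking, and the argument is evaluated twice
#define SQUARE_MACRO(x) ((x) * (x))

// The modern alternative: a type-checked function marked inline
static inline int square_inline(int x) { return x * x; }

int main()
{
  printf("%d %d\n", SQUARE_MACRO(5), square_inline(5)); // both print 25
  return 0;
}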

The Problem

Which brings us to the X macro. As with all great hacks, there is first a problem to solve. Imagine you have a bunch of electronic parts you want to deal with in your code. You don’t want a database, and you don’t want to carry a bunch of strings around, so you define an enumerated type:

enum parts { part_LM7805, part_NE555 }; // will add more later

Of course, you will eventually want to print them, so you do need to store the names somewhere, right?

const char *partnames[] = { "LM7805", "NE555" }; // will add more later

This is all fine until you add a new part like, say, a 2N2222. You must remember to update both the enumerated type and the string array, or havoc will ensue. This seems easy until you realize that you might define the enumerated type in a header file but only define the string array in a source file. It is easy to get them out of sync.
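
As a minimal sketch of how that split typically looks (the file names are hypothetical), the new part gets added to one list and quietly forgotten in the other:

// parts.h
enum parts { part_LM7805, part_NE555, part_2N2222 }; // 2N2222 added here...

// parts.c
const char *partnames[] = { "LM7805", "NE555" };      // ...but forgotten here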

The Hack

The idea is to define a macro that handles all the definitions of parts in one place:

#define PARTS \
X(part_LM7805,"LM7805") \
X(part_NE555,"NE555")

Now when you declare the enum and the string array (which may not be in the same file, remember):

#define X(a,b) a,
enum parts { PARTS };
#undef X

#define X(a,b) b,
const char *partnames[] = { PARTS };
#undef X

If you carefully read the code, you can see how it works. The PARTS macro defines a list of items using the X macro. Before using the list, you define X to “select” one of the pieces. The first #define makes X() return its first argument, and the second #define, the second. Because these preprocessor macros expand before the code is compiled, the preprocessor writes out the same code as in the original example. The advantage is that the ID and the name are joined together in the text, which makes it harder to forget to add or update one when making changes.
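
To see it concretely, this is roughly what the preprocessor hands to the compiler after the two expansions above (the trailing commas are legal in both enumerator lists and initializers):

enum parts { part_LM7805, part_NE555, };

const char *partnames[] = { "LM7805", "NE555", };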

Even Better

Using modern C preprocessor syntax, we can do even better by using token pasting and the stringize operator.

Here’s a quick tutorial if you haven’t encountered these oddball preprocessor operators. The stringize operator # converts whatever you put after it into a quoted string. The token pasting operator ## joins two tokens into one token. So:

#define print(str) printf("%s\n", #str);
#define declare(type, prefix, var) type prefix##var;

declare(int,global_,v);
print(Hello!);

Not that either of these is a good idea, mind you. But you can see that the declare macro will define an integer called global_v, and the print macro will print the token that follows it as a string.
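
In effect (ignoring the extra semicolons the macro definitions leave behind), those two lines expand to:

int global_v;
printf("%s\n", "Hello!");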

Consider this:


#include <stdio.h>
#define PARTS \
   X( LM7805, 0.20 ) \
   X( NE555, 0.09 ) \
   X( 2N2222, 0.03 )

// create enum
#define X(a, b) part_##a,
enum parts { PARTS };
#undef X

//create string table
#define X(a, b) #a,
const char *partnames[]={ PARTS };
#undef X

// create price table
#define X(a, b) b,
float partprice[]= { PARTS };
#undef X

int main()
{
  enum parts p=part_NE555;
  printf("%s costs %0.2f\n", partnames[p], partprice[p]);
  printf("%s costs %0.2f\n", partnames[part_2N2222], partprice[part_2N2222]);
  return 0;
}

Here, we define a table of parts and prices. (Made up prices, to be sure.) The enumerated type uses part_##a to create things like part_NE555. The string table uses #a to get a string “NE555” into the source code. Finally, the price table uses b.
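
For clarity, this is approximately what the compiler sees for the three tables once the preprocessor is done:

enum parts { part_LM7805, part_NE555, part_2N2222, };

const char *partnames[] = { "LM7805", "NE555", "2N2222", };

float partprice[] = { 0.20, 0.09, 0.03, };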

Simple, yet effective. Sure, you could use a structure or an object to help. There are also plenty of other ways you could deal with this in the preprocessor. For example, you could define everything in one file and use #if to select which parts of it are included in different parts of the code. Regardless, the X macro is an elegant hack: it does solve the problem, and it has been doing so since at least 1968.
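
As a rough sketch of that #if-based alternative (the PART_MODE macro and the partlist.inc file name are invented for illustration), a single shared file could carry all of the lists, with each consumer selecting the one it needs before including it:

// partlist.inc -- hypothetical shared definition file
#if PART_MODE == 1        // enumeration constants
part_LM7805, part_NE555, part_2N2222,
#elif PART_MODE == 2      // name strings
"LM7805", "NE555", "2N2222",
#endif

// In one source file:
#define PART_MODE 1
enum parts {
#include "partlist.inc"
};
#undef PART_MODE

// In another:
#define PART_MODE 2
const char *partnames[] = {
#include "partlist.inc"
};
#undef PART_MODE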

The preprocessor can do some pretty amazing things. For example, we’ve built a cross assembler using it. We’ve even seen people do logic gate simulations in the preprocessor.

45 thoughts on “The X Macro: A Historic Preprocessor Hack”

    1. I’ve heard this trope so often, yet many of the uses of macros are not covered by templates or constexpr, and those bring new issues, build times and debugging being only two of them.

      And what is the actual reason to “Avoid macros as much as possible”? What is the issue that this strategy tries to prevent?

      1. It’s C++, they have squirrely ideas about what’s good or bad.

        For example, they despise printf-style formatting, because:

        std::ios_base::fmtflags f(cout.flags());
        cout << setprecision(5) << setw(15) << number;
        cout.flags(f); // Reset cout to default

        is much better than %15.5f, because three whole lines to do something you can type in 6 chars is so much more self-explanatory, don't you agree?

          1. I’ve run into that too with overuse of objects (C++) and other ‘obscure’ features of C++ by some overzealous C++ users … just because they can. Instead of a few lines of code, they end up with a mountain of code and modules :) … But hey, it’s C++! Nothing against C++ users (I am one too) but with my roots in C, I like to keep it simple and straightforward for the next guy that has to maintain the code.

          As for macros, my motto is to keep it simple and use them only when it makes sense.

          1. More terse isn’t always more good. But I absolutely agree that big C++ projects need to have style guides that lay out which subset of language features to use and which to ignore. Otherwise you have a team of regular devs, and then one dork who actually takes advantage of the preprocessor and templates both being Turing complete. So 90% of the codebase is reasonable, and reading the last 10% is like getting a train run on you by the King in Yellow, Cthulhu, and the Watcher in the South.

        2. There are some benefits to the old C++ approach although in many cases they’re not worth the verbosity.

          The old C++ approach can be evaluated at compile time and a type-specific implementation could be inlined and optimized. I believe the C approach still evaluates format strings at runtime although I could be wrong.

          The old C++ approach does not require you to pick a code based on the size of the type you are printing, you just print it and the compiler chooses an implementation. The C approach requires getting the format string correct based on the type – do I use %f, %lf, %Lf, %d, %u, %ld, %lu, …? Maybe not so hard for cases where you have an explicit int or long yourself but what about system-defined typedefs that might be different on different systems? Though admittedly these days if you compile with -Wall -Werror generally the compiler will catch many issues.

          The C++ folks also think the old C++ approach is not ideal and added std::format in C++20 and are adding std::print in C++23. This will let you do

          cout << std::format("{:15.5f}", number);

          or eventually

          std::print("{:15.5f}", number);

          instead of

          printf("%15.5f", number);

          with (I believe) more compile-time benefits.

        3. I use both C and C++ for embedded. We do not even use the C standard library for printing, but use some external library. This is due to stack and heap usage. It is often smaller in size and faster too.
          I prefer C printf over the C++ string streaming. There are now more convenient libraries in C++.
          I’ve been using C++ in embedded since C++17. Prior to C++11 the language lacked features that are very useful and embedded compilers lacked support.

      2. CMake can add build times and even commit hashes. It can generate header files for you.
        Macros cause a lot of headaches and are notoriously hard to debug. There are often better alternatives that also give the compiler the ability to do some checks and optimizations.

    2. Even a simple std::map can handle this with a cleaner approach. You could argue that lookup times on a std::map are expensive on embedded hardware, and that’s true, also related to the size of the map itself. One should always aim to use the better tool for the job. Not arguing for or against C++, but the lack of type safety in C has always been a problem, a lot of which goes away with experience and time, but we are prone to make mistakes.

    3. Absolutisms like “avoid macros as much as possible”, like in real life, are ignorant in programming. Context is for kings.

      In application software, yes, one doesn’t tend to use macros because they are hard to debug and maintain and are not really worth the performance gain on modern hardware vs. that cost.

      But in software running on constrained hardware, drivers, or firmware, the stuff typically written in C, they are desirable because one tends to actually care about every grain of performance they can get.

      So going back to context, this article being about C and on a hardware hacking site, I’d say your sentiment is in error.

  1. I like this form:

    #define PARTS(X) \
    X( LM7805, 0.20 ) \
    X( NE555, 0.09 ) \
    X( 2N2222, 0.03 )

    // create enum
    #define PARTS_TO_ENUM(a, b) part_##a,
    enum parts { PARTS(PARTS_TO_ENUM) };

    because it allows you to have helper macros with descriptive names, predefined in a header file, instead of using the name X from the global scope.

    1. Really nice alternative. I will probably use it since I recently added some X macros to some greenfield code, and this way is probably more self-documenting and avoids redefining macros. Thanks!

  2. Don’t know… Instead of horrible nested macros (nothing wrong with macros, but when there are too many and they are nested it’s difficult to understand) I prefer some Perl/Python/… that will read a simple text file and generate the needed C stuff. Then you can just do a #include "parts_generated.c" (yes, .c) and it will work just fine.

    1. I prefer Lisp or Scheme where the language can easily interact with itself to generate these kinds of things in the normal language instead of through a preprocessor’s weird structure.
      I’ll never use Python for build tools again. It has been a special kind of hell dealing with version issues between different build environments, and Python’s unwillingness to provide a forward-compatible dialect of the language.

    2. This is also what I do.

      But that’s also saying that I’ve probably reinvented a good part of the preprocessor wheel in Python, simply b/c I didn’t know it.

      Stringize and token pasting were new to me, and I _know_ that I’ve written templatey stuff in Python to do something similar. Kinda fun to see how they used to do it back in the old days.

  3. For an example usage on modern code, take a look at the code in Flipper Zero’s scene-based apps. For some reason they decided to store handler pointers in arrays by handler type instead of grouping them together by scenes in structs, but this is how they populate those arrays.

  4. Is it just me and my old eyes, or are the directive lines (prepended with #) in the code examples in a particularly dim color? Are they colored as if they were Python comments or something? It would be less critical if they actually were comments, but they are an important part of the code examples. (Also, as already noted, #defined constants don’t use the equal sign (=), which is what brought me here in the first place, but I like the redefinable X-macro hack. The enum/enum text string issue happens all the time.)

  5. Be careful:

    Assuming that #define X 32 is similar to const int X=32 can be very misleading; one must know their compiler and ecosystem.

    With the C compiler I often use, CCS C, those are two drastically different things.
    :: #define X 32 would be used like a find and replace in source
    :: const int X = 32, would cause the compiler to put 32 in code space memory (not ram)

    When I store a chrset in rom, I use: const chrSet0[size] = { …. };

    1. That’s true, but the fact that it puts it in code space is because it knows it is constant, and that’s my whole point. Sometimes you really do just want text substitution at the point of use, and it is nicer to not have to guess if the compiler will do it or not.

    2. >> const int X = 32, would cause the compiler to put 32 in code space memory (not ram)

      Actually, the compiler will use the value of X directly in any expression in which X appears – unless “constant evaluation” (aka “constant folding”) is disabled. This means that Y = X; will compile as though it were Y = 32;

      On the other hand, if X either is a global variable or its address is used, then there will also be a constant stored in code space – something that can’t be done when X is declared using #define

      Another special case is:
      volatile const int X = 32;

      In this case, X is treated as a variable – a read-only variable. The volatile qualifier tells the compiler that X can be modified “externally”. This could mean that X is “mapped” to a read-only hardware register, or shared with another thread, or an interrupt service routine.

      Often, the declaration would be:
      extern volatile const int X;

  6. i love C preprocessor, i think it is a good balance between smarts and power. i’ve definitely seen a lot of other languages make far more complicated generic programming tools that are much less efficient and powerful and expressive. yeah, i hate C++. i’d go so far as to say the lack of the C preprocessor is actually my single least favorite thing about java. my one exposure to rust ran me into this awful hack, a real failure of generic programming, maybe the guy who came before me didn’t know what he was doing but i couldn’t stop thinking about how rust’s impressive type system fell victim to the worst of C++’s problems when it comes to trying to put together a generic type…the C preprocessor would’ve done a better job.

    the only thing “better” is forth, but it’s got its own challenges…to me, sadly, forth is a toy.

    but i have to say, even though i’m a “compiler guy”, i am still sometimes astonished by the C preprocessor. i have known the rules, i know where to look them up, but i’m still often surprised by the things that suddenly become possible iff you nest your macros. i can’t even summarize it well off the top of my head, i think it is something like macros within macro arguments get evaluated if you call another macro from within your macro. iow, the rules for when macro operands are evaluated make for some very powerful expressive opportunities. but the thing you wind up with doesn’t look expressive at all :)

  7. Oh my god please don’t use this in real code, just use a modern language. In Nim:

    type Part = object
    name: string
    price: float

    const LM7805 = Part(name: "LM7805", price: 0.20)
    const NE555 = Part(name: "NE555", price: 0.09)
    const p2N222 = Part(name: "2N222", price: 0.03)
    echo LM7805.name, " costs ", LM7805.price

    And before you say “it takes up more memory because it’s a variable”, no it doesn’t, because const is evaluated at compile time, so the compiler can just replace the parts in the echo with its value.

    1. you missed the point of the exercise. now create two different arrays, one array of part numbers and one of part names. and do it without typing LM7805 twice.

      i’m not saying there’s not a case to be made for expressing yourself differently — i would say generally it’s true that if your macros are complicated then you should consider reframing your problem — but your example doesn’t accomplish anything the X macro here does.

      1. Nim and other modern languages have really good macro support. If you reaaaaaallly don’t want to type the name twice:
        “`
        type Part = object
        name: string
        price: float

        template createPart(partName: untyped, priceInp: float): untyped =
        const partName = Part(name: astToStr(partName), price: priceInp)

        createPart(LM7805, 0.20)
        createPart(NE555, 0.09)
        createPart(p2N222, 0.03)
        echo LM7805.name, ” costs “, LM7805.price
        “`
        I don’t see why you would do all that array stuff. But I guess my point is don’t do crazy crap in C unless you want bugs.
        (hopefully hackaday uses markdown?)

        1. Gaaah is it bbcode?

          type Part = object
              name: string
              price: float
          
          template createPart(partName: untyped, priceInp: float): untyped =
              const partName = Part(name: astToStr(partName), price: priceInp)
          
          createPart(LM7805, 0.20)
          createPart(NE555, 0.09)
          createPart(p2N222, 0.03)
          echo LM7805.name, " costs ", LM7805.price
          
        2. Does this technique let you put the tables in flash? I’ve reluctantly been doing the X macro method for years, because when you have fixed tables of over a hundred entries, changes can be a nightmare and error-prone if you have to type the information twice.

        3. “I don’t see why you would do all that array stuff.” like i said, it’s a good idea to examine your problem if you find yourself using complicated macros. but nonetheless and all the more, your example still doesn’t accomplish what the X macro accomplishes.

          it’s fine if you don’t like the X macro approach in this article but your nim example simply isn’t related at all. the whole point is to accomplish “all that array stuff”. i find a use for an approach like that one oh i don’t know about once a year. especially when making test cases, there are instances when you want to do “all that array stuff.”

        4. heh and, on a hunch, i tried to run your example through nim. “apt install nim” worked ok. figuring out how to invoke “nim c buh.nim” wasn’t hard. but i get a bunch of error messages from glibc’s stdlib.h: __BEGIN / __END_NAMESPACE_STD, and __extension__.

          so, you know, i’ll see your “I don’t see why you would do all that array stuff” and raise you a “I don’t see why you would want to have a less-well-supported language toolchain” 🙂

  8. I definitely agree that having the preprocessor features implemented as part of the language is safer and more manageable. However, there’s great power in being able to generate any construct with a preprocessor, and to do it in a straightforward manner. The overhead of doing some things in the language proper, both in learning to do it and in reading the resulting code, becomes too much at some level, to the point of diminishing returns.

  9. Wow, thanks for the history lesson. I was familiar with this idiom but didn’t know it had such deep roots. I first saw it in the code for LuaJIT and have been using it ever since.

    As well as being good for initializing large arrays, you can use this to generate code just by using ; or && or || or + or whatever instead of the , in the X macro. You get the benefits of a data-driven coding style without the static (space) or runtime overhead of looping over an array. This definitely does take up significantly less space sometimes, depending on the use case and target architecture.

  10. Not knowing about this macro magic trick, I was the guy trying to understand and fix a large code base that used it extensively for state machines… ugh! A convenience editor like SlickEdit parses the code (called tagging) and should be able to tell you where an enum like part_NE555 is defined and all the places where it is used, but fails out of the box :(

    The convenience of defining it in one place and the avoidance of duplication errors are outweighed by the loss of readability / comprehension.

    Try a text search for “part_NE555” and you won’t find where it is defined in the example (line 4 and/or line 9). Now imagine 1000 H and C files.

    It’s a nice trick but caused me a lot of mental pain.
