Modernizing C Arrays For Greater Memory Safety

Lately, there has been a push for people to stop using programming languages that don’t promote memory safety. But as we still haven’t seen the death of some languages that were born in the early 1960s, we don’t think there will be much success in replacing the tremendous amount of software that uses said “unsafe” languages.

That doesn’t mean it’s a hopeless cause, though. [Kees Cook] recently posted how modern C99 compilers offer features to help create safer arrays, and he outlines how you can take advantage of these features. Turns out, it is generally easy to do, and if you get errors, they probably point out unexpected behavior in your original code, so that’s a plus.

We don’t think there’s anything wrong with C and C++ if you use them as you should. Electrical outlets are useful until you stick a fork in one. So don’t stick a fork in one. We really liked the recent headline we saw from [Sarah Butcher]: “If you can’t write safe C++ code, it’s because you can’t write C++.” [Cook’s] post makes a similar argument.  C has advanced quite a bit and the fact that 30-year-old code doesn’t use these new features isn’t a good excuse to give up on C.

The biggest problem is something that has been around for a long time and that C99 calls “flexible array members.” That’s when you write something like int bits[] or, historically, int bits[0]. These aren’t complete arrays at all; they stand in for a trailing array whose size the compiler doesn’t know. Even worse, many structures put a flexible array at the end to indicate that they are really nothing more than a header to a larger data structure.

For example:


struct packet {
    unsigned seqno;
    unsigned len;
    unsigned src;
    unsigned dst;
    byte data[4];
};

Given a pointer to this structure, you can access, say, data[20], and the compiler won’t complain. Presumably, the len field tells you the real size, but the compiler doesn’t know that, nor does it know whether len counts the elements of the array, the size of the whole structure, or something else entirely.
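As a rough sketch of what the modernization looks like (this is our illustration, not code from [Kees’] post), the trailing fake array becomes a true C99 flexible array member, and newer compilers can be told which field holds the element count so that sanitizers and FORTIFY-style checks have something to work with. The counted_by attribute only exists in recent GCC and Clang releases, so treat it as optional:

#include <stdlib.h>

struct packet {
    unsigned seqno;
    unsigned len;   /* number of bytes in data[] */
    unsigned src;
    unsigned dst;
    unsigned char data[] __attribute__((counted_by(len))); /* C99 flexible array member */
};

/* Allocate the header and 'count' payload bytes in a single block. */
struct packet *packet_alloc(unsigned count)
{
    struct packet *p = malloc(sizeof(*p) + count);
    if (p)
        p->len = count;
    return p;
}

With something like -fsanitize=bounds and a compiler that understands the attribute, an access such as p->data[20] on a shorter payload can be caught at run time instead of silently reading past the allocation.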

There are several possible cases and [Kees] goes through them all. Well worth a read if you use or maintain C code that uses arrays. We look at some cases, too, especially with those tricky unions. While everyone likes to pick on C as being unsafe, it is pretty green.

75 thoughts on “Modernizing C Arrays For Greater Memory Safety”

  1. Electrical outlets in the UK are still safe from sticking a fork in them, because the live and neutral holes are blocked by plastic until the ground pin (which is longer) hits its own plastic lug and pushes them out of the way.

      1. Modern fuseboxes contain RCDs – I think they’re required to – so that for things that don’t come with fuses built in, like cheap wall warts, there’s still a layer of protection. That fact has literally saved my life once.

      1. I saw a European guy on a train take out his Swiss army knife and stick it into the socket’s “earth hole” to open the shutters, and then force an EU plug into the other two holes. He had to use quite a lot of force to get it in, but he didn’t seem to care.

        I get he wanted to charge his laptop but taking out a knife and jamming it into an electrical socket on a train before ramming his EU plug into it isn’t the best idea.

    1. I thought you were going to say something useful! Funny thinking that ‘modernising’ C arrays could mean introducing some of the safety mechanisms that were included in Ada, 40 years ago.

      1. I saw the “great” Ada array checks that were implemented in the “original” Alsys compiler back in the 90s. It was “so good” it was turned off because it bloated the code so much it wouldn’t fit into the space built on the board. It also slowed down execution horribly. Intel attempted to fix that with its iAPX432 by implementing native instructions that performed range checks to reduce some of the bloat and execution penalty. But that processor died quickly as Intel ventured into RISC (via the i960 and ARM license, which were also both dropped quickly). Some of Ada’s benefits were from “self-documenting code” that really only consisted of often verbose variable names when many early C compilers only allowed short variable names (early DeSmet C and Dr Dobbs Small-C). Later generations of compilers did a better job of lots of things, including a lot of lint-like features, especially when you cross-checked the whole enchilada.

        Ada’s benefits were not nearly as good as the advantages of good code written by experienced programmers after going through real (not superficial or perfunctory) code reviews. A good code review is crazy hard, since every reviewer has to invest in the code to figure out what it does, but it benefits the program in that there are a lot of backup programmers who can jump in and finish or fix “unforeseen” code issues. Of course the program takes a BIG hit “up front”, but if done well, there’s tremendous payback in reducing backend costs, which are way more expensive (think field updates that require expensive plane fare and personnel — yah, that got reduced once we could do remote updates, but that came later). While the “average” programmer is pretty crappy (too bad that’s the norm, or normal in the Gaussian sense), superior coders are underappreciated and definitely underpaid, especially when most managers are average or worse programmers. The largest programming group I was ever a member of was about 20, and that was scary. I can’t imagine what it was like for those older (?) teams of hundreds to thousands of programmers (IBM, Microsoft, et al.). The best “team” I was on was me alone, for the system component that was assigned to me. That reduced the communications errors (OK, there were still system-to-SW issues). I worked on enough of the older systems to “know” what to look out for, areas that needed better algorithms, etc. And the SW really wasn’t slower than the older generation of products we had, in spite of having more requirements and capabilities, better performance, and lower cost than the older systems.

  2. Comments like “If you can’t write safe C++ code, it’s because you can’t write C++.” show how naive and snobbish some people can be. When you have teams of developers all with different levels of experience working on a complex project with aggressive deadlines, unintended things are going to happen. The point of memory safety is to have guardrails in place so you can’t [easily] make these kinds of mistakes that soon become buried in a mountain of code and that could later on have far-reaching consequences.

    1. Fully agree. Some time ago I worked on a project for a company, and I used very strict memory management rules; and even with them, I had an odd memory bug. I had to build a python program that checked all the possible paths in each function to find where I forgot to follow them. It was very subtle.

          1. It’s entirely predictable behavior. Bounds check off? You get the contents of that space on ‘the heap’, type you asked for, negative indices etc.
            Test locally first, compiler implementations will vary. It’s been a long time since [ and ] were macros for pointer math.
            OS will help/stop to some degree. Depends how far out of bounds you ask for. Don’t do that.

          2. It is not predictable behaviour. Undefined behaviour by definition is unpredictable. Small changes to the source code elsewhere could lead the compiler to entirely eliminate the code that indexes out of bounds. Or it could segfault on one run and just return garbage in another run.

            You reduce the chances of unpredictable results by sticking to a specific compiler version, but you don’t eliminate them.
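            To make the disagreement concrete, here is a minimal sketch (our example, nothing from the article) of why “whatever happens to be in memory” isn’t a safe mental model; the result depends entirely on the compiler and optimization level:

            #include <stdio.h>

            static int table[4] = { 1, 2, 3, 4 };

            int lookup(int i)
            {
                /* For i >= 4 this read is undefined behavior: unoptimized builds may
                 * return whatever sits next to the array, while an optimizer is free
                 * to assume i is in range and rearrange or delete the code. */
                return table[i];
            }

            int main(void)
            {
                printf("%d\n", lookup(20)); /* garbage, a crash, or something else entirely */
                return 0;
            }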

    2. There are a LOT of “naive and snobbish” people both for and against pretty much any programming language. The problem isn’t the language, it’s humans. Humans are terrible and they will always be terrible so long as they are human.

      Also, “aggressive deadlines” pretty much ruin any chance of putting together good code.

      1. Reminds me of this old joke:

        “Doctor, it hurts when I touch this spot. What can I do about it?” – “Then don’t touch it.”

        C has always been unsafe with memory. And it didn’t evolve for years, for reasons I’ve never found out. Pascal was years ahead of C, despite being about the same age.

        I’m afraid the main reason why C (and thus C++) is so popular these days is because of Unix (and Linux) and its C dependence. And because Unix was cheap and thus thoroughly used in universities in the past. So ex-students kept thinking in C and carried the C syntax with them to their new jobs.

        Either way, I’m afraid it wasn’t popular because of its performance. It’s not a very good language, but more like a complicated macro assembler.

        That being said, it’s likely not easily possible to discuss such things from a purely rational point of view. On both sides the emotions are running high, maybe.

        It’s as if some naive person dares to criticize beloved FPS classics like Doom, UT or Castle Wolfenstein for being violent. There will always be defenders who say that these games with pistol holding protagonists are peaceful and a great contribution to world peace. Critics who argue that exploding skulls aren’t peaceful will then be called h*ters.

        About the same discussion, just less intense, is happening with the safe/unsafe debates around C/C++ and similar languages. I’m afraid there are so many fans who can’t let go of C/C++ and thus take any criticism as a personal attack. Especially the Linux scene is known for its rather, uhm, rough mailing lists. ;)

        1. In the 70s, when C was coming of age, the state of optimizers for compilers was fairly primitive on anything smaller than a mainframe. C was close to assembly language; as you note, better than a macro facility for assemblers but not by too much. And this was true on machines even as big as a PDP. C let the programmer *be* the optimizer (see “register” storage class, now usually ignored).

          Eventually Pascal would catch up, but by then it was too late. ISO “Standard” Pascal largely formalized some of the “extensions” that commercial compilers had added to stay relevant. A good part of the reason was because of the calling conventions of IBM PC BIOS and of MSDOS. As late as Windows ME, the header files were still littered with “_pascal” in reference (deference? [grin] ) to this.

          It was into the early 90s before compilers on “small” machines (PCs, maybe the SunOS 4.X era) could consistently beat even average C programmers at things like register allocation. Better Hackers ™ could even exploit cache layout by knowing the order that variables would be assigned in and specifically how they would be padded.

          As long as you follow the MISRA C guidelines, you’ll be fine. It’s more or less “the subset of C using only features available in Fortran 77”.

          1. As has been pointed out previously in such discussions, when the language is not safe, and you use libraries, there will always be unsafe constructs, and you cannot review them all in detail.

            Being very disciplined yourself will not fix how well those libraries are coded/designed, nor how well other people in general follow standards.

            But since this goes far beyond coding guidelines it is a major issue.

          2. And nothing has changed about the part where C and C++ both still only shine when programming with the specific hardware implementation in mind. The only thing that changed is that nowadays you can at least count on the optimizer to do a more or less well-known set of transformations for you. But effectively you still need to know all of them, and most of the STL’s implementation details, to really cash in on the work already done for you.

            And counting how often you end up double checking with godbolt whether all your target compilers did get the hint … It’s still just a higher level assembler frontend after all.

            It only starts getting weird whenever you treat C or C++ as if it were a substitute for a higher-level language, and start introducing new conventions for safe patterns which any true higher-level language would actually be able to enforce.

            And yet, Visual C++ and the like just didn’t catch on, simply because they were available far too late for the full range of platforms, and until Rust we honestly didn’t even get any modern alternative that didn’t need to ship with a massive, embedded-unfriendly runtime.

        2. Your Unix/Linux connection isn’t very sound.

          For starters, Windows was the dominant operating system for at least 3 decades. Why didn’t C evolve? Well it did actually, but Microsoft didn’t update their compiler past C89 for 25 years. It was the dominant system and compiler, so that’s the standard you had to code to.

          But whether it’s Windows or Linux (or Apple OSes, Android, etc), at the bottom of all the APIs, SDKs, interpreters, runtimes, and compilers; to interact with the OS and hardware, a call is most likely made to a function written in C.

          Furthermore, most Universities haven’t taught C in general programming classes since at least the mid-late 90s. In 1999 when I started college, it was Java or *maybe* C++. You didn’t use C unless you were in systems or embedded classes.

          That is why it sticks around. There is nothing mature enough to handle these use cases. Rust is trying to be that replacement, but any moderately close-to-the-metal code is so full of “unsafe” Rust code that it loses all claim to memory safety.

          C is not the right programming language for everything, but until something else comes along C is the right programming language for what it’s good at. Most software can and should be written in something else: Rust, Go, Python, Haskell, or even Javascript. On the flip side: some prominent applications written in those languages should definitely NOT be.

          The fundamental error a lot of people make is thinking that one programming language is better than some other language. It’s nonsensical. The “best” programming language is the RIGHT language for the problem you’re solving.

          I think that most programmers should learn C because a) it is one of the best analogs to how the computer actually works (which is important no matter the language) and b) there are tens of billions of lines of C code that need to be maintained and updated. (Not to mention that it is a de facto standard programming interface for libraries of all sorts.)

          If we produce a generation of programmers who all rely on high level languages to make “good software”, then will we have the low-level knowledge to make the next compilers and runtimes they rely on?

          1. The foundation for Apple’s first OSes was actually Pascal, and a lot of that code was written in Pascal; and as others mentioned, the Windows API had Pascal calling conventions. In the 80s and 90s Pascal was very popular on Windows, but barely on Linux.

            Linux/Unix was always very very much more biased towards C/C++ than Windows. A lot of games and commercial apps were developed in Pascal.

            The C culture is very apparent throughout Unix/Linux and all its helper languages like Bash/Csh, Perl, and Tcl/Tk; even Python was initially developed because C was so awkward to prototype in. Python until 3.0 was also almost unusable under Windows as long as your username had “special” characters, since it never could properly manage character sets (console vs GUI etc).

            You can clearly see the difference in software design when it originates from DOS/Windows vs. Unix/Linux, with the latter focused on small tools each implementing a small function, and the former focused on a whole program built around the user’s view of a whole function. IDEs were therefore also typical on DOS/Windows long before they appeared on Linux; in fact, Pascal might have introduced the first IDE.

          2. People seem to forget how many embedded systems they use in their everyday life. Just about every program running on them will require direct register access, which is something that memory-safe languages do not like or do not allow. So in order to write that program you have to use workarounds, and that removes the whole point of using the language in the first place: you lose the aspect of it being memory safe.

            If you are fighting with the programming language or compiler to get a main feature of your code to work or have to use a lot of workarounds, you are using the wrong language.
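            For anyone who hasn’t done bare-metal work, this is the kind of construct in question; the address below is made up, but the pattern of casting a fixed address to a volatile pointer is exactly what most memory-safe languages only permit inside an escape hatch such as Rust’s unsafe:

            #include <stdint.h>

            /* Hypothetical memory-mapped GPIO output register at a fixed address. */
            #define GPIO_OUT (*(volatile uint32_t *)0x40020014u)

            void led_on(void)
            {
                GPIO_OUT |= (1u << 5); /* read-modify-write of the hardware register */
            }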

        3. C is as widespread as it is due to a lot more than Linux; that is just a small part of it. Just about every embedded system in the world uses C. Why? It has low overhead, runs fast, and doesn’t create large programs. Embedded programming also often needs direct register access, which is frowned upon in a lot of these memory-safe languages. It is far simpler to program a lot of embedded devices in C than in other languages, especially Python or, more specifically, MicroPython. There isn’t support for many other languages; the only ones supported by the chip manufacturers tend to be C/C++ and sometimes MicroPython, but it is not a great language.

          The whole argument against C in favour of other languages sounds a lot like the arguments for high-level languages instead of HDLs like Verilog or VHDL: the higher-level languages don’t offer the same performance or control, which is what is needed in these systems. As with HDLs, it is researchers that push these new languages, not the people who actually use them. In a lot of cases these new languages start out as someone’s masters project or PhD when they are really just looking for something to do for their project.

      2. So tired of “old is bad just because it’s old.” C code is going to outperform any other “modern” bloated, slow, garbage-collected bollox language, and for good reason: you have full control and no guard rails, meaning no extra range-checking code inserted slowing things down, and that was a big deal in the old days when CPUs were slower and it made a difference. C/asm is great for anything where you need to get the most out of the CPU. You also need to know what you are doing and what’s really going on on the CPU. 30 years ago there were many coders like this. Now, I am up to my eyeballs in developers who don’t even know what a bit is. Standards dropped because of all the hand-holding and lowering the bar so even idiots can write a memory-guzzling kludge of an app or backend that requires ridiculous resources to do trivial tasks. Now this resentment against C/C++ is the only way this ilk of “programmer” can deal with not being competent as a programmer. C++ programmers frankly have the right to be snobbish if they know what they are doing… they are better coders than you are.

        1. Computers are no longer faster PDP-11s, and the C model of a machine is woefully obsolete now. https://queue.acm.org/detail.cfm?id=3212479 Even the assembly language model of an x86 bears little relation to its internals any more; processors are now built to interpret a four-decade-old processor’s instruction set as quickly as possible, rather than to be programmed directly in anything reflecting their real architecture. Indeed, one directly relevant change is that speculative execution means you can usually interleave things like bounds checking with actual execution for free, pretty much – the pipeline won’t be fed quickly enough to be doing anything else anyway, and the conditional jumps in the failure cases will be predicted to fall through.

          Standards haven’t dropped; complexity has increased – by a power law! Every generation of CPUs has different optimisation requirements, all of them more complex than the average chess game to think through, almost to the point where as soon as someone has a grasp on one it’s time to send it to the recycler. And that system complexity has begat software complexity, as everyone now has to cope with trying to build cathedrals on endlessly shifting sands. Sure, thirty years ago, a lot of coders could keep an entire system in their heads – only because systems were simple, and small, enough that that was possible! Those days are gone, long gone, and outside of societal collapse (and microcontrollers, where simple architectures, short pipelines, tiny interrupt latencies and single-cycle memory still rule the roost… for now) they’re not coming back.

        2. You see it a lot with FPGA and HDLs now too, software engineers think it is just a programming language and can’t understand why they can’t just treat it like any other programming language despite not knowing a single thing about digital logic or how it actually works. Then they go and try and make high level languages and they all lack the basic features needed in a HDL, they lose all control over timing and the detailed workings of the logic they are trying to create. If you ask them though they will say it is better and faster than normal HDLs despite then having no idea how digital logic works. Their high level languages create the worst looking HDL code you have ever seen and are very poorly optimised and their timing is all over the place and impossible to change, it works fine for the few basic projects they try it out on and then they think it is a replacement for the industry standard HDLs when in reality if you try to use their HDL for anything more complex than a beginner level project it completely fails to produce a functional design.

          The problem is people trying to improve things they don’t understand and they try to make it easier for themselves by adding more and more layers of abstraction rather than learning how it actually works. It seems a big part of that is masters or PhD students that are just looking for something to do for their project without actually trying to solve any problems, then people like the idea of their language and take it a lot further than it was ever meant to go and then we end up with hundreds of languages that can’t compete with the ones that people in industry actually use but they try and push their new languages on everyone anyway.

    3. You stated the inherent problem of your project: “..teams of developers all with different levels of experience..”, but you never stated if the management required code reviews. Well, did they? Even simple tools like Lint help, and are generally mandated before code review (and the review should come before a test release). These are often inexperienced-management issues more than inexperienced-team-member issues. And it’s crazy for “upper” management to dictate rules, or the lack of them, when they are clueless as to how any of it is done. I’ve been a member of too many teams that went up in smoke after releasing something prematurely under poor management. That’s when the entire team, good and bad programmers alike, can get burned.

      I know that unskilled management will kick code through or step over code reviews at their own risk too often (and then blame it on “inexperienced programmers”). If an inexperienced coder is shown what they did wrong, they generally learn from that and become better. But tight schedules are a “boss” thing, where they “know better” or really don’t give a crap, with attitudes like “we can fix the bugs after a release” (think Microsoft and almost any version of Windows). How is it possible that we still have zero-day exploits when we know from post-mortem analysis what types of bugs endlessly happen over and over again?

      1. You shouldn’t have to focus on such issues in code reviews, they are meant for higher level design issues, not for literal micromanagement…

        Next we’ll argue that a bad compiler that ignores syntax errors but by luck still produces working code is fine, since the cases where it won’t can be caught by code reviews.

        The point of tools is to let you focus on the things that matter, not to have to compensate for their lack of quality in your design.

    4. All languages have their own problems. Those same developers will write bad Java, C#, JS although they are “safe”. In SW projects the problems are elsewhere. I would always take modern C++ over C# or Java.

      1. I’d argue that a language such as Rust will make you think about memory management in a much more structured way, and will therefore lead to better designs.

        Especially C++ is very convoluted when it comes to memory management and there are many implicit rules to follow and special constructs, while those are explicit in Rust.

    1. I know what you wrote was meant to sound funny and ironic, but…

      RAD IDEs like Visual Basic Classic or Delphi did a great job at prototyping.
      Code isn’t everything, we know. :)

      An application made by a humble VB6 programmer, who is an expert in his field (astronomy, botany, model making etc) can sometimes fulfill the purpose better than a professionally made application by an excellent coder who has no idea about the field. Because, the VB6 dude knows the needs of his/her colleagues.

      It always depends… Sometimes a quick & dirty prototype can last for years.

      There’s an old saying that supports this:
      “nothing lasts longer than a temporary solution”

      It’s like with teeth. A provisional filling may last for years.

      In an ideal world, both the VB6 programmer and the C/C++ coder dude would work together as equals and produce an awesome final application. As teammates, so to say. 😎👍

      1. “An application made by a humble VB6 programmer, who is an expert in his field (astronomy, botany, model making etc) can sometimes fulfill the purpose better than a professionally made application by an excellent coder who has no idea about the field. Because, the VB6 dude knows the needs of his/her colleagues.”

        hahaha it’s true. i don’t disagree. but that word “sometimes” is doing a lot of heavy lifting. anyone who has ever been the hired computer geek to come clean up some science code written by a grad student years ago….iykyk :)

        1. FORTRAN goto nextIterStart
          nextIterStart contains a line#…didn’t know it was possible.
          Search for nextIterStart…115 occurrences…
          PigFers are adding/subtracting from it, right out in public!

    1. Oh, come on, he’s wrong about snprintf() not respecting size, he’s wrong about _countof() being part of the standard, he’s wrong about why strtok() is unsafe, he’s wrong about why strtok_s() is safe… Don’t you just love when people give you incorrect advice? ;-)
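      For reference (our illustration, not from the linked post), C99’s snprintf() does respect its size argument: it writes at most size - 1 characters plus the terminating NUL and returns the length the full string would have needed, which is the part people often trip over:

      #include <stdio.h>

      int main(void)
      {
          char buf[8];
          /* At most sizeof(buf) - 1 characters are written, plus the NUL;
           * the return value is the length the untruncated string would have had. */
          int n = snprintf(buf, sizeof(buf), "%s", "hello, world");
          printf("buf=\"%s\" needed=%d\n", buf, n); /* buf="hello, " needed=12 */
          return 0;
      }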

      1. C was not my first programming language. That would be either FORTRAN IV or BASIC-PLUS on RSTS/E on a PDP-11/40. After that, I learned PDP-11 Macro Assembler, Lisp, ALGOL60, PL/C, PL/1, PASCAL (wrote a compiler for it in a combination of FORTRAN IV and assembly), COBOL, and BLISS, along with a few other assembly languages prior to learning C in the early 1980s.

        I was a software engineer long before I learned C.

        I was involved in the ANSI committee that drafted what eventually became the C89 standard while working for a company developing and selling C compilers for multiple architectures. At the time, I was convinced that Modula would become a major player in the computer programming world – Turbo-PASCAL was the most popular compiled language on the IBM PC.

        Nonetheless, I became a C programmer and have remained so. I can program in C++, JavaScript, Python, tcl, and even Ada, but I find that C allows me to deal with data structures directly, and for what I do (embedded platforms, safety and security critical, performance sensitive, limited memory, and network communications) I find C to be the best tool.

        It is possible to write safe code in C. If you use proper pointer and array hygiene and are scrupulous about bounds checking at the public interface layer, you can even write safe libraries. I have written C code for DO-178B level A avionics where we had to write our own runtime support library for GCC so that the source corresponding to every executable instruction was under our control and the MCDC coverage could be analyzed.

        You pretty much know exactly what you are getting with C so long as you understand pointers, arrays, structs and unions. Knowing how to use them safely is vital to producing correct code.

        I am often surprised when I rediscover that many software engineers I work with don’t really understand how data is arranged in memory, how pointers to structures work in C, or even how to construct linked data structures. I am often the only person on a team who understands DMA descriptor chains/rings, circular buffers, and low-level memory management. This is because, for over a decade, people have not been taught how data is organized in memory. Instead, they learn how to use templates, class libraries, and managed languages that hide the actual representation from the programmer. Some were even taught to fear pointers, and didn’t realize that every object reference in Java is through a pointer.

        I’m rambling, sorry…

        My point is that C is a valuable tool and attempts to eradicate it are doomed to failure. Rust is okay, but once you get into the low level code where you cannot throw exceptions and must resort to “unsafe”, the Rust code is less clear, thus less safe, than the equivalent C code. C code can be safe, but there’s a cost to the programmer to ensure that safety. Rust code can be safe without the cost to the programmer but at the expense of runtime efficiency and the increased complexity of the language runtime.

        1. > It is possible to write safe code in C. If you use proper pointer and array hygiene and are scrupulous about bounds checking at the public interface layer, you can even write safe libraries.

          There are always 3,000 “ifs” to making C safe, and oops, forgot one and now somebody died, or oops, forgot another and now 1,000,000 customer details leaked. If C can be made safe if only one used “proper pointer and array hygiene”, then why not just encode that hygiene in a type system and make a better C.

          C is a valuable tool because there are no better alternatives with the same degree of platform support, but to make C the best choice, you should be using tools that can verify some of those safety properties at compile time, which C itself currently lacks, like Frama-C or CBMC.
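          One low-tech way to “encode that hygiene” without leaving C is to make the length travel with the pointer and funnel every access through a checked helper; it’s convention rather than enforcement, and the names here are purely illustrative, but it concentrates the bounds checks in one place that tools like the ones above can then reason about:

          #include <assert.h>
          #include <stddef.h>

          /* Illustrative fat-pointer type: the length travels with the pointer. */
          struct byte_span {
              unsigned char *ptr;
              size_t len;
          };

          static inline unsigned char span_get(struct byte_span s, size_t i)
          {
              assert(i < s.len); /* every access funnels through one bounds check */
              return s.ptr[i];
          }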

  3. C is a ‘super assembler’, essentially.
    An in-between of assembler and a high-level language.
    That’s why it is so merciless and has so many things we may consider to be shortcomings.

    If you need a real high-level language, there’s still Pascal, among other languages.
    BASIC is such a high-level language, too. ;)

    1. C is no “super assembler”. It’s way better than any “super” macro assembler. What the original authors of C intended was to make something that could be compiled on older, more memory-constrained computer architectures (originally from DEC) with as little as 8KB of total “RAM”. Sure, it can be used to write “bad code” (the old CUG magazine even had an obfuscated C contest yearly), but that can be done in any language. Pascal is not “better” than C; it was an alternative that promoted one of its deficiencies as a “feature”: it originally allowed no forward references and relied on nested procedures (like PL/M, based on PL/I) to allow the compiler to be one-pass (since the programmer had to do that heavy lifting). Even Wirth replaced it with Modula. So, “which” (flavor of) Basic are you referring to? Kemeny & Kurtz’s original interpreter, or Microsoft’s evolution and extensions of the language that include compiled images and “object oriented” extensions, among others? What about Ada, Prolog, Scheme, etc.? Or even the ancient and ever-evolving FORTRAN?

      1. Pascal with no forward declaration wasn’t even common in the 80s, contrary to C the ISO standard was never relevant to the currently in use (and widespread) dialect.

        String processing was always a strength of Pascal vs. C, since it could be done safely, and later would be even reference counted, same with accessing arrays, which had boundary checking.

  4. i just think it’s important to keep trade offs in mind. it’s easy to blithely say that C lacks features other languages have had for years, as if the existence of these other languages proves these features are trivial and therefore C is failing to keep up. but it’s simply not true. every language has faced a struggle between expressive flexibility and type safety, between efficiency and memory safety. every language has struggled with compiler complexity, implementability, compatibility. every language has struggled to have a solid runtime and to achieve good interoperability with the rest of the world. every language struggles even to achieve familiarity!

    and C is no exception to those struggles. in my opinion, it hits a sweet spot…but i don’t mind if other people don’t use it or don’t like it. it definitely extracts some costs in exchange for its flexibility!

    i wish luck to people trying to reproduce C’s rewards without its costs. i’m not gonna say there’s been no progress on that front. but let’s be straightforward about the trade offs. there are costs in terms of performance, compatibility, familiarity, and maintainability. people’s choices reflect their unique priorities, and not just ignorance or closed-mindedness.

    but, you know, props to ocaml, negs to rust, grudging respect to java, laughs at javascript, slow down it’s python, shrugs at scheme, tickles to forth, thank god i don’t have to use ruby/tcl/php! we’ve all got opinions. :)

  5. The writer of the article summary here on hackaday seems to think flexible arrays exist because they are some gross thing made by people who don’t know how to do stuff safely. They actually are quite useful for things where you want serialized representation to be exactly the same as memory representation (e.g. memory mapped) and other similar performance critical applications. The article linked doesn’t really say flexible arrays are bad but discusses how they can be made safe. Most of which is that you can’t tell if someone declaring T foo[1] or T foo[0] really means T foo[] (which didn’t exist until C99).

    See a summary of advantages here: https://developers.redhat.com/articles/2022/09/29/benefits-limitations-flexible-array-members#flexible_array_members_vs__pointer_implementation

    Another interesting point is that this type of ability to keep an array and its size in a flat structure (one malloc()) doesn’t really exist in any other programming language, even C++. You can’t use a constructor/destructor pair object, because it is an error for a C++ structure to not have a static size (templates can vary in size but they are still static given a fixed set of parameters). There is an interesting proposal to make this possible in C++ here https://developers.redhat.com/articles/2022/09/29/benefits-limitations-flexible-array-members#flexible_array_members_vs__pointer_implementation

    That article also provides more applications. Unsafe arrays, unions, etc. in C is all about performance with safety tradeoffs. Most pedestrian programmers don’t need this stuff, but when you need it you need it, and you can’t do it in any languages other than C.
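    A sketch of that “one malloc(), flat layout” property (our example, with a hypothetical message type and a POSIX write()): because the header and payload live in one contiguous block, the in-memory representation is already the wire format, with no second allocation and no gather step.

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    struct msg {
        unsigned len;            /* payload bytes that follow */
        unsigned char payload[]; /* C99 flexible array member */
    };

    /* Header and payload share one allocation, so a single write() sends both. */
    static int send_msg(int fd, const void *data, unsigned n)
    {
        struct msg *m = malloc(sizeof(*m) + n);
        if (!m)
            return -1;
        m->len = n;
        memcpy(m->payload, data, n);
        ssize_t rc = write(fd, m, sizeof(*m) + n);
        free(m);
        return rc < 0 ? -1 : 0;
    }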

    1. You actually can in Pascal, if you explicitly choose to, mostly for compatibility with C and the many headers that use such techniques, but it’s there.

      C# probably can do the same in unsafe code sections, for similar reasons.

  6. > We really liked the recent headline we saw from [Sarah Butcher]: “If you can’t write safe C++ code, it’s because you can’t write C++.” [Cook’s]

    This is true. This corollary is also true: almost no one can write safe C++.

  7. Many compilers are written in C. It’s a great bootstrap language to get a system going and expand capabilities into other areas, even different languages. It’s because it’s close to the hardware that it is a great language to learn as you learn more about the inner workings of computers and microprocessors. Assembly is better in that regard but certainly not by much.
    Nowadays 99% can be written in a language with safeguards against overflows and other bugs where the remaining 1% can be written in C or assembly if the extra performance/efficiency is needed.
    But it’ll be interesting to see how C is further developed, maybe it will receive some modernization.

    1. “safeguards against overflows and other bugs” … Not sure if it should. We need a language that compiles to what is coded, not with a bunch of extra code wrapped around our code for ‘safeguards’. That is the beauty of ‘C’. It is also one reason we didn’t move to C++ in our real-time control systems back when. There was too much that could or would be ‘hidden’ behind the scenes that might cause a degradation in performance or not run as you expected… Stick to straight ‘C’ and those problems are behind you (like writing in assembly). Of course, if the compiler ‘suggests’ there might be a problem in area ‘x’, then I am all for that. Just don’t introduce any ‘special code’ that the programmer isn’t aware of.

      You can write safe ‘C’ code too (or any language). Just a little harder. You just have to be more aware of what you are doing. Trade offs!

  8. Ada was developed (for the military) under the misguided pretense that “correct” (e.g. bug-free) code could somehow be mandated by a “good” programming language. This is as futile as Alfred Whitehead’s attempt to create a complete “ground-up, pure” foundation of math, as shown by Gödel’s Incompleteness Theorems concerning the limits of provability in formal axiomatic theories, much like the endless attempt to define the best and most pure programming language or even practices. The axiom is that any attempt to thwart the idiot seriously underestimates the power of idiocy. And what is the definition of a “perfect program”? One that has no bugs? One that cannot be “corrupted” by bad data? There are too many avenues for issues to crop up. I recall my (not-so-smart) boss’s reason for removing me as the SW lead of a new project: “you write too many ifs!”. Geeze!

    1. > Ada was developed (for the military) under the misguided pretense that “correct” (e.g. bug free) code could somehow be mandated by a “good” programming language.

      It can. It’s even been done with theorem proving languages, like CompCert.

      The Ada attempt wasn’t misguided at all, Ada programs are empirically known to be less problematic, easier to read and reason about. Ada to this day has safety and other features that are so useful but still uncommon in languages (like range types).

      1. Apparently, CompCert uses Coq to “mechanically verify” the correctness of the compiler’s constructs (Wikipedia states it’s used for C++ and not Ada), but how do you verify the specification? It’s not a proof, per se. And by using the term “empirical”, you imply “by experience” (use without “issues”?) that it’s “correct”.

        I worked at a Thomson CSF company that used the original Alsys Ada after it was purchased by the parent company. Ada had a strict verification suite and process that had to be run for each iteration of the compiler AND runtime (which was critical to the support of “real-time” operations) on EACH and EVERY target processor. We used it on an Intel 80186 back in the 90s. Some of the same is done for certified avionics code applicable to DO-178x (I haven’t kept up since retiring). There, it’s even more critical to verify the testing suites, and there are requirements that the highest level of certification can only be done for a specific processor mask step that has “experience” of use before it’s accepted. However, in the early days, our head SW engineer from France hand-modified the Ada kernel to work on the specific 80C186 we used, and I was ignored when I said that that invalidated the Ada “environment’s” certification. That of course required access to the source of the Ada runtime, which wasn’t cheap.

        1. You’re mixing some things up from my reply.

          CompCert is a formally verified optimizing C99 compiler written in Coq, a theorem prover. The proof that Coq is verifying is that CompCert faithfully conforms to the C99 specification, and that all of its optimization passes preserve C99 semantics. This is a counterexample to your claim that choice of language cannot ensure bug-free code. It can; it just costs time, money, and effort. This response had nothing to do with Ada specifically, although SPARK Ada is a great choice if you want formally verified safety properties out of the box.

          By “empirical”, I’m referring to the multiple empirical studies comparing software quality of Ada programs vs. C/C++ programs.

          As for verifying specifications, there are a couple of options: 1. Encoding the specification in a theorem prover will root out logical inconsistencies, 2. Multiple independent specifications that are then checked against each other. I wouldn’t call specification errors “bugs” though, bugs are implementation errors not specification errors.

    2. I think you have to differentiate a little. Reduction of complexity and increasing clarity is helpful, as is axiomatization of math. Only that endeavour made it possible in the first place to determine expressivity and the limits of what can be proven.

      Good language design allows you to focus on the bigger problem, and when it gets out of your way instead of babysitting you, then that’s definitely good.

      Allowing as much control as possible, while also having good defaults (so as to not have to control when it does not matter). This mix is possible, and many languages experiment with finding this sweet spot.

  9. Code from academic papers is almost always bad, no matter the age/experience of the academic, unfortunately. Maybe that’s why it is often hard to get access to the code…

    It seems a large part of academia sucks at explaining certain concepts, and it shows in the code, which is equally unstructured and written to be understood only by the person who wrote it.

    Unfortunately the all too frequent mindset of “left as an exercise to the reader” will extend to writing good code, that is explicit about its concepts and meaning, and has enough documentation (relevant to the problem, not coding patterns).

  11. Reminds me of the debugging heap I created for one C project, which allocated each block with (adjustable size) signatures before and after it to detect simple out-of-bounds writes at runtime. Burned extra memory and a few cycles, but caught some coding errors we would have had a hard time resolving otherwise.

    Some of that was merged into the similar option in IBM’s C compiler for OS/2.
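    Something in the spirit of that debugging heap can still be sketched in a few lines (sizes, patterns, and names here are arbitrary, and alignment of the returned pointer is glossed over): pad each allocation with known guard bytes and verify them on free.

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    #define GUARD_SIZE 16
    #define GUARD_BYTE 0xA5

    /* Layout: [size_t size][front guard][user data][back guard] */
    void *dbg_malloc(size_t n)
    {
        unsigned char *raw = malloc(sizeof(size_t) + 2 * GUARD_SIZE + n);
        if (!raw)
            return NULL;
        memcpy(raw, &n, sizeof(size_t));
        memset(raw + sizeof(size_t), GUARD_BYTE, GUARD_SIZE);
        memset(raw + sizeof(size_t) + GUARD_SIZE + n, GUARD_BYTE, GUARD_SIZE);
        return raw + sizeof(size_t) + GUARD_SIZE;
    }

    /* Check both guard bands before releasing the block. */
    void dbg_free(void *p)
    {
        unsigned char *user = p;
        unsigned char *raw = user - GUARD_SIZE - sizeof(size_t);
        size_t n;
        memcpy(&n, raw, sizeof(size_t));
        for (size_t i = 0; i < GUARD_SIZE; i++) {
            assert(raw[sizeof(size_t) + i] == GUARD_BYTE); /* write before the block */
            assert(user[n + i] == GUARD_BYTE);             /* write past the block  */
        }
        free(raw);
    }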
