Coffee With Kernighan

There was an interesting tidbit buried in a Computerphile video released last week (below the break), featuring professors [David Brailsford] and [Brian Kernighan] having a chat over coffee. Among other topics, they discuss the history and current state of various text processing tools. We learn that [Kernighan] has taken on a summer project of updating the AWK text processing language to handle UTF-8 text, an omission he admits is embarrassing in this day and age. He is also working on a second edition of The AWK Programming Language book, which hasn’t been updated since being first released in 1988.

[Brian Kernighan] is a legend in the world of Unix and computing, having worked at Bell Labs during the 70s, when Unix and C were developed there. Among the many accomplishments in his career, he is well-known as the co-author with [Dennis Ritchie] of The C Programming Language, first published in 1978 and still being used decades later, as a co-creator of AWK mentioned above, and for major updates to troff. More recently, he co-authored The Go Programming Language book in 2015.

If an updated UTF-8-capable AWK interests you, keep an eye on the AWK GitHub repository where [Kernighan] anticipates an update, once he wraps his head around git a little better. We’re happy to see [Brian] so active at 80 years old. If you want to learn more about those early days at Bell Labs, we reviewed [Kernighan]’s very interesting UNIX: A History and a Memoir a couple of years ago.

29 thoughts on “Coffee With Kernighan”

        1. Nice analogy. I agree.

          But someone will come along and say “yeah but no one in the construction business uses hammers any more, they just use modern nail guns…. to blow a hole through their foot now, instead of just a thumb bruise … :) .”

        2. I suspect it is mostly about how indirection, pointers, and arrays are handled. It is pretty sucky. There are some mistakes you can make that require a deep understanding to figure out. There is a lot of good info in “Expert C Programming: Deep C Secrets” by Peter van der Linden, who was at Sun when he wrote it.

          1. the one thing i can say about C is that nothing requires a deep understanding to figure out. you do have to understand the whole thing, but the whole thing is shallow.

            C++ is the language where you have to understand a deep thing. ugh. the need to understand it is so urgent, and its depth is so staggering.

            i agree there are benefits to languages like java where there are deep things but a shallow understanding usually gets the job done because the language does a good job of isolating you from them.

          2. But if you were writing in assembly language, everything would have to be worried about.

            If you see C as high level, you have more expectations than if you see it as just a step above assembler.

          3. I agree it’s about pointers, arrays and the declaration syntax. As far as efficiency goes, modern compilers are so good even Lisp can run efficiently. At a more general level the argument is C doesn’t have higher level abstractions and doesn’t have garbage collection, so programmers spend too much time housekeeping and dealing with memory leaks. Now given that C came out before garbage collection algorithms had matured you can give C a pass on that, but ALGOL had come out before C and it looked like nothing from ALGOL syntax made it into C.

        1. I use AWK every single day and have for decades. As a system engineer and architect it is an invaluable tool. As I have moved into public cloud and Kubernetes over the last 5-10 years, it continues to prove its worth.

          1. yeah i personally can’t stand awk but if you already know it why wouldn’t you use it? i mean, i imagine a lot of people can’t stand perl. one thing is for certain, i will be using *something* for the purpose of trivial manipulation of text datasets in pipes. whether it’s perl or awk or just stringing together more primitive tools like sed, grep, sort, uniq

        2. Learning awk is trivial. It’s like grep but with multiple choice. It’s not for programs. It’s like sed but with more options. I was skimming web pages for COVID data so learned awk in a few hours and had awk scripts I’m still using to extract data from web pages. It was faster than writing a Python or Perl script. Definitely worth learning the basics to write little data extraction utilities.

  1. Wonderful post, thanks Chris. Today is a good day :-)
    The first decade of my programming career was in a COBOL environment. I’d moved around a little in the mainframe industry but was looking for something.. something.. well something else.
    My brother dropped an already yellowed copy of Kernighan and Ritchie on my desk and the change was begun. A year later an altogether heftier Bjarne Stroustrup landed on the same desk and the conversion was complete. Now both books are in my daughter’s library who is older than I was when I first received them. I’ve never actually read programming books since (don’t need to with the internet etc) but there is a certain nostalgia about learning a skill from a well written technical book.

  2. While there are people who think that programming is low labor and who are competing to autoassign the most high-sounding and buzzword-filled title, here is a man who has set up modern computing and who commits to git at 80. Maximum respect.

  3. Weapon testers at Sandia National Laboratories were ~exclusively written in BASIC using standalone computers. HP 9800 series were a favorite. The IBM 5100 and Wang 2200 were also BASIC standalone computers … as opposed to mainframe hardware technology.

    Bomb fuzing software was written in assembler.

    Sandia labs initiated building a rad hard 8085 and support chips. Project failed … at cost of hundreds? of million $s.

    As project leader of the NSA-sponsored Missile Secure Cryptographic Unit which used 8085 parts, we decided to use 8080 fig-Forth and metacompiler software [opposed to assembler] technology on advice of Ray Duncan of Laboratory Microsystems.

    Project complete way ahead of schedule.

    Sandia decided to build a rad hard 8051 because it obviated need for so many support chips.

    I was assigned to lead porting 8085 fig-Forth to the 8051. Nautilus metacompiler author Jerry Boutelle was hired to oversee the port … including selecting other team members.

    Success. This resulted in a Sandia-approved book.

    Late 80s, early 90s saw invasion of c/c++ programmers. BASIC was out. Forth too.

    1 Buggy 2 malware c/c++ projects are appearing. Windows [update], LinuxS [sudo apt-get update], … Hacker exploits.

    I am one of three authors of the Boeing 767 Software Certification Policy, 1980.

    Boeing hardware engineers’ software standards required that no module exceed one page of code.

    Reasons: 1 Locate module[s] with issues 2 Enable competent programmers to make fixes.

    After the c/c++ industry takeover in ~1991 is this practice still used?

    Linux and mainframe hardware technologies reportedly used in Boeing 737 MAX avionics software.

    FAA/NTSB appear involved in a cover-up of the MAX hardware/software technologies.

    Time to do something about this?

    1. Whose metacompiler did you use? Tom Wempe had one with Mountain View Press for 8088/8086 and 6502. It was called……PADS. Professional Application Development System? I have the binder in storage. (I was at Boeing when 7N7 started and all the new PDP-11/70s were installed for each group.)

  4. Awk is designed to just do what you want with minimal effort when processing text data using pipelines. Pipes are one of the most useful innovations in Unix. If you are wrangling ASCII files or doing pipeline verbs for signal processing, awk is invaluable. It handles fields and pattern matching automatically. I write most of my DSP code first in awk with ASCII and then convert to C when it is finished. Many tools can stay in awk forever.
