Linux Fu: Shell Scripts In C, C++, And Others

At first glance, it might not seem to make sense to write shell scripts in C/C++. After all, the whole point of a shell script is to knock out something quick and dirty. However, there are cases where you might want to write a quick C program to do something that would be hard to do in a traditional scripting language. Perhaps you have a library that makes the job easier, or maybe you just know C and can knock it out faster.

While it is true that C generates executables, so there’s no need for a script, the setup to build an executable is usually not what you want to spend your time on when you are just trying to get something done. In addition, scripts are largely portable. Sending an executable to someone else is fairly risky, but you’re in luck because C shell scripts can be shared as… well, as scripts. One option is to use a C interpreter like Cling. This is especially common when you are using something like a Jupyter notebook. However, it is another piece of software you need on the user’s system. It would be nice to not depend on anything other than the system C compiler, which is most likely gcc.

Luckily, there are a few ways to do this and none of them are especially hard. Even if you don’t want to actually script in C, understanding how to get there can be illustrative.

The Whole Shebang

I’m going to assume your shell is Bash. There may be subtle differences between shells, but shells will typically support a way to launch scripts known as the shebang — it’s the use of the hash and exclamation characters (#!) you’ve probably seen at the top of scripts.

When Bash sees you are trying to execute a file, it tries to figure out what kind of file it is using a magic number lookup the way file does. The file command actually uses a library called “magic” to do this and you can run man magic to see a database of sorts that is at work. In theory, there’s a text representation and a compiled version, but many common distributions don’t install the source by default. Regardless, the database looks for certain magic numbers in files to determine their type — programs don’t need to rely on file extensions, for example.

The exact format isn’t important, but a typical entry has an offset to look inside the file and a number or pattern to match. In the case of a shell script the magic number is 0x23 0x21 which is, of course, #!. In particular, system calls that execute something can tell the difference between a shell script and just a random text file.
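As a rough illustration of that magic number test, here is a minimal sketch that checks whether a file starts with the bytes 0x23 0x21. The function name has_shebang is made up for this example, and this is only a userland imitation of the idea, not the code the system actually runs:

```shell
# Succeeds if the first two bytes of the file are 0x23 0x21 ("#!").
# has_shebang is a hypothetical helper name used only for this sketch.
has_shebang() {
    # od prints the first two bytes in hex; tr strips spaces and newlines.
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "2321" ]
}
```

You could use it as has_shebang ./myscript && echo "looks like a script".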

Normally, you’ll see something like #!/usr/bin/bash which causes the file to run as a Bash script. Of course, this hardcodes the location of the system copy of bash. Some argue this is good because you think you have a chance at getting a known copy of Bash. Others argue that if you have an upgraded copy of Bash in your personal directories it won’t use that. If you agree with the latter group, you can try #!/usr/bin/env bash — that still hardcodes a path, but that executable only sets up the environment.

The interpreter, though, doesn’t have to be Bash or even a proper shell. For example, an Awk program might have #!/usr/bin/awk -f as a first line. So one strategy would be to build a script that can “launch” the underlying C “script.”
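To make the structure of that first line concrete, here is a small sketch that splits a shebang line into the interpreter path and whatever follows it. The name shebang_parts is an assumption for this example. It treats everything after the interpreter as a single argument string, which matches how Linux hands it over, though other systems may differ:

```shell
# Print the interpreter and the argument string from a script's first line.
# shebang_parts is a hypothetical name; the body runs in a subshell so its
# variables do not leak into the caller.
shebang_parts() (
    IFS= read -r first_line < "$1"
    case $first_line in
        '#!'*) ;;            # looks like a shebang, keep going
        *) return 1 ;;       # anything else is not a script header
    esac
    # Drop the two leading characters, then let read split off the first word.
    read -r interp arg <<EOF
${first_line#??}
EOF
    printf 'interpreter=%s\n' "$interp"
    printf 'argument=%s\n' "$arg"
)
```

For a file starting with #!/usr/bin/awk -f, this prints the interpreter as /usr/bin/awk and the argument as -f.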

That’s one approach, but I took a different one. My original thought was that since #! looks like a preprocessor statement, a script file might be directly usable to the C compiler. That might have been true in the past, but a modern preprocessor throws an error when it sees something it doesn’t expect.

Marking C Files as Bash Scripts

I wanted to keep things simple. The following lines at the very front of a stand-alone C file are enough to make things work:

#!/usr/bin/env bash
#if 0
source cscript_simplec
#endif

The first line tells the system that this is a Bash script. You might be wondering why I would mark it as a Bash script when I’m trying to get to C. Well, the very next few lines are a Bash script. The #if and #endif statements are just comments to Bash. And the source command tells the shell to read cscript_simplec from somewhere on the directory path.

That source never comes back, so what’s after it doesn’t matter to Bash. However, the file will be passed to gcc if the executable is out of date. Suppose this file is example.c. There will be an executable example.c.bin in the same directory. (This implies that the first person to run the script needs to have write permission to the directory.)

If the binary is newer than the source file, we simply run it using exec. This causes the program to overlay the current copy of Bash which saves a little memory compared to just running the new program. However, if the source is newer, the script rebuilds the binary first.
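The freshness decision can be sketched on its own. The helper name needs_rebuild is invented for this example. Bash's -nt test is documented as also being true when the second file does not exist, but the sketch spells that case out explicitly in case another shell behaves differently:

```shell
# Succeeds when the binary is missing or older than the source file.
# needs_rebuild is a hypothetical helper; $1 is the source, $2 the binary.
needs_rebuild() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}
```

A launcher could then write something like: if needs_rebuild "$0" "$0.bin"; then compile; fi before the exec.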

There’s a slight problem. Although most of the file will be legal C, the first line isn’t. Yet that line is crucial for the startup. The answer is to cut that line off. Here’s what cscript_simplec looks like:

if [ "$0" -nt "$0.bin" ]
then
  CCOPTS="${CCOPTS:--O3}"
  if ! tail -n +2 "$0" | gcc -x c "${CCOPTS}" -o "$0.bin" -
  then
    echo "Compile Error on $0" >&2
    exit 1
  fi
fi
exec "$0.bin" "$@"

The tail command on line 4 cuts off the first line and feeds the rest to gcc through the pipe. Because there’s no file name, we have to tell gcc that it is reading a C file (the -x option). You can set CCOPTS or it will default to -O3. The final command on line 10 runs the resulting binary.
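If the two shell idioms in that script are unfamiliar, this sketch isolates them. Both function names are made up for illustration: the first shows what tail -n +2 hands to the compiler, and the second shows how the ${CCOPTS:--O3} expansion picks a default:

```shell
# Print everything except the first line, which is what the pipe into gcc sees.
# strip_first_line is a hypothetical name for this sketch.
strip_first_line() {
    tail -n +2 "$1"
}

# Print the options that would be used: the caller's CCOPTS if it is set and
# non-empty, otherwise -O3. The first "-" belongs to the ":-" operator and
# the second is part of the default value.
effective_ccopts() {
    printf '%s\n' "${CCOPTS:--O3}"
}
```

Running CCOPTS="-O0 -g" effective_ccopts prints the override, while running it with CCOPTS unset prints -O3.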

Of course, if you were going to send this out into the wild, you might want to just include this whole chunk, or something similar, in the script and forgo the source command. That would work.
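One way that self-contained variant might look is sketched here. To keep the example easy to try, it is wrapped in a helper (write_c_script, a name invented for this sketch) that writes the file and marks it executable. The embedded C program and the fixed -O3 option are assumptions, not the author's code:

```shell
# Write a self-contained C "script" to the given path and make it executable.
# write_c_script and the file contents are illustrative assumptions.
write_c_script() {
    cat > "$1" <<'EOF'
#!/usr/bin/env bash
#if 0
if [ "$0" -nt "$0.bin" ]; then
  tail -n +2 "$0" | gcc -x c -O3 -o "$0.bin" - || { echo "Compile error on $0" >&2; exit 1; }
fi
exec "$0.bin" "$@"
#endif
#include <stdio.h>

int main(void)
{
    puts("hello from a C script");
    return 0;
}
EOF
    chmod +x "$1"
}
```

After write_c_script ./hello.c, running ./hello.c should compile on the first run if gcc is installed, leaving hello.c.bin next to the source.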

Complexity

It’s easy to change the code for something different like C++. Since this is scripting, it is pretty safe to assume there is one file and the executable is directly dependent on only the source file. However, if you want a bit more complexity (some would argue too much for a simple script), you can turn to make.

Replace cscript_simplec with cscript_make if you want to try that. You’ll have to provide a makefile, too (example.c.make in this case). A suitable one is:

$(SCRIPT_OUT_NAME): $(SCRIPT_NAME)
	gcc -x c $(SCRIPT_NAME) -o $(SCRIPT_OUT_NAME)

Note you have to use $(SCRIPT_NAME) for the source file and $(SCRIPT_OUT_NAME) for the executable. This is a silly example, of course, but you could create a complex set of dependencies and compile options using a makefile. On the other hand, this seems to violate the simple principle, so you are probably better off just writing a normal C program at that point.

If you really need a high-level scripting language, you might consider Python or one of the many other interpreted languages available. However, understanding the mechanism and how to subvert the C compiler might still come in handy someday. After all, you can pull some ugly/beautiful hacks with the preprocessor and compiler.

44 thoughts on “Linux Fu: Shell Scripts In C, C++, And Others”

  1. “In addition, scripts are largely portable”.

    Actually, any computer that has bash or csh or ksh, or any other sh, is more than likely to have a c compiler installed, so code written for that is every bit as portable as any kind of shell script.

    1. macOS is the single most popular UNIX out there and it does not come with a c compiler installed.

      Neither RedHat, Ubuntu nor CentOS includes a c compiler in the default desktop installation. Ask ANYONE who uses VMware: you will hit this issue immediately after you create a new VM and you have to compile vmtools.

      Solaris and AIX are what many people consider to be “Real” UNIX and neither of them comes with a c compiler.

      The default install of Cygwin does not include a c compiler.

      The reality of the situation is unless it was installed by the user, in general there is no c compiler on a computer.

  2. “When Bash sees you are trying to execute a file, it tries to figure out what kind of file it is using a magic number lookup the way file does. ”
    I am not sure why this is there, because that is not how it works, or if it does in some special cases, I’m not sure how relevant it is here. However, then there is “In particular, system calls that execute something can tell the difference between a shell script and just a random text file.” which hints that the author probably knows how it works. It is a feature of the Linux kernel directly. It will run the executable named after the shebang and pass the file to it. If it were a feature of the shell, it would not work when you use the exec system call directly on an executable script from your compiled binary. Or the kernel would always need to spawn an extra /bin/sh just to let it parse the first shebang line and exec it, which would be a waste of time. And most probably bash also calls exec directly on such a script executable without checking what is inside via magic.

    1. Yes exec does the work but from the user’s point of view the shell does the right thing. Just like if you were talking about redirection. Sure the library does the file writing through syscalls. But it’s still the shell that makes the call to begin with.

      1. I have to agree with @fanoush here, the wording makes it seem like the shell is treating a script differently than any other executable. (To the point that I had to go check in the source myself to be sure it was not the case.)
        And since this series is intended for beginners, it might be a good idea to be as clear as possible. Simply inserting “[… execute a file, it] passes it to the kernel which tries to [figure out what…]” would be enough to not have it be ambiguous anymore. ¯\_(ツ)_/¯

        1. Well, but if you read the source you must have seen that it CAN interpret the #! syntax itself depending on how it is built. This allows it to support an OS that doesn’t do it, although of course Linux does. However, I still stand by my statement that if you want to be consistent, then by your clarity rules, vi doesn’t save files. It calls the operating system which then writes the file. That’s actually technically true, but no one thinks that is more clear.

          Even your example is really not necessary for someone to know who is trying to learn bash and — again, go read execute_cmd.c in the bash source tree — could actually be incorrect.

          Virtually all programs use some kind of service from somewhere. No one says, “When you press save, the program calls the C library which then uses syscalls to open the file, write to it, and close it.” You say “the program saves the file” and unless you are talking about the internal architecture of the program, you don’t really care who is really doing the work. It might even be some XML export library. Don’t know and don’t care.

          But if you write your blog posts about bash internals, then yes, you’d be correct about the level of detail, but incorrect that bash will never interpret the #! by itself.

      2. The point is, the shell does nothing more than any other program calling exec(2). It is the kernel with its binfmt subsystem (binfmt_script.c in this particular case) that does the job of recognizing the file format.

        Very interesting is the binfmt_misc module, which enables the kernel to support any executable format, provided a userland “interpreter” for it exists. This is how QEMU can be used to transparently launch non-native ELF binaries.

          1. Oh, OK, never mind. So you knowingly simplified it for the target audience. Or even knowingly misrepresented it so you could write about the file command and the magic library as if the shell did it in a similar way. Indeed, the shell could do it like that by itself without any kernel help, but it does not.

          2. Apart from the shell executable itself this shebang parser could be also implemented in libc exec syscall wrapper. I think some other unix kernels did not handle it so it was done like that there. That’s why I thought it matters to explain how it works in linux.

          3. Also, in at least some cases, bash DOES do this… look at file execute_cmd.c around line 5500.

            In particular:

            /* If the operating system on which we’re running does not handle
            the #! executable format, then help out. SAMPLE is the text read
            from the file, SAMPLE_LEN characters. COMMAND is the name of
            the script; it and ARGS, the arguments given by the user, will
            become arguments to the specified interpreter. ENV is the environment
            to pass to the interpreter.

            The word immediately following the #! is the interpreter to execute.
            A single argument to the interpreter is allowed. */

          4. Indeed bash can help. Thank you for finding the code. On the other hand, it is worth noting that the HAVE_HASH_BANG_EXEC conditional appeared in bash ⩽ 2.0.

            As far as your vi example is concerned, indeed, the amount of code required for the data to take their place on a physical medium after vi calls write(2) (file system, block layer, device drivers) may be comparable with the whole vi (even vim) code base. I’d say that vi “saves files”, but it definitely does not write data to a storage device.

            Thank you for this interesting post.

  3. Line 3 of cscript_simplec should look like this
    CCOPTS="${CCOPTS:--O3}"
    Please note the double minus sign before O3. This is needed because the first minus sign is part of the shell “Default values” command and is removed from CCOPTS

      1. I’m still thinking it through. I guess it’d probably require tossing the binary into a temp file. If you wanted to be efficient for subsequent runs maybe a system to keep the temp file around. Maybe name it by the md5sum of the script and go looking for it before compiling? So there might be even more shell infrastructure at the top of your nominally C program.

  4. Oh, come on, that’s not a hack ;-) … here you go:

    #!/bin/sh
    aout=$(mktemp)
    gcc -o "${aout}" -xc - <<EOF
    #include <stdio.h>
    int main(int argc, char* argv[])
    {
    printf("hi mom\n");
    }
    EOF
    "${aout}"
    rm "${aout}"

    1. Indeed. Why use something else if we have tcc?

      Fabrice Bellard is definitely one of the greatest hackers of the 21st century. He also wrote QuickJS, which could be used to write shell scripts in JS, if somebody is crazy enough to want it.

      1. I am a fan of Bellard myself. Last time I looked at tcc I remembered it having some limitations. Of course, it doesn’t do C++ so as a technique it isn’t very general. Bellard is frighteningly prolific and has some really interesting stuff like the Javascript PC emulator.

  5. If I had a personal version of Bash… I’d rename the main file to: /usr/bin/bash.old

    Then I’d create a soft link that points to my new version: /usr/bin/bash -> /usr/users/odd_stuff/shells/unabashed

  6. I believe the following sentence is wrong

    The final command on line 10 cuts off the first line and feeds gcc through the pipe

    It is the tail command on line 4 that strips the first line of the bash script. On line 10 the binary created by gcc is executed.

  7. You can make your own shebang commands, and avoid all that boilerplate in the C “script”. Unfortunately, I don’t know if #! can point to another script, so you might have to write your c-jane-run program in a compiled language. But the advantage would be that you’d only need the one line:

    #! /usr/bin/run-c

    1. You could make it extra fancy, and have it provide a main() wrapper and basic #includes, and also call the appropriate package manager to install libs. So you could do something like this:

      #! /usr/bin/run-c libx
      xwindows_dialog(X_ALERT, "Hello whirled!");

      Yes, I love volunteering other people’s time. ;)

  8. I can’t get it to install. Tried for hours, followed all the (poorly written) instructions, nothing. It takes it a while, but at the end it is still unclear if “cling” exists. I see no files, nothing has changed.

    Not ready for prime-time.

  9. I know it’s not the same as doing it with no extra tools needed but I added this functionality to my appbuild tool. Running c/c++ files as executables is very nice for an old schooly like me. :)

  10. I’m still not getting how this is at all helpful. If I understand it correctly, it still takes a minimum of two shell commands to : 1) invoke an editor to enter the C/C++ code, 2) invoke the script. And that’s assuming you have a generic script file that you don’t have to edit.

    If you include execution of the program in a Makefile, it still takes two shell commands: 1) invoke an editor to enter the C/C++ code, 2) “make”. And the Makefile in this case can be as simple as:

    all: foo
    	./foo

    foo: foo.c
    	cc foo.c -o foo

    I don’t know why people think it’s such a chore to build a C program. You can create a generic Makefile that does the whole build of all .c or .cpp files into an executable, and never have to think about it again. In fact, if I understand it correctly, if you type “make” in a directory that contains a .c file, you don’t even have to use a Makefile – “make” all by itself will compile all .c, .cpp, and other source file extensions gcc recognizes for the specified executable. So if you type “make foo”, it will look for foo.c, foo.cpp, foo.whatever, and create an executable called “foo”. So if you’re looking to do a quick-and-dirty program, you just create the source file (say, foo.c), then “make foo”, then “./foo”. In most *nixes, it will even automagically search for header files in /usr/local/include and /usr/include, and link to libraries in /usr/lib and /usr/local/lib, without any further attention. To me, this is easier than getting python to find libraries.

    I stopped using python anyway, when I discovered that it wasn’t any better at helping me figure out what my mistakes were than gcc was, and I got code that was way faster. As for shell scripts, I still use those for tasks that can be broken down into sequences of commands, but once I go one step off the beaten path, it’s still easier to write “script” in C that runs a sequence of commands (using exec()) than to try to figure out the weird parsing that bash does.

  11. I do remember that I had hacked up something like that a while ago, which fit into a single hashbang line:
    `#!/usr/bin/awk !/^#!/ { print >> ".t.c" } END { system("/usr/bin/gcc -L/usr/X11R6/lib -lX11 -o xtest .t.c ; rm .t.c ; ./xtest") }`
