Linux Fu: Mixing Bash And Python

Although bash scripts are regularly maligned, they do have a certain simplicity and ease of creation that makes them hard to resist. But sometimes you really need to do some heavy lifting in another language. I’ll talk about Python, but actually, you can use many different languages with this technique, although you might need a little adaptation, depending on your language of choice.

Of course, you don’t have to do anything special to call another program from a bash script. After all, that’s what it’s mainly used for: calling other programs. However, it isn’t very handy to have your script spread out over multiple files. They can get out of sync, and if you want to send the script to someone or move it to another machine, you have to remember everything that goes with it. It is nicer to have everything in one file.

The Documents are All Here

The key is to use the often forgotten here document feature of bash. This works best when the program in question is an interpreter like Python.


#!/bin/bash
echo Welcome to our shell script

python3 <<__EOF_PYTHON_SCRIPT
print('Howdy from Python!')
__EOF_PYTHON_SCRIPT

echo "And we are back!"
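As written, the here document is subject to normal shell expansion, which you can use to pass data into the embedded program; quoting the delimiter turns expansion off. A minimal sketch, assuming python3 is on the PATH:

```shell
#!/bin/bash
# Unquoted delimiter: bash expands $NAME inside the here document
# before Python ever sees the text.
export NAME="world"

python3 <<__EOF_PYTHON_SCRIPT
print("Hello, $NAME!")
__EOF_PYTHON_SCRIPT

# Quoted delimiter: no expansion, so the embedded code reaches Python
# untouched; data can still travel through the environment instead.
python3 <<'__EOF_PYTHON_SCRIPT'
import os
print("Shell said:", os.environ.get("NAME", "nothing"))
__EOF_PYTHON_SCRIPT
```

Quoting the delimiter is the safer default when the embedded language uses `$` for its own purposes.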

Variations on a Theme

Of course, any shell that supports here documents can do this, not just bash. You can use other interpreters, too. For example:

#!/bin/bash
echo Welcome to our shell script

perl <<__EOF_PERL_SCRIPT
print "Howdy from Perl\n"
__EOF_PERL_SCRIPT

echo "And we are back!"

It might be confusing, but you could even have some sections in Python and others in Perl or another language. Try not to get carried away.
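The embedded sections don’t have to be fire-and-forget, either. Command substitution works on a here document just like on any other command, so bash can capture what one interpreter prints and hand it to a later section. A sketch, assuming python3 is available (the perl step is guarded in case it is not):

```shell
#!/bin/bash
# Capture the embedded Python's output with ordinary command substitution.
SUM=$(python3 <<'__EOF_PY'
print(sum(range(1, 11)))
__EOF_PY
)
echo "Python computed: $SUM"

# A later section in another language can pick the value up from the
# environment; perl is used here purely as an illustration.
export SUM
if command -v perl >/dev/null; then
perl <<'__EOF_PERL'
print "Perl sees: $ENV{SUM}\n";
__EOF_PERL
fi
```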

Just in Case

There’s the rare case where an interpreter doesn’t take the interpreted file from the standard input. Awk is a notorious offender. While you can embed the entire script in the command line, that can be awkward and leads to quoting hassles. But it seems silly to write out a temporary file just to send to awk. Luckily, you don’t have to.
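One related trick worth noting: a quoted here document can load the awk program into a shell variable, which then goes to awk as its first argument. Quoting the delimiter keeps bash away from awk’s own $1, $2, and friends. A sketch:

```shell
#!/bin/bash
# Read the here document into a variable; read -d '' returns nonzero
# when it hits end-of-input, hence the || true.
read -r -d '' AWK_PROG <<'EOF_AWK' || true
{ wcount += NF }
END { print "Words=" wcount }
EOF_AWK

echo "one two three" | awk "$AWK_PROG"
```

This sidesteps the quoting hassles of writing the program directly on the command line.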

There are at least two common ways to handle this problem. The first is to use the bash process substitution. This basically creates a temporary file from a subshell’s standard output:

#!/bin/bash
# Hybrid bash/awk Word count program -- totally unnecessary, of course...

echo Counting words for "$@"
awk -f <( cat - <<EOF_AWK
    
    BEGIN {
        wcount=0;
        lcount=0;
    }
    
    {
        lcount++;
        wcount+=NF;
    }
    
    END {
        print "Lines=" lcount;
        print "Words=" wcount;
    }
    
EOF_AWK
) "$@"

echo Hope that was enough
exit 0

Yet Another Way

There is another way to organize your process substitutions, so they are all gathered together at the end of the script surrounded by a marker such as “AWK_START” and “AWK_END” or any other pair of strings you like. The idea is to put each pseudo file in its own section at the end of the script. You can then use any number of techniques like sed or awk to strip those lines out and process substitute them like before.

There are two minor problems. First, the script needs to exit before the fake files start. That’s easy. You just have to make sure to code an exit at the end of the script, which you probably ought to do anyway. The other problem is searching for the marker text. If you search the file for, say, AWK_START, you need to make sure the search pattern itself isn’t found. You can fix this by using some arbitrary brackets in the search string or breaking up the search string. Consider this:

#!/bin/bash
# Hybrid bash/awk Word count program -- totally unnecessary, of course...

echo Counting words for "$@"
# use brackets
#awk -f <( sed -e '/[A]WK_START/,/[A]WK_END/!d' "$0" ) "$@"
# or break up the search string
AWK_PREFIX=AWK
awk -f <( sed -e "/${AWK_PREFIX}_START/,/${AWK_PREFIX}_END/!d" "$0" ) "$@"

echo Hope that was enough
exit 0

# everything below here will be the supporting "files", in this case, just one for awk

# AWK_START

BEGIN {
    wcount=0;
    lcount=0;
}

{
    lcount++;
    wcount+=NF;
}

END {
    print "Lines=" lcount;
    print "Words=" wcount;
}

# AWK_END

There is no reason you could not have multiple fake files at the end, each with a different pair of markers. Do note, though, that the markers are sent to the program which is why they appear as comments. If these were going to a program that didn’t use # as a comment marker, you’d need to change the marker lines a bit, write a more complex sed expression, or add some commands to take off the first and last lines before sending it.
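For instance, to carry a trailer in a language with no # comments at all, a second sed pass can trim the marker lines off before the text reaches its interpreter. The sketch below builds a tiny demonstration script in a temporary file, so it is self-contained, and embeds a JSON fragment; JSON and `python3 -m json.tool` are stand-ins chosen for illustration:

```shell
#!/bin/bash
# Build a self-contained demo script whose trailer is a JSON "file".
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

cat > "$tmp" <<'DEMO'
#!/bin/bash
# First sed keeps the marked region; second trims the marker lines
# themselves, since JSON has no comment syntax to hide them behind.
sed -n '/[J]SON_START/,/[J]SON_END/p' "$0" | sed '1d;$d' | python3 -m json.tool
exit 0
# JSON_START
{"lines": 0, "words": 0}
# JSON_END
DEMO

RESULT=$(bash "$tmp")
echo "$RESULT"
```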

That’s a Wrap

You could argue that you can do all you need to do in one language and that’s almost certainly true. But having some tricks to embed multiple files inside a file can make creating and distributing scripts easier. This is somewhat similar to how we made self-installing archive files in a previous installment of Linux Fu. If you’d rather script in C or C++, you can do that too.

24 thoughts on “Linux Fu: Mixing Bash And Python”

    1. In fact, both Python2 and Python3 support the print(stuff) syntax, and contrary to the popular belief, you don’t need the “from __future__ import print_function” thing unless you want to print from inside lambdas. Generally, there’s no reason to use “print stuff”, no matter which Python version you’re using. I say, adjust all the snippets in the article to “print(stuff)” so they’re both 2- and 3-compatible =)

    2. I do this fairly often, but with PowerShell instead of Python on Linux. If I am being honest, I even find myself reaching for it more than bash now for a lot of things, since I can share the script between all my PCs and VMs, it being cross-platform. So really, it ends up being the opposite: I have PowerShell scripts, then embed bash into them when I absolutely have to.

  1. Stealing from the Chrome/IE meme: what do you do with bash? Use it to install python.

    Considering how apprehensive people can be about learning all the ins and outs of not just bash or any other shell but all of the common applications like awk, sed, grep, etc., it makes sense that we’re likely to see more people shift to loading up a bloated Python script with 100 linked libraries to do what used to be done on a single piped line. I would be interested to see what performance hit is taken when using a well-scripted Python program over a well-scripted bash one.

  2. I love bash scripting, especially when I can abuse the history command to generate them. Like if I need to fix a bunch of machines at once, I’ll just do a test system, then once I have the commands, I just dump my session history into a script file, strip the first few characters from the beginning of each line, then toss a shebang at the top, and we’re golden.

    And then once I have the script written, I copy it into some Busybox source I have, embed it into the binary, then compile it statically linked and preferring internal commands over external, and now I have an executable file that can be distributed with zero worry over whether all the dependencies exist on the system, as this method just requires a functional kernel and some method to kick it off. I’ve taken it even further and just specified that script as the kernel’s init, and then booted it over PXE. Nothing is more satisfying than being able to run a script without even needing an operating system on the device, much less any dependencies.

      1. Maybe I am stretching things a bit, but what is happening is that I am running the script without there being an OS installed on the system itself, and what is booted from the network is too minimal to fit any modern idea of what an OS is. It lacks any sort of interactive session, the kernel is compiled without support for any form of tty (so no way for a user or other system to interact with it once the kernel starts).

        How I am doing this is custom-building a Linux kernel with “CONFIG_DEFAULT_INIT” set to the path of my script-binary hybrid (which lives on a read-only NFS share). The kernel is compiled with an incredibly minimal configuration; pretty much the only device drivers included are the network cards, basic TCP/IP support (with NFS_ROOT support enabled), and NFS.

        The script is embedded into busybox, which is compiled with:
        CONFIG_FEATURE_PREFER_APPLETS=y
        CONFIG_STATIC=y
        CONFIG_FEATURE_SH_EMBEDDED_SCRIPTS=y
        and the various applets included for the script to run (like grep, chmod, mount, etc).

        These are then copied to a machine running busybox’s dhcpd and an NFS share that contains the kernel and the busybox binary with the script embedded.

        So the end machine will work as follows:
        1) Machine boots up
        2) iPXE ROM searches for a DHCP server
        3) server provides IP address, and some options to tell iPXE where to find the kernel and the kernel’s commandline options
        4) The kernel boots and sets up a very minimal amount of the system
        5) Kernel mounts the NFS share as its root mount
        6) Kernel then immediately calls my BusyBox-with-embedded-script, and since busybox was called using the name of the script, busybox immediately runs the script.
        7) When the script finishes running, the machine reboots (because from the kernel’s point of view, the init process terminated successfully, which is usually the kernel’s indication that the user requested a reboot).

        1. I think the poster meant they did not understand how a custom initramfs rootfs can be compiled into the kernel, or how most PXE TFTP setups handle the initrd gz files.

          It is very common to have virus scanners (single large “kernel” file), and bootstrap PXE environments that load disk utilities like FOG (initrd based https://fogproject.org )

          The main issue is PXE was often targeted by several bad actors, and the bios/supervisory utilities that replaced it are worse in some ways.
          ;-)

          Cheers,
          J

          1. Ah, yeah, initrd and initramfs can get a bit complicated, but I am using neither here. It’s been a long time since I’ve encountered a reason to actually use them anyway. Anything required to get a filesystem mounted on my systems is compiled directly into the kernel, and I use a recent enough BIOS that the whole disk is accessible through the standard BIOS calls, so no need to cram everything in the first 504 MB of the disk.

            Although even with those limitations, I’ve just gone the way of a very small root partition, and then having /usr, /var, and /lib as mount points to hold the big stuff (I statically link everything I need to get a working system, so I don’t need /lib until it’s time to start service daemons).

  3. So I’ve a bunch of Python scripts set up to tweak things on my laptop: for some reason Linux recognises most of the hardware keys but not the screen brightness keys, it doesn’t toggle the touchpad when a mouse is plugged in, etc. I didn’t know about here documents; I’ll have to try that.

    I know Bash is a pretty fundamental skill, but I already know Python, and as a hobby hacker it’s not something I *need*. When it comes to learning things I’ve much bigger priorities, like learning to not be crap at C. And while Python might be slow compared to other languages, small jobs are still fast enough that the difference is pretty imperceptible.

    1. I’d beg to differ. Learning shell scripting is immensely beneficial when you find some random box with a processor in it and go through the works of getting into an embedded Linux that has nothing more than a busybox environment and some custom apps. Hacking most routers starts this way, and you’d be surprised how many cheap devices have embedded Linux in them, but rarely will these devices have even a partially working Python.

      1. Sure, if you do that kind of thing it is very useful, but for a hobby hacker like me who is more interested in microcontrollers it’s not that helpful. That makes C a far bigger priority for me than Bash.

        As always, horses for courses.

        1. C is certainly more important than Bash, and I would say Python too. Al writes a lot in bash b/c he knows it and likes it.

          I write tons of little utilities in bash, but when they get to be longer than 10-20 lines, there’s probably a better way to be doing it.

  4. Why I love shell scripting over Python/Perl/Ruby/whatever:

    1) No venv (that’s what docker containers are for, innit?)
    2) bash/bourne are ubiquitous
    3) shellcheck for syntax
    4) nothing tries to tell me I can’t use actual, visually-readable tabs instead of spaces
    5) nothing delights me more than condensing some 15K python script with 18 imports down to some conditional logic and a couple of piped $(stuff | stuff) calls.

    to be fair, I do like python’s exception handling.

  5. Thank you for the nice article. I used this possibility to interact with FTP and LFTP as well as sqlite3. I would also like to mention that you can still use variables in such a stdin redirect, e.g.

    # create filelist
    ftp -inv <<END_SCRIPT
    verbose
    open $HOST
    user $USER $PASS
    cd $RCD
    lcd $LCD
    mls ${FILEPATTERN} ${downloadfilelist}
    quit
    END_SCRIPT
