Linux Fu: Shell Script File Embedding

You need to package up a bunch of files, send them somewhere, and do something with them at the destination. It isn’t an uncommon scenario. The obvious answer is to create an archive — a zip or tar file, maybe — and include a shell script that you have to tell the user to run after unpacking.

That may be obvious, but it assumes a lot on the part of the remote user. They need to know how to unpack the file and they also need to know to run your magic script of commands after the unpack. However, you can easily create a shell script that contains a file — even an archive of many files — and then retrieve the file and act on it at run time. This is much simpler from the remote user’s point of view. You get one file, you execute it, and you are done.

In theory, this isn’t that hard to do, but there are a lot of details. Shell scripts are not compiled — at least, not typically — so the shell only reads what it needs to do the work. That means if your script is careful to exit, you can add as much “garbage” to the end of it as you like. The shell will never look at it, so it’s possible to store the payload there.
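
As a quick check of that claim, here is a minimal sketch that writes a throwaway script and runs it. The file name demo.sh and its contents are made up for illustration. The script's last real command is exit 0, and the lines after it would be errors if the shell ever tried to parse them.

```shell
#!/bin/bash
# Build a throwaway script whose body ends with "exit 0", followed by junk.
DEMO_DIR=$(mktemp -d)

cat >"$DEMO_DIR/demo.sh" <<'EOF'
#!/bin/bash
echo "script body ran"
exit 0
this line is not valid shell ((( and is never reached
neither is this one: "unbalanced quote
EOF

# Run it and capture both the output and the exit status.
DEMO_OUTPUT=$(bash "$DEMO_DIR/demo.sh" 2>&1)
DEMO_STATUS=$?

echo "output: $DEMO_OUTPUT"
echo "status: $DEMO_STATUS"

rm -r "$DEMO_DIR"
```

If the shell had looked past the exit, the junk lines would show up as error messages in the captured output.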

So Then What?

The only trick, then, is to find the end of the script and, thus, the start of the payload. Consider this file, deliver.sh:

#!/bin/bash
WORKDIR=$( mktemp -d )

# Find the line number of the first payload line (marker line + 1)

SCRIPT_END=$( awk '
  BEGIN { err=1; } 
  /^[[:space:]]*___END_OF_SHELL_SCRIPT___[[:space:]]*$/ { print NR+1; err=0; exit 0; }
  END { if (err==1) print "?"; }
' "$0" )

# check for error

if [ "$SCRIPT_END" == '?' ]
then
   echo Can\'t find embedded file
   exit 1
fi
# Extract file
tail -n +"$SCRIPT_END" "$0" >"$WORKDIR/testfile"

# Do something with the file
echo Here\'s your file:
cat "$WORKDIR/testfile"
echo Deleting...
rm -r "$WORKDIR"
exit 0
# Here's the end of the script followed by the embedded file
___END_OF_SHELL_SCRIPT___
A man, a plan, a canal, Hackaday!

Not exactly a palindrome, but there's no pleh for it.
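
The marker search does not have to be done in awk. As one alternative sketch (not the method used in the listing above), the hypothetical helper payload_start below uses grep to find the first line that consists solely of the marker. The -a flag is an assumption that your grep is the GNU or BSD one; it keeps grep from treating a file with a binary payload as non-text.

```shell
#!/bin/bash
# payload_start FILE: print the line number of the first payload line,
# i.e. the line after the first line that consists solely of the marker.
# Returns non-zero if the marker is missing.
payload_start() {
    local marker_line
    marker_line=$(grep -a -n -m 1 -x '___END_OF_SHELL_SCRIPT___' "$1" | cut -d: -f1)
    [ -n "$marker_line" ] || return 1
    echo $((marker_line + 1))
}

# Try it on a small sample file with the marker on line 4.
SAMPLE=$(mktemp)
printf '%s\n' '#!/bin/bash' 'echo hi' 'exit 0' \
    '___END_OF_SHELL_SCRIPT___' 'payload line' >"$SAMPLE"
START=$(payload_start "$SAMPLE")
echo "payload starts at line $START"
rm "$SAMPLE"
```

Inside a real script you would call it as payload_start "$0" and bail out if it fails.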

Multiple Files

If you don’t mind transmitting script files full of binary garbage at the end, the recovered file might just as well be a compressed tar file or a zip file. The trick is to create your base script and append the file to it. So I might have deliver.sh0 as the entire file up to and including the ___END_OF_SHELL_SCRIPT___ identifier. Then to create the final script you can say:

cat deliver.sh0 bundle.zip >deliver.sh
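
One detail is easy to miss: the marker line in the base script has to end with a newline, or the first bytes of the payload end up glued onto the marker line. Here is a minimal sketch of a build step that checks for this before concatenating. The function name make_bundle is hypothetical.

```shell
#!/bin/bash
# make_bundle STUB PAYLOAD OUTPUT
# Concatenate the script stub and the payload, then mark the result executable.
make_bundle() {
    local stub=$1 payload=$2 output=$3

    # The marker must be the stub's final line and must end with a newline;
    # otherwise the first payload bytes would be glued onto the marker line.
    if [ "$(tail -n 1 "$stub")" != '___END_OF_SHELL_SCRIPT___' ] ||
       [ -n "$(tail -c 1 "$stub")" ]; then
        echo "make_bundle: $stub must end with the marker line and a newline" >&2
        return 1
    fi

    cat "$stub" "$payload" >"$output" && chmod +x "$output"
}
```

With the files from the text, that would be called as make_bundle deliver.sh0 bundle.zip deliver.sh.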

Encode, Reuse, Recycle

Sometimes you don’t want binary characters cluttering up your shell script. Maybe you want to e-mail the script and you are afraid of what the various mail systems in the path might do to your data. It is easy enough to encode your binary data as text strings (with the associated size penalty, of course). For example, you could just as easily say:

cp deliver.sh0 deliver.sh
base64 bundle.zip >>deliver.sh

To recover the file, you’d need some additional work in the main body of the script, specifically after the tail command.

tail -n +"$SCRIPT_END" "$0" | base64 -d >"$WORKDIR/bundle.zip"
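
If you want to convince yourself that the encoding is lossless, a round trip is easy to test. This is only a sketch: the sample bytes and file names are invented, and it assumes a base64 command that accepts -d for decoding, as the one in GNU coreutils does.

```shell
#!/bin/bash
# Round-trip check: encode a small binary file as base64 text, decode it,
# and confirm the bytes are unchanged.
RT_DIR=$(mktemp -d)

# Sample "binary" payload: a few bytes including NUL and a high byte.
printf 'zip\000data\377\n' >"$RT_DIR/original.bin"

base64 "$RT_DIR/original.bin" >"$RT_DIR/encoded.txt"
base64 -d "$RT_DIR/encoded.txt" >"$RT_DIR/decoded.bin"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$RT_DIR/original.bin" "$RT_DIR/decoded.bin"; then
    RT_RESULT="identical"
else
    RT_RESULT="different"
fi
echo "round trip: $RT_RESULT"
echo "sizes: $(wc -c <"$RT_DIR/original.bin") raw, $(wc -c <"$RT_DIR/encoded.txt") encoded"

rm -r "$RT_DIR"
```

The size penalty comes from base64 turning every three input bytes into four output characters, plus line breaks.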

Of course, you don’t have to store the file. You could just feed it to another program. A script carrying a tar archive, for example, might have the line:

tail -n +"$SCRIPT_END" "$0" | base64 -d | tar xf -
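
Putting the pieces together, here is an end-to-end sketch that builds a self-extracting script with a base64-encoded tar stream as its payload and then runs it. The directory layout, the file greeting.txt, and the idea of passing the destination directory as the first argument are all assumptions made for the example. Note the - after tar xf, which tells tar to read the archive from standard input.

```shell
#!/bin/bash
# End-to-end sketch: wrap a directory in a self-extracting script whose
# payload is a base64-encoded tar stream, then run it.
SFX_DIR=$(mktemp -d)
mkdir "$SFX_DIR/src" "$SFX_DIR/dest"
echo "hello from the archive" >"$SFX_DIR/src/greeting.txt"

# The stub: locate the marker, decode everything after it, unpack into $1.
cat >"$SFX_DIR/unpack.sh" <<'STUB'
#!/bin/bash
SCRIPT_END=$(awk '/^___END_OF_SHELL_SCRIPT___$/ { print NR+1; exit }' "$0")
tail -n +"$SCRIPT_END" "$0" | base64 -d | tar xf - -C "$1"
exit $?
___END_OF_SHELL_SCRIPT___
STUB

# Append the payload: a tar stream of src/, encoded as text.
tar cf - -C "$SFX_DIR/src" . | base64 >>"$SFX_DIR/unpack.sh"

# Run the finished script and look at what it unpacked.
bash "$SFX_DIR/unpack.sh" "$SFX_DIR/dest"
SFX_CONTENT=$(cat "$SFX_DIR/dest/greeting.txt")
echo "$SFX_CONTENT"

rm -r "$SFX_DIR"
```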

Naturally, your script can do whatever you need to do to get ready and then maybe process the files after you unpack. You might, say, install a library or a font or merge a patch to the system’s existing files.

You could even embed an executable file in a script — even another script — and then execute that script which might unpack another script. It boggles the mind. Just remember that not every system will allow executables to reside on /tmp or on some mounted file systems, so plan accordingly.
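
For an embedded script, one way around a no-execute restriction is to hand the file to its interpreter instead of executing it directly, since the interpreter only needs to read it. The sketch below cannot remount /tmp, so it stands in for that situation by clearing the execute bits on the extracted file. That substitution is an assumption, as is the file name inner.sh.

```shell
#!/bin/bash
# Sketch: run an extracted script via its interpreter instead of executing it.
NX_DIR=$(mktemp -d)
printf '%s\n' '#!/bin/bash' 'echo "inner script ran"' >"$NX_DIR/inner.sh"
chmod a-x "$NX_DIR/inner.sh"   # stand-in for a noexec mount (assumption)

# Direct execution is refused because the file may not be exec'd...
if "$NX_DIR/inner.sh" 2>/dev/null; then
    NX_DIRECT="ran"
else
    NX_DIRECT="refused"
fi

# ...but the interpreter only needs to read it.
NX_OUTPUT=$(bash "$NX_DIR/inner.sh")

echo "direct execution: $NX_DIRECT"
echo "via interpreter: $NX_OUTPUT"
rm -r "$NX_DIR"
```

This trick does not help with a compiled binary. There, one option is to extract somewhere else, for instance by setting TMPDIR before calling mktemp -d, which GNU mktemp honors.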

Script Doctor

While bash scripting is often maligned and not without reason, it is very flexible and powerful, as this example shows. It is dead easy to embed files in a script and that opens up a lot of flexible options for distributing complex file setups and applications.

If you are writing serious bash scripts, we suggest you write them carefully. You can even find a “lint” program that can test for errors for you.
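
Short of a full lint tool, bash itself can at least check syntax: the -n option reads a script and reports parse errors without running anything. The sketch below is not the lint program mentioned above, only that built-in check, and the file names good.sh0 and bad.sh0 are invented for the example.

```shell
#!/bin/bash
# Sketch: syntax-check a stub with "bash -n" before appending a payload.
LINT_DIR=$(mktemp -d)

printf '%s\n' '#!/bin/bash' 'if [ -n "$1" ]; then echo "$1"; fi' 'exit 0' \
    >"$LINT_DIR/good.sh0"
printf '%s\n' '#!/bin/bash' 'if [ -n "$1" ]; then echo "$1"' 'exit 0' \
    >"$LINT_DIR/bad.sh0"     # missing "fi"

# check_syntax FILE: print "ok" if bash can parse FILE, else "syntax error".
check_syntax() {
    if bash -n "$1" 2>/dev/null; then echo "ok"; else echo "syntax error"; fi
}

LINT_GOOD=$(check_syntax "$LINT_DIR/good.sh0")
LINT_BAD=$(check_syntax "$LINT_DIR/bad.sh0")
echo "good.sh0: $LINT_GOOD"
echo "bad.sh0: $LINT_BAD"

rm -r "$LINT_DIR"
```

Run the check on the base script rather than the finished one. With -n nothing is executed, including the exit, so the parser would likely carry on into the payload and complain about it.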

25 thoughts on “Linux Fu: Shell Script File Embedding”

    1. Yip… good old Minix days. Sure….

      …or do you remember the tricks like `cat a.tar b.zip`?
      Or images that look different depending on their extension?

      Lots of fun for the whole supervillain family!

  1. It does bring back memories of my early UNIX days. HP/UX days, to be more precise. HP support used a similar shell script transport for binaries. It did surprise me.

  2. This works great, if you assume your user can receive binary files, has a compatible shell installed (and uudecode and sed for shar…).
    I recently had to stuff a few hundred megabytes of data over a serial connection, with no common set of transfer protocols available on both systems. In the end I had to stuff a copy of z-modem on a floppy disk, after
    1: transferring 200MB of TAR archives over floppy failed (even gzipped and split to 48 disks, I could only get to disk 12 before the target machine would stop recognizing any more of the disks, even when using a GoTek running FlashFloppy.)
    2: building a shar archive, only to find out that I was missing uudecode on the target system.
    3: seeing that the manual page for the included communications software makes mention of supporting zmodem via sz/rz, but doesn’t actually include sz/rz.
    4: trying kermit, only to have it segfault on both host and client (presumably due to different versions?)

    1. That’s rough. In the old days I’d put LapLink on a floppy and plug in either the parallel cable or a null modem cable to transfer the rest.
      In the embedded world, I’d often get stuck on a system that had only a simple boot monitor in FlashROM and I’d have to pop the chip off and reflash it (one of those coffin-style sockets for surface mount parts). Once I hacked y-modem into our boot monitor over the weekend, things went a lot smoother and there were far fewer software developers hanging out in the hardware lab.
      Years later, I realize I could have done a simple serial upload in Intel HEX (or SREC) format instead of a file transfer protocol (x/y/z modem) and skipped the flashing nonsense, using a short checksum utility to ensure the program was transmitted successfully. But once you get used to a particular workflow, even if it is a bad one, it’s hard to break away from that same thinking.

        1. PLIP is even newer. I used to run LANtastic between my two DOS computers over a parallel cable. The second computer had a Hercules card and therefore a second printer port, allowing me to share a printer to boot!

    2. Sad to hear of your Kermit problems – it’s always been one of the most reliable in my experience, as well as one of the fastest (despite its reputation) as long as you use modern versions that can negotiate all the options. And negotiation is its superpower, unlike all the zmodem variants that can’t talk to each other.

  3. Using AWK is just sloppy because A) you’re just writing a program inside a script and B) AWK isn’t properly implemented on most systems. Using sed would be a much better choice.

    1. I love this response so much – professionally I tend to replace awk invocations with sed wherever I can … And don’t even get me started on `cat | grep | awk […]`

  4. In my org, we have an automated process that uses a session’s history and a diff of all files changed to assemble a shar file of all the things that were changed. This gets copied up to a file server and the file is named after the change request or ticket that spurred the change. When done right, it means that we can restore a machine from an old backup, then just execute a pile of scripts off the file server to restore the machine to working condition (Application content like databases and stored files is kept on a high-performance SAN).

    Our mail server, though, will kill such scripts as well as anything else similar to it. One of the filter stages detects an attachment’s magic, then searches for blocks of other data to find any magics located in the file, and will kill the file if it doesn’t match the outer file. The purpose is to kill innocuous files (Like word documents, PDFs, etc) that contain a malicious payload. It’s annoying, but it has saved our asses more than once.

    1. > When done right, it means that we can restore a machine from an old backup, then just execute a pile of scripts off the file server to restore the machine to working condition.

      Tooling like Ansible and SaltStack is available to restore machines (and to get new ones) to working condition.

      1. Because we don’t need, nor want, a tool that requires specialized training for something that isn’t all that much better than what we do already. Using a shar means we can use standard tools and our people don’t need to know more than moderate Linux skills rather than forking over piles of cash to some company that might just disappear next year when some other tech company swallows them, or some other tool becomes the dominant fad.

        And I’m not sure where everyone buys their computers, but ours don’t break so much we need an automated tool to fix them. We have 4300 machines, and 100 of those live in a factory that exposes those machines to pretty much every hazard known to humankind, and we still don’t get more than a handful of break/fix, and maybe a dozen or two software tickets in a month.

  5. While I’m sure there are situations where this could come in handy, I find it quite clunky to have to futz around with offsets or use awk, sed or similar for this. Seems overly brittle for limited benefit. I could see the benefit mostly if it’s a large blob that would frequently change and then be auto-appended to the script.
    In most situations where I needed a shell script to come with some embedded files, here documents were the much better solution.

    In bash a typical here document idiom would look like this:
    cat < file_with_stuff
    Stuff goes here
    More stuff
    EOF
    usestuff.sh file_with_stuff

    … or skip the temp file altogether and just pipe the here document directly to whatever it is you’re doing.

    Similar constructs exist in a great many scripting languages:
    https://en.m.wikipedia.org/wiki/Here_document

    1. Grmbls. I only now saw the hackaday engine gobbled the bash syntax in my comment. Turns out it tries to interpret “less-than somethingsomething more-than” as some sort of markup, and won’t show the text at all if the interpretation fails. I’ve tried again with html syntax, but the hackaday engine responds with “invalid security token”. Huh.

      Is there documentation on the hackaday comment engine’s syntax somewhere? I really feel kinda stupid but I didn’t find anything.

      I’ll dumb it down so the engine won’t gobble it. Please use brain-sed to replace less-than and greater-than with the correct syntax…

      cat less-thanless-thanEOF greater-than file_with_stuff
      Stuff goes here
      More stuff
      EOF
      usestuff.sh file_with_stuff

  6. A) Using a program inside a script… what, you can only use shell builtins? What do you think a shell script is? Sed is turing complete as well, so pretty much using it is also “writing a program”… other than a worse syntax than awk and being harder to read, who cares which you use? B) I’ve been using awk inside scripts for over 30 years now with no ill effects. There are also zero awks that would die on the simple usage above. Did someone dear to you die from awk-poisoning or something? (Just noticed shar was still installed on the mac by default, I haven’t used that in a long time….)

  7. This post reminds me of my time spent testing the blackberry playbook. I needed to deliver a tarball payload to the devices in the lab … But they didn’t have gunzip, or the gzip libs installed, so I wrote a script with embedded gunzip and untar binaries (statically linked), and the tarball artifact containing all the tests to extract and run the suites on the devices. I also made a script to weave together the script body and the latest artifacts from the build so we could make a Hudson job for it. It wasn’t a very sophisticated solution, but it was effective in getting the tests onto the targets, running the test suites, and reporting the results back.
