Linux Fu: Bash Strings

If you are a traditional programmer, using bash for scripting may seem limiting sometimes, but for certain tasks, bash can be very productive. It turns out that some of the limits of bash are really limits of older shells, and people restrict themselves to those older features to stay compatible. Still other perceived issues arise because some of the advanced functions in bash are arcane or confusing.

Strings are a good example. You don’t think of bash as a string manipulation language, but it has many powerful ways to handle strings. In fact, it may have too many ways, since the functionality winds up in more than one place. Of course, you can also call out to programs, and sometimes it is just easier to make a call to an awk or Python script to do the heavy lifting.

But let’s stick with bash-isms for handling strings. Obviously, you can put a string in a shell or environment variable and pull it back out. I am going to assume you know how string interpolation and quoting work. In other words, this should make sense:

echo "Your path is $PATH and the current directory is ${PWD}"

The Long and the Short

Suppose you want to know the length of a string. That’s a pretty basic string operation. In bash, you can write ${#var} to find the length of $var:


#!/bin/bash
echo -n "Project Name? "
read -r PNAME
if (( ${#PNAME} > 16 ))
then
   echo "Error: Project name longer than 16 characters"
else
   echo "${PNAME} it is!"
fi

The “((” forms an arithmetic context which is why you can get away with an unquoted greater-than sign here. If you don’t mind using expr — which is an external program — there are at least two more ways to get there:


echo ${#STR}
expr length "${STR}"
expr match "${STR}" '.*'

Of course, if you allow yourself to call outside of bash, you could use awk or anything else to do this, too, but we’ll stick with expr as it is relatively lightweight.
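As a rough sketch, the same length check can be wrapped in a function so it does not depend on reading from the terminal. The function name name_ok and its optional limit argument are made up for illustration:

```shell
#!/bin/bash
# name_ok is a hypothetical helper: it succeeds if $1 is non-empty
# and no longer than $2 characters (default 16).
name_ok() {
    local name=$1 max=${2:-16}
    (( ${#name} > 0 && ${#name} <= max ))
}

if name_ok "hackaday"
then
    echo "hackaday is fine"
fi
if ! name_ok "a-very-long-project-name"
then
    echo "too long"
fi
```

Because the (( )) test is the last command in the function, its result becomes the function's exit status, so you can use the function directly in an if.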

Swiss Army Knife

In fact, expr can do a lot of string manipulations in addition to length and match. You can pull a substring from a string using substr. It is often handy to use index to find a particular character in the string first. The expr program uses 1 as the first character of the string. So, for example:


#!/bin/bash
echo -n "Full path? "
read FFN
LAST_SLASH=0
SLASH=$( expr index "$FFN" / ) # find first slash
while (( $SLASH != 0 ))
do
   let LAST_SLASH=$LAST_SLASH+$SLASH  # point at next slash
   SLASH=$(expr index "${FFN:$LAST_SLASH}" / )  # look for another
done
# now LAST_SLASH points to last slash
echo -n "Directory: "
expr substr "$FFN" 1 $LAST_SLASH
echo -or-
echo ${FFN:0:$LAST_SLASH}
# Yes, I know about dirname but this is an example

Enter a full path (like /foo/bar/hackaday) and the script will find the last slash and print the name up to and including the last slash using two different methods. This script makes use of expr but also uses the syntax for bash's built-in substring extraction, which starts at index zero. For example, if the variable FOO contains “Hackaday”:

  • ${FOO} -> Hackaday
  • ${FOO:1} -> ackaday
  • ${FOO:5:3} -> day

The first number is an offset and the second is a length if it is positive. Either number can be negative, although you need a space after the colon if the offset is negative (otherwise bash reads :- as the default-value operator). A negative offset counts back from the end of the string, so the last character is at index -1. A negative length is not really a length at all: it marks where the substring stops, as a position counted back from the end of the string. So:

  • ${FOO: -3} -> day
  • ${FOO:1:-4} -> ack
  • ${FOO: -8:-4} -> Hack

Of course, either or both numbers could be variables, as you can see in the example.
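Here is a small sketch of the offset and length forms in use, pulling fields out of a fixed-width date stamp. The STAMP value is just sample data I made up:

```shell
#!/bin/bash
STAMP=20190523          # sample YYYYMMDD string, made up for this sketch
YEAR=${STAMP:0:4}       # offset 0, length 4
MONTH=${STAMP:4:2}      # offset 4, length 2
DAY=${STAMP: -2}        # last two characters; note the space after the colon
echo "$YEAR-$MONTH-$DAY"   # prints 2019-05-23
```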

Less is More

Sometimes you don’t want to find something, you just want to get rid of it. bash has lots of ways to remove substrings using fixed strings or glob-based pattern matching. There are four variations. A single # removes the shortest match from the front of the string and ## removes the longest; % and %% do the same thing from the back of the string. Consider this:


TSTR=my.first.file.txt
echo ${TSTR%.*} # prints my.first.file
echo ${TSTR%%.*}  # prints my
echo ${TSTR#*fi}  # prints rst.file.txt
echo ${TSTR##*fi} # prints le.txt
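A common use for these four operators is taking a path apart without calling dirname or basename. This is only a sketch; the path is sample data, and a name with no slash or no dot in it would need extra handling:

```shell
#!/bin/bash
FFN=/foo/bar/hackaday.tar.gz   # sample path, made up for this sketch
DIR=${FFN%/*}        # remove shortest /* from the back:  /foo/bar
FILE=${FFN##*/}      # remove longest */ from the front:  hackaday.tar.gz
BASE=${FILE%%.*}     # remove longest .* from the back:   hackaday
EXT=${FILE##*.}      # remove longest *. from the front:  gz
echo "$DIR | $FILE | $BASE | $EXT"
```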

Transformation

Of course, sometimes you don’t want to delete, as much as you want to replace some string with another string. You can use a single slash to replace the first instance of a search string or two slashes to replace globally. You can also fail to provide a replacement string and you’ll get another way to delete parts of strings. One other trick is to add a # or % to anchor the match to the start or end of the string, just like with a deletion.


TSTR=my.first.file.txt
echo ${TSTR/fi/Fi}   # my.First.file.txt
echo ${TSTR//fi/Fi}  # my.First.File.txt
echo ${TSTR/#*./PREFIX-} # PREFIX-txt  (note: always longest match)
echo ${TSTR/%.*/.backup}  # my.backup (note: always longest match)
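The search string is a glob pattern, so a bracket expression lets you replace or delete a whole class of characters at once. A quick sketch, with a sample string of my own choosing:

```shell
#!/bin/bash
TITLE="Linux Fu: Bash Strings"     # sample text
SLUG=${TITLE//[: ]/-}              # every colon or space becomes a dash
echo "$SLUG"                       # prints Linux-Fu--Bash-Strings
NOVOWELS=${TITLE//[aeiou]/}        # empty replacement deletes the matches
echo "$NOVOWELS"                   # prints Lnx F: Bsh Strngs
```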

Miscellaneous

Some of the more common ways to manipulate strings in bash have to do with dealing with parameters. Suppose you have a script that expects a variable called OTERM to be set but you want to be sure:


REALTERM=${OTERM:-vt100}

Now REALTERM will have the value of OTERM or the string “vt100” if there was nothing in OTERM. Sometimes you want to set OTERM itself so while you could assign to OTERM instead of REALTERM, there is an easier way. Use := instead of the :- sequence. If you do that, you don’t necessarily need an assignment at all, although you can use one if you like:


echo ${OTERM:=vt100}  # now OTERM is vt100 if it was empty before

You can also reverse the sense so that you replace the value only if the main value is not empty, although that’s not as generally useful:


echo ${DEBUG:+"Debug mode is ON"}  # opposite sense of :-; prints the text only if DEBUG is non-empty, no assignment

A more drastic measure lets you print an error message to stderr and abort a non-interactive shell:


REALTERM=${OTERM:?"Error. Please set OTERM before calling this script"}
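To see the difference between these forms side by side, here is a sketch you can paste into a script. The last two lines go slightly beyond the text above: if you drop the colon, bash only tests whether the variable is unset, so an empty value is left alone.

```shell
#!/bin/bash
unset OTERM
A=${OTERM:-vt100}      # A is vt100, but OTERM is still unset
echo "A=$A OTERM=${OTERM:-<unset>}"
: "${OTERM:=vt100}"    # the : builtin ignores its arguments; OTERM is now vt100
echo "OTERM=$OTERM"
OTERM=""
B=${OTERM-ansi}        # no colon: only an unset variable triggers the default, so B is empty
C=${OTERM:-ansi}       # with colon: an empty variable counts too, so C is ansi
echo "B='$B' C='$C'"
```

The : "${VAR:=value}" idiom is a common way to set a default without printing or assigning to anything else.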

Just in Case

Converting things to upper or lower case is fairly simple. You can provide a glob pattern that matches a single character. If you omit it, it is the same as ?, which matches any character. You can elect to change all the matching characters or just attempt to match the first character. Here are the obligatory examples:


NAME="joe Hackaday"

echo ${NAME^} # prints Joe Hackaday (first match of any character)
echo ${NAME^^} # prints JOE HACKADAY (all of any character)
echo ${NAME^^[a]} # prints joe HAckAdAy (all a characters)
echo ${NAME,,} # prints joe hackaday (all characters)
echo ${NAME,} # prints joe Hackaday (first character matched and didn't convert)
NAME="Joe Hackaday"
echo ${NAME,,[A-H]} # prints Joe hackaday (apply pattern to all characters and convert A-H to lowercase)

Recent versions of bash (5.1 and later) can also convert to upper and lower case using ${VAR@U} and ${VAR@L}, and can uppercase just the first character using ${VAR@u}, but your mileage may vary.
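The case operators also work on arrays, where they apply to each element in turn. That makes a title-case conversion fairly short. This is a sketch that assumes bash 4 or later and words separated by plain spaces:

```shell
#!/bin/bash
TITLE="linux fu: bash strings"     # sample text
read -r -a WORDS <<< "$TITLE"      # split the string into an array of words
CAPS="${WORDS[*]^}"                # uppercase the first character of every element
echo "$CAPS"                       # prints Linux Fu: Bash Strings
```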

Pass the Test

You probably realize that a standard test was traditionally a call to an external program:


if [ "$f" -eq 0 ]
then ...

If you do an ls on /usr/bin, you’ll see an executable actually named “[” used as a shorthand for the test program, although bash normally uses its own built-in version of [ rather than launching it. On top of that, bash has an extended test in the form of two brackets:


if [[ $f == 0 ]]
then ...

That test built-in can handle regular expressions using =~ so that’s another option for matching strings:


if [[ "$NAME" =~ [hH]a.k ]] ...
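When a regular expression contains parenthesized groups, bash leaves the matched pieces in the BASH_REMATCH array, with the whole match at index 0. Here is a sketch with a made-up file name; keeping the expression in a variable and leaving it unquoted on the right side of =~ avoids most quoting surprises:

```shell
#!/bin/bash
FILE="report-2019-05.txt"    # sample name, made up for this sketch
RE='^([a-z]+)-([0-9]{4})-([0-9]{2})\.txt$'
if [[ $FILE =~ $RE ]]
then
    NAME=${BASH_REMATCH[1]}
    YEAR=${BASH_REMATCH[2]}
    MONTH=${BASH_REMATCH[3]}
    echo "name=$NAME year=$YEAR month=$MONTH"   # prints name=report year=2019 month=05
else
    echo "no match"
fi
```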

Choose Wisely

Of course, if you are doing a slew of text processing, maybe you don’t need to be using bash. Even if you are, don’t forget you can always leverage other programs like tr, awk, sed, and many others to do things like this. Sure, performance won’t be as good — probably — but if you are worried about performance why are you writing a script?

Unless you just swear off scripting altogether, it is nice to have some of these tricks in your back pocket. Use them wisely.

40 thoughts on “Linux Fu: Bash Strings”

  1. Bash sucks as a programming language. Only donkeys use bash as a general purpose programming language.

    Why waste time and space even talking about it?

    Whenever I come across a multi-hundred line bash script some nutball wrote, I want to hunt them down and punish them.

    1. This is Hackaday. Using something for programming, that isn’t a programming language, is exactly in the spirit of what we do! See also: “anything is possible with enough 555s” and “flagrant abuse of the C pre-processor”.

    2. Long time ago (early 90s) I was the Unix systems guy for a little company, which went out of business.
      Another ex-employee and I made a living for a while as freelance consultants/developers to the now-unsupported user base, him doing the business and customer training aspects and me doing the software/hardware/tech/comms.

      Since we didn’t own the applications IP (compiled COBOL) and the systems were plain vanilla SysVr2 and Xenix with no internet connections, I wrote lots of shellscripts as filters to take the outputs of A/R, G/L and Reports and
      generate new data and reports from them. Management loved it. Everything worked real great except on the odd occasions when a single parenthesis etc. was misplaced in a thousand-line shellscript.
      Really wish I had colour-coded editing back then! I did everything in vi and kermit on my Compaq LTE laptop as a serial terminal.

  2. Great article!

    Your dirname example could have been much simpler using the following non-greedy trailing match substitution:
    echo -n ${FFN%/[!/]*}/

    Perhaps as important as the code elegance, it avoids at least 2 instantiations of expr, and their associated context switches. If you need to parse a large list of filenames this becomes critical:

    ~]$ cat test.sh
    #!/bin/bash
    while read -r FFN;
    do
    LAST_SLASH=0
    SLASH=$( expr index "$FFN" / ) # find first slash
    while (( $SLASH != 0 ))
    do
    let LAST_SLASH=$LAST_SLASH+$SLASH # point at next slash
    SLASH=$(expr index "${FFN:$LAST_SLASH}" / ) # look for another
    done
    echo ${FFN:0:$LAST_SLASH}
    done dir.list

    ~]$ cat test2.sh
    #!/bin/bash
    while read -r FFN;
    do
    echo ${FFN%/[!/]*}/
    done dir2.list

    ~]$ find /lib/ -type f | head -n 10000 > file.list
    ~]$ wc -l file.list
    10000 file.list

    ~]$ time test.sh
    real 5m52.066s
    user 1m10.284s
    sys 4m14.552s

    ~]$ time ./test2.sh
    real 0m0.216s
    user 0m0.140s
    sys 0m0.075s

    The time savings are a factor of over 10^3….

    Note though that the two scripts are not exactly the same functionally. test2.sh needs to remove a “/” and one or more characters that are not a “/” to work, so it will mangle a root level file or directory such as /foo.txt or /lib into /. But that limitation is avoided if you know that you have a list of pathed files. It would probably be easier to use sed or perl to handle those cases than fight with bash’s substitution…

    1. HaD comment scrubbing strikes again. I really wish we had a decent mark-up language here. Both scripts are reading from file.list and outputting into dir.list using the typical bash while read;do;done file redirection structure, but HaD dumped the redirections…

          1. Dunno if this site has some kind of mark-up-language implementation, but in HTML: Less-than-sign, the text “pre” (three letters, no quotes), greater-than-sign. End with the same tag, but with a forward-slash after the less-than-sign.

            If it does use mark-up, perhaps the same but with square brackets instead of comparison operators.

            HTH!

        1. OK, let’s see if this works:

          #!/bin/bash
          while read -r FFN
          do
          LAST_SLASH=0
          SLASH=$( expr index "$FFN" / ) # find first slash
          while (( $SLASH != 0 ))
          do
          let LAST_SLASH=$LAST_SLASH+$SLASH # point at next slash
          SLASH=$(expr index "${FFN:$LAST_SLASH}" / ) # look for another
          done
          echo ${FFN:0:$LAST_SLASH}
          done dir.list

          #!/bin/bash
          while read -r FFN;
          do
          echo ${FFN%/[!/]*}/
          done dir2.list

  3. I wonder if the comment scrubbing also created the syntax error above:

    echo $TSTR##*fi}
    my.first.file.txt##*fi}
    echo ${TSTR##*fi}
    le.txt

    For me, bash is another shell I may have to understand if maintaining someone else’s work, but my brain got wired for csh many years ago, and there ain’t more room for another shell.

    P.S. yes, it’s also a programming language, albeit interpreted.

  4. A serious question for the author and a hacky example (for fun).

    1. In the discussion of “test”, is there a square bracket missing at the end of line 1? It should be a double-bracket, yes? As in —
    if [[ $f == 0 ]]

    2. A crazy idea for a demo script –
    #!/bin/bash
    read SANE
    while (( ${#SANE} > 0 ))
    do
    echo ${SANE^^[aeiouthsz]}
    read SANE
    done
    exit 0

    I do hope this survives the submission process.

    1. Yes, I don’t know if I made a typo or WordPress ate my square brackets (we fight WordPress to get code in posts all the time, so I may have had an HTML entity that I overaggressively deleted).

  5. Bash is a great scripting language for one primary reason: I can log into any Linux box and my entire IDE is at my fingertips as long as my shell is set to /bin/bash.

    Oh sure, it’s not python. Who cares? I have no dependencies to worry about and if I stick to the basis of sed, grep, awk, tr, expr, and the like, I don’t have to worry about installing however many programs just so I can use one function.

    I’m one of those loonies who loves bash scripting because I can almost always bend it to my will. And when the tasks you need to automate are already in Linux, why go to the trouble of writing a python script that just does os calls to bash when I can just use bash to begin with.

    Bash is far more powerful and useful than the haters think.

    And hey, at least it’s not PERL! (runs for cover)

  6. I used to write shell scripts in an Android environment. One idea I had was a program that tests its own integrity with the md5sum applet, which creates a paradox.
    A script like this:

    #!/system/bin/sh
    check="$(busybox md5sum $0|busybox awk '{print $1}')"
    echo "md5 of $0 is $check"

    if [ "$check" == "36e77f5fed92460796df627f6dd1d0ab" ] # this is the paradox: the hash is written inside the very script ($0) being hashed
    then
    echo "this message never show"
    else
    echo "md5 not equal"
    kill -9 $$
    fi

  7. Great article–I’ve been writing shell scripts for many years and a LOT of what is covered in this article I’ve been unawares of…So, dumb question: any good reference sites for BASH? I have tried to read the official GNU doc and it has a ton of obscura but I find it unparseable by my old brain.

  8. One of my favourite uses for BASH strings is to replace commands like dirname and basename. So one can use "${var%/*}" instead of $(dirname "$var") and "${var##*/}" instead of $(basename "$var"). And the string replacement on variables, i.e. "${var/old/new}", has also saved me countless calls to sed.

    Seeing all the idiotic replies … Word of advice! Do not use the Internet as your personal diary. Your suffering with the world is not anywhere as noteworthy as that of Anne Frank.

  9. This article has more broken code than I’ve seen in a long time.

    Unquoted variable expansion leads to broken scripts.

    For example: echo ${NAME^} will strip leading and trailing whitespace from $NAME and will collapse multiple consecutive whitespace characters into a single space. It should be: echo "${NAME^}"
