SSH Can Handle Spaces In Command-line Arguments Strangely

One of the things ssh can do is execute a command on a remote server. Most of us expect it to work transparently when doing so, simply passing the command and its arguments on without any surprises in the process. But after 23 years of using OpenSSH on a nearly daily basis, [Martin Kjellstrand] got surprised.

It turns out that the usual rules around how things are parsed can have some troublesome edge cases when spaces are involved. [Martin] kicks off an example in the following way:

One would reasonably expect the commands figlet foobar bar\ baz and ssh localhost figlet foobar bar\ baz to be functionally equivalent, right? The former runs “figlet” with the arguments “foobar” and “bar baz” on the local machine. The latter should do the same, just with ssh in the middle, but that’s not what happens. What happens is that ssh turns bar\ baz into two distinctly separate command-line arguments in the process of sending it for remote execution: “bar” and “baz”. The result is mystification as the command fails to run the way the user expects, if it runs at all.
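
The effect can be reproduced without a network connection. The sketch below is a rough stand-in, not OpenSSH itself: the hypothetical helper `fake_ssh` drops the host name, joins the remaining words with single spaces, and hands the resulting string to a fresh `sh -c`, which is approximately what happens to a remote command. `printf %s:` replaces figlet so that argument boundaries show up as colons.

```sh
# Stand-in for "ssh host cmd args...": drop the host, join the remaining
# words with single spaces, and let a fresh shell parse the result.
fake_ssh() {
    shift
    sh -c "$*"
}

# Direct call: printf receives two arguments after the format.
printf %s: foobar bar\ baz; echo

# Via the stand-in: the escaped space is gone by the time the second
# shell parses the string, so printf receives three arguments.
fake_ssh localhost printf %s: foobar bar\ baz; echo
```

The first command prints `foobar:bar baz:` and the second prints `foobar:bar:baz:`.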

What exactly is going on, here? [Martin] goes into considerable detail tracking down this odd behavior and how it happens, but he’s unable to ultimately explain why ssh does things this way. He suspects that it is the result of some design decision taken long ago. Or perhaps a bug that has, over time, been promoted to entrenched quirk.

Do you have any insights or knowledge about this behavior? If so, [Martin] wants to hear about it and so do we, so don’t keep it to yourself! Let us know in the comments, below.

48 thoughts on “SSH Can Handle Spaces In Command-line Arguments Strangely”

  1. My least favorite SSH quirk is when ssh’ing into a headless Raspberry Pi, I could never get my locale set up, no matter how much I worked with raspi-config and locale-gen and every search result on the whole damn internet telling me what I was doing wrong.

    Finally I found this: https://stackoverflow.com/questions/2499794/how-to-fix-a-locale-setting-warning-from-perl

    It turns out my local box was forwarding parts of its environment over the ssh session into the destination pi.

    Dammit, it’s called a locale, not a remotale!
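
    For readers who want to see which variables are candidates for forwarding, here is a minimal sketch. It assumes a Debian-style client configuration containing `SendEnv LANG LC_*` (with a matching `AcceptEnv LANG LC_*` on the server); other systems may ship different defaults. The function name `forwardable_locale_vars` is made up for illustration.

    ```sh
    # List the variables in the current environment whose names match the
    # patterns LANG and LC_* (the Debian-style SendEnv rule assumed here).
    forwardable_locale_vars() {
        env | grep -E '^(LANG|LC_[A-Za-z_]*)='
    }

    forwardable_locale_vars
    ```

    Anything this prints on the local box is what the remote session may inherit, provided the server is configured to accept it.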

    1. Well the idea with the locale is for your local machine to have a locale, and it would be perhaps nice when you connect to a remote machine for it to respect your locale and talk to you like a local-local not a remote-local. Like a Brit going to France and finding that some French can speak English.

      The way you would like it, not per-se wrong, is that a Brit goes to France with the intention of speaking and practising French, and the French therefore speak French.

      1. Yeah, I certainly see the use, it’s just annoying that it’s done in secret. Like if I logged in and there was a message like “imported 14 environment variables from ssh origin host”, I could just ignore that line until I started having trouble with my environment, then it might occur to me to look back at that message and look deeper.

        But the way it’s done, it frustrated me for yeeeears until, totally by happenstance, I ran across that page explaining it. There’s no way I would’ve found that ssh feature, because I had no reason to look in the ssh docs, because nothing ever suggested that my local environment was causing issues with the remote system’s environment. The entire purpose of ssh seems to be the precise opposite, I thought!

        It’s a blatant violation of the principle of least astonishment, which states that an _astonishingly good_ feature can be just as confusing as an astonishingly bad feature, unless it is explained or explicitly invoked by the user. By silently affecting the remote system, ssh does something astonishing.

        1. Well there’s secret and there’s secret. If it’s documented then that’s hardly secret. In fact even the first level of verbose (ssh -v) lists variables being set – but again you need to know what you’re looking for first, as the result of these LC variables can indeed be very mystic and obtuse.

          Take a (rather ancient) software package that stores data in a defined format, where the headers are essentially ASCII, and in the headers can exist numbers. The format defines a fractional number using a decimal, and not a comma as is common in Europe. Cue saving a file with a European locale such that a comma is used in the header, producing a broken header. Opened on a different machine with the same package and a different European locale there is no problem as the ‘error’ cancels out.

          Now take this file and open it with a program that correctly knows to ignore locales when writing and reading numbers in the header – it complains your file is bad – but you open the file and it’s good. Or the user says they can open the file on machine X but not on machine Y which should be pretty much identical to machine X.

          If you weren’t aware about locales you wouldn’t know where to start looking.

          1. This tells me that the error messages are insufficiently informative.

            Granted, some programs were never updated since before locales were a thing, so part of the blame goes on the libraries that were upgraded to use locales.

      2. I have to slightly disagree with you here.

        Implemented correctly, the idea with locale is for the user to have a locale that suits the user’s expectations, such that the machine adapts to the locale each user is most comfortable with.

        There is, of course, a machine-wide default locale (often aliased to the “C” locale), but that should just be a default and with proper environment settings in their .profile/.bashrc/.login/.whatever_their_chosen_shell_uses, any given user should be able to get the machine to respond with any (installed) locale they wish.

        It’s not as easy as one might hope, in that the following challenges exist:
        + There are many LC_ variables in addition to the LANG, LANGUAGE, and PAPERSIZE that affect Locale settings.
        + The locale desired (and all the various subcomponents) must be installed on the machine (requires administrator capabilities to install)
        + There are many different startup files where these settings may be required to be set in order to fully support the user, depending on access method (local login vs. ssh vs. various X display managers vs. VNC vs. etc.) and particular flavor of any given access method (e.g. xdm vs. gdm vs. ???).

        Nonetheless, with persistence and a little care, one can usually achieve the desired locale behavior on a per-user basis.

        This post may help: https://unix.stackexchange.com/questions/36575/setting-locale-for-user

    1. That seems perfectly obvious.

      The backslash makes the next character a non-separator to the command processor, so “bar\ baz” is received by the ssh program argv[n] as the string “bar baz”.

      ssh doesn’t process the arguments or check for things, it just forks a shell on the remote system and passes the arguments it received. The remote shell sees “bar baz” and processes it as typed.

      Doubling the backslash sends an actual backslash in the command string, so “bar\\ baz” is received by the ssh program argv[n] as “bar\” and argv[n+1] as “baz”. This may be a problem on the receiving end because it assumes ssh will put a space between the tokens.

      Or maybe ssh gets the entire command line and chops the “ssh” part off and sends the rest of the line as text, but backslashes are still processed by the shell as it will for things like pipes and backslashes.

      To be correct, I think you need an escaped backslash and an escaped space. “bar\\\ baz” will be received by the ssh argv[n] as “bar\ baz” one argument, which is I think what the user intended.

      1. I didn’t see that you already posted an explanation before posting mine. We mean to say the same thing. And technically you are correct that in general you should escape the space too (which I simply forgot), because else the local instance of ssh receives the arguments “bar\” and “baz” separately.

        However, bar\\ baz does seem to work as intended. I assume that ssh just joins the received strings together, adding spaces in between. So the remote shell still receives the string “bar\ baz” to interpret, and the remote program (figlet) still receives the single argument string “bar baz”.
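
        The join-with-spaces assumption can be checked locally. This is a sketch under that assumption, using a hypothetical `fake_ssh` helper that drops the host name, joins the remaining words with spaces, and feeds the string to `sh -c`; `printf %s:` stands in for figlet so that argument boundaries are visible.

        ```sh
        # Stand-in for ssh: drop the host, join the rest with spaces, re-parse.
        fake_ssh() {
            shift
            sh -c "$*"
        }

        # Double backslash: two local arguments, "bar\" and "baz". Joined
        # with a space they read "bar\ baz", which re-parses as one word.
        fake_ssh localhost printf %s: foobar bar\\ baz; echo

        # Triple backslash: one local argument, "bar\ baz". The joined
        # command string is identical to the one above.
        fake_ssh localhost printf %s: foobar bar\\\ baz; echo
        ```

        Both calls print `foobar:bar baz:`, which matches the observation that the double and triple forms end up looking the same to the second shell.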

        1. Is it therefore an SSH thing, or a shell thing.

          e.g.

          ssh foo ls *

          would that end up becoming on the remote host ls or ls .

          Where does the wildcard get expanded, where should it get expanded?

          Even more confusing is a certain program where it takes a wildcard as a parameter, but if that wildcard happens to match something in the current directory, then the wildcard is expanded to become whatever matches in the current directory, and not then sent to the program for it to interpret the wildcard, whilst if nothing matches in the current directory then the program does get the wildcard to act on.

          1. Well, it’s a little of both and more importantly, some of it is the interaction between the three. (Shell on local system, SSH, and finally, shell on remote system).

            Here’s what happens…
            Without quoting, you get all the expected shell interpretation in the local shell before it is handed to ssh. Therefore, the command:
            ssh localhost figlet foobar bar\ baz
            Is passed to SSH with an ARGV array that contains: ("ssh","localhost","figlet","foobar","bar baz")

            SSH then interprets "localhost" as the target system and takes what remains and does the equivalent of the Perl function join(" ", @ARGV) on what’s left (i.e. it has effectively done a shift(@ARGV) twice before it passes @ARGV to join).

            This results in the remote shell being invoked as ' -c "figlet foobar bar baz"'.

            Unfortunately, even with quoting, SSH can be a bit less than helpful. The safest thing is to get the right number of escapes in all the right places, but that can also be tricky. My usual approach to this, since SSH is going to consolidate everything into a single argument anyway is to use single quotes wherever possible:

            ssh localhost 'figlet' 'foobar bar\ baz'

            (sometimes you can get away with keeping the command in the same argument, sometimes you can’t, so it’s safest to separate it.)

            Usually the above command will work. On rare occasions, you may still need the extra escapes, but usually they’re not only unnecessary, but will break things.

            The problem is that different shells on different systems have different escape and quoting rules and there is some variance among SSH implementations as well. To top it off, you’re actually dealing with four things and not just three as I noted above… Local shell, ssh client on the local host, ssh server on the remote host, and finally shell on the remote host. The good news is that (at least in my experience) most variants of ssh client and server pass things across and hand off to the remote shell in a fairly consistent manner, so that usually reduces the variability by one step.

            Hope that helps.

          2. Wildcards normally get expanded by the shell.
            If you have 3 files: A, B, and C and you would execute “rm *”, it would be expanded and executed as “rm A B C”.

            In case of SSH, escaping the “*”-character or not will determine if it gets expanded locally (no escape) or remotely through SSH (escaped). This is not really SSH related and is valid for all commands that can execute something on a remote machine.

            Also: Expansion gives problems when there are a large number of files, so there are ways available to ensure correct behaviour in all circumstances, such as working with pipes and the “xargs”-command.
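
            The local-versus-remote distinction can be sketched with two scratch directories playing the two machines. Everything here is illustrative: `fake_ssh` is a hypothetical stand-in that joins its arguments with spaces and runs them through `sh -c` inside the "remote" directory, and the file names are invented. The scratch directories are left behind for inspection.

            ```sh
            # Two scratch directories stand in for the two machines.
            local_dir=$(mktemp -d)
            remote_dir=$(mktemp -d)
            touch "$local_dir/local.txt" "$remote_dir/remote.txt"

            # Stand-in for ssh: join the words, re-parse them "remotely".
            fake_ssh() {
                shift
                ( cd "$remote_dir" && sh -c "$*" )
            }

            cd "$local_dir"

            # Unescaped: the local shell expands * before ssh ever sees it.
            fake_ssh host echo *

            # Escaped or quoted: the star survives and is expanded on the far side.
            fake_ssh host echo '*'
            ```

            The first call prints `local.txt`, the second prints `remote.txt`.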

          3. While everything J_B says is mostly true, you can’t always count on remote expansion happening the way you expect. Depends on your choice of shells on the remote system and whether the SSH server implementation hands the arguments to the shell on the remote system or directly fork/exec()s the “command”. (Most will use the shell, so usually you’ll get the expected result as J_B describes, but I have encountered cases where this isn’t guaranteed. Notably some versions of windows optional SSH server.)

    2. I’d go with triple escape – backslash to escape the backslash, plus backslash to escape the space, so that you end up with the same form one level down.

      Unfortunately, when you’re using escapes to poke through shell levels, you usually need to just create a suitable test phrase to experiment with.

      1. And … I was wrong. Which implies that there is special handling in there, because where:
        ssh host echo word\ other
        will come through to ssh with “word other” as a single parameter:
        ssh host echo word\\ other
        will come through to ssh with “word\” “other” as separate arguments. So at some point the stack is flattening the list again and then re-parsing, which is kind of scary for scripting purposes.

        I am kind of surprised that there is no ssh option for “Pass the arguments through as direct arguments to execv(3) or similar without additional shell processing.”
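
        In the absence of such an option, one common workaround is to quote every argument for the second shell before handing it over. The following is a minimal sketch, assuming the remote login shell follows POSIX quoting rules. `shquote` and `fake_ssh` are hypothetical helper names, and `fake_ssh` merely imitates the join-and-reparse step locally. Trailing newlines inside an argument are lost by this approach.

        ```sh
        # Stand-in for ssh: drop the host, join the rest with spaces, re-parse.
        fake_ssh() {
            shift
            sh -c "$*"
        }

        # Wrap each argument in single quotes, rewriting an embedded ' as '\''
        # so that a POSIX shell re-parses it as exactly one word.
        shquote() {
            for arg in "$@"; do
                printf "'%s' " "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
            done
        }

        # Spaces, quotes and wildcards all arrive as literal single arguments.
        fake_ssh localhost "$(shquote printf '%s:' foobar 'bar baz' "it's" '*')"; echo
        ```

        This prints `foobar:bar baz:it's:*:`. With a real connection, the same quoted string would be passed as the single command argument to ssh. Bash users can get a similar effect from the builtin `printf '%q'`.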

          1. I haven’t found the precise code, but triple-\ doesn’t help. As someone mentioned above, it feels like it works like running a join(” “, …) on the arguments before passing to the remote shell, so the double-\ leaves a \ at the end of the argument, but then it is joined back into a complete command-line before being parsed again. So while the triple-\ results in a two-word parameter where the first word ends in \, after the join it all just looks the same.

  2. Additionally:
    > What happens is that ssh turns bar\ baz into two distinctly separate command-line arguments in the process of sending it for remote execution: “bar” and “baz”.

    The local shell turns “bar\ baz” into “bar baz” before passing it as a single argument to the local instance of ssh. The remote ssh instance (sshd) then passes “bar baz” to the remote shell, which interprets (!) it and passes it as “bar” and “baz”, so two strings in *argv[], to the called remote program (here figlet).

    Fun fact, inside scripts there may be more layers of quoting and string interpretation, leading to a doubling of escape characters for each interpretation (2^n growth). So next level deep would be “bar\\\\ baz”. At least I assume this is what the article is about (tl;dr).

  3. It’s worth noting that the whole terminal/tty/console/shell/… has evolved over many decades and it’s insanely powerful. That power does not come free, as evident in this blog entry.

    I’m tremendously thankful for the fact that I can ssh all over the place and thunder away commands on the keyboard.

    1. It’s not a bug, nor even esoteric.

      First, your local shell expands arguments, which eats the backslash and combines two arguments into one. Then ssh passes all the arguments into the remote shell, separated by spaces. If your arguments are unquoted, the remote shell can’t tell the difference between an argument with a space and two arguments.

      So, you need to quote for the remote shell. The quickest way to do this without nesting backslashes is to quote the whole string.

      ssh hostname 'figlet foobar bar\ baz'

      (note the single quotes prevent the local shell eating the backslash).

      The reason this is *good behavior* and not a bug is that you get all the intuitive shell expansion on the remote end without having to invoke “bash -c” or anything.

  4. It’s a protocol problem. The only sane solution would’ve been executing an array of arguments, but SSH only offers executing a string. The impedance mismatch is so horrible that I would not be surprised if somebody proved that executing a fully arbitrary command over SSH is theoretically impossible.

    1. There are problems inherent in executing an array of arguments, too. Consider the common situation of ``` ssh 'rm -r foo*' ``` (enclosed in ```s and interior spaces for clarity, hopefully the HTMLifier won’t break this too badly).

      If this were passed as an array of arguments, the remote system would receive ``` rm -r foo* ``` as the single string to process. One would have to, instead, use the far less intuitive ```ssh 'rm' '-r' 'foo*' ```. Combining wildcards and spaces and sorting all of that is now left as an exercise for the reader. Consider the implications both in the way most SSH implementations currently work _AND_ in the way Alex has proposed.

      For all of its faults, the current system actually works very well in the vast majority of circumstances and fails only in relatively advanced and corner cases.

  5. I’m surprised almost no-one is pointing out that this has very little to do with ssh(1) and also everything to do with sh(1), the shell. You’ll get much the same surprise prepending “sh ” to any arbitrary command.

  6. I had a monitoring script running for some time doing some non standard stuff, so to say.
    ssh host -c bash <<EOF
    some nasty \$variables and
    REMOTE_VAR=\\\$LOCAL_VAR command arguments
    EOF

    Terrible to figure it out but the end result was nice. Every host gave a gzip txt that was parsed locally.
    Not going to that hell again…

      1. yeah. i kind of groaned when someone was “surprised” by this behavior, but of course everyone comes from somewhere. but “[Martin] goes into considerable detail tracking down this odd behavior and how it happens, but he’s unable to ultimately explain why ssh does things this way” made me sad about hackaday.

        i run into nuisances with spaces in various contexts but i don’t see the value in an article about not getting to the bottom of a shallow problem

  7. Interesting, I thought this silly behavior was common knowledge.

    I ran into it over a decade ago, when I started running firefox in a remote container (which failed on me when opening some url with spaces)…

    … and solved it the laziest way imaginable: by base64-encoding all argv (and then unwrapping it on the remote end).

    The article prompted me to write it up in detail, in case someone finds it useful: https://wejn.org/2023/07/how-to-stop-ssh-mangling-spaces-in-command-line-args/
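
    A rough sketch of the base64 idea follows. It is not the code from the linked write-up, only one way the approach could look in plain sh. The names `encode_args`, `safe_ssh` and `fake_ssh` are hypothetical, `fake_ssh` imitates the join-and-reparse step locally, and `base64 -d` is assumed to be the decode flag on the receiving side (true for GNU coreutils). Empty arguments and trailing newlines do not survive this simple version.

    ```sh
    # Stand-in for ssh: drop the host, join the rest with spaces, re-parse.
    fake_ssh() {
        shift
        sh -c "$*"
    }

    # Print each argument as one base64 word (only A-Z a-z 0-9 + / =),
    # none of which mean anything special to the receiving shell.
    encode_args() {
        for arg in "$@"; do
            printf '%s ' "$(printf '%s' "$arg" | base64 | tr -d '\n')"
        done
    }

    # Send a small decoder plus the encoded words; the far side rebuilds
    # "$@" one decoded word at a time and then runs it.
    safe_ssh() {
        host=$1
        shift
        words=$(encode_args "$@")
        fake_ssh "$host" 'set --; for e in '"$words"'; do set -- "$@" "$(printf %s "$e" | base64 -d)"; done; "$@"'
    }

    safe_ssh localhost printf '%s:' foobar 'bar baz' "it's"; echo
    ```

    This prints `foobar:bar baz:it's:`, with the space and the apostrophe intact.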

    1. That one is actually easy to explain:

      ```
      $ cat b.rb
      #!/usr/bin/env ruby
      p ARGV
      p STDIN.read

      $ seq 1 5 | while read n; do ssh localhost ./b.rb $n; done
      Pseudo-terminal will not be allocated because stdin is not a terminal.
      ["1"]
      "2\n3\n4\n5\n"
      ```

      If you don’t want ssh to mess with stdin, then `ssh -n` is your friend.
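
      The same loop behavior can be reproduced without ssh. In this sketch the hypothetical function `greedy` stands in for the ssh client: like ssh without `-n`, it reads everything available on its standard input. Redirecting its stdin from /dev/null, which is what `ssh -n` arranges, leaves the loop's input alone.

      ```sh
      # Stand-in for ssh: consumes all of stdin, then reports its argument.
      greedy() {
          cat >/dev/null
          echo "ran $1"
      }

      # The first iteration swallows the remaining lines, so the loop runs once.
      printf '1\n2\n3\n' | while read -r n; do greedy "$n"; done

      # With stdin redirected, every line reaches the loop.
      printf '1\n2\n3\n' | while read -r n; do greedy "$n" </dev/null; done
      ```

      The first loop prints only `ran 1`. The second prints `ran 1`, `ran 2` and `ran 3`.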

  8. This is a typical case of forgetting about shell escaping rules.
    It’s one of those things that will surely bite in the butt in the long run if you don’t pay enough attention.

    1. If you use double quotes like that, it still won’t work because the \ will get eaten by the local shell.
      If you use single quotes as in ssh user@host 'ls foo\ bad' you should have better luck.
