Handling Bash script with CRLF (carriage return) in Linux as in MSYS2?

Solution 1

As far as I’m aware, there’s no way to tell Bash to accept Windows-style line endings.

In situations involving Windows, common practice is to rely on Git’s ability to convert line endings automatically on commit and checkout, using the core.autocrlf configuration setting. See for example GitHub’s documentation on line endings, which isn’t specific to GitHub. That way files are stored in the repository with Unix-style line endings and converted as appropriate for each client platform.

(The opposite problem isn’t an issue: MSYS2 works fine with Unix-style line endings, on Windows.)
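
As a complement to core.autocrlf, a per-repository rule in .gitattributes can pin shell scripts to LF regardless of each contributor's local settings. The following is only a sketch: the helper name pin_lf_for_shell_scripts and the *.sh pattern are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical helper: append a rule to .gitattributes in the given
# working-tree directory (default: current directory) so that Git
# normalizes *.sh files to LF on commit and never checks them out as CRLF.
pin_lf_for_shell_scripts() {
    repo_dir=${1:-.}
    printf '%s\n' '*.sh text eol=lf' >> "$repo_dir/.gitattributes"
}
```

After calling the helper in a working tree, running `git add --renormalize .` (Git 2.16 or later) and committing should rewrite any scripts already stored with CRLF, so later edits made in Windows editors do not produce line-ending-only commits.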

Solution 2

You should use binfmt_misc for that [1].

First, define a magic that handles files which start with #! /bin/bash<CR><LF>, then create an executable interpreter for it. The interpreter can be another script:

INTERP=/path/to/bash-crlf

echo ",bash-crlf,M,,#! /bin/bash\x0d\x0a,,$INTERP," > /proc/sys/fs/binfmt_misc/register
cat > "$INTERP" <<'EOT'; chmod 755 "$INTERP"
#! /bin/bash
script=$1; shift; exec bash <(sed 's/\r$//' "$script") "$@"
EOT

Test it:

$ printf '%s\r\n' '#! /bin/bash' pwd >/tmp/foo; chmod 755 /tmp/foo
$ cat -v /tmp/foo
#! /bin/bash^M
pwd^M
$ /tmp/foo
/tmp

The sample interpreter has two problems: 1. since it passes the script via a non-seekable file (a pipe), bash will read it byte by byte, very inefficiently, and 2. any error messages will refer to /dev/fd/63 or similar instead of the name of the original script.
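
One possible way around the first problem is to strip the carriage returns into a temporary regular file, which is seekable, and run that instead. This is a minimal sketch assuming GNU sed and mktemp are available; the function name run_crlf_script is hypothetical. It does not fix the second problem: error messages and $0 will refer to the temporary file rather than the original script.

```shell
# Hypothetical variant of the interpreter body: convert to a temporary
# regular file so bash can read the script in blocks instead of byte by byte.
run_crlf_script() {
    local script=$1 tmp status
    shift
    tmp=$(mktemp) || return 1
    # Remove a trailing CR from every line; the original file is untouched.
    sed 's/\r$//' "$script" > "$tmp"
    # No exec here, because the temporary file must be removed afterwards.
    bash "$tmp" "$@"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

To use it as the interpreter registered above, the file at $INTERP would start with the same #! /bin/bash line, contain this function, and end with `run_crlf_script "$@"`.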

[1] Of course, instead of using binfmt_misc you can just create a /bin/bash^M symbolic link to the interpreter, which would also work on other systems like OpenBSD:

ln -s /path/to/bash-crlf $'/bin/bash\r'

But on Linux, shebanged executables have no advantage over binfmt_misc, and putting garbage inside system directories is not the right strategy, and will leave any sysadmin shaking his or her head ;-)

Author: sdaau

Updated on September 18, 2022

Comments

  • sdaau
    sdaau over 1 year

    Let's say I have the following trivial script, tmp.sh:

    echo "testing"
    stat .
    echo "testing again"
    

    Trivial as it is, it has \r\n (that is, CRLF, that is carriage return+line feed) as line endings. Since the webpage will not preserve the line endings, here is a hexdump:

    $ hexdump -C tmp.sh 
    00000000  65 63 68 6f 20 22 74 65  73 74 69 6e 67 22 0d 0a  |echo "testing"..|
    00000010  73 74 61 74 20 2e 0d 0a  65 63 68 6f 20 22 74 65  |stat ...echo "te|
    00000020  73 74 69 6e 67 20 61 67  61 69 6e 22 0d 0a        |sting again"..|
    0000002e
    

    Now, it has CRLF line endings, because the script was started and developed on Windows, under MSYS2. So, when I run it on Windows 10 in MSYS2, I get the expected:

    $ bash tmp.sh
    testing
      File: .
      Size: 0               Blocks: 40         IO Block: 65536  directory
    Device: 8e8b98b6h/2391513270d   Inode: 281474976761067  Links: 1
    Access: (0755/drwxr-xr-x)  Uid: (197609/      USER)   Gid: (197121/    None)
    Access: 2020-04-03 10:42:53.210292000 +0200
    Modify: 2020-04-03 10:42:53.210292000 +0200
    Change: 2020-04-03 10:42:53.210292000 +0200
     Birth: 2019-02-07 13:22:11.496069300 +0100
    testing again
    

    However, if I copy this script to an Ubuntu 18.04 machine, and run it there, I get something else:

    $ bash tmp.sh
    testing
    stat: cannot stat '.'$'\r': No such file or directory
    testing again
    

    In other scripts with the same line endings, I have also gotten this error in Ubuntu bash:

    line 6: $'\r': command not found
    

    ... likely from an empty line.

    So, clearly, something in Ubuntu chokes on the carriage returns. I have seen "BASH and Carriage Return Behavior":

    it doesn’t have anything to do with Bash: \r and \n are interpreted by the terminal, not by Bash

    ... however, I guess that is only for stuff typed verbatim on the command line; here the \r and \n are already typed in the script itself, so it must be that Bash interprets the \r here.
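
A quick way to check that guess is to feed Bash a one-line script ending in CRLF and make control characters in its output visible. This is only an illustrative sketch (the function name show_cr_in_args is made up here); it relies on cat -v, which prints a carriage return as ^M.

```shell
# Bash on Linux does not treat CR as whitespace, so the CR at the end of
# the line becomes part of the last word and is passed to echo as-is.
show_cr_in_args() {
    printf 'echo hi\r\n' | bash | cat -v
}
show_cr_in_args   # on Linux bash this prints: hi^M
```

The same mechanism explains both errors above: "stat ." becomes stat with the argument ".\r", and a line containing only CRLF becomes a command named "\r".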

    Here is the version of Bash in Ubuntu:

    $ bash --version
    GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
    

    ... and here the version of Bash in MSYS2:

    $ bash --version
    GNU bash, version 4.4.23(2)-release (x86_64-pc-msys)
    

    (they don't seem all that much apart ...)

    Anyways, my question is - is there a way to persuade Bash on Ubuntu/Linux to ignore the \r, rather than trying to interpret it as a (so to speak) "printable character" (in this case, meaning a character that could be a part of a valid command, which bash interprets as such)? EDIT: without having to convert the script itself (so it remains the same, with CRLF line endings, if it is checked in that way, say, in git)

    EDIT2: I would prefer it this way, because other people I work with might reopen the script in Windows text editor, potentially reintroduce \r\n again into the script and commit it; and then we might end up with an endless stream of commits which might be nothing else than conversions of \r\n to \n polluting the repository.

    EDIT3: @Kusalananda in comments mentioned dos2unix (sudo apt install dos2unix); note that just writing this:

    $ dos2unix tmp.sh 
    dos2unix: converting file tmp.sh to Unix format...
    

    ... will convert the file in place; to have it write to stdout, one must set up stdin redirection:

    $ dos2unix <tmp.sh | hexdump -C
    00000000  65 63 68 6f 20 22 74 65  73 74 69 6e 67 22 0a 73  |echo "testing".s|
    00000010  74 61 74 20 2e 0a 65 63  68 6f 20 22 74 65 73 74  |tat ..echo "test|
    00000020  69 6e 67 20 61 67 61 69  6e 22 0a                 |ing again".|
    0000002b
    

    ... and then, in principle, one could run this on Ubuntu, which seems to work in this case:

    $ dos2unix <tmp.sh | bash
    testing
      File: .
      Size: 20480       Blocks: 40         IO Block: 4096   directory
    Device: 816h/2070d  Inode: 1572865     Links: 27
    Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-04-03 11:11:00.309160050 +0200
    Modify: 2020-04-03 11:10:58.349139481 +0200
    Change: 2020-04-03 11:10:58.349139481 +0200
     Birth: -
    testing again
    

    However, aside from the slightly messy command to remember, this also changes Bash semantics, because the script is read from stdin, so stdin is no longer a terminal; this may have worked with this trivial example, but see e.g. https://stackoverflow.com/questions/23257247/pipe-a-script-into-bash for an example of bigger problems.

    • Kusalananda
      Kusalananda about 4 years
      Yes, convert the file to a Unix text file with dos2unix.
    • sdaau
      sdaau about 4 years
      Thanks @Kusalananda - that just reminded me to add an edit, because I specifically do not want to change the file itself, nor its CRLF line endings.
    • sdaau
      sdaau about 4 years
      Thanks @StephenKitt - I'm aware that MSYS2 will handle usual \n, but the problem is if I work on a repository with Windows people who otherwise don't care, the repository will end up being polluted with commits that are a constant change of line endings, which I want to avoid (added edits to OP).
    • sdaau
      sdaau about 4 years
      Also, for those wondering how MSYS2 bash can handle both \n and \r\n as line endings: it turns out it is not trivial at all - see 0005-bash-4.3-msys2-fix-lineendings.patch for all the gory details.
    • C. M.
      C. M. almost 3 years
      As a side note, you do not need to convert the file to Unix-style line endings on disk: You can feed the script to bash via a filter (shell redirect and/or pipeline). You can also create a temporary file to convert and send to bash, leaving the original untouched. Finally, note that some poorly designed web servers will also convert text files "in transit" (although they are not supposed to), so you may have to deal with that even if the original file is already in LF-only style line breaks...
    • Eugenio Miró
      Eugenio Miró over 2 years
      This question and @Kusalananda's answer helped me a lot when using a bash script in a repo residing in a Windows drive from WSL. Thanks!
  • mosvy
    mosvy about 4 years
    You're overcomplicating things IMHO. Using binfmt_misc is simpler, see the example.
  • sdaau
    sdaau about 4 years
    Thanks @mosvy - had never heard about binfmt_misc before, good to know!
  • sdaau
    sdaau about 4 years
    Thanks @mosvy, that looks pretty neat!
  • icarus
    icarus almost 3 years
    This doesn't handle lines ending with `\` for line continuation.
  • C. M.
    C. M. almost 3 years
    This is a bad idea. Lots of things can break this, and this can break a lot of other things (as @icarus noted). Converting the line endings properly is the best solution.