Handling Bash script with CRLF (carriage return) in Linux as in MSYS2?
Solution 1
As far as I’m aware, there’s no way to tell Bash to accept Windows-style line endings.
In situations involving Windows, common practice is to rely on Git’s ability to automatically convert line endings when committing, using the core.autocrlf configuration flag. See for example GitHub’s documentation on line endings, which isn’t specific to GitHub. That way files are committed with Unix-style line endings in the repository, and converted as appropriate for each client platform.
(The opposite problem isn’t an issue: MSYS2 works fine with Unix-style line endings, on Windows.)
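As a rough illustration of that conversion, here is a minimal sketch that uses a throwaway repository. It relies on a .gitattributes rule (`text eol=lf`) rather than the core.autocrlf flag itself; that choice, the temporary directory, and the demo identity are assumptions made for the example, not something the answer prescribes.

```shell
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo" && git init -q .

# Ask Git to store *.sh files with LF and to check them out with LF everywhere.
printf '*.sh text eol=lf\n' > .gitattributes

# Commit a script that was saved with CRLF line endings.
printf 'echo "testing"\r\nstat .\r\n' > tmp.sh
git add .gitattributes tmp.sh 2>/dev/null
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'Add script'

# Count carriage-return bytes in the committed blob.
cr_count=$(git show HEAD:tmp.sh | tr -cd '\r' | wc -c)
echo "CR bytes stored in the repository: $cr_count"
```

Because the rule lives in the repository, it applies to every clone without each contributor having to change their own Git configuration.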
Solution 2
You should use binfmt_misc for that [1].
First, define a magic that handles files which start with #! /bin/bash<CR><LF>, then create an executable interpreter for it. The interpreter can be another script:
INTERP=/path/to/bash-crlf
echo ",bash-crlf,M,,#! /bin/bash\x0d\x0a,,$INTERP," > /proc/sys/fs/binfmt_misc/register
cat > "$INTERP" <<'EOT'; chmod 755 "$INTERP"
#! /bin/bash
script=$1; shift; exec bash <(sed 's/\r$//' "$script") "$@"
EOT
Test it:
$ printf '%s\r\n' '#! /bin/bash' pwd >/tmp/foo; chmod 755 /tmp/foo
$ cat -v /tmp/foo
#! /bin/bash^M
pwd^M
$ /tmp/foo
/tmp
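To check whether the registration is in place, the entry can be read back from the binfmt_misc filesystem. This is a small sketch that assumes binfmt_misc is mounted at /proc/sys/fs/binfmt_misc and that the entry is called bash-crlf as in the register line above; the function name is hypothetical.

```shell
# Entry name matches the "bash-crlf" name used in the register line.
entry=/proc/sys/fs/binfmt_misc/bash-crlf

# Hypothetical helper: print the entry if it exists, or say that it does not.
binfmt_status() {
    if [ -e "$entry" ]; then
        cat "$entry"
    else
        echo "bash-crlf is not registered"
    fi
}

binfmt_status

# To remove the entry again (as root):
#   echo -1 > /proc/sys/fs/binfmt_misc/bash-crlf
```

Entries written to the register file do not survive a reboot, so a permanent setup needs to repeat the registration at boot time.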
The sample interpreter has two problems: 1. since it passes the script via a non-seekable file (a pipe), bash will read it byte by byte, very inefficiently, and 2. any error messages will refer to /dev/fd/63 or similar instead of the name of the original script.
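One way to sidestep the first problem is to convert into a regular temporary file, which is seekable, and run that instead. The following is only a sketch under that assumption: run_crlf_script is a hypothetical name, and error messages will now name the temporary file, so the second problem is not solved.

```shell
# Hypothetical helper: run a CRLF script through a seekable temporary copy.
# The body is a subshell, so its variables and its exit do not leak out.
run_crlf_script() (
    script=$1; shift
    tmp=$(mktemp) || exit 1
    # Strip the trailing carriage return from every line.
    sed 's/\r$//' "$script" > "$tmp"
    bash "$tmp" "$@"
    status=$?
    # Clean up, then report the script's own exit status to the caller.
    rm -f "$tmp"
    exit "$status"
)
```

Usage would look like `run_crlf_script tmp.sh arg1 arg2`. Unlike the exec-based interpreter, this keeps a parent shell around so that it can delete the temporary file afterwards.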
[1] Of course, instead of using binfmt_misc you can just create a /bin/bash^M symbolic link to the interpreter, which would also work on other systems like OpenBSD:
ln -s /path/to/bash-crlf $'/bin/bash\r'
But on Linux, shebanged executables have no advantage over binfmt_misc, and putting garbage inside system directories is not the right strategy, and will leave any sysadmin shaking his or her head ;-)
sdaau
Updated on September 18, 2022
Comments
-
sdaau over 1 year
Let's say I have the following trivial script, tmp.sh:
echo "testing"
stat .
echo "testing again"
Trivial as it is, it has \r\n (that is, CRLF, that is carriage return+line feed) as line endings. Since the webpage will not preserve the line endings, here is a hexdump:
$ hexdump -C tmp.sh
00000000  65 63 68 6f 20 22 74 65  73 74 69 6e 67 22 0d 0a  |echo "testing"..|
00000010  73 74 61 74 20 2e 0d 0a  65 63 68 6f 20 22 74 65  |stat ...echo "te|
00000020  73 74 69 6e 67 20 61 67  61 69 6e 22 0d 0a        |sting again"..|
0000002e
Now, it has CRLF line endings, because the script was started and developed on Windows, under MSYS2. So, when I run it on Windows 10 in MSYS2, I get the expected:
$ bash tmp.sh
testing
  File: .
  Size: 0               Blocks: 40         IO Block: 65536  directory
Device: 8e8b98b6h/2391513270d   Inode: 281474976761067  Links: 1
Access: (0755/drwxr-xr-x)  Uid: (197609/    USER)   Gid: (197121/    None)
Access: 2020-04-03 10:42:53.210292000 +0200
Modify: 2020-04-03 10:42:53.210292000 +0200
Change: 2020-04-03 10:42:53.210292000 +0200
 Birth: 2019-02-07 13:22:11.496069300 +0100
testing again
However, if I copy this script to an Ubuntu 18.04 machine, and run it there, I get something else:
$ bash tmp.sh
testing
stat: cannot stat '.'$'\r': No such file or directory
testing again
In other scripts with the same line endings, I have also gotten this error in Ubuntu bash:
line 6: $'\r': command not found
... likely from an empty line.
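Both symptoms can be reproduced without the original file. This is a minimal sketch that assumes GNU coreutils and an English (C) locale for the messages; the temporary file stands in for a real script.

```shell
# Build a three-line CRLF script: a command, an "empty" line, and stat.
demo=$(mktemp)
printf 'echo "testing"\r\n\r\nstat .\r\n' > "$demo"

# Capture only stderr; stdout is discarded. LC_ALL=C keeps messages in English.
# The script's last command fails, hence the "|| true".
errors=$(LC_ALL=C bash "$demo" 2>&1 >/dev/null) || true
printf '%s\n' "$errors"

rm -f "$demo"
```

The second line of the script contains nothing but a carriage return, which Bash tries to run as a command name; on the third line the carriage return becomes part of the argument handed to stat.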
So, clearly, something in Ubuntu chokes on the carriage returns. I have seen BASH and Carriage Return Behavior:
it doesn’t have anything to do with Bash: \r and \n are interpreted by the terminal, not by Bash
... however, I guess that is only for stuff typed verbatim on the command line; here the \r and \n are already typed in the script itself, so it must be that Bash interprets the \r here.
Here is the version of Bash in Ubuntu:
$ bash --version
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
... and here the version of Bash in MSYS2:
$ bash --version
GNU bash, version 4.4.23(2)-release (x86_64-pc-msys)
(they don't seem all that much apart ...)
Anyways, my question is - is there a way to persuade Bash on Ubuntu/Linux to ignore the \r, rather than trying to interpret it as a (so to speak) "printable character" (in this case, meaning a character that could be a part of a valid command, which bash interprets as such)?
EDIT: without having to convert the script itself (so it remains the same, with CRLF line endings, if it is checked in that way, say, in git)
EDIT2: I would prefer it this way, because other people I work with might reopen the script in a Windows text editor, potentially reintroduce \r\n again into the script and commit it; and then we might end up with an endless stream of commits which might be nothing else than conversions of \r\n to \n polluting the repository.
EDIT2: @Kusalananda in comments mentioned dos2unix (sudo apt install dos2unix); note that just writing this:
$ dos2unix tmp.sh
dos2unix: converting file tmp.sh to Unix format...
... will convert the file in-place; to have it output to stdout, one must set up stdin redirection:
$ dos2unix <tmp.sh | hexdump -C
00000000  65 63 68 6f 20 22 74 65  73 74 69 6e 67 22 0a 73  |echo "testing".s|
00000010  74 61 74 20 2e 0a 65 63  68 6f 20 22 74 65 73 74  |tat ..echo "test|
00000020  69 6e 67 20 61 67 61 69  6e 22 0a                 |ing again".|
0000002b
... and then, in principle, one could run this on Ubuntu, which seems to work in this case:
$ dos2unix <tmp.sh | bash
testing
  File: .
  Size: 20480           Blocks: 40         IO Block: 4096   directory
Device: 816h/2070d      Inode: 1572865     Links: 27
Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-04-03 11:11:00.309160050 +0200
Modify: 2020-04-03 11:10:58.349139481 +0200
Change: 2020-04-03 11:10:58.349139481 +0200
 Birth: -
testing again
However, aside from the slightly messy command to remember, this also changes bash semantics, as stdin is no longer a terminal (bash now reads the script itself from stdin); this may have worked with this trivial example, but see e.g. https://stackoverflow.com/questions/23257247/pipe-a-script-into-bash for an example of bigger problems.
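One possible way to keep stdin available to the script is to hand the converted text to bash as a command string instead of piping it. This is only a sketch: the temporary script is hypothetical, sed stands in for dos2unix, and a very large script could run into the system's argument-length limit.

```shell
# Hypothetical CRLF script that reads one line from its standard input.
script=$(mktemp)
printf 'read -r line\r\necho "got: $line"\r\n' > "$script"

# Pass the converted text via -c; the argument after it becomes $0.
# Standard input is left alone, so the script's own "read" sees the pipe.
result=$(echo "from stdin" | bash -c "$(sed 's/\r$//' "$script")" "$script")
echo "$result"

rm -f "$script"
```

The file on disk is never modified, which matches the constraint in the first EDIT.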
-
Kusalananda about 4 years
Yes, convert the file to a Unix text file with dos2unix.
-
sdaau about 4 years
Thanks @Kusalananda - that just reminded me to add an edit, because I specifically do not want to change the file itself, nor its CRLF line endings.
-
sdaau about 4 years
Thanks @StephenKitt - I'm aware that MSYS2 will handle usual \n, but the problem is if I work on a repository with Windows people who otherwise don't care, the repository will end up being polluted with commits that are a constant change of line endings, which I want to avoid (added edits to OP).
-
sdaau about 4 years
Also, for those wondering how can MSYS2 bash handle both \n and \r\n as line endings, it turns out, it is not trivial at all - see 0005-bash-4.3-msys2-fix-lineendings.patch for all the gory details.
-
C. M. almost 3 years
As a side note, you do not need to convert the file to Unix-style line endings on disk: You can feed the script to bash via a filter (shell redirect and/or pipeline). You can also create a temporary file to convert and send to bash, leaving the original untouched. Finally, note that some poorly designed web servers will also convert text files "in transit" (although they are not supposed to), so you may have to deal with that even if the original file is already in LF-only style line breaks...
-
Eugenio Miró over 2 years
This question and @Kusalananda's answer helped me a lot when using a bash script in a repo residing in a Windows drive from WSL. Thanks!
-
-
mosvy about 4 years
You're overcomplicating yourself IMHO. Using binfmt_misc is simpler, see the example.
-
sdaau about 4 years
Thanks @mosvy - had never heard about binfmt_misc before, good to know!
-
sdaau about 4 years
Thanks @mosvy, that looks pretty neat!
-
icarus almost 3 years
This doesn't handle lines ending with `\` for line continuation.
-
C. M. almost 3 years
This is a bad idea. Lots of things can break this, and this can break a lot of other things (such as @icarus noted). The solution(s) of converting line endings properly is the best way.