Why do newline characters get lost when using command substitution?

102,687

Solution 1

The newlines were lost because the shell performed field splitting after command substitution.

From the POSIX Command Substitution section:

The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.
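Both halves of that rule can be seen in a small bash sketch, where printf is an arbitrary stand-in for a command whose output ends in several newlines: the trailing newlines vanish, while the embedded one survives inside the variable.

```shell
#!/usr/bin/env bash
# printf stands in for any command whose output ends in several newlines.
x=$(printf 'a\nb\n\n\n')

# The three trailing newlines were stripped; the embedded one was kept,
# so x now holds "a", a newline, and "b" (3 characters).
echo "${#x}"        # 3
printf '%s\n' "$x"  # prints a and b on separate lines
```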

Default IFS value (at least in bash):

$ printf '%q\n' "$IFS"
$' \t\n'

In your case, you neither change IFS nor use double quotes, so the newline characters are eliminated during field splitting.
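A minimal bash sketch of that splitting step, assuming printf as a stand-in for cat links.txt. Counting the arguments produced by set -- shows that the unquoted expansion becomes three fields, while the quoted one stays a single field with its newlines intact.

```shell
#!/usr/bin/env bash
# Stand-in for $(cat links.txt): three lines of output.
a=$(printf 'link1\nlink2\nlink3\n')

# Unquoted: field splitting on the default IFS (space, tab, newline)
# produces three separate arguments.
set -- $a
unquoted=$#

# Quoted: no field splitting, so the whole value is one argument.
set -- "$a"
quoted=$#

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 3, quoted: 1
```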

You can preserve newlines, for example by setting IFS to empty:

$ IFS=
$ a=$(cat links.txt)
$ echo "$a"
link1
link2
link3
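In the example above, the double quotes around "$a" already protect the newlines, and a plain assignment like a=$(...) is not field-split; the empty IFS is what matters for unquoted expansions. A bash sketch (printf again standing in for cat) contrasting echo $a under the default and the empty IFS:

```shell
#!/usr/bin/env bash
a=$(printf 'link1\nlink2\nlink3\n')

# Default IFS: the unquoted $a is split into three words,
# and echo joins its arguments with single spaces.
joined=$(echo $a)

# Empty IFS: field splitting is disabled, so the unquoted $a
# reaches echo as one word that still contains the newlines.
IFS=
kept=$(echo $a)
unset IFS   # back to the default splitting behaviour

printf '%s\n' "$joined"   # link1 link2 link3
printf '%s\n' "$kept"     # link1, link2 and link3 on separate lines
```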

Solution 2

Newlines get lost at some points because they are special characters: with the default IFS, they act as field delimiters. To keep them, make sure every expansion is quoted, so the shell never splits on them:

$ a="$(cat links.txt)"
$ echo "$a"
link1
link2
link3

Since I quoted the data every time I expanded it, the shell never performed field splitting on it, so the newline characters (\n) were preserved. If you forget the quotes at any point, these special characters will be lost.

The same behaviour occurs if your loop processes lines containing spaces. For instance, given the following file...

mypath1/file with spaces.txt
mypath2/filewithoutspaces.txt

The output will depend on whether or not you use quotes:

$ for i in $(cat links.txt); do echo $i; done
mypath1/file
with
spaces.txt
mypath2/filewithoutspaces.txt

$ for i in "$(cat links.txt)"; do echo "$i"; done
mypath1/file with spaces.txt
mypath2/filewithoutspaces.txt
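Be aware that the quoted loop only looks right: because "$(cat links.txt)" is a single word, the loop body runs once and echo prints the whole file in one go. A bash sketch that counts iterations, assuming a temporary file created with mktemp as a stand-in for links.txt:

```shell
#!/usr/bin/env bash
links=$(mktemp)   # stand-in for links.txt
printf '%s\n' 'mypath1/file with spaces.txt' 'mypath2/filewithoutspaces.txt' > "$links"

# Unquoted: split on spaces and newlines, giving 4 iterations.
unquoted=0
for i in $(cat "$links"); do unquoted=$((unquoted + 1)); done

# Quoted: no splitting at all; the whole file is ONE word, 1 iteration.
quoted=0
for i in "$(cat "$links")"; do quoted=$((quoted + 1)); done

rm -f "$links"
echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 4, quoted: 1
```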

Now, if you don't want to use quotes, there is a special shell variable, IFS, that holds the characters the shell uses as field separators. If you set it to the newline character alone, spaces no longer split your lines, which gets rid of most problems.

$ IFS=$'\n'; for i in $(cat links.txt); do echo $i; done
mypath1/file with spaces.txt
mypath2/filewithoutspaces.txt
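Two caveats with that one-liner: the IFS change stays in effect for the rest of the shell session, and unquoted expansions are still subject to filename globbing. A bash sketch that confines both settings to the subshell of a command substitution; the temporary file stands in for links.txt, its second line is a hypothetical path containing *, and IFS=$'\n' relies on bash's ANSI-C quoting.

```shell
#!/usr/bin/env bash
links=$(mktemp)   # stand-in for links.txt
printf '%s\n' 'mypath1/file with spaces.txt' 'mypath2/*.txt' > "$links"

# Command substitution runs in a subshell, so set -f and the IFS
# change do not leak into the calling shell.
lines=$(
  set -f        # disable pathname expansion so the asterisk stays literal
  IFS=$'\n'     # split on newlines only
  for i in $(cat "$links"); do
    printf '[%s]\n' "$i"
  done
)
rm -f "$links"

printf '%s\n' "$lines"
# [mypath1/file with spaces.txt]
# [mypath2/*.txt]
```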

For the sake of completeness, here is another approach that does not rely on command substitution. I later found out that most users consider this method more reliable, because of the way the read utility behaves.

$ cat links.txt | while IFS= read -r i; do echo "$i"; done

Here is an excerpt from read's man page:

The read utility shall read a single line from standard input.

Since read gets its input one line at a time, a space inside a line won't split it into separate iterations. Just pipe the output of cat into it, and it will iterate over your lines.
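How the line is read still matters. With a plain read, leading whitespace is trimmed and backslashes are treated as escapes; IFS= read -r stores the line exactly as written. A bash sketch with two made-up sample lines:

```shell
#!/usr/bin/env bash
# Two sample lines: one with leading spaces, one with backslashes.
input='  indented line
path\with\backslashes'

# Plain read: leading whitespace is trimmed and backslashes are
# treated as escape characters and removed.
plain=$(printf '%s\n' "$input" | while read i; do printf '[%s]\n' "$i"; done)

# IFS= read -r: the line is stored exactly as written.
raw=$(printf '%s\n' "$input" | while IFS= read -r i; do printf '[%s]\n' "$i"; done)

printf '%s\n' "$plain"
# [indented line]
# [pathwithbackslashes]
printf '%s\n' "$raw"
# [  indented line]
# [path\with\backslashes]
```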

Edit: I can see from other answers and comments that people object to the use of cat here. As jasonwryan said in his comment, a better way to read a file in the shell is stream redirection (<), as you can see in val0x00ff's answer here. However, since the question isn't "how do I read/process a file in shell programming", my answer focuses on the quoting behaviour rather than the rest.
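One practical reason to prefer redirection: in bash (by default) every part of a pipeline runs in a subshell, so variables set inside a cat ... | while loop are lost when the loop ends. A sketch with a hypothetical line counter, using a temporary file as a stand-in for links.txt:

```shell
#!/usr/bin/env bash
links=$(mktemp)   # stand-in for links.txt
printf '%s\n' link1 link2 > "$links"

# Piped: the while loop runs in a subshell, so its increments are discarded.
piped=0
cat "$links" | while IFS= read -r line; do piped=$((piped + 1)); done

# Redirected: the loop runs in the current shell, so the count survives.
redirected=0
while IFS= read -r line; do redirected=$((redirected + 1)); done < "$links"

rm -f "$links"
echo "piped: $piped, redirected: $redirected"   # piped: 0, redirected: 2
```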

Solution 3

To emphasize: for loops iterate over words. If your file is:

one two
three four

Then this will emit four lines:

for word in $(cat file); do echo "$word"; done

To iterate over the lines of a file, do this:

while IFS= read -r line; do
    # do something with "$line" <-- quoted almost always
done < file
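The difference can be checked by counting iterations. A bash sketch, assuming a temporary file as a stand-in for the two-line example file:

```shell
#!/usr/bin/env bash
file=$(mktemp)   # stand-in for the two-line example file
printf '%s\n' 'one two' 'three four' > "$file"

# for iterates over words: four iterations.
words=0
for word in $(cat "$file"); do words=$((words + 1)); done

# while IFS= read -r iterates over lines: two iterations.
lines=0
while IFS= read -r line; do lines=$((lines + 1)); done < "$file"

rm -f "$file"
echo "words: $words, lines: $lines"   # words: 4, lines: 2
```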

Solution 4

You can use bash's read builtin, or look at mapfile.

while IFS= read -r link; do
    printf '%s\n' "$link"
done < links.txt

Or using mapfile

mapfile -t myarray < links.txt
for link in "${myarray[@]}"; do printf '%s\n' "$link"; done
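mapfile (also available as readarray) is a bash builtin introduced in bash 4.0, so it is missing from older shells such as bash 3.2. A sketch checking that -t strips the newline from each element and that a line containing spaces stays one element; the temporary file stands in for links.txt:

```shell
#!/usr/bin/env bash
links=$(mktemp)   # stand-in for links.txt
printf '%s\n' link1 'link 2 with spaces' link3 > "$links"

# -t removes the trailing newline from each line read into the array.
mapfile -t myarray < "$links"
rm -f "$links"

echo "${#myarray[@]}"          # 3
printf '[%s]\n' "${myarray[@]}"
# [link1]
# [link 2 with spaces]
# [link3]
```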
Author: user3138373

Updated on September 18, 2022

Comments

  • user3138373, almost 2 years

    I have a text file named links.txt which looks like this:

    link1
    link2
    link3
    

    I want to loop through this file line by line and perform an operation on every line. I know I can do this using a while loop, but since I am learning, I thought I would use a for loop. I used command substitution like this:

    a=$(cat links.txt)
    

    Then I used the loop like this:

    for i in $a; do ###something###;done
    

    I can also do something like this:

    for i in $(cat links.txt); do ###something###; done
    

    My question is: when I store the output of cat in the variable a, the newline characters between link1, link2 and link3 are removed and replaced by spaces:

    echo $a
    

    outputs

    link1 link2 link3

    Then I used the for loop. Is a newline always replaced by a space when we do command substitution?

    Regards

    • jasonwryan, over 9 years
      See Bash FAQ 001...
    • Angel Todorov, over 9 years
      Unquoted variables are subject to word splitting and filename expansion
    • G-Man Says 'Reinstate Monica', about 7 years
      If you look closely, you'll see that this question is not a duplicate. This question is about the newlines between the lines of output from the command (i.e., at the ends of lines 1 through n − 1). That question, as its title suggests, is about the newline at the end of the output from the command (i.e., at the end of the last line).
  • user3138373, over 9 years
    Also, let's say I am not using quotes. When I apply the for loop, is it implicit that the variable i holds the first file name until it reaches a space, which tells it that the first file name has ended?
  • cuonglm, over 9 years
    I think the statement "The newlines are replaced with spaces because that's how echo works" is wrong.
  • mikeserv, over 9 years
    @cuonglm - it could be clearer, but the newlines are replaced with field delimiters, and echo replaces the field delimiters with spaces - it concatenates its arguments on spaces. That's how echo works.
  • mikeserv, over 9 years
    for loops iterate over arguments. If you do IFS=\n; for word in $(cat file); do echo "$word"; done you'll get two loops and two lines printed. $IFS applies globally all of the time in much the same way as it does to read - except that the read/newline relationship is pretty special.
  • Marek Zakrzewski, over 9 years
    With all due respect to John WH Smith, I'm not sure who is upvoting the answer. for i in $(cat ..) is wrong. See jasonwryan's comment: that is how you read lines from a file. cat(1) is used to concatenate multiple files together. It should NOT be used to feed file data to processes. There are far better ways to achieve this. The application might take a file as argument (eg. grep ^foo file); or you might want to use file redirection (eg. read line < file).
  • mikeserv, over 9 years
    @val0x00ff - it's not wrong because you say it is, certainly. What is wrong about it?
  • yorkshiredev, over 9 years
    @val0x00ff I used cat because that is what the OP was using in his question ;) The question isn't really about "how to read a file", but "why are newlines lost". As far as I'm concerned, I would always use read, which is why I edited my answer afterwards to add this solution. I understand that cat shouldn't be used to read a single file, but since it isn't the main topic, I didn't spend too much time on it.
  • Marek Zakrzewski, over 9 years
    I upvoted the second explanation, the while IFS= read loop, because it shows yet another way of feeding lines from a file. Again, @mikeserv, for word in $(cat file) is wrong and should not be used in bash scripts or in any other form. Let me emphasise once again: never do this: for x in $(command) or `command` or $var. for-in is used for iterating over arguments, not (output) strings. Instead, use a glob (eg. *.txt), arrays (eg. "${names[@]}") or a while-read loop (eg. while read -r line). See mywiki.wooledge.org/BashPitfalls#pf1 and mywiki.wooledge.org/DontReadLinesWithFor
  • mikeserv, over 9 years
    @val0x00ff - I don't think you understand - $IFS is about arguments. Specifically, $IFS splits fields into arguments - that's its job. There are potential problems with that approach - but they are handled as easily as set -f; IFS=$delimiter - that's all you need do. For example, you could do the very slow while read -r line thing or you could do set -f; IFS=\n; set -- $(cat file). If you did that you'd get an array of the file's non-blank lines, each intact, in $1 $2 $3... "$@". The wooledge wiki is typically an awful source of information - you should try to wean off of it.
  • geirha, over 9 years
    @mikeserv - it treats data as code, which is generally considered wrong in any language. Or, any language except bash, apparently.
  • mikeserv, over 9 years
    @geirha - this is not a true statement at all. It delimits fields on specified delimiters. If it is such an unpopular behavior, how is it awk is so ubiquitous? From the POSIX rationale: If the IFS variable is unset or is <space> <tab> <newline>, the operation is equivalent to the way the System V shell splits words. Using characters outside the \s \n \t set yields the KornShell behavior, where each of the non- \s \n \t is significant. This behavior .. was taken from the way the original awk handled field splitting.
  • geirha, over 9 years
    @mikeserv - "Take the data in this file and split it into words based on the characters in IFS, then for each of those words that happen to contain glob characters, attempt to replace those words with matching filenames". That certainly doesn't sound like treating data as data.
  • mikeserv, over 9 years
    Yes - @geirha - globbing is a problem. That is a very excellent point. This is why the shell offers the set -f option. You can either expand filenames with set +f or not with set -f. I specifically address that in my own answer here. And, as far as I can tell, it's the only one here that mentions it.
  • Angel Todorov, over 9 years
    To properly set IFS to a newline, use ANSI-C quoting: IFS=$'\n' -- this (IFS=\n) sets IFS to the letter "n"
  • Mike Q, over 5 years
    IFS=$'\n' is needed to loop properly; the quotes alone will not iterate line by line without it (bash 4+).
  • Oly Dungey, almost 5 years
    Note: If you use printf instead of echo you avoid the IFS issue entirely
  • cuonglm, almost 5 years
    @OliverDungey it's not about echo or printf; it's about double-quoting "$a". The original question uses a for loop, and that's where field splitting occurs after command substitution.