Read a file and copy to another file

Solution 1

Quote your variable:

echo "$line" >> /tmp/dest_file

Solution 2

This has come up several times already on this site — see Understanding IFS and the linked questions. In this answer, I'm going to summarize what can go wrong and how to avoid it; see the linked threads for details.

read line performs the following actions:

  1. Read from standard input up to the first byte that is either a newline or null, and put the data in the variable called line.
  2. Strip off any backslash that is not at the end of the line. A double backslash \\ becomes a single backslash. In other words, backslash quotes the next character as long as it isn't a newline.
  3. If read stopped at a newline and the character at the end of the line is a \, strip the backslash-newline sequence and continue reading, appending to the variable line. Repeat until the first of: a newline that is not preceded by a backslash; a null byte; the end of the input.
  4. Strip the longest suffix of line that is made of characters in $IFS. By default, IFS contains a tab, a space and a newline, so this strips ASCII whitespace from the end of the value of line.
  5. Strip the longest prefix of line that is made of whitespace characters in $IFS.

For example, if the input is

 : hello\
world: :
wibble

then read line results in line containing : helloworld: : (no initial space) with the default value of IFS. If IFS has been changed to : (just a colon) then read line results in  : helloworld:  (with a space at the beginning and at the end). If IFS contains both : and a space then the result is : helloworld (no initial or trailing space).
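The default-IFS case can be checked with a short sketch. The helper name show_read is made up for this illustration, and the brackets are only there to make any leading or trailing whitespace visible.

```shell
# show_read is a hypothetical helper: it reads one logical line from stdin
# with a plain `read` (default IFS, no -r) and prints it between brackets.
show_read() {
  read line
  printf '[%s]\n' "$line"
}

# The same three lines of input as in the example: the first line starts
# with a space and ends with a backslash.
input=' : hello\
world: :
wibble'

default_result=$(printf '%s\n' "$input" | show_read)
printf '%s\n' "$default_result"
```

The backslash-newline joins the first two lines and the leading space is dropped, while wibble is left unread for a later read call.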

To avoid the influence of IFS, set it to an empty value (note that this is different from unsetting it). You can set it only for the read command by writing IFS= read (see Why is `while IFS= read` used so often, instead of `IFS=; while read..`?).

To avoid backslash processing, pass the -r option to read.
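As a rough illustration of the two settings together, the sketch below feeds the same line to a plain read and to IFS= read -r. The helper name read_verbatim and the sample string are assumptions chosen for the demonstration.

```shell
# read_verbatim is a hypothetical helper. IFS= (empty, for this one command)
# turns off whitespace stripping; -r turns off backslash processing.
read_verbatim() {
  IFS= read -r line
  printf '[%s]\n' "$line"
}

# Two leading spaces, a literal backslash followed by t, two trailing spaces.
sample='  a\tb  '

raw_result=$(printf '%s\n' "$sample" | read_verbatim)
plain_result=$(printf '%s\n' "$sample" | { read line; printf '[%s]\n' "$line"; })

printf 'IFS= read -r: %s\n' "$raw_result"
printf 'plain read:   %s\n' "$plain_result"
```

The plain read loses both the surrounding spaces and the backslash; the IFS= read -r version keeps the line byte for byte.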

Unless the shell is zsh, if there is a null byte in the input, then subsequent characters are lost. Shells are not designed to read binary data.

Thus the idiom for reading one line at a time is:

while IFS= read -r line; do
  … # process "$line"
done
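Applied to the copy task from the question, the idiom might look like the sketch below. The function name copy_lines and the scratch files are assumptions for the demonstration, not part of the original script, and the sketch assumes mktemp and cmp are available.

```shell
# copy_lines is a hypothetical name: $1 is the source file, $2 the destination.
copy_lines() {
  while IFS= read -r line; do
    printf '%s\n' "$line"
  done < "$1" > "$2"
}

# Scratch input with leading spaces, a run of spaces, a tab and a backslash.
src=$(mktemp)
dest=$(mktemp)
printf '  Testing,      resuming text\n\tindented \\n field\n' > "$src"

copy_lines "$src" "$dest"

# cmp -s is silent and only sets the exit status.
if cmp -s "$src" "$dest"; then
  copy_status=identical
else
  copy_status=different
fi
printf 'copy is %s\n' "$copy_status"
rm -f "$src" "$dest"
```

Redirecting the whole loop once, rather than appending inside it, also removes the need to delete the destination file first.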

When you use the variable line, always put double quotes around the variable substitution: "$line". Without double quotes, the shell first expands the value of the variable, then breaks that value into separate words wherever it contains characters from IFS, and then interprets every word as a wildcard pattern and replaces it with the list of matching files (if there are no matching files, the pattern is left as is). So echo 'a* b*' | { IFS= read -r line; echo $line; } prints the list of files in the current directory beginning with a or b; to get the input unchanged, use echo 'a* b*' | { IFS= read -r line; echo "$line"; }. (The braces keep read and echo in the same process: in shells such as bash, each part of a pipeline runs in a subshell, so a variable set by read on its own would be gone by the time a later echo runs.)
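A small sketch of both effects, word splitting and wildcard expansion, in an sh-style shell such as bash (zsh does not split or glob unquoted variables by default). The scratch directory and the file names apple and banana are assumptions chosen for the demonstration.

```shell
# Word splitting: the run of spaces from the question collapses when unquoted.
line='Testing,      resuming text'
unquoted=$(echo $line)             # split into three words, rejoined with single spaces
quoted=$(printf '%s\n' "$line")    # kept as one word, spaces intact

# Wildcard expansion: run in a scratch directory so the matches are known.
scratch=$(mktemp -d)
touch "$scratch/apple" "$scratch/banana"
pattern='a* b*'
globbed=$(cd "$scratch" && echo $pattern)     # each word is expanded as a pattern
literal=$(cd "$scratch" && echo "$pattern")   # the pattern is printed unchanged
rm -r "$scratch"

printf '%s\n' "$unquoted" "$quoted" "$globbed" "$literal"
```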

Note also that the echo command sometimes modifies the string it prints. The exact way depends on the shell. Some shells process backslash escapes, and some shells recognize options. Using echo to output a string verbatim is only sure to work if you know that the string does not contain any backslash and does not start with a dash (-). A reliable and portable way of printing a string as is:

printf '%s\n' "$line"

This prints a newline after the string, like echo. You can omit the newline by omitting \n in the command above.
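For instance, a string that starts with a dash and contains a backslash is exactly the kind of value that echo may alter. The sketch below, with a sample value made up for the illustration, prints it through printf in both forms.

```shell
# A value that looks like an echo option and contains a backslash sequence.
line='-n a\tb'

# '%s' prints the argument as is: no option parsing, no escape processing.
with_newline=$(printf '%s\n' "$line")
without_newline=$(printf '%s' "$line")

printf '%s\n' "$line"
```

Only the format string is interpreted for escapes; the argument that fills %s is passed through untouched.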

Author by

BitsOfNix

Working mainly with Solaris and Linux. System administration!

Updated on September 18, 2022

Comments

  • BitsOfNix
    BitsOfNix over 1 year

    I'm having an issue: I need to copy the file contents and remove a couple of lines if they match the output from a previous command. But so far, I'm having trouble keeping the file lines exactly the same. I'm posting only the simple part of the script, since the if that omits lines from the copy is not part of the problem; this happens with unaffected lines too.

    For the example:

    In the original file I have the following

    Testing,      resuming text
    

    When I run the script the fields become:

    Testing, resuming text
    

    I'm doing the following:

    #!/usr/bin/bash
    rm /tmp/dest_file
    while read line
    do
       echo $line >> /tmp/dest_file
    done < $1
    

    The problem I have with this is that the files will become different due to the tab formatted fields.

  • BitsOfNix
    BitsOfNix over 11 years
    Thanks a lot! I wasn't aware that quoting is also valid for variables.
  • Angel Todorov
    Angel Todorov over 11 years
    quoting is especially important for variables. Always quote variables, unless you specifically want the word-splitting effect you've seen here.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 11 years
    This is necessary but not sufficient. The read call also strips leading and trailing whitespace, and processes backslash escapes.
  • Guru
    Guru over 11 years
    @Giles : +1, I am a big fan of your explanation, thank you ...
  • BitsOfNix
    BitsOfNix over 11 years
    Though I searched the site and googled it, I did not find what I was looking for. Either way, thank you for your explanation and for the link to the explanation of IFS. It is now saved for future reference.