Respect last line if it's not terminated with a new line char (\n) when using read


Solution 1

read does, in fact, read an unterminated line into the assigned variable ($REPLY by default). It also returns false on such a line, which just means "end of file"; using its return value directly in the classic while loop thus skips that last line. If you change the loop logic slightly, you can process files that lack a final newline correctly with read, without any prior sanitisation:

while read -r || [[ -n "$REPLY" ]]; do
    # your processing of $REPLY here
done < "/path/to/file"

Note that this is much faster than solutions relying on external commands, since everything runs inside the shell and no extra process is spawned.

Hat tip to Gordon Davisson for improving the loop logic.
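
As a quick check of the behaviour described in this solution, the sketch below compares the classic loop with the modified one on a file whose last line has no newline. The function names and the temporary file are purely illustrative. The loops use a named variable and `[ -n "$line" ]` rather than $REPLY and `[[ ]]`; that is an assumption made here so the sketch also runs in a plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Classic loop: stops as soon as read returns non-zero, so an
# unterminated last line is never processed.
count_classic() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# Modified loop: the body also runs when read fails but has still
# stored some text in the variable.
count_all() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

tmp=$(mktemp)
printf 'first\nsecond' > "$tmp"   # no trailing newline
echo "classic:  $(count_classic "$tmp")"
echo "modified: $(count_all "$tmp")"
rm -f "$tmp"
```

On this two-line file the classic loop should report 1 and the modified loop 2. On a properly terminated file both loops behave the same, because the final read then fails with an empty variable.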

Solution 2

POSIX defines a line as a sequence of characters terminated by a newline, so text after the last newline is not, strictly speaking, a line. But this site offers a solution to exactly the scenario you are describing. The final product is this snippet:

newline='
'
lastline=$(tail -n 1 file; echo x); lastline=${lastline%x}
[ "${lastline#"${lastline%?}"}" != "$newline" ] && echo >> file
# Now file is sane; do our normal processing here...
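
The `echo x` step in that snippet can look odd at first: it is there because command substitution strips trailing newlines, which would otherwise hide exactly the character being tested. A minimal sketch of the same check, wrapped in a hypothetical helper named `ends_with_newline` (not part of the original answer) that only inspects the file and does not modify it:

```shell
#!/bin/sh
# Hypothetical helper wrapping the check used in the snippet above.
newline='
'
ends_with_newline() {
    # $(...) strips trailing newlines, so a sentinel "x" is appended
    # inside the substitution and removed again afterwards.
    last=$(tail -n 1 "$1"; echo x)
    last=${last%x}
    # ${last%?} is everything except the final character; removing it
    # as a prefix leaves only that final character.
    [ "${last#"${last%?}"}" = "$newline" ]
}

tmp=$(mktemp)
printf 'a\nb' > "$tmp"            # no trailing newline
if ends_with_newline "$tmp"; then
    echo "terminated"
else
    echo "not terminated"
fi
rm -f "$tmp"
```

The helper returns success only when the last character of the file is a newline, so an empty file counts as "not terminated" here, just as it does in the original test.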

Solution 3

If you must use read, try this:

awk '{ print $0 }' foo | while IFS= read -r line; do
    # IFS= and -r keep leading whitespace and backslashes intact
    echo "the line is $line"
done

This works because awk still treats an unterminated final line as a record, and print writes every record followed by a newline.
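
To see the effect of the awk filter, one can count how many lines actually reach the read loop. The sketch below assumes an awk that accepts an unterminated final record, as the common implementations do; the helper name `count_via_awk` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the lines that reach the read loop
# after the input has passed through awk.
count_via_awk() {
    awk '{ print }' "$1" | {
        n=0
        while IFS= read -r line; do
            n=$((n + 1))
        done
        # Printed inside the braces: the right-hand side of a pipeline
        # usually runs in a subshell, so n is lost once it ends.
        echo "$n"
    }
}

tmp=$(mktemp)
printf 'a\nb' > "$tmp"            # no trailing newline
echo "lines seen by read: $(count_via_awk "$tmp")"
rm -f "$tmp"
```

The same subshell caveat applies to the loop in this solution: variables assigned inside `... | while read` are typically not visible after the loop finishes.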

Solution 4

This is more or less a combination of the answers given so far.

It does not modify the files in place.

(cat file; tail -c1 file | grep -qx . && echo) | while IFS= read -r line
do
    ...
done
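
The subshell in front of the pipe can also be written as a small function, which makes the test on the last byte easier to read. This is only a sketch: the name `cat_terminated` is hypothetical, and it uses `tail -c 1` with a space, which is the POSIX spelling of `tail -c1`.

```shell
#!/bin/sh
# Hypothetical helper: print a file and add a final newline to the
# output only if the file itself does not end with one.
cat_terminated() {
    cat "$1"
    # tail -c 1 prints the last byte; grep -qx . succeeds only when
    # that byte forms a line consisting of one non-newline character.
    if tail -c 1 "$1" | grep -qx .; then
        echo
    fi
}

tmp=$(mktemp)
printf 'a\nb' > "$tmp"            # no trailing newline
cat_terminated "$tmp" | while IFS= read -r line; do
    echo "the line is $line"
done
rm -f "$tmp"
```

The file on disk is left untouched; only the stream fed to the loop is completed. One possible caveat: if the file ends in a multibyte character, the last byte alone may not be matched by `.` in a UTF-8 locale, so running grep under `LC_ALL=C` may be safer in that case.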
Author by

michaelmeyer

Updated on June 23, 2022

Comments

  • michaelmeyer almost 2 years

    I have noticed for a while that read never actually reads the last line of a file if that line does not end with a "newline" character. This is understandable if one considers that, as long as there is no "newline" character in a file, it is as if it contained zero lines (which is quite difficult to accept!). See, for example, the following:

    $ echo 'foo' > bar ; wc -l bar
    1 bar
    

    But...

    $ echo -n 'bar' > foo ; wc -l foo
    0 foo
    

    The question is then: how can I handle such situations when using read to process files that I have not created or modified myself, and for which I don't know whether they actually end with a "newline" character?