"head" only printing one line?


Solution 1

I think it is related to line endings. Excel may save files with carriage-return/line-feed (CRLF) endings, but head expects line feeds only.

What output does this display: tr -d '\r' < messy.csv | head -10

If it displays the 10 lines correctly, that's your answer.

The file command can tell you the line endings of many text files (it prints something like ..., with CRLF line terminators), but not of all of them. I believe it omits this when it recognises the file as something more specific, e.g. HTML.
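When file does not say, you can count the bytes yourself. Below is a minimal sketch, assuming a POSIX shell with tr and wc. It counts the raw CR and LF bytes and guesses the line-ending style from them. The helper name report_line_endings is hypothetical, and the classification is only a heuristic: equal CR and LF counts suggest CRLF endings but do not prove them.

```shell
#!/bin/sh
# Sketch: guess a file's line-ending style by counting raw CR and LF bytes.
# report_line_endings is a hypothetical helper, not a standard command.
report_line_endings() {
    f=$1
    # tr -cd deletes every byte except the one listed; wc -c counts the rest.
    # The final tr strips the padding some wc implementations add.
    cr=$(tr -cd '\r' < "$f" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$f" | wc -c | tr -d ' ')
    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        kind="no line terminators"
    elif [ "$cr" -eq 0 ]; then
        kind="LF only (Unix)"
    elif [ "$lf" -eq 0 ]; then
        kind="CR only (classic Mac)"
    elif [ "$cr" -eq "$lf" ]; then
        # Equal counts are consistent with CRLF, but do not prove it.
        kind="probably CRLF (Windows)"
    else
        kind="mixed"
    fi
    printf '%s: CR=%s LF=%s, %s\n' "$f" "$cr" "$lf" "$kind"
}
```

Usage: report_line_endings messy.csv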

Solution 2

You have \r only as the end-of-line character for lines 2 onwards (up to line 10 at least). Line 1 has \n as its end-of-line character. For example:

printf 'ABC\nXYZ\r123\r' | head

Output (on the screen):

ABC

This is a display artifact of terminal output. Each \r moves the cursor back to the start of the line, so the next line overwrites it, and the last line is overwritten, fully or partially, by the terminal prompt.

When the last \r-delimited line is longer than the prompt, the part beyond the end of the prompt stays visible. In the following sample output, the terminal prompt is just nn $ (5 characters), where nn is the number of the command issued:

72 $ printf 'ABC\nXYZ\rabcdefghijklmnop\r' 
ABC
73 $ fghijklmnop
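A minimal sketch to reproduce the effect, assuming a POSIX shell. The first pipeline shows what the terminal displays. The second translates each \r into \n before display, so every line becomes visible. The function name sample is made up for this illustration.

```shell
#!/bin/sh
# sample is a hypothetical helper that emits the example bytes from above.
sample() { printf 'ABC\nXYZ\rabcdefghijklmnop\r'; }

# On a terminal, each \r rewinds the cursor, so later text overwrites earlier text.
sample | head

# Translating every \r into \n before display reveals all three lines.
sample | head | tr '\r' '\n'
```

Piping the output into tr (or od) rather than straight to the terminal is what makes the hidden lines visible. The bytes were in the output all along.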

To fix it

sed -i.bak 's/\r$//; s/\r/\n/g' file

The -i.bak option updates the input file in place and keeps a backup named file.bak. If you don't want a backup, just use -i.
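The sed command above relies on GNU sed, which understands \r in the pattern and \n in the replacement. Other sed implementations (for example the BSD sed shipped with macOS) may not. Here is a minimal sketch of the same two substitutions in POSIX awk, assuming awk is available. to_unix_endings is a hypothetical helper that writes to standard output instead of editing the file in place.

```shell
#!/bin/sh
# Sketch: the sed fix rewritten for POSIX awk. Each awk record is one
# LF-terminated line (or the whole file if it contains no LF at all).
# to_unix_endings is a hypothetical helper name.
to_unix_endings() {
    # sub: drop a trailing \r (the CRLF case).
    # gsub: turn any remaining \r into \n (the CR-only case).
    # print then ends the record with a single \n.
    awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }' "$@"
}
```

To keep a backup as -i.bak does: cp messy.csv messy.csv.bak && to_unix_endings messy.csv.bak > messy.csv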

Solution 3

Analyse your problem

head doesn't behave as you expect. Replace it with a simple analysis tool, od, to see what is going on:

od -cx messy.csv

and then, to see how head deals with this file:

head -2 messy.csv | od -cx

You will notice that head passes the \r carriage-return character (ASCII 0x0d) through unchanged, and the terminal then treats it as it was conceived for: the "carriage return" of the original typewriter. It just moves the cursor back to the beginning of the current line, ready for the next character to be written there.
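On a large file the od -cx dump can be hard to scan. Below is a minimal sketch, assuming POSIX od and awk, that lists only the positions of the \r bytes. cr_offsets is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the 0-based byte offset of every carriage return (0x0d,
# decimal 13) in the named file, or in standard input if none is given.
cr_offsets() {
    # -An: no address column; -v: do not abbreviate repeated lines;
    # -tu1: print each byte as an unsigned decimal number.
    od -An -v -tu1 "$@" | awk '
        BEGIN { n = 0 }
        { for (i = 1; i <= NF; i++) { if ($i == 13) print n; n++ } }'
}
```

Comparing cr_offsets messy.csv with head -2 messy.csv | cr_offsets should show the same carriage returns for the part of the file that head outputs, because head copies the \r bytes unchanged.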

Fix it

See the correct sed command here: fix '\r' from an Excel file

For the record

This Microsoft bug is a real winner: this encoding of Excel's end of line is wrong for Windows, for Unix (all of them), and for MacOS X.

You can't outperform it :).


Author: Richard

Updated on September 18, 2022

Comments

  • Richard
    Richard over 1 year

    I've got a CSV file that's generated by saving as CSV from Excel. If I do "head" (or indeed "grep" or anything else) it only prints the first line:

    head -n 10 messy.csv
    10,15,11,21
    

    But if I open the file in a text editor, or in Excel, it has many lines in it:

    10,15,11,21
    9,11,17,19
    7,11,24,18
    ... 
    

    head works just fine on other files on the machine.

    Why is this? (I suspect it's something to do with line endings, but I don't know what.) And how can I fix it?

  • mjturner
    mjturner almost 9 years
    The -n <count> option is included in the POSIX specification so most, if not all, head variants should support it.
  • Richard
    Richard almost 9 years
    Thanks. The tr outputs the whole file as one long string! file messy.csv prints messy.csv: ASCII English text, with CR line terminators.
  • mjturner
    mjturner almost 9 years
    @Richard Very strange that your file only has carriage returns! Try tr '\r' '\n' < messy.csv | head -10 then
  • Peter.O
    Peter.O almost 9 years
    This is almost certainly not a Windows issue: the behaviour described does not apply to a \r\n line ending. It is probably an Excel for Mac issue. "Basically, saving a file as comma separated values (csv) uses a carriage return \r" (see Excel and line endings).
  • roaima
    roaima almost 9 years
    Classic Mac OS systems use CR as line terminators. Fix the tr to swap CR for NL and it'll work.