How can I read the first n and last n lines from a file?

Solution 1

Chances are you're going to want something like:

... | awk -v OFS='\n' '{a[NR]=$0} END{print a[1], a[2], a[NR-1], a[NR]}'

Or, if you need to specify the number of lines, and taking into account @Wintermute's astute observation that you don't need to buffer the whole file, something like this is what you really want:

... | awk -v n=2 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
         END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'

I think the math is correct on that. The idea is to use a rotating buffer indexed by NR modulo the size of the buffer, adjusted so that the indices run from 1 to n instead of 0 to n-1.

To help with comprehension of the modulus operator used in the indexing above, here is an example with intermediate print statements to show the logic as it executes:

$ cat file   
1
2
3
4
5
6
7
8

$ cat tst.awk                
BEGIN {
    print "Populating array by index ((NR-1)%n)+1:"
}
{
    buf[((NR-1)%n)+1] = $0

    printf "NR=%d, n=%d: ((NR-1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
        NR, n, NR-1, (NR-1)%n, ((NR-1)%n)+1, ((NR-1)%n)+1, buf[((NR-1)%n)+1]

}
END { 
    print "\nAccessing array by index ((NR+i-1)%n)+1:"
    for (i=1;i<=n;i++) {
        printf "NR=%d, i=%d, n=%d: (((NR+i = %d) - 1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
            NR, i, n, NR+i, NR+i-1, (NR+i-1)%n, ((NR+i-1)%n)+1, ((NR+i-1)%n)+1, buf[((NR+i-1)%n)+1]
    }
}
$ 
$ awk -v n=3 -f tst.awk file
Populating array by index ((NR-1)%n)+1:
NR=1, n=3: ((NR-1 = 0) %n = 0) +1 = 1 -> buf[1] = 1
NR=2, n=3: ((NR-1 = 1) %n = 1) +1 = 2 -> buf[2] = 2
NR=3, n=3: ((NR-1 = 2) %n = 2) +1 = 3 -> buf[3] = 3
NR=4, n=3: ((NR-1 = 3) %n = 0) +1 = 1 -> buf[1] = 4
NR=5, n=3: ((NR-1 = 4) %n = 1) +1 = 2 -> buf[2] = 5
NR=6, n=3: ((NR-1 = 5) %n = 2) +1 = 3 -> buf[3] = 6
NR=7, n=3: ((NR-1 = 6) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, n=3: ((NR-1 = 7) %n = 1) +1 = 2 -> buf[2] = 8

Accessing array by index ((NR+i-1)%n)+1:
NR=8, i=1, n=3: (((NR+i = 9) - 1 = 8) %n = 2) +1 = 3 -> buf[3] = 6
NR=8, i=2, n=3: (((NR+i = 10) - 1 = 9) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, i=3, n=3: (((NR+i = 11) - 1 = 10) %n = 1) +1 = 2 -> buf[2] = 8
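
One caveat with the rotating-buffer one-liner above: when the input has fewer than 2n lines, its END loop reads slots that were never filled and prints empty lines. Below is a minimal sketch of a variant that guards against this. The function name first_last is hypothetical, and n is assumed to be a positive integer.

```shell
# first_last N: print the first N and last N lines of stdin
# (hypothetical helper name; N is assumed to be a positive integer).
first_last() {
    awk -v n="$1" '
        NR <= n { print; next }            # first n lines go straight out
        { buf[((NR-1) % n) + 1] = $0 }     # later lines overwrite the oldest slot
        END {
            # Only lines n+1..NR were buffered, so start at the later of
            # line n+1 and line NR-n+1 to skip unset slots and repeats.
            start = (NR - n + 1 > n + 1) ? NR - n + 1 : n + 1
            for (i = start; i <= NR; i++) print buf[((i-1) % n) + 1]
        }'
}
```

With the 8-line sample file, first_last 3 < file prints 1, 2, 3, 6, 7, 8. With a 5-line input and n=3 it prints each of the five lines once, with no blank line.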

Solution 2

head -n2 file && tail -n2 file
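
This works on a regular file because each command opens the file separately. On a pipe it typically fails, as the question reports: head reads a whole block of input, and whatever it consumed beyond the first two lines is gone before tail starts. A minimal sketch of one workaround is to spool standard input to a temporary file first. The function name head_tail is hypothetical, and mktemp is assumed to be available on the system.

```shell
# head_tail N: first N and last N lines of stdin (hypothetical helper name).
head_tail() {
    tmp=$(mktemp) || return 1      # assumes mktemp exists on this system
    cat > "$tmp"                   # spool the piped input into a regular file
    head -n "$1" "$tmp"            # each command now opens the file itself
    tail -n "$1" "$tmp"
    rm -f "$tmp"
}
```

For example, cat x | head_tail 2 prints 1, 2, 4, 5 for the 5-line file in the question. Note that this stores the whole input on disk, and that inputs shorter than 2n lines will have their overlapping lines printed twice.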

Solution 3

This might work for you (GNU sed):

sed -n ':a;N;s/[^\n]*/&/2;Ta;2p;$p;D' file

This keeps a window of 2 lines (replace each 2 with n), prints the first 2 lines, and at end of file prints the window, i.e. the last 2 lines.

Solution 4

Here's a GNU sed one-liner that prints the first 10 and last 10 lines:

gsed -ne'1,10{p;b};:a;$p;N;21,$D;ba'

If you want to print a '--' separator between them:

gsed -ne'1,9{p;b};10{x;s/$/--/;x;G;p;b};:a;$p;N;21,$D;ba'

If you're on a Mac and don't have GNU sed, you can't condense as much:

sed -ne'1,9{' -e'p;b' -e'}' -e'10{' -e'x;s/$/--/;x;G;p;b' -e'}' -e':a' -e'$p;N;21,$D;ba'

Explanation

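gsed -n invokes sed without automatically printing the pattern space.

-e'1,9{p;b}' prints the first 9 lines.

-e'10{x;s/$/--/;x;G;p;b}' prints line 10 with an appended '--' separator.

-e':a;$p;N;21,$D;ba' prints the last 10 lines: N appends each new line to the pattern space, from line 21 onward D drops the oldest line, and $p prints the remaining window once the last line has been read.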

Solution 5

awk -v n=4 'NR<=n; {b = b "\n" $0} NR>=n {sub(/[^\n]*\n/,"",b)} END {print b}'

The first n lines are covered by NR<=n;. For the last n lines, we just keep track of a buffer holding the latest n lines, repeatedly adding one to the end and removing one from the front (after the first n).

It's possible to do it more efficiently, with an array of lines instead of a single buffer, but even with gigabytes of input, you'd probably waste more in brain time writing it out than you'd save in computer time by running it.

ETA: Because the above timing estimate provoked some discussion in (now deleted) comments, I'll add anecdata from having tried that out.

With a huge file (100M lines, 3.9 GiB, n=5) it took 454 seconds, compared to @EdMorton's rotating-buffer solution, which ran in only 30 seconds. With more modest inputs ("mere" millions of lines) the gap is comparable: 4.7 seconds vs. 0.53 seconds.

Almost all of the additional time in this solution seems to be spent in the sub() function; a tiny fraction also comes from string concatenation being slower than just replacing an array member.
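
A rough way to reproduce this kind of comparison is sketched below. It wraps the string-buffer command from this solution and the array-based command from Solution 1 in two hypothetical functions, string_buf and array_buf, so that their output can be checked for agreement before timing them. Timings will differ from machine to machine and between awk implementations.

```shell
# Hypothetical wrappers around the two awk programs being compared.
string_buf() {   # Solution 5: one string, trimmed from the front with sub()
    awk -v n="$1" 'NR<=n; {b = b "\n" $0} NR>=n {sub(/[^\n]*\n/,"",b)} END {print b}'
}

array_buf() {    # Solution 1: rotating array of n slots
    awk -v n="$1" 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
         END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'
}
```

On inputs of at least 2n lines the two functions print the same thing, so each can be run under the shell's time with the same large input redirected to standard input and the output sent to /dev/null.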

Question

Asked by Amir. Updated on August 13, 2022.

    How can I read the first n lines and the last n lines of a file?

    For n=2, I read online that (head -n2 && tail -n2) would work, but it doesn't.

    $ cat x
    1
    2
    3
    4
    5
    $ cat x | (head -n2 && tail -n2)
    1
    2
    

    The expected output for n=2 would be:

    1
    2
    4
    5