How to get text from a range of dates using grep/sed in a large text file?


Solution 1

With grep, if you know how many lines you want, you can use the context option -A to print lines after the match:

grep -A 3 2016-07-13 file

That will give you the first line containing 2016-07-13 and the next 3 lines (and the same for any later line that matches).
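As a quick check, here is a small sketch that runs the same command against a copy of the first lines of the sample log from the question. The scratch file created with mktemp is only for the demonstration, and note that -A is a GNU/BSD grep option rather than a POSIX one.

```shell
# Recreate the start of the sample log from the question in a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-07-12 < ?xml version>
2016-07-13 < ?xml version>
2016-07-18 < ?xml version>
2016-07-18 < ?xml version>
2016-07-19 < ?xml version>
2016-07-20 < ?xml version>
sample text sample text
EOF

# Print the matching line plus the 3 lines that follow it.
result=$(grep -A 3 '2016-07-13' "$log")
printf '%s\n' "$result"
```

This only works when you already know how many lines follow the start date, so it is more of a shortcut than a general answer.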

With sed you can use the dates as range delimiters, like this:

sed -n '/2016-07-13/,/2016-07-19/p' file

This prints all lines from the first line containing 2016-07-13 up to and including the first line containing 2016-07-19. That is only complete if exactly one line contains 2016-07-19, because the range ends at the first match and any later lines for that date are not printed. If there can be several such lines, use the next date as the end of the range instead and use d to drop that final line from the output:

sed -n '/2016-07-13/,/2016-07-20/{/2016-07-20/d;p;}' file
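To see the difference between the two sed commands, here is a minimal sketch on a made-up log that has two lines for 2016-07-19. The file contents and the variable names short and full are invented for the demonstration.

```shell
# Made-up log with two entries on the end date.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-07-12 a
2016-07-13 b
2016-07-19 c
2016-07-19 d
2016-07-20 e
2016-07-20 f
EOF

# Range ends at the FIRST 2016-07-19 line, so "2016-07-19 d" is lost.
short=$(sed -n '/2016-07-13/,/2016-07-19/p' "$log")

# Range ends at the first 2016-07-20 line, which is then deleted,
# so every 2016-07-19 line is kept.
full=$(sed -n '/2016-07-13/,/2016-07-20/{/2016-07-20/d;p;}' "$log")

printf 'short:\n%s\n\nfull:\n%s\n' "$short" "$full"
```

The semicolon before the closing brace is there because some non-GNU sed implementations require it.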

Solution 2

This simple grep one liner will be enough:

grep -E '^2016-07-1[3-9]' filename

Works nicely here and there is no need for sed :)
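The bracket trick covers 13 to 19 because all of those days share the leading 1. If the range crossed into the twenties, an alternation would be needed. The sketch below assumes a hypothetical range of 13 July to 22 July and a made-up log, just to show the pattern.

```shell
# Made-up log with dates on both sides of the hypothetical range.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-07-12 a
2016-07-13 b
2016-07-19 c
sample text sample text
2016-07-20 d
2016-07-22 e
2016-07-23 f
EOF

# Days 13-19 OR days 20-22; the quotes keep the shell from
# treating the brackets and parentheses specially.
matched=$(grep -E '^2016-07-(1[3-9]|2[0-2])' "$log")
printf '%s\n' "$matched"
```

Like the original one-liner, this only prints lines that start with a date, so continuation lines without a time stamp are dropped.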

Solution 3

awk solution:

$ awk '/^2016-07-13/,/^2016-07-19/' input.txt
2016-07-13 < ?xml version> 
2016-07-18 < ?xml version> 
2016-07-18 < ?xml version> 
2016-07-19 < ?xml version> 

This prints every line from the first one that starts with 2016-07-13 through the first one that starts with 2016-07-19. If several lines carry the end date, only the first of them is printed.
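If the end date can appear on several lines, one possible variation is to switch printing on at the start date and stop at the first line of the following day. This is a sketch rather than the answer's own method; it assumes the log is sorted and that a line starting with 2016-07-20 exists (otherwise it prints to the end of the file). The sample log and the flag name p are made up.

```shell
# Made-up sorted log with two entries (and a continuation line) on the end date.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-07-12 < ?xml version>
2016-07-13 < ?xml version>
2016-07-18 < ?xml version>
2016-07-19 < ?xml version>
sample text sample text
2016-07-19 < ?xml version>
2016-07-20 < ?xml version>
2016-07-20 < ?xml version>
EOF

out=$(awk '
    /^2016-07-13/      { p = 1 }   # start printing at the first start-date line
    p && /^2016-07-20/ { exit }    # stop reading at the first line of the next day
    p                              # print while the flag is set
' "$log")
printf '%s\n' "$out"
```

Because of the exit, awk does not read the rest of the file once the range has ended, which may save time on a log of several gigabytes.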

Solution 4

All the other current answers rely on the fact that the log file entries are sorted chronologically or the fact that the date range can be matched easily with regular expressions. If you want a more generic solution, we need to do some more programming.

I present this GNU AWK script:

#!/usr/bin/gawk -f
BEGIN {
    starttime = mktime(starttime)
    endtime = mktime(endtime)
}

func in_range(n, start, end) {
    return start <= n && n < end
}

match($0, /^([0-9]{4})-([0-9]{2})-([0-9]{2})\s/, m) &&
    in_range(mktime(m[1] " " m[2] " " m[3] " 00 00 00"), starttime, endtime)

You supply the start and end times through the variables starttime and endtime in a format that mktime understands (YYYY MM DD HH MM SS). Run the script like so, assuming that it is saved as an executable file filter-log-dates.awk in the current working directory and that the log file is mylog.txt:

./filter-log-dates.awk -v starttime='2016 07 13 00 00 00' -v endtime='2016 07 20 00 00 00' mylog.txt

Note that the end time is exclusive, i.e. valid log records must have a time stamp before the end time.

If your time stamp format is different, you can adjust the regular expression passed to the match function to suit it.
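For time stamps that are already in YYYY-MM-DD form, as in the question, the same half-open range check can be sketched without gawk, because such dates sort correctly as plain strings. This is a simplified alternative, not the script above: the variable names start and stop and the sample log are made up, and it compares whole dates only.

```shell
# Made-up log that is deliberately NOT in chronological order.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-07-19 c
2016-07-12 a
2016-07-20 e
2016-07-13 b
sample text sample text
EOF

picked=$(awk -v start='2016-07-13' -v stop='2016-07-20' '
    # Only consider lines that begin with a YYYY-MM-DD date and a space.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
        d = substr($0, 1, 10)               # the date part of the line
        if (d >= start && d < stop) print   # start inclusive, stop exclusive
    }
' "$log")
printf '%s\n' "$picked"
```

As with the gawk script, each line is judged on its own date, so the result does not depend on the order of the entries.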

Solution 5

You could do it in steps. Find the number of the first line matching your starting pattern. Find the number of the last line matching your ending pattern. Then extract the text between these two lines. This can be done as follows.

grep -n 2016-07-13 bigtextfile | head -1
grep -n 2016-07-19 bigtextfile | tail -1
# Say the first number is 1234 and the second 5678, then use...
awk 'NR>=1234 && NR<=5678' bigtextfile > rangeoftext

This could all be done in a single awk command, but the separate steps may make it easier to follow. Within awk the NR variable is the current line number, and since no action is specified after the pattern (NR>=1234 && NR<=5678), the default action is to print the lines that are in that range.
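The manual steps can also be chained so the line numbers do not have to be copied by hand. Here is a rough sketch on a made-up sample log; the variable names first and last are invented, and the patterns are anchored with ^ on the assumption that every entry starts with its date.

```shell
# Made-up log with two entries on the end date.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-07-12 < ?xml version>
2016-07-13 < ?xml version>
2016-07-18 < ?xml version>
2016-07-19 < ?xml version>
2016-07-19 < ?xml version>
2016-07-20 < ?xml version>
EOF

# grep -n prefixes each match with "NUMBER:", so cut keeps just the number.
first=$(grep -n '^2016-07-13' "$log" | head -n 1 | cut -d: -f1)
last=$(grep -n '^2016-07-19' "$log" | tail -n 1 | cut -d: -f1)

# Only extract when both dates were actually found.
if [ -n "$first" ] && [ -n "$last" ]; then
    range=$(awk -v first="$first" -v last="$last" 'NR >= first && NR <= last' "$log")
    printf '%s\n' "$range"
fi
```

This reads the file three times, so on a very large log a single-pass approach may be faster.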

Author: corey

Updated on September 18, 2022

Comments

  • corey
    corey over 1 year

    I have a big text file (almost 3 GB); it is a log file. I want to get the lines of text that correspond to a range of dates in this file, from 13 July to 19 July. My log format is:

    2016-07-12 < ?xml version>
    2016-07-13 < ?xml version>
    2016-07-18 < ?xml version>
    2016-07-18 < ?xml version>
    2016-07-19 < ?xml version>
    2016-07-20 < ?xml version>
    sample text sample text
    sample text sample text
    sample text sample text
    2016-07-20 < ?xml version>
    sample text sample text
    2016-07-20 < ?xml version>
    

    So after grep/sed the output should look like this:

    2016-07-13 < ?xml version>
    2016-07-18 < ?xml version>
    2016-07-18 < ?xml version>
    2016-07-19 < ?xml version>
    

    How can I get this?

    • David Foerster
      David Foerster almost 8 years
      Are you sure you mean June? All the dates in your sample log file are in July and the desired output sample implies you meant the latter.
  • Anum Sheraz
    Anum Sheraz over 4 years
    (y) ...had to remove ^ to make it work. Using Mac.