Search and Filter Text on large log files

awk '$3 >= "11:58" && $3 < "23:59" && /Unit ID: 1111/{print l"\n"$0};{l=$0}'


Author: JohnMerlino

Updated on September 18, 2022

Comments

  • JohnMerlino
    JohnMerlino over 1 year

    I use the tail, head and grep commands to search log files. Most of the time a combination of these three commands, chained with pipes, gets the job done. However, I have one log that many devices report to every few seconds, so it is very large. The pattern of each report is always the same:

    Oct 10 11:58:50 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0xD 0xD 0xD 
    Oct 10 11:58:50 Unit ID: 1111
    

    The example above shows that a UDP packet was sent to the socket server for a specific unit ID.

    Sometimes I want to view the packet information for this unit within a specific time range by querying the log.

    Oct 10 11:58:50 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0xD 0xD 0xD 
    Oct 10 11:58:50 Unit ID: 1111
    
    ... // A bunch of other units reporting including unit id 1111
    
    Oct 10 23:58:50 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0x28 0x28 0x28 
    Oct 10 23:58:50 Unit ID: 1111
    

    In the example above, I would like to display log output only for Unit ID: 1111 between 11:58 and 23:58. The results could look like this:

    Oct 10 11:58:50 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0xD 0xD 0xD 
    Oct 10 11:58:50 Unit ID: 1111
    
    Oct 10 12:55:11 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0x28 0xD 0x28 
    Oct 10 12:55:11 Unit ID: 1111
    
    Oct 10 15:33:50 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0x33 0xD 0x11 
    Oct 10 15:33:50 Unit ID: 1111
    
    Oct 10 23:58:50 Received Packet from [xxx.xx.xxx.xx:xxxx]: 0x28 0x28 0x28 
    Oct 10 23:58:50 Unit ID: 1111
    

    Notice the results only display Unit ID: 1111 information and not the other units.

    Now the problem with using something like this:

    tail -n 10000 | grep -B20 -A20 "Oct 10 23:58:50 Unit ID: 1111" 
    

    is that it displays a lot of output, not just the part that I need.
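
One way to avoid the wide -B20 -A20 context might be to let awk buffer each record and print it only when it matches. The following is a minimal sketch, not a tested solution for the real log. It assumes every record starts at a "Received Packet from" line, that the timestamp is field 3 in zero-padded HH:MM:SS form, and that the log covers a single day. The function name filter_records is hypothetical.

```shell
#!/bin/sh
# filter_records UNIT START END FILE   (hypothetical helper, for illustration only)
# Prints every record that contains a "Unit ID: UNIT" line whose timestamp
# (field 3, HH:MM:SS) lies between START and END inclusive.
filter_records() {
    awk -v unit="$1" -v start="$2" -v end="$3" '
        # Print the buffered record if it was marked, then reset the buffer.
        function emit() {
            if (keep) printf "%s", rec
            rec = ""
            keep = 0
        }
        # Assumption: a "Received Packet from" line opens a new record.
        /Received Packet from/ { emit() }
        # Append the current line to the record being collected.
        { rec = rec $0 "\n" }
        # Zero-padded HH:MM:SS strings sort correctly as plain strings.
        # Comparing field 6 itself means unit 1111 does not also match 11112.
        $3 >= start && $3 <= end && $4 == "Unit" && $5 == "ID:" && $6 == unit { keep = 1 }
        # Do not lose the last record in the file.
        END { emit() }
    ' "$4"
}
```

It could be called as filter_records 1111 11:58:00 23:58:59 logfile, where logfile stands in for the actual log path. Because the whole record is buffered, this also covers records that are longer than two lines.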

    • kurtm
      kurtm over 10 years
      Is the "Unit ID" line always going to be immediately after the first line you want to see?
    • peterph
      peterph over 10 years
      If you know it's just the two lines, why are you using -B20 -A20? You might also want to look at: unix.stackexchange.com/questions/94243/…
    • JohnMerlino
      JohnMerlino over 10 years
      @kurtm Yes, the Unit ID line always comes right after the first line I show there.
    • JohnMerlino
      JohnMerlino over 10 years
      @peterph It's actually a little more than two lines (I just wanted to simplify the question), but those two lines always come one right after the other.
    • Marco
      Marco over 10 years
      Since the dates are ascending you can use sed to print a range: sed -n '/Oct 10 10:58:50/,/Oct 10 23:58:50/p'
    • kurtm
      kurtm over 10 years
      @JohnMerlino Take Marco's suggestion above, but instead of dates for addresses, use patterns that will pick up the start and end of the information you want.
    • terdon
      terdon over 10 years
      @Marco that does not work as expected on my system. Did you test it? Using which sed version? Could you post an answer showing how that would work?
    • Marco
      Marco over 10 years
      GNU sed 4.2.2. That was just meant as a snippet to filter the date. It's not a solution since it cuts off the last entry and doesn't take unit IDs into account.
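
Following up on the sed range idea: because the timestamps ascend, a single awk pass could filter by unit ID and also stop reading as soon as it is past the end of the window, which may matter on a very large log. This is only a sketch. It assumes the two-line layout confirmed in the comments (the Unit ID line directly follows the packet line) and a syslog-style "Mon DD HH:MM:SS" prefix. The name window_unit and its argument order are made up for illustration.

```shell
#!/bin/sh
# window_unit UNIT DAY START END FILE   (hypothetical helper, for illustration only)
# DAY is given as "Mon DD" with a single space, e.g. "Oct 10".
# Prints each matching "Unit ID" line together with the line before it.
window_unit() {
    awk -v unit="$1" -v day="$2" -v start="$3" -v end="$4" '
        ($1 " " $2) == day {
            in_day = 1
            # Ascending timestamps: nothing after END can match, so stop early.
            if ($3 > end) exit
            if ($3 >= start && $4 == "Unit" && $5 == "ID:" && $6 == unit)
                print prev "\n" $0
        }
        # Already saw the requested day and have now moved past it.
        in_day && ($1 " " $2) != day { exit }
        # Remember this line so it can be printed with the next Unit ID line.
        { prev = $0 }
    ' "$5"
}
```

For example, window_unit 1111 "Oct 10" 11:58:00 23:58:59 logfile, with logfile as a placeholder. Unlike the plain sed range, the end of the window is a comparison rather than a pattern that must appear literally, so the last entry inside the window is not cut off.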