using awk to check between two dates

17,821

Solution 1

The key observation is that you can compare your timestamps using alphanumeric comparisons and get the correct answer - that is the beauty of ISO 8601 notation.

Thus, adapting your code slightly - and formatting to avoid scroll bars:

awk 'BEGIN {
        FS  = "\n"
        RS  = ""
        OFS = ";"
        ORS = "\n"
        t1  = "2010-03-23T07:45:00"
        t2  = "2010-03-23T08:00:00"
        m1  = "eventTimestamp: " t1
        m2  = "eventTimestamp: " t2
        }
$1 ~ /eventTimestamp:/ && $4 ~ /SMS-MO-FSM(-INFO)?$/ {
    if ($1 >= m1 && $1 <= m2) print $1, $2, $3, $4;
}' "$@"

Obviously, you could put this into a script file - you wouldn't want to type it often. And getting the date range entered accurately and conveniently is one of the hard parts. Note that I've adjusted the time range to match the data.

When run on the sample data, it outputs one record:

eventTimestamp: 2010-03-23T07:56:19.186;result: Allowed;protocol: SMS;payload: SMS-MO-FSM

Solution 2

A bit of a kludge, but this script assumes you have the unix "date" command. Also hard coded your start and end timestamps in the BEGIN block. Note that your test data listed above does not fall within your sample start/end times.

#!/usr/bin/awk -f
BEGIN {
        command="date -f\"%s\" -d \"2010-03-23 12:56:47\""; command | getline startTime; close(command)
        command="date -f\"%s\" -d \"2010-03-23 13:56:47\""; command | getline endTime; close(command)
}

$0 ~ /^eventTimestamp:/ {
        command="date -f\"%s\" -d " $2; command | getline currTime; close(command)

        if (currTime >= startTime && currTime <= endTime) {
                printIt="true"
        }else{
                printIt="false";
        }
}

printIt == "true" { print }             
Share:
17,821
amadain
Author by

amadain

Updated on June 04, 2022

Comments

  • amadain
    amadain almost 2 years

    I have a file with multiple data structures in it like so:

    eventTimestamp: 2010-03-23T07:56:19.166
    result: Allowed
    protocol: SMS
    payload: RCOMM_SMS
    
    eventTimestamp: 2010-03-23T07:56:19.167
    result: Allowed
    protocol: SMS
    payload: RCOMM_SMS
    
    eventTimestamp: 2010-03-23T07:56:19.186
    result: Allowed
    protocol: SMS
    payload: SMS-MO-FSM
    
    eventTimestamp: 2010-03-23T07:56:19.197
    result: Allowed
    protocol: SMS
    payload: COPS
    
    eventTimestamp: 2010-03-23T07:56:29.519
    result: Blocked
    protocol: SMS
    payload: COPS
    type: URL_IWF
    result: Blocked
    

    I want to find all of the events that are payload: SMS-MO-FSM or payload: SMS-MO-FSM-INFO that occurred between the times 2010-03-23 12:56:47 and 2010-03-23 13:56:47. When querying this file so far I have used awk in the following manner:

    cat checkThis.txt |
    awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"}
         $1~/eventTimestamp: 2010-03-23T14\:16\:35/ && $4~/SMS-MO-FSM-INFO|SMS-MO-FSM$/ {$1=$1 ""; print $0}'
    

    Which will give me all of the events that occurred on the second of 14:16:35 in 2010-03-23. I am struggling, however, to think of how I could put the date range into my query. I could use the following to put the dates into epoch time but how can I use the following in my awk to check whether the date is between the times needed:

    python -c "import time; ENGINE_TIME_FORMAT='%Y-%m-%dT%H:%M:%S'; print int(time.mktime(time.strptime('2010-03-23T12:52:52', ENGINE_TIME_FORMAT)))"
    

    I know this could done in Python but I have written a parser in Python for this and I want this method as an alternative checker so I want to use awk if at all possible.

    I took this a little further and created a python script for time conversion:

    #!/usr/local/bin/python
    import time, sys
    ENGINE_TIME_FORMAT='%Y-%m-%dT%H:%M:%S'
    testTime = sys.argv[1]
    try:
        print int(time.mktime(time.strptime(testTime, ENGINE_TIME_FORMAT)))
    except:
        print "Time to convert %s" % testTime
        raise
    

    I then tried to use getline to assign the conversion to a variable for comparison:

    cat checkThis.txt| awk 'BEGIN {FS="\n"; RS=""; OFS=";"; ORS="\n"; "./firstDate '2010-03-23T12:56:47'" | getline start_time; close("firstDate"); "./firstDate '2010-03-23T13:56:47'" | getline end_time; close("firstDate");} ("./firstDate $1" | getline) > start_time {$1=$1 ""; print $0}'
    Traceback (most recent call last):
      File "./firstDate", line 4, in <module>
    testTime = sys.argv[1]
    IndexError: list index out of range
    

    The getline works in the BEGIN and I checked it in the final print but I seem to have problems in the comparison part of the script.

  • cryptochaos
    cryptochaos over 13 years
    As I don't have enough reputation to vote yet, I can't vote for Jonathan Leffler's solution. But it's a good one.