Shell script to sort by date column using Linux command 'sort'
$ sort -k 3,3r session.log | awk '!seen[$2]++ { print }'
9 u1 2018-08-12 pass
6 u4 2018-07-11 pass
2 u2 2018-06-15 pass
3 u3 2018-05-18 pass
(The { print } can be removed completely; I'm only including it to show what happens. awk's default action is to print the entire input record when the condition is true.)
This sorts your file by decreasing dates (the most recent date first). The awk program reads the sorted data and prints the first entry found for each user. The seen variable is an associative array/hash that uses the usernames as keys. If its value is zero for a user, that user has not yet been seen, and so the line from the file is printed.
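The same bookkeeping idea also works without any pre-sorting: instead of relying on sort order, let awk remember the newest date seen so far for each user. This is a sketch of that single-pass alternative, assuming yyyy-mm-dd dates (which compare correctly as plain strings); the sample data is copied from the question so the snippet runs standalone:

```shell
# Sample data from the question, so the sketch is self-contained:
cat > session.log <<'EOF'
1 u1 2018-05-19 pass
2 u2 2018-06-15 pass
3 u3 2018-05-18 pass
4 u4 2018-05-17 pass
5 u2 2018-05-14 pass
6 u4 2018-07-11 pass
7 u1 2018-05-16 pass
8 u3 2018-05-13 pass
9 u1 2018-08-12 pass
10 u1 2018-05-10 pass
EOF

# One pass: remember the newest date, and its whole line, per user.
# String comparison works here because yyyy-mm-dd sorts like dates do.
awk '$3 > latest[$2] { latest[$2] = $3; line[$2] = $0 }
     END { for (u in line) print line[u] }' session.log |
sort -k2,2    # for-in order is unspecified in awk, so order by username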
Your code with my annotations:
# get list of unique usernames from log:
( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
# shorter equivalent:
# cut -d ' ' -f2 <session.log | sort -u >store.txt
# loop over the usernames:
for line in $(cat "store.txt")
do
# get entries related to this user (note: substring match, not an exact username match):
grep "$line" session.log > "$line.txt"
# echo the last entry:
echo $(sort -k3 "$line.txt" | awk 'END{print}')
# shorter equivalent of both of the above commands:
# awk -v user="$line" '$2 == user { print }' session.log | sort -k3,3 | tail -n 1
done
rm -f store.txt
So an alternative approach based on your shell loop:
cut -d ' ' -f2 <session.log | sort -u |
while read username; do
awk -v user="$username" '$2 == user { print }' session.log | sort -k 3,3 | tail -n 1
done
Again, the { print } may be left out of the awk script above.
This still sorts a subset of the session log once for each user, which is wasteful.
Pre-sorting the log by decreasing dates:
sort -k3,3r -o sorted.log session.log
cut -d ' ' -f2 <session.log | sort -u |
while read username; do
awk -v user="$username" '$2 == user { print; exit }' sorted.log
done
rm sorted.log
This still requires us to scan the log from the top once per user. The natural improvement is to let awk keep track of which users have been seen and which have not, which gives you my answer at the top.
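As a final aside, if you have GNU sort, the deduplication step can be done by sort itself: GNU sort's -u option outputs only the first line of each run of equal keys and disables the last-resort whole-line comparison, so the newest entry per user survives. This is GNU-specific behavior (POSIX does not say which of the equal lines is kept), so treat this as a non-portable sketch; the sample data is the question's:

```shell
# Sample data from the question, so the sketch is self-contained:
cat > session.log <<'EOF'
1 u1 2018-05-19 pass
2 u2 2018-06-15 pass
3 u3 2018-05-18 pass
4 u4 2018-05-17 pass
5 u2 2018-05-14 pass
6 u4 2018-07-11 pass
7 u1 2018-05-16 pass
8 u3 2018-05-13 pass
9 u1 2018-08-12 pass
10 u1 2018-05-10 pass
EOF

# Newest-first by date, then keep only the first line seen for each
# username (GNU sort -u keeps the first of an equal run).
sort -k3,3r session.log | sort -u -k2,2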
Dipankar
Updated on September 18, 2022

Comments
Dipankar:
I have a file called session.log containing some text data. The 1st column contains a serial number, the 2nd column a username, the 3rd column the last logged-in date, and the 4th column a result.
A user may have logged in multiple times. I need to find the last logged-in date for each unique user, so I wrote a shell script. The output should display the line with the latest logged-in date (3rd column) for each unique user (2nd column), and should not contain the same username multiple times.
$ cat session.log
1 u1 2018-05-19 pass
2 u2 2018-06-15 pass
3 u3 2018-05-18 pass
4 u4 2018-05-17 pass
5 u2 2018-05-14 pass
6 u4 2018-07-11 pass
7 u1 2018-05-16 pass
8 u3 2018-05-13 pass
9 u1 2018-08-12 pass
10 u1 2018-05-10 pass
What I tried:
( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
for line in $(cat "store.txt")
do
touch "$line.txt"
grep "$line" session.log > "$line.txt"
temp=$(sort -k3 "$line.txt" | awk 'END{print}')
echo $temp > "$line.txt"
cat "$line.txt"
done
Output
$ ./sort.sh
9 u1 2018-08-12 pass
2 u2 2018-06-15 pass
3 u3 2018-05-18 pass
6 u4 2018-07-11 pass
The shell script works for the date formats (yyyy-mm-dd) and (yyyy/mm/dd). Is there a better way to do the same work? How do we do this using awk?

EDIT:
$ cat sort.sh
( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
for line in $(cat "store.txt")
do
#touch "$line.txt"
grep "$line" session.log > "$line.txt"
echo $(sort -k3 "$line.txt" | awk 'END{print}')
#temp=$(sort -k3 "$line.txt" | awk 'END{print}')
#echo $temp > "$line.txt"
#cat "$line.txt"
done
rm -f store.txt