Faster awk script to get the substring / string we wanted
Solution 1
Try this:
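awk -F'<25106>=' '{print substr($2,1,index($2,"]")-1);}'
This extracts the value with plain string functions (substr and index) rather than match(). awk strings are 1-indexed, so the substring starts at 1; a start of 0 can drop the last character in some implementations. Note that a multi-character -F value is still interpreted as a regular expression.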
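A minimal sketch of the same idea wrapped in a shell function and run against a sample line in the format used in this thread. The function name `account_by_fs` and the `NF > 1` guard (which skips lines without the tag) are additions for illustration, not part of the original answer. Because everything after the separator lands in `$2`, the position of the tag within the line does not matter.

```shell
# Sketch: field-separator approach (function name is hypothetical).
account_by_fs() {
    # FS is the tag itself, so $2 is whatever follows the first tag.
    # NF > 1 is an added guard that skips lines without the tag.
    # awk strings are 1-indexed, so the substring starts at 1.
    awk -F'<25106>=' 'NF > 1 { print substr($2, 1, index($2, "]") - 1) }'
}

echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | account_by_fs
```

For the sample line this prints ACCT1.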
Solution 2
If you only need to print this value, you can try this:
echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | awk -F'<25106>=' '{print $2}' | sed -e 's/].*//'
EDIT: sed-only solution:
echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | sed -e 's/.*25106>=//' -e 's/].*//'
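One caveat with the two-expression sed command is that lines without the tag pass through unchanged. A hedged variant, assuming that only matching lines should be printed, uses a single substitution with a capture group together with `-n` and the `p` flag. The function name `account_by_sed` is hypothetical.

```shell
# Sketch: sed-only variant that prints nothing for lines without the tag.
account_by_sed() {
    # \([^]]*\) captures everything after the tag up to the next "]".
    # -n plus the trailing p prints only lines where the substitution matched.
    sed -n 's/.*<25106>=\([^]]*\)].*/\1/p'
}

echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | account_by_sed
```

For the sample line this prints ACCT1. It is still a regex-based approach, so it is not expected to beat plain string functions on speed.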
EDIT2:
awk '{if (split($0, a, "25106>=") > 1) {print substr(a[2], 1, index(a[2], "]")-1)} }'
Solution 3
If you have GNU awk (gawk) you can use the match() function with capturing parentheses:
gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'
Alternately, you can use a complex field separator:
awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'
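The three-argument form of match() is specific to gawk. For other awk implementations, a rough equivalent can be sketched with the two-argument match(), which sets RSTART and RLENGTH. The function name `account_by_match` is hypothetical, and the constant 8 is simply the length of the tag `<25106>=`. Unlike the snippets above, this sketch prints the value instead of only assigning it.

```shell
# Sketch: portable match() variant using RSTART and RLENGTH.
account_by_match() {
    # The regex matches the tag plus at least one character other than "]".
    # RSTART points at "<", so the value begins 8 characters later
    # (8 is the length of the tag "<25106>=").
    awk 'match($0, /<25106>=[^]]+/) { print substr($0, RSTART + 8, RLENGTH - 8) }'
}

echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | account_by_match
```

For the sample line this prints ACCT1. It remains a regex match, so the timings reported in this thread suggest it will be slower than index plus substr.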
Author: Admin
Updated on September 18, 2022
Comments
-
Admin almost 2 years
ORDER EVENT .........[] [] ... so many other tags... [Account<25106>=ACCT1] [Destination...] .. so many other tags.
I currently extract the account as shown below. I tried match() in awk, but it seems slower. Can you suggest anything faster than this?
j = index($0, "<25106>="); account=substr($0, j + accountTagLength); account=substr(account,1,index(account, "]") - 1);
The account is not the 2nd field, and its position in the line may vary.
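The index plus substr approach from the question can be sketched as a complete command. This is an illustration under a few assumptions: the tag is passed in with `-v`, its length is computed once in BEGIN (the name `taglen` stands in for the question's accountTagLength), lines without the tag are skipped, and the `[Other<1>=x]` tags in the sample input are made up.

```shell
# Sketch: index + substr, independent of where the tag sits in the line.
extract_account() {
    awk -v tag='<25106>=' '
    BEGIN { taglen = length(tag) }           # computed once, not per line
    {
        j = index($0, tag)                   # 0 when the tag is absent
        if (j > 0) {
            rest = substr($0, j + taglen)    # text right after the tag
            print substr(rest, 1, index(rest, "]") - 1)
        }
    }'
}

# Hypothetical sample lines: tag early, tag late, tag missing.
printf '%s\n' \
    'ORDER EVENT [Account<25106>=ACCT1] [Other<1>=x]' \
    'ORDER EVENT [Other<1>=x] [Account<25106>=ACCT2]' \
    'ORDER EVENT [Other<1>=x]' | extract_account
```

This prints ACCT1 and ACCT2; the third line produces no output because of the `j > 0` guard.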
Timings:
bash-3.2$ time head -1000000 temp.log | awk -F'<25106>=' '{print $2}' | sed -e 's/].*//' > /dev/null
real 0m2.410s user 0m2.782s sys 0m0.319s
bash-3.2$ time head -1000000 temp.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'
real 0m1.690s user 0m1.737s sys 0m0.448s
bash-3.2$ time head -1000000 temp.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'
real 0m1.588s user 0m1.733s sys 0m0.179s
bash-3.2$ time head -1000000 temp.log | awk -F'<25106>=' '{print $2}' | sed -e 's/].*//' > /dev/null
real 0m2.384s user 0m2.762s sys 0m0.272s
bash-3.2$ time head -1000000 temp.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'
real 0m1.703s user 0m1.709s sys 0m0.484s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'
real 0m3.449s user 0m3.661s sys 0m0.290s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'
real 0m3.410s user 0m3.551s sys 0m0.236s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'
real 0m3.361s user 0m3.487s sys 0m0.286s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'
real 0m1.626s user 0m1.831s sys 0m0.263s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'
real 0m2.721s user 0m2.808s sys 0m0.265s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'
real 0m2.787s user 0m2.863s sys 0m0.516s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'
real 0m2.724s user 0m2.882s sys 0m0.278s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'
real 0m1.576s user 0m1.748s sys 0m0.235s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | grep -oE '<25106>=([A-Za-z0-9]*)+' | cut -d= -f2 > /dev/null
real 0m2.098s user 0m2.131s sys 0m0.033s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); print substr(account,1,index(account, "]") - 1);} }' > /dev/null
real 0m0.253s user 0m0.275s sys 0m0.040s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | grep -oE '<25106>=([A-Za-z0-9]*)+' | cut -d= -f2 > /dev/null
real 0m2.070s user 0m2.105s sys 0m0.034s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | grep -oE '<25106>=([A-Za-z0-9]*)+' > /dev/null
real 0m2.065s user 0m2.090s sys 0m0.037s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | awk -F'<25106>=' '{ substr($2,0,index($2,"]")-1);}'
real 0m3.426s user 0m3.637s sys 0m0.412s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | awk -F'<25106>=' '{ substr($2,0,index($2,"]")-1);}'
real 0m3.463s user 0m3.603s sys 0m0.408s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'
real 0m2.247s user 0m2.307s sys 0m0.649s
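Timings are only comparable when the variants do the same work and produce the same output. Some of the timed commands above discard the result of substr() while others print it, so a like-for-like run should print in every variant. The sketch below checks that two of the approaches agree on synthetic input before timing them. The generator, the function names and the `[Other<1>=x]` tag are all hypothetical; the real log files are not reproduced here.

```shell
# Sketch: confirm two variants agree on synthetic data before timing them.
gen_sample() {
    # 1000 made-up lines; the account tag alternates between two positions.
    awk 'BEGIN {
        for (i = 1; i <= 1000; i++) {
            pad = (i % 2) ? "[Other<1>=x] " : ""
            printf "ORDER EVENT %s[Account<25106>=ACCT%d] [Other<1>=x]\n", pad, i
        }
    }'
}

# index + substr, printing the result (as in the 100000-line run above).
via_index() {
    awk '{ j = index($0, "25106>="); if (j > 0) { account = substr($0, j + 7); print substr(account, 1, index(account, "]") - 1) } }'
}

# Field separator in awk, then sed to cut at the closing bracket.
via_fs_sed() {
    awk -F'<25106>=' '{print $2}' | sed -e 's/].*//'
}

a=$(gen_sample | via_index)
b=$(gen_sample | via_fs_sed)
if [ "$a" = "$b" ]; then echo "outputs match"; else echo "outputs differ"; fi
```

Once the outputs match, each pipeline can be wrapped in `time` and redirected to /dev/null, as in the runs above.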
-
Admin over 12 years
I just found out that regular expressions with grep are very slow, even though I am not looking for a grep solution.
-
Richard Fortune about 12 years
Of course a literal string comparison is going to be faster than any regex comparison. What you propose above is the straightforward implementation of that, so I wouldn't expect anything to be faster.
-
-
Admin over 12 years
Sorry, Jan, my explanation of the question was vague. The account is not the 2nd field, and the field position may vary. I updated my question.
-
Jan Marek over 12 years
@srikanthradix Updated.
-
Admin over 12 years
Updated with timings using the time command. index + substring still seems to be faster.
-
Admin over 12 years
Anything with regex, like match, is slow. I have tried it and updated the timings.
-
Jan Marek over 12 years
@srikanthradix Would a sed-only solution be faster?
-
Admin over 12 years
Actually, I am trying to find out whether there is anything faster using only awk.
-
Rag over 12 years
Even that is slower than index and substring. I updated the stats; see the end.
-
Admin over 12 years
Just out of curiosity, I tried the sed-only solution. It is much slower.
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | sed -e 's/.*25106>=//' -e 's/].*//' > /dev/null
real 0m9.956s user 0m10.167s sys 0m0.441s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | sed -e 's/.*25106>=//' -e 's/].*//' > /dev/null
real 0m10.083s user 0m10.254s sys 0m0.343s
-
Jan Marek over 12 years
@srikanthradix I've tried to add another solution.