Deleting rows based on the column values in a file using awk command
Solution 1
Put your values in another file:
values
:
60
25
30
Then read them into an array in awk
:
awk 'FNR == NR {arr[$0] = 1; next} !($2 in arr)' values test.txt
FNR == NR
holds true when reading the first file, so the first block is only executed while reading the values. Because of the next
, the !($2 in arr)
is only executed for the second file.
Solution 2
Your command can be simplified further - you don't need if
statement and code block, because awk can print lines using matching condition that precedes code block. If you only want to print the line, the code-block can be skipped altogether:
$ awk '$2!=60 && $2!=25 && $2 != 30' input.txt
position_id risk_measure_id Scenario_id value_usd
3 45 90 300.7
8 20 0 5000.9
Alternative solution would be to use array:
awk -v values="60 30 25" 'BEGIN{split(values,array)};{ flag=0; for(val in array) if (array[val] == $2) flag=1; if (flag==0) print }' input.txt
What happens there is we create a string with all values we want,separated by space. In the BEGIN statement we break it down into array. Main code block sets flag variable to 0 upon reading each line, then we loop through all values in array and check if field #2 matches anything in array. If it does, we set flag to 1. After the loop exits we see if the loop found anything and set the flag, and if it didn't - print the line.
Shorter version of this approach is to use next
command to break from the loop if the excluded value is found. This way, print
function is reached only if there's no excluded value found:
awk -v values="60 30 25" 'BEGIN{split(values,array)};{for(val in array) if (array[val] == $2) next; print}' input.txt
Related videos on Youtube
Kevin Bowen
I route the packets that make the Internet sing. Contributor to Xfce Desktop Environment(DE) documentation: https://docs.xfce.org I have been a casual Linux user since Slackware 3.0 (Hello Walnut Creek!) and an Ubuntu user since around version 6.06 or 6.10. My primary professional skill set has been focused on the design, implementation, operation, and daily management of enterprise networks. This includes the configuration and operation of network hardware as well as the care and feeding involved in the network monitoring systems supporting them. The hardware includes routers, switches, firewalls, wan-optimizers, and occasionally load balancers (mostly Cisco, Riverbed, and f5 systems). Launchpad twitter
Updated on September 18, 2022Comments
-
Kevin Bowen over 1 year
I have a huge file like below. I want to delete the rows if second column is having values 60,30 etc., all these values I will get from another file in comma separated file.
position_id risk_measure_id Scenario_id value_usd 1 60 0 300.8 2 30 0 400.6 3 45 90 300.7 4 60 0 200.9 5 30 9 400.8 6 60 10 4000.9 8 20 0 5000.9
I can use below
awk
command to achieve it but if I have multiple values to exclude is there any easy way.$ awk '{ if ($2!=60 && $2!=25 && $2!=30) print $0}' test.txt position_id risk_measure_id Scenario_id value_usd 3 45 90 300.7 8 20 0 5000.9
-
Sergiy Kolodyazhnyy over 7 yearsWhat exactly do you mean by "easy way" ? You want to simplify the command ?
-
-
Sergiy Kolodyazhnyy over 7 yearsOP doesn't even need a file. They would specify
-v values="60 25 30"
and then split that into array insideBEGIN
statement. -
muru over 7 years@Serg I didn't feel like looping on an array. :)
-
Sergiy Kolodyazhnyy over 7 yearslol, ok, but you're still looping on an extra file :p Still a good answer though
-
alhelal over 6 years@muru I have a similar problem. I need rows having value greater than
36
in consecutive 7 columns. see the data