Deleting rows based on the column values in a file using awk command

command-line scripts text-processing awk

8,718

Solution 1

Put your values in another file:

values:

60
25
30

Then read them into an array in awk:

awk 'FNR == NR {arr[$0] = 1; next} !($2 in arr)'  values test.txt

FNR == NR holds true when reading the first file, so the first block is only executed while reading the values. Because of the next, the !($2 in arr) is only executed for the second file.

Solution 2

Your command can be simplified further - you don't need if statement and code block, because awk can print lines using matching condition that precedes code block. If you only want to print the line, the code-block can be skipped altogether:

$ awk '$2!=60 && $2!=25 && $2 != 30'  input.txt        
position_id risk_measure_id Scenario_id value_usd
3   45  90  300.7
8   20  0   5000.9

Alternative solution would be to use array:

awk -v values="60 30 25" 'BEGIN{split(values,array)};{ flag=0; for(val in array) if (array[val] == $2) flag=1; if (flag==0) print }'  input.txt

What happens there is we create a string with all values we want,separated by space. In the BEGIN statement we break it down into array. Main code block sets flag variable to 0 upon reading each line, then we loop through all values in array and check if field #2 matches anything in array. If it does, we set flag to 1. After the loop exits we see if the loop found anything and set the flag, and if it didn't - print the line.

Shorter version of this approach is to use next command to break from the loop if the excluded value is found. This way, print function is reached only if there's no excluded value found:

awk -v values="60 30 25" 'BEGIN{split(values,array)};{for(val in array) if (array[val] == $2) next; print}'  input.txt

8,718

Kevin Bowen

I route the packets that make the Internet sing. Contributor to Xfce Desktop Environment(DE) documentation: https://docs.xfce.org I have been a casual Linux user since Slackware 3.0 (Hello Walnut Creek!) and an Ubuntu user since around version 6.06 or 6.10. My primary professional skill set has been focused on the design, implementation, operation, and daily management of enterprise networks. This includes the configuration and operation of network hardware as well as the care and feeding involved in the network monitoring systems supporting them. The hardware includes routers, switches, firewalls, wan-optimizers, and occasionally load balancers (mostly Cisco, Riverbed, and f5 systems). Launchpad twitter

Updated on September 18, 2022

Comments

Kevin Bowen over 1 year
I have a huge file like below. I want to delete the rows if second column is having values 60,30 etc., all these values I will get from another file in comma separated file.
```
position_id risk_measure_id Scenario_id value_usd
1   60  0   300.8
2   30  0   400.6
3   45  90  300.7
4   60  0   200.9
5   30  9   400.8
6   60  10  4000.9
8   20  0   5000.9
```
I can use below awk command to achieve it but if I have multiple values to exclude is there any easy way.
```
$ awk '{ if ($2!=60 && $2!=25 && $2!=30) print $0}' test.txt
position_id risk_measure_id Scenario_id value_usd
3   45  90  300.7
8   20  0   5000.9
```
- Sergiy Kolodyazhnyy over 7 years
  
  What exactly do you mean by "easy way" ? You want to simplify the command ?
Sergiy Kolodyazhnyy over 7 years

OP doesn't even need a file. They would specify -v values="60 25 30" and then split that into array inside BEGIN statement.
muru over 7 years

@Serg I didn't feel like looping on an array. :)
Sergiy Kolodyazhnyy over 7 years

lol, ok, but you're still looping on an extra file :p Still a good answer though
alhelal over 6 years

@muru I have a similar problem. I need rows having value greater than 36 in consecutive 7 columns. see the data