Find value from one csv in another one (like vlookup) in bash (Linux)

11,787

Solution 1

A little approach, far away to be perfect:

DELIMITER="|"

for i in $(cut -f 7 -d "${DELIMITER}" file1.csv ); 
do 
    grep "${i}" file2.csv | cut -f 3 -d "${DELIMITER}"; 
done

Solution 2

This will work, but since the input files must be sorted, the output order will be affected:

join -t '|' -1 7 -2 1 -o 2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)

The output would look like:

2200
2200
2400

which is useless. In order to have a useful output, include the key value:

join -t '|' -1 7 -2 1 -o 0,2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)

The output then looks like this:

CORKCOR|2200
CORKKIN|2200
MAYOBAN|2400

Edit:

Here's an AWK version:

awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv

This loops through file1.csv and creates array entries for each value of field 7. Simply referring to an array element creates it (with a null value). FNR is the record number in the current file and NR is the record number across all files. When they're equal, the first file is being processed. The next instruction reads the next record, creating a loop. When FNR == NR is no longer true, the subsequent file(s) are processed.

So file2.csv is now processed and if it has a field 1 that exists in the array, then its field 3 is printed.

Solution 3

You can use Miller (https://github.com/johnkerl/miller).

Starting from input01.txt

123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|

and input02.txt

MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100

and running

mlr --csv -N --ifs "|" join  -j 7 -l 7 -r 1 -f input01.txt then cut -f 3 input02.txt

you will have

2400
2200
2200

Some notes:

  • -N to set input and output without header;
  • --ifs "|" to set the input fields separator;
  • -l 7 -r 1 to set the join fields of the input files;
  • cut -f 3 to extract the field named 3 from the join output
Share:
11,787
Admin
Author by

Admin

Updated on July 24, 2022

Comments

  • Admin
    Admin almost 2 years

    I have already tried all options that I found online to solve my issue but without good result.

    Basically I have two csv files (pipe separated):

    file1.csv:

    123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
    
    123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
    
    123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
    

    file2.csv:

    MAYOBAN|BANGOR|2400
    
    MAYOBEL|BELLAVARY|2400
    
    CORKKIN|KINSALE|2200
    
    CORKCOR|CORK|2200
    
    DUBLD11|DUBLIN 11|2100
    

    I need a linux bash script to find the value of pos.3 from file2 based on the content of pos7 in file1.

    Example:

    file1, line1, pos 7: MAYOBAN
    find MAYOBAN in file2, return pos 3 (2400)
    

    the output should be something like this:

    **2400**
    
    **2200**
    
    **2200**
    
    **etc...**
    

    Please help Jacek