How can I search a CSV file for a value, then get another value associated with it?

7,535

Solution 1

The following awk code does what you ask for:

#!/bin/bash

filetosearch=myfile.csv
searchString=${1:-anotherLabel}

awk -F',' -v pat="$searchString" '
BEGIN{patl=tolower(pat);flag=0};
{prev=$1}(tolower($0)==patl){flag=1;exit}
END{
     if(flag){
          print prev
             }else{
          printf("%s%s%s\n", prev+1,FS,pat) >> ARGV[1]    # use ARGIND in gawk.
          print prev+1
             }
   }' "${filetosearch}"

Search for an string "${searchString}" that exactly match a full line (change tolower($0)==patl to tolower($0)~patl to match more loosely) and reports at which index it was found. If the string is not matched, it is added (appended) to the file used with an index that is one more than the last index of the file.

Example:

$ ./script aLabel
445

$ ./script anotherLabel
446

$ ./script MissingLabel
450

$ cat myfile.csv
445,aLabel
446,anotherLabel
447,aThirdLabel
448,dhdhdhdhdhd
449,anotherLabel4646
450,MissingLabel

Solution 2

If you wanted to do this in pure bash:

file="myfile.csv"
seek="anotherLabel"
while IFS=, read id label; do
    if [[ $label == "$seek" ]]; then
        myid=$id
        break
    fi
    lastid=$id
done < "$file"
if [[ -z $myid ]]; then
    myid=$((lastid + 1))
    echo "$myid,$seek" >> "$file"
fi
echo "$seek id is: $myid"

Solution 3

Try this out:

 awk -F',' '{ if ($2 == "anotherLabel") { print $1 } }' myfile.csv

Solution 4

Using grep and tail:

search="anotherLabel"
file=myfile.csv

if value=$(grep -Pio -m1 "^[0-9]+(?=,$search$)" "$file"); then
    echo "do something with $value"
elif lastvalue=$(tail -n1 "$file" | grep -o '^[0-9]\+'); then
    # append lastvalue + 1 and search string
    echo "$((++lastvalue)),$search" >> "$file"
else
    # handle error
    echo "error. no integer value in last line of \"$file\" found." >&2
fi

In the first grep the following options are used:

  • -P enable Perl-compatible regular expression (PCRE) to use a positive lookahead (see below).
  • -i ignore case in pattern
  • -o only print the matching part of the line
  • -m1 stop after the first match

The first regex ^[0-9]+(?=,$search$) is using a positive lookahead (?=pattern) to match a number followed by , and the search string without comma and search string being part of the match itself. In combination with option -o only the matching part (the number) is printed.

Solution 5

This is what you need:

$ awk -v tgt='the string you want to find' '
    BEGIN { FS=OFS="," }
    tolower($2) == tolower(tgt) { print $1 | "cat>&2"; f=1 }
    { print; p=$1 }
    END { if (!f) { print ++p, tgt; print p | "cat>&2"} exit !f }
' file

For example:

$ var=$( { awk -v tgt='anotherLabel' 'BEGIN{FS=OFS=","} tolower($2) == tolower(tgt){print $1 | "cat>&2"; f=1} {print; p=$1} END {if (!f) { print ++p, tgt; print p | "cat>&2"} exit !f}' file > out1; } 2>&1 )
$ echo "exit status: $?, value found: $var"
exit status: 0, value found: 446
$ cat out1
445,aLabel
446,anotherLabel
447,aThirdLabel

$ var=$( { awk -v tgt='missingLabel' 'BEGIN{FS=OFS=","} tolower($2) == tolower(tgt){print $1 | "cat>&2"; f=1} {print; p=$1} END {if (!f) { print ++p, tgt; print p | "cat>&2"} exit !f}' file > out1; } 2>&1 )
$ echo "exit status: $?, value found: $var"
exit status: 1, value found: 448
$ cat out1
445,aLabel
446,anotherLabel
447,aThirdLabel
448,missingLabel

The above will print any $1 that matches a $2 or is newly added to the end of the file to stderr (which is then captured in a variable, "var") set the exit status to success if the target string was found, failure otherwise, and print the whole file to stdout with the missing value added if appropriate.

Share:
7,535

Related videos on Youtube

mariahm24
Author by

mariahm24

Updated on September 18, 2022

Comments

  • mariahm24
    mariahm24 over 1 year

    myfile.csv

    445,aLabel
    446,anotherLabel
    447,aThirdLabel
    

    In a bash script, I want to search for the presence of "anotherLabel" (in a case-insensitive way) within myfile.csv and get the value 446.  Additionally, if "anotherLabel" isn't present, I want to add it to the end, incrementing the token of the previous line by 1.

    I started with the if statement:

    if grep -Fiq anotherLabel myfile.csv; then
        #get the value of field 1 (446 in this example) and store it in a variable
    else
        #increment the last value of field 1 present in the file and store it in a variable
        #and append "448,anotherLabel" to myfile.csv
    fi
    

    I'm not sure that using grep to verify whether the label is even in the file is the best way to approach this or if there is an easier method using sed or awk.

    • Jeff Schaller
      Jeff Schaller almost 5 years
      Maybe instead of saying "446" as if it's static, say "the value of field 1". For the "else" case where "anotherLabel" does not exist, do you need to set a variable?
    • mariahm24
      mariahm24 almost 5 years
      Yes, thanks for the suggestions. In the else case, I do need to process the uninitialized value of anotherLabel, but i also want to add it to the file in case i ever need to look it up again.
  • done
    done almost 5 years
    Ed: awk can write to file directly printf (...) >> ARGV[1] for the first file used, or ARGIND in gawk for an arbitrary list of files. Using stderr looks a lot more complex.
  • Ed Morton
    Ed Morton almost 5 years
    You mean from the END section, right? If so that's true, I just generally avoid writing to the file I'm reading unless absolutely necessary and I've thought through all of the scenarios that might end up corrupting the input file. I agree using stderr in this case is probably more complex than necessary (though I haven't thought through all possible scenarios), just force of habit as it will work in all cases.
  • Ed Morton
    Ed Morton almost 5 years
    @Isaac one example just occurred to me of a problem scenario for writing to ARGV[1]: if the input file isn't writable then the script outputs the number it tried to add as if it had added it but the resulting file doesn't have the new line added to it. It's these kind of edge cases that make me try to keep the file manipulation (which files get modified and how THOSE errors get handled) in shell and the text manipulation (what text gets changed and how THOSE errors are handled) in awk as much as possible.
  • done
    done almost 5 years
    Many thanks for the explanation, much apreciated.
  • done
    done almost 5 years
    Sorry, but on testing, if the file is not-writable, awk emits a clear and fatal error (no more input will be processed). Can't think of a case were this is not enough. Care to review? @EdMorton
  • Ed Morton
    Ed Morton almost 5 years
    My point was that if you write to a new file then you get the desired output while if you try to write to an unwritable file then you don't. Also if the directory is unwritable then directing to a new file will fail before awk runs rather than aborting while it's running. In both cases trying to write an an ARGV[1] that's not writable could abort the execution after potentially having written partial results to other files in writable locations. Directing to a new file produces a preferable result in both cases.