How can I search a CSV file for a value, then get another value associated with it?
Solution 1
The following awk code does what you ask for:
#!/bin/bash
filetosearch=myfile.csv
searchString=${1:-anotherLabel}
awk -F',' -v pat="$searchString" '
BEGIN{patl=tolower(pat);flag=0};
{prev=$1}(tolower($0)==patl){flag=1;exit}
END{
if(flag){
print prev
}else{
printf("%s%s%s\n", prev+1,FS,pat) >> ARGV[1] # use ARGIND in gawk.
print prev+1
}
}' "${filetosearch}"
Search for an string "${searchString}"
that exactly match a full line (change tolower($0)==patl
to tolower($0)~patl
to match more loosely) and reports at which index it was found. If the string is not matched, it is added (appended) to the file used with an index that is one more than the last index of the file.
Example:
$ ./script aLabel
445
$ ./script anotherLabel
446
$ ./script MissingLabel
450
$ cat myfile.csv
445,aLabel
446,anotherLabel
447,aThirdLabel
448,dhdhdhdhdhd
449,anotherLabel4646
450,MissingLabel
Solution 2
If you wanted to do this in pure bash:
file="myfile.csv"
seek="anotherLabel"
while IFS=, read id label; do
if [[ $label == "$seek" ]]; then
myid=$id
break
fi
lastid=$id
done < "$file"
if [[ -z $myid ]]; then
myid=$((lastid + 1))
echo "$myid,$seek" >> "$file"
fi
echo "$seek id is: $myid"
Solution 3
Try this out:
awk -F',' '{ if ($2 == "anotherLabel") { print $1 } }' myfile.csv
Solution 4
Using grep
and tail
:
search="anotherLabel"
file=myfile.csv
if value=$(grep -Pio -m1 "^[0-9]+(?=,$search$)" "$file"); then
echo "do something with $value"
elif lastvalue=$(tail -n1 "$file" | grep -o '^[0-9]\+'); then
# append lastvalue + 1 and search string
echo "$((++lastvalue)),$search" >> "$file"
else
# handle error
echo "error. no integer value in last line of \"$file\" found." >&2
fi
In the first grep
the following options are used:
-P
enable Perl-compatible regular expression (PCRE) to use a positive lookahead (see below).-i
ignore case in pattern-o
only print the matching part of the line-m1
stop after the first match
The first regex ^[0-9]+(?=,$search$)
is using a positive lookahead (?=pattern)
to match a number followed by ,
and the search string without comma and search string being part of the match itself.
In combination with option -o
only the matching part (the number) is printed.
Solution 5
This is what you need:
$ awk -v tgt='the string you want to find' '
BEGIN { FS=OFS="," }
tolower($2) == tolower(tgt) { print $1 | "cat>&2"; f=1 }
{ print; p=$1 }
END { if (!f) { print ++p, tgt; print p | "cat>&2"} exit !f }
' file
For example:
$ var=$( { awk -v tgt='anotherLabel' 'BEGIN{FS=OFS=","} tolower($2) == tolower(tgt){print $1 | "cat>&2"; f=1} {print; p=$1} END {if (!f) { print ++p, tgt; print p | "cat>&2"} exit !f}' file > out1; } 2>&1 )
$ echo "exit status: $?, value found: $var"
exit status: 0, value found: 446
$ cat out1
445,aLabel
446,anotherLabel
447,aThirdLabel
$ var=$( { awk -v tgt='missingLabel' 'BEGIN{FS=OFS=","} tolower($2) == tolower(tgt){print $1 | "cat>&2"; f=1} {print; p=$1} END {if (!f) { print ++p, tgt; print p | "cat>&2"} exit !f}' file > out1; } 2>&1 )
$ echo "exit status: $?, value found: $var"
exit status: 1, value found: 448
$ cat out1
445,aLabel
446,anotherLabel
447,aThirdLabel
448,missingLabel
The above will print any $1 that matches a $2 or is newly added to the end of the file to stderr (which is then captured in a variable, "var") set the exit status to success if the target string was found, failure otherwise, and print the whole file to stdout with the missing value added if appropriate.
Related videos on Youtube
mariahm24
Updated on September 18, 2022Comments
-
mariahm24 over 1 year
myfile.csv
445,aLabel 446,anotherLabel 447,aThirdLabel
In a bash script, I want to search for the presence of "anotherLabel" (in a case-insensitive way) within myfile.csv and get the value 446. Additionally, if "anotherLabel" isn't present, I want to add it to the end, incrementing the token of the previous line by 1.
I started with the
if
statement:if grep -Fiq anotherLabel myfile.csv; then #get the value of field 1 (446 in this example) and store it in a variable else #increment the last value of field 1 present in the file and store it in a variable #and append "448,anotherLabel" to myfile.csv fi
I'm not sure that using grep to verify whether the label is even in the file is the best way to approach this or if there is an easier method using sed or awk.
-
Jeff Schaller almost 5 yearsMaybe instead of saying "446" as if it's static, say "the value of field 1". For the "else" case where "anotherLabel" does not exist, do you need to set a variable?
-
mariahm24 almost 5 yearsYes, thanks for the suggestions. In the else case, I do need to process the uninitialized value of anotherLabel, but i also want to add it to the file in case i ever need to look it up again.
-
-
done almost 5 yearsEd: awk can write to file directly
printf (...) >> ARGV[1]
for the first file used, or ARGIND in gawk for an arbitrary list of files. Using stderr looks a lot more complex. -
Ed Morton almost 5 yearsYou mean from the END section, right? If so that's true, I just generally avoid writing to the file I'm reading unless absolutely necessary and I've thought through all of the scenarios that might end up corrupting the input file. I agree using stderr in this case is probably more complex than necessary (though I haven't thought through all possible scenarios), just force of habit as it will work in all cases.
-
Ed Morton almost 5 years@Isaac one example just occurred to me of a problem scenario for writing to ARGV[1]: if the input file isn't writable then the script outputs the number it tried to add as if it had added it but the resulting file doesn't have the new line added to it. It's these kind of edge cases that make me try to keep the file manipulation (which files get modified and how THOSE errors get handled) in shell and the text manipulation (what text gets changed and how THOSE errors are handled) in awk as much as possible.
-
done almost 5 yearsMany thanks for the explanation, much apreciated.
-
done almost 5 yearsSorry, but on testing, if the file is not-writable, awk emits a clear and fatal error (no more input will be processed). Can't think of a case were this is not enough. Care to review? @EdMorton
-
Ed Morton almost 5 yearsMy point was that if you write to a new file then you get the desired output while if you try to write to an unwritable file then you don't. Also if the directory is unwritable then directing to a new file will fail before awk runs rather than aborting while it's running. In both cases trying to write an an ARGV[1] that's not writable could abort the execution after potentially having written partial results to other files in writable locations. Directing to a new file produces a preferable result in both cases.