grep using array values and make it faster
Solution 1
You can use grep with the pattern-file option (-f), which reads one pattern per line from a file.
Example:
$ echo -e "apple\nsony\nsamsung" > file_pattern
$ grep -f file_pattern your.csv
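As a self-contained variation on the example above, the following sketch writes hypothetical sample data to temporary files and runs the same kind of search. The file contents and the `matches` variable are illustrative assumptions, and `-F` is added on the assumption that the items are plain strings rather than regular expressions.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, written to a temporary directory for the demo.
tmpdir=$(mktemp -d)
printf '%s\n' apple sony samsung > "$tmpdir/file_pattern"
printf '%s\n' 'A1;samsung black 2014' 'A2;nokia red 2013' 'A3;apple white 2015' > "$tmpdir/your.csv"

# -f reads one pattern per line; a line is printed if any pattern matches.
# -F treats every pattern as a fixed string instead of a regular expression.
matches=$(grep -F -f "$tmpdir/file_pattern" "$tmpdir/your.csv")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

With fixed strings grep does not have to compile 221 separate regular expressions, which may help on large inputs, but `-F` cannot be combined with a `^` anchor.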
EDIT: In response to your new constraints:
# prefix every item with ^ so it only matches at the start of the field
sed 's/^/^/' "$itemsFile" > /tmp/pattern_file
while IFS=';' read -r -a array
do
    echo "${array[1]}" | grep -q -f /tmp/pattern_file
    if [ $? -eq 0 ]; then
        # here I do something with ${array[2]}, ${array[4]} line by line and so on,
        # so I can't match the whole file $file_in at once but only line by line.
        :   # no-op placeholder: an if body cannot be empty
    fi
done < "$file_in"
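The loop above still starts one grep process per CSV line. A possible alternative, sketched here under assumptions, is to build a single anchored regular expression once and test it with bash's builtin `[[ =~ ]]`, which avoids the per-line process. The sample data, the `regex` and `matched` names, and the assumption that the items contain no regex metacharacters are all illustrative and not taken from the original question.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for $itemsFile and $file_in.
itemsFile=$(mktemp)
file_in=$(mktemp)
printf '%s\n' apple sony samsung > "$itemsFile"
printf '%s\n' 'A1;samsung black 2014;x' 'A2;nokia red 2013;y' 'A3;apple white 2015;z' > "$file_in"

# Join the items into one anchored alternation: ^(apple|sony|samsung)
# Assumption: the items contain no regex metacharacters.
regex="^($(paste -sd'|' "$itemsFile"))"

matched=()
while IFS=';' read -r -a array
do
    # [[ =~ ]] is a bash builtin, so no external process is started per line.
    if [[ ${array[1]} =~ $regex ]]; then
        # BASH_REMATCH[1] holds the item that matched.
        matched+=("${array[0]}:${BASH_REMATCH[1]}")
    fi
done < "$file_in"

printf '%s\n' "${matched[@]}"
rm -f "$itemsFile" "$file_in"
```

The per-line work on the other fields can go inside the `if` body, so the line-by-line structure is kept.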
Solution 2
There are two errors in your script:
grep tries to match the literal string $itemToFind because you put it between single quotes ('). Use double quotes instead.
You are using the array from index 1, while help read says that indexing starts at zero.
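The quoting difference can be checked in isolation. This is a minimal sketch with hypothetical values for `line` and `itemToFind`; the `|| true` only keeps the script going when grep finds nothing.

```shell
#!/usr/bin/env bash
itemToFind=samsung
line='samsung black 2014'

# Single quotes: grep receives the literal pattern ^$itemToFind, which does not match.
single=$(echo "$line" | grep -o '^$itemToFind' || true)

# Double quotes: the shell expands the variable first, so grep receives ^samsung.
double=$(echo "$line" | grep -o "^$itemToFind" || true)

echo "single=[$single] double=[$double]"
# prints: single=[] double=[samsung]
```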
With both fixes applied, the script becomes:
# load the items once, not once per CSV line
mapfile -t arrayItems < "$itemsFile"
while IFS=';' read -r -a array
do
    ## now loop through the above array
    for itemToFind in "${arrayItems[@]}"
    do
        itemFound=""
        itemFound=$(echo "${array[0]}" | grep -o "$itemToFind")
        if [ -n "$itemFound" ]
        then
            echo "$itemFound"
            # stop searching as soon as an item is found
            break
        fi
    done
done < "$file_in"
EDIT:
If you want to make it faster, you can use extended regular expressions:
grep -E 'apple|sony|samsung' "$file_in"
And if you want to display only the brands (assuming the brand is the first whitespace-separated field):
grep -E 'apple|sony|samsung' "$file_in" | awk '{print $1}'
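With 221 items, typing the alternation by hand is impractical. One way to generate it, sketched here with hypothetical sample files, is to join the lines of the items file with `|` using `paste`. The `pattern` and `brands` names are illustrative, and the sketch assumes the items contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for $itemsFile and $file_in.
itemsFile=$(mktemp)
file_in=$(mktemp)
printf '%s\n' apple sony samsung > "$itemsFile"
printf '%s\n' 'samsung black 2014' 'nokia red 2013' 'apple white 2015' > "$file_in"

# Join the item lines into a single alternation: apple|sony|samsung
pattern=$(paste -sd'|' "$itemsFile")

# -o prints only the part of each line that matched, i.e. the brand itself.
brands=$(grep -oE "$pattern" "$file_in")
printf '%s\n' "$brands"
# prints:
# samsung
# apple

rm -f "$itemsFile" "$file_in"
```

Here `grep -o` stands in for the awk step; it prints the matched item wherever it occurs on the line, not only in the first field.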
Kintaro
Updated on September 18, 2022
Comments
-
Kintaro over 1 year
array[1] is a string pulled from a 30k-line CSV, for example:
samsung black 2014
I need to match those lines against one of the values contained in an array (arrayItems).
arrayItems contains 221 values like:
apple sony samsung
The actual script:
while IFS=$';' read -r -a array
do
    mapfile -t arrayItems < $itemsFile
    ## now loop through the above array
    for itemToFind in "${arrayItems[@]}"
    do
        itemFound=""
        itemFound="$(echo ${array[1]} | grep -o '^$itemToFind')"
        if [ -n "$itemFound" ]
        then
            echo $itemFound
            # so end to search in case the item is found
            break
        fi
    done
    # here I do something with ${array[2]}, ${array[4]} line by line and so on,
    # so I can't match the whole file $file_in at once but online line by line.
done < $file_in
The problem is that grep doesn't match, but it works if I hardcode $itemToFind like this:
itemFound="$(echo ${array[1]} | grep -o '^samsung')"
Another thing: how can I make it faster, given that $file_in is a 30k-line CSV?
-
Thor over 5 years
If you want better answers, you need to provide a better example. You would also benefit from reading Raymond's smart question essay.
-
lauhub over 5 years
Can you provide an example of lines from the CSV file?
-
lauhub over 5 years
I think you are missing the -e option for echo.
-
Kintaro over 5 years
I use ${array[1]} because ${array[0]} doesn't contain the data I need from the CSV. ${array[0]} contains the first item of the line (which in this case is a reference code); I need the second item (which is the item name). Plus, the first while does other things during every loop (I'm going to add some code to the question).
-
lauhub over 5 years
I suggest you add the line
echo array0=${array[0]} array1=${array[1]}
in your loop and check what happens (to me, ${array[0]} is the complete line, as read separates entries with newline characters).
-
Kintaro over 5 years
$file_in is a CSV with ; as a separator (as you can see, the 1st while has IFS=$';'), ${array[0]} contains the first value of the line, ${array[1]} the 2nd and so on. P.S. I just edited the question code.
-
Kintaro over 5 years
I need to check it line by line. (question code updated)
-
Kintaro over 5 years
Yes, this is working very fast! I found it here too. Now the only thing I'm missing is the ^ in the regex (I edited again, sorry).
-
apapillon over 5 years
If you want to check whether a line starts with a pattern, you need to add ^ at the start of each line of $itemsFile. You can use
sed -i 's/^/\^/g' $itemsFile
Be careful: this command modifies your file in place.
-
lauhub over 5 years
@Kintaro Did changing the single quotes help?
-
Kintaro over 5 years
Yes, double quotes helped, but then I switched to the -f option.
-
Kusalananda over 5 years
@Kintaro Why do you need to check it line by line? This is already what grep does.