grep using array values and make it faster

Solution 1

You can use grep with the pattern-file option (-f), which reads one pattern per line from a file.

Example:

$ echo -e "apple\nsony\nsamsung" > file_pattern
$ grep -f file_pattern your.csv

EDIT: In response to your new constraints:

sed 's/^/^/' "$itemsFile" > /tmp/pattern_file
while IFS=';' read -r -a array
do
    if echo "${array[1]}" | grep -q -f /tmp/pattern_file; then
        # here I do something with ${array[2]}, ${array[4]} line by line and so on,
        # so I can't match the whole file $file_in at once but only line by line.
        :  # no-op placeholder so the otherwise empty branch is valid shell
    fi
done < "$file_in"
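
As a self-contained check of the line-by-line approach above, here is a minimal sketch. The sample data, the temporary file names and the `matched` array are made up for illustration, and the items are assumed to contain no regex metacharacters (grep -f treats each line as a regular expression).

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for $itemsFile and $file_in.
tmpdir=$(mktemp -d)
itemsFile="$tmpdir/items.txt"
file_in="$tmpdir/in.csv"
pattern_file="$tmpdir/patterns.txt"

printf '%s\n' apple sony samsung > "$itemsFile"
printf '%s\n' \
    'REF1;samsung black 2014;x' \
    'REF2;nokia blue 2013;y' \
    'REF3;my sony red;z' \
    'REF4;apple white 2015;w' > "$file_in"

# Prefix every item with ^ so it only matches at the start of the field.
sed 's/^/^/' "$itemsFile" > "$pattern_file"

matched=()
while IFS=';' read -r -a array; do
    # Test the second CSV field against all patterns in one grep call.
    if grep -q -f "$pattern_file" <<< "${array[1]}"; then
        matched+=("${array[0]}")
    fi
done < "$file_in"

printf '%s\n' "${matched[@]}"
rm -rf "$tmpdir"
```

With this sample input, REF1 and REF4 are reported; REF3 is skipped because "sony" is not at the start of its second field.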

Solution 2

There are two errors in your script:

  • grep tries to match the literal string $itemToFind because you put it between single quotes ('). Use double quotes instead, so that the shell expands the variable.

  • you are indexing the array from 1, whereas help read says that array indices start at zero.

The corrected script looks like this:

# read the items once, instead of once per CSV line
mapfile -t arrayItems < "$itemsFile"
while IFS=';' read -r -a array
do
    ## now loop through the above array
    for itemToFind in "${arrayItems[@]}"
    do
       itemFound=$(echo "${array[0]}" | grep -o "$itemToFind")
       if [ -n "$itemFound" ]
       then
          echo "$itemFound"
          # stop searching as soon as an item is found
          break
       fi
    done
done < "$file_in"
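
The quoting problem from the first point can be checked in isolation. This is a small sketch with a made-up sample line; the variable names `single` and `double` are only for illustration.

```shell
#!/usr/bin/env bash
itemToFind='samsung'
line='samsung black 2014'

# Single quotes: the shell does not expand the variable, so grep gets
# the literal pattern ^$itemToFind and finds nothing in this line.
single=$(grep -o '^$itemToFind' <<< "$line" || true)

# Double quotes: the shell expands the variable, so grep gets ^samsung.
double=$(grep -o "^$itemToFind" <<< "$line")

printf 'single=[%s] double=[%s]\n' "$single" "$double"
```

Only the double-quoted form returns the brand; the single-quoted form yields an empty string.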

EDIT:

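If you want to make it faster, you can match all the items in a single grep call by using an extended regular expression:

grep -E 'apple|sony|samsung' "$file_in"

And if you want to display only the brands (assuming the brand is the first whitespace-separated word of the line):

grep -E 'apple|sony|samsung' "$file_in" | awk '{print $1}'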
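
With 221 items, typing the alternation by hand is impractical. One possible way to build it from the items file is sketched below. It assumes one item per line with no regex metacharacters, and that each line of the input starts with the brand; the sample files and the `pattern` and `brands` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for $itemsFile and $file_in.
tmpdir=$(mktemp -d)
itemsFile="$tmpdir/items.txt"
file_in="$tmpdir/in.txt"
printf '%s\n' apple sony samsung > "$itemsFile"
printf '%s\n' 'samsung black 2014' 'nokia blue 2013' 'apple white 2015' > "$file_in"

# Join the items with | and anchor the whole group at the start of the line.
pattern="^($(paste -sd '|' "$itemsFile"))"

# -o prints only the matching part, so no awk step is needed here.
brands=$(grep -oE "$pattern" "$file_in")

printf '%s\n' "$pattern" "$brands"
rm -rf "$tmpdir"
```

Here grep scans the file once, instead of being started once per line and per item.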
Author by

Kintaro

Updated on September 18, 2022

Comments

  • Kintaro
    Kintaro over 1 year

    array[1] is a string pulled from a 30k-line CSV, for example:

    samsung black 2014
    

    I need to match those lines against the values contained in an array (arrayItems).

    arrayItems contains 221 values like:

    apple
    sony
    samsung
    

    The actual script:

    while IFS=$';' read -r -a array
    do
        mapfile -t arrayItems < $itemsFile
        ## now loop through the above array
        for itemToFind in "${arrayItems[@]}"
        do
           itemFound=""
           itemFound="$(echo ${array[1]} | grep -o '^$itemToFind')"
           if [ -n "$itemFound" ] 
           then 
              echo $itemFound 
              # so end to search in case the item is found
              break
           fi
        done
       # here I do something with ${array[2]}, ${array[4]} line by line and so on,
       # so I can't match the whole file $file_in at once but only line by line.
    done < $file_in
    

    The problem is that grep doesn't match anything, although it works if I hardcode the value of $itemToFind like this:

    itemFound="$(echo ${array[1]} | grep -o '^samsung')"
    

    Another question: how can I make it faster, given that $file_in is a 30k-line CSV?

    • Thor
      Thor over 5 years
      If you want better answers, you need to provide a better example. You would also benefit from reading Raymond's smart question essay
    • lauhub
      lauhub over 5 years
      Can you provide some example lines from the CSV file?
  • lauhub
    lauhub over 5 years
    I think you are missing the -e option for echo
  • Kintaro
    Kintaro over 5 years
    I use ${array[1]} because ${array[0]} doesn't contain the data I need from the CSV. ${array[0]} contains the first item of the line (which in this case is a reference code); I need the second item (which is the item name). Also, the outer while loop does other things on every iteration (I'm going to add some code to the question).
  • lauhub
    lauhub over 5 years
    I suggest you add the line echo array0=${array[0]} array1=${array[1]} to your loop and check what happens (to me, ${array[0]} is the complete line, as read separates entries at newline characters)
  • Kintaro
    Kintaro over 5 years
    $file_in is a CSV with ; as a separator (as you can see, the first while has IFS=$';'). ${array[0]} contains the first value of the line, ${array[1]} the second, and so on. P.S. I just edited the question code.
  • Kintaro
    Kintaro over 5 years
    I need to check it line by line. (question code updated)
  • Kintaro
    Kintaro over 5 years
    Yes, this is working very fast! I found it here too. Now the only thing I miss is the ^ in the regex (I edited again, sorry)
  • apapillon
    apapillon over 5 years
    If you want to check whether a line starts with a pattern, you need to add ^ at the start of each line of $itemsFile. You can use sed -i 's/^/\^/g' $itemsFile. Be careful: this command modifies your file in place.
  • lauhub
    lauhub over 5 years
    @Kintaro Did changing the single quotes help?
  • Kintaro
    Kintaro over 5 years
    Yes, double quotes helped, but then I switched to the -f option.
  • Kusalananda
    Kusalananda over 5 years
    @Kintaro Why do you need to check it line by line? This is already what grep does.