How can I grep while avoiding 'Too many arguments'


Solution 1

Run several instances of grep. Instead of

grep -i user@domain 1US* | awk '{...}' | xargs rm

do

(for i in 1US*; do grep -li user@domain "$i"; done) | xargs rm

Note the -l flag, since we only want the name of the matching file. This both speeds up grep (it can stop reading each file at the first match) and makes your awk script unnecessary. It could be improved further by checking the return status of grep and calling rm directly, not via xargs (xargs is very fragile, IMO); a sketch of that version follows.
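
A minimal sketch of that better version, assuming the same user@domain placeholder; it checks grep's exit status for each file and removes the file itself, so no filename parsing is needed:

for i in 1US*; do
    # -q: no output, just an exit status; grep exits 0 on the first match
    if grep -qi user@domain "$i"; then
        rm -- "$i"
    fi
done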

Hope it helps.

Solution 2

You can use find to list all files whose names start with the pattern '1US'. Then you can pipe the output to xargs, which takes care that the argument list does not grow too large, and have it handle the grep call. Note that I've used a null byte to separate the filenames for xargs; this avoids problems with awkward file names. ;)

find . -maxdepth 1 -name '1US*' -printf '%f\0' | xargs -0 grep -i user@domain | awk ...
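
If the goal is to go all the way to the deletion, here's a hedged variant (GNU grep assumed, since combining -l with -Z/--null for null-terminated file names is a GNU extension):

# -l prints only the names of matching files; -Z ends each name with a
# null byte so the final xargs -0 can remove them safely
find . -maxdepth 1 -name '1US*' -printf '%f\0' \
    | xargs -0 grep -liZ user@domain \
    | xargs -0 rm --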

Solution 3

The -exec argument to find is useful here; I've used it myself in similar situations.

E.g.

# List the files that match
find /path/to/input/ -type f -exec grep -qiF user@domain {} \; -print
# Once you're sure you've got it right
find /path/to/input/ -type f -exec grep -qiF user@domain {} \; -delete
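
(find evaluates its expressions left to right, so -print, and later -delete, only fires for the files where grep exits 0, i.e. the files that actually contain a match. The -F flag treats the address as a fixed string, so its dots are not regex wildcards.)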

Solution 4

Using xargs is more efficient than using "find ... -exec grep ... \;" because you have fewer process creations: the \; form spawns one grep per file, while xargs batches many files into each grep.
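
For what it's worth, find's + terminator to -exec batches arguments much like xargs does. A sketch, assuming GNU tools, the same user@domain placeholder, and (like the pipelines above) filenames without spaces or newlines:

# "{} +" packs as many file names as fit into each grep invocation,
# keeping the process count low without a separate xargs step
find . -maxdepth 1 -name '1US*' -exec grep -liF user@domain {} + | xargs rm --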

One way to go about this would be the pipeline below. Note that ls is replaced by the shell builtin printf, because ls is an external command and the expanded glob would run into the same argument-list limit; with grep -l the awk stage is no longer needed:

printf '%s\n' 1US* | xargs grep -li user@domain | xargs rm

But easier, if you simply want to remove every file in the batch regardless of its contents, would be:

find . -iname "1US*" -exec rm {} \;

Comments

  • Justin S
    Justin S almost 2 years

I was trying to clean out some spam email and ran into an issue. The number of files in the queue was so large that my usual command was unable to process them; it would give me an error about too many arguments.

    I usually do this

grep -i user@domain 1US* | awk -F: '{print $1}' | xargs rm
    

1US* can be anything in the range 1US[a-zA-Z]. The only thing I could make work was running this horrible contraption: one command per prefix, with 1USa, 1USA, 1USb, and so on through the entire alphabet. I know there has to be a way to run this more efficiently.

    grep -s $SPAMMER /var/mailcleaner/spool/exim_stage1/input/1USa* | awk -F: '{print $1}' | xargs rm
    grep -s $SPAMMER /var/mailcleaner/spool/exim_stage1/input/1USA* | awk -F: '{print $1}' | xargs rm
    
  • chepner
    chepner about 11 years
    -print0 is a shortcut for -printf '%f\0'.
  • chepner
    chepner about 11 years
    This just shifts the too-long argument list from the grep to the for statement.
  • Guido
    Guido about 11 years
But those are not actual arguments, they're internal to bash. I just made a folder with 1M files and tested: guido@solid:~/a$ ls * fails with "-bash: /bin/ls: Argument list too long", while guido@solid:~/a$ for i in *; do ls $i; done prints 1 10 100 ....
  • Justin S
    Justin S about 11 years
    How would I get around xargs?
  • Guido
    Guido almost 11 years
(for i in 1US*; do if grep -qi user@domain "$i"; then rm "$i"; fi; done). This would check the return value from grep to see if there was a match, and delete the file if so. I also added the -q option to suppress grep's output (we don't need to pipe it and we don't want it). If you ever get a 'too many arguments' error again (I tried with a million files with no problem, but just in case), you'd be better off using find to delete the files.