What does "xargs grep" do?


Solution 1

$ find . -name '*.c' | grep 'stdlib.h'

This pipes the output (stdout)* of find to the stdin* of grep 'stdlib.h' as text (i.e. the filenames are treated as text). grep does its usual thing and finds the matching lines in this text (any filenames that themselves contain the pattern). The contents of the files are never read.

$ find . -name '*.c' | xargs grep 'stdlib.h'

This constructs a command grep 'stdlib.h' to which each result from find is an argument, so grep looks for matches inside each file found by find. (xargs can be thought of as turning its stdin into arguments to the given command.)*
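A minimal sketch of the difference, assuming a scratch directory with two hypothetical files: a.c, which includes the header, and stdlib.h.c, whose name happens to contain the pattern but whose contents do not.

```shell
# Scratch directory with hypothetical files; the names are illustrative only.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '#include <stdlib.h>\n' > a.c
printf 'int x;\n' > stdlib.h.c

# Without xargs: grep filters the list of names that find printed.
names=$(find . -name '*.c' | grep 'stdlib.h')

# With xargs: grep receives the names as arguments and reads the files.
contents=$(find . -name '*.c' | xargs grep 'stdlib.h')

printf 'name matches:    %s\n' "$names"
printf 'content matches: %s\n' "$contents"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The first pipeline reports only the file whose name matches; the second reports only the file whose contents match.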

Use -type f in your find command, or you will get errors from grep for matching directories. Also, if the filenames have spaces, xargs will screw up badly, so use the null separator by adding -print0 and xargs -0 for more reliable results:

find . -type f -name '*.c' -print0 | xargs -0 grep 'stdlib.h'
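To see why the null separator matters, here is a small sketch assuming a scratch directory containing one hypothetical file named "my file.c". It uses grep -l (list matching filenames) so the result is easy to compare, and it assumes a find and xargs that support -print0 and -0, as the GNU and BSD versions do.

```shell
# Scratch directory with one hypothetical file whose name contains a space.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '#include <stdlib.h>\n' > 'my file.c'

# Default splitting: xargs hands grep "./my" and "file.c"; neither exists,
# so grep only prints errors (discarded here) and nothing is matched.
unsafe=$(find . -type f -name '*.c' | xargs grep -l 'stdlib.h' 2>/dev/null || true)

# NUL-delimited: the name arrives at grep as one intact argument.
safe=$(find . -type f -name '*.c' -print0 | xargs -0 grep -l 'stdlib.h')

printf 'unsafe: [%s]\nsafe:   [%s]\n' "$unsafe" "$safe"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```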

*added these extra explanatory points as suggested in comment by @cat

Solution 2

xargs takes its standard input and turns it into command line args.

find . -name '*.c' | xargs grep 'stdlib.h' is very similar to

grep 'stdlib.h' $(find . -name '*.c')  # UNSAFE, DON'T USE

And will give the same results as long as the list of filenames isn't too long for a single command line. (Linux supports megabytes of text on a single command line, so usually you don't need xargs.)


But both of these suck, because they break if your filenames contain spaces. Instead, find -print0 | xargs -0 works, but so does

find . -name '*.c' -exec grep 'stdlib.h' {} +

That never pipes the filenames anywhere: find batches them up into a big command line and runs grep directly.

\; instead of + runs grep separately for each file, which is much slower. Don't do that. The + form is specified by POSIX and supported by both GNU and BSD find, so you only need xargs to do this efficiently on old systems whose find lacks it.
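One way to observe the batching, sketched under the assumption of a scratch directory with three hypothetical .c files: a tiny sh -c command stands in for grep and prints one line per invocation, so counting lines counts how many processes find started.

```shell
# Scratch directory with three hypothetical source files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.c b.c c.c

# Each sh invocation prints exactly one line (its argument count),
# so the number of output lines equals the number of processes started.
batched=$(find . -name '*.c' -exec sh -c 'echo "$#"' sh {} + | wc -l)
per_file=$(find . -name '*.c' -exec sh -c 'echo "$#"' sh {} \; | wc -l)

printf '%s: %s invocation(s)\n' '{} +' "$batched" '{} \;' "$per_file"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```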


If you leave out xargs, find | grep does its pattern matching against the list of filenames that find prints.

So at that point, you might as well just do find -name stdlib.h. Of course, with -name '*.c' -name stdlib.h, you won't get any output because those patterns can't both match, and find's default behaviour is to AND the rules together.
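A short sketch of that AND behaviour, assuming a scratch directory with two hypothetical files. The -o (OR) operator shown for contrast is a standard find operator that is not discussed above.

```shell
# Scratch directory with two hypothetical files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch main.c stdlib.h

# Two tests in a row are ANDed: no single name matches both patterns.
both=$(find . -name '*.c' -name 'stdlib.h')

# -o turns it into an OR; the parentheses are escaped so the shell
# passes them through to find.
either=$(find . \( -name '*.c' -o -name 'stdlib.h' \) | sort)

printf 'AND: [%s]\n' "$both"
printf 'OR:\n%s\n' "$either"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```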

Substitute less at any point in the process to see what output any part of the pipeline produces.


Further reading: http://mywiki.wooledge.org/BashFAQ has some great stuff.

Solution 3

In general, xargs is used for cases where you would pipe (with the symbol |) the output of one command into another (Command1 | Command2), but the second command does not pick that output up as its input.

This typically happens when the second command takes its data as command-line arguments rather than reading it from Standard In (stdin), or needs the data arranged differently (e.g. one item per invocation, or everything on a single line). To give you a quick example, test the following:

Example 1:

ls | echo - This prints only an empty line, because echo ignores its stdin and prints just its arguments. If we use xargs here, the input is turned into arguments that echo can handle (e.g. as a single line of information).

ls | xargs echo - This will output all the information from ls in a single line

Example 2:

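Let's say I have multiple Go files inside a folder called go. I would look for them with something like this:

find go -name '*.go' -type f | echo - With the pipe symbol and the echo at the end, this would not work: echo ignores the list of files it is sent.

find go -name '*.go' -type f | xargs echo - Here it would work thanks to xargs, with all the names printed on a single line. If I wanted each result from the find command on its own line, I would do the following:

find go -name '*.go' -type f | xargs -n 1 echo - In this case echo runs once per file, so the output is the same as what find printed. (xargs -0 only appears to do this: without find -print0 the input contains no NUL separators, so the whole list is passed to echo as one argument, newlines included.)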

Commands like cp, echo, and rm, which take their operands as arguments and do not read them from stdin, benefit from being used with xargs.
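The behaviours described in this answer can be sketched with a fixed input, so the result does not depend on what is in the current directory. Here printf stands in for ls or find and the two file names are hypothetical; xargs -n 1 is the standard option for passing one argument per invocation.

```shell
# echo ignores stdin, so the piped names are simply dropped.
ignored=$(printf 'a.go\nb.go\n' | echo)

# xargs turns the input lines into arguments of a single echo call.
joined=$(printf 'a.go\nb.go\n' | xargs echo)

# -n 1 runs echo once per argument, giving one name per line.
per_line=$(printf 'a.go\nb.go\n' | xargs -n 1 echo)

printf 'ignored:  [%s]\n' "$ignored"
printf 'joined:   [%s]\n' "$joined"
printf 'per line:\n%s\n' "$per_line"
```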

Solution 4

xargs is used to auto generate command line arguments based (usually) on a list of files.

So consider some alternatives to the following xargs command:

find . -name '*.c' -print0 | xargs -0 grep 'stdlib.h'

There are several reasons to use it instead of other options that weren't originally mentioned in other answers:

  1. find . -name '*.c' -exec grep 'stdlib.h' {} \; will generate one grep process for every file. This is generally considered bad practice, and may put a big load on the system if there are many files found.
  2. If there are a lot of files, a grep 'stdlib.h' $(find . -name '*.c') command will likely fail, because the output of the $(...) operation will exceed the operating system's limit on the length of a single command line (ARG_MAX).

As mentioned in other answers, the reason for using the -print0 argument to find in this scenario and the -0 argument to xargs, is so that filenames with certain characters (e.g. quotes, spaces or even newlines) are still handled correctly.
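As a rough illustration of the length limit and of how xargs splits work into batches, the sketch below prints the system's limit with getconf and then forces small batches with -n 2. Normally xargs chooses the batch size itself so that each command line stays under the limit; -n is used here only to make the splitting visible with five made-up inputs.

```shell
# Limit, in bytes, on the arguments plus environment of one command.
getconf ARG_MAX

# -n 2 caps each invocation at two arguments, so echo has to be run
# several times to consume all five inputs; each run prints one line.
runs=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo | wc -l)
printf 'echo was run %s times\n' "$runs"
```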

Author: AlphaOmega

Updated on September 18, 2022

Comments

  • AlphaOmega
    AlphaOmega over 1 year

    I know the grep command and I am learning about the functionalities of xargs, so I read through this page which gives some examples on how to use the xargs command.

    I am confused by the last example, example 10. It says "The xargs command executes the grep command to find all the files (among the files provided by find command) that contained a string ‘stdlib.h’"

    $ find . -name '*.c' | xargs grep 'stdlib.h'
    ./tgsthreads.c:#include <stdlib.h>
    ./valgrind.c:#include <stdlib.h>
    ./direntry.c:#include <stdlib.h>
    ./xvirus.c:#include <stdlib.h>
    ./temp.c:#include <stdlib.h>
    ...
    ...
    ...
    

    However, what is the difference to simply using

    $ find . -name '*.c' | grep 'stdlib.h'
    

    ?

    Obviously, I am still struggling with what exactly xargs is doing, so any help is appreciated!

  • cat
    cat over 7 years
    you might consider noting (because it seems you omitted) the key point that | pipes stdout to grep's stdin which is not the same as grep's arguments and gives confusing results.
  • Peter Cordes
    Peter Cordes over 7 years
    Or use GNU find's find -name '*.c' -exec grep stdlib.h {} +. I pretty much never actually use xargs. Also surprised nobody mentioned that xargs serves a similar purpose to grep $(find) command substitution, so I wrote an answer of my own. Explaining xargs as command substitution with fewer limitations and problems seems natural.
  • ilkkachu
    ilkkachu over 7 years
    GNU xargs also has -d to set the separator, so you can use -d'\n' to handle a newline-separated list, which may be useful if you handle a list of file names in a file, etc. (as long as the file names don't have newlines in them, that is.)
  • Peter Cordes
    Peter Cordes over 7 years
    @ilkkachu: yeah, newlines in filenames are a lot more rare than spaces, since they break most scripts. myfunc(){ local IFS=$'\n'; fgrep stdlib.h $(find); } also works with the same effect. Or as a one-liner, a (IFS=...; cmd...) subshell also works to contain the change to IFS without having to save/restore it.
  • lsd
    lsd over 7 years
    One situation I use xargs on is if I am deleting a lot of files as a result of find. If you just do -exec rm, it will run rm on each file one at a time, which is very inefficient. Piping to xargs will do them all at once with one rm. Limiting with say -n50 (do 50 at a time) could prevent command line overflow (problem with a lot of files).
  • Peter Cordes
    Peter Cordes almost 7 years
    @lsd: Why not find -delete for that special case? Or for commands other than rm, if you have GNU find, then -exec some_command {} + groups into batches like xargs, instead of the \; behaviour of running the command separately for each file.
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy almost 7 years
    @lsd find runs command on each file if and only if it's using -exec command \; Both xargs and -exec command \+ will call the command with maximum number of arguments allowed by the system. In other words, they're equivalent
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy almost 7 years
    @PeterCordes Please don't do command $( find ) type of stuff. Problematic filenames with spaces and special characters can break this type of thing. At the very least double quote the command substitution.
  • Peter Cordes
    Peter Cordes almost 7 years
    @SergiyKolodyazhnyy: Thanks for pointing out that it looks like I'm actually recommending doing that. People skimming might have copy/pasted that instead of reading the next section. Updated to address that.
  • Peter Cordes
    Peter Cordes almost 7 years
    @SergiyKolodyazhnyy: Or were you replying to my comment? Notice that I set IFS so it's equivalent to using xargs '-d\n'. Glob expansion and shell metacharacter processing happens before command substitution's effects, so I think it's safe even with filenames that contain $() or >. Agreed that using word-splitting on command substitution is not good practice except for one-off interactive use where you know something about the filenames. But command "$(find)" is only useful if you expect it to produce exactly 1 filename...
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy almost 7 years
    @PeterCordes I was actually referring to the example in the answer. Setting IFS also has its own issues, because it's possible to have filenames with \n in them. For instance, touch with$'\n'newline ; IFS='\n' fgrep 'stdlib.h' $(find) and touch with$'\n'newline ; IFS='\n' fgrep 'stdlib.h' "$(find)" still produce errors
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy almost 7 years
    find -type f -print0 | xargs -0 fgrep 'stdlib' is by far the best option. That or find -type f -exec fgrep 'stdlib' {} \+ . In any case, the point is that when you use $() you rely on the shell to properly pass filenames, which doesn't work out properly because there's many things to consider. When you use xargs and -exec, that's being handled by find, or by find and xargs alone, so the shell plays almost no role there.
  • Peter Cordes
    Peter Cordes almost 7 years
    @SergiyKolodyazhnyy: I completely agree with you; if you think my answer is giving the wrong impression to beginners, please suggest how to fix it, or just edit it yourself (I might roll-back some of it, though). I know filenames can contain newlines. See my answer, and my first comment. Agreed that messing around with IFS or using xargs '-d\n' is not the best approach for this. But it can be useful for interactive use if you know what filenames you're running it on.
  • lsd
    lsd almost 7 years
    Yes, sorry, I have a habit sometimes of not using gnu extensions because I used to use other systems that didn't have the gnu tools. I have had a problem with command line arguments too long with xargs without putting a limit, but that was some time ago and the behavior may have changed since then.