Why is xargs necessary?

Solution 1

You are confusing two very different kinds of input: STDIN and arguments. Arguments are a list of strings provided to the command as it starts, usually by specifying them after the command name (e.g. echo these are some arguments or rm file1 file2). STDIN, on the other hand, is a stream of bytes (sometimes text, sometimes not) that the command can (optionally) read after it starts. Here are some examples (note that cat can take either arguments or STDIN, but it does different things with them):

echo file1 file2 | cat    # Prints "file1 file2", since that's the stream of
                          # bytes that echo passed to cat's STDIN
cat file1 file2    # Prints the CONTENTS of file1 and file2
echo file1 file2 | rm    # Prints an error message, since rm expects arguments
                         # and doesn't read from STDIN
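To make the distinction concrete, here is a minimal sketch. It uses a hypothetical helper function, show_input (not a standard command), that reports how many arguments it received and then whatever arrived on its STDIN:

```shell
# Hypothetical helper (an assumption for illustration, not a real utility):
# prints the number of arguments, then everything read from STDIN.
show_input() {
  printf 'argument count: %s\n' "$#"
  printf 'stdin: %s\n' "$(cat)"
}

# Same two words, delivered once as a byte stream and once as arguments.
via_stdin=$(echo file1 file2 | show_input)
via_args=$(show_input file1 file2 </dev/null)

printf '%s\n' "$via_stdin" "$via_args"
```

In the first call the helper sees zero arguments and the text "file1 file2" on STDIN; in the second it sees two arguments and an empty STDIN.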

xargs can be thought of as converting STDIN-style input to arguments:

echo file1 file2 | cat    # Prints "file1 file2"
echo file1 file2 | xargs cat    # Prints the CONTENTS of file1 and file2
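The same comparison can be tried with real files. This is only a sketch: the scratch directory and the file contents (alpha, beta) are made up for illustration.

```shell
# Scratch directory with two small files (contents are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\n' > file1
printf 'beta\n' > file2

names=$(echo file1 file2 | cat)             # cat copies its STDIN: the two names
contents=$(echo file1 file2 | xargs cat)    # xargs runs "cat file1 file2"

# Count the arguments that xargs hands to the command it starts.
argc=$(echo file1 file2 | xargs sh -c 'echo "$#"' sh)

printf '%s\n' "$names" "$contents" "$argc"
```

The sh -c 'echo "$#"' trick simply prints how many arguments the child process received, which shows that xargs passed the two names as arguments rather than as a stream.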

echo actually does more-or-less the opposite: it converts its arguments to STDOUT (which can be piped to some other command's STDIN):

echo file1 file2 | echo    # Prints a blank line, since echo doesn't read from STDIN
echo file1 file2 | xargs echo    # Prints "file1 file2" -- the first echo turns
                                 # them from arguments into STDOUT, xargs turns
                                 # them back into arguments, and the second echo
                                 # turns them back into STDOUT
echo file1 file2 | xargs echo | xargs echo | xargs echo | xargs echo    # Similar,
                                 # except that it converts back and forth between
                                 # args and STDOUT several times before finally
                                 # printing "file1 file2" to STDOUT.

Solution 2

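cat can read its input from STDIN, but rm cannot: rm only accepts file names as arguments. For commands like rm, you need xargs to read STDIN, split it into items (on blanks and newlines by default, not strictly line by line), and run the command with those items as command-line arguments.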
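A short sketch of how xargs splits its input, assuming an xargs that supports -n and -L (both GNU and BSD versions do). The sample input is made up:

```shell
# Two lines of input; the first line contains two words.
input='a b
c'

batched=$(printf '%s\n' "$input" | xargs echo)         # one echo with all items
per_item=$(printf '%s\n' "$input" | xargs -n 1 echo)   # one echo per item
per_line=$(printf '%s\n' "$input" | xargs -L 1 echo)   # one echo per input line

printf '%s\n' "$batched" "---" "$per_item" "---" "$per_line"
```

By default all three items go to a single echo. With -n 1 each whitespace-separated item gets its own invocation, and with -L 1 each input line does.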

Solution 3

Minimal example

My goal is to provide a slightly clearer example than https://superuser.com/a/600273/128124

For educational purposes, let's start by using -n2, which limits xargs to at most 2 arguments per invocation.

Then if you run:

printf '1 2 3 4' | xargs -n2 echo

it supplies 2 arguments at a time to echo and is equivalent to:

echo 1 2
echo 3 4

which produces:

1 2
3 4

This is exactly the same as if you had a file:

notes.txt

1
2
3
4

and called:

xargs -n2 echo < notes.txt

Alternative approaches and why xargs is superior

With that in mind, let's consider the alternatives and why xargs is better.

One thing you could try is:

echo $(cat notes.txt)

which expands to:

echo 1 2 3 4

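However, this is problematic because the kernel imposes a maximum size on the command line arguments of a Linux program (ARG_MAX). If the file is large enough, the command fails with "Argument list too long".

xargs knows about this limit and automatically splits the arguments across as many invocations as needed to stay under it.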
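A rough way to watch the splitting happen, assuming seq is available and xargs uses its default limits (GNU and BSD xargs both cap the command line well below what 200000 numbers need). The exact number of invocations varies by system:

```shell
# System limit on the combined size of arguments and environment, in bytes.
arg_max=$(getconf ARG_MAX)

# Each echo invocation prints one line, so counting lines counts invocations.
invocations=$(seq 1 200000 | xargs echo | wc -l | tr -d ' ')

# No item is lost: every number still appears exactly once across the runs.
total_words=$(seq 1 200000 | xargs echo | wc -w | tr -d ' ')

echo "ARG_MAX=$arg_max invocations=$invocations words=$total_words"
```

If xargs passed everything to a single echo, the line count would be 1; a larger count shows the input was divided into several command lines.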

Another simple approach you could try would be:

while IFS="" read -r p || [ -n "$p" ]
do
  echo "$p"
done < notes.txt

(from https://stackoverflow.com/questions/1521462/looping-through-the-content-of-a-file-in-bash), but this requires a lot of typing and could be much slower: the command in the loop body runs once per line instead of once per batch of arguments, and the Bash loop itself adds overhead.

To make xargs even more interesting, the GNU version has a -P option for parallel operation!
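A small sketch of -P, assuming an xargs that supports it (it is not part of POSIX, but GNU and BSD xargs both have it). The sleep durations and the letters are arbitrary sample input:

```shell
# Four one-second sleeps; -P 4 allows up to four to run at the same time,
# and -n 1 gives each sleep a single argument.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -P 4 -n 1 sleep
elapsed=$(( $(date +%s) - start ))

# Output order is not guaranteed under -P, so sort before comparing.
results=$(printf '%s\n' a b c d | xargs -P 4 -n 1 echo | sort | tr '\n' ' ')

echo "elapsed=${elapsed}s results=$results"
```

Run serially, the sleeps would take about four seconds in total; with -P 4 the elapsed time should be much closer to one second, though the exact figure depends on the machine.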

Related: https://unix.stackexchange.com/questions/24954/when-is-xargs-needed

Author: seewalker

Updated on September 18, 2022

Comments

  • seewalker
    seewalker over 1 year

    Suppose I want to remove all files in a directory except for one named "notes.txt". I would do this with the pipeline, ls | grep -v "notes.txt" | xargs rm. Why do I need xargs if the output of the second pipe is the input that rm should use?

    For the sake of comparison, the pipeline, echo "#include <knowledge.h>" | cat > foo.c inserts the echoed text into the file without the use of xargs. What is the difference between these two pipelines?

    • Admin
      Admin almost 11 years
      You should not use ls | grep -v "notes.txt" | xargs rm to remove everything except notes.txt; in general, never parse ls output. Your command would break if a single file name contained a space, for example. A safer way would be rm !(notes.txt) in Bash (with shopt -s extglob set), or rm ^notes.txt in Zsh (with EXTENDED_GLOB), etc.
    • Admin
      Admin almost 10 years
      To avoid spaces you could do find . -maxdepth 1 -mindepth 1 -print0 | xargs -0 instead of ls | xargs :-)
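
Putting the suggestions from these comments together, one possible space-safe version of the original pipeline might look like the sketch below. It assumes a find that supports -mindepth, -maxdepth and -print0 and an xargs that supports -0 (GNU and BSD versions do); the scratch directory and file names are made up for illustration.

```shell
# Scratch directory with sample files, one of which has a space in its name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes.txt 'old draft.txt' plain.txt

# NUL-separated names survive spaces and newlines; "!" negates the -name test,
# so every regular file except notes.txt is passed to rm.
find . -mindepth 1 -maxdepth 1 -type f ! -name notes.txt -print0 | xargs -0 rm --

# List what is left using a glob rather than ls.
remaining=$(printf '%s\n' *)
echo "$remaining"
```

The "--" tells rm that everything after it is a file name, which guards against names that begin with a dash.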