/bin/cat: Argument list too long


Solution 1

If you want a line-count for each individual file:

find . -type f -exec wc -l {} + | awk '! /^[ 0-9]+[[:space:]]+total$/'

I've excluded the total lines because there will be several of them with this many files being processed. find ... -exec ... + will fit as many filenames onto a single command line as possible, but that will be far fewer than 119766 files: probably only a few thousand (at most) per invocation of wc, and each invocation will produce its own independent 'total' line.

If you want the total number of lines in all files combined, here's one way of doing it:

find . -type f -exec wc -l {} + | 
    awk '/^[ 0-9]+[[:space:]]+total$/ {print $1}' | 
    xargs | sed -e 's/ /+/g' | bc

This prints only the line counts on the total lines, pipes that into xargs to get the counts all on one line, then sed to transform the spaces into + signs, and then pipes the lot into bc to do the calculation.
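The xargs/sed/bc stages can also be collapsed into a single awk pass that sums the batch totals as they arrive. This sketch shares the caveat discussed below: a file literally named "total" would also match the pattern.

```shell
# Sum the per-batch "total" lines from wc -l in one awk pass.
# Caveat: a file whose name is exactly "total" would also match.
find . -type f -exec wc -l {} + |
    awk '/^[ 0-9]+[[:space:]]+total$/ {sum += $1} END {print sum}'
```

This avoids spawning xargs, sed, and bc, at the cost of relying on the same 'total' pattern match.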

Example output:

$ cd /usr/share/doc
$ find . -type f -exec wc -l {} + | 
    awk '/^[ 0-9]+[[:space:]]+total$/ {print $1}' | 
    xargs | sed -e 's/ /+/g' | bc 
53358931

Update 2022-05-05

It is better to run wc -l via sh. This avoids the risk of problems arising if any of the filenames is literally called total. Aside from the total line being the last line of wc's output, there is no way to distinguish an actual total line from the output for a file called "total", so a simple awk script that matches "total" can't work reliably.

To show counts for individual files, excluding totals:

find . -type f -exec sh -c 'wc -l "$@" | sed "\$d"' sh {} +

This runs wc -l on all filenames and deletes the last line (the "total" line) from each batch run by -exec.

The $d in the sed script needs to be escaped because the script is in a double-quoted string instead of the more usual single-quoted one. Double quotes were used because the entire sh -c script is itself a single-quoted string. It's easier and more readable to escape one $ symbol than to use '\'' to fake embedding a single quote inside a single-quoted string.
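For comparison, here is what the same command looks like with the '\'' trick instead of the escaped \$, which is exactly the awkwardness the double-quoted sed script avoids:

```shell
# Same command, but closing and reopening the single-quoted sh -c
# string around the sed script; the sh child still sees sed '$d':
find . -type f -exec sh -c 'wc -l "$@" | sed '\''$d'\''' sh {} +
```

Both forms hand sh the identical script; the escaped-\$ version is simply easier to read.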

To show only the totals:

find . -type f -exec sh -c 'wc -l "$@" | awk "END {print \$1}"' sh {} + |
  xargs | sed -e 's/ /+/g' | bc

Instead of using sed to delete the last line from each batch of files passed to wc via sh by find ... -exec, this uses awk to print only the last lines (the "total") from each batch. The output of find is then converted to a single line (xargs) with + characters between each number (sed to transform spaces to +), and then piped into bc to perform the calculation.

Just like the $d in the sed script, the $1 in the awk script needs to be escaped because of double-quoting.

Solution 2

Well, to give that cat from the question a new home, this should do:

find . -type f -exec cat {} + | wc -l

It executes cat with the maximum acceptable number of filenames (+) again and again and pipes everything to wc. If you do not want to traverse subdirectories, add -maxdepth 1 to the find command, after the directory.
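For the non-recursive case, the variant with -maxdepth would look like this (note that -maxdepth must appear before tests such as -type):

```shell
# Count lines only in files directly inside the current directory,
# ignoring anything in subdirectories:
find . -maxdepth 1 -type f -exec cat {} + | wc -l
```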

As an alternative, the --files0-from option to GNU wc could be used:

find . -type f -print0 | wc -l --files0-from=- | tail -1

This option makes wc read not the contents but the filenames from stdin, separated by null characters. With -print0, find will print those filenames null-byte separated. As wc will still print out line counts for every file, it is advisable to skip everything except the summary line at the end, hence the tail.
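To see why the tail is needed: with more than one file, GNU wc still prints a per-file count before the summary line. A minimal sketch, assuming files f1 and f2 exist in the current directory:

```shell
# wc emits one count line per file plus a final "total" line;
# tail -1 would keep only that summary:
printf 'f1\0f2\0' | wc -l --files0-from=-
```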

Both solutions have the advantage that they work in any locale, whereas @cas' solutions have to be adapted ('total' is 'insgesamt' in German, for example).
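If you do want to use a total-matching approach in a non-English locale, one workaround (not from the original answers) is to force the C locale for wc so its summary line is predictably "total":

```shell
# Force the C locale so wc's summary line is always "total",
# making the awk pattern match locale-independent:
find . -type f -exec env LC_ALL=C wc -l {} + |
    awk '/^[ 0-9]+[[:space:]]+total$/ {sum += $1} END {print sum}'
```

This still shares the caveat about files literally named "total".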

Author: Milon Corleone

Updated on September 18, 2022
Comments

  • Milon Corleone
    Milon Corleone almost 2 years

    I have 119766 files in a folder. They are CSV files. I want to find out total number of lines of all files.

    I'm trying to run following command:

cat * | wc -l
    

But the following error occurs:

    -bash: /bin/cat: Argument list too long

    How can I do that? Is there any way around this?

One thing I would like to add: the total number of lines will be very large.

    • Admin
      Admin over 8 years
      do you want the total number of lines for all files, or a count of lines for each individual file?
    • Admin
      Admin over 8 years
The short answer is: you are hitting the ARG_MAX limit. ls, cat, mv and other commands have this limitation. As the error already tells you, you are providing too many arguments to the cat command in this case. Use getconf ARG_MAX to see the value that applies to your system.
  • cuonglm
    cuonglm over 8 years
This fails if a file is named total foo.
  • Alessio
    Alessio over 8 years
No, it won't. That's why I used awk '$2 == "total"' for an exact match rather than a regexp match. It will only "fail" on filenames that exactly match 'total', and there's really no way around that, since wc doesn't have options to either exclude totals or print only totals. I've thought several times over the years that such options would be useful, but they don't exist.
  • cuonglm
    cuonglm over 8 years
The exact match is meaningless here: printf '1 total\n1 total foo\n' | awk '$2 == "total"' gives you two lines.
  • Alessio
    Alessio over 8 years
Hmmm, yes. I'll change it to / total$/ then, which will still "fail" on filenames that are exactly 'total', but as I said there's no avoiding that.
  • cuonglm
    cuonglm over 8 years
It still fails. You can pipe wc to other tools to remove the last line, but it still fails with a file named foo\n99999 bar: it will add 99999 to your result.
  • Alessio
    Alessio over 8 years
total won't be on only the last line when you're dealing with 100000+ files. There will be one total line per few thousand filenames (depending on how many fit on a command line). As for pathological cases involving \n characters in filenames: anyone who does that has no right to expect anything other than occasional bizarre behavior. Actions have consequences: make stupid filenames, get stupid results.
  • cuonglm
    cuonglm over 8 years
    I mean pipe wc only, not the entire find command.
  • Alessio
    Alessio over 8 years
Yes, I assumed that's what you meant; it makes no difference. find runs wc on multiple files, and each batch of files wc is run on will generate a total line.
  • cuonglm
    cuonglm over 8 years
That's why I said you should do something like -exec sh -c 'wc -l "$@" | sed "\$d"' sh {} +
  • Alessio
    Alessio over 8 years
You may have thought that, but this is the first time you've suggested anything like it. I really don't see how it would make any difference, though. The problem is not find or -exec. The ONLY problem is that wc doesn't have an option to either exclude totals or print only totals, combined with the fact that it's possible for a filename to be exactly 'total'.
  • cuonglm
    cuonglm over 8 years
Why no difference? The last line of each wc invocation is its total line; if you exclude it on each invocation, your output contains only the files with their line counts.
  • Pankaj Goyal
    Pankaj Goyal over 8 years
Hey, folks who are downvoting this suggestion: it would be helpful to know why you think this is a poor answer, so that I can improve it.
  • arun
    arun almost 4 years
This gave /usr/bin/find: Argument list too long (Ubuntu 18.04, bash). To clarify, it was searching for files in sub-directories, with a pattern like */*/s*.out.
  • mleonard
    mleonard almost 4 years
    @arun The error message comes from the shell, not find. If you use a wildcard pattern like */*/s*.out, your shell will (try to) expand all matching filenames before even invoking find, which makes the whole find command kind of futile. It is the same problem as in the question. Something like find . -type f -name "s*.out" -exec ... will find all files matching s*.out in all subdirectories and do something with them. If you need a certain directory depth (e.g. exactly two subdir levels), play with -mindepth and -maxdepth options.
  • arun
    arun almost 4 years
    Thx very much for the clarification.
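The pathological case debated in this thread is easy to reproduce. A file literally named "total" produces a wc output line that no pattern can distinguish from the real summary line (the mktemp dir here is just a scratch area for the demonstration):

```shell
# A file named "total" makes wc's output ambiguous: both the first
# and last lines below end in " total", but only the last is a summary.
cd "$(mktemp -d)"
printf 'x\n' > total
printf 'a\nb\n' > f1
wc -l total f1
```

This is why the 2022 update above strips the last line of each batch positionally (sed "\$d") instead of matching on the word "total".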