/bin/cat: Argument list too long
Solution 1
If you want a line-count for each individual file:
find . -type f -exec wc -l {} + | awk '! /^[ 0-9]+[[:space:]]+total$/'
I've excluded the total lines because there will be several of them with this many files being processed. `find ... -exec ... +` will try to fit as many filenames onto a single command line as possible, but that will be far fewer than 119766 files (probably only a few thousand, at most, per invocation of `wc`), and each invocation will produce its own independent 'total' line.
If you want the total number of lines in all files combined, here's one way of doing it:
find . -type f -exec wc -l {} + |
awk '/^[ 0-9]+[[:space:]]+total$/ {print $1}' |
xargs | sed -e 's/ /+/g' | bc
This prints only the line counts from the total lines, pipes them into `xargs` to get all the counts on one line, uses `sed` to transform the spaces into `+` signs, and then pipes the lot into `bc` to do the calculation.
Example output:
$ cd /usr/share/doc
$ find . -type f -exec wc -l {} + |
awk '/^[ 0-9]+[[:space:]]+total$/ {print $1}' |
xargs | sed -e 's/ /+/g' | bc
53358931
Update 2022-05-05
It is better to run `wc -l` via `sh`. This avoids the risk of problems if any of the filenames is literally `total`: aside from the total line being the last line of `wc`'s output, there is no way to distinguish an actual total line from the output for a file called `total`, so a simple awk script that matches "total" can't work reliably.
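The ambiguity is easy to reproduce; the scratch directory and file contents below are made up for the demonstration:

```shell
# In a scratch directory, create a two-line file literally named "total"
cd "$(mktemp -d)"
printf 'a\nb\n' > total
wc -l total
# wc prints "2 total" - textually identical to a genuine total line
```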
To show counts for individual files, excluding totals:
find . -type f -exec sh -c 'wc -l "$@" | sed "\$d"' sh {} +
This runs `wc -l` on all filenames and deletes the last line (the "total" line) from each batch run by `-exec`.
The `$d` in the sed script needs to be escaped because the script is in a double-quoted string instead of the more usual single-quoted string. Double quotes were used because the entire `sh -c` script is a single-quoted string. It's easier and more readable to escape one `$` symbol than to use `'\''` to fake embedding a single quote inside a single-quoted string.
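A minimal illustration of why the escape matters (the variable name `d` here is just whatever the shell would otherwise try to expand):

```shell
# Inside double quotes, an unescaped $d is expanded by the shell before
# sed ever sees the script
d='oops'
echo "\$d"   # the backslash preserves the literal $d for sed's last-line address
echo "$d"    # without it, sed would receive "oops" instead of "$d"
```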
To show only the totals:
find . -type f -exec sh -c 'wc -l "$@" | awk "END {print \$1}"' sh {} + |
xargs | sed -e 's/ /+/g' | bc
Instead of using `sed` to delete the last line from each batch of files passed to `wc` via `sh` by `find ... -exec`, this uses `awk` to print only the last line (the "total") from each batch. The output of `find` is then converted to a single line (`xargs`) with `+` characters between each number (`sed` transforms the spaces to `+`), and then piped into `bc` to perform the calculation.
Just like the `$d` in the sed script, the `$1` in the awk script needs to be escaped because of the double quoting.
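The effect of the `END` rule can be seen on a stand-in batch; the three lines below imitate one `wc -l` batch with hypothetical filenames:

```shell
# awk's END block runs after the last input line has been read,
# so $1 there is the first field of the final (total) line
printf ' 1 a.csv\n 2 b.csv\n 3 total\n' | awk 'END {print $1}'
```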
Solution 2
Well, to give that `cat` from the question a new home, this should do:
find . -type f -exec cat {} + | wc -l
It executes `cat` with the maximum acceptable number of filenames (`+`) again and again, and pipes everything to `wc`. If you do not want to traverse subdirectories, add `-maxdepth 1` to the `find` command, after the directory.
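For example, the non-recursive form would look like this (a sketch, assuming a `find` that supports `-maxdepth`, i.e. GNU or BSD find):

```shell
# Count lines only in files directly inside the current directory,
# skipping everything in subdirectories
find . -maxdepth 1 -type f -exec cat {} + | wc -l
```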
As an alternative, the `--files0-from` option of GNU `wc` could be used:
find . -type f -print0 | wc -l --files0-from=- | tail -1
This option makes `wc` read not the contents but the filenames from stdin, separated by null characters. With `-print0`, `find` will print those filenames null-byte separated. As `wc` will still print line counts for every file, it is advisable to skip everything except the summary line at the end, hence the `tail`.
Both solutions have the advantage that they will work in any locale, whereas @cas' solutions have to be adapted to the locale (e.g. 'total' is 'insgesamt' in German).
Updated on September 18, 2022

Comments
- Milon Corleone almost 2 years:
I have 119766 files in a folder. They are CSV files. I want to find out the total number of lines across all files.
I'm trying to run the following command:
cat * | wc -l
But the following error occurs:
-bash: /bin/cat: Argument list too long
How can I do that? Is there any way around this?
One thing I would like to add is that the total number of lines will be very large.
- Admin over 8 years: Do you want the total number of lines for all files, or a count of lines for each individual file?
- Admin over 8 years: The short answer is: you are hitting the ARG_MAX limit. `ls`, `cat`, `mv` and other commands have this limitation. As the error already tells you, you are providing too many arguments to the `cat` command in this case. Use `getconf -a | grep ARG_MAX` to see the value that applies to your kernel.
- cuonglm over 8 years: This fails if a file is named `total foo`.
- Alessio over 8 years: No, it won't. That's why I used `awk '$2 == "total"'` for an exact match rather than a regexp match. It will only "fail" on filenames that exactly match 'total', and there's really no way around that, since `wc` doesn't have options to either exclude totals or print only totals. I've thought several times over the years that such options would be useful, but they don't exist.
- cuonglm over 8 years: The exact match is meaningless here. `printf '1 total\n1 total foo\n' | awk '$2 == "total"'` gives you two lines.
- Alessio over 8 years: Hmmm, yes. I'll change it to `/ total$/` then, which will still "fail" on filenames that are exactly 'total', but as I said, there's no avoiding that.
- cuonglm over 8 years: It still fails. You can pipe `wc` to other tools to remove the last line. But it still fails with a file named `foo\n99999 bar`. It will add 999999 to your result.
- Alessio over 8 years: `total` won't be on only the last line when you're dealing with 100000+ files. There will be one total line per few thousand filenames (depending on how many fit on a command line). As for pathological cases involving `\n` characters in filenames - anyone who does that has no right to expect anything other than occasional bizarre behavior. Actions have consequences: make stupid filenames, get stupid results.
- cuonglm over 8 years: I mean pipe `wc` only, not the entire `find` command.
- Alessio over 8 years: Yes, I assumed that's what you meant - makes no difference. `find` runs `wc` on multiple files. Each batch of files `wc` is run on will generate a total line.
- cuonglm over 8 years: That's why I said you should do something like `-exec sh -c 'wc -l "$@" | sed "$d"' {} +`
- Alessio over 8 years: You may have thought that, but this is the first time you've suggested anything like it. I really don't see how it would make any difference, though. The problem is not find or -exec. The ONLY problem is the fact that `wc` doesn't have an option to either exclude totals or print only totals, combined with the fact that it's possible for a filename to be exactly 'total'.
- cuonglm over 8 years: Why no difference? Each `wc` invocation has the total as its last line; if you exclude it from each `wc` invocation, your output is only the files with their line counts.
- Pankaj Goyal over 8 years: Hey, folks who are downvoting this suggestion: it would be helpful to know why you think this is a poor answer so that I can improve it.
- arun almost 4 years: This gave `/usr/bin/find: Argument list too long` (Ubuntu 18.04, bash). To clarify, it was searching for files in sub-directories, so something like `*/*/s*.out`
- mleonard almost 4 years: @arun The error message comes from the shell, not find. If you use a wildcard pattern like `*/*/s*.out`, your shell will (try to) expand all matching filenames before even invoking `find`, which makes the whole `find` command kind of futile. It is the same problem as in the question. Something like `find . -type f -name "s*.out" -exec ...` will find all files matching `s*.out` in all subdirectories and do something with them. If you need a certain directory depth (e.g. exactly two subdir levels), play with the `-mindepth` and `-maxdepth` options.
- arun almost 4 years: Thanks very much for the clarification.