Why is the command "find | grep 'filename'" so much slower than "find 'filename'"?
Solution 1
(I'm assuming GNU find here.)
Using just
find filename
would be quick, because it would just return filename, or the names inside filename if it's a directory, or an error if that name did not exist in the current directory. It's a very quick operation, similar to ls filename (but recursive if filename is a directory).
In contrast,
find | grep filename
would allow find to generate a list of all names from the current directory and below, which grep would then filter. This would obviously be a much slower operation.
I'm assuming that what was actually intended was
find . -type f -name 'filename'
This would look for filename as the name of a regular file anywhere in the current directory or below.
This will be as quick (or comparably quick) as find | grep filename, but the grep solution would match filename against the full path of each found name, similarly to what -path '*filename*' would do with find.
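To make the difference between the two kinds of matching concrete, here is a small sketch on a throwaway directory tree. The names filename.d, sub and other are made up for the demonstration, and mktemp -d is assumed to be available.

```shell
# Scratch tree; every name in it is hypothetical.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p filename.d sub
touch filename filename.d/other sub/filename

# grep treats each full path as a line of text, so every path that
# contains "filename" anywhere (including in a parent directory) is kept.
grep_hits=$(find . | grep 'filename' | sort)

# -name compares only the last path component; -type f keeps regular files.
name_hits=$(find . -type f -name 'filename' | sort)

printf 'find | grep:\n%s\n\nfind -type f -name:\n%s\n' "$grep_hits" "$name_hits"

# Clean up.
cd - >/dev/null
rm -rf "$tmp"
```

The grep variant also reports ./filename.d and ./filename.d/other, because the pattern occurs somewhere in their paths, while -name reports only the two regular files whose own name is filename.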
The confusion comes from a misunderstanding of how find works.
The utility takes a number of paths and returns all names beneath these paths.
You may then restrict the returned names using various tests that may act on the filename, the path, the timestamp, the file size, the file type, etc.
When you say
find a b c
you ask find to list every name available under the three paths a, b and c. If these happen to be names of regular files in the current directory, then these will be returned. If any of them happens to be the name of a directory, then it will be returned along with all further names inside that directory.
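A quick way to see this for yourself, as a sketch in a scratch directory (the file c/inner is a made-up name, and mktemp -d is assumed to be available):

```shell
# Two regular files (a, b) and one directory (c) with a single entry.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a b
mkdir c
touch c/inner

# No tests at all: find lists each start path, and descends into
# the ones that are directories.
listing=$(find a b c)
printf '%s\n' "$listing"
# prints:
# a
# b
# c
# c/inner

# Clean up.
cd - >/dev/null
rm -rf "$tmp"
```

The two regular files are returned as they are, and the directory is returned together with the name inside it.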
When I do
find . -type f -name 'filename'
this generates a list of all names in the current directory (.) and below. Then it restricts the names to those of regular files, i.e. not directories etc., with -type f. Then there is a further restriction to names that match filename using -name 'filename'. The string filename may be a filename globbing pattern, such as *.txt (just remember to quote it!).
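Why the quoting matters can be shown with a small sketch. The file names one.txt and sub/two.txt are invented for the demonstration, and the unquoted case assumes a shell with default globbing settings.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir sub
touch one.txt sub/two.txt

# Unquoted: the shell expands *.txt to one.txt before find starts,
# so find is really being asked for -name one.txt.
unquoted=$(find . -name *.txt | sort)

# Quoted: find receives the pattern itself and tests every name against it.
quoted=$(find . -name '*.txt' | sort)

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"

# Clean up.
cd - >/dev/null
rm -rf "$tmp"
```

Here the unquoted form silently misses sub/two.txt. With several matching files in the current directory it would instead hand find extra arguments, which it would most likely reject with an error.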
Example:
The following seems to "find" the file called .profile in my home directory:
$ pwd
/home/kk
$ find .profile
.profile
But in fact, it just returns all names at the path .profile (there is only one name, and that is of this file).
Then I cd up one level and try again:
$ cd ..
$ pwd
/home
$ find .profile
find: .profile: No such file or directory
The find command can no longer find any path called .profile.
However, if I get it to look at the current directory, and then restrict the returned names to only .profile, it finds it from there as well:
$ pwd
/home
$ find . -name '.profile'
./kk/.profile
Solution 2
Non-Technical explanation: Looking for Jack in a crowd is faster than looking for everyone in a crowd and eliminating all from consideration except Jack.
Solution 3
I have not understood the problem yet but can provide some more insights.
As for Kusalananda, the find | grep call is clearly faster on my system, which does not make much sense. At first I assumed some kind of buffering problem: that writing to the console delays the next syscall for reading the next file name. Writing to a pipe is very fast: about 40 MiB/s even for 32-byte writes (on my rather slow system; 300 MiB/s for a block size of 1 MiB). Thus I assumed that find can read from the file system faster when writing to a pipe (or file), so that the two operations, reading file paths and writing them out, could run in parallel (which find, as a single-threaded process, cannot do on its own).
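The pipe figure can be checked roughly with dd. This is only a sketch and not necessarily how the numbers quoted here were measured; the write size and the count are arbitrary choices.

```shell
# Push 100000 writes of 32 bytes each through a pipe and count what arrives.
# Prefix the pipeline with `time` in an interactive shell to estimate a rate.
bytes=$(dd if=/dev/zero bs=32 count=100000 2>/dev/null | wc -c)
bytes=$((bytes))   # normalise padding that some wc implementations add
echo "$bytes bytes went through the pipe"
```

Dividing the byte count by the elapsed time gives a throughput for small writes; raising bs (and lowering count) shows how much the block size matters.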
It's find's fault
Comparing the two calls
:> time find "$HOME"/ -name '*.txt' >/dev/null
real 0m0.965s
user 0m0.532s
sys 0m0.423s
and
:> time find "$HOME"/ >/dev/null
real 0m0.653s
user 0m0.242s
sys 0m0.405s
shows that find does something incredibly stupid (whatever that may be). It just turns out to be quite incompetent at executing -name '*.txt'.
Might depend on the input / output ratio
You might think that find -name wins if there is very little to write. But it just gets more embarrassing for find. It loses even if there is nothing to write at all, against 200K files (13M of pipe data) for grep:
time find /usr -name lwevhewoivhol
find can be as fast as grep, though
It turns out that find's stupidity with -name does not extend to other tests. Use a regex instead and the problem is gone:
:> time find "$HOME"/ -regex '\.txt$' >/dev/null
real 0m0.679s
user 0m0.264s
sys 0m0.410s
I guess this can be considered a bug. Anyone willing to file a bug report? My version is find (GNU findutils) 4.6.0
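One thing worth checking when comparing -name with -regex: in GNU find, -regex is a match on the whole path rather than a search within it, so a pattern usually needs a leading .* to hit anything. A small sketch on a scratch tree (the names sub, a.txt and b.log are made up):

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir sub
touch sub/a.txt sub/b.log

# The pattern must cover the whole path (./sub/a.txt), so this matches nothing.
unanchored=$(find . -regex '\.txt$')

# With a leading .* the pattern spans the full path and the file is found.
whole_path=$(find . -regex '.*\.txt')

printf 'without .*: [%s]\nwith .*:    [%s]\n' "$unanchored" "$whole_path"

# Clean up.
cd - >/dev/null
rm -rf "$tmp"
```

A -regex test that can never match is not doing the same amount of work as -name '*.txt', so it is worth confirming that both commands return the same files before comparing their timings.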
yoyo_fun
Updated on September 18, 2022
Comments
-
yoyo_fun over 1 year
I tried both commands and the command find | grep 'filename' is many many times slower than the simple find 'filename' command. What would be a proper explanation for this behavior?
-
Raman Sailopal over 6 years
You are listing every file with find and then passing the data to grep to process. With find used on its own you are missing the step of passing every listed file to grep to parse the output. This will therefore be quicker.
-
Kusalananda over 6 years
Slower in what sense? Do the commands take a different amount of time to complete?
-
yoyo_fun over 6 years
@Kusalananda Yes, it takes much longer to complete.
-
Kusalananda over 6 years
I can't reproduce this locally. If anything, time find "$HOME" -name '.profile' reports a longer time than time find "$HOME" | grep -F '.profile' (17s vs. 12s).
-
yoyo_fun over 6 years
@Kusalananda Are you sure it is not a caching issue that is causing this behavior? Which command did you execute first? Also, for me the command find "$HOME" | grep -F '.profile' found many more results than find "$HOME" -name '.profile'.
-
yoyo_fun over 6 years
@Kusalananda If you repeat the search several times, the later runs will be faster.
-
Kusalananda over 6 years
@JenniferAnderson I ran both repeatedly. The 17 and 12 seconds are averages. And yes, the grep variation will match anywhere in the find result, whereas matching with find -name would only match exactly (in this case).
-
Sundeep over 6 years
Enclose your code samples within backticks... and add the exact command used; I haven't seen the find 'filename' syntax used before. Some experimenting suggests that it searches only the current directory, not subdirectories, while find | grep will have to traverse all files in the current directory and its subdirectories recursively.
-
Kusalananda over 6 years
Yes, find filename would be fast. I kinda assumed that this was a typo and that the OP meant find -name filename. With find filename, only filename would be examined (and nothing else).
-
yoyo_fun over 6 years
@Kusalananda But what does the -name option do?
-
Dave Sherohman over 6 years
The -name option instructs find to return all files it finds which match the provided name, e.g., find . -name TODO would give you all files named TODO in the current directory or any of its subdirectories.
-
yoyo_fun over 6 years
@DaveSherohman But isn't this exactly what the find command does without the -name option?
-
Kusalananda over 6 years
@JenniferAnderson No, see my updated answer.
-
Dave Sherohman over 6 years
@JenniferAnderson - Nope. find filename looks at the one specific directory entry filename (recursing into it if it's a directory) and returns every file it finds. find . -name filename looks at the current directory (recursing into subdirectories) and returns only files named filename. Compare find /etc vs. find /etc -name passwd to see the difference. (And note that, if you're only looking for one specific file at one specific path, using find at all is overkill. ls will do the job just as well, and likely with less overhead.)
-
Stéphane Chazelas over 6 years
find filename would return only filename if filename was not of type directory (or was of type directory, but did not have any entry itself).
-
Hauke Laging over 6 years
Have you given that a try?
-
Paranoid over 6 years
grep isn't a string comparison, it's a regular expression comparison, which means it has to work its way through the entire string until it either finds a match or reaches the end. The directory lookups are the same no matter what.
-
psmears over 6 years
How repeatable are your timings? If you did the -name test first, then it may have been slower due to the directory contents not being cached. (When testing -name and -regex I find they take roughly the same time, at least once the cache effect has been taken into consideration. Of course it may just be a different version of find...)
-
Hauke Laging over 6 years
@psmears Of course, I have done these tests several times. The caching problem has been mentioned even in the comments to the question before the first answer. My find version is find (GNU findutils) 4.6.0.
-
pipe over 6 years
@Paranoid Hm, what version of find are you talking about? It's apparently not anything like the find I'm used to in Debian.
-
Kusalananda over 6 years
The problem is that the OP is expecting Jack to be the only person in the crowd. If he is, they're lucky. find jack will list jack if it's a file called jack, or all names in the directory if it's a directory. It's a misunderstanding of how find works.
-
Barmar over 6 years
Why is it surprising that adding -name '*.txt' slows down find? It has to do extra work, testing each filename.
-
Hauke Laging over 6 years
@Barmar On the one hand, this extra work can be done extremely fast. On the other hand, this extra work saves other work: find has to write less data. And writing to a pipe is a much slower operation.
-
Barmar over 6 years
Writing to a disk is very slow; writing to a pipe is not so bad, it just copies to a kernel buffer. Notice that in your first test, writing more to /dev/null somehow used less system time.