Why is "echo" so much faster than "touch"?
Solution 1
In bash, touch is an external binary, but echo is a shell builtin:
$ type echo
echo is a shell builtin
$ type touch
touch is /usr/bin/touch
Since touch is an external binary, and you invoke touch once per file, the shell must create 300,000 instances of touch, which takes a long time.
echo, however, is a shell builtin, and the execution of shell builtins does not require forking at all. Instead, the current shell does all of the operations and no external processes are created; this is the reason why it is so much faster.
Here are two profiles of the shell's operations. You can see that a lot of time is spent cloning new processes when using touch. Using /bin/echo instead of the shell builtin should show a much more comparable result.
Using touch
$ strace -c -- bash -c 'for file in a{1..10000}; do touch "$file"; done'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.20    0.030925           2     20000     10000 wait4
 38.12    0.020972           2     10000           clone
  4.67    0.002569           0     80006           rt_sigprocmask
  0.71    0.000388           0     20008           rt_sigaction
  0.27    0.000150           0     10000           rt_sigreturn
[...]
Using echo
$ strace -c -- bash -c 'for file in b{1..10000}; do echo >> "$file"; done'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 34.32    0.000685           0     50000           fcntl
 22.14    0.000442           0     10000           write
 19.59    0.000391           0     10011           open
 14.58    0.000291           0     20000           dup2
  8.37    0.000167           0     20013           close
[...]
Solution 2
As others have answered, using echo will be faster than touch, as echo is a command which is commonly (though not required to be) built into the shell. Using it dispenses with the kernel overhead of starting a new process for each file, which is what you get with touch.
However, note that the fastest way to achieve this effect is still to use touch, but rather than running the program once for each file, it is possible to use the -exec option with find to ensure that it is only run a few times. This approach will usually be faster since it avoids the overhead associated with a shell loop:
find . -name "*.xml" -exec touch {} +
Using the + (as opposed to \;) with find ... -exec runs the command only once if possible, with each file as an argument. If the argument list is very long (as is the case with 300,000 files), multiple runs will be made, each with an argument list whose length is close to the limit (ARG_MAX on most systems).
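The batching behaviour can be checked with a small sketch like the one below. It is only an illustration under assumptions: the scratch directory, the f1.xml ... f500.xml file names and the inner sh -c 'echo run' command are made up for the demonstration. Each run of the inner command prints one line, so counting lines counts how many processes find started.

```shell
# Sketch: count how many times find starts the command with \; versus +.
# The scratch directory and file names are hypothetical test fixtures.
scratch=$(mktemp -d)
i=1
while [ "$i" -le 500 ]; do
    : > "$scratch/f$i.xml"        # create an empty test file
    i=$((i + 1))
done

# With \; the command is started once per matching file.
runs_semicolon=$(( $(find "$scratch" -name "*.xml" -exec sh -c 'echo run' sh {} \; | wc -l) ))

# With + the file names are batched into as few invocations as possible.
runs_plus=$(( $(find "$scratch" -name "*.xml" -exec sh -c 'echo run' sh {} + | wc -l) ))

echo "started with ';': $runs_semicolon"
echo "started with '+': $runs_plus"

rm -rf "$scratch"
```

With only 500 short names the + form should need far fewer runs than the \; form; the exact number depends on the system's argument-length limit.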
Another advantage to this approach is that it behaves robustly with filenames containing any whitespace characters, which is not the case with the original loop.
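A minimal sketch of the whitespace problem, assuming a scratch directory and two made-up file names ("my file.xml" and "plain.xml"). The first directory is processed with the question's loop (written with $(...) instead of backticks, which behaves the same here), the second with find -exec ... +.

```shell
# Sketch: how the word-splitting loop and find -exec ... + treat a name
# that contains a space. Directory and file names are hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/loop" "$scratch/find"
for d in loop find; do
    : > "$scratch/$d/my file.xml"
    : > "$scratch/$d/plain.xml"
done

# The loop splits "./my file.xml" into "./my" and "file.xml", so touch
# creates those two stray files instead of updating the real one.
(cd "$scratch/loop" && for file in $(find . -name "*.xml"); do touch $file; done)

# find passes each name to touch as a single argument, so nothing is split.
(cd "$scratch/find" && find . -name "*.xml" -exec touch {} +)

loop_count=$(( $(find "$scratch/loop" -type f | wc -l) ))
find_count=$(( $(find "$scratch/find" -type f | wc -l) ))
echo "files after loop: $loop_count"
echo "files after find -exec: $find_count"

rm -rf "$scratch"
```

The loop directory ends up with extra files named "my" and "file.xml", while the find directory still contains only the two originals.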
Solution 3
echo is a shell builtin. On the other hand, touch is an external binary.
$ type echo
echo is a shell builtin
$ type touch
touch is hashed (/usr/bin/touch)
Shell builtins are much faster as there is no overhead involved in loading the program, i.e. there is no fork/exec involved. As such, you'd observe a significant time difference when executing a builtin vs an external command a large number of times.
This is the reason that utilities like time are available as shell builtins.
You can get the complete list of shell builtins by saying:
enable -p
As mentioned above, using the utility as opposed to the builtin results in a significant performance degradation. Following are the statistics of the time taken to create 9000 files using the builtin echo and the utility echo:
# Using builtin
$ time bash -c 'for i in {1000..9999}; do echo > $i; done'
real 0m0.283s
user 0m0.100s
sys 0m0.184s
# Using utility /bin/echo
$ time bash -c 'for i in {1000..9999}; do /bin/echo > $i; done'
real 0m8.683s
user 0m0.360s
sys 0m1.428s
Comments
-
polym almost 2 years
I'm trying to update the timestamp to the current time on all of the xml files in my directory (recursively). I'm using Mac OSX 10.8.5.
On about 300,000 files, the following echo command takes 10 seconds:
for file in `find . -name "*.xml"`; do echo >> $file; done
However, the following touch command takes 10 minutes!:
for file in `find . -name "*.xml"`; do touch $file; done
Why is echo so much faster than touch here?
-
Admin about 10 years
The answer could be in the source code of these binaries. Will have a look (if I can), but your flags are linux and macosx. What are your OS/version and FS type?
-
Admin about 10 years
Just a side remark: You do know that those two commands are not equivalent, don't you? At least for Unix/Linux, the echo >> $file will append a newline to $file and thus modify it. I assume it will be the same for OS/X. If you do not want that, use echo -n >> $file.
-
Admin about 10 years
Also wouldn't touch `find . -name "*.xml"` be even faster than both of the above?
-
Admin about 10 years
I'm not sure if it's faster, but simpler is find . -name "*.xml" -execdir touch {} \;
-
Admin about 10 years
Or consider just >>$file
-
Admin about 10 years
@Dubu echo -n is a valid builtin for bash on OS X, and on 10.9.2 the timings for /bin/echo match the timings for /bin/touch, with the shell builtin echo orders of magnitude faster, as expected/explained.
-
Admin about 10 years
Not an answer to the explicit question, but why invoke touch so many times at all? find . -name '*.xml' -print0 | xargs -0 touch invokes touch far fewer times (possibly only once). Works on Linux, should work on OS X.
-
Admin about 10 years
@elmo argument list too long (easily, with 300.000 files...)
-
Admin about 10 years
@Rmano look at Mike's comment. Same principle, but overcomes issue you mention.
-
Admin about 10 years
Mac OS X is POSIX compliant which I hope clarifies all these should's and dunnos. The question really is which shell and has been answered (default: bash).
-
Admin over 8 years
@Dubu Please note that echo -n >> $file or even >> $file does not update timestamp. Well, at least on my system and bash.
-
Admin over 8 years
@vaab Good catch, you are right.
-
-
Michael Mrozek about 10 years
And I think there's an echo binary on most systems (for me it's /bin/echo), so you can retry the timing tests using that instead of the built-in
-
gerrit about 10 years
+1 for pointing out the find + argument. I think many people are not aware of this (I wasn't).
-
devnull about 10 years
@MichaelMrozek Added timing tests for the builtin and the binary.
-
Barmar about 10 years
Not all versions of find have the + argument. You can get a similar effect by piping to xargs.
-
Ilmari Karonen about 10 years
...specifically, find . -name "*.xml" -print0 | xargs -0 touch.
-
bmike about 10 years
Did you compile strace on OS X or run your test on another OS?
-
Graeme about 10 years
@Barmar, the + part is required by POSIX, so should be portable. -print0 isn't.
-
Barmar about 10 years
I still occasionally run into implementations that don't have it. YMMV.
-
clerksx about 10 years
@bmike My test is on Linux, but the principle is identical.
-
clerksx about 10 years
@Graeme OpenBSD didn't have it for a long time, if I recall correctly.
-
bmike about 10 years
I totally agree; see my comment on the main question about how /bin/echo is as slow as /bin/touch, so the reasoning is sound. I just wanted to reproduce the timing of strace and failed using dtruss/dtrace, and the bash -c syntax doesn't work as expected on OS X either.
-
Graeme about 10 years
@ChrisDown, Something I have discovered is that the Busybox find has the option available but just treats it like a ; underneath the surface.
-
Charles Duffy about 10 years
@SplinterReality, it's only POSIX-standardized inside the last decade, so for most of your 20 years you weren't missing anything.