Why is "echo" so much faster than "touch"?


Solution 1

In bash, touch is an external binary, but echo is a shell builtin:

$ type echo
echo is a shell builtin
$ type touch
touch is /usr/bin/touch

Since touch is an external binary and you invoke it once per file, the shell must fork and exec a new touch process 300,000 times, which takes a long time.

echo, however, is a shell builtin, and running a builtin does not require forking at all: the current shell performs the whole operation itself and no external process is created. This is why it is so much faster.

Here are two profiles of the shell's system calls, made with strace -c over 10,000 files. With touch, most of the time is spent in wait4 and clone, i.e. in creating child processes and waiting for them to exit. Using /bin/echo instead of the shell builtin should give a result much closer to the touch profile.


Using touch

$ strace -c -- bash -c 'for file in a{1..10000}; do touch "$file"; done'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.20    0.030925           2     20000     10000 wait4
 38.12    0.020972           2     10000           clone
  4.67    0.002569           0     80006           rt_sigprocmask
  0.71    0.000388           0     20008           rt_sigaction
  0.27    0.000150           0     10000           rt_sigreturn
[...]

Using echo

$ strace -c -- bash -c 'for file in b{1..10000}; do echo >> "$file"; done'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 34.32    0.000685           0     50000           fcntl
 22.14    0.000442           0     10000           write
 19.59    0.000391           0     10011           open
 14.58    0.000291           0     20000           dup2
  8.37    0.000167           0     20013           close
[...]
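If strace is not at hand, a rough way to see the extra process is to have an external program report its own PID and its parent's PID. The sketch below assumes bash and uses /bin/sh purely as a convenient external command (it is not part of the original test): the external command gets a new PID whose parent is the script's shell, which is exactly the fork that a builtin such as echo avoids.

```shell
#!/usr/bin/env bash
# Sketch: an external command runs in a separate, newly forked process.
# /bin/sh is used here only as an external program that can print its PID.
set -eu

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

shell_pid=$$    # PID of the bash process running this script

# bash forks a child, and the child execs /bin/sh. Inside it, $$ is the
# child's own PID and $PPID is the PID of the shell that forked it.
/bin/sh -c 'echo "$$ $PPID"' > "$tmp"
read -r child_pid child_ppid < "$tmp"

echo "script shell PID:     $shell_pid"
echo "external command PID: $child_pid (parent: $child_ppid)"

# A builtin such as echo, by contrast, runs inside the script's own
# process, so no new PID is created for it.
```

Running the loop from the question with touch repeats this fork/exec/wait cycle once per file.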

Solution 2

As others have answered, using echo will be faster than touch because echo is a command that is commonly (though not necessarily) built into the shell. Using it avoids the kernel overhead of starting a new process for each file, which you incur with touch.

However, note that the fastest way to achieve this effect is still to use touch, but rather than running the program once for each file, you can use the -exec option of find to ensure that it is run only a few times. This approach will usually be faster since it avoids both the per-file process creation and the overhead of a shell loop:

find . -name "*.xml" -exec touch {} +

Using + (as opposed to \;) with find ... -exec runs the command once, if possible, with every file as an argument. If the argument list is very long (as it is with 300,000 files), find makes multiple runs, each with an argument list whose length is close to the system limit (ARG_MAX on most systems).
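A quick way to see the batching is to have find run a tiny sh script that only prints how many paths it received ($#), so each output line corresponds to one process started by find. This is a sketch assuming bash, a POSIX find, sh and awk; the 1000 empty .xml files in a temporary directory are made-up test data.

```shell
#!/usr/bin/env bash
# Sketch: count how many processes find starts with "-exec ... {} +"
# versus "-exec ... {} \;". The 1000 test files are arbitrary.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

for i in {1..1000}; do
    : > "$dir/file$i.xml"    # create an empty test file
done

# sh -c 'echo "$#"' sh ... prints the number of paths it was given
# ("sh" fills $0, so the paths become $1, $2, ...).
batched_runs=$(find "$dir" -name '*.xml' -exec sh -c 'echo "$#"' sh {} + | awk 'END { print NR }')
batched_paths=$(find "$dir" -name '*.xml' -exec sh -c 'echo "$#"' sh {} + | awk '{ s += $1 } END { print s }')
single_runs=$(find "$dir" -name '*.xml' -exec sh -c 'echo "$#"' sh {} \; | awk 'END { print NR }')

echo "ARG_MAX on this system:      $(getconf ARG_MAX)"
echo "processes started with +:    $batched_runs (covering $batched_paths paths)"
echo "processes started with \\;:   $single_runs"
```

With "\;" there is one process per file; with "+" the same 1000 paths are normally handled by one or a few processes, depending on the argument-length limit.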

Another advantage of this approach is that it behaves robustly with filenames containing any whitespace characters, which is not the case with the original loop: the output of the backtick command substitution is split on whitespace before touch ever sees it.
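The whitespace problem is easy to reproduce. The sketch below assumes bash and uses a made-up file name, "my notes.xml", in a temporary directory. It runs the question's loop (written with $(...) instead of backticks, which splits the output the same way), counts the files, then resets and does the same with find -exec.

```shell
#!/usr/bin/env bash
# Sketch: the question's loop versus find -exec on a filename with a space.
# "my notes.xml" is a hypothetical test file.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

: > "my notes.xml"

# Question-style loop: find prints "./my notes.xml", which is split into
# "./my" and "notes.xml", so touch creates two stray files instead of
# updating "my notes.xml".
for file in $(find . -name "*.xml"); do touch $file; done
loop_count=$(find . -type f | awk 'END { print NR }')

# Reset: remove the two stray files, leaving only "my notes.xml".
rm -f -- ./my ./notes.xml

# find hands each path to touch as a single argument; no splitting occurs.
find . -name "*.xml" -exec touch {} +
exec_count=$(find . -type f | awk 'END { print NR }')

echo "files after the loop:     $loop_count"
echo "files after find -exec:   $exec_count"
```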

Solution 3

echo is a shell builtin. On the other hand, touch is an external binary.

$ type echo
echo is a shell builtin
$ type touch
touch is hashed (/usr/bin/touch)

Shell builtins are much faster as there is no overhead involved in loading the program, i.e. there is no fork/exec involved. As such, you'd observe a significant time difference when executing a builtin vs an external command a large number of times.

This is one reason why frequently used utilities such as echo, printf and test are also available as shell builtins.

You can get the complete list of shell builtins by saying:

enable -p
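Besides enable -p, bash's type -t prints a one-word classification for a name ("builtin", "file", and so on), which makes it easy to check which commands in a loop will cost a fork. A minimal sketch, assuming bash; the list of names is just an example.

```shell
#!/usr/bin/env bash
# Sketch: ask bash how it would resolve each name.
# type -t prints "builtin" for shell builtins and "file" for programs on $PATH.
for cmd in echo printf touch; do
    printf '%-6s %s\n' "$cmd" "$(type -t "$cmd")"
done

# enable -p lists enabled builtins as lines of the form "enable NAME".
if enable -p | grep -qx 'enable echo'; then
    echo "echo is an enabled builtin"
fi
```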

As mentioned above, using the external utility instead of the builtin results in a significant performance degradation. The following are the times taken to create 9,000 files using the builtin echo and the utility /bin/echo:

# Using builtin
$ time bash -c 'for i in {1000..9999}; do echo > $i; done'

real    0m0.283s
user    0m0.100s
sys     0m0.184s

# Using utility /bin/echo
$ time bash -c 'for i in {1000..9999}; do /bin/echo > $i; done'

real    0m8.683s
user    0m0.360s
sys     0m1.428s


Author: polym

Updated on September 18, 2022

Comments

  • polym
    polym almost 2 years

    I'm trying to update the timestamp to the current time on all of the xml files in my directory (recursively). I'm using Mac OS X 10.8.5.

    On about 300,000 files, the following echo command takes 10 seconds:

    for file in `find . -name "*.xml"`; do echo >> $file; done
    

    However, the following touch command takes 10 minutes!

    for file in `find . -name "*.xml"`; do touch $file; done
    

    Why is echo so much faster than touch here?

    • Admin
      Admin about 10 years
      The answer could be in the source code of these binaries. I will have a look (if I can), but your tags are linux and macosx. What are your OS version and filesystem type?
    • Admin
      Admin about 10 years
      Just a side remark: you do know that those two commands are not equivalent, don't you? At least on Unix/Linux, echo >> $file will append a newline to $file and thus modify it. I assume it will be the same on OS X. If you do not want that, use echo -n >> $file.
    • Admin
      Admin about 10 years
      Also wouldn't touch `find . -name "*.xml"` be even faster than both of the above?
    • Admin
      Admin about 10 years
      I'm not sure if it's faster, but simpler is find . -name "*.xml" -execdir touch {} \;
    • Admin
      Admin about 10 years
      Or consider just >>$file
    • Admin
      Admin about 10 years
      @Dubu echo -n is a valid builtin for bash on OS X and on 10.9.2 the timings for /bin/echo match the timings for /bin/touch with the shell builtin of echo orders of magnitude faster as expected/explained.
    • Admin
      Admin about 10 years
      Not an answer to the explicit question, but why invoke touch so many times at all? find . -name '*.xml' -print0 | xargs -0 touch invokes touch much fewer times (possibly only once). Works on Linux, should work on OS X.
    • Admin
      Admin about 10 years
      @elmo argument list too long (easily, with 300.000 files...)
    • Admin
      Admin about 10 years
      @Rmano look at Mike's comment. Same principle, but overcomes issue you mention.
    • Admin
      Admin about 10 years
      Mac OS X is POSIX compliant which I hope clarifies all these should's and dunnos. The question really is which shell and has been answered (default: bash).
    • Admin
      Admin over 8 years
      @Dubu Please note that echo -n >> $file, or even >> $file, does not update the timestamp. Well, at least on my system and bash.
    • Admin
      Admin over 8 years
      @vaab Good catch, you are right.
  • Michael Mrozek
    Michael Mrozek about 10 years
    And I think there's an echo binary on most systems (for me it's /bin/echo), so you can retry the timing tests using that instead of the built-in
  • gerrit
    gerrit about 10 years
    +1 for pointing out the find + argument. I think many people are not aware of this (I wasn't).
  • devnull
    devnull about 10 years
    @MichaelMrozek Added timing tests for the builtin and the binary.
  • Barmar
    Barmar about 10 years
    Not all versions of find have the + argument. You can get a similar effect by piping to xargs.
  • Ilmari Karonen
    Ilmari Karonen about 10 years
    ...specifically, find . -name "*.xml" -print0 | xargs -0 touch.
  • bmike
    bmike about 10 years
    Did you compile strace on OS X or run your test on another OS?
  • Graeme
    Graeme about 10 years
    @Barmar, the + part is required by POSIX, so should be portable. -print0 isn't.
  • Barmar
    Barmar about 10 years
    I still occasionally run into implementations that don't have it. YMMV.
  • clerksx
    clerksx about 10 years
    @bmike My test is on Linux, but the principle is identical.
  • clerksx
    clerksx about 10 years
    @Graeme OpenBSD didn't have it for a long time, if I recall correctly.
  • bmike
    bmike about 10 years
    I totally agree - see my comment on the main question about how /bin/echo is as slow as /bin/touch so the reasoning is sound. I just wanted to reproduce the timing of strace and failed using dtruss/dtrace and the bash -c syntax doesn't work as expected on OS X either.
  • Graeme
    Graeme about 10 years
    @ChrisDown, Something I have discovered is that the Busybox find has the option available but just treats it like a ; underneath the surface.
  • Charles Duffy
    Charles Duffy about 10 years
    @SplinterReality, it's only POSIX-standardized inside the last decade, so for most of your 20 years you weren't missing anything.