How well does grep/sed/awk perform on very large files?


Generally I would say grep is the fastest of the three and sed the slowest, though of course this depends on exactly what you are doing. In my experience, awk is much faster than sed.

You can speed up grep if you don't need real regular expressions but only simple fixed strings (option -F).
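A minimal sketch (the file path and contents here are made up for illustration):

```shell
# Sample data; path and contents are arbitrary for this example
printf 'foo bar\nbaz qux\nfoo baz\n' > /tmp/sample.txt

# -F treats the pattern as a literal string, bypassing the regex
# engine entirely -- often noticeably faster on large inputs
grep -F 'foo' /tmp/sample.txt
```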

If you combine grep, sed, and awk in a pipeline, place the grep command first when possible.

For example this:

grep -F "foo" file | sed -n 's/foo/bar/p'

is usually faster than this:

sed -n 's/foo/bar/p' file

Although the grep in the first pipeline looks redundant, it cheaply discards non-matching lines, so sed's slower substitution machinery only has to run on the lines that can actually match.
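You can measure the difference with time; a rough sketch, assuming a test file where only a small fraction of lines match (file name and sizes are arbitrary here):

```shell
# Build a test file where only one line contains the pattern
yes 'nothing to see here' | head -n 1000000 > /tmp/big.txt
echo 'foo appears here' >> /tmp/big.txt

# grep filters cheaply, so sed only processes candidate lines
time grep -F 'foo' /tmp/big.txt | sed -n 's/foo/bar/p'

# sed alone must examine every line with its regex machinery
time sed -n 's/foo/bar/p' /tmp/big.txt
```

Both commands print the same result; on large inputs the grep-first version is usually faster.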

By the way, you can also speed up these commands with LC_ALL=C if you are dealing with plain ASCII text files, since the C locale lets the tools compare raw bytes instead of decoding multibyte characters.
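For example (the file and its size are hypothetical; the actual speedup depends on your locale, data, and implementation):

```shell
# Build an ASCII-only test file (size chosen arbitrarily)
seq 1 1000000 > /tmp/ascii.txt

# With the C locale, grep works on raw bytes rather than decoding
# multibyte characters -- often a large win on ASCII input
time LC_ALL=C grep -F '999999' /tmp/ascii.txt
```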

Note that all my experience is based on the GNU commands. You may also try different implementations and compare their speed.


Author: Luke Pafford

Updated on September 18, 2022

Comments

  • Luke Pafford
    Luke Pafford over 1 year

    I was wondering if grep, sed, and awk were viable tools for finding data in very large files.

    Let's say I have a 1 TB file. If I wanted to process the text in that file, what would the time frame look like if I used grep, sed, and awk individually, as well as combined in a pipeline?

    Obviously a specific answer is not possible since the results will vary based on hardware specs, but a general estimate would be helpful.

    • marcolz
      marcolz over 7 years
      Something that can have a very large influence: if you know that the file contains only 7-bit ASCII, set LANG=C (as opposed to some .utf8 locale string) in the environment before running the command.