How well does grep/sed/awk perform on very large files?
Generally I would say grep is the fastest of the three and sed the slowest. Of course, it depends on what exactly you are doing. I find awk much faster than sed.
You can speed up grep if you don't need real regular expressions but only simple fixed strings (option -F).
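A minimal sketch of the fixed-string option (the file name and contents are made up for illustration):

```shell
# Create a small sample file (hypothetical data)
printf 'foo bar\nbaz qux\nfoo baz\n' > sample.txt

# -F treats the pattern as a literal string instead of a regex,
# which skips regex compilation/matching overhead entirely.
grep -F 'foo' sample.txt

# On a genuinely large file, compare wall-clock times yourself:
#   time grep    'foo' bigfile
#   time grep -F 'foo' bigfile
```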
If you want to use grep, sed, awk together in pipes, then I would place the grep command first if possible, so the later commands only have to process the matching lines.
For example this:
grep -F "foo" file | sed -n 's/foo/bar/p'
is usually faster than this:
sed -n 's/foo/bar/p' file
even though the grep in the first command might look redundant, since sed could do the filtering on its own.
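A rough way to see the difference yourself (the file and its contents are made up; exact timings depend on your hardware):

```shell
# Build a 1M-line file where only one line contains "foo"
seq 1000000 | sed 's/^42$/foo 42/' > big.txt

# grep -F cheaply discards non-matching lines, so sed sees almost nothing
time grep -F 'foo' big.txt | sed -n 's/foo/bar/p'

# versus letting sed scan and match every line itself
time sed -n 's/foo/bar/p' big.txt
```

Both pipelines print the same result (bar 42); only the time spent differs.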
BTW you may speed up these commands by setting LC_ALL=C if you are dealing with simple ASCII text files.
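A quick sketch of the locale trick (sample file name and contents are invented):

```shell
# A small ASCII-only sample file
printf 'foo bar\nbaz qux\n' > ascii.txt

# LC_ALL=C selects the byte-oriented C locale, so grep/sed/awk skip
# multi-byte (UTF-8) character handling. Only safe for pure ASCII input.
LC_ALL=C grep -F 'foo' ascii.txt
LC_ALL=C sed -n 's/foo/bar/p' ascii.txt
```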
Note that all my experience is based on the GNU implementations of these commands. You can also try other implementations and compare their speed.
Luke Pafford
Updated on September 18, 2022

Comments
- Luke Pafford over 1 year: I was wondering if grep, sed, and awk were viable tools for finding data in very large files. Let's say I have a 1 TB file. If I wanted to process the text in that file, what would the time frame look like if I used the individual commands grep, sed, and awk, as well as mixing them together? Obviously a specific answer is not possible since the results will vary based on hardware specs, but a general estimate would be helpful.
- Sundeep over 7 years: unix.stackexchange.com/questions/303044/… might help
- marcolz over 7 years: Something that can have a very large influence: if you know that the file contains only 7-bit ASCII, set LANG=C (as opposed to some .utf8 locale) in the environment before running the command.