How well does grep/sed/awk perform on very large files?
Generally I would say grep is the fastest of the three and sed the slowest. Of course, it depends on what exactly you are doing. I find awk much faster than sed.
You can speed up grep if you don't need real regular expressions but only simple fixed strings (option -F).
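A minimal sketch of the fixed-string option (the file name and contents are made up for illustration):

```shell
# Create a small sample file (hypothetical data)
printf 'foo bar\nbaz qux\nfoo baz\n' > sample.txt

# -F treats the pattern as a literal string instead of a regex,
# which skips regex compilation/matching overhead entirely.
grep -F 'foo' sample.txt

# On a genuinely large file, compare wall-clock times yourself:
#   time grep    'foo' bigfile
#   time grep -F 'foo' bigfile
```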
If you want to use grep, sed, awk together in pipes, then I would place the grep command first if possible, so the later commands only have to process the matching lines.
For example this:
grep -F "foo" file | sed -n 's/foo/bar/p'
is usually faster than this:
sed -n 's/foo/bar/p' file
even though the grep in the first command might look redundant, since sed could do the filtering on its own.
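A rough way to see the difference yourself (the file and its contents are made up; exact timings depend on your hardware):

```shell
# Build a 1M-line file where only one line contains "foo"
seq 1000000 | sed 's/^42$/foo 42/' > big.txt

# grep -F cheaply discards non-matching lines, so sed sees almost nothing
time grep -F 'foo' big.txt | sed -n 's/foo/bar/p'

# versus letting sed scan and match every line itself
time sed -n 's/foo/bar/p' big.txt
```

Both pipelines print the same result (bar 42); only the time spent differs.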
BTW you may speed up these commands by setting LC_ALL=C if you are dealing with simple ASCII text files.
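A quick sketch of the locale trick (sample file name and contents are invented):

```shell
# A small ASCII-only sample file
printf 'foo bar\nbaz qux\n' > ascii.txt

# LC_ALL=C selects the byte-oriented C locale, so grep/sed/awk skip
# multi-byte (UTF-8) character handling. Only safe for pure ASCII input.
LC_ALL=C grep -F 'foo' ascii.txt
LC_ALL=C sed -n 's/foo/bar/p' ascii.txt
```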
Note that all my experience is based on the GNU implementations of these commands. You can also try other implementations and compare their speed.
Luke Pafford
Updated on September 18, 2022

Comments
- Luke Pafford over 1 year: I was wondering if grep, sed, and awk were viable tools for finding data in very large files. Let's say I have a 1 TB file. If I wanted to process the text in that file, what would the time frame look like if I used the individual commands grep, sed, and awk, as well as mixing them together? Obviously a specific answer is not possible since the results will vary based on hardware specs, but a general estimate would be helpful.
- Sundeep over 7 years: unix.stackexchange.com/questions/303044/… might help
- marcolz over 7 years: Something that can have a very large influence: if you know that the file contains only 7-bit ASCII, set LANG=C (as opposed to some .utf8 locale) in the environment before running the command.