Basic sed command on large one-line file: couldn't re-allocate memory

text-processing sed performance large-files out-of-memory

10,247

Solution 1

Yes, use tr instead:

tr 'a' 'b' < file.txt > output.txt

sed works line by line, so a huge line will cause it problems. I expect it allocates a buffer internally to hold the current line, and your input is larger than what it can allocate.

tr, on the other hand, deals with characters and should be able to handle arbitrarily long lines correctly.
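As a quick sanity check, here is a minimal sketch on a tiny one-line sample. The temporary directory and the sample text are made up for illustration; only the tr invocation itself comes from the answer. Keep in mind that tr only maps single characters to single characters, so it is not a replacement for sed when the pattern is a multi-character string or a regular expression.

```shell
#!/bin/sh
# Small single-line sample (no trailing newline) standing in for the large file.
# The directory and file contents are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
printf 'banana and cabana' > "$tmpdir/file.txt"

# tr translates characters as they stream through stdin to stdout.
tr 'a' 'b' < "$tmpdir/file.txt" > "$tmpdir/output.txt"

tr_result=$(cat "$tmpdir/output.txt")
echo "$tr_result"   # prints: bbnbnb bnd cbbbnb

rm -r "$tmpdir"
```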

Solution 2

Historical versions of sed and awk had memory problems. These have mostly been fixed in more recent versions, but one of the classic occurrences of this problem hit Larry Wall pretty hard. His answer was to write a new programming language with no memory limits other than the hardware. He called it Perl. Your specific problem can be solved more simply, but the general rule of thumb I use is: when sed won't, use Perl.

Edit: by request an example:

perl -pe "s/a/b/g" < one-line-250-mb.txt

or for less memory usage:

perl -e 'BEGIN{$/=\32768}' -pe "s/a/b/g" < one-line-250-mb.txt

Solution 3

It is not the "proper way", but in some scenarios we can split the file, run the replacement on each part, and join the parts again. Example:

split -b 50M -d big_file big_file_part
sed -i 's/a/b/g' big_file_part*
cat big_file_part* >file

I successfully ran a replacement on a ~100 GB file this way.

The drawback is that it needs extra disk space (for the copy of the file).
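Here is a small-scale sketch of the same three steps, assuming GNU split and GNU sed (`split -d` and `sed -i` without a suffix argument are GNU behaviour). The 12-byte sample and the 5-byte part size are made up so the parts are easy to inspect. Note that splitting by byte count can cut a multi-character pattern or a multi-byte character in half at a part boundary, so this is only safe as-is for single-byte replacements like a to b.

```shell
#!/bin/sh
# Work in a scratch directory; names and sizes here are illustrative only.
tmpdir=$(mktemp -d)
printf 'abcabcabcabc' > "$tmpdir/big_file"   # 12 bytes, one line, no newline

# 1. Split into 5-byte parts with numeric suffixes: ...part00, ...part01, ...part02
split -b 5 -d "$tmpdir/big_file" "$tmpdir/big_file_part"

# Count the parts (the glob does not match big_file itself).
set -- "$tmpdir"/big_file_part*
parts=$#

# 2. Replace in each part in place, 3. join the parts back in suffix order.
sed -i 's/a/b/g' "$tmpdir"/big_file_part*
cat "$tmpdir"/big_file_part* > "$tmpdir/file"

joined=$(cat "$tmpdir/file")
echo "$parts parts, result: $joined"   # prints: 3 parts, result: bbcbbcbbcbbc

rm -r "$tmpdir"
```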


Vitalij

I am Nicolas Raoul, IT consultant in Tokyo. Feel free to copy/paste the source code from my StackExchange answers, I release it to the public domain.

Updated on September 18, 2022

Comments

Vitalij almost 2 years
I have a 250 MB text file, all in one line.

In this file I want to replace a characters with b characters:
```
sed -e "s/a/b/g" < one-line-250-mb.txt
```
It fails with:
```
sed: couldn't re-allocate memory
```
It seems to me that this kind of task could be performed inline without allocating much memory.
Is there a better tool for the job, or a better way to use sed?

GNU sed version 4.2.1
Ubuntu 12.04.2 LTS
1 GB RAM
- JackDaniels over 10 years
  
  possible duplicate of Out of memory while using sed with multiline expressions on giant file
- Vitalij over 10 years
  
  That question is about a very complex multiline expression. My question is about the most basic expression you could imagine.
- terdon over 10 years
  
  @RubanSavvy plus, neither of the answers on the other Q takes the long line into account and, in fact, both would probably have the same issue.
- slm over 10 years
  
  Can you include your sed version in this Q and also your hardware info (RAM specifically) and distro version?
- U. Windl over 2 years
  
  A partial ltrace would be interesting.
slm over 10 years

Curiously I just created a 250MB file filled w/ "abcabc..." and was able to do sed -e "s/a/z/g" b.txt > c.txt without any issues. Using sed (GNU sed) 4.2.2.
terdon over 10 years

@slm same here on a 496M file and same sed version, guess it depends on implementation or hardware.
slm over 10 years

Yeah, if I had to hazard a guess, we're dealing with an older version of sed.
Michael Mrozek over 10 years

This whole paragraph boils down to "Perl.". Some details would be nice, or at least an example or something
hildred over 10 years

@MichaelMrozek I realize that hat collecting does tend to lead to roboediting, but I figured that with your reputation you would pay a little closer attention. The specific problem had already been solved, in a very narrow way that would not help the majority of people searching, so I added an answer for the general case. The expanded answer I provided would have helped Nicolas Raoul if there hadn't already been a workable solution, but I doubt it would help very many others, whereas my original answer would help everyone who has reached the limits of sed. If you disagree, I'll delete it.
clerksx over 10 years

@hildred I don't think it's too much to ask that you could assume good faith of the moderators when they are making valid comments on your answer, without resorting immediately to accusations of ulterior motives (hats, really?!).
Michael Mrozek over 10 years

@ChrisDown On the contrary -- I'm in it entirely for the hats. Also this was flagged as not an answer by multiple people, but that's a distant second priority to the hats
Tomislav Nakic-Alfirevic almost 5 years

The second one with the memory limitation did the trick (for my 2.5GB 1-line file): thanks! Bit disappointed by sed, though. :\
Harold Fischer over 4 years

@hildred Where can I learn more about the perl command that uses less memory? The number 32,768: is that bytes? Is it specifying how much memory is being allotted to perl?
hildred over 4 years

@HaroldFischer, I think that was in the man page. What it does is a fixed-size block read, so instead of loading all 250 MB into RAM and then doing the substitution, it does multiple 32 KB reads with a substitution after each. The main drawback of this approach is that matches spanning a block boundary are missed, although that is not a problem for single-character matches.