Basic sed command on large one-line file: couldn't re-allocate memory

Solution 1

Yes, use tr instead:

tr 'a' 'b' < file.txt > output.txt

sed works line by line, so it has to hold each complete line in memory, and a single huge line causes it problems. I expect it allocates an internal buffer to hold the line, and your input exceeds the maximum size that buffer can grow to.

tr, on the other hand, deals with individual characters and should be able to handle arbitrarily long lines correctly.
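To try this out on a small scale, the sketch below builds a single-line sample file and runs the same tr call on it. The temporary directory, the sample contents and the 6000-byte size are assumptions chosen for illustration. Note that tr only maps single characters, so it covers the a-to-b case but not general regular expressions.

```shell
# Build a small sample with no newline anywhere, standing in for the real file.
workdir=$(mktemp -d)
printf 'banana%.0s' $(seq 1 1000) > "$workdir/file.txt"

# tr streams through the input and maps every 'a' to 'b'.
tr 'a' 'b' < "$workdir/file.txt" > "$workdir/output.txt"

# Input and output should have the same byte count,
# and the last command should report 0 lines still containing 'a'.
wc -c < "$workdir/file.txt"
wc -c < "$workdir/output.txt"
grep -c 'a' "$workdir/output.txt" || true
```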

Solution 2

Historical versions of sed and awk had memory problems. These have mostly been fixed in more recent versions, but one of the classic occurrences of this problem hit Larry Wall pretty hard. His answer was to write a new programming language with no memory limits other than those of the hardware. He called it Perl. Your specific problem can be solved more simply, but my general rule of thumb is: when sed won't, use perl.

Edit: by request an example:

perl -pe "s/a/b/g" < one-line-250-mb.txt

or for less memory usage:

perl -e 'BEGIN{$/=\32768}' -pe "s/a/b/g" < one-line-250-mb.txt

Solution 3

This is not a "proper" way, but in some scenarios we can split the file, run the replacement on each piece, and join the pieces again. Example:

split -b 50M -d big_file big_file_part
sed -i 's/a/b/g' big_file_part*
cat big_file_part* >file

I successfully ran a replacement on a ~100 GB file this way.

The drawback is that it needs extra disk space (to hold a copy of the file).
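The same split, replace and join sequence can be rehearsed on a small file before committing disk space to the real one. This is only a sketch: the temporary directory, the sample contents and the 1000-byte piece size are assumptions, and both `split -d` and `sed -i` without a suffix are GNU options.

```shell
# Work in a scratch directory on a 6000-byte single-line sample.
workdir=$(mktemp -d)
cd "$workdir"
printf 'banana%.0s' $(seq 1 1000) > big_file

# Cut the file into 1000-byte pieces named big_file_part00, big_file_part01, ...
split -b 1000 -d big_file big_file_part

# Run the replacement on each piece in place.
sed -i 's/a/b/g' big_file_part*

# The glob expands in sorted order, so the pieces are joined in sequence.
cat big_file_part* > file

# Remove the pieces once the joined file exists, to free the extra space.
rm big_file_part*
```

Because the pieces are cut at fixed byte offsets, a pattern longer than one character can straddle a boundary and go unmatched; for a single-character replacement this does not matter.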

Author: Vitalij

I am Nicolas Raoul, IT consultant in Tokyo. Feel free to copy/paste the source code from my StackExchange answers, I release it to the public domain.

Updated on September 18, 2022

Comments

  • Vitalij
    Vitalij almost 2 years

    I have a 250 MB text file, all in one line.

    In this file I want to replace a characters with b characters:

    sed -e "s/a/b/g" < one-line-250-mb.txt
    

    It fails with:

    sed: couldn't re-allocate memory
    

    It seems to me that this kind of task could be performed inline without allocating much memory.
    Is there a better tool for the job, or a better way to use sed?


    GNU sed version 4.2.1
    Ubuntu 12.04.2 LTS
    1 GB RAM

    • JackDaniels
      JackDaniels over 10 years
    • Vitalij
      Vitalij over 10 years
      That question is about a very complex multiline expression. My question is about the most basic expression you could imagine.
    • terdon
      terdon over 10 years
      @RubanSavvy plus, neither of the answers on the other Q take into account the long line and in fact, both would probably have the same issue.
    • slm
      slm over 10 years
      Can you include your sed version in this Q and also your hardware info (RAM specifically) and distro version?
    • U. Windl
      U. Windl over 2 years
      A partial ltrace would be interesting.
  • slm
    slm over 10 years
    Curiously I just created a 250MB file filled w/ "abcabc..." and was able to do sed -e "s/a/z/g" b.txt > c.txt without any issues. Using sed (GNU sed) 4.2.2.
  • terdon
    terdon over 10 years
    @slm same here on a 496M file and same sed version, guess it depends on implementation or hardware.
  • slm
    slm over 10 years
    Yeah, if I had to hazard a guess, we're dealing with an older version of sed.
  • Michael Mrozek
    Michael Mrozek over 10 years
    This whole paragraph boils down to "Perl.". Some details would be nice, or at least an example or something
  • hildred
    hildred over 10 years
    @MichaelMrozek I realize that hat collection does tend to lead to roboediting, but I figured that with your reputation you would pay a little closer attention. Specifically, the specific problem had already been solved in a very narrow way that would not help the majority of people searching, so I added an answer for the general case. The expanded answer I provided would have helped Nicolas Raoul if there hadn't already been a workable solution, but I doubt it would help very many others, whereas my original answer would help everyone who reached the limits of sed. If you disagree I'll delete
  • clerksx
    clerksx over 10 years
    @hildred I don't think it's too much to ask that you could assume good faith of the moderators when they are making valid comments on your answer, without resorting immediately to accusations of ulterior motives (hats, really?!).
  • Michael Mrozek
    Michael Mrozek over 10 years
    @ChrisDown On the contrary -- I'm in it entirely for the hats. Also this was flagged as not an answer by multiple people, but that's a distant second priority to the hats
  • Tomislav Nakic-Alfirevic
    Tomislav Nakic-Alfirevic almost 5 years
    The second one with the memory limitation did the trick (for my 2.5GB 1-line file): thanks! Bit disappointed by sed, though. :\
  • Harold Fischer
    Harold Fischer over 4 years
    @hildred Where can I learn more about the perl command that uses less memory? The number 32,768: is that bytes? Is it specifying how much memory is being allotted to perl?
  • hildred
    hildred over 4 years
    @HaroldFischer, I think that was in the man page. What it does is a fixed-size block read, so that instead of loading all 250 MB into RAM and then doing the substitution, it does multiple 32 KB reads with a substitution after each. The main drawback of this approach is that matches spanning a block boundary are missed, although that is not a problem for single-character matches.