sed in-place line deletion on full filesystem?

Solution 1

The -i option doesn't really overwrite the original file. It creates a new file with the output, then renames it to the original filename. Since you don't have room on the filesystem for this new file, it fails.

You'll need to do that yourself in your script, but create the new file on a different filesystem.

Also, if you're just deleting lines that match a regexp, you can use grep instead of sed.

grep -v 'myregex' /path/to/filename > /tmp/filename && mv /tmp/filename /path/to/filename
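As a rough sketch of the same idea in a script: the two throwaway directories below are hypothetical stand-ins for the full filesystem and for a scratch filesystem that still has room. One variation that is my assumption rather than part of the answer: the result is copied back with cat instead of mv, so the original inode, owner, and mode survive.

```shell
#!/bin/sh
# Demo setup: temp directories stand in for the full and the spare filesystem.
full_dir=$(mktemp -d)
scratch_dir=$(mktemp -d)
target="$full_dir/filename"
printf '%s\n' 'keep one' 'drop myregex' 'keep two' > "$target"

# Filter into the scratch location. grep -v exits 1 when no line survives,
# so only statuses above 1 are treated as real errors here.
status=0
grep -v 'myregex' "$target" > "$scratch_dir/filename" || status=$?

if [ "$status" -le 1 ]; then
    # Write the bytes back over the original instead of renaming a new file.
    cat "$scratch_dir/filename" > "$target" && rm -f "$scratch_dir/filename"
fi

result=$(cat "$target")
```

The redirection truncates the original before cat writes, which is what frees the space on the full filesystem; the scratch copy is the only copy during that moment, so it is removed only after the write succeeds.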

In general, it's rarely possible for programs to use the same file as input and output -- as soon as it starts writing to the file, the part of the program that's reading from the file will no longer see the original contents. So it either has to copy the original file somewhere first, or write to a new file and rename it when it's done.
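To see why, here is a small demonstration on a throwaway file (the file name and contents are made up for illustration). The shell sets up the output redirection, truncating the file, before grep reads a single byte.

```shell
#!/bin/sh
demo=$(mktemp)
printf '%s\n' 'alpha' 'beta' > "$demo"

# The redirection empties "$demo" first, so grep finds nothing to read.
# Some grep versions also report that input and output are the same file.
grep -v 'alpha' "$demo" > "$demo" 2>/dev/null || true

# Normalize wc output through arithmetic to drop any padding.
size_after=$(( $(wc -c < "$demo") ))
```

The line containing beta is lost along with everything else, which is why the filter has to write somewhere other than its own input.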

If you don't want to use a temporary file, you could try caching the file contents in memory:

# Read the whole file into memory first; stop if the read fails.
file=$(< /path/to/filename) &&
printf '%s\n' "$file" | grep -v 'myregex' > /path/to/filename

Solution 2

That's how sed works. If used with -i (in-place edit), sed creates a temporary file with the new contents of the processed file. When finished, sed replaces the current working file with the temporary one. The utility does not edit the file in place. That's exactly the behavior of every editor.

It's like you perform the following task in a shell:

sed 'whatever' file >tmp_file
mv tmp_file file
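One way to observe the replacement is to compare inode numbers before and after. This is only a sketch of the two-step equivalent shown above, run in a temporary directory with made-up contents; it does not call sed -i itself, whose syntax differs between implementations.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' 'one' 'two' > "$work/file"

inode_before=$(ls -i "$work/file" | awk '{print $1}')

# Same steps as above: write a new file, then rename it over the old one.
sed '/one/d' "$work/file" > "$work/tmp_file" && mv "$work/tmp_file" "$work/file"

inode_after=$(ls -i "$work/file" | awk '{print $1}')
contents=$(cat "$work/file")
```

The name stays the same but the inode changes, so the result is a different file, and for a moment both copies need space on the filesystem.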

At this point, sed tries to flush the buffered data to the file mentioned in the error message with the fflush() library function:

For output streams, fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function.


For your problem, I see a solution in mounting a separate filesystem (for instance a tmpfs, if you have enough memory, or an external storage device), moving some files there, processing them there, and moving them back.

Solution 3

Since posting this question I've learned that ex is a POSIX-compliant program. It's almost universally symlinked to vim, but either way, the following is (I think) a key point about ex in relation to filesystems (taken from the POSIX specification):

This section uses the term edit buffer to describe the current working text. No specific implementation is implied by this term. All editing changes are performed on the edit buffer, and no changes to it shall affect any file until an editor command writes the file.

"...shall affect any file..." I believe that putting something on the filesystem (at all, even a temp file) would count as "affecting any file." Maybe?*

Careful study of the POSIX specifications for ex indicates some "gotchas" about its intended portable use when compared to common scripted uses of ex found online (which are littered with vim-specific commands).

  1. Implementing +cmd is optional according to POSIX.
  2. Allowing multiple -c options is also optional.
  3. The global command :g "eats" everything up to the next non-escaped newline (and therefore runs it after each match found for the regex rather than once at the end). So -c 'g/regex/d | x' only deletes one instance and then exits the file.

So according to what I've researched, the POSIX-compliant method for in-place editing a file on a full filesystem to delete all lines matching a specific regex, is:

ex -sc 'g/myregex/d
x' /path/to/file/filename

This should work providing you have sufficient memory to load the file into a buffer.
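A hedged sketch of trying this on a scratch file first. It feeds the same two commands on standard input rather than through -c (a variation I am assuming behaves the same, since ex -s reads its commands from stdin), and it skips the edit when no ex is installed. The file contents are made up.

```shell
#!/bin/sh
demo=$(mktemp)
printf '%s\n' 'keep first' 'drop myregex' 'keep last' > "$demo"

if command -v ex >/dev/null 2>&1; then
    # One command per line: delete every matching line, then write and exit.
    printf '%s\n' 'g/myregex/d' 'x' | ex -s "$demo"
fi

after=$(cat "$demo")
```

Note that g reports an error when nothing matches, so a script may want to check the exit status rather than assume the file was rewritten.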

*If you find anything which indicates otherwise, please, mention it in the comments.

Solution 4

Use the pipe, Luke!

Read file | filter | write back

sed 's/PATTERN//' BIGFILE | dd of=BIGFILE conv=notrunc

In this case sed doesn't create a new file; it just sends its output through the pipe to dd, which opens the same file for writing without truncating it. Of course, in this particular case one can use grep instead:

grep -v 'PATTERN' BIGFILE | dd of=BIGFILE conv=notrunc

Then truncate whatever remains past the end of the new data:

dd if=/dev/null of=BIGFILE seek=1 bs=BYTES_OF_SED_OUTPUT
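Putting the steps together, with the byte count measured instead of typed in by hand. This is a sketch on a throwaway file with made-up contents. The truncation step uses bs=1 with seek set to the byte count, a variation on the command above that I chose because it also copes with a count of zero (bs=0 is not a valid block size).

```shell
#!/bin/sh
bigfile=$(mktemp)
printf '%s\n' 'first line' 'PATTERN here' 'last line' > "$bigfile"

# Count the filtered output first; the file is only read at this point.
new_size=$(( $(grep -v 'PATTERN' "$bigfile" | wc -c) ))

# Overwrite from the start of the file without truncating it.
grep -v 'PATTERN' "$bigfile" | dd of="$bigfile" conv=notrunc 2>/dev/null

# Cut off the stale tail that the shorter output did not overwrite.
dd if=/dev/null of="$bigfile" bs=1 seek="$new_size" 2>/dev/null

contents=$(cat "$bigfile")
final_size=$(( $(wc -c < "$bigfile") ))
```

This is only safe when the filter never emits more bytes than it has consumed, as is the case when lines are only deleted.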

Solution 5

This answer borrows ideas from this other answer and this other answer but builds on them, creating an answer that is more generally applicable:

num_bytes=$(sed '/myregex/d' /path/to/file/filename | wc -c)
sed '/myregex/d' /path/to/file/filename 1<> /path/to/file/filename
dd if=/dev/null of=/path/to/file/filename bs="$num_bytes" seek=1

The first line runs the sed command with output written to standard output (and not to a file); specifically, to a pipe to wc to count the characters.  The second line also runs the sed command with output written to standard output, which, in this case is redirected to the input file in read/write overwrite (no truncate) mode, which is discussed here.  This is a somewhat dangerous thing to do; it is safe only when the filter command never increases the amount of data (text); i.e., for every n bytes that it reads, it writes n or fewer bytes.  This is, of course, true for the sed '/myregex/d' command; for every line that it reads, it writes the exact same line, or nothing.  (Other examples: s/foo/fu/ or s/foo/bar/ would be safe, but s/fu/foo/ and s/foo/foobar/ would not.)

For example:

$ cat filename
It was
a dark and stormy night.
$ sed '/was/d' filename 1<> filename
$ cat filename
a dark and stormy night.
night.

because these 32 bytes of data:

I  t     w  a  s \n  a     d  a  r  k     a  n  d     s  t  o  r  m  y     n  i  g  h  t  . \n

got overwritten with these 25 characters:

a     d  a  r  k     a  n  d     s  t  o  r  m  y     n  i  g  h  t  . \n

leaving the seven bytes night.\n left over at the end.

Finally, the dd command seeks to the end of the new, scrubbed data (byte 25 in this example) and removes the rest of the file; i.e., it truncates the file at that point.
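The whole sequence can be replayed on a scratch copy of the example, recording the file size at each stage. This is a sketch; the temporary file is a stand-in for the real path.

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' 'It was' 'a dark and stormy night.' > "$f"

size_before=$(( $(wc -c < "$f") ))             # 7 + 25 bytes
num_bytes=$(( $(sed '/was/d' "$f" | wc -c) ))  # size of the scrubbed data

# Overwrite in place: open for read/write, no truncation.
sed '/was/d' "$f" 1<> "$f"
size_mid=$(( $(wc -c < "$f") ))                # unchanged, stale tail remains

# Truncate at the end of the scrubbed data.
dd if=/dev/null of="$f" bs="$num_bytes" seek=1 2>/dev/null
size_after=$(( $(wc -c < "$f") ))
final=$(cat "$f")
```

If the filter could delete every line, num_bytes would be 0 and the dd call would need a guard, since a block size of 0 is rejected.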


If, for any reason, the 1<> trick doesn’t work, you can do

sed '/myregex/d' /path/to/file/filename | dd of=/path/to/file/filename conv=notrunc

Also, note that, as long as all you’re doing is removing lines, all you need is grep -v myregex (as pointed out by Barmar).

Author: Silvr Swrd

Updated on September 18, 2022

Comments

  • Silvr Swrd
    Silvr Swrd over 1 year

    I have a lot of files. So, I created a method called ifHasEverExisted. It returns a type string for "COMPLETE: True", "COMPLETE: False", or an "ERROR: [ERROR]". (I know I could've used a boolean, but I needed an error String.) Anyway, I change the default extension to the files a lot, and I wanted to know if there is a way to check if the file exists with a different extension. Can anyone help me? All I have is this, LOL.

    String[] allPrevExts = {null, "SilvrGaming"};
    
    public static String ifHasEverExsisted(String filePath, String currentExt) {
    
    }
    

    So... Recap. If the extension of an existing file is different, but the filepath is the same, return "COMPLETE: True." If it throws some exception, return "ERROR: [ERROR]", and if it does not exist, return "COMPLETE: False". Thanks.

    • Hot Licks
      Hot Licks almost 10 years
      List the directory, scan through the file names, and check.
    • Balázs Édes
      Balázs Édes almost 10 years
      Instructions are unclear, but for the String return thing: You could have boolean as return value, and throw an Exception when you think an error occurred. The Exception could contain the error message.
    • MadProgrammer
      MadProgrammer almost 10 years
      You could use a combination of File#listFiles and FileFilter, returning true (from the filter) when the file name starts with the expected value (don't forget to include the trailing "."). If the returned array has a length greater than 0, the answer is yes.
    • Admin
      Admin over 8 years
      For the astute readers wondering how I'm using a sed regex to check for duplicate lines: Good spotting; I'm really not checking for duplicate lines. The lines that should stay in the file all use double quotes around the values; the lines that should be deleted all use single quotes.
    • Admin
      Admin over 8 years
      sponge of moreutils fame might be able to schlep the data off to /tmp or perhaps a memory filesystem as a workaround to the partition being full.
    • Admin
      Admin over 8 years
      sed -i creates a temporary copy to operate on. I suspect that ed would be better for this, though I'm not familiar enough to prescribe an actual solution
    • Admin
      Admin over 8 years
      With ed you'd run: printf %s\\n g/myregex/d w q | ed -s infile but keep in mind some implementations also use temporary files just like sed (you could try busybox ed - afaik it doesn't create a temporary file)
    • Admin
      Admin over 8 years
      your vi success was probably only a success because you had the memory to handle it. a similar thing might be done with sed like: sed 'H;1h;$!d;x;P' <file | { read v&& sed "$script" >file; }
    • Admin
      Admin over 8 years
      @mikeserv, interesting point that it is only sufficient memory that allowed me to do that...so then (except for trailing newlines which would be stripped) I could probably have done it with echo "$(sed '/myregex/d' file)" > file?
    • Admin
      Admin over 8 years
      @Wildcard - not reliably w/ echo. use printf. and make sed append some char you drop at the last line so you can avoid losing trailing blanks. also, your shell needs to be able to handle the whole file in a single command-line. that's your risk - test first. bash is especially bad at that (i think its to do w/ stack space?) and may sick up on you at any time. the two sed's i recommended would at least use the kernel's pipe buffer to good effect between them, but the method is fairly similar. your command sub thing will also truncate file whether or not the sed w/in is successful.
    • Admin
      Admin over 8 years
      @Wildcard - try sed '/regex/!H;$!d;x' <file|{ read v && cat >file;} and if it works read the rest of my answer.'
  • Silvr Swrd
    Silvr Swrd almost 10 years
    It gives me an ArrayIndexOutOfBoundsException when I try that.
  • Hastur
    Hastur over 8 years
    Does it preserve permissions, ownership and timestamps? Maybe rsync -a --no-owner --no-group --remove-source-files "$backupfile" "$destination" from here
  • mikeserv
    mikeserv over 8 years
    @Hastur - do you mean to imply that sed -i does preserve that stuff?
  • Barmar
    Barmar over 8 years
    @Hastur sed -i doesn't preserve any of those things. I just tried it with a file I don't own, but located in a directory that I do own, and it let me replace the file. The replacement is owned by me, not the original owner.
  • mikeserv
    mikeserv over 8 years
    @Barmar - that's what i thought. sed -i or perl -i are both seriously insecure and I've always considered their popularity confusing. actually writing over the file is the only sure way to do it. creating a new file and moving it over the old results in a new file.
  • Ralph Rönnquist
    Ralph Rönnquist over 8 years
    What about echo "$(cat FILE)" | grep '^"' > FILE? I'm guessing that would capture FILE in RAM before renewing it.
  • mikeserv
    mikeserv over 8 years
    @RalphRönnquist - maybe - if cat can open FILE and if the shell can handle the length of the resulting command... Probably not, though, if the shell sets up the pipeline starting at the right side, or if the subshell spawned on the right-side winds up coming around sooner than the one opened on the left. In either of those cases (which are fairly likely to occur) the subshell on the right side truncates FILE before the one on the left opens it and reads it, or perhaps it truncates it while the command sub reads it. See my answer here for how to overwrite a file in place.
  • Barmar
    Barmar over 8 years
    @RalphRönnquist To be sure, you'd need to do it in two steps: var=$(< FILE); echo "$FILE" | grep '^"' > FILE
  • mikeserv
    mikeserv over 8 years
    @Barmar - how is that sure? you don't test anything.
  • Barmar
    Barmar over 8 years
    @mikeserv When commands are separated by a semicolon, the first one completes before the second one begins. So there can't be any interference. I don't need to test this to know it's true.
  • mikeserv
    mikeserv over 8 years
    i know how it works - but you dont test anything - it could be an empty variable. you dont know if it worked - you just echo.
  • Barmar
    Barmar over 8 years
    Why would it be an empty variable? I just assigned it from the output of a command that I know works.
  • Barmar
    Barmar over 8 years
    Just noticed a typo; I meant echo "$var". I got it right in my edit of the answer.
  • mikeserv
    mikeserv over 8 years
    @Barmar - you don't know it works - you don't even know you've successfully opened input. The very least you could do is v=$(<file)&& printf %s\\n "$v" >file but you don't even use &&. The asker's talking about running it in a script - automating overwriting a file with a portion of itself. you ought at least to validate you can successfully open input and output. Also, the shell might explode.
  • Wildcard
    Wildcard over 8 years
    This is a very good answer, actually; it hadn't occurred to me to place it in a variable. Also, @mikeserv is right: for automating this, I would definitely not run it without &&.
  • Wildcard
    Wildcard over 8 years
    I confess I hadn't read your answer in detail before, because it starts with unworkable (for me) solutions that involve byte count (different amongst each of the many servers) and /tmp which is on the same filesystem. I like your dual sed version. I think a combination of Barmar's and your answer would probably be best, something like: myvar="$(sed '/myregex/d' < file)" && [ -n "$myvar" ] && echo "$myvar" > file ; unset myvar (For this case I don't care about preserving trailing newlines.)
  • mikeserv
    mikeserv over 8 years
    @Wildcard - that could be. but you shouldnt use the shell like a database. the sed | cat thing above never opens output unless sed has already buffered the entire file and is ready to start writing all of it to output. If it tries to buffer the file and fails - read is not successful because finds EOF on the | pipe before it reads its first newline and so cat >out never happens until its time to write it out from memory entirely. an overflow or anything like it just fails. also the whole pipeline returns success or failure every time. storing it in a var is just more risky.
  • mikeserv
    mikeserv over 8 years
    @Wildcard - if i really wanted it in a variable too, i think id do it like: file=$(sed '/regex/!H;$!d;x' <file | read v && tee file) && cmp - file <<<"$file" || shite so the output file and the var would be written simultaneously, which would make either or an effective backup, which is the only reason you'd wanna complicate things further than you'd need to.
  • Wildcard
    Wildcard over 8 years
    Hmmm. Can the same thing be done (either with ed or with ex) such that memory is used rather than a separate filesystem? That's what I was really going for (and the reason I haven't accepted an answer.)
  • mikeserv
    mikeserv over 8 years
    but ex writes to tmpfiles... always. its spec'd to write its buffers to disk periodically. there are even spec'd commands for locating the tmp file buffers on disk.
  • Wildcard
    Wildcard over 8 years
    @kenorb, not quite, according to my reading of the specs—see my point 1 in the answer above. Exact quote from POSIX is "The ex utility shall conform to XBD Utility Syntax Guidelines, except for the unspecified usage of '-', and that '+' may be recognized as an option delimiter as well as '-'."
  • Wildcard
    Wildcard over 8 years
    Did you notice the "full filesystem" part of the question?
  • Leben Gleben
    Leben Gleben over 8 years
    @Wildcard , does sed always use temp files? grep anyway won't
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 8 years
    Hmm.  This may be more complicated than I realized.  I studied the source of ed extensively many years ago.  There were still such things as 16-bit computers, on which processes were limited to a 64K (!) address space, so the idea of an editor reading the entire file into memory was a non-starter.  Since then, of course, memory has gotten bigger — but so have disks and files.  Since disks are so big, people don’t feel a need to deal with the contingency of /tmp running out of space.  I just took a quick look at the source code of a recent version of ed, and it still seems  … (Cont’d)
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 8 years
    (Cont’d) …  to implement the “edit buffer” as a temp file, unconditionally — and I cannot find any indication that any version of ed (or ex or vi) offers an option to keep the buffer in memory.  On the other hand, Text Editing with ed and vi – Chapter 11: Text Processing – Part II: Exploring Red Hat Linux – Red Hat Linux 9 Professional Secrets – Linux systems says that ed’s edit buffer resides in memory,  … (Cont’d)
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 8 years
    (Cont’d) …  and UNIX Document Processing and Typesetting by Balasubramaniam Srinivasan says the same thing about vi (which is the same program as ex).  I believe that they’re just using sloppy, imprecise wording — but, if it’s on the Internet (or in print), it must be true, right?  You pay your money and you take your choice.
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 8 years
    I can’t prove it, except by appeal to common sense, but I believe that you’re reading more into that statement from the specification than is really there.  I suggest that the safer interpretation is that no changes to the edit buffer shall affect any file that existed before the edit session began, or that the user named.  See also my comments on my answer.
  • Wildcard
    Wildcard over 8 years
    @G-Man, I actually think you're right; my initial interpretation was probably wishful thinking. However, since editing the file in vi worked on a full filesystem, I believe that in most cases it would work with ex as well—though maybe not for a ginormous file. sed -i doesn't work on a full filesystem regardless of filesize.
  • VooXe
    VooXe over 7 years
    @mikeserv: I am dealing the same problem as the OP now and I find your solution really useful. But I don't understand the usage of read script and read v in your answer. If you can elaborate more about it I will be much appreciated, thanks!
  • mikeserv
    mikeserv over 7 years
    @sylye - $script is the sed script you would use to target whatever portion of your file you wanted; its the script that gets you the end result that you want in stream. v is just a placeholder for an empty line. in a bash shell it is not necessary because bash will automatically use the $REPLY shell variable in its stead if you dont specify one, but POSIXly you should always do so. im glad you find it useful, by the way. good luck with it. im mikeserv@gmail if you need anything in depth. i should have a computer again in a few days