sed in-place line deletion on full filesystem?
Solution 1
The -i option doesn't really overwrite the original file. It creates a new file with the output, then renames it to the original filename. Since you don't have room on the filesystem for this new file, it fails.
You'll need to do that yourself in your script, but create the new file on a different filesystem.
Also, if you're just deleting lines that match a regexp, you can use grep instead of sed.
grep -v 'myregex' /path/to/filename > /tmp/filename && mv /tmp/filename /path/to/filename
In general, it's rarely possible for programs to use the same file as input and output -- as soon as it starts writing to the file, the part of the program that's reading from the file will no longer see the original contents. So it either has to copy the original file somewhere first, or write to a new file and rename it when it's done.
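With a shell redirection, the truncation happens even before the program starts, because the shell opens the output file first. A minimal sketch on a throwaway file (the mktemp scratch file and the variable name demo are only for illustration):

```shell
# Illustration only: works on a scratch file, not on real data.
demo=$(mktemp)
printf 'one\ntwo\n' > "$demo"

# The shell opens (and truncates) "$demo" for output before sed runs,
# so sed finds nothing left to read.
sed -n 'p' "$demo" > "$demo"

wc -c < "$demo"   # the file is now 0 bytes long
```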
If you don't want to use a temporary file, you could try caching the file contents in memory:
file=$(< /path/to/filename)
echo "$file" | grep -v 'myregex' > /path/to/filename
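If this runs unattended, it is probably worth checking that the read succeeded before the file is opened for writing. A rough sketch of that idea, assuming the file fits comfortably in a shell variable; the function name delete_matching_in_memory and the dm_ variables are made up for this example:

```shell
# Hypothetical helper: delete lines matching a regex while holding the
# whole file in a shell variable instead of a temporary file.
delete_matching_in_memory() {
    dm_regex=$1
    dm_file=$2

    # Read first, and stop here if the file cannot be read. Nothing has
    # been truncated yet at this point.
    dm_contents=$(cat -- "$dm_file") || return 1

    # Nothing but newlines (or nothing at all): leave the file untouched.
    [ -n "$dm_contents" ] || return 0

    # Only now is the target opened for writing (and truncated).
    printf '%s\n' "$dm_contents" | grep -v -- "$dm_regex" > "$dm_file"
    dm_status=$?

    unset dm_contents
    # grep exits 1 when it selected no lines (every line was deleted);
    # only an exit status above 1 signals a real error.
    [ "$dm_status" -le 1 ]
}
```

Called as delete_matching_in_memory 'myregex' /path/to/filename. Command substitution strips trailing newlines, so trailing blank lines in the file are lost.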
Solution 2
That's how sed works. If used with -i (in-place edit), sed creates a temporary file with the new contents of the processed file. When finished, sed replaces the current working file with the temporary one. The utility does not edit the file in place. That's exactly the behavior of every editor.
It's like you perform the following task in a shell:
sed 'whatever' file >tmp_file
mv tmp_file file
At this point, sed tries to flush the buffered data to the file mentioned in the error message with the fflush() library function:
For output streams, fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function.
For your problem, I see a solution in mounting a separate filesystem (for instance a tmpfs, if you have enough memory, or an external storage device), moving some files there, processing them there, and moving them back.
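A variation on this that avoids moving the original file is to write only the filtered copy to the other filesystem and then copy it back over the original. This is a sketch under assumptions: the function name filter_via_scratch_dir is hypothetical, the scratch directory is whatever location you know has free space, and mktemp with a template argument is widely available but not specified by POSIX.

```shell
# Hypothetical helper: filter a file via a scratch directory that lives
# on a different filesystem (for example a tmpfs mount).
filter_via_scratch_dir() {
    fs_regex=$1
    fs_file=$2
    fs_scratch=$3   # a directory on a filesystem that still has free space

    fs_tmp=$(mktemp "$fs_scratch/filtered.XXXXXX") || return 1

    # grep exits 1 when no lines are left; only >1 is a real failure.
    grep -v -- "$fs_regex" "$fs_file" > "$fs_tmp"
    if [ "$?" -gt 1 ]; then
        rm -f -- "$fs_tmp"
        return 1
    fi

    # Overwrite the original rather than mv-ing the copy over it, so the
    # file keeps its inode, owner and permissions. The scratch copy is
    # removed only if the write back succeeded.
    cat -- "$fs_tmp" > "$fs_file" && rm -f -- "$fs_tmp"
}
```

For example, filter_via_scratch_dir 'myregex' /path/to/filename /dev/shm on a Linux system where /dev/shm is a tmpfs.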
Solution 3
Since posting this question I've learned that ex is a POSIX-compliant program. It's almost universally symlinked to vim, but either way, the following is (I think) a key point about ex in relation to filesystems (taken from the POSIX specification):
This section uses the term edit buffer to describe the current working text. No specific implementation is implied by this term. All editing changes are performed on the edit buffer, and no changes to it shall affect any file until an editor command writes the file.
"...shall affect any file..." I believe that putting something on the filesystem (at all, even a temp file) would count as "affecting any file." Maybe?*
Careful study of the POSIX specifications for ex indicates some "gotchas" about its intended portable use when compared to common scripted uses of ex found online (which are littered with vim-specific commands).
- Implementing +cmd is optional according to POSIX.
- Allowing multiple -c options is also optional.
- The global command :g "eats" everything up to the next non-escaped newline (and therefore runs it after each match found for the regex rather than once at the end). So -c 'g/regex/d | x' only deletes one instance and then exits the file.
So according to what I've researched, the POSIX-compliant method for in-place editing a file on a full filesystem to delete all lines matching a specific regex, is:
ex -sc 'g/myregex/d
x' /path/to/file/filename
This should work providing you have sufficient memory to load the file into a buffer.
*If you find anything which indicates otherwise, please, mention it in the comments.
Solution 4
Use the pipe, Luke!
Read file | filter | write back
sed 's/PATTERN//' BIGFILE | dd of=BIGFILE conv=notrunc
In this case, sed doesn't create a new file; it just sends its output through a pipe to dd, which opens the same file. Of course, one can use grep in this particular case:
grep -v 'PATTERN' BIGFILE | dd of=BIGFILE conv=notrunc
Then truncate the remainder:
dd if=/dev/null of=BIGFILE seek=1 bs=BYTES_OF_SED_OUTPUT
Solution 5
This answer borrows ideas from this other answer and this other answer but builds on them, creating an answer that is more generally applicable:
num_bytes=$(sed '/myregex/d' /path/to/file/filename | wc -c)
sed '/myregex/d' /path/to/file/filename 1<> /path/to/file/filename
dd if=/dev/null of=/path/to/file/filename bs="$num_bytes" seek=1
The first line runs the sed command with output written to standard output (and not to a file); specifically, to a pipe to wc to count the characters.
The second line also runs the sed command with output written to standard output, which, in this case, is redirected to the input file in read/write overwrite (no truncate) mode, which is discussed here.
This is a somewhat dangerous thing to do; it is safe only when the filter command never increases the amount of data (text); i.e., for every n bytes that it reads, it writes n or fewer bytes.
This is, of course, true for the sed '/myregex/d' command; for every line that it reads, it writes the exact same line, or nothing.
(Other examples: s/foo/fu/ or s/foo/bar/ would be safe, but s/fu/foo/ and s/foo/foobar/ would not.)
For example:
$ cat filename
It was
a dark and stormy night.
$ sed '/was/d' filename 1<> filename
$ cat filename
a dark and stormy night.
night.
because these 32 bytes of data:
I t w a s \n a d a r k a n d s t o r m y n i g h t . \n
got overwritten with these 25 characters:
a d a r k a n d s t o r m y n i g h t . \n
leaving the seven bytes night.\n left over at the end.
Finally, the dd command seeks to the end of the new, scrubbed data (byte 25 in this example) and removes the rest of the file; i.e., it truncates the file at that point.
If, for any reason, the 1<> trick doesn’t work, you can do
sed '/myregex/d' /path/to/file/filename | dd of=/path/to/file/filename conv=notrunc
Also, note that, as long as all you’re doing is removing lines, all you need is grep -v myregex (as pointed out by Barmar).
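For scripted use, the three steps might be wrapped up roughly as follows. This is a sketch, not a hardened tool: the function name shrink_in_place and the si_ variables are made up here, the regex is assumed not to contain a slash, and the final step uses bs=1 seek=N instead of bs=N seek=1, which truncates at the same byte offset while keeping dd's block size small. An extra dry run up front checks that sed can actually read the file and parse the regex before anything is overwritten.

```shell
# Hypothetical wrapper around the count / overwrite / truncate steps.
shrink_in_place() {
    si_regex=$1   # must not contain "/", it is pasted into a sed address
    si_file=$2

    # Dry run: bail out before touching the file if sed cannot read it
    # or rejects the regex.
    sed "/$si_regex/d" "$si_file" > /dev/null || return 1

    # Count the bytes of the filtered output. The arithmetic expansion
    # strips any padding that wc puts around the number.
    si_size=$(( $(sed "/$si_regex/d" "$si_file" | wc -c) ))

    # Overwrite the file from the start without truncating it. Safe only
    # because deleting lines never produces more bytes than were read.
    sed "/$si_regex/d" "$si_file" 1<> "$si_file" || return 1

    # Cut off the stale tail: seek to byte si_size and truncate there.
    dd if=/dev/null of="$si_file" bs=1 seek="$si_size" 2>/dev/null
}
```

Running shrink_in_place 'was' filename on the two-line example above should leave only the 25-byte line "a dark and stormy night." in the file.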
Silvr Swrd
Updated on September 18, 2022
Comments
-
Admin over 8 years
For the astute readers wondering how I'm using a sed regex to check for duplicate lines: Good spotting; I'm really not checking for duplicate lines. The lines that should stay in the file all use double quotes around the values; the lines that should be deleted all use single quotes.
-
Admin over 8 years
sponge of moreutils fame might be able to schlep the data off to /tmp or perhaps a memory filesystem as a workaround to the partition being full.
-
Admin over 8 years
sed -i creates a temporary copy to operate on. I suspect that ed would be better for this, though I'm not familiar enough to prescribe an actual solution.
-
Admin over 8 years
With ed you'd run:
printf %s\\n g/myregex/d w q | ed -s infile
but keep in mind some implementations also use temporary files just like sed (you could try busybox ed - afaik it doesn't create a temporary file)
-
Admin over 8 years
your vi success was probably only a success because you had the memory to handle it. a similar thing might be done with sed like:
sed 'H;1h;$!d;x;P' <file | { read v&& sed "$script" >file; }
-
Admin over 8 years
@mikeserv, interesting point that it is only sufficient memory that allowed me to do that...so then (except for trailing newlines which would be stripped) I could probably have done it with echo "$(sed '/myregex/d' file)" > file?
-
Admin over 8 years
@Wildcard - not reliably w/ echo. use printf. and make sed append some char you drop at the last line so you can avoid losing trailing blanks. also, your shell needs to be able to handle the whole file in a single command-line. that's your risk - test first. bash is especially bad at that (i think its to do w/ stack space?) and may sick up on you at any time. the two sed's i recommended would at least use the kernel's pipe buffer to good effect between them, but the method is fairly similar. your command sub thing will also truncate file whether or not the sed w/in is successful.
-
Admin over 8 years
@Wildcard - try
sed '/regex/!H;$!d;x' <file|{ read v && cat >file;}
and if it works read the rest of my answer.
-
Hastur over 8 years
Did it preserve permissions, ownership and timestamps? Maybe
rsync -a --no-owner --no-group --remove-source-files "$backupfile" "$destination"
from here
-
mikeserv over 8 years
@Hastur - do you mean to imply that sed -i does preserve that stuff?
-
Barmar over 8 years
@Hastur sed -i doesn't preserve any of those things. I just tried it with a file I don't own, but located in a directory that I do own, and it let me replace the file. The replacement is owned by me, not the original owner.
-
mikeserv over 8 years
@Barmar - that's what i thought. sed -i or perl -i are both seriously insecure and I've always considered their popularity confusing. actually writing over the file is the only sure way to do it. creating a new file and moving it over the old results in a new file.
-
Ralph Rönnquist over 8 years
What about echo "$(cat FILE)" | grep '^"' > FILE? I'm guessing that would capture FILE in RAM before renewing it.
-
mikeserv over 8 years
@RalphRönnquist - maybe - if cat can open FILE and if the shell can handle the length of the resulting command... Probably not, though, if the shell sets up the pipeline starting at the right side, or if the subshell spawned on the right-side winds up coming around sooner than the one opened on the left. In either of those cases (which are fairly likely to occur) the subshell on the right side truncates FILE before the one on the left opens it and reads it, or perhaps it truncates it while the command sub reads it. See my answer here for how to overwrite a file in place.
-
Barmar over 8 years
@RalphRönnquist To be sure, you'd need to do it in two steps:
var=$(< FILE); echo "$FILE" | grep '^"' > FILE
-
mikeserv over 8 years
@Barmar - how is that sure? you don't test anything.
-
Barmar over 8 years
@mikeserv When commands are separated by a semicolon, the first one completes before the second one begins. So there can't be any interference. I don't need to test this to know it's true.
-
mikeserv over 8 years
i know how it works - but you dont test anything - it could be an empty variable. you dont know if it worked - you just echo.
-
Barmar over 8 years
Why would it be an empty variable? I just assigned it from the output of a command that I know works.
-
Barmar over 8 years
Just noticed a typo, I meant echo "$var". I got it right in my edit of the answer.
-
mikeserv over 8 years
@Barmar - you don't know it works - you don't even know you've successfully opened input. The very least you could do is
v=$(<file)&& printf %s\\n "$v" >file
but you don't even use &&. The asker's talking about running it in a script - automating overwriting a file with a portion of itself. you ought at least to validate you can successfully open input and output. Also, the shell might explode.
-
Wildcard over 8 years
This is a very good answer, actually; it hadn't occurred to me to place it in a variable. Also, @mikeserv is right: for automating this, I would definitely not run it without &&.
-
Wildcard over 8 years
I confess I hadn't read your answer in detail before, because it starts with unworkable (for me) solutions that involve byte count (different amongst each of the many servers) and /tmp which is on the same filesystem. I like your dual sed version. I think a combination of Barmar's and your answer would probably be best, something like:
myvar="$(sed '/myregex/d' < file)" && [ -n "$myvar" ] && echo "$myvar" > file ; unset myvar
(For this case I don't care about preserving trailing newlines.)
-
mikeserv over 8 years
@Wildcard - that could be. but you shouldnt use the shell like a database. the sed | cat thing above never opens output unless sed has already buffered the entire file and is ready to start writing all of it to output. If it tries to buffer the file and fails - read is not successful because it finds EOF on the | pipe before it reads its first newline and so cat >out never happens until its time to write it out from memory entirely. an overflow or anything like it just fails. also the whole pipeline returns success or failure every time. storing it in a var is just more risky.
-
mikeserv over 8 years
@Wildcard - if i really wanted it in a variable too, i think id do it like:
file=$(sed '/regex/!H;$!d;x' <file | read v && tee file) && cmp - file <<<"$file" || shite
so the output file and the var would be written simultaneously, which would make either or an effective backup, which is the only reason you'd wanna complicate things further than you'd need to.
-
Wildcard over 8 years
Hmmm. Can the same thing be done (either with ed or with ex) such that memory is used rather than a separate filesystem? That's what I was really going for (and the reason I haven't accepted an answer.)
-
mikeserv over 8 years
but ex writes to tmpfiles... always. its spec'd to write its buffers to disk periodically. there are even spec'd commands for locating the tmp file buffers on disk.
-
Wildcard over 8 years
@kenorb, not quite, according to my reading of the specs; see my point 1 in the answer above. Exact quote from POSIX is "The ex utility shall conform to XBD Utility Syntax Guidelines, except for the unspecified usage of '-', and that '+' may be recognized as an option delimiter as well as '-'."
-
Wildcard over 8 years
Did you notice the "full filesystem" part of the question?
-
Leben Gleben over 8 years
@Wildcard, does sed always use temp files? grep anyway won't
-
G-Man Says 'Reinstate Monica' over 8 years
Hmm. This may be more complicated than I realized. I studied the source of ed extensively many years ago. There were still such things as 16-bit computers, on which processes were limited to a 64K (!) address space, so the idea of an editor reading the entire file into memory was a non-starter. Since then, of course, memory has gotten bigger, but so have disks and files. Since disks are so big, people don’t feel a need to deal with the contingency of /tmp running out of space. I just took a quick look at the source code of a recent version of ed, and it still seems … (Cont’d)
-
G-Man Says 'Reinstate Monica' over 8 years
(Cont’d) … to implement the “edit buffer” as a temp file, unconditionally, and I cannot find any indication that any version of ed (or ex or vi) offers an option to keep the buffer in memory. On the other hand, Text Editing with ed and vi – Chapter 11: Text Processing – Part II: Exploring Red Hat Linux – Red Hat Linux 9 Professional Secrets – Linux systems says that ed’s edit buffer resides in memory, … (Cont’d)
-
G-Man Says 'Reinstate Monica' over 8 years
(Cont’d) … and UNIX Document Processing and Typesetting by Balasubramaniam Srinivasan says the same thing about vi (which is the same program as ex). I believe that they’re just using sloppy, imprecise wording, but, if it’s on the Internet (or in print), it must be true, right? You pay your money and you take your choice.
-
G-Man Says 'Reinstate Monica' over 8 years
I can’t prove it, except by appeal to common sense, but I believe that you’re reading more into that statement from the specification than is really there. I suggest that the safer interpretation is that no changes to the edit buffer shall affect any file that existed before the edit session began, or that the user named. See also my comments on my answer.
-
Wildcard over 8 years
@G-Man, I actually think you're right; my initial interpretation was probably wishful thinking. However, since editing the file in vi worked on a full filesystem, I believe that in most cases it would work with ex as well, though maybe not for a ginormous file. sed -i doesn't work on a full filesystem regardless of filesize.
-
VooXe over 7 years
@mikeserv: I am dealing with the same problem as the OP now and I find your solution really useful. But I don't understand the usage of read script and read v in your answer. If you can elaborate more about it I would much appreciate it, thanks!
-
mikeserv over 7 years
@sylye - $script is the sed script you would use to target whatever portion of your file you wanted; its the script that gets you the end result that you want in stream. v is just a placeholder for an empty line. in a bash shell it is not necessary because bash will automatically use the $REPLY shell variable in its stead if you dont specify one, but POSIXly you should always do so. im glad you find it useful, by the way. good luck with it. im mikeserv@gmail if you need anything in depth. i should have a computer again in a few days