Why does redirecting the output of a file to itself produce a blank file?


Solution 1

When you use >, the shell opens the file in truncation mode, so its contents are removed before the command attempts to read it.
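The truncation can be observed in a throwaway directory. The following is a minimal sketch, assuming mktemp is available (it is not specified by POSIX); the names demo_dir and size_after are made up for this illustration.

```shell
# Work in a scratch directory so no real file is harmed.
demo_dir=$(mktemp -d)
printf 'some text\n' > "$demo_dir/foo.txt"

# The shell opens foo.txt for writing (and truncates it) first,
# and only then starts fold, which finds an empty file to read.
fold "$demo_dir/foo.txt" > "$demo_dir/foo.txt"

# Size in bytes after the command; the arithmetic expansion strips
# the padding that some wc implementations print.
size_after=$(($(wc -c < "$demo_dir/foo.txt")))
echo "bytes after: $size_after"
```

The same thing happens with any command on the left-hand side, because the truncation is done by the shell and not by the command.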

When you use >>, the file is opened in append mode, so the existing data is preserved. Using the same file as input and output is still risky in this case, however: if the file is too large to fit in the command's input buffer, the command keeps reading the data it has just appended, and the file can grow until the file system is full (or your disk quota is reached).

Should you want to use a file both as input and output with a command that doesn't support in-place modification, you can use a couple of workarounds:

  • Use an intermediary file and overwrite the original one when done, and only if no error occurred while running the utility (this is the safest and most common way).

    fold foo.txt > fold.txt.$$ && mv fold.txt.$$ foo.txt
    
  • Avoid the intermediary file, at the risk of partial or complete data loss should an error or interruption occur. In this example, foo.txt is opened as the input of a subshell (the commands inside the parentheses) before the file is deleted. The old inode stays alive because the subshell keeps it open while the data is being read. The file written by the inner utility (here fold) has the same name (foo.txt) but points to a different inode, because the old directory entry has been removed; technically, two different "files" with the same name exist during the process. When the subshell ends, the old inode is released and its data is lost. Make sure you have enough space to store both the old file and the new one at the same time, otherwise you will lose data.

    (rm foo.txt; fold > foo.txt) < foo.txt
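
The intermediary-file workaround can be wrapped in a small function. This is only a sketch under stated assumptions: rewrite_in_place is a hypothetical name (not a standard utility), mktemp is assumed to be installed (it is not specified by POSIX), and the command passed in is assumed to read standard input and write standard output.

```shell
# rewrite_in_place FILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with FILE as stdin and replaces FILE
# with the output only if COMMAND succeeds.
rewrite_in_place() {
    target=$1
    shift
    # Create the temporary file next to the target so that mv stays on
    # the same file system and amounts to a rename.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" < "$target" > "$tmp"; then
        mv -- "$tmp" "$target"
    else
        status=$?          # exit status of the failed command
        rm -f -- "$tmp"    # leave the original file untouched
        return "$status"
    fi
}

# Example run in a scratch directory: wrap a line at 5 columns.
demo_dir=$(mktemp -d)
printf 'abcdefghij\n' > "$demo_dir/foo.txt"
rewrite_in_place "$demo_dir/foo.txt" fold -w 5
```

Note that the replacement file gets the permissions mktemp assigns to it rather than those of the original, and the variables are not local because plain POSIX sh has no local keyword; both points may need adjusting for real use.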
    

Solution 2

The file is opened for writing by the shell before the application has a chance to read it. Opening the file for writing truncates it.

Solution 3

In bash, the redirection in ... > foo.txt empties foo.txt before the command to its left is run.

One might use command substitution and print its result as a workaround. This solution takes fewer additional characters than those in the other answers:

printf '%s\n' "$(less foo.txt)" > foo.txt

Beware: this command does not preserve any trailing newline(s) in foo.txt. Have a look at the comment section below for more information.

Here, the command substitution $(...) is evaluated before the stream redirection operator >, hence the preservation of information.
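
The trailing-newline issue can be worked around with the sentinel trick that Scott mentions in the comments below (tmp=$(cmd; printf q); printf '%s' "${tmp%q}"). A minimal sketch, assuming mktemp is available and using cat in place of less; demo_dir and tmp are illustrative names, and the approach still holds the whole file in a shell variable, so it is not suitable for data containing NUL bytes.

```shell
# Scratch file ending in three newline characters (20 bytes in total).
demo_dir=$(mktemp -d)
printf 'line one\nline two\n\n\n' > "$demo_dir/foo.txt"

# Append a sentinel character inside the command substitution so that
# the newlines are no longer trailing and are therefore not stripped.
tmp=$(cat "$demo_dir/foo.txt"; printf q)

# Remove the sentinel and write the content back unchanged.
# The substitution above was fully evaluated before this redirection.
printf '%s' "${tmp%q}" > "$demo_dir/foo.txt"
```

With printf '%s\n' "$(cat ...)" instead, the same file would come back with a single trailing newline.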

Author by

seewalker

Updated on September 18, 2022

Comments

  • seewalker
    seewalker almost 2 years

    Why does redirecting the output of a file to itself produce a blank file?

    Stated in Bash, why do

    less foo.txt > foo.txt
    

    and

    fold foo.txt > foo.txt
    

    produce an empty foo.txt? Since an append such as less eggs.py >> eggs.py produces two copies of the text in eggs.py, one might expect that an overwrite would produce one copy of the text.

    Note, I'm not saying this is a bug, it is more likely a pointer to something deep about Unix.

  • slhck
    slhck about 11 years
    sponge from moreutils can also help. fold foo.txt | sponge foo.txt – or fold foo.txt | sponge !$ should also do.
  • jlliagre
    jlliagre about 11 years
    @slhck Indeed, sponge could do the job too. However, being neither specified by POSIX nor mainstream in Unix-like OSes, it is unlikely to be present.
  • slhck
    slhck about 11 years
    It's not like it can't be made present though ;)
  • Scott - Слава Україні
    Scott - Слава Україні about 5 years
    @KamilMaciorowski: Actually, there is tmp=$(cmd; printf q);  printf '%s' "${tmp%q}". But you missed another issue with this answer: it says “subshell” when it means “command substitution”.  Yes, command substitutions are generally subshells, but not vice versa, and subshells, in general, are no help for this problem.
  • ljleb
    ljleb about 5 years
    @KamilMaciorowski I feel so bad for missing all of this. Thanks for pointing all of this out. For your (4)th point: would backquotes do the trick, i.e. preserve trailing newline(s)?
  • ljleb
    ljleb about 5 years
    @Scott thanks for your reply. I changed "subshell" for "command substitution". By the way, I wonder what's the exact difference between the two.
  • Kamil Maciorowski
    Kamil Maciorowski about 5 years
    No, backquotes (backticks) strip trailing newline characters as well.
  • ljleb
    ljleb about 5 years
    Alright then, I added a warning message for now. I'll remove it if I find a solution.
  • Kamil Maciorowski
    Kamil Maciorowski about 5 years
    Well, now the answer is not that bad. Even with the warning there's one more problem: POSIX requires any non-empty text file to end with a newline character (otherwise the last line is incomplete). So %s\n as format would be better. But if the file is binary, %s may be better. In any case you're risking the new content is not exactly what it should be. Scott's approach can fix this; it's far from being elegant though.