Why does the command shuf file > file leave an empty file, but similar commands do not?
Solution 1
The problem is that > example.txt starts writing to that file (the shell opens it and truncates it to zero length) before shuf example.txt starts reading it. So, as there is no output yet, example.txt is empty, shuf reads an empty file, and since shuf produces no output in this case, the final result stays empty.
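A minimal sketch of that sequence, assuming GNU shuf is installed and using a scratch directory so no real file is touched (the directory and variable names are only for illustration):

```shell
# Work in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf '%s\n' one two three four five > example.txt
before=$(wc -c < example.txt)

# The redirection truncates example.txt first; only then does shuf run and read it.
shuf example.txt > example.txt

after=$(wc -c < example.txt)
echo "bytes before: $before, bytes after: $after"
```

The second number should be 0: by the time shuf opens the file, its contents are already gone.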
Your other command may suffer from the same issue. > example.txt may truncate the file before cat example.txt starts reading it; it depends on the order in which the shell executes those things, and on how long it takes cat to actually open the file.
To avoid such issues entirely, you could use
shuf example.txt > example.txt.shuf && mv example.txt.shuf example.txt
Or you could go with
shuf example.txt --output=example.txt
instead.
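Both approaches can be checked with a small sketch. It assumes GNU shuf, and the scratch directory and the *.sorted helper files are hypothetical names used only for the comparison. Sorting before and after shows that no line was lost:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' one two three four five > example.txt
sort example.txt > expected.sorted

# Option 1: write to a temporary file, replace the original only if shuf succeeded.
shuf example.txt > example.txt.shuf && mv example.txt.shuf example.txt
sort example.txt > after_mv.sorted

# Option 2: let shuf open the output itself, after it has read its input.
shuf --output=example.txt example.txt
sort example.txt > after_o.sorted

cmp -s expected.sorted after_mv.sorted && echo "temp file + mv: all lines kept"
cmp -s expected.sorted after_o.sorted && echo "--output: all lines kept"
```

The temporary-file variant has the extra advantage that the original is left alone if shuf fails.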
Solution 2
The package moreutils has a command sponge:
sponge reads standard input and writes it out to the specified file.
Unlike a shell redirect, sponge soaks up all its input before opening
the output file. This allows constructing pipelines that read from and
write to the same file.
That way you can do:
shuf example.txt | sponge example.txt
(Unfortunately, the moreutils package also has a utility named parallel that is far less useful than GNU parallel. I removed the parallel installed by moreutils.)
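If moreutils is not available, the idea behind sponge can be approximated in plain shell. The function below is a hypothetical stand-in named soak, not the real sponge, and it makes no attempt to match sponge's options or its handling of permissions. It only illustrates "read everything first, open the output afterwards":

```shell
# soak FILE: hypothetical stand-in for sponge (not the moreutils tool).
soak() {
    soak_tmp=$(mktemp) || return 1
    # Read all of standard input first; FILE has not been opened yet.
    cat > "$soak_tmp" || { rm -f "$soak_tmp"; return 1; }
    # Only now is FILE opened for writing, which is when it gets truncated.
    cat "$soak_tmp" > "$1"
    soak_status=$?
    rm -f "$soak_tmp"
    return "$soak_status"
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' one two three four five > example.txt

shuf example.txt | soak example.txt
wc -l < example.txt
```

Because the redirection to FILE sits inside the function, after the first cat has seen end of input, shuf has already finished reading by the time the file is truncated.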
Solution 3
You are just lucky that running
cat example.txt | shuf > example.txt
doesn't empty example.txt the way this command does:
shuf example.txt > example.txt
Redirections are performed by the shell before the commands are executed, and pipeline components are executed concurrently.
Using the -o / --output option would be the best solution with shuf, but if you like taking (very slight) risks, here is a non-traditional way to keep the processed file from being truncated before it is read:
shuf example.txt | (sleep 1;rm example.txt;cat > example.txt)
and this simpler and faster one, thanks to Ole's suggestion:
(rm example.txt; shuf > example.txt) < example.txt
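A sketch of why the last form works, assuming GNU shuf and a local file system where a removed file stays readable through an already open descriptor. It uses && instead of ; so that a failed rm does not go on to truncate the file. The scratch directory and expected.sorted are illustrative names:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' one two three four five > example.txt
sort example.txt > expected.sorted

# "< example.txt" opens the original file on stdin for the whole subshell.
# rm removes only the name; the open descriptor still reaches the data.
# "> example.txt" then creates a new file under the same name.
(rm example.txt && shuf > example.txt) < example.txt

sort example.txt | cmp -s - expected.sorted && echo "all lines kept"
```

Note that the result is a newly created file, so ownership, permissions and hard links of the original are not carried over.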
Solution 4
From the GNU bash manual (see also, for the details, section 3.7 Executing Commands):
3.1.1 Shell Operation
The following is a brief description of the shell’s operation when it reads and executes a command. Basically, the shell does the following:
- Reads its input from a file (see Shell Scripts), from a string supplied as an argument to the -c invocation option (see Invoking Bash), or from the user’s terminal.
- Breaks the input into words and operators, obeying the quoting rules described in Quoting. These tokens are separated by metacharacters. Alias expansion is performed by this step (see Aliases).
- Parses the tokens into simple and compound commands (see Shell Commands).
- Performs the various shell expansions (see Shell Expansions), breaking the expanded tokens into lists of filenames (see Filename Expansion) and commands and arguments.
- Performs any necessary redirections (see Redirections) and removes the redirection operators and their operands from the argument list.
- Executes the command (see Executing Commands).
- Optionally waits for the command to complete and collects its exit status (see Exit Status).
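The ordering of the last steps (redirections first, execution afterwards) can be observed directly. This is a small sketch in a scratch directory; the file names are made up for the demonstration:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# cat fails because missing.txt does not exist, yet out.txt is created:
# the redirection was set up before cat was executed.
cat missing.txt > out.txt 2>/dev/null
echo "cat exit status: $?"
ls out.txt

# An existing file is truncated the same way, even if the command writes nothing.
echo "some data" > data.txt
true > data.txt
wc -c < data.txt
```

In both cases the effect on the target file comes from the redirection alone, not from anything the command does.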
Consider the situation where your file does not exist. Your second example will still create the file most of the time. If the redirection had not already taken place and created it, cat would complain that there is no such file; most of the time it does not. To reproduce the shuffled result you got with that second command, I needed a great many tries. So the second command should indeed leave an empty file most of the time.
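To see how often each outcome occurs on a given machine, the pipeline can be repeated in a loop. This is only a sketch: the run count of 200 is arbitrary, the counts vary between systems and between runs, and GNU shuf is assumed:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

runs=200
emptied=0
survived=0
i=0
while [ "$i" -lt "$runs" ]; do
    printf '%s\n' one two three four five > example.txt
    # cat and "shuf > example.txt" are started concurrently;
    # the result depends on whether cat reads before the truncation happens.
    cat example.txt | shuf > example.txt
    if [ -s example.txt ]; then
        survived=$((survived + 1))
    else
        emptied=$((emptied + 1))
    fi
    i=$((i + 1))
done
echo "emptied: $emptied, survived: $survived (out of $runs)"
```

Whatever the split turns out to be, the fact that it is a split at all is the reason not to rely on this pipeline.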
toryan
Updated on September 18, 2022
Comments
-
toryan over 1 year
I know this is sort of a duplicate of another question (Why this sort command gives me an empty file?) but I wanted to expand on the question in response to the answers given.
The command
shuf example.txt > example.txt
returns a blank file, because the shell truncates the file before shuffling it, leaving only a blank file to shuffle. However,
cat example.txt | shuf > example.txt
will produce a shuffled file as expected.
Why does the pipeline method work when the simple redirection doesn't? If the file is truncated before the commands are run, shouldn't the second method also leave an empty file?
-
toryan over 10 years
@Gilles The question is more specifically about why the command works in a different format, and this point has not been answered yet.
-
clerksx over 10 years
This, of course, assumes that 1 second is long enough. It would be better to instead use sponge or a temporary file.
-
jlliagre over 10 years
@ChrisDown sponge or a temporary file would definitely be more secure, especially with a remote or non-inode-based file system, but in usual cases 1 second should be long enough for shuf to open the file unless this is a very loaded system.
-
frostschutz over 10 years
@StephaneChazelas, opening a file for writing with truncation is usually how you start writing to a file. It's quite hard to do without opening it. ;)
-
Stéphane Chazelas over 10 years
Your wording is confusing. > file doesn't write anything and the real problem is that it truncates. If it only opened for writing (as 1<> file does), there wouldn't be any problem (at least not with shuf).
-
frostschutz over 10 years
If > file didn't write anything, it would work on a read-only filesystem. It only doesn't write anything when you take it literally (as in the syscall write()). You're riding on technicalities there that IMHO are way more confusing than my simplified explanation.
-
Ole Tange almost 7 years
Try: (rm foo; shuf > foo) < foo
-
Ole Tange almost 7 years
Don't have sponge? Use (rm foo; shuf > foo) < foo
-
Ole Tange almost 7 years
shuf 1<> file < file is deceptively neat. It works because shuf outputs the same amount of data, so all of file is overwritten. It would not work with, say, wc 1<> file < file. This, however, would: (rm file; wc > file) < file
-
Ole Tange almost 7 years
Fails if example.txt does not end in newline.
-
Stéphane Chazelas almost 7 years
I'd do (rm foo&&shuf>foo)<foo, as removing foo could fail (it is the permissions of the current directory that matter) while > foo could succeed (as it is foo's permissions that matter), resulting in foo being truncated and the data lost.
-
Ole Tange almost 7 years
@StéphaneChazelas Good point
-
Ole Tange almost 7 years
And it is dead slow. 175MB: time shuf ... = 6.2s, time ex ... > 20m.