Why does the command shuf file > file leave an empty file, but similar commands do not?


Solution 1

The problem is that > example.txt opens that file for writing, truncating it, before shuf example.txt starts reading it. So example.txt is already empty when shuf reads it, and since shuf produces no output for empty input, the final result stays empty.

Your other command may suffer from the same issue. > example.txt may truncate the file before cat example.txt starts reading it; it depends on the order in which the shell executes those things, and on how long it takes cat to actually open the file.

To avoid such issues entirely, you could use shuf example.txt > example.txt.shuf && mv example.txt.shuf example.txt.

Or you could go with shuf example.txt --output=example.txt instead.
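Both suggestions can be sketched as follows. This is only an illustration under a few assumptions: it runs in a scratch directory, the sample contents are made up, and the temporary file name comes from mktemp rather than the fixed example.txt.shuf used above.

```shell
# Work in a scratch directory so that no real file is touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' one two three four five > example.txt

# Variant 1: temporary file, then rename.
# The temp file is created next to the target so that mv is a rename
# within one file system rather than a copy.
tmp=$(mktemp ./example.txt.XXXXXX) || exit 1
if shuf example.txt > "$tmp"; then
    mv -- "$tmp" example.txt      # replace the original only on success
else
    rm -f -- "$tmp"               # shuf failed: keep the original untouched
fi

# Variant 2: let shuf open the output file itself instead of the shell.
shuf -o example.txt example.txt

# All five lines are still there, in some order.
sort example.txt
```

Note that with variant 1 the result is a new file (created by mktemp with restrictive permissions), so ownership, mode, and hard links of the original are not preserved.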

Solution 2

The package moreutils has a command sponge:

   sponge reads standard input and writes it out to the specified file.
   Unlike a shell redirect, sponge soaks up all its input before opening
   the output file. This allows constructing pipelines that read from and
   write to the same file.

That way you can do:

shuf example.txt | sponge example.txt
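If sponge is not available, the idea of soaking up all input before touching the output file can be approximated with a small shell function. This is a rough sketch, not sponge itself: soak is a hypothetical name, and the sample file is made up.

```shell
# soak FILE: hypothetical stand-in for sponge.
# Buffers all of standard input in a temporary file and writes it to FILE
# only after end of input has been reached.
soak() {
    soak_tmp=$(mktemp) || return 1
    # cat returns only when the upstream command has closed the pipe,
    # that is, after it has finished reading its own input.
    if cat > "$soak_tmp"; then
        cat -- "$soak_tmp" > "$1"   # the target is truncated only now
        soak_status=$?
    else
        soak_status=1
    fi
    rm -f -- "$soak_tmp"
    return "$soak_status"
}

# Usage in a scratch directory:
cd "$(mktemp -d)" || exit 1
seq 1 10 > example.txt
shuf example.txt | soak example.txt
sort -n example.txt
```

Because the target is rewritten in place rather than replaced, it keeps its inode and permissions.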

(Unfortunately, the moreutils package also ships a utility named parallel that is far less useful than GNU parallel. I removed the parallel installed by moreutils.)

Solution 3

You are just quite lucky that running

cat example.txt | shuf > example.txt

doesn't empty example.txt the way this command does:

shuf example.txt > example.txt

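Redirections are performed by the shell before the commands are executed, and pipeline components are executed concurrently. In the pipeline, cat and the truncating redirection of shuf therefore race each other: the file survives only if cat manages to open and read it before the shell truncates it.

Using the -o / --output option would be the best solution with shuf, but if you like taking (very slight) risks, here is a non-traditional way to keep the processed file from being truncated before it is read: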
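A small experiment makes both cases visible. The file names are made up, and the one-second sleep is an artificial delay added only to make the reader lose the race reliably:

```shell
cd "$(mktemp -d)" || exit 1

# Simple redirection: the shell truncates the file before shuf starts,
# so shuf always sees empty input.
seq 1 5 > simple.txt
shuf simple.txt > simple.txt
[ -s simple.txt ] || echo "simple.txt is empty"

# Pipeline: both sides start concurrently. Delaying the reader lets the
# truncation on the right-hand side happen first.
seq 1 5 > piped.txt
(sleep 1; cat piped.txt) | shuf > piped.txt
[ -s piped.txt ] || echo "piped.txt is empty"
```

Without the sleep, the second case can go either way from one run to the next.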

shuf example.txt | (sleep 1;rm example.txt;cat > example.txt)

and this simpler and faster one, thanks to Ole's suggestion:

(rm example.txt; shuf > example.txt) < example.txt
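Here is a commented sketch of why this works, using && instead of ; as suggested in the comments so that nothing is truncated if rm fails. The sample data is made up:

```shell
cd "$(mktemp -d)" || exit 1
seq 1 10 > example.txt

# 1. '< example.txt' is applied to the whole group first, so the original
#    file is open for reading before anything inside the parentheses runs.
# 2. rm removes only the name; the open descriptor keeps the data readable.
# 3. '> example.txt' then creates a brand-new file, so there is nothing
#    left to truncate.
# 4. '&&' skips shuf, and with it the truncating redirection, if rm fails.
(rm example.txt && shuf > example.txt) < example.txt

sort -n example.txt
```

The result is a new file, so hard links to the original and its permissions are not carried over.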

Solution 4

From the GNU bash manual (see also, for the details, section 3.7 Executing Commands):

3.1.1 Shell Operation

The following is a brief description of the shell’s operation when it reads and executes a command. Basically, the shell does the following:

  1. Reads its input from a file (see Shell Scripts), from a string supplied as an argument to the -c invocation option (see Invoking Bash), or from the user’s terminal.
  2. Breaks the input into words and operators, obeying the quoting rules described in Quoting. These tokens are separated by metacharacters. Alias expansion is performed by this step (see Aliases).
  3. Parses the tokens into simple and compound commands (see Shell Commands).
  4. Performs the various shell expansions (see Shell Expansions), breaking the expanded tokens into lists of filenames (see Filename Expansion) and commands and arguments.
  5. Performs any necessary redirections (see Redirections) and removes the redirection operators and their operands from the argument list.
  6. Executes the command (see Executing Commands).
  7. Optionally waits for the command to complete and collects its exit status (see Exit Status).

Consider the case where your file does not exist. Your second example will still create it most of the time: if the redirection had not yet created the file when cat tried to open it, cat would complain that there is no such file, and most of the time it does not. To reproduce the shuffle you got with that second command, I needed a great many tries. So the second command should indeed leave an empty file most of the time.
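That redirections (step 5) happen before the command runs (step 6) can be checked with a command that is certain to fail. The file names here are made up for the illustration:

```shell
cd "$(mktemp -d)" || exit 1

# missing.txt does not exist, so cat fails. The redirection has already
# created out.txt by the time cat starts, so the file exists anyway.
cat missing.txt > out.txt 2> /dev/null
echo "cat exit status: $?"

if [ -e out.txt ] && [ ! -s out.txt ]; then
    echo "out.txt exists and is empty"
fi
```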


Author: toryan

Updated on September 18, 2022

Comments

  • toryan
    toryan over 1 year

    I know this is sort of a duplicate of another question (Why this sort command gives me an empty file?) but I wanted to expand on the question in response to the answers given.

    The command

    shuf example.txt > example.txt

    Returns a blank file, because the shell truncates the file before shuffling it, leaving only a blank file to shuffle. However,

    cat example.txt | shuf > example.txt

    will produce a shuffled file as expected.

    Why does the pipeline method work when the simple redirection doesn't? If the file is truncated before the commands are run, shouldn't the second method also leave an empty file?

    • toryan
      toryan over 10 years
      @Gilles The question is more specifically about why the command works in a different format, and this point has not been answered yet.
  • clerksx
    clerksx over 10 years
    This, of course, assumes that 1 second is long enough. It would be better to use sponge or a temporary file instead.
  • jlliagre
    jlliagre over 10 years
    @ChrisDown sponge or a temporary file would definitely be more secure, especially with a remote or non-inode-based file system, but in usual cases 1 second should be long enough for shuf to open the file, unless this is a very loaded system.
  • frostschutz
    frostschutz over 10 years
    @StephaneChazelas, opening a file for writing with truncation is usually how you start writing to a file. It's quite hard to do without opening it. ;)
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    Your wording is confusing. > file doesn't write anything and the real problem is that it truncates. If it only opened for writing (as 1<> file does), there wouldn't be any problem (at least not with shuf).
  • frostschutz
    frostschutz over 10 years
    If > file didn't write anything, it would work on a read-only filesystem. It only doesn't write anything when you take it literally (as in the syscall write()). You're riding on technicalities there that IMHO are way more confusing than my simplified explanation.
  • Ole Tange
    Ole Tange almost 7 years
    Try: (rm foo; shuf > foo) < foo
  • Ole Tange
    Ole Tange almost 7 years
    Don't have sponge? Use (rm foo; shuf > foo) < foo
  • Ole Tange
    Ole Tange almost 7 years
    shuf 1<> file < file is deceptively neat. It works because shuf outputs the same amount of data, so all of file is overwritten. It would not work with, say, wc 1<> file < file. This, however, would: (rm file; wc > file) < file
  • Ole Tange
    Ole Tange almost 7 years
    Fails if example.txt does not end in newline.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    I'd do (rm foo&&shuf>foo)<foo, as removing foo could fail (it's the permissions on the current directory that matter) while > foo could succeed (it's foo's permissions that matter), resulting in foo being truncated and the data lost.
  • Ole Tange
    Ole Tange almost 7 years
    @StéphaneChazelas Good point
  • Ole Tange
    Ole Tange almost 7 years
    And it is dead slow. 175MB: time shuf ... = 6.2s, time ex ... > 20m.