Using /dev/stdin and a heredoc to pass a file from the command line


Solution 1

What precisely is going on when one "passes a heredoc as a file"?

You aren't. Here-documents provide standard input, like a pipe. Your example

awk '{ ... }' <<EOF
foo bar baz
EOF

is, as far as awk is concerned, equivalent to

echo foo bar baz | awk '{ ... }'

awk, cat, and ruby all read from standard input if they aren't given a filename to read from on the command line. That is an implementation choice.
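A minimal sketch of that equivalence, assuming any POSIX-style shell; the variable names are only for illustration. The same line of input reaches awk once through a here-document and once through a pipe, and awk cannot tell the difference:

```shell
# Input delivered as a here-document on awk's standard input.
from_heredoc=$(awk '{ print $2 }' <<EOF
foo bar baz
EOF
)

# The same input delivered through a pipe.
from_pipe=$(echo foo bar baz | awk '{ print $2 }')

echo "heredoc: $from_heredoc"
echo "pipe:    $from_pipe"
```

Both lines should report bar, since awk reads its standard input in either case.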

Why does the first version with ansible-playbook fail but the second version succeed?

ansible-playbook does not read from standard input by default, but requires a file path instead. This is a design choice.

/dev/stdin is quite likely a symlink to /dev/fd/0, which is a way of talking about the current process's file descriptor #0 (standard input). That's something exposed by your kernel (or system library). The ansible-playbook command opens /dev/stdin like a regular filesystem file and ends up reading its own standard input, which would otherwise have been ignored.

You likely also have /dev/stdout and /dev/stderr links to FDs 1 & 2, which you can use as well if you're telling something where to put its output.
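As a rough illustration, assuming a system that provides /dev/stdin (Linux, macOS and the BSDs do), wc can be handed /dev/stdin as an ordinary file name. The ls output differs between systems, so none is shown here:

```shell
# Show what /dev/stdin is on this system (a symlink on Linux, a device node elsewhere).
ls -l /dev/stdin

# wc is given a file name, so it opens that path like any other file
# and ends up reading its own standard input, i.e. the here-document.
count=$(wc -l /dev/stdin <<EOF
one
two
EOF
)
echo "$count"
```

Because wc was given a name, it prints that name next to the count, which shows it treated /dev/stdin as a file argument rather than falling back to standard input.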

What is the significance of passing /dev/stdin before the heredoc?

It is an argument to the ansible-playbook command.

Why do other utilities like ruby or awk not need the /dev/stdin before the heredoc?

They read from standard input by default as a design choice, because they are made to be used in pipelines. They write to standard output for the same reason.

Solution 2

A here-document is a redirection into the standard input of a command, just like <. This means that anywhere where you may use < to redirect contents from a file, you may instead redirect the contents of a here-document. The POSIX standard lists here-documents along with the other redirection operators.
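Since a here-document is just another redirection, it can be attached to anything that accepts <, including a compound command. A small sketch (the variable names are arbitrary):

```shell
# The here-document is redirected into the whole while loop,
# exactly where "done <numbers.txt" could have been written.
total=0
while read -r n; do
  total=$((total + n))
done <<EOF
1
2
3
EOF
echo "$total"   # → 6
```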

In your Ansible example, ansible-playbook does not by default read from its standard input stream as it expects a filename. By giving it /dev/stdin as the filename and then supplying the here-document on standard input, you bypass this restriction in the utility. The /dev/stdin "file" will always contain the standard input data stream of the current process.
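The same behaviour can be reproduced without Ansible. The function below, needs_file, is a hypothetical stand-in (not part of any real tool) for a utility that insists on a file-name argument and never reads standard input on its own:

```shell
# Hypothetical utility: requires a file name, ignores standard input otherwise.
needs_file() {
  if [ "$#" -eq 0 ]; then
    echo "usage: needs_file FILE" >&2
    return 2
  fi
  cat -- "$1"
}

# No argument: the here-document sits on standard input, but nothing reads it.
status_without=0
needs_file <<EOF 2>/dev/null || status_without=$?
hello
EOF
echo "without /dev/stdin: exit status $status_without"

# /dev/stdin as the argument: the function opens it like a file
# and so receives the here-document.
output_with=$(needs_file /dev/stdin <<EOF
hello
EOF
)
echo "with /dev/stdin: $output_with"
```

The first call fails with the function's own usage error, while the second prints hello. Presumably ansible-playbook behaves analogously, though its actual exit codes and messages are its own.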

ruby and awk as well as many other utilities will read from standard input unless a filename is supplied on the command line.

So it is not quite right to say "It seems like the shell thinks the heredoc is a file with contents equal to the value of the heredoc". From the point of view of the utility, at least, the here-document does not act like a file (it has no filename and is not guaranteed to be seekable); it is a data stream on standard input.

The difference is the same as between

cat file

and

cat <file

In the first instance, cat opens the file file, but in the second (which is also what happens with a here-document), since no filename was given as an argument to cat, cat just reads its standard input stream (and the shell opens the file, or provides the here-document, on standard input to the utility). The utility does not need to know if the provided data comes from a file, a pipe, a here-document or some other data source.
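One way to see the difference from the utility's side is wc, which prints the file name only when it opened the file itself. A short sketch, using a temporary file created with mktemp just for the demonstration:

```shell
# Create a two-line scratch file.
tmp=$(mktemp)
printf 'one\ntwo\n' >"$tmp"

with_name=$(wc -l "$tmp")      # wc opens the file itself and knows its name
without_name=$(wc -l <"$tmp")  # the shell opens it; wc only sees a stream

echo "$with_name"
echo "$without_name"
rm -f "$tmp"
```

The first result includes the path of the temporary file; the second is only the count, because wc was never told a name.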

How here-documents are implemented by the shell is in a way unimportant, but it may be through the use of a FIFO or indeed with a temporary file.

Solution 3

What exactly happens depends on how the shell implements here-documents: it may use a pipe internally, as dash does, or a temporary file, as bash does (newer bash releases may also use a pipe for small here-documents). In the first case lseek() on the input is not possible, but in the second it is, which for the average user means a program can jump around within the contents of the here-doc. See related answer.
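To check which implementation a given shell uses, one can test the type of standard input while a here-document is attached. This is only a sketch, and the answer depends on the shell and its version, so no expected output is given:

```shell
# Ask what kind of object the here-document is delivered through.
kind=$(
  if [ -p /dev/stdin ]; then
    echo "pipe"
  elif [ -f /dev/stdin ]; then
    echo "regular file"
  else
    echo "something else"
  fi <<EOF
some text
EOF
)
echo "here-document arrived as: $kind"
```

A result of "regular file" suggests a temporary file that a program could seek in; "pipe" means the data can only be read once, front to back.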

As for the two ansible-playbook commands, the result also depends on how the command is implemented (so unless you read the source code, you won't know for certain). Some commands only check whether a file was provided and don't support stdin. Other commands, like awk and ruby, are designed to read either stdin or a file specified on the command line.

What you can try, if you're using Linux, is to run strace ansible-playbook ...<other args> and see which files it tries to open, which syscalls occur, and so on. For example, with strace -e open tail /dev/stdin <<< "Jello World" you'll see that the tail command actually tries to open /dev/stdin as a file, whereas with strace -e open tail it doesn't. (On newer systems the call may show up as openat rather than open.)

Author: mbigras

Updated on September 18, 2022

Question

  • mbigras

    I'm curious about the theory behind how heredocs can be passed as a file to a command line utility.

    Recently, I discovered I can pass a file as a heredoc.

    For example:

    awk '{ split($0, arr, " "); print arr[2] }' <<EOF
    foo bar baz
    EOF
    bar
    

    This is advantageous for me for several reasons:

    • Heredocs improve readability for multi line inputs.
    • I don't need to memorize each utility's flag for passing the file contents from the command line.
    • I can use single and double quotes in the given files.
    • I can control shell expansion.

    For example:

    ruby <<EOF
    puts "'hello $HOME'"
    EOF
    'hello /Users/mbigras'
    
    ruby <<'EOF'
    puts "'hello $HOME'"
    EOF
    'hello $HOME'
    

    I'm not clear what is happening. It seems like the shell thinks the heredoc is a file with contents equal to the value of the heredoc. I've seen this technique used with cat, but I'm still not sure what was going on:

    cat <<EOL
    hello world
    EOL
    hello world
    

    I know cat prints the contents of a file, so presumably this heredoc is a temporary file of some kind.

    I'm confused about what precisely is going on when I "pass a heredoc to a command line program".

    Here's an example using ansible-playbook. I pass the utility a playbook as a heredoc; however it fails, as shown using echo $?:

    ansible-playbook -i localhost, -c local <<EOF &>/dev/null
    ---
    - hosts: all
      gather_facts: false
      tasks:
        - name: Print something
          debug:
            msg: hello world
    EOF
    echo $?
    5
    

    However, if I pass the utility the same heredoc but precede it with /dev/stdin, it succeeds:

    ansible-playbook -i localhost, -c local /dev/stdin <<EOF &>/dev/null
    ---
    - hosts: all
      gather_facts: false
      tasks:
        - name: Print something
          debug:
            msg: hello world
    EOF
    echo $?
    0
    
    • What precisely is going on when one "passes a heredoc as a file"?
    • Why does the first version with ansible-playbook fail but second version succeed?
    • What is the significance of passing /dev/stdin before the heredoc?
    • Why do other utilities like ruby or awk not need the /dev/stdin before the heredoc?