Using /dev/stdin and a heredoc to pass a file from the command line
Solution 1
What precisely is going on when one "passes a heredoc as a file"?
You aren't. Here-documents provide standard input, like a pipe. Your example
awk '{ ... }' <<EOF
foo bar baz
EOF
is, as far as awk can tell, equivalent to
echo foo bar baz | awk '{ ... }'
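You can check this yourself. The sketch below (POSIX sh; the variable names are only illustrative) captures what awk prints when the same line arrives first through a here-document and then through a pipe.

```shell
# Here-document: the shell feeds "foo bar baz" to awk's standard input.
from_heredoc=$(awk '{ print $2 }' <<EOF
foo bar baz
EOF
)

# Pipe: echo writes the same line into awk's standard input.
from_pipe=$(echo foo bar baz | awk '{ print $2 }')

echo "heredoc: $from_heredoc"
echo "pipe:    $from_pipe"
```

Both variables should hold bar. awk has no way to tell which mechanism supplied its input.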
awk, cat, and ruby all read from standard input if they aren't given a filename to read from on the command line. That is an implementation choice.
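The sketch below (POSIX sh; it assumes mktemp is available, as it is on Linux and macOS) shows awk's fallback rule. With a filename operand, awk reads the file and ignores whatever is piped in. Without one, it reads standard input.

```shell
tmp=$(mktemp)
printf 'from a file\n' > "$tmp"

# With a filename operand, awk opens that file itself;
# the data arriving on standard input is never read.
with_file=$(printf 'ignored stdin\n' | awk '{ print "awk read:", $0 }' "$tmp")

# With no filename operand, awk falls back to standard input.
without_file=$(printf 'from stdin\n' | awk '{ print "awk read:", $0 }')

rm -f "$tmp"
echo "$with_file"
echo "$without_file"
```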
Why does the first version with ansible-playbook fail but the second version succeed?
ansible-playbook does not read from standard input by default, but requires a file path instead. This is a design choice.
/dev/stdin is quite likely a symlink to /dev/fd/0, which is a way of talking about the current process's file descriptor #0 (standard input). That's something exposed by your kernel (or system library). The ansible-playbook command opens /dev/stdin like a regular filesystem file and ends up reading its own standard input, which would otherwise have been ignored.
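To see this on your own machine, the sketch below first lists /dev/stdin. The ls output differs between systems; on Linux it is typically a symlink to /proc/self/fd/0. It then hands /dev/stdin to cat as an ordinary filename while a here-document supplies standard input.

```shell
# Show what /dev/stdin is on this system (output varies by OS).
ls -l /dev/stdin

# cat is given a filename, but the "file" it opens is its own
# standard input, which the here-document supplies.
via_dev_stdin=$(cat /dev/stdin <<EOF
read through /dev/stdin
EOF
)
echo "$via_dev_stdin"
```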
You likely also have /dev/stdout and /dev/stderr links to FDs 1 & 2, which you can use as well if you're telling something where to put its output.
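As an illustration of /dev/stderr, the sketch below uses tee because it takes output file names as arguments. Discarding tee's standard output leaves only the copy written to the "file" /dev/stderr, which the surrounding 2>&1 captures. The variable name is illustrative.

```shell
# tee copies its standard input to each named file and to stdout.
# Stdout is thrown away here, so the only surviving copy is the one
# tee writes to /dev/stderr, i.e. file descriptor 2, captured by 2>&1.
via_stderr=$( { echo "hello" | tee /dev/stderr >/dev/null; } 2>&1 )
echo "arrived on FD 2: $via_stderr"
```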
What is the significance of passing /dev/stdin before the heredoc?
It is an argument to the ansible-playbook command: the path of the playbook file it should read.
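The same failure and fix can be reproduced without Ansible. The sketch below uses needs_file, a hypothetical shell function that stands in for a tool that only reads a path given as an argument. It is not ansible-playbook's real code, and its exit code 5 is chosen merely to mirror the question's output.

```shell
# Hypothetical stand-in for a tool that insists on a file path
# and never looks at standard input on its own.
needs_file() {
  if [ "$#" -eq 0 ]; then
    echo "error: no playbook path given" >&2
    return 5
  fi
  # Open the named path like any regular file.
  while IFS= read -r line; do
    printf 'read: %s\n' "$line"
  done < "$1"
}

# 1) Heredoc only: it lands on standard input, which needs_file ignores.
needs_file <<EOF 2>/dev/null
hello world
EOF
status_without=$?

# 2) /dev/stdin as the path: the "file" it opens is the heredoc itself.
output_with=$(needs_file /dev/stdin <<EOF
hello world
EOF
)
status_with=$?

echo "without /dev/stdin: exit $status_without"
echo "with /dev/stdin:    exit $status_with, output: $output_with"
```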
Why do other utilities like ruby or awk not need the /dev/stdin before the heredoc?
They read from standard input by default as a design choice, because they are made to be used in pipelines. They write to standard output for the same reason.
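A tool written in that style might look like the sketch below. upper is a hypothetical shell function, not an existing utility. It reads the files it is given, or standard input when none are given, so it works equally well with a here-document or in a pipeline.

```shell
# Hypothetical filter in the awk/cat/ruby style: read the named files,
# or fall back to standard input when no files are named.
upper() {
  if [ "$#" -eq 0 ]; then
    tr '[:lower:]' '[:upper:]'
  else
    cat -- "$@" | tr '[:lower:]' '[:upper:]'
  fi
}

from_heredoc=$(upper <<EOF
hello heredoc
EOF
)
from_pipe=$(echo "hello pipe" | upper)

echo "$from_heredoc"
echo "$from_pipe"
```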
Solution 2
A here-document is a redirection into the standard input of a command, just like <. This means that anywhere you may use < to redirect the contents of a file, you may instead redirect the contents of a here-document. The POSIX standard lists here-documents along with the other redirection operators.
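A minimal sketch (POSIX sh; mktemp assumed available) that feeds the same two lines to sort, first with < from a temporary file and then with a here-document:

```shell
tmp=$(mktemp)
printf 'pear\napple\n' > "$tmp"

# "<" redirects a file's contents into sort's standard input...
from_file=$(sort < "$tmp")

# ...and "<<" redirects a here-document into the same place.
from_heredoc=$(sort <<EOF
pear
apple
EOF
)

rm -f "$tmp"
[ "$from_file" = "$from_heredoc" ] && echo "identical input, identical output"
```

In both cases sort only ever reads its standard input, so the results match.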
In your Ansible example, ansible-playbook does not by default read from its standard input stream, as it expects a filename. By giving it /dev/stdin as the filename and then supplying the here-document on standard input, you bypass this restriction in the utility. The /dev/stdin "file" will always contain the standard input data stream of the current process.
ruby and awk, as well as many other utilities, will read from standard input unless a filename is supplied on the command line.
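One way to observe the difference is wc, which prints the name of any file operand next to its count. In the sketch below, the first call gets only a stream on standard input. The second gets /dev/stdin as a named operand. The exact spacing of wc's output differs between implementations.

```shell
# No operand: wc only sees an anonymous stream, so it prints no name.
anonymous=$(wc -l <<EOF
one
two
EOF
)

# /dev/stdin operand: wc opens it like any named file and
# reports that name next to the count.
named=$(wc -l /dev/stdin <<EOF
one
two
EOF
)

echo "no operand:         $anonymous"
echo "/dev/stdin operand: $named"
```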
So you are technically wrong when you say "It seems like the shell thinks the heredoc is a file with contents equal to the value of the heredoc". At least from the point of view of the utility, it does not act like a file (with regard to having a filename and being seekable), but as a data stream on standard input.
The difference is the same as between
cat file
and
cat <file
In the first instance, cat opens the file named file. In the second (which is also what happens with a here-document), no filename is given as an argument to cat, so cat just reads its standard input stream, and the shell opens the file (or provides the here-document) on the utility's standard input. The utility does not need to know whether the provided data comes from a file, a pipe, a here-document or some other data source.
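The sketch below makes the "who opens the file" difference visible through error messages. It assumes /no/such/file does not exist. The exact wording of both messages depends on your cat and your shell.

```shell
missing=/no/such/file   # assumed not to exist

# cat opens the path itself, so cat reports the failure.
err_operand=$(cat "$missing" 2>&1)

# The shell performs the "<" redirection before cat starts; when it
# fails, the shell reports the error and cat is never run.
err_redirect=$( { cat < "$missing"; } 2>&1 )

echo "operand:  $err_operand"
echo "redirect: $err_redirect"
```

Only the first message comes from cat. In the second case the redirection fails before cat is ever started.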
How here-documents are implemented by the shell is in a way unimportant, but it may be through the use of a FIFO or indeed with a temporary file.
Solution 3
What exactly happens with here-docs depends on how the shell implements them: it may use a pipe internally, as dash does, or a temporary file, as bash traditionally does (bash 5.1 and later use a pipe when the document fits in the pipe buffer). In the first case it is not possible to lseek() on the here-doc, but in the second it is, which for the average user means a program can jump around within the contents of the here-doc. See related answer.
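To check which mechanism your shell uses, the sketch below defines stdin_kind, a hypothetical function that tests whether its standard input is a pipe ([ -p ]) or a regular file ([ -f ]). The here-document result is expected to vary: dash typically reports a pipe, older bash a regular file, and bash 5.1 or later a pipe for small documents.

```shell
# Report what kind of object this function's standard input is.
stdin_kind() {
  if [ -p /dev/stdin ]; then
    echo "pipe"
  elif [ -f /dev/stdin ]; then
    echo "regular file (seekable)"
  else
    echo "something else"
  fi
}

heredoc_kind=$(stdin_kind <<EOF
hello
EOF
)
pipe_kind=$(echo hello | stdin_kind)

echo "here-document: $heredoc_kind"   # depends on the shell and version
echo "pipe:          $pipe_kind"
```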
As for the two ansible-playbook commands, the behaviour also depends on how the command is implemented (so unless you read the source code, you won't know for sure). Some commands simply check whether a file was provided, and don't support stdin. Other commands, like awk and ruby, are designed to accept either stdin or a file specified on the command line.
What you can try, however, if you're using Linux, is to run strace ansible-playbook ...<other args> and see which files it tries to open, which syscalls occur, etc. For example, with strace -e trace=open,openat tail /dev/stdin <<< "Jello World" you'll see that the tail command actually tries to open /dev/stdin as a file, whereas with strace -e trace=open,openat tail <<< "Jello World" it doesn't. (Recent C libraries implement open() with the openat syscall, so trace both.)
mbigras
Updated on September 18, 2022
I'm curious about the theory behind how heredocs can be passed as a file to a command line utility.
Recently, I discovered I can pass a file as a heredoc.
For example:
awk '{ split($0, arr, " "); print arr[2] }' <<EOF
foo bar baz
EOF
bar
This is advantageous for me for several reasons:
- Heredocs improve readability for multi line inputs.
- I don't need to memorize each utility's flag for passing the file contents from the command line.
- I can use single and double quotes in the given files.
- I can control shell expansion.
For example:
ruby <<EOF
puts "'hello $HOME'"
EOF
'hello /Users/mbigras'
ruby <<'EOF'
puts "'hello $HOME'"
EOF
'hello $HOME'
I'm not clear what is happening. It seems like the shell thinks the heredoc is a file with contents equal to the value of the heredoc. I've seen this technique used with cat, but I'm still not sure what was going on:
cat <<EOL
hello world
EOL
hello world
I know cat prints the contents of a file, so presumably this heredoc is a temporary file of some kind. I'm confused about what precisely is going on when I "pass a heredoc to a command line program".
Here's an example using ansible-playbook. I pass the utility a playbook as a heredoc; however, it fails, as shown using echo $?:
ansible-playbook -i localhost, -c local <<EOF &>/dev/null
---
- hosts: all
  gather_facts: false
  tasks:
    - name: Print something
      debug:
        msg: hello world
EOF
echo $?
5
However, if I pass the utility the same heredoc but precede it with /dev/stdin, it succeeds:
ansible-playbook -i localhost, -c local /dev/stdin <<EOF &>/dev/null
---
- hosts: all
  gather_facts: false
  tasks:
    - name: Print something
      debug:
        msg: hello world
EOF
echo $?
0
- What precisely is going on when one "passes a heredoc as a file"?
- Why does the first version with ansible-playbook fail but the second version succeed?
- What is the significance of passing /dev/stdin before the heredoc?
- Why do other utilities like ruby or awk not need the /dev/stdin before the heredoc?