While loop for bash scripting to read either stdin or arguments
The change is to this chunk:
while read name; do
efetch -db nucleotide -id $name -format gpc > $name.xml;
done < "$@"
which makes the efetch in the loop run with its standard input redirected to the file given by the arguments. That makes two changes to the way efetch is used:
- its standard input is no longer the default (the terminal)
- its list of parameters no longer comes literally from the script's command-line parameters, but indirectly, from the lines of a file.
If efetch detects that its input is not a terminal, it could very well reopen the terminal directly (perhaps that is what you are referring to as "efetch accepts stdin instead of an id"). Alternatively, if efetch is reading its stdin, it could read something unexpected (in a quick test, that seems to be the script itself).
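The second possibility can be illustrated with a minimal sketch. Whether efetch really reads its standard input is an assumption here; `greedy` is a hypothetical stand-in for any command that does, and the temporary file of ids is made up for the demonstration.

```bash
#!/bin/bash
# greedy: hypothetical stand-in for a command that reads its standard input.
greedy() { cat > /dev/null; }

ids=$(mktemp)
printf '%s\n' id1 id2 id3 > "$ids"

# The inner command shares the loop's stdin and drains the rest of the file,
# so `read` only ever sees the first line.
shared=0
while read -r name; do
    greedy
    shared=$((shared + 1))
done < "$ids"

# Giving the inner command its own stdin leaves the file to `read`.
isolated=0
while read -r name; do
    greedy < /dev/null
    isolated=$((isolated + 1))
done < "$ids"

echo "shared=$shared isolated=$isolated"   # prints: shared=1 isolated=3
rm -f "$ids"
```

If a command inside such a loop turns out to consume the list of names, redirecting its input from /dev/null is one common way to keep the two apart.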
@chepner pointed out that the shell (bash in this case) does not spawn a subprocess for the loop. I had in mind a different case which does. Consider these two scripts:
#!/bin/bash
LAST=...
while read name
do
/bin/echo "** $name"
LAST="$name"
done < "$@"
echo "...$LAST"
and
#!/bin/bash
LAST=...
cat "$@" | while read name
do
/bin/echo "** $name"
LAST="$name"
done
echo "...$LAST"
The latter (pipe) will echo "......" at the end, while the former (redirection) echoes the last value assigned to LAST within the loop. The pipe form is commonly explained as running the loop in a subprocess, which is why variable assignments made inside the loop are not propagated out of it.
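The scoping effect behind that explanation can be reproduced without a loop or a pipe. This is only a sketch of the mechanism: an explicit subshell stands in for the process that bash and dash use for the piped loop, and the names `after_subshell` and `after_group` are made up for the example.

```bash
#!/bin/bash
LAST=...

# Subshell: a child process gets a copy of the variables,
# so the assignment is lost when the child exits.
( LAST=inside )
after_subshell=$LAST

# Brace group: runs in the current shell, so the assignment sticks.
{ LAST=inside; }
after_group=$LAST

echo "subshell: $after_subshell"   # prints: subshell: ...
echo "group:    $after_group"      # prints: group:    inside
```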
Interestingly enough, shells differ in the number of processes they use for the latter (pipe) form. Testing with bash, dash (/bin/sh), zsh and ksh93 on Debian/testing, using strace -fo to capture system calls and process ids:
#!/bin/sh
for sh in bash dash zsh ksh93
do
echo "++ $sh"
strace -fo $sh.log ./do-$sh ./once
LC=$(sed -e 's/ .*//' $sh.log |sort -u |wc -l)
WC=$(wc -l $sh.log)
echo "-- $LC / $WC"
done
The script shows the number of processes and the number of system calls for each shell. (The file once contains two lines, "first" and "second", to eliminate one testing boundary.)
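The counting step relies on strace -f -o prefixing every logged line with the id of the process that made the call, so the number of distinct first fields is the number of processes. A small sketch of that pipeline, run on made-up placeholder lines rather than a real trace (only the leading pid column matters):

```bash
#!/bin/bash
log=$(mktemp)

# Placeholder log: three lines from pid 101, two from pid 102.
cat > "$log" <<'EOF'
101 call_a(...) = 0
101 call_b(...) = 102
102 call_c(...) = 6
101 call_d(...) = 102
102 call_e(...) = ?
EOF

# Keep only the first field (the pid), then count the distinct values.
pids=$(sed -e 's/ .*//' "$log" | sort -u | wc -l)
# Total number of logged lines.
lines=$(wc -l < "$log")

echo "processes=$((pids)) lines=$((lines))"   # prints: processes=2 lines=5
rm -f "$log"
```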
I see that zsh and ksh93 use one process fewer than bash and dash:
$ ./testit
++ bash
** first
** second
......
-- 5 / 401 bash.log
++ dash
** first
** second
......
-- 5 / 222 dash.log
++ zsh
** first
** second
...second
-- 4 / 568 zsh.log
++ ksh93
** first
** second
...second
-- 4 / 336 ksh93.log
Running the pipe takes 1 or 2 more processes than using a redirection for this example.
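Putting these points together, one way to read names either from file arguments or from standard input without a pipe is to redirect each file into a function in turn. This is a sketch under assumptions, not the asker's script: `fetch_one` is a hypothetical stand-in for the efetch call, arguments are treated as files of names (as in the chunk with `done < "$@"`), and the sample file is created only for the demonstration.

```bash
#!/bin/bash
# fetch_one: hypothetical stand-in for the real efetch command line.
fetch_one() {
    echo "would fetch: $1"
}

# Read names from standard input, one per line.
read_names() {
    while read -r name; do
        [ -n "$name" ] || continue       # skip blank lines
        fetch_one "$name" < /dev/null    # keep the command away from the name list
        LAST=$name
    done
}

# With file arguments, redirect each file in turn (one file per redirection);
# without arguments, read whatever standard input was inherited.
fetch_all() {
    if [ "$#" -gt 0 ]; then
        for file in "$@"; do
            read_names < "$file"
        done
    else
        read_names
    fi
}

# Demonstration with a temporary file of names.
LAST=...
sample=$(mktemp)
printf '%s\n' first second > "$sample"
fetch_all "$sample"      # names taken from a file argument
echo "...$LAST"          # prints: ...second
fetch_all < "$sample"    # names taken from standard input
rm -f "$sample"
```

In a real script the demonstration lines would be replaced by a single call to `fetch_all "$@"`. Because the loop is reached through redirections rather than a pipe, the assignment to LAST is still visible after the loop.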
ahelix
Updated on September 18, 2022
Comments
-
ahelix almost 2 years
I'm playing around with the accepted answer from this thread: Bash script that reads filenames from a pipe or from command line args?
When I use the below script, efetch (ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.zip) accepts an id (argument, for example, 941241313) but not stdin.
if [ $# -gt 0 ] ;then
for name in "$@"; do
efetch -db nucleotide -id $name -format gpc > $name.xml;
done
else
IFS=$'\n' read -d '' -r -a filenames
while read name; do
efetch -db nucleotide -id $name -format gpc > $name.xml;
done < "${filenames[@]}"
fi
When I modify it to the below version, efetch accepts stdin instead of an id
if [ $# -gt 0 ] ;then
while read name; do
efetch -db nucleotide -id $name -format gpc > $name.xml;
done < "$@"
else
IFS=$'\n' read -d '' -r -a filenames
while read name; do
efetch -db nucleotide -id $name -format gpc > $name.xml;
done < "${filenames[@]}"
fi
What's wrong?
-
chepner over 8 years
You can't read from multiple files by feeding an array expansion to the < operator; that's just a syntax error.
-
chepner over 8 years
There's no (additional) subprocess involved with redirection, but the point about efetch detecting if its standard input is a terminal or not stands.
-
Marius over 8 years
I see (will amend).