How do I capture stdin to a variable without stripping any trailing newlines?
Solution 1
Command substitution strips every trailing newline before the value is stored in the variable. To protect them, append a sentinel character after the input, with something like:
var=`cat; echo x`
and use ${var%x} instead of $var. For instance:
printf "%s" "${var%x}"
Note that this solves the trailing newlines issue, but not the null byte one (if standard input is not text), since according to POSIX command substitution:
If the output contains any null bytes, the behavior is unspecified.
But shell implementations may preserve null bytes.
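A minimal sketch of the sentinel technique described above, assuming a POSIX sh. The helper name slurp and the sample inputs are made up for illustration; in a real script the data would come from the script's own stdin rather than from printf.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, then append the sentinel "x".
slurp() { cat; echo x; }

# Input ending in two newlines: both survive the capture, because the
# only trailing newline the shell sees is the one echo adds after "x".
with_nl=$(printf 'foo\n\n' | slurp)
with_nl=${with_nl%x}

# Input with no trailing newline: none is added.
without_nl=$(printf 'foo' | slurp)
without_nl=${without_nl%x}

echo "${#with_nl}"      # 5: f, o, o and two newlines
echo "${#without_nl}"   # 3: f, o, o
```

Printing the length with ${#var} is only a convenient way to confirm that the newlines are still there; use printf '%s' "$with_nl" to emit the value unchanged.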
Solution 2
You can use the read built-in to accomplish this:
$ IFS='' read -d '' -r foo < <(echo bar)
$ echo "<$foo>"
<bar
>
For a script to read STDIN, it'd simply be:
IFS='' read -d '' -r foo
I'm not sure which shells this works in, but it works fine in both bash and zsh.
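A hedged sketch of the same read call in script form. It assumes bash: the -d option is not POSIX, so the capture is guarded and simply skipped in other shells. The here-document is only a stand-in for real stdin, and the variable name foo follows the example above.

```shell
#!/usr/bin/env bash
# read -d is not POSIX, so this sketch only runs the capture under bash.
if [ -n "${BASH_VERSION:-}" ]; then
    # The here-document stands in for stdin: "bar" plus two newlines.
    # read exits non-zero when it reaches end of input without finding
    # the NUL delimiter, hence the "|| true".
    IFS= read -r -d '' foo <<'EOF' || true
bar

EOF
    printf '<%s>\n' "$foo"
    echo "${#foo}"    # 5: b, a, r and two newlines
fi
```

The non-zero exit status matters mostly for scripts that run under set -e, where an unguarded read of this kind would abort the script even though the variable has been filled.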
Solution 3
You can do something like:
input | { var=$(sed '$s/$/./'); var=${var%.}; }
Whatever you do, $var disappears as soon as you step outside of that { current shell ; } grouping anyway. But it could also work like:
var=$(input | sed '$s/$/./'); var=${var%.}
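The scoping difference between the two forms above can be sketched as follows, assuming a POSIX sh. gen_input is a hypothetical stand-in for the input command, and the variable names a and b are made up so that both forms can run in one script.

```shell
#!/bin/sh
# gen_input is a hypothetical stand-in for whatever produces the data.
gen_input() { printf 'foo\n\n'; }

# Form A: the assignment happens in the last command of a pipeline.
# POSIX leaves it unspecified whether that command runs in a subshell,
# so $a may or may not still be set afterwards.
a=''
gen_input | { a=$(sed '$s/$/./'); a=${a%.}; }

# Form B: the pipeline lives inside the command substitution, so the
# assignment itself happens in the current shell.
b=$(gen_input | sed '$s/$/./'); b=${b%.}

echo "a: ${#a} characters"   # 0 or 4, depending on the shell
echo "b: ${#b} characters"   # 4
```

Because sed appends the dot to the last line, before that line's terminating newline, the value captured here is foo followed by one newline.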
CommaToast
Updated on September 18, 2022
-
CommaToast over 1 year
In a shell script...
How do I capture stdin to a variable without stripping any trailing newlines?
Right now I have tried:
var=`cat`
var=`tee`
var=$(tee)
In all cases $var will not have the trailing newline of the input stream. Thanks.
ALSO: If there is no trailing newline in the input, then the solution must not add one.
UPDATE IN LIGHT OF THE ACCEPTED ANSWER:
The final solution that I used in my code is as follows:
function filter() {
    # do lots of sed operations
    # see https://github.com/gistya/expandr for full code
}
GIT_INPUT=`cat; echo x`
FILTERED_OUTPUT=$(printf '%s' "$GIT_INPUT" | filter)
FILTERED_OUTPUT=${FILTERED_OUTPUT%x}
printf '%s' "$FILTERED_OUTPUT"
If you would like to see the full code, please see the github page for expandr, a little open-source git keyword-expansion filter shell script that I developed for information security purposes. According to rules set up in .gitattributes files (which can be branch-specific) and git config, git pipes each file through the expandr.sh shell script whenever checking it in or out of the repository. (That is why it was critical to preserve any trailing newlines, or lack thereof.) This lets you cleanse sensitive information, and swap in different sets of environment-specific values for test, staging, and live branches.
-
Admin over 9 years
What you do here is not necessary. filter takes stdin - it runs sed. You catch stdin in $GIT_INPUT, then print that back to stdout over a pipe to filter, catch its stdout in $FILTERED_OUTPUT, and then print it back to stdout. All 4 lines at the bottom of your example above could be replaced with just this: filter. No offense meant here, it's just... you're working too hard. You don't need the shell variables most of the time - just direct the input to the right place and pass it on.
-
Admin over 9 years
No, what I do here is necessary because if I just do filter, then it will add newline characters to the ends of any input streams that did not end in newlines initially. In fact I originally did just do filter but ran into that problem, which led me to this solution because neither "always add newlines" nor "always strip newlines" is an acceptable solution.
-
Admin over 9 years
sed probably will do the extra newline - but you should handle that in filter, not with all the rest. And all of those functions that you have basically do the same thing - a sed s///. You're using the shell to pipe data it has saved in its memory to sed so that sed might replace that data with other data that the shell has stored in its memory so sed can pipe it back to the shell. Why not just [ "$var" = "$condition" ] && var=new_value? I also don't get the arrays - are you storing the array name in [0] then using sed to replace that with the value in [1]? Maybe chat?
-
Admin over 9 years
@mikeserv - What would be the benefit of moving that code inside filter? It works perfectly as-is. Regarding how the code at my link works and why I set it up the way that I did, yeah, let's talk about it in a chat room.
-
-
CommaToast over 9 years
Would text files typically contain null bytes? I can't see why they would. But the script that you just mentioned does not seem to work.
-
vinc17 over 9 years
@CommaToast Text files don't contain null bytes. But the question just says stdin / input stream, which may not be text in the most general case.
-
CommaToast over 9 years
OK. Well, I tried it from the command line and it didn't do anything, and from within my script itself, your suggestion fails because it adds "..." at the end of the file. Also, if there was no newline there, then it still adds one.
-
vinc17 over 9 years
@CommaToast The "..." was just an example. I've clarified my answer. No newline is added (see the text before the "..." in the example).
-
CommaToast over 9 years
Ok, so am I understanding this correctly: you add an "x" so it no longer has a trailing newline but a trailing "x" instead, then you remove the trailing "x" when you want to use the var? If so, then couldn't you do var=${var%x} after the first line and that way just deal with $var thenceforth?
-
vinc17 over 9 years
@CommaToast Yes, you can do var=${var%x}, and then use $var. It's better if you use $var several times; otherwise it is rather useless since you can use ${var%x} directly.
-
vinc17 over 9 years
@CommaToast And with zsh, you can do var=${$(your_command; echo x)%x} directly, but this doesn't work with other shells.
-
CommaToast over 9 years
Why do you suppose they strip 0x0a bytes off the end of streams like that? It seems rather rude. And I thought ISIS was barbaric! BTW I think you should take the "..." out of your answer; it seems really irrelevant and unhelpful.
-
vinc17 over 9 years
@CommaToast I've removed the ...\n as you suggested, but note that some shells hide the last line if it doesn't end with a newline character. So, people need to test with zsh or similar.
-
mikeserv over 9 years
@vinc17 - if it is a loop I desired to use, then I would use it in place of the {} braces. It is true - and is explicitly noted in the answer - that the value for $var is very likely to disappear entirely when the { current shell; } grouping is closed. Is there some more explicit way to say it than, Whatever you do $var disappears...?
-
mikeserv over 9 years
@CommaToast - Using a tool like sed, which can be used to append exactly one byte and nothing else, is something that can be relied upon in any shell. You also don't need to do the two commands. vinc17's answer here would be far more portable if printf were used in place of echo - with printf you can do printf x - and no newline is appended. So, in zsh and similar shells that preserve newlines in a command substitution subshell the behavior is identical - there are no nasty surprises.
-
CommaToast over 9 years
Well, shells shouldn't hide things, that is not cool. Those shells ought to be fired. I don't like it when my computer thinks it knows better than me.
-
vinc17 over 9 years
@mikeserv No, zsh does not preserve newlines in command substitution: var=$(echo foo); printf "%s...\n" "$var" doesn't output a newline between "foo" and "...".
-
mikeserv over 9 years
@vinc17 - probably the best way, though:
input | sed "s/'"'/&"&"&/g;s/.*/process2 '"'-> &'/" | sh
-
vinc17 over 9 years
@CommaToast This is equivalent: echo x adds a newline, but it immediately gets stripped in command substitution. echo was just faster to type than printf. :) Also, since echo is simpler than printf, it is more likely to be a builtin. Just in case...
-
mikeserv over 9 years
@vinc17 - a subshell doesn't mean you can't get the variable's value - though it can be harder to do reliably. This three-line function does pretty well. Though I've just realized it needs a ${a:-continue} or something. You can use it like sq * to pipe out shell-quoted whatever. Anyway, you still haven't answered the question - how can I be more explicit than Whatever you do...?
-
mikeserv over 9 years
@vinc17 - answer to what? And it wasn't modified - except that while read ... might have mauled it and you added a ->. You mean like (input;printf .)|(var=$(sed -n '$p'); process2 "${var%.}"); still - I wouldn't do it that way. It's inefficient. Just input| sed "\$!d;s/'"'/&"&"&/g;s/.*/process2 '"'&'/" | sh
-
CommaToast over 9 years
Yep, you're right. Actually the way I did it had weird effects too. Echo was better than printf in this situation :D
-
CommaToast over 9 years
@vinc17 You can see the full code I'm using this for here: github.com/gistya/expandr It's a script that acts as a keyword expansion filter for git. Sed is used extensively. If the newlines were frakked with by the script then needless diffs would be generated. Thanks for your help.
-
mikeserv over 9 years
@vinc17 - what are you on about? I've already stated repeatedly that the variable's value is lost - Whatever you do... I did a similar thing though - as is linked above - w/ file descriptors, here-documents, and input aggregation just this morning. I don't know what your fct[1-4] do - and it may be you're right - but it doesn't answer the very simple question that I have already repeatedly asked you. Why do you keep on?
-
chepner over 9 years
Neither -d nor the process substitution (<(...)) is portable; this code will not work in dash, for instance.
-
chepner over 9 years
@vinc17 Even if echo is a built-in and printf is not, the command substitution would force a subshell to be started to execute the built-in, so there's no efficiency gain there.
-
vinc17 over 9 years
@chepner This depends on the shell. With mksh, there are 2 clones with printf while there's only one with echo. Compare strace -f -o out mksh -c 'var=$(echo a; printf x)' and strace -f -o out mksh -c 'var=$(echo a; echo x)'.
-
CommaToast over 9 years
I realize I should've just used CLI PHP or Ruby, LOL, but I was too far into the project to turn back from shell scripting. Lesson learned.
-
chepner over 9 years
@vinc17 I thought you were implying that $(echo) would be faster than $(printf) if only the first used a built-in.
-
mikeserv over 9 years
@vinc17 - this is because mksh doesn't have a printf - in that shell you call whatever is the printf binary in path or you don't call one at all unless you do what mirabilos describes as some ugly hacks at build time to compile in a printf builtin. You can use - and likely should in ksh variants - the print builtin. With echo you get a clone for the subshell, but with printf you have to clone twice - once for the subshell and once for the command execution environment when it is exec'ed.
-
mikeserv over 9 years
@vinc17 - while fct1; do fct2 | fct3; done | fct4. I would not write shell functions that cannot work together. I don't understand why $var is involved at all - what for? If the shell function doesn't explicitly set the value of the variable in the current shell because instead the shell function must run some outside program, then you just pipe it out - why capture stdin if your only purpose is to pass it on? Just pass it. Else, have those functions work in concert in the current shell. Your problem is not an input problem, it is a design problem. Your code makes it cumbersome.
-
vinc17 over 9 years
@mikeserv Yes, I gave this example based on what I said above: echo is more likely to be a builtin than printf, and mksh is an example. But note that there is another important point in the case where the last command is not a builtin: like bash, mksh does a useless clone. Both dash and zsh optimize by avoiding a clone call. This can be seen with strace after replacing printf by /usr/bin/printf (to test the case where the last command is not a builtin). If mksh had this optimization, printf would be equivalent to echo concerning clone calls.
-
mikeserv over 9 years
@chepner - while the style differs slightly, the concept certainly does work in dash. You just use <<HEREDOC\n$(gen input)\nHEREDOC\n - in dash - which uses pipes for heredocs the same way other shells use them for process substitution - it makes no difference. The read -d thing is just specifying a delimiter - you can do the same a dozen ways - just be sure about it. Though you will need some tail to gen input.
-
mikeserv over 9 years
@vinc17 - I'm not sure I'm following, but I just did this: for sh in da ba z; do strace -c ${sh}sh -c 'echo | /usr/bin/true'; done 2>&1 | grep clone. I get 2 2 2.
-
vinc17 over 9 years
@mikeserv You forgot the -f option, and the command is not the correct one (see above with var=...). Try: for sh in da ba z; do strace -f -c ${sh}sh -c 'var=$(echo ; /usr/bin/true)'; done 2>&1 | grep clone (but on Debian, this is /bin/true).
-
vinc17 over 9 years
Let's take an example based on a real-world sh script: autoconf (version 2.69). One has arg= with command substitution in a case construct, where $arg is used outside the case. Let's say that one would want to keep the trailing newlines from the sed. With the pipeline solution (the first one), it is not obvious how to update the script to get the wanted behavior (it actually appears to be impossible to me without major changes, but I would be interested in seeing a solution if there is one).
-
mikeserv over 9 years
@vinc17 - no. autoconf is awful. It is written by people that don't understand shell. And it doesn't matter - you still have not answered the question. And the sed thing is really easy: sed '...$s/$/./' - see? You do it with sed. And why are you doing command subs in a case statement?
-
vinc17 over 9 years
There's also the _variables function of bash_completion, which stores the result of a command substitution in a global variable COMPREPLY. If a pipeline solution were used to keep newlines, the result would be lost. In your answer, one has the impression that both solutions are equally good. Moreover, it should be noted that the pipeline solution's behavior heavily depends on the shell: a user could test echo foo | { var=$(sed '$s/$/./'); var=${var%.}; } ; echo $var with ksh93 and zsh, and think that it is OK, while this code is buggy.
-
vinc17 over 9 years
You did not say "it doesn't work". You just said "$var disappears" (which is actually not true since this depends on the shell: the behavior is unspecified by POSIX), which is a rather neutral sentence. The second solution is better because it doesn't suffer from this problem, and its behavior is consistent in all POSIX shells.
-
mikeserv over 9 years
@vinc17 - unspecified means it disappears. There is no guarantee. It is Schrödinger's cat. Unspecified is not neutral when it comes to a specification. Unspecified is the opposite of a specification. It is anathema.
-
CommaToast over 9 years
You set IFS='' so it doesn't put spaces in between the lines it reads in, eh? Cool trick.
-
vinc17 over 9 years
No, it is just unspecified by POSIX, which allows two possible valid behaviors, depending on whether the last command of the pipeline runs in a subshell or not. On the other hand, implementations have their own specification. With zsh, the last command of a pipeline is guaranteed to run in the current shell. With ksh93, the last command may run in a subshell or not, so that there are two possible behaviors.
-
phemmer over 9 years
Actually in this case IFS='' probably isn't necessary. It's meant so that read won't collapse spaces. But when it's reading into a single variable, it has no effect (that I can recall). But I just feel safer leaving it on :-)
-
vinc17 over 9 years
Concerning cat, the answer is at the beginning of the OP's question: it's inside a shell script, and the OP is interested in capturing the standard input.