How do I capture stdin to a variable without stripping any trailing newlines?

Solution 1

Command substitution strips trailing newlines before the value is stored in the variable. To keep them, append a sentinel character and remove it afterwards, with something like:

var=`cat; echo x`

and use ${var%x} instead of $var. For instance:

printf "%s" "${var%x}"

Note that this solves the trailing-newline issue but not the null-byte one (which matters if standard input is not text). According to the POSIX description of command substitution:

If the output contains any null bytes, the behavior is unspecified.

Individual shell implementations may preserve null bytes, however.
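
As a rough check of the sentinel trick above, the following sketch runs two sample inputs through the same command substitution. The helper name slurp, the variable names, and the sample strings are made up for illustration; only `cat; echo x` and `${var%x}` come from the answer.

```shell
#!/bin/sh
# slurp is a hypothetical helper: copy stdin through, then add the sentinel "x".
slurp() { cat; echo x; }

# Input that ends in two newlines.
with_nl=$(printf 'foo\n\n' | slurp)
with_nl=${with_nl%x}          # strip the sentinel, keep the newlines

# Input with no trailing newline at all.
without_nl=$(printf 'foo' | slurp)
without_nl=${without_nl%x}    # nothing is added to the original input

# Prints "5 3": five characters for "foo\n\n", three for "foo".
printf '%s %s\n' "${#with_nl}" "${#without_nl}"
```

The newline that echo adds after the x is the only thing the command substitution strips, which is why the input's own newlines survive.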

Solution 2

You can use the read built-in to accomplish this:

$ IFS='' read -d '' -r foo < <(echo bar)

$ echo "<$foo>"
<bar
>

For a script to read STDIN, it'd simply be:

IFS='' read -d '' -r foo

 

I'm not sure which shells this works in, but it works fine in both bash and zsh.
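
One thing to watch for when using this in a script: because the input normally contains no NUL byte, read reaches end of input without ever finding its delimiter and so returns a non-zero status, even though the variable has been filled. The sketch below assumes bash (neither -d nor process substitution is POSIX); the status variable is an addition for illustration.

```shell
#!/usr/bin/env bash
# Assumes bash: read -d and <(...) are not POSIX features.
status=0
IFS='' read -d '' -r foo < <(printf 'bar\n\n') || status=$?

# foo holds the input with both trailing newlines intact.
printf '<%s>\n' "$foo"

# status is non-zero because read hit end of input before finding
# the (NUL) delimiter; the data is still in foo.
printf 'read exit status: %s\n' "$status"
```

In a script running under set -e, the `|| status=$?` (or a plain `|| true`) keeps that expected non-zero status from aborting the script.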

Solution 3

You can do something like:

input | { var=$(sed '$s/$/./'); var=${var%.}; }

Whatever you do $var disappears as soon as you step outside of that { current shell ; } grouping anyway. But it could also work like:

var=$(input | sed '$s/$/./'); var=${var%.}
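
The second form can be checked with a small sketch, with printf standing in for the input placeholder (the sample strings and the *_len variable names are invented here). In this check sed appends the dot to the last line ahead of that line's terminating newline, so that newline is stripped by the command substitution and does not come back when the dot is removed.

```shell
#!/bin/sh
# printf stands in for the answer's "input" placeholder command.

# Input ending in two newlines (5 characters).
var=$(printf 'foo\n\n' | sed '$s/$/./'); var=${var%.}
two_nl_len=${#var}

# Input ending in one newline (4 characters).
var=$(printf 'foo\n' | sed '$s/$/./'); var=${var%.}
one_nl_len=${#var}

# Prints "4 3": each result is one character shorter than its input.
printf '%s %s\n' "$two_nl_len" "$one_nl_len"
```

If the final newline matters, it is worth comparing the result against your real input before relying on this form.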
Author by

CommaToast

Updated on September 18, 2022

Comments

  • CommaToast
    CommaToast over 1 year

    In a shell script...

    How do I capture stdin to a variable without stripping any trailing newlines?

    Right now I have tried:

    var=`cat`
    var=`tee`
    var=$(tee)
    

    In all cases $var will not have the trailing newline of the input stream. Thanks.

    ALSO: If there is no trailing newline in the input, then the solution must not add one.

    UPDATE IN LIGHT OF THE ACCEPTED ANSWER:

    The final solution that I used in my code is as follows:

    function filter() {
        #do lots of sed operations
        #see https://github.com/gistya/expandr for full code
    }
    
    GIT_INPUT=`cat; echo x`
    FILTERED_OUTPUT=$(printf '%s' "$GIT_INPUT" | filter)
    FILTERED_OUTPUT=${FILTERED_OUTPUT%x}
    printf '%s' "$FILTERED_OUTPUT"
    

    If you would like to see the full code, please see the github page for expandr, a little open-source git keyword-expansion filter shell script that I developed for information security purposes. According to rules set up in .gitattributes files (which can be branch-specific) and git config, git pipes each file through the expandr.sh shell script whenever checking it in or out of the repository. (That is why it was critical to preserve any trailing newlines, or lack thereof.) This lets you cleanse sensitive information, and swap in different sets of environment-specific values for test, staging, and live branches.

    • Admin
      Admin over 9 years
      what you do here is not necessary. filter takes stdin - it runs sed. You catch stdin in $GIT_INPUT then print that back to stdout over a pipe to filter and catch its stdout in $FILTERED_OUTPUT and then print it back to stdout. All 4 lines at the bottom of your example above could be replaced with just this: filter. No offense meant here, it's just... you're working too hard. You don't need the shell variables most of the time - just direct the input to the right place and pass it on.
    • Admin
      Admin over 9 years
      No, what I do here is necessary because if I just do filter, then it will add newline characters to the ends of any input streams that did not end in newlines initially. In fact I originally did just do filter but ran into that problem which led me to this solution because neither "always add newlines" nor "always strip newlines" are acceptable solutions.
    • Admin
      Admin over 9 years
      sed probably will do the extra newline - but you should handle that in filter not with all the rest. And all of those functions that you have basically do the same thing - a sed s///. You're using the shell to pipe data it has saved in its memory to sed so that sed might replace that data with other data that the shell has stored in its memory so sed can pipe it back to the shell. Why not just [ "$var" = "$condition" ] && var=new_value? I also don't get the arrays - are you storing the array name in [0] then using sed to replace that with the value in [1]? Maybe chat?
    • Admin
      Admin over 9 years
      @mikeserv - What would be the benefit of moving that code inside filter? It works perfectly as-is. Regarding how the code at my link works and why I set it up the way that I did, yeah, let's talk about it in a chat room.
  • CommaToast
    CommaToast over 9 years
    Would text files typically contain null bytes? I can't see why they would. But the script that you just mentioned does not seem to work.
  • vinc17
    vinc17 over 9 years
    @CommaToast Text files don't contain null bytes. But the question just says stdin / input stream, which may not be text in the most general case.
  • CommaToast
    CommaToast over 9 years
    OK. Well I tried it from the command line and it didn't do anything, and from within my script itself, your suggestion fails because it adds "..." at the end of the file. Also if there was no newline there, then it still adds one.
  • vinc17
    vinc17 over 9 years
    @CommaToast The "..." was just an example. I've clarified my answer. No newline is added (see the text before the "..." in the example).
  • CommaToast
    CommaToast over 9 years
    Ok, so am I understanding this correctly: you add an "x" so it no longer has a trailing newline but a trailing "x" instead, then you remove the trailing "x" when you want to use the var? If so then couldn't you do var=${var%x} after the first line and that way, just deal with $var thenceforth?
  • vinc17
    vinc17 over 9 years
    @CommaToast Yes, you can do var=${var%x}, and then use $var. It's better if you use $var several times, otherwise it is rather useless since you can use ${var%x} directly.
  • vinc17
    vinc17 over 9 years
    @CommaToast And with zsh, you can do var=${$(your_command; echo x)%x} directly, but this doesn't work with other shells.
  • CommaToast
    CommaToast over 9 years
    Why do you suppose they strip 0x0a bytes off the end of streams like that? It seems rather rude. And I thought ISIS was barbaric! BTW I think you should take the "..." out of your answer; it seems really irrelevant and unhelpful.
  • vinc17
    vinc17 over 9 years
    @CommaToast I've removed the ...\n as you suggested, but note that some shells hide the last line if it doesn't end with a newline character. So, people need to test with zsh or similar.
  • mikeserv
    mikeserv over 9 years
    @vinc17 - if it is a loop I desired to use, then I would use it in place of the {} braces. It is true - and is explicitly noted in the answer - that the value for $var is very likely to disappear entirely when the { current shell; } grouping is closed. Is there some more explicit way to say it than, Whatever you do $var disappears...?
  • mikeserv
    mikeserv over 9 years
    @CommaToast - Using a tool like sed, which can append exactly one byte and nothing else, is something that can be relied upon in any shell. You also don't need to do the two commands. vinc17's answer here would be far more portable if printf were used in place of echo - with printf you can do printf x - and no newline is appended. So, in zsh and similar shells that preserve newlines in a command substitution subshell the behavior is identical - there are no nasty surprises.
  • CommaToast
    CommaToast over 9 years
    Well, shells shouldn't hide things, that is not cool. Those shells ought to be fired. I don't like it when my computer thinks it knows better than me.
  • vinc17
    vinc17 over 9 years
    @mikeserv No, zsh does not preserve newlines in command substitution: var=$(echo foo); printf "%s...\n" "$var" doesn't output a newline between "foo" and "...".
  • mikeserv
    mikeserv over 9 years
    @vinc17 - probably the best way, though: input | sed "s/'"'/&"&"&/g;s/.*/process2 '"'-> &'/" | sh
  • vinc17
    vinc17 over 9 years
    @CommaToast This is equivalent: echo x adds a newline, but it immediately gets stripped in command substitution. echo was just faster to type than printf. :) Also, since echo is simpler than printf, it is more likely to be a builtin. Just in case...
  • mikeserv
    mikeserv over 9 years
    @vinc17 - a subshell doesn't mean you can't get the variable's value - though it can be harder to do reliably. This three-line function does pretty well. Though I've just realized it needs a ${a:-continue} or something. You can use it like sq * to pipe out shell-quoted whatever. Anyway, you still haven't answered the question - how can I be more explicit than Whatever you do...?
  • mikeserv
    mikeserv over 9 years
    @vinc17 - answer to what? And it wasn't modified - except that while read ... might have mauled it and you added a ->. You mean like (input;printf .)|(var=$(sed -n '$p'); process2 "${var%.}"); still - I wouldn't do it that way. It's inefficient. Just input| sed "\$!d;s/'"'/&"&"&/g;s/.*/process2 '"'&'/" | sh
  • CommaToast
    CommaToast over 9 years
    Yep you're right. Actually the way I did it had weird effects too. Echo was better than printf in this situation :D
  • CommaToast
    CommaToast over 9 years
    @vinc17 You can see the full code I'm using this for here: github.com/gistya/expandr It's a script that acts as a keyword expansion filter for git. Sed is used extensively. If the newlines were frakked with by the script then needless diffs would be generated. Thanks for your help.
  • mikeserv
    mikeserv over 9 years
    @vinc17 - what are you on about? I've already stated repeatedly that the variable's value is lost - Whatever you do... I did a similar thing though - as is linked above - w/ file descriptors, here-documents, and input aggregation just this morning. I don't know what your fct[1-4] do - and it may be you're right - but it doesn't answer the very simple question that I have already repeatedly asked you. Why do you keep on?
  • chepner
    chepner over 9 years
    Neither -d nor the process substitution (<(...)) are portable; this code will not work in dash, for instance.
  • chepner
    chepner over 9 years
    @vinc17 Even if echo is a built-in and printf is not, the command substitution would force a subshell to be started to execute the built-in, so there's no efficiency gain there.
  • vinc17
    vinc17 over 9 years
    @chepner This depends on the shell. With mksh, there are 2 clones with printf while there's only one with echo. Compare strace -f -o out mksh -c 'var=$(echo a; printf x)' and strace -f -o out mksh -c 'var=$(echo a; echo x)'.
  • CommaToast
    CommaToast over 9 years
    I realize I should've just used CLI PHP or Ruby, LOL, but I was too far into the project to turn back from shell scripting. Lesson learned.
  • chepner
    chepner over 9 years
    @vinc17 I thought you were implying that $(echo) would be faster than $(printf) if only the first used a built-in.
  • mikeserv
    mikeserv over 9 years
    @vinc17 - this is because mksh doesn't have a printf - in that shell you call whatever is the printf binary in path or you don't call one at all unless you do what mirabilos describes as some ugly hacks at build time to compile in a printf builtin. You can use - and likely should in ksh variants - the print builtin. With echo you get a clone for the subshell, but with printf you have to clone twice - once for the subshell and once for the command execution environment when it is execed.
  • mikeserv
    mikeserv over 9 years
    @vinc17 - while fct1; do fct2 | fct3; done | fct4. I would not write shell functions that cannot work together. I don't understand why $var is involved at all - what for? If the shell function doesn't explicitly set the value of the variable in the current shell because instead the shell function must run some outside program, then you just pipe it out - why capture stdin if your only purpose is to pass it on? Just pass it. Else, have those functions work in concert in the current shell. Your problem is not an input problem it is a design problem. Your code makes it cumbersome.
  • vinc17
    vinc17 over 9 years
    @mikeserv Yes, I gave this example based on what I said above: echo is more likely to be a builtin than printf, and mksh is an example. But note that there is another important point in case where the last command is not a builtin: like bash, mksh does a useless clone. Both dash and zsh optimize by avoiding a clone call. This can be seen with strace after replacing printf by /usr/bin/printf (to test the case where the last command is not a builtin). If mksh had this optimization, printf would be equivalent to echo concerning clone calls.
  • mikeserv
    mikeserv over 9 years
    @chepner - while the style differs slightly, the concept certainly does work in dash. You just use <<HEREDOC\n$(gen input)\nHEREDOC\n - in dash - which uses pipes for heredocs the same way other shells use them for process substitution - it makes no difference. The read -d thing is just specifying a delimiter - you can do the same a dozen ways - just be sure about it. Though you will need some tail to gen input.
  • mikeserv
    mikeserv over 9 years
    @vinc17 - I'm not sure I'm following, but I just did this: for sh in da ba z; do strace -c ${sh}sh -c 'echo | /usr/bin/true'; done 2>&1 | grep clone. I get 2 2 2.
  • vinc17
    vinc17 over 9 years
    @mikeserv You forgot the -f option, and the command is not the correct one (see above with var=...). Try: for sh in da ba z; do strace -f -c ${sh}sh -c 'var=$(echo ; /usr/bin/true)'; done 2>&1 | grep clone (but on Debian, this is /bin/true).
  • vinc17
    vinc17 over 9 years
    Let's take an example based on a real-world sh script: autoconf (version 2.69). One has arg= with command substitution in a case construct, where $arg is used outside the case. Let's say that one would want to keep the trailing newlines from the sed. With the pipeline solution (the first one), it is not obvious to update the script to get the wanted behavior (it actually appears to be impossible to me without major changes, but I would be interested in seeing a solution if there is one).
  • mikeserv
    mikeserv over 9 years
    @vinc17 - no. autoconf is awful. It is written by people that don't understand shell. And it doesn't matter - you still have not answered the question. And the sed thing is really easy: sed '...$s/$/./' - see? You do it with sed. And why are you doing command subs in a case statement?
  • vinc17
    vinc17 over 9 years
    There's also the _variables function of bash_completion, which stores the result of a command substitution in a global variable COMPREPLY. If a pipeline solution were used to keep newlines, the result would be lost. In your answer, one has the impression that both solutions are equally good. Moreover, it should be noted that the behavior of the pipeline solution depends heavily on the shell: a user could test echo foo | { var=$(sed '$s/$/./'); var=${var%.}; } ; echo $var with ksh93 and zsh, and think that it is OK, while this code is buggy.
  • vinc17
    vinc17 over 9 years
    You did not say "it doesn't work". You just said "$var disappears" (which is actually not true since this depends on the shell — the behavior is unspecified by POSIX), which is a rather neutral sentence. The second solution is better because it doesn't suffer from this problem, and its behavior is consistent in all POSIX shells.
  • mikeserv
    mikeserv over 9 years
    @vinc17 - unspecified means it disappears. There is no guarantee. It is Schrödinger's cat. Unspecified is not neutral when it comes to a specification. Unspecified is the opposite of a specification. It is anathema.
  • CommaToast
    CommaToast over 9 years
    You set IFS='' so it doesn't put spaces in between the lines it reads in eh? Cool trick.
  • vinc17
    vinc17 over 9 years
    No, it is just unspecified by POSIX, which allows two possible valid behaviors, depending on whether the last command of the pipeline runs in a subshell or not. On the other hand, implementations have their own specification. With zsh, the last command of a pipeline is guaranteed to run in the current shell. With ksh93, the last command may run in a subshell or not, so that there are two possible behaviors.
  • phemmer
    phemmer over 9 years
    Actually in this case IFS='' probably isn't necessary. It's meant so that read won't collapse spaces. But when it's reading into a single variable, it has no effect (that I can recall). But I just feel safer leaving it on :-)
  • vinc17
    vinc17 over 9 years
    Concerning cat, the answer is at the beginning of the OP's question: it's inside a shell script, and the OP is interested in capturing the standard input.