Read lines into array, one element per line using bash

22,905

Solution 1

TL;DR

In bash:

readarray -t arr2 < <(git … )
printf '%s\n' "${arr2[@]}"

There are two distinct problems in your question:

  1. Shell splitting.
    When you did:

    arr1=($(git … ))
    

    the command substitution is unquoted, so its output is subject to word splitting and globbing (pathname expansion).

    To see exactly what that word splitting does, use printf:

    $ printf '<%s>  '  $(echo word '"one simple sentence"')
    <word>  <"one>  <simple>  <sentence">
    

    That would be avoided by quoting:

    $ printf '<%s>  '  "$(echo word '"one simple sentence"')"
    <word "one simple sentence">
    

    But quoting also prevents the splitting on newlines that you want: the whole output stays a single word.

  2. Pipe
    When you executed:

    git … | … | … | readarray arr2
    

    The array variable arr2 did get set, but inside a subshell: by default, bash runs each command of a pipeline in its own subshell, so arr2 went away when that subshell exited.

    You could use the value if you stay inside the last subshell:

    $ printf '%s\n' "First value." "Second value." | 
            { readarray -t arr2; printf '%s\n' "${arr2[@]}"; }
    First value.
    Second value.
    

    But the value of arr2 will not survive out of the pipe.
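One more bash-only option, not used in the solutions here, is the lastpipe shell option (bash 4.2 and later). It runs the last command of a pipeline in the current shell, provided job control is off, as it is in non-interactive scripts. A minimal sketch, with printf standing in for the git pipeline:

```shell
#!/usr/bin/env bash
# Sketch: with lastpipe (bash >= 4.2), the last command of a pipeline
# runs in the current shell instead of a subshell.
set +m              # make sure job control is off; lastpipe needs that
shopt -s lastpipe   # run the pipeline's last command in this shell

# printf stands in for the git pipeline from the question.
printf '%s\n' "First value." "Second value." | readarray -t arr2

# arr2 is still set here, because readarray did not run in a subshell.
printf '%s\n' "${arr2[@]}"
```

Because it depends on a shell option and on job control being off (so it usually has no effect at an interactive prompt), the process-substitution forms are the more robust choice.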

Solution(s)

You need to read the output line by line, splitting only on newlines, but without a pipe.
From older to newer:

  1. Loop.
    For old shells without arrays (using positional arguments, the only quasi-array):

    set --
    while IFS='' read -r value; do
        set -- "$@" "$value"
    done <<-EOT
    $(printf '%s\n' "First value." "Second value.")
    EOT
    
    printf '%s\n' "$@"
    

    To set an array (ksh, zsh, bash):

    arr1=()
    while IFS='' read -r value; do
        arr1+=("$value")
    done <<-EOT
    $(printf '%s\n' "First value." "Second value.")
    EOT
    
    printf '%s\n' "${arr1[@]}"
    
  2. Here-string
    Instead of the here document (<<) we can use a here-string (<<<):

    arr1=()
    while IFS='' read -r value; do
        arr1+=("$value")
    done <<<"$(printf '%s\n' "First value." "Second value.")"
    
    printf '%s\n' "${arr1[@]}"
    
  3. Process substitution
    In shells that support it (ksh, zsh, bash) you can use <( … ) to replace the here-string:

    arr1=()
    while IFS='' read -r value; do
        arr1+=("$value")
    done < <(printf '%s\n' "First value." "Second value.")
    
    printf '%s\n' "${arr1[@]}"
    

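    There are differences: <( ) can pass NUL bytes through, while the command substitution inside a here-string drops them (bash 4.4 and later also print a warning). A here-string adds a trailing newline, and the $( ) inside it first strips all trailing newlines from the command's output. There may be other differences.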

  4. readarray
    In bash[a], use readarray (a.k.a. mapfile) to avoid the loop:

    readarray -t arr2 < <(printf '%s\n' "First value." "Second value.")
    printf '%s\n' "${arr2[@]}"
    

[a] In ksh you need read -A, which clears the variable before use, but it takes some "magic" to make it split on newlines and read the whole input at once:

IFS=$'\n' read -d '' -A arr2 < <(printf '%s\n' "First value." "Second value.")

In zsh you would need to load the zsh/mapfile module to do something similar.
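The trailing-newline difference between a here-string and process substitution is easy to check. A small bash sketch, with true and printf standing in for the git pipeline, feeding the same output to readarray both ways:

```shell
#!/usr/bin/env bash
# Sketch: the same command output fed to readarray two ways.
# true and printf stand in for the git pipeline from the question.

# No output at all (e.g. no modified files):
readarray -t from_herestring <<< "$(true)"   # "" plus the added newline: 1 empty element
readarray -t from_procsub < <(true)          # nothing to read: 0 elements

# Output that ends with an empty line:
readarray -t trailing_herestring <<< "$(printf 'a\n\n')"  # $( ) strips trailing newlines: 1 element
readarray -t trailing_procsub < <(printf 'a\n\n')         # "a" and "": 2 elements

echo "${#from_herestring[@]} ${#from_procsub[@]}"          # 1 0
echo "${#trailing_herestring[@]} ${#trailing_procsub[@]}"  # 1 2
```

The first case is the one most likely to matter here: with the here-string, a clean working tree would still produce one empty "file name".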

Solution 2

When you piped to readarray, you started a subshell which correctly populated an arr2 array, but then exited. Use process substitution as the input to readarray:

readarray -t arr2 < <(git ...)
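If file names could themselves contain newlines, even readarray -t would split them apart. Since bash 4.4, readarray also accepts -d to choose another delimiter, so NUL-terminated records can be read instead. A minimal sketch, with printf standing in for a command that emits NUL-terminated names. git status does have a -z option for NUL-terminated output, but its records differ (renamed entries carry two NUL-terminated names) and the grep and cut stages from the question expect newline-separated lines, so adapting the git side is left out:

```shell
#!/usr/bin/env bash
# Sketch (bash >= 4.4): read NUL-terminated records, so a newline inside
# a name does not split it. printf stands in for a NUL-emitting command.
readarray -d '' -t names < <(
  printf '%s\0' "File A.txt" $'Odd\nname.txt' "Directory Name/File B.txt"
)

# -t strips the trailing NUL delimiter from each element.
printf '<%s>\n' "${names[@]}"
```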

Solution 3

You are close

This is a "file names with space character" problem.

By default, the shell splits words on spaces, tabs and newlines. The IFS shell variable controls this.

To change IFS temporarily, use this:

ifs_backup=$IFS
IFS=$(echo -en "\n\b")   # newline plus backspace; the \b stops $( ) from stripping the newline

Then, after re-running the arr1=($(git status --porcelain | grep "^.\w" | cut -c 4-)) assignment, the output of this command:

for a in "${arr1[@]}"; do echo "$a"; done

will be:

"Directory Name/File B.txt"
"File A.txt"

To restore IFS:

IFS=$ifs_backup
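Changing IFS does not stop the unquoted $( ) from globbing, so a line such as *.txt could still expand to file names in the current directory. A sketch of the same newline-only splitting with globbing turned off, using a function so that local scopes IFS and no backup variable is needed. printf stands in for the git pipeline; the function name split_lines is hypothetical, and the sketch assumes globbing was enabled beforehand:

```shell
#!/usr/bin/env bash
# Sketch: split a command's output on newlines only, with globbing
# disabled so a line such as "*.txt" is not expanded against the cwd.
# split_lines is a hypothetical helper; it sets the global array "lines".
split_lines() {
  local IFS=$'\n'   # newline only; restored automatically on return
  set -f            # disable pathname expansion (globbing)
  lines=($(printf '%s\n' "Directory Name/File B.txt" "File A.txt" "*.txt"))
  set +f            # re-enable globbing (assumes it was on before)
}

split_lines
printf '<%s>\n' "${lines[@]}"
```

Because the newline counts as IFS whitespace, empty lines are dropped rather than kept as empty elements; readarray -t keeps them.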

Solution 4

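Add quotes around the command substitution. Note that this stores the whole output, newlines included, as a single array element rather than one element per line: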

arr1=("$(git status --porcelain | grep "^.\w" | cut -c 4-)")
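A quick check of what the quoted form produces, with printf standing in for the git pipeline. The array has exactly one element that contains a newline, so the for loop from the question runs once, yet echo "$a" still prints two lines, which can make it look as if it worked:

```shell
#!/usr/bin/env bash
# Quoting the command substitution keeps the whole output together,
# so the array ends up with exactly one element.
arr1=("$(printf '%s\n' "Directory Name/File B.txt" "File A.txt")")

echo "${#arr1[@]}"            # 1
printf '<%s>\n' "${arr1[0]}"  # one <...> pair wrapping both lines
```

To get one element per line out of that string afterwards, readarray -t arr <<< "${arr1[0]}" would split it.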
Author: namsta

I am a high school student at the School of Science and Engineering (SEM) in Dallas ISD with an interest in machine learning, robotics, and math. GitHub: arjvik

Updated on September 18, 2022

Comments

  • namsta, over 1 year ago

    I am trying to get a bash array of all the unstaged modifications of files in a directory (using Git). The following code works to print out all the modified files in a directory:

    git -C $dir/.. status --porcelain | grep "^.\w" | cut -c 4-
    

    This prints

    "Directory Name/File B.txt"
    "File A.txt"
    

    I tried using

    arr1=($(git status --porcelain | grep "^.\w" | cut -c 4-))
    

    but then

    for a in "${arr1[@]}"; do echo "$a"; done
    

    (both with and without the quotes around ${arr1[@]}) prints

    "Directory
    Name/File
    B.txt"
    "File
    A.txt"
    

    I also tried

    git -C $dir/.. status --porcelain | grep "^.\w" | cut -c 4- | readarray arr2
    

    but then

    for a in "${arr2[@]}"; do echo "$a"; done
    

    (both with and without the quotes around ${arr2[@]}) prints nothing. Using declare -a arr2 beforehand does absolutely nothing either.


    My question is this: How can I read these values into an array? (This is being used for my argos plugin gitbar, in case it matters, so you can see all my code.)

  • namsta, over 5 years ago
    Thank you so much for your clear answer. I have one question: You suggested a here-string (<<<"$(...)"), Jeff Schaller suggested using process substitution (< <(...)). Will there be a difference between the two, perhaps performance or spawning more processes? In the latter, will Bash still write to the temp file, and then read that back in, or will it be smart enough to directly pipe input from the git commands into readarray? If they both end up the same, I feel like < <(...) syntax looks cleaner, but I would rather not use that syntax if it actually creates and writes to a file.
  • done, over 5 years ago
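    All: <<< like << uses a temporary file (bash 5.1 and later use a pipe when the content is small enough), so that commands that lseek their stdin can use them; <(…) uses a pipe. Use any of them: they do not differ in speed or in the number of processes spawned.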
  • namsta, over 5 years ago
    Does piping also use a temporary file? If not, why doesn't <<< use the same mechanism as piping?
  • done, over 5 years ago
    Does piping also use a temporary file? No, and I don't know of any condition or shell where it does. *Why doesn't <<< use the same mechanism as piping?* That is a great question for a shell developer, or for a new question. My personal opinion is that it got implemented as a sub-shell the first time around and that concept has worked well enough until now, so why change it? Legacy? You pick.