bash: Iterating over members of a JSON array selected by index

15,164

Solution 1

There's a fair bit of room for improvement. Let's start here:

v1=($(cat $INPUT | jq '."config"[$i]."var1"[]'))

...first, you don't actually need to use cat; it costs you performance, because it forces jq to read from a pipe rather than from your input file directly. Just running jq <"$INPUT" would be more efficient (or, better, <"$input", to avoid all-uppercase names, which by convention are used for environment variables and for variables with meaning to the shell itself).

Second, you need to quote all variable expansions, including the expansion of the input file's name -- otherwise, you'll get bugs whenever your filename contains spaces.
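To see the quoting problem in isolation, here is a minimal sketch. The file name `my input.json` is made up for illustration, and `wc -c` stands in for jq so the demo has no dependencies.

```shell
#!/usr/bin/env bash
# Hypothetical file name containing a space, created in a scratch directory.
tmpdir=$(mktemp -d)
input="$tmpdir/my input.json"
printf '{"config":[]}\n' >"$input"

# Unquoted: the shell splits the name into two words, so wc is handed two
# paths that do not exist and fails.
if wc -c $input >/dev/null 2>&1; then unquoted=ok; else unquoted=failed; fi

# Quoted: the name is passed as a single argument.
if wc -c "$input" >/dev/null 2>&1; then quoted=ok; else quoted=failed; fi

echo "unquoted: $unquoted, quoted: $quoted"
rm -rf "$tmpdir"
```

The same splitting applies to any command that receives the unquoted expansion as an argument, jq included.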

Third, array=( $(stuff) ) splits the output of stuff on all characters in IFS, and expands the results of that splitting as a series of glob expressions (so if the output contains *.txt, and you're running this script in a directory that contains text files, you get the names of those files in your result array). Splitting on newlines only would mean you could correctly parse multi-word strings, and disabling glob expansion is necessary before you can use this technique reliably in the presence of glob characters. One way to do this is to set IFS=$'\n' and run set -f before running this command; another is to redirect the output of your command into a while read loop (shown below).
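A small sketch of both failure modes and the IFS-plus-noglob fix. `emit_lines` is a hypothetical stand-in for `jq -r ...`, and the scratch directory and file names are invented for the demo. It assumes globbing was enabled to begin with, which is the default.

```shell
#!/usr/bin/env bash
# emit_lines is a hypothetical stand-in for `jq -r ...`: two values, one per line.
emit_lines() { printf '%s\n' 'hello world' '*.txt'; }

# Work in a scratch directory holding files that match the glob.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.txt b.txt

# Default behavior: split on spaces, tabs, and newlines, then expand globs.
naive=( $(emit_lines) )
echo "naive: ${#naive[@]} elements: ${naive[*]}"

# Split on newlines only, with globbing off; restore both settings afterwards.
old_ifs=$IFS
IFS=$'\n'
set -f
safe=( $(emit_lines) )
set +f
IFS=$old_ifs
echo "safe: ${#safe[@]} elements"

cd / && rm -rf "$tmpdir"
```

The naive array ends up holding the two words and the two matching file names, while the safe one keeps the two original lines intact.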

Fourth, string substitution into code is bad practice in any language -- that way lies (local equivalents to) Bobby Tables, allowing someone who's supposed to be able to only change the data passed into your process to provide content which is processed as executable code (albeit, in this case, as a jq script, which is less dangerous than arbitrary code execution in a more full-featured language; still, this can allow extra data to be added to the output).
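To make that concrete, here is a sketch with made-up data. The `secret` field and the hostile value of `i` are both invented for illustration. The first call splices the value into the program text; the second passes it as data with --arg and converts it with tonumber.

```shell
#!/usr/bin/env bash
# Made-up data: "secret" is a field the caller is not supposed to see.
json='{"config":[{"var1":["a"]},{"var1":["b"]}],"secret":"s3cr3t"}'

# A hostile "index" that closes the bracket and adds its own filter.
i='0].var1[], .secret, .config[0'

# Interpolated into the program text, the value becomes jq code.
injected=$(jq -r ".config[$i].var1[]" <<<"$json")
echo "interpolated output:"
echo "$injected"

# Passed with --arg, the value is only data; tonumber rejects it and jq fails.
if safe=$(jq -r --arg i "$i" '.config[$i | tonumber].var1[]' <<<"$json" 2>/dev/null); then
  echo "unexpected success: $safe"
else
  echo "jq rejected the non-numeric index"
fi
```

With interpolation the secret leaks into the output; with --arg the worst case is a failed query.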

Next, once you're getting jq to emit newline-separated content, you don't need to read it into an array at all: You can iterate over the content as it's written from jq and read into your shell, thus preventing the shell from needing to allocate memory to buffer that content:

while IFS= read -r; do
  echo "read content from jq: $REPLY"
done < <(jq -r --arg i "$i" '.config[$i | tonumber].var1[]' <"$input")
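Here is the same loop run end to end against the sample data from the question, as a sketch. The JSON is held in a variable rather than a file purely to keep the example self-contained, and the key is `config-vars` as in that sample.

```shell
#!/usr/bin/env bash
# Sample data copied from the question, held in a variable instead of a file.
json='{"config-vars":[{"var1":["v1","v2"],"var2":""},{"var1":["v3",""],"var2":"v4"}]}'
i=1   # index of the record to read

count=0
while IFS= read -r; do
  count=$((count + 1))
  printf 'item %d: [%s]\n' "$count" "$REPLY"
done < <(jq -r --arg i "$i" '.["config-vars"][$i | tonumber].var1[]' <<<"$json")
```

Because the loop reads from a process substitution rather than a pipeline, it runs in the current shell, so `count` is still set afterwards. The empty string in the second record arrives as an empty line and is kept.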

Finally -- let's say you do want to work with an array. There are two ways to do this that avoid pitfalls. One is to set IFS explicitly and disable glob expansion before the assignment:

IFS=$'\n' # split only on newlines
set -f
result=( $(jq -r ... <"$input") )

The other is to assign to your array with a loop:

result=( )
while IFS= read -r; do
  result+=( "$REPLY" )
done < <(jq -r ... <"$input")

...or, as suggested by @JohnKugelman, to use read -a to read the whole array in one operation:

IFS=$'\n' read -r -d '' -a result < <(jq -r ... <"$input")
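One caveat with the read -a form, shown as a sketch: newline counts as IFS whitespace, so empty lines are collapsed away. If empty values matter (the sample data has one), bash 4.0 and later offer mapfile, which is not part of the original answer but keeps one element per line. `emit_lines` is a hypothetical stand-in for `jq -r ...`.

```shell
#!/usr/bin/env bash
# emit_lines is a hypothetical stand-in for `jq -r ...` whose second value is empty.
emit_lines() { printf '%s\n' 'v3' '' 'v4'; }

# read -a treats newline as IFS whitespace, so the empty line is dropped.
# read also returns nonzero here because it hits EOF before a NUL delimiter,
# hence the `|| true`.
IFS=$'\n' read -r -d '' -a via_read < <(emit_lines) || true

# mapfile (bash 4.0 and later) keeps one element per line, empty or not.
mapfile -t via_mapfile < <(emit_lines)

echo "read -a: ${#via_read[@]} elements; mapfile: ${#via_mapfile[@]} elements"
```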

Solution 2

Variables aren't interpolated inside single quotes. Use double quotes instead, and remove the existing quotes.

v1=($(cat $INPUT | jq ".config[$i].var1[]"))

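Or use the --arg option and then you can stick with single quotes. Since --arg always passes its value as a string, convert it with tonumber before using it as an array index.

v1=($(cat $INPUT | jq --arg i "$i" '.config[$i | tonumber].var1[]'))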
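As the comments note, --arg always hands jq a string, and a string cannot index an array. One alternative, sketched here with made-up data, is --argjson (available in jq 1.5 and later), which parses the value as JSON so the index arrives as a number.

```shell
#!/usr/bin/env bash
# Made-up data using the "config" key from the examples above.
json='{"config":[{"var1":["a","b"]},{"var1":["c","d"]}]}'
i=1

# --argjson parses its value as JSON, so $i is the number 1, not the string "1".
out=$(jq -r --argjson i "$i" '.config[$i].var1[]' <<<"$json")
echo "$out"
```

If `i` does not hold valid JSON, jq refuses to run the query instead of treating the value as code.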

You could also fix the useless use of cat:

v1=($(jq ".config[$i].var1[]" "$INPUT"))

Also, see @CharlesDuffy's answer for a great, detailed explanation of why assigning to array like this is unsafe.

Solution 3

If you have already stored some JSON (an array, in this example) in a variable called MY_VAR:

while IFS= read -r; do
  echo "$REPLY"
done < <(echo "$MY_VAR" | jq -r '.[]')

Solution 4

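jq is capable of extracting the structure in one go, so the entire loop is superfluous. If the input JSON contains more records than you have values in nvars, truncate the output with head. Since var1 is an array in the sample data, use -c so that each record is printed on a single line; otherwise head would count lines of pretty-printed JSON rather than records.

jq -c '."config-vars"[]."var1"' "$INPUT" |
head -n "${#nvars[@]}"  # If you need just the #nvars first values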
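A sketch of this approach against the sample data from the question. The one-element `nvars` array is invented for the demo, and -c is used so that each var1 array occupies exactly one output line.

```shell
#!/usr/bin/env bash
# Sample data copied from the question.
json='{"config-vars":[{"var1":["v1","v2"],"var2":""},{"var1":["v3",""],"var2":"v4"}]}'
nvars=(x)   # hypothetical: only the first record is wanted

# -c prints each var1 array on one line, so head counts records rather than
# lines of pretty-printed JSON.
records=$(jq -c '."config-vars"[].var1' <<<"$json" | head -n "${#nvars[@]}")
echo "$records"
```

Each line of the result is itself valid JSON, so it can be fed back to jq if the individual values are needed.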
Author by

odieatla

UI Developer for now.

Updated on July 13, 2022

Comments

  • odieatla
    odieatla almost 2 years

    I'm using jq to parse a JSON file, extracting each JSON array in a series into a shell array.

    My current code looks like the following:

    for ((i = 0; i < ${#nvars[@]}; i++)); do
        v1=($(cat $INPUT | jq '."config"[i]."var1"[]'))
        echo $v1
    done
    

    error message:

    error: i is not defined
    

    I also replaced

    v1=($(cat $INPUT | jq '."config"[i]."var1"[]'))
    

    with

    v1=($(cat $INPUT | jq '."config"[$i]."var1"[]'))
    

    It's still not working. Any ideas? Any help is appreciated!


    Edit: Sample Input Data

    {
        "config-vars":[
            {
                "var1":["v1","v2"],
                "var2":""
            },
            {
                "var1":["v3",""],
                "var2":"v4"
            }
        ]
    }
    
  • odieatla
    odieatla over 9 years
    Thanks for your help. One more question, what should I do if I have 'config-vars' instead of 'config'? I suppose that's the reason why single quote was used in the first place. Thanks very much!
  • John Kugelman
    John Kugelman over 9 years
    Try: .[\"config-vars\"][$i].
  • Charles Duffy
    Charles Duffy over 9 years
    This isn't a bad approach, but jq can be told to be more helpful -- it has a mode where output is newline-delimited, allowing more robust operation than splitting on any IFS character.
  • Charles Duffy
    Charles Duffy over 9 years
    Props for amending to suggest use of --arg; I was just commenting on that, and saw it was in.
  • Charles Duffy
    Charles Duffy over 9 years
    Another gotcha here -- if jq emits *, it'll be replaced with a list of filenames in the local directory in the resulting array.
  • odieatla
    odieatla over 9 years
    Hello @CharlesDuffy , I tried nvars=($(jq --arg i "$i" '."config-vars"[$i]."var1"[]' "$INPUT")) but getting error: jq: error: Cannot index array with string. Am I doing something wrong? Thanks!
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla, what's the value of i you're passing?
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla, ...also, it would be very helpful if your question contained sample data in the same format as what your query needs to work with.
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla, ...also, using -r (--raw-output) is a good idea; otherwise, jq's output will contain characters like quotes that the shell won't correctly parse during word-splitting, but will instead treat as literal data.
  • John Kugelman
    John Kugelman over 9 years
    Great answer. Perhaps the loop could be eliminated by using read -a?
  • Charles Duffy
    Charles Duffy over 9 years
    @JohnKugelman, hmm; I'd expect IFS=$'\n' read -r -d '' -a result to work off the top of my head, but let me actually test it...
  • Charles Duffy
    Charles Duffy over 9 years
    @JohnKugelman, ...tested and amended into the answer; thanks!
  • odieatla
    odieatla over 9 years
    @CharlesDuffy, i is just the index of the array; it should be an integer. I've added sample data to the question. Thank you!
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla, heh -- looks like jq assumes that all input via --arg is a string; that does indeed pose a problem for trying to use --arg here. I'm looking into the availability of workarounds.
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla, found a solution -- documented in github.com/stedolan/jq/issues/605, and now amended in my answer; using $i | tonumber inside the jq query will cast the value to an integer.
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla, thus: jq -r --arg i 1 '.["config-vars"][$i | tonumber].var1[]' <test.json works with the data you provided in your question.
  • odieatla
    odieatla over 9 years
    @CharlesDuffy , the tonumber works! Thanks for saving my day!
  • Charles Duffy
    Charles Duffy over 9 years
    @odieatla... btw, it turns out that I made a mistake on the read -a form in the answers (the one in the comments was fine); apologies if you were trying to use that, now corrected.
  • sasikala
    sasikala over 2 years
    How can I get multiple values into variables here, like read v1 v2=($(jq ".config[$i].var1[] .confg[$i].var2" "$INPUT"))?