Parse JSON using Python?

Solution 1

If you run:

$ cat members.json | \
    python -c 'import json,sys;obj=json.load(sys.stdin);print(obj)'

you can inspect the structure of the nested dictionary obj and see that your original line should read:

$ cat members.json | \
    python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["hits"]["hits"][0]["_source"]["'$1'"])'

to get to the "memberId" element. This way you can keep the Python as a one-liner.

If there are multiple elements in the nested "hits" element, then you can do something like:

$ cat members.json | \
python -c '
import json, sys
obj=json.load(sys.stdin)
for y in [x["_source"]["'$1'"] for x in obj["hits"]["hits"]]:
    print(y)
'

Chris Down's solution is better for finding the single value of a (unique) key at any level.

With my second example, which prints multiple values, you are hitting the limits of what you should attempt in a one-liner. At that point I see little reason to do half of the processing in bash, and would move to a complete Python solution.
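
A complete Python solution along those lines might look like the sketch below. The script name member_ids.py, the function name extract_field and the optional FIELD argument are assumptions made for illustration; only the obj["hits"]["hits"][i]["_source"] layout is taken from members.json.

```python
#!/usr/bin/env python3
"""Print one field from every hit in an Elasticsearch-style JSON response."""
import json
import sys


def extract_field(doc, field):
    """Return doc["hits"]["hits"][i]["_source"][field] for every hit i."""
    return [hit["_source"][field] for hit in doc["hits"]["hits"]]


def main(argv):
    # Hypothetical command line: member_ids.py FILE [FIELD]
    if not 1 <= len(argv) <= 2:
        print("usage: member_ids.py FILE [FIELD]", file=sys.stderr)
        return 2
    field = argv[1] if len(argv) == 2 else "memberId"
    with open(argv[0]) as handle:
        doc = json.load(handle)
    for value in extract_field(doc, field):
        print(value)
    return 0


# Only run the command-line entry point when a file argument was given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved under that hypothetical name, it would be called as:

$ python3 member_ids.py members.json

Because the field name arrives as an ordinary argument, nothing from the shell is pasted into the Python source.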

Solution 2

Another way to do this in bash is using jshon. Here is a solution to your problem using jshon:

$ jshon -e hits -e hits -a -e _source -e memberId -u < foo.json
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

Each -e option extracts the value for the given key from the JSON, -a iterates over the elements of the array, and -u decodes the final string.

Solution 3

Well, your key is quite clearly not at the root of the object. Try something like this:

json_key() {
    python -c '
import json
import sys

data = json.load(sys.stdin)

for key in sys.argv[1:]:
    try:
        data = data[key]
    except TypeError:  # This is a list index
        data = data[int(key)]

print(data)' "$@"
}

This has the benefit of not simply injecting shell text into the Python source, which could cause breakage (or worse, arbitrary code execution).

You can then call it like this:

json_key hits hits 0 _source memberId < members.json
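
For use inside a larger Python program, the same traversal could be written as an ordinary function. This is only a sketch: the function signature and the trimmed-down sample document are assumptions for illustration, while the try/except logic mirrors the shell function above.

```python
import json


def json_key(data, *path):
    """Follow path through nested dicts and lists and return what it reaches."""
    for key in path:
        try:
            data = data[key]
        except TypeError:  # data is a list, so treat the key as an index
            data = data[int(key)]
    return data


# Trimmed-down stand-in for members.json, for illustration only.
sample = json.loads(
    '{"hits": {"hits": [{"_source": '
    '{"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}}]}}'
)

print(json_key(sample, "hits", "hits", "0", "_source", "memberId"))
```

As with the shell version, every path element is a string; list indices are converted with int() only after a plain lookup fails, and a missing dictionary key still raises KeyError.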

Solution 4

Another alternative is jq:

$ cat members.json | jq -r '.hits|.hits|.[]|._source|.memberId'
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

Solution 5

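Try this (simplejson is a third-party package; the standard-library json module offers the same loads and dumps calls):

$ cat json.txt | python -c 'import sys; import simplejson as json; \
print("\n".join( [i["_source"]["memberId"] for i in json.loads( sys.stdin.read() )["hits"]["hits"]] ))'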


If you already have pretty printed json, why don't you just grep it?
$ cat json.txt | grep memberId
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",

You can always produce a pretty-printed version with Python's simplejson and then grep it.

# cat json_raw.txt
{"hits": {"hits": [{"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcf", "memberFirstName": "Uri"}, "_index": "2000_270_0"}, {"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcG", "memberFirstName": "Uri"}, "_index": "2000_270_0"}], "total": 74, "max_score": 1}, "_shards": {"successful": 8, "failed": 0, "total": 8}, "took": 670, "timed_out": false}

Use dumps:

# cat json_raw.txt | python -c 'import sys; import simplejson as json; \
print(json.dumps( json.loads( sys.stdin.read() ), sort_keys=True, indent=4))'

{
    "_shards": {
        "failed": 0,
        "successful": 8,
        "total": 8
    },
    "hits": {
        "hits": [
            {
                "_id": "02:17447847049147026174478:174159",
                "_index": "2000_270_0",
                "_score": 1,
                "_source": {
                    "memberFirstName": "Uri",
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
                    "memberLastName": "Dubofsky",
                    "memberMiddleName": "Prayag"
                },
                "_type": "Medical"
            },
            {
                "_id": "02:17447847049147026174478:174159",
                "_index": "2000_270_0",
                "_score": 1,
                "_source": {
                    "memberFirstName": "Uri",
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
                    "memberLastName": "Dubofsky",
                    "memberMiddleName": "Prayag"
                },
                "_type": "Medical"
            }
        ],
        "max_score": 1,
        "total": 74
    },
    "timed_out": false,
    "took": 670
}

After that, simply grep the result for the 'memberId' pattern.

To be completely precise:

#!/bin/bash

# Pretty-print the JSON file given as $1, then strip the quotes and the
# trailing comma from every memberId line.
filename="$1"
cat "$filename" | python -c 'import sys; import simplejson as json; \
print(json.dumps( json.loads( sys.stdin.read() ), sort_keys=True, indent=4))' | \
grep memberId | awk '{print $2}' | sed -e 's/^"//g' | sed -e 's/",$//g'

Usage:

$ bash bash.sh json_raw.txt 
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
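
The pretty-print-then-filter idea can also be sketched with only the standard-library json module. The helper names pretty and grep_values and the small sample string are assumptions for illustration. Like the grep pipeline, the filter works line by line, so it only handles scalar values that sit on the same line as their key.

```python
import json


def pretty(raw_text):
    """Re-serialize JSON text with sorted keys and 4-space indentation."""
    return json.dumps(json.loads(raw_text), sort_keys=True, indent=4)


def grep_values(pretty_text, key):
    """Mimic grep | awk | sed: collect the values on lines that name key."""
    marker = '"%s": ' % key
    values = []
    for line in pretty_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # Drop the trailing comma, then let json decode the value itself.
            values.append(json.loads(stripped[len(marker):].rstrip(",")))
    return values


# Small stand-in for json_raw.txt, for illustration only.
raw = (
    '{"hits": {"hits": ['
    '{"_source": {"memberLastName": "Dubofsky", '
    '"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}}, '
    '{"_source": {"memberLastName": "Dubofsky", '
    '"memberId": "0x7b93910446f91928e23e1043dfdf5bcG"}}]}}'
)

for value in grep_values(pretty(raw), "memberId"):
    print(value)
```

Since the data is already parsed at this point, indexing the parsed object directly is usually more robust than matching lines; the line filter is shown only because it corresponds to the grep approach.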

Author by

prayagupa

Updated on September 18, 2022

Comments

  • prayagupa
    prayagupa over 1 year

    I have a JSON file members.json as below.

    {
       "took": 670,
       "timed_out": false,
       "_shards": {
          "total": 8,
          "successful": 8,
          "failed": 0
       },
       "hits": {
          "total": 74,
          "max_score": 1,
          "hits": [
             {
                "_index": "2000_270_0",
                "_type": "Medical",
                "_id": "02:17447847049147026174478:174159",
                "_score": 1,
                "_source": {
                   "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
                   "memberFirstName": "Uri",
                   "memberMiddleName": "Prayag",
                   "memberLastName": "Dubofsky"
                }
             }, 
             {
                "_index": "2000_270_0",
                "_type": "Medical",
                "_id": "02:17447847049147026174478:174159",
                "_score": 1,
                "_source": {
                   "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
                   "memberFirstName": "Uri",
                   "memberMiddleName": "Prayag",
                   "memberLastName": "Dubofsky"
                }
             }
          ]
       }
    }
    

    I want to parse it using a bash script and get only the list of memberId values.

    The expected output is:

    memberIds
    ----------- 
    0x7b93910446f91928e23e1043dfdf5bcf
    0x7b93910446f91928e23e1043dfdf5bcG
    

    I tried adding the following bash+python code to .bashrc:

    function getJsonVal() {
       if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
           echo "Usage: getJsonVal 'key' < /tmp/file";
           echo "   -- or -- ";
           echo " cat /tmp/input | getJsonVal 'key'";
           return;
       fi;
       cat | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["'$1'"]';
    }
    

    And then called:

    $ cat members.json | getJsonVal "memberId"
    

    But it throws:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    KeyError: 'memberId'
    

    Reference

    https://stackoverflow.com/a/21595107/432903

    • Admin
      Admin about 10 years
      Why do you need to do this in bash? You are clearly using Python here, so why not just create a Python script that does the job? You might not get actual answers on how to do it with bash, because when you need to do that much you use another language.
    • Admin
      Admin about 10 years
      I changed your title from "using bash script" to "using python" since python, and not bash, is what you are using to parse json. E.g., that error is certainly a python error, not a bash error.
    • Admin
      Admin about 10 years
      @goldilocks just because his attempt used python, doesn't mean his goal is to use python
    • Admin
      Admin about 10 years
      @DavidG see my answer. It's not pure shell, it's an external command but it integrates into shell scripts pretty well.
    • Admin
      Admin about 10 years
      Can I suggest you take out most of the irrelevant fields in the json. It suffices to have 2-3 elements in _source to get the gist of what you try to do. The rest just distracts
    • Admin
      Admin about 10 years
      @jonrdanm I stand corrected, that tool seems simple enough that you don't need to switch to another language. I actually think your answer with jshon is the best since it is meant to be used from the shell.
  • clerksx
    clerksx about 10 years
    Note: This will not loop over each item in "hits". If you want that, you should write specific Python code for that instance.
  • prayagupa
    prayagupa about 10 years
    But it shows only one memberId.
  • prayagupa
    prayagupa about 10 years
    Let me install jshon