Parse JSON using Python?
Solution 1
If you run:
$ cat members.json | \
python -c 'import json,sys;obj=json.load(sys.stdin);print(obj)'
you can inspect the structure of the nested dictionary obj and see that your original line should read:
$ cat members.json | \
python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["hits"]["hits"][0]["_source"]["'$1'"])'
to get to that "memberId" element. This way you can keep the Python as a one-liner.
If there are multiple elements in the nested "hits" element, then you can do something like:
$ cat members.json | \
python -c '
import json, sys
obj = json.load(sys.stdin)
for y in [x["_source"]["'$1'"] for x in obj["hits"]["hits"]]:
    print(y)
'
Chris Down's solution is better for looking up a single value for a (unique) key at any level.
With my second example, which prints multiple values, you are hitting the limits of what you should attempt in a one-liner. At that point I see little reason to do half of the processing in bash, and would move to a complete Python solution.
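A complete Python solution along those lines might look like the sketch below. It assumes the layout shown in the question (hits, then hits, then _source); the names extract_field and print_field and the script name in the comment are illustrative and not part of the original answer.

```python
import json
import sys


def extract_field(doc, field):
    """Return the value of `field` from every hit's "_source" object."""
    # Assumes the layout doc["hits"]["hits"][i]["_source"], as in members.json.
    return [hit["_source"][field] for hit in doc["hits"]["hits"]]


def print_field(stream, field, out=None):
    """Parse JSON from `stream` and write one value of `field` per line."""
    out = sys.stdout if out is None else out
    for value in extract_field(json.load(stream), field):
        out.write("%s\n" % value)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation:
    #   python extract_field.py memberId < members.json
    print_field(sys.stdin, sys.argv[1])
```

Because the field name arrives through sys.argv rather than being spliced into the Python source by the shell, quoting in the key cannot change the code that runs.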
Solution 2
Another way to do this in bash is with jshon. Here is a solution to your problem using jshon:
$ jshon -e hits -e hits -a -e _source -e memberId -u < foo.json
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
The -e options extract values from the JSON, -a iterates over the array, and -u decodes the final string.
Solution 3
Well, your key is quite clearly not at the root of the object. Try something like this:
json_key() {
    python -c '
import json
import sys
data = json.load(sys.stdin)
for key in sys.argv[1:]:
    try:
        data = data[key]
    except TypeError:  # This is a list index
        data = data[int(key)]
print(data)' "$@"
}
This has the benefit of not just simply injecting syntax into Python, which could cause breakage (or worse, arbitrary code execution).
You can then call it like this:
json_key hits hits 0 _source memberId < members.json
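The function above follows exactly one path, so it returns a single element of "hits". One way to extend the same idea to every element is sketched below. The "*" path segment is a hypothetical convention invented for this example (it assumes no real key is literally named "*"), and the name walk is likewise illustrative.

```python
import json
import sys


def walk(data, path):
    """Follow `path` through `data`; a "*" segment fans out over a list."""
    results = [data]
    for key in path:
        next_results = []
        for node in results:
            if key == "*" and isinstance(node, list):
                # Hypothetical wildcard: visit every element of the list.
                next_results.extend(node)
            elif isinstance(node, list):
                # Command-line arguments are strings, so convert list indexes.
                next_results.append(node[int(key)])
            else:
                next_results.append(node[key])
        results = next_results
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation:
    #   python walk.py hits hits '*' _source memberId < members.json
    for value in walk(json.load(sys.stdin), sys.argv[1:]):
        print(value)
```

As with json_key, the keys are passed as arguments and never interpolated into the Python source.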
Solution 4
Another alternative is jq:
$ cat members.json | jq -r '.hits|.hits|.[]|._source|.memberId'
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
Solution 5
Try this:
$ cat json.txt | python -c 'import sys; import simplejson as json; \
print("\n".join([i["_source"]["memberId"] for i in json.loads(sys.stdin.read())["hits"]["hits"]]))'
If you already have pretty-printed JSON, why don't you just grep it?
$ cat json.txt | grep memberId
"memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
"memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
You can always get a pretty-printed format with Python's simplejson and then grep it.
# cat json_raw.txt
{"hits": {"hits": [{"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcf", "memberFirstName": "Uri"}, "_index": "2000_270_0"}, {"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcG", "memberFirstName": "Uri"}, "_index": "2000_270_0"}], "total": 74, "max_score": 1}, "_shards": {"successful": 8, "failed": 0, "total": 8}, "took": 670, "timed_out": false}
Use dumps:
# cat json_raw.txt | python -c 'import sys; import simplejson as json; \
print(json.dumps(json.loads(sys.stdin.read()), sort_keys=True, indent=4))'
{
    "_shards": {
        "failed": 0,
        "successful": 8,
        "total": 8
    },
    "hits": {
        "hits": [
            {
                "_id": "02:17447847049147026174478:174159",
                "_index": "2000_270_0",
                "_score": 1,
                "_source": {
                    "memberFirstName": "Uri",
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
                    "memberLastName": "Dubofsky",
                    "memberMiddleName": "Prayag"
                },
                "_type": "Medical"
            },
            {
                "_id": "02:17447847049147026174478:174159",
                "_index": "2000_270_0",
                "_score": 1,
                "_source": {
                    "memberFirstName": "Uri",
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
                    "memberLastName": "Dubofsky",
                    "memberMiddleName": "Prayag"
                },
                "_type": "Medical"
            }
        ],
        "max_score": 1,
        "total": 74
    },
    "timed_out": false,
    "took": 670
}
Thereafter, simply grep the result for the 'memberId' pattern.
To be completely precise:
#!/bin/bash
filename="$1"
cat "$filename" | python -c 'import sys; import simplejson as json; \
print(json.dumps(json.loads(sys.stdin.read()), sort_keys=True, indent=4))' | \
grep memberId | awk '{print $2}' | sed -e 's/^"//g' | sed -e 's/",$//g'
Usage:
$ bash bash.sh json_raw.txt
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
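The grep pipeline relies on the pretty-printer putting each key on its own line. A sketch that avoids that dependency is to search the parsed structure itself, which works on raw or pretty-printed input alike. It uses the standard-library json module rather than simplejson, and the name find_key is an assumption made for this example.

```python
import json
import sys


def find_key(node, key):
    """Yield every value stored under `key`, at any depth of `node`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the key may also occur deeper down.
            for found in find_key(v, key):
                yield found
    elif isinstance(node, list):
        for item in node:
            for found in find_key(item, key):
                yield found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation:
    #   python find_key.py memberId < json_raw.txt
    for value in find_key(json.load(sys.stdin), sys.argv[1]):
        print(value)
```

Unlike grep, this matches only real keys, so a value that happens to contain the text memberId is not reported.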
prayagupa
Updated on September 18, 2022
Comments
-
prayagupa over 1 year
I have a JSON file members.json as below.
{
    "took": 670,
    "timed_out": false,
    "_shards": {
        "total": 8,
        "successful": 8,
        "failed": 0
    },
    "hits": {
        "total": 74,
        "max_score": 1,
        "hits": [
            {
                "_index": "2000_270_0",
                "_type": "Medical",
                "_id": "02:17447847049147026174478:174159",
                "_score": 1,
                "_source": {
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
                    "memberFirstName": "Uri",
                    "memberMiddleName": "Prayag",
                    "memberLastName": "Dubofsky"
                }
            },
            {
                "_index": "2000_270_0",
                "_type": "Medical",
                "_id": "02:17447847049147026174478:174159",
                "_score": 1,
                "_source": {
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
                    "memberFirstName": "Uri",
                    "memberMiddleName": "Prayag",
                    "memberLastName": "Dubofsky"
                }
            }
        ]
    }
}
I want to parse it using a bash script to get only the list of memberId fields. The expected output is:
memberIds
-----------
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
I tried adding the following bash+python code to .bashrc:
function getJsonVal() {
    if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
        echo "Usage: getJsonVal 'key' < /tmp/file";
        echo " -- or -- ";
        echo " cat /tmp/input | getJsonVal 'key'";
        return;
    fi;
    cat | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["'$1'"]';
}
And then called:
$ cat members.json | getJsonVal "memberId"
But it throws:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'memberId'
-
Admin about 10 years
Why do you need to do this in bash? You are clearly using python here, so why not just create a python script that does the job? You might not get actual answers on how to do it with bash, because when you need to do that much you use another language.
-
Admin about 10 years
I changed your title from "using bash script" to "using python" since python, and not bash, is what you are using to parse json. E.g., that error is certainly a python error, not a bash error.
-
Admin about 10 years
@goldilocks just because his attempt used python, doesn't mean his goal is to use python
-
Admin about 10 years
@DavidG see my answer. It's not pure shell, it's an external command, but it integrates into shell scripts pretty well.
-
Admin about 10 years
Can I suggest you take out most of the irrelevant fields in the json? It suffices to have 2-3 elements in _source to get the gist of what you are trying to do. The rest just distracts.
-
Admin about 10 years
@jonrdanm I stand corrected, that tool seems simple enough that you don't need to switch to another language. I actually think your answer with jshon is the best, since it is meant to be used from the shell.
-
-
clerksx about 10 years
Note: This will not loop over each item in "hits". If you want that, you should write specific Python code for that instance.
-
prayagupa about 10 years
But it shows only one memberId.
-
prayagupa about 10 years
Let me install jshon