script to translate xml to json
Solution 1
in fact, you could get away here even w/o Python programming, just using 2 unix utilities:
Thus, assuming your xml is in file.xml
, jtm would convert it to the following json:
bash $ jtm file.xml
[
{
"quiz": [
{
"que": "The question her"
},
{
"ca": "text"
},
{
"ia": "text"
},
{
"ia": "text"
},
{
"ia": "text"
}
]
}
]
bash $
and then, applying a series of JSON transformations, you can arrive to a desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo { '"answer[-]"': {} }\; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
{
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
}
]
bash $
Though, due to involved shell scripting (echo
command), it'll be slower than Python's - for 5000 questions, I'd expect it would run around a minute. (In the future version of the jtc
I plan to allow interpolations even in statically specified JSONs, so that for templating no external shell-scripting would be required, then the operations will be blazing fast)
if you're curious about jtc
syntax, you could find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md
Solution 2
I assume your Ubuntu has python installed
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out = {}
i = 1
for child in root:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
out[name] = value
print(json.dumps(out))
save it and chmod
to executable
you can easily modify to take a file as input instead of just text
EDIT Okey, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp = {}
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out = []
parse_root(root, out)
print(json.dumps(out))
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
I made a file with following, which I named xmltest
:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz
inside some other container.
Now, I launch it like this:
$ chmod u+x scratch.py
, then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
[{"answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text"}, {"answer2": "stuff", "question": "Question number 1", "answer1": "blabla"}]
Solution 3
The xq
tool from https://kislyuk.github.io/yq/ turns your XML into
{
"quiz": {
"que": "The question her",
"ca": "text",
"ia": [
"text",
"text",
"text"
]
}
}
by just using the identity filter (xq . file.xml
).
We can massage this into something closer the form you want using
xq '.quiz | { text: .que, answers: .ia }' file.xml
which outputs
{
"text": "The question her",
"answers": [
"text",
"text",
"text"
]
}
To fix up the answers
bit so that you get your enumerated keys:
xq '.quiz |
{ text: .que } +
(
[
range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] }
] | from_entries
)' file.xml
This adds the enumerated answer
keys with the values from the ia
nodes by iterating over the ia
nodes and manually producing a set of keys and values. These are then turned into real key-value pairs using from_entries
and added to the proto-object we've created ({ text: .que }
).
The output:
{
"text": "The question her",
"answer1": "text",
"answer2": "text",
"answer3": "text"
}
If your XML document contains multiple quiz
nodes under some root node, then change .quiz
in the jq
expression above to .[].quiz[]
to transform each of them, and you may want to put the resulting objects into an array:
xq '.[].quiz[] |
[ { text: .que } +
(
[
range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] }
] | from_entries
) ]' file.xml
Solution 4
Here ruby answer is.
script xml2json.rb
:
require 'nokogiri'
require 'json'
in_doc = File.open(ARGV[0]) do |f|
Nokogiri::XML(f)
end
for quiz in in_doc.xpath('/contents/quiz') do
result = {
text: quiz.xpath('que').text,
answer1: quiz.xpath('ca').text
}
quiz.xpath('ia').each_with_index do |ia, ix|
result["answer#{ix + 2}"] = ia.text
end
print result.to_json, ",\n"
end
input quiz.xml
:
<contents>
<quiz>
<que>The question her</que>
<ca>text a1</ca>
<ia>text a2</ia>
<ia>text a3</ia>
<ia>text a4</ia>
</quiz>
<quiz>
<que>The question foo</que>
<ca>text b1</ca>
<ia>text b2</ia>
<ia>text b3</ia>
<ia>text b4</ia>
</quiz>
</contents>
execution:
$ ./xml2json.rb quiz.xml
Related videos on Youtube
Silver
Updated on September 18, 2022Comments
-
Silver over 1 year
I have 5000 questions in txt file like this:
<quiz> <que>The question her</que> <ca>text</ca> <ia>text</ia> <ia>text</ia> <ia>text</ia> </quiz>
I want to write a script in Ubuntu to convert all questions like this:
{ "text":"The question her", "answer1":"text", "answer2":"text", "answer3":"text", "answer4":"text" },
-
dgan about 5 yearsthat's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
-
Silver about 5 yearsyes it is - Ark
-
Stéphane Chazelas about 5 yearsWhat kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the
<quiz>
and</quiz>
tags always on their own on a line. Could the text have some encoding (CDATA
,{
,é
....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8? -
Silver about 5 yearspatch = small file contain code and work by double click in ubuntu.
-
Silver about 5 yearssorry I mean script not patch- thank you Ark.
-
roaima about 5 yearsIf it's XML and you have 5000 questions, there must be a higher level element in the source (
<quizzes/>
, for example?) and a corresponding[ ... ]
in the output JSON. Please show that in your question.
-
-
Stéphane Chazelas about 5 yearsThat only works if the input has just one
<quiz>
-
Victor Yarema about 4 yearsYour solution looks really good. I didn't have time to check your concern regarding performance. But I'm curious: isn't it possible to replace multiple
echo
invocations with piping throughsed
?