Creating bash script to parse xml file to csv
Solution 1
You've posted a query similar to your previous one. I'd again suggest using an XML parser. You could say:
xmlstarlet sel -t -m //List/Job -v @name -o "|" -v @id -n file.xml
It would return
John|1
Zack|2
Bob|3
for your sample data.
Pipe the output to
sed "s/|/\t| /"
if you want it to appear as in your example.
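As a rough sketch of the formatting step without sed, the "name|id" rows can also be reshaped in plain bash. The function name format_rows and the output file jobs.csv are made up for illustration; the demo feeds in the three rows shown above rather than calling xmlstarlet.

```shell
#!/bin/bash
# format_rows (hypothetical helper): read "name|id" rows from stdin
# and print them as "name | id".
format_rows() {
  local name id
  while IFS='|' read -r name id; do
    printf '%s | %s\n' "$name" "$id"
  done
}

# Demo on the rows from the answer above.
printf 'John|1\nZack|2\nBob|3\n' | format_rows

# With xmlstarlet installed, the same helper could sit at the end of the
# pipeline (jobs.csv is an assumed output name):
#   xmlstarlet sel -t -m //List/Job -v @name -o "|" -v @id -n file.xml | format_rows > jobs.csv
```

Splitting on the "|" with read keeps any spaces inside a name intact, which a whitespace-based split would not.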
Solution 2
Extending xmlstarlet approach:
Given this xml file as input:
<DATA>
<RECORD>
<NAME>John</NAME>
<SURNAME>Smith</SURNAME>
<CONTACTS>
"Smith" LTD,
London, Mtg Str, 12,
UK
</CONTACTS>
</RECORD>
</DATA>
And this script:
xmlstarlet sel -e utf-8 -t \
-o "NAME, SURNAME, CONTACTS" -n \
-m //DATA/RECORD \
-o "\"" \
-v $"str:replace(normalize-space(NAME), '\"', '\"\"')" -o "\",\"" \
-v $"str:replace(normalize-space(SURNAME), '\"', '\"\"')" -o "\",\"" \
-v $"str:replace(normalize-space(CONTACTS), '\"', '\"\"')" \
-o "\"" \
-n file.xml
You'll have the following output:
NAME, SURNAME, CONTACTS
"John","Smith","""Smith"" LTD, London, Mtg Str, 12, UK"
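The quoting rule the script relies on (wrap each field in double quotes and double any embedded double quote) can be sketched in plain bash as well. The names csv_quote and csv_row are hypothetical helpers, not part of xmlstarlet, and whitespace normalization is assumed to have happened already, as normalize-space does in the XPath above.

```shell
#!/bin/bash
# csv_quote (hypothetical name): wrap one field in double quotes and
# double every embedded double quote.
csv_quote() {
  local q='"'
  printf '"%s"' "${1//$q/$q$q}"
}

# csv_row (hypothetical name): quote each argument and join them with commas.
csv_row() {
  local out="" field
  for field in "$@"; do
    out+="${out:+,}$(csv_quote "$field")"
  done
  printf '%s\n' "$out"
}

# Demo with the record from the sample file (whitespace already collapsed).
csv_row 'John' 'Smith' '"Smith" LTD, London, Mtg Str, 12, UK'
```

Keeping the quote character in a variable avoids backslash escapes inside the parameter expansion, which are easy to get wrong.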
Solution 3
Try something like this
#!/bin/bash
while read -r line; do
[[ $line =~ "name=\""(.*)"\"" ]] && name="${BASH_REMATCH[1]}" && [[ $line =~ "Job id=\""([^\"]+) ]] && echo "$name | ${BASH_REMATCH[1]}"
done < file
The line with John is malformed. With it fixed, example output:
John | 1
Zack | 2
Bob | 3
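For readers who find the one-liner dense, here is an annotated sketch of the same idea. The [[ string =~ regex ]] test stores the text matched by the first parenthesized group in BASH_REMATCH[1]. The variable names name_re and id_re and the function extract_jobs are assumptions for illustration, and the name pattern stops at a slash so that it also copes with the malformed John line.

```shell
#!/bin/bash
# Patterns kept in variables so the quoting stays readable.
name_re='name="([^"/]*)'   # name: everything up to a closing quote or a slash
id_re='id="([0-9]+)"'      # id: one or more digits inside the quotes

# extract_jobs (hypothetical name): read XML lines from stdin and
# print "name | id" for each line that carries both attributes.
extract_jobs() {
  local line name
  while IFS= read -r line; do
    [[ $line =~ $name_re ]] || continue   # skip lines without name="..."
    name=${BASH_REMATCH[1]}               # save it before the next match overwrites it
    [[ $line =~ $id_re ]] || continue     # skip lines without id="..."
    printf '%s | %s\n' "$name" "${BASH_REMATCH[1]}"
  done
}

# Demo on the sample from the question, malformed John line included.
extract_jobs <<'EOF'
<List>
<Job id="1" name="John/>
<Job id="2" name="Zack"/>
<Job id="3" name="Bob"/>
</List>
EOF
```

Like the original loop, this matches line by line, so it only works while each Job element sits on its own line; a real XML parser does not have that limitation.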
Solution 4
Using sed
sed -nr 's/.*id=\"([0-9]*)\"[^\"]*\"(\w*).*/\2 | \1/p' file
Additionally, based on BroSlow's script, I merged the two matches into one:
#!/bin/bash
while read -r line; do
[[ $line =~ id=\"([0-9]+).*name=\"([^\"|/]*) ]] && echo "${BASH_REMATCH[2]} | ${BASH_REMATCH[1]}"
done < file
Author: user3259914
Updated on June 04, 2022
Comments
-
user3259914 almost 2 years
I'm trying to create a bash script to parse an xml file and save it to a csv file.
For example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<List>
<Job id="1" name="John/>
<Job id="2" name="Zack"/>
<Job id="3" name="Bob"/>
</List>
I would like the script to save information into a csv file as such:
John | 1
Zack | 2
Bob | 3
The name and the id should each go in a separate cell.
Is there any way I can do this?
-
Reinstate Monica Please over 10 years
You might have just edited the old question (stackoverflow.com/q/21495533/3076724) rather than posting a new one, but you should definitely at least link to it when posting similar questions.
-
Vanuan over 6 years
Duplicate: stackoverflow.com/questions/14368347/…
-
BMW over 10 years
In this instance, name="John/>, there is no double quote after John, so I recommend replacing [[ $line =~ "name=\""(.*)"\"" ]] with [[ $line =~ "name=\""([^\"|/]*) ]]
-
Reinstate Monica Please over 10 years
@BMW Thanks. I assumed it shouldn't be malformed XML, but if it is, you could do that or something like ([A-Za-z]*)
-
Dominik about 8 years
Dude, can you elaborate on that short script? I am quite confused. :) Nevertheless, it's looking crazy good.
-
Diego1974 over 4 years
This is a good and elegant solution. I just got: compilation error: element with-param XSLT-with-param: Failed to compile select expression 'str:replace' because of an unclosed parenthesis in the normalize-space call; it should read "str:replace(normalize-space(NAME) , '\"', '\"\"')"
-
Neek about 2 years
Thanks for this. Anyone else extracting URLs from XML may find the & isn't escaped. Fix this by adding -T after the sel command, e.g. xmlstarlet sel -T -e utf-8...... (see stackoverflow.com/questions/46255304/…)