Converting CSV to JSON in bash
Solution 1
The right tool for this job is jq.
jq -Rsn '
{"occurrences":
[inputs
| . / "\n"
| (.[] | select(length > 0) | . / ";") as $input
| {"position": [$input[0], $input[1]], "taxo": {"espece": $input[2]}}]}
' <se.csv
emits, given your input:
{
"occurences": [
{
"position": [
"-21.3214077",
"55.4851413"
],
"taxo": {
"espece": "Ruizia cordata"
}
},
{
"position": [
"-21.3213078",
"55.4849803"
],
"taxo": {
"espece": "Cossinia pinnata"
}
}
]
}
By the way, a less-buggy version of your original script might look like:
#!/usr/bin/env bash
items=( )
while IFS=';' read -r lat long pos _; do
printf -v item '{ "position": [%s, %s], "taxo": {"espece": "%s"}}' "$lat" "$long" "$pos"
items+=( "$item" )
done <se.csv
IFS=','
printf '{"occurrences": [%s]}\n' "${items[*]}"
Note:
- There's absolutely no point using `cat` to pipe into a loop (and good reasons not to); thus, we're using a redirection (`<`) to open the file directly as the loop's stdin.
- `read` can be passed a list of destination variables; there's thus no need to read into an array (or first to read into a string, and then to generate a herestring and to read from that into an array). The `_` at the end ensures that extra columns are discarded (by putting them into the dummy variable named `_`) rather than appended to `pos`.
- `"${array[*]}"` generates a string by concatenating the elements of `array` with the character in `IFS`; we can thus use this to ensure that commas are present in the output only when they're needed.
- `printf` is used in preference to `echo`, as advised in the APPLICATION USAGE section of the specification for `echo` itself.
- This is still inherently buggy, since it's generating JSON via string concatenation. Don't use it.
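To see why generating JSON by string concatenation is buggy, here's a quick illustration (in Python, since its `json` module is in the standard library; the species name is made up to contain an embedded quote):

```python
import json

# A field value containing a character that breaks naive concatenation.
species = 'Coeur de "boeuf"'

# Naive concatenation leaves the inner quotes unescaped -> invalid JSON.
naive = '{"espece": "' + species + '"}'

# A JSON serializer escapes them, so the output is always valid.
safe = json.dumps({"espece": species})
print(safe)  # {"espece": "Coeur de \"boeuf\""}
```

Any field containing a quote, backslash, or newline breaks the concatenated version, which is why the jq solution above is preferable.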
Solution 2
Here's a python one-liner/script that'll do the trick:
cat my.csv | python -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))'
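That one-liner emits flat objects keyed by the CSV header row. As a sketch of the same stdlib approach adapted to the OP's headerless, semicolon-delimited data and target shape (the two sample rows are embedded here so the snippet is self-contained):

```python
import csv
import io
import json

# Sample rows from the question: semicolon-delimited, no header row.
data = io.StringIO(
    "-21.3214077;55.4851413;Ruizia cordata\n"
    "-21.3213078;55.4849803;Cossinia pinnata\n"
)

# csv.reader handles the splitting; float() keeps positions numeric.
occurrences = [
    {"position": [float(lat), float(lon)], "taxo": {"espece": espece}}
    for lat, lon, espece in csv.reader(data, delimiter=";")
]
print(json.dumps({"occurrences": occurrences}, indent=2))
```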
Solution 3
The accepted answer uses jq to parse the input. This works, but jq doesn't handle escapes, i.e. input from a CSV produced by Excel or similar tools, where fields are quoted like this:
foo,"bar,baz",gaz
This will produce incorrect output, as jq will see 4 fields, not 3.
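A quick illustration of the difference, using Python's standard csv module:

```python
import csv
import io

line = 'foo,"bar,baz",gaz'

# A naive split on commas (what splitting in jq amounts to) sees four fields...
print(line.split(","))  # ['foo', '"bar', 'baz"', 'gaz']

# ...while a real CSV parser honours the quoting and sees three.
fields = next(csv.reader(io.StringIO(line)))
print(fields)  # ['foo', 'bar,baz', 'gaz']
```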
One option is to use tab-separated values instead of comma (as long as your input data doesn't contain tabs!), along with the accepted answer.
Another option is to combine your tools, using the best tool for each part: a CSV parser for reading the input and turning it into JSON, and jq for transforming the JSON into the target format.
The Python-based csvkit will intelligently parse the CSV, and comes with a tool, csvjson, that does a much better job of turning the CSV into JSON. This can then be piped through jq to convert the flat JSON output by csvkit into the target form.
With the data provided by the OP, producing the desired output is as simple as:
csvjson --no-header-row se.csv |
jq '.[] | {occurrences: [{ position: [.a, .b], taxo: {espece: .c}}]}'
Note that csvjson automatically detects `;` as the delimiter and, since the input has no header row, assigns the JSON keys `a`, `b`, and `c`.
The same also applies to writing CSV files -- csvkit can read a JSON array or newline-delimited JSON and intelligently output a CSV via in2csv.
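If you'd rather avoid the extra dependency for the reverse direction, a rough stdlib-only sketch (the field names here are made up for illustration):

```python
import csv
import io
import json

# A small JSON array of records, as in2csv would accept.
records = json.loads(
    '[{"espece": "Ruizia cordata", "lat": -21.3214077},'
    ' {"espece": "Cossinia pinnata", "lat": -21.3213078}]'
)

# DictWriter emits a header row, then one CSV row per record,
# quoting fields only when necessary.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["espece", "lat"])
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```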
Solution 4
Here is an article on the subject: https://infiniteundo.com/post/99336704013/convert-csv-to-json-with-jq
It also uses jq, but with a slightly different approach, using split() and map().
jq --slurp --raw-input \
'split("\n") | .[1:] | map(split(";")) |
map({
"position": [.[0], .[1]],
"taxo": {
"espece": .[2]
}
})' \
input.csv > output.json
It doesn't handle separator escaping, though.
Solution 5
John Kerl's Miller tool has this built-in:
mlr --c2j --jlistwrap cat INPUT.csv > OUTPUT.json
HydrUra
Updated on December 02, 2021

Comments
-
HydrUra over 2 years
Trying to convert a CSV file into JSON.
Here are two sample lines:
-21.3214077;55.4851413;Ruizia cordata
-21.3213078;55.4849803;Cossinia pinnata
I would like to get something like:
"occurrences": [
  {
    "position": [-21.3214077, 55.4851413],
    "taxo": {
      "espece": "Ruizia cordata"
    },
    ...
  }
]
Here is my script:
echo '"occurences": [ '
cat se.csv | while read -r line
do
    IFS=';' read -r -a array <<< $line;
    echo -n -e '{ "position": [' ${array[0]}
    echo -n -e ',' ${array[1]} ']'
    echo -e ', "taxo": {"espece":"' ${array[2]} '"'
done
echo "]";
I get really strange results:
"occurences": [ ""position": [ -21.3214077, 55.4851413 ], "taxo": {"espece":" Ruizia cordata ""position": [ -21.3213078, 55.4849803 ], "taxo": {"espece":" Cossinia pinnata
What is wrong with my code?
-
HydrUra almost 7 years: Thx, didn't know jq. But I cannot figure out how to input my CSV. What's $s at the end of your line?
-
Charles Duffy almost 7 years: Oh -- that was reading from a string, not a file. Sorry 'bout that, left it in from testing.
-
Charles Duffy almost 7 years: Actually, I edited that out a while ago -- could you refresh to be sure you're seeing the current version of the answer?
-
Charles Duffy over 5 years: This is a good general approach! Perhaps you might edit to work with the OP's data structure? (Or would a 3rd-party edit doing so be welcome?)
-
febot over 5 years: @CharlesDuffy, I gave it a shot, but not tested - feel free to fix/improve.
-
Charles Duffy over 5 years: Needed some minor tweaks -- changing from ',' to ';' as the separator, changing `".[3]"` to `.[2]`; and `--raw-output` wasn't serving any purpose (it's ignored when output isn't a string).
-
Charles Duffy over 5 years: Also, the `.[1:]` (skipping the first line) is only appropriate if the input has a header; that was true in the blog post, but I'm not sure it's true here.
-
febot over 5 years: @CharlesDuffy, how would you parse the headers and then make the map automatically? Imagine you have different CSV files with different columns and want the JSON object keys derived from the header. Does `jq` have some kind of variables? Or perhaps an extra call to `jq ... .[:1]` to fill a Bash array, somehow?
-
Charles Duffy over 5 years: Yes, jq does have variables.
-
Daniel C. Sobral almost 5 years: This fails if the last field is empty.
-
Daniel C. Sobral almost 5 years: The bug is in `_do_finalize`, in the case where the last character is a delimiter. In that case, instead of saving `.[2]`, it discards it. Replacing it with something like the delimiter handling in `_do_next_value` fixes it.
-
Greg over 4 years: That is some beautiful stuff, thanks... I like the use of inputs there.
-
Tobias J over 4 years: Perfect, thanks! I knew there had to be a better way than parsing CSV with `jq`!
-
btk about 3 years: This should be the accepted solution -- it does the trick for any and all payloads. The jq version is a one-off and requires painstakingly matching the schema.
-
Abraham Labkovsky almost 3 years: I love this idea. I modified it slightly to write to a file...
cat in.csv | python -c 'import csv, json, sys; f = open("out.json", "x"); f.write(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)])); f.close()'
-
Nirmalya over 2 years: I love 'jq' but this is really nice, at least for converting a column-header-carrying CSV to JSON. @richardkmiller
-
K14 over 2 years: To write it to a file just add `| > filename.json` at the end. Like this: `cat my.csv | python -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))' | > my.json`
-
tink over 2 years: @btk, no, it shouldn't ... it doesn't address the need for named entities in the json output at all. It may do the right thing if the CSV has named headers (I don't have the data or time to create it to verify that it would).
-
Fravadona over 2 years: @tink it works only when all headers are present.
-
tink over 2 years: Thanks for proving my point that this shouldn't be the accepted answer then, @Fravadona :)
-
btk over 2 years: @tink imo it's way easier to add headers to the .csv file than futz around with a complicated jq query.
-
cherryblossom about 2 years: I don't think the `dict` is needed and you could just do `list(csv.DictReader(sys.stdin))` instead.
-
Marcos Roberto Silva about 2 years: In my opinion `csvjson` is the best approach for this, since it can also infer the data types rather than treat numbers as strings, avoiding adding double quotes to them.
Chloe Sun about 2 years: @K14 somehow adding the output part gave me a "Broken pipe" error.