Converting CSV to JSON in bash

36,250

Solution 1

The right tool for this job is jq.

jq -Rsn '
  {"occurrences":
    [inputs
     | . / "\n"
     | (.[] | select(length > 0) | . / ";") as $input
     | {"position": [$input[0], $input[1]], "taxo": {"espece": $input[2]}}]}
' <se.csv

emits, given your input:

{
  "occurrences": [
    {
      "position": [
        "-21.3214077",
        "55.4851413"
      ],
      "taxo": {
        "espece": "Ruizia cordata"
      }
    },
    {
      "position": [
        "-21.3213078",
        "55.4849803"
      ],
      "taxo": {
        "espece": "Cossinia pinnata"
      }
    }
  ]
}

By the way, a less-buggy version of your original script might look like:

#!/usr/bin/env bash

items=( )
while IFS=';' read -r lat long pos _; do
  printf -v item '{ "position": [%s, %s], "taxo": {"espece": "%s"}}' "$lat" "$long" "$pos"
  items+=( "$item" )
done <se.csv

IFS=','
printf '{"occurrences": [%s]}\n' "${items[*]}"

Note:

  • There's absolutely no point using cat to pipe into a loop (and good reasons not to); thus, we're using a redirection (<) to open the file directly as the loop's stdin.
  • read can be passed a list of destination variables; there's thus no need to read into an array (or first to read into a string, then to generate a herestring and read from that into an array). The _ at the end ensures that extra columns are discarded (by putting them into the dummy variable named _) rather than appended to pos.
  • "${array[*]}" generates a string by concatenating elements of array with the character in IFS; we can thus use this to ensure that commas are present in the output only when they're needed.
  • printf is used in preference to echo, as advised in the APPLICATION USAGE section of the specification for echo itself.
  • This is still inherently buggy since it's generating JSON via string concatenation. Don't use it.
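To illustrate the last point, here is a minimal Python sketch of why string concatenation is fragile. The species name containing double quotes is a made-up value for illustration, not data from the question; the point is only that a real JSON serializer escapes such values while a format string does not.

```python
import json

# Hypothetical value containing double quotes (not from the question's data).
espece = 'Ruizia "cordata"'

# Naive approach: paste the value into a template, as the printf loop does.
naive = '{"espece": "%s"}' % espece

# Serializer approach: json.dumps escapes the embedded quotes.
safe = json.dumps({"espece": espece})

try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

print("naive output parses as JSON:", naive_is_valid)
print("serialized output round-trips:", json.loads(safe)["espece"] == espece)
```

The same failure happens in the bash version whenever a field contains a double quote or a backslash.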

Solution 2

Here's a python one-liner/script that'll do the trick:

cat my.csv | python -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))'
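The one-liner above expects a header row and emits a flat list of objects. For the data in the question (semicolon-separated, no header), a sketch along the following lines might be closer to the desired output. The function name csv_to_occurrences is made up for illustration, and converting the coordinates to floats is an assumption based on the unquoted numbers in the question's target format.

```python
import csv
import io
import json


def csv_to_occurrences(stream):
    """Build the question's target structure from ';'-separated rows without a header."""
    occurrences = []
    for row in csv.reader(stream, delimiter=";"):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        lat, lon, espece = row[:3]  # any extra columns are ignored
        occurrences.append({
            "position": [float(lat), float(lon)],
            "taxo": {"espece": espece},
        })
    return {"occurrences": occurrences}


# The two sample lines from the question, held in memory instead of se.csv.
sample = io.StringIO(
    "-21.3214077;55.4851413;Ruizia cordata\n"
    "-21.3213078;55.4849803;Cossinia pinnata\n"
)
result = csv_to_occurrences(sample)
print(json.dumps(result, indent=2, ensure_ascii=False))
```

To run it against a real file, pass an open file object (for example open("se.csv", newline="")) instead of the StringIO sample.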

Solution 3

The accepted answer uses jq to parse the input. This works, but splitting on the separator in jq doesn't handle CSV quoting. A CSV file produced by Excel or similar tools may quote fields that contain the separator, so a line like this:

foo,"bar,baz",gaz

will produce incorrect output, as jq will see 4 fields, not 3.
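The difference is easy to see in a small Python sketch that compares a plain split on the separator with a real CSV parser, using the example line above:

```python
import csv
import io

line = 'foo,"bar,baz",gaz'

# Splitting on the separator ignores the quoting and cuts the middle field in two.
naive_fields = line.split(",")

# A CSV parser honours the quotes and keeps "bar,baz" as a single field.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The first line printed shows 4 fields, the second shows 3.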

One option is to use tab-separated values instead of comma (as long as your input data doesn't contain tabs!), along with the accepted answer.

Another option is to combine your tools, and use the best tool for each part: a CSV parser for reading the input and turning it into JSON, and jq for transforming the JSON into the target format.

The python-based csvkit will intelligently parse the CSV, and comes with a tool csvjson which will do a much better job of turning the CSV into JSON. This can then be piped through jq to convert the flat JSON output by csvkit into the target form.

With the data provided by the OP, for the desired output, this is as simple as:

csvjson --no-header-row se.csv |
  jq '{occurrences: map({position: [.a, .b], taxo: {espece: .c}})}'

Note that csvjson automatically detects ; as the delimiter, and without a header row in the input, assigns the json keys as a, b, and c.

The same also applies to writing to CSV files -- csvkit can read a JSON array or new-line delimited JSON, and intelligently output a CSV via in2csv.

Solution 4

Here is an article on the subject: https://infiniteundo.com/post/99336704013/convert-csv-to-json-with-jq

It also uses jq, but takes a slightly different approach based on split() and map().

jq --slurp --raw-input \
   'split("\n") | .[1:] | map(split(";")) |
      map({
         "position": [.[0], .[1]],
         "taxo": {
             "espece": .[2]
          }
      })' \
  input.csv > output.json

It doesn't handle separator escaping, though.

Solution 5

John Kerl's Miller tool has this built-in:

mlr --c2j --jlistwrap cat INPUT.csv > OUTPUT.json
Author: HydrUra

Updated on December 02, 2021

Comments

  • HydrUra
    HydrUra over 2 years

    Trying to convert a CSV file into a JSON

    Here are two sample lines:

    -21.3214077;55.4851413;Ruizia cordata
    -21.3213078;55.4849803;Cossinia pinnata
    

    I would like to get something like :

    "occurrences": [
                     {
                    "position": [-21.3214077, 55.4851413],
                    "taxo": {
                        "espece": "Ruizia cordata"
                     },
                     ...
                 }]
    

    Here is my script :

        echo '"occurences": [ '
    
    cat se.csv | while read -r line
      do
          IFS=';' read -r -a array <<< $line;
          echo -n -e '{ "position": [' ${array[0]}
          echo -n -e ',' ${array[1]} ']'
          echo -e ', "taxo": {"espece":"' ${array[2]} '"'
    done
    echo "]";
    

    I get really strange results :

       "occurences": [ 
     ""position": [ -21.3214077, 55.4851413 ], "taxo": {"espece":" Ruizia cordata
     ""position": [ -21.3213078, 55.4849803 ], "taxo": {"espece":" Cossinia pinnata
    

    What is wrong with my code ?

  • HydrUra
    HydrUra almost 7 years
    Thx, didn't know jq. But I cannot figure out how to input my CSV. What's $s at the end of your line?
  • Charles Duffy
    Charles Duffy almost 7 years
    Oh -- that was reading from a string, not a file. Sorry 'bout that, left it in from testing.
  • Charles Duffy
    Charles Duffy almost 7 years
    actually, I edited that out a while ago -- could you refresh to be sure you're seeing the current version of the answer?
  • Charles Duffy
    Charles Duffy over 5 years
    This is a good general approach! Perhaps you might edit to work with the OP's data structure? (Or would a 3rd-party edit doing so be welcome?)
  • febot
    febot over 5 years
    @CharlesDuffy, I gave it a shot, but not tested - feel free to fix/improve.
  • Charles Duffy
    Charles Duffy over 5 years
    Needed some minor tweaks -- changing from ',' to ';' as the separator, changing ".[3]" to .[2]; and --raw-output wasn't serving any purpose (it's ignored when output isn't a string).
  • Charles Duffy
    Charles Duffy over 5 years
    Also, the .[1:] (skipping the first line) is only appropriate if input has a header; that was true in the blog post, but I'm not sure it's true here.
  • febot
    febot over 5 years
    @CharlesDuffy, how would you parse the headers and then build the map automatically? Imagine you have different CSV files with different columns and want the JSON object keys derived from the header. Does jq have some kind of variables? Or perhaps an extra call to `jq ... .[:1]` to fill a Bash array, somehow?
  • Charles Duffy
    Charles Duffy over 5 years
    Yes, jq does have variables.
  • Daniel C. Sobral
    Daniel C. Sobral almost 5 years
    This fails if the last field is empty.
  • Daniel C. Sobral
    Daniel C. Sobral almost 5 years
    The bug is on _do_finalize, in the case where the last character is a delimiter. In that case, instead of saving .[2], it discards it. Replacing it with something like delimiter on _do_next_value fixes it.
  • Greg
    Greg over 4 years
    that is some beautiful stuff, thanks... i like the use of inputs there.
  • Tobias J
    Tobias J over 4 years
    Perfect, thanks! I knew there had to be a better way than parsing CSV with jq!
  • btk
    btk about 3 years
    This should be the accepted solution -- it does the trick for any and all payloads. The jq version is a one-off and requires painstakingly matching the schema
  • Abraham Labkovsky
    Abraham Labkovsky almost 3 years
    I love this idea. I modified slightly to write to file... cat in.csv | python -c 'import csv, json, sys; f = open("out.json", "x"); f.write(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)])); f.close()'
  • Nirmalya
    Nirmalya over 2 years
    I love 'jq' but this is really nice, at least for converting a column-header carrying CSV to JSON. @richardkmiller
  • K14
    K14 over 2 years
    to write it to a file just add | > filename.json at the end. Like this: cat my.csv | python -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))' | > my.json
  • tink
    tink over 2 years
    @btk, no, it shouldn't ... it doesn't address the need for named entities in the JSON output at all. It may do the right thing if the CSV has named headers (I don't have the data or the time to create it to verify that it would).
  • Fravadona
    Fravadona over 2 years
    @tink it works when all headers are present only.
  • tink
    tink over 2 years
    thanks for proving my point that this shouldn't be the accepted answer then, @Fravadona :)
  • btk
    btk over 2 years
    @tink imo it's way easier to add headers to the .csv file than futz around with a complicated jq query
  • cherryblossom
    cherryblossom about 2 years
    I don't think the dict is needed and you could just do list(csv.DictReader(sys.stdin)) instead.
  • Marcos Roberto Silva
    Marcos Roberto Silva about 2 years
    In my opinion csvjson is the best approach for this, since it can also infer the data types and does not treat numbers as strings, so they are not wrapped in double quotes.
  • Chloe Sun
    Chloe Sun about 2 years
    @K14 somehow adding the output part gave me an error " Broken pipe"