Parse multiline JSON with grok in logstash

Solution 1

I think I found a working answer to my problem. I am not sure if it's a clean solution, but it does parse multiline JSON of the type shown in the question.

input 
{   
    file 
    {
        codec => multiline
        {
            pattern => '^\{'
            negate => true
            what => previous                
        }
        path => ["/opt/mount/ELK/json/*.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
}

filter 
{
    mutate
    {
        replace => [ "message", "%{message}}" ]
        gsub => [ 'message','\n','']
    }
    if [message] =~ /^{.*}$/ 
    {
        json { source => message }
    }

}

output 
{ 
    stdout { codec => rubydebug }
}

My multiline codec doesn't handle the last brace, so the event never looks like valid JSON to json { source => message }. Hence the mutate filter:

replace => [ "message", "%{message}}" ]

That appends the missing closing brace, and the

gsub => [ 'message','\n','']

removes the \n characters that were introduced. At the end of it, I have a one-line JSON that json { source => message } can read.
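The effect of the two mutate operations can be sketched outside Logstash. This is only an illustration of the string surgery, not part of the pipeline; the sample message mimics what the multiline codec hands to the filter chain (closing brace missing, newlines still embedded):

```python
import json

# What the multiline codec emits: the final "}" is missing and the
# embedded newlines are still present.
message = '{\n    "SOURCE":"Source A",\n    "Model":"ModelABC",\n    "Qty":"3"\n'

# replace => [ "message", "%{message}}" ]  -- append the missing brace
message = message + "}"

# gsub => [ 'message','\n','' ]            -- strip the newlines
message = message.replace("\n", "")

# if [message] =~ /^{.*}$/ { json { source => message } }
if message.startswith("{") and message.endswith("}"):
    event = json.loads(message)

print(event["SOURCE"])  # Source A
```

After the two string operations the message is a single-line, brace-balanced JSON document, which is exactly the shape the json filter expects.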

If there's a cleaner/easier way to convert the original multiline JSON to a one-line JSON, please do post it, as I feel the above isn't too clean.

Solution 2

You will need to use a multiline codec.

input {
  file {
    codec => multiline {
        pattern => '^{'
        negate => true
        what => previous
    }
    path => ['/opt/mount/ELK/json/mytestjson.json']
  }
}
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}

The problem you will run into has to do with the last event in the file: it won't show up until another event arrives in the file (so you'll effectively lose the last event in each file). You could append a single { to the file before it gets rotated to deal with that situation.
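If your codec version supports it (the option was added to logstash-codec-multiline in later releases, so check your installed version), auto_flush_interval flushes a pending event after a period of inactivity, which avoids losing the last event without touching the file. A sketch, assuming a codec that has the option:

```
codec => multiline {
    pattern => '^{'
    negate => true
    what => previous
    auto_flush_interval => 5   # flush the buffered event after 5s with no new lines
}
```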

Author: Joseph

Updated on February 26, 2020

Comments

  • Joseph, about 4 years ago

    I've got a JSON of the format:

    {
        "SOURCE":"Source A",
        "Model":"ModelABC",
        "Qty":"3"
    }
    

I'm trying to parse this JSON using logstash. Basically I want the logstash output to be a list of key:value pairs that I can analyze using kibana. I thought this could be done out of the box. From a lot of reading, I understand I must use the grok plugin (I am still not sure what the json plugin is for). But I am unable to get an event with all the fields. I get multiple events (one event for each attribute of my JSON). Like so:

    {
           "message" => "  \"SOURCE\": \"Source A\",",
          "@version" => "1",
        "@timestamp" => "2014-08-31T01:26:23.432Z",
              "type" => "my-json",
              "tags" => [
            [0] "tag-json"
        ],
              "host" => "myserver.example.com",
              "path" => "/opt/mount/ELK/json/mytestjson.json"
    }
    {
           "message" => "  \"Model\": \"ModelABC\",",
          "@version" => "1",
        "@timestamp" => "2014-08-31T01:26:23.438Z",
              "type" => "my-json",
              "tags" => [
            [0] "tag-json"
        ],
              "host" => "myserver.example.com",
              "path" => "/opt/mount/ELK/json/mytestjson.json"
    }
    {
           "message" => "  \"Qty\": \"3\",",
          "@version" => "1",
        "@timestamp" => "2014-08-31T01:26:23.438Z",
              "type" => "my-json",
              "tags" => [
            [0] "tag-json"
        ],
              "host" => "myserver.example.com",
              "path" => "/opt/mount/ELK/json/mytestjson.json"
    }
    

    Should I use the multiline codec or the json_lines codec? If so, how can I do that? Do I need to write my own grok pattern or is there something generic for JSONs that will give me ONE EVENT with key:value pairs that I get for one event above? I couldn't find any documentation that sheds light on this. Any help would be appreciated. My conf file is shown below:

    input
    {
            file
            {
                    type => "my-json"
                    path => ["/opt/mount/ELK/json/mytestjson.json"]
                    codec => json
                    tags => "tag-json"
            }
    }
    
    filter
    {
       if [type] == "my-json"
       {
        date { locale => "en"  match => [ "RECEIVE-TIMESTAMP", "yyyy-MM-dd HH:mm:ss" ] }
       }
    }
    
    output
    {
            elasticsearch
            {
                    host => localhost
            }
            stdout { codec => rubydebug }
    }
    
  • Joseph, over 9 years ago
    Thanks Alcanzar, I get a JSON parse failure though: [0] "_jsonparsefailure". Tried changing the pattern to pattern => '^\{' but still the same thing. And my file would have only one JSON per file, i.e. only one { or } char. Each file would be one event (1 file = 1 JSON = 1 event).
  • Alcanzar, over 9 years ago
    You may need to add start_position => "beginning" to your file input to make sure it starts at the beginning of a record. Also, is there anything else in your file? (You can remove the filter and just add an output { stdout {} } to see what it's gathering to pass to the json filter.)
  • Joseph, over 9 years ago
    I noticed that my production JSON does have additional "{" and "}" :( So my JSON is actually: { "SOURCE":"Source A", "Model":"ModelABC", "Qty":"3" "DESC": "{\"New prod-125\"}" } (sorry doesn't parse well in comments) And I can't make changes to these JSONs. We receive them from another source and I need to consume as is.
  • Alcanzar, over 9 years ago
    You'll have to "fix" the message before you run the json filter on it. For example, you could use a mutate filter with gsub => [ 'message','\"','' ]. If you need something more complicated, you could resort to a ruby code filter.
  • Joseph, over 9 years ago
    I think it boils down to reducing my multiline JSON (bounded by braces) to one line and then I can apply the filter: if [message] =~ /^{.*}$/ {json { source => message } }. How can I reduce my multiline JSON to one line? I'm no ruby guy, so I can't do that. Any tips? It's strange that I can't find anyone else who's had to parse a multiline JSON