Parse multiline JSON with grok in logstash
Solution 1
I think I found a working answer to my problem. I am not sure if it's a clean solution, but it helps parse multiline JSONs of the type above.
input
{
    file
    {
        codec => multiline
        {
            pattern => '^\{'
            negate => true
            what => previous
        }
        path => ["/opt/mount/ELK/json/*.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
}
filter
{
    mutate
    {
        replace => [ "message", "%{message}}" ]
        gsub => [ 'message','\n','']
    }
    if [message] =~ /^{.*}$/
    {
        json { source => message }
    }
}
output
{
    stdout { codec => rubydebug }
}
My multiline codec doesn't handle the last brace, so the event doesn't appear as valid JSON to json { source => message }. Hence the mutate filter:

replace => [ "message", "%{message}}" ]

adds the missing brace, and

gsub => [ 'message','\n','']

removes the \n characters that are introduced. At the end of it, I have a one-line JSON that can be read by json { source => message }.

If there's a cleaner/easier way to convert the original multi-line JSON to a one-line JSON, please do post, as I feel the above isn't too clean.
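To make the repair concrete, this is roughly what the message field looks like at each stage for the sample event from the question (the whitespace is illustrative; the exact content depends on the source file):

```
# as assembled by the multiline codec -- the final } is missing:
#   {\n  "SOURCE":"Source A",\n  "Model":"ModelABC",\n  "Qty":"3"
#
# after replace => [ "message", "%{message}}" ] -- brace restored:
#   {\n  "SOURCE":"Source A",\n  "Model":"ModelABC",\n  "Qty":"3"}
#
# after gsub => [ 'message','\n','' ] -- one line, parseable JSON:
#   {  "SOURCE":"Source A",  "Model":"ModelABC",  "Qty":"3"}
```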
Solution 2
You will need to use a multiline codec.
input {
    file {
        codec => multiline {
            pattern => '^{'
            negate => true
            what => previous
        }
        path => ['/opt/mount/ELK/json/mytestjson.json']
    }
}
filter {
    json {
        source => message
        remove_field => message
    }
}
The problem you will run into has to do with the last event in the file: it won't show up until there is another event in the file (so basically you'll lose the last event in a file). You could append a single {
to the file before it gets rotated to deal with that situation.
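If your version of the multiline codec is recent enough, its auto_flush_interval option is another way to deal with this: it flushes a pending event after the given number of seconds of inactivity, so the last object in the file is emitted without waiting for a following line. A sketch (the 2-second value is an arbitrary choice):

```
codec => multiline {
    pattern => '^{'
    negate => true
    what => previous
    # flush a buffered event after 2 seconds with no new lines,
    # so the final JSON object in a file is not held back
    auto_flush_interval => 2
}
```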
Joseph
Updated on February 26, 2020

Comments
-
Joseph about 4 years
I've got a JSON of the format:
{
  "SOURCE": "Source A",
  "Model": "ModelABC",
  "Qty": "3"
}
I'm trying to parse this JSON using logstash. Basically I want the logstash output to be a list of key:value pairs that I can analyze using kibana. I thought this could be done out of the box. From a lot of reading, I understand I must use the grok plugin (I am still not sure what the json plugin is for). But I am unable to get an event with all the fields. Instead, I get multiple events (one event for each attribute of my JSON). Like so:
{
       "message" => " \"SOURCE\": \"Source A\",",
      "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.432Z",
          "type" => "my-json",
          "tags" => [
        [0] "tag-json"
    ],
          "host" => "myserver.example.com",
          "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
       "message" => " \"Model\": \"ModelABC\",",
      "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
          "type" => "my-json",
          "tags" => [
        [0] "tag-json"
    ],
          "host" => "myserver.example.com",
          "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
       "message" => " \"Qty\": \"3\",",
      "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
          "type" => "my-json",
          "tags" => [
        [0] "tag-json"
    ],
          "host" => "myserver.example.com",
          "path" => "/opt/mount/ELK/json/mytestjson.json"
}
Should I use the multiline codec or the json_lines codec? If so, how can I do that? Do I need to write my own grok pattern, or is there something generic for JSONs that will give me ONE event with the key:value pairs, instead of the multiple events above? I couldn't find any documentation that sheds light on this. Any help would be appreciated. My conf file is shown below:
input {
    file {
        type => "my-json"
        path => ["/opt/mount/ELK/json/mytestjson.json"]
        codec => json
        tags => "tag-json"
    }
}
filter {
    if [type] == "my-json" {
        date {
            locale => "en"
            match => [ "RECEIVE-TIMESTAMP", "yyyy-mm-dd HH:mm:ss" ]
        }
    }
}
output {
    elasticsearch { host => localhost }
    stdout { codec => rubydebug }
}
-
Joseph over 9 years
Thanks Alcanzar, I get a JSON parse failure though: [0] "_jsonparsefailure". Tried changing the pattern to pattern => '^\{' but still the same thing. And my file would have only one JSON per file, i.e. only one { or } char. Each file would be one event (1 file = 1 JSON = 1 event).
-
Alcanzar over 9 years
You may need to add start_position => "beginning" to your file input to make sure it starts at the beginning of a record... also, is there anything else in your file? (You can remove the filter and just add an output { stdout {} } to see what it's gathering to pass to the json filter.)
-
Joseph over 9 years
I noticed that my production JSON does have additional "{" and "}" :( So my JSON is actually: { "SOURCE":"Source A", "Model":"ModelABC", "Qty":"3", "DESC": "{\"New prod-125\"}" } (sorry, it doesn't parse well in comments). And I can't make changes to these JSONs. We receive them from another source and I need to consume them as is.
-
Alcanzar over 9 years
You'll have to "fix" the message before you do the json on it. For example, you could use a mutate filter with gsub => [ 'message','\"','']. If you need something more complicated, you could resort to a ruby code filter.
-
Joseph over 9 years
I think it boils down to reducing my multiline JSON (bounded by braces) to one line, and then I can apply the filter: if [message] =~ /^{.*}$/ { json { source => message } }. How can I reduce my multiline JSON to one line? I'm no ruby guy, so I can't do that. Any tips? It's strange that I can't find anyone else who's had to parse a multiline JSON.