How to process multiline log entry with logstash filter?

47,981

Solution 1

I went through the source code and found that:

  • The multiline filter cancels every event it considers a continuation of a pending event and appends that line to the pending event's message field. Filters placed after the multiline filter therefore never see those continuation lines.
  • The only event that ever passes the filter is one that is considered a new event (in my case, a line that starts with [).
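
To make the behaviour described above concrete, here is a minimal Python sketch of the folding logic. It is not the filter's actual source; the function name merge_multiline and the shortened sample lines are assumptions made for illustration only.

```python
import re


def merge_multiline(lines, pattern=r"^\["):
    # Rough stand-in for: multiline { pattern => "^\[" negate => true what => "previous" }
    # A line that does NOT match the pattern is folded into the pending event.
    start_of_event = re.compile(pattern)
    events = []
    for line in lines:
        if start_of_event.search(line) or not events:
            events.append(line)        # a new event begins
        else:
            events[-1] += "\n" + line  # continuation: appended to the pending message
    return events


# Shortened sample based on the log format in the question (illustrative only).
log_lines = [
    "[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\\xampp\\htdocs\\test.php|123|subject|The error message goes here ; array (",
    "  'create' => ",
    "  array (",
    "  ),",
    ")",
    "[2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line",
]

events = merge_multiline(log_lines)
for event in events:
    print(repr(event))
```

Only the two lines that start with [ become events of their own; the four continuation lines end up inside the first event's message, joined by newlines, and never exist as separate events that a later filter could see.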

Here is the working code:

input {
   stdin{}
}  

filter {
  if "|ERROR|" in [message] { # first line of a multiline message
    grok {
      match => ['message', "\[.+\] - %{IP:ip}\|%{LOGLEVEL:loglevel}\| %{PATH:file}\|%{NUMBER:line}\|%{WORD:tag}\|%{GREEDYDATA:content}"]
    }

    mutate {
      replace => [ "message", "%{content}" ] # replace message with content, so later lines are appended to it
      remove_field => ["content"] # this field is no longer needed
    }
  }

  multiline { # nothing passes this filter unless it is a new event ( new [2014-03-02 1.... )
    pattern => "^\["
    what => "previous"
    negate => true
  }

  if "|DEBUG| flush_multi_line" in [message] {
    drop {} # the dummy line is not needed, so drop it
  }
}

output {
  stdout{ debug=>true }
}

Cheers,

Abdou

Solution 2

Grok and multiline handling is discussed in this issue: https://logstash.jira.com/browse/LOGSTASH-509

Simply add "(?m)" in front of your grok regex, which lets . (and therefore GREEDYDATA) match across newlines, and you won't need the mutate filter. Example from the issue:

pattern => "(?m)<%{POSINT:syslog_pri}>(?:%{SPACE})%{GREEDYDATA:message_remainder}"
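
As a rough illustration of why the flag matters, the Python sketch below matches a shortened message with and without "dot matches newline". Two assumptions to keep in mind: the regex is only a stand-in for %{WORD:tag}\|%{GREEDYDATA:content}, and Python spells this behaviour re.DOTALL (or (?s)), whereas in the Ruby-style regexes that grok uses the same behaviour is written (?m). Python's own (?m) means something different.

```python
import re

# Shortened multiline message (illustrative only).
message = "subject|The error message goes here ; array (\n  'create' => \n)"

# Stand-in for %{WORD:tag}\|%{GREEDYDATA:content}; GREEDYDATA is ".*".
pattern = r"(?P<tag>\w+)\|(?P<content>.*)"

default_match = re.search(pattern, message)                  # "." stops at "\n"
dotall_match = re.search(pattern, message, flags=re.DOTALL)  # "." also matches "\n"

print(repr(default_match.group("content")))
print(repr(dotall_match.group("content")))
```

Without the flag, content holds only the first line; with it, content runs to the end of the message, newlines included.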

Solution 3

The multiline filter joins the lines with "\n" in the message field. For example:

"[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\\xampp\\htdocs\\test.php|123|subject|The error message goes here ; array (\n  'create' => \n  array (\n    'key1' => 'value1',\n    'key2' => 'value2',\n    'key3' => 'value3'\n  ),\n)"

However, the grok pattern stops at the first "\n", because GREEDYDATA does not match newlines by default. You therefore need to replace each \n with another character, say, a space.

mutate {
    gsub => ['message', "\n", " "]
}
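
For readers who want to see the effect of that substitution outside Logstash, here is a small Python sketch under the same assumptions as the config above: str.replace stands in for the gsub, and the regex is only a stand-in for the tail of the grok pattern. The sample message is shortened for illustration.

```python
import re

# Shortened multiline message (illustrative only).
message = "subject|The error message goes here ; array (\n  'create' => \n  array (\n  ),\n)"

# Rough equivalent of: mutate { gsub => ['message', "\n", " "] }
flattened = message.replace("\n", " ")

# Stand-in for %{WORD:tag}\|%{GREEDYDATA:content}
match = re.search(r"(?P<tag>\w+)\|(?P<content>.*)", flattened)
content = match.group("content")
print(repr(content))
```

Once the newlines are gone, the plain ".*" reaches the end of the message, so content holds the whole flattened entry.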

The grok pattern can then parse the whole message. For example:

 "content" => "The error message goes here ; array (   'create' =>    array (     'key1' => 'value1',     'key2' => 'value2',     'key3' => 'value3'   ), )"

Solution 4

Isn't the issue simply the ordering of the filters? Order is very important in Logstash. You don't need an extra line to signal that the multiline log entry has finished. Just make sure the multiline filter comes before the grok filter (see below).

P.S. I've managed to parse a multiline log entry where XML was appended to the end of the log line and spanned multiple lines, and I still got a nice clean XML object in my equivalent of the content field (named xmlrequest below). Before you say anything about logging XML in logs... I know, it's not ideal, but that's for another debate :)

filter {
  multiline {
    pattern => "^\["
    what => "previous"
    negate => true
  }

  mutate {
    gsub => ['message', "\n", " "]
  }

  mutate {
    gsub => ['message', "\r", " "]
  }

  grok {
    match => ['message', "\[%{WORD:ONE}\] \[%{WORD:TWO}\] \[%{WORD:THREE}\] %{GREEDYDATA:xmlrequest}"]
  }

  xml {
    source => "xmlrequest"
    remove_field => "xmlrequest"
    target => "request"
  }
}

Author: emonik

Updated on December 22, 2020

Comments

  • emonik
    emonik over 3 years

    Background:

    I have a custom-generated log file with the following pattern:

    [2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\xampp\htdocs\test.php|123|subject|The error message goes here ; array (
      'create' => 
      array (
        'key1' => 'value1',
        'key2' => 'value2',
        'key3' => 'value3'
      ),
    )
    [2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line
    

    The second entry, [2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line, is a dummy line that just lets Logstash know the multiline event is over; it is dropped later on.

    My config file is the following:

    input {
      stdin{}
    }
    
    filter{
      multiline{
          pattern => "^\["
          what => "previous"
          negate=> true
      }
      grok{
        match => ['message',"\[.+\] - %{IP:ip}\|%{LOGLEVEL:loglevel}"]
      }
    
      if [loglevel] == "DEBUG"{ # the event flush  line
        drop{}
      }else if [loglevel] == "ERROR"  { # the first line of multievent
        grok{
          match => ['message',".+\|.+\| %{PATH:file}\|%{NUMBER:line}\|%{WORD:tag}\|%{GREEDYDATA:content}"] 
        }
      }else{ # its a new line (from the multi line event)
        mutate{
          replace => ["content", "%{content} %{message}"] # Supposing each new line will override the message field
        }
      }  
    }
    
    output {
      stdout{ debug=>true }
    }
    

    The output for the content field is: The error message goes here ; array (

    Problem:

    My problem is that I want to store the rest of the multiline event in the content field:

    The error message goes here ; array (
      'create' => 
      array (
        'key1' => 'value1',
        'key2' => 'value2',
        'key3' => 'value3'
      ),
    )
    

    That way I can remove the message field later.

    The @message field contains the whole multiline event, so I tried the mutate filter with the replace function on it, but I just can't get it working :(

    I don't understand how the multiline filter works. If someone could shed some light on this, I would really appreciate it.

    Thanks,

    Abdou.

  • emonik
    emonik about 10 years
    Thank you for your answer, Ben. However, your code won't work, for the reasons I stated in my answer.
  • Ban-Chuan Lim
    Ban-Chuan Lim about 10 years
    Actually, I used your config and logs, and it works for me! You need to add the gsub filter after multiline.
  • emonik
    emonik about 10 years
    Indeed, your code does work; thanks for the info about the grok trick. But I'd rather use the code in my answer, since I need more control over the message and want to edit it before it gets appended. So that's the one I'll mark as the answer. Too bad I don't have enough rep to upvote your answer :( I appreciate your help, thanks.
  • Ban-Chuan Lim
    Ban-Chuan Lim about 10 years
    You're welcome :). If you have any questions, come and ask here. I have upvoted you. Your answer is awesome!
  • CrazyPyro
    CrazyPyro about 9 years
    Yes! This worked for me when nothing else would. I think your pattern => bit should be grok{ match => instead. For completeness, consider editing to include @Thales Ceolin's comment as well as the actual multiline block from the original question. That way people have a turn-key solution in this answer.
  • CrazyPyro
    CrazyPyro about 9 years
    Downvoted this answer, but upvoted your question. This info may have been correct at one time, but isn't any more. (At least not for 1.4.2) Consider accepting @sbange's answer instead - that's the only one that worked for me.