Log file to Pandas Dataframe
Solution 1
You can use read_csv with the regex separator \s*\[ (any whitespace followed by a literal [ ):
import pandas as pd
from io import StringIO

temp = u"""LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]"""
# after testing, replace StringIO(temp) with the path to your log file
df = pd.read_csv(StringIO(temp), sep=r"\s*\[", names=['Level','Time','Text'], engine='python')
Then remove the trailing ] with str.strip and convert the Time column with to_datetime:
df.Time = pd.to_datetime(df.Time.str.strip(']'), format='%d/%m/%Y %H:%M:%S.%f')
df.Text = df.Text.str.strip(']')
print (df)
Level Time Text
0 LogLevel 2015-10-13 00:30:00.650 Message Text
1 LogLevel 2015-10-13 00:30:00.650 Message Text
2 LogLevel 2015-10-13 00:30:00.650 Message Text
3 LogLevel 2015-10-13 00:30:00.650 Message Text
print (df.dtypes)
Level object
Time datetime64[ns]
Text object
dtype: object
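An alternative to the two-step approach above (a sketch, not part of the original answer): pull all three fields out in one pass with str.extract, so no separate strip step is needed. The regex below assumes the same LogLevel [date] [text] layout.

```python
import pandas as pd

sample = """LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]"""

# one capture group per field; the brackets are consumed by the pattern itself
pattern = r"^(\S+) \[([^\]]+)\] \[([^\]]+)\]$"
df = pd.Series(sample.splitlines()).str.extract(pattern)
df.columns = ["Level", "Time", "Text"]
df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y %H:%M:%S.%f")
print(df)
```

With a file on disk, replace sample.splitlines() with the file's lines; str.extract returns NaN for lines that do not match, which makes malformed records easy to spot.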
Solution 2
I had to parse mine manually, since my separator showed up in the message body and the message body could also span multiple lines, e.g. if an exception was thrown from my Flask application and the stack trace was recorded.
Here's my log creation format...
logging.basicConfig( filename="%s/%s_MyApp.log" % ( Utilities.logFolder , datetime.datetime.today().strftime("%Y%m%d-%H%M%S")) , level=logging.DEBUG, format="%(asctime)s,%(name)s,%(process)s,%(levelno)u,%(message)s", datefmt="%Y-%m-%d %H:%M:%S" )
And the parsing code in my Utilities module
Utilities.py
import re
import pandas
logFolder = "./Logs"
logLevelToString = { "50" : "CRITICAL",
"40" : "ERROR" ,
"30" : "WARNING" ,
"20" : "INFO" ,
"10" : "DEBUG" ,
"0" : "NOTSET" } # https://docs.python.org/3.6/library/logging.html#logging-levels
def logFile2DataFrame( filePath ) :
    dfLog = pandas.DataFrame( columns=[ 'Timestamp' , 'Module' , 'ProcessID' , 'Level' , 'Message' ] )
    tsPattern = "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},"
    with open( filePath , 'r' ) as logFile :
        numRows = -1
        for line in logFile :
            if re.search( tsPattern , line ) :
                tokens = line.split(",")
                timestamp = tokens[0]
                module = tokens[1]
                processID = tokens[2]
                level = logLevelToString[ tokens[3] ]
                message = ",".join( tokens[4:] )
                numRows += 1
                dfLog.loc[ numRows ] = [ timestamp , module , processID , level , message ]
            else :
                # Multiline message, integrate it into last record
                dfLog.loc[ numRows , 'Message' ] += line
    return dfLog
I actually created this helper method to let me view my logs directly from my Flask app, as I have a handy template that renders a DataFrame. It should speed up debugging considerably, since wrapping the Flask app in a Tornado WSGI server suppresses the normal debug page Flask shows when an exception is thrown. If anyone knows how to restore that functionality in such a setup, please share.
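A side note on the loop above: assigning with dfLog.loc[numRows] = ... grows the DataFrame one row at a time, which gets slow on large logs. A sketch of the same logic that collects rows in a plain list and builds the frame once at the end; it also swaps the hand-written logLevelToString dict for the stdlib logging.getLevelName, which performs the same numeric-level-to-name mapping. The function name is my own, not from the original answer.

```python
import logging
import re

import pandas

tsPattern = r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},"

def logLines2DataFrame(lines):
    """Parse an iterable of log lines; continuation lines are appended
    to the Message of the most recent record."""
    rows = []
    for line in lines:
        if re.search(tsPattern, line):
            # split at most 4 times so commas inside the message survive
            timestamp, module, processID, level, message = line.split(",", 4)
            rows.append([timestamp, module, processID,
                         logging.getLevelName(int(level)), message])
        elif rows:
            rows[-1][4] += line  # multi-line message continuation
    return pandas.DataFrame(
        rows, columns=["Timestamp", "Module", "ProcessID", "Level", "Message"])
```

Building the DataFrame from a list of rows in one call avoids the repeated reallocation that per-row .loc assignment causes.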
ukbaz
Updated on July 26, 2022

Comments
-
ukbaz almost 2 years
I have log files, which have many lines of the form:
LogLevel [13/10/2015 00:30:00.650] [Message Text]
My goal is to convert each line in the log file into a nice DataFrame. I have tried to do that by splitting the lines on the [ character, however I am still not getting a neat DataFrame.
My code:
level = []
time = []
text = []
with open(filename) as inf:
    for line in inf:
        parts = line.split('[')
        if len(parts) > 1:
            level = parts[0]
            time = parts[1]
            text = parts[2]
            print (parts[0],parts[1],parts[2])
s1 = pd.Series({'Level':level, 'Time': time, 'Text':text})
df = pd.DataFrame(s1).reset_index()
Here's my printed DataFrame:
Info 10/08/16 10:56:09.843] In Function CCatalinaPrinter::ItemDescription()]
Info 10/08/16 10:56:09.843] Sending UPC Description Message ]
How can I improve this to strip the whitespace and the stray ] characters?
Thank you
-
ukbaz over 7 years: @atkawa7 nope, that didn't work
-
ukbaz over 7 years: works perfectly until I replace the temp with the filename, perhaps because my file is a .log and not a .csv?
-
jezrael over 7 years: need to replace StringIO(temp) with filenam.log; it is not a problem, it can end with .log
-
Vasu almost 4 years: What if my log file is in the format below? LogLevel [13/10/2015 00:30:00.650] [Message Text1] [Message Text1A] [Message Text1B] LogLevel [13/10/2015 00:30:00.650] [Message Text2] LogLevel [13/10/2015 00:30:00.650] [Message Text3]
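One way to handle a variable number of bracketed fields like those in the comment above (a sketch, not part of the original answers): split off the level, treat the first [...] group as the timestamp, and join any remaining groups into the message. The column names and the " | " joiner are my own choices.

```python
import re

import pandas as pd

lines = [
    "LogLevel [13/10/2015 00:30:00.650] [Message Text1] [Message Text1A] [Message Text1B]",
    "LogLevel [13/10/2015 00:30:00.650] [Message Text2]",
]

rows = []
for line in lines:
    level, rest = line.split(" ", 1)
    # first bracketed group is the timestamp; the rest belong to the message
    groups = re.findall(r"\[([^\]]*)\]", rest)
    rows.append({"Level": level, "Time": groups[0], "Text": " | ".join(groups[1:])})

df = pd.DataFrame(rows)
df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y %H:%M:%S.%f")
print(df)
```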