How to query CloudWatch Logs using boto3 in Python


Solution 1

You can get what you want using CloudWatch Logs Insights.

You would use the start_query and get_query_results APIs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html

To start a query, you would use the following (for use case 2 from your question; cases 1 and 3 are similar):

import boto3
from datetime import datetime, timedelta
import time

client = boto3.client('logs')

query = "fields @timestamp, @message | parse @message \"username: * ClinicID: * nodename: *\" as username, ClinicID, nodename | filter ClinicID = 7667 and username='[email protected]'"  

log_group = '/aws/lambda/NAME_OF_YOUR_LAMBDA_FUNCTION'

start_query_response = client.start_query(
    logGroupName=log_group,
    startTime=int((datetime.today() - timedelta(hours=5)).timestamp()),
    endTime=int(datetime.now().timestamp()),
    queryString=query,
)

query_id = start_query_response['queryId']

response = None

# Poll until the query finishes; while it is in progress the status is
# 'Scheduled' or 'Running'
while response is None or response['status'] in ('Scheduled', 'Running'):
    print('Waiting for query to complete ...')
    time.sleep(1)
    response = client.get_query_results(
        queryId=query_id
    )

The response will contain your data in this format (plus some metadata):

{
  'results': [
    [
      {
        'field': '@timestamp',
        'value': '2019-12-09 17:07:24.428'
      },
      {
        'field': '@message',
        'value': 'username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local\n'
      },
      {
        'field': 'username',
        'value': '[email protected]'
      },
      {
        'field': 'ClinicID',
        'value': '7667'
      },
      {
        'field': 'nodename',
        'value': 'MacBook-Pro-2.local\n'
      }
    ]
  ]
}
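
Each result row is a list of field/value pairs; if plain dictionaries are easier to work with, you can flatten them. A small sketch using the response from the polling loop above:

# Flatten each Insights result row (a list of {'field': ..., 'value': ...}
# pairs) into a plain dict keyed by field name
rows = [
    {col['field']: col['value'] for col in row}
    for row in response['results']
]

for row in rows:
    # Each row is now e.g. {'@timestamp': ..., 'username': ..., 'ClinicID': ..., ...}
    print(row.get('username'), row.get('ClinicID'))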

Solution 2

You can achieve this with the CloudWatch Logs client and a little bit of coding. You can customize the conditions, or use the json module for more precise matching if your log messages are structured.

EDIT

You can use describe_log_streams to get the streams. If you want only the latest, set limit=1; if you want more than one, use a for loop to iterate over all the streams while filtering, as shown below.

    import boto3

    client = boto3.client('logs')


    ## For the latest stream only
    stream_response = client.describe_log_streams(
        logGroupName="/aws/lambda/lambdaFnName", # Can be dynamic
        orderBy='LastEventTime',                 # Sort by last event time
        descending=True,                         # Newest stream first
        limit=1                                  # Just the most recent stream
        )

    # logStreams is a list, so take the first (newest) entry
    latestlogStreamName = stream_response["logStreams"][0]["logStreamName"]


    response = client.get_log_events(
        logGroupName="/aws/lambda/lambdaFnName",
        logStreamName=latestlogStreamName,
        startTime=12345678,   # epoch time in milliseconds
        endTime=12345678,     # epoch time in milliseconds
    )

    for event in response["events"]:
        # event["message"] is a plain string, so match on substrings
        if "ClinicID: 7667" in event["message"]:
            print(event["message"])
        elif "username: [email protected]" in event["message"]:
            print(event["message"])
        #.
        #.
        # more if or else conditions

    ## For more than one stream, e.g. the latest 5
    stream_response = client.describe_log_streams(
        logGroupName="/aws/lambda/lambdaFnName", # Can be dynamic
        orderBy='LastEventTime',                 # Sort by last event time
        descending=True,                         # Newest streams first
        limit=5
        )

    for log_stream in stream_response["logStreams"]:
        latestlogStreamName = log_stream["logStreamName"]

        response = client.get_log_events(
             logGroupName="/aws/lambda/lambdaFnName",
             logStreamName=latestlogStreamName,
             startTime=12345678,   # epoch time in milliseconds
             endTime=12345678,     # epoch time in milliseconds
        )
        ## For example, search for "ClinicID: 7667"; can be dynamic

        for event in response["events"]:
           # event["message"] is a plain string, so match on substrings
           if "ClinicID: 7667" in event["message"]:
             print(event["message"])
           elif "username: [email protected]" in event["message"]:
             print(event["message"])
           #.
           #.
           # more if or else conditions
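
One caveat: get_log_events returns at most 1 MB (up to 10,000 events) per call, so for a long stream you may need to paginate with the token it returns. A rough sketch, continuing from the variables above:

    ## Rough sketch: page forward through one stream using nextForwardToken
    kwargs = {
        "logGroupName": "/aws/lambda/lambdaFnName",
        "logStreamName": latestlogStreamName,
        "startFromHead": True,        # read oldest events first
    }
    prev_token = None
    while True:
        response = client.get_log_events(**kwargs)
        for event in response["events"]:
            if "ClinicID: 7667" in event["message"]:
                print(event["message"])
        token = response["nextForwardToken"]
        if token == prev_token:       # same token back means no more events
            break
        prev_token = token
        kwargs["nextToken"] = token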



Let me know how it goes.

Solution 3

I used awslogs. Once you install it, you can run the command below; --watch will tail new logs.

awslogs get /aws/lambda/log-group-1 --start="5h ago" --watch

You can install it using

pip install awslogs

To filter, you can do:

awslogs get /aws/lambda/log-group-1  --filter-pattern '"ClinicID=7667"' --start "5h ago" --timestamp

It supports multiple filter patterns as well.

awslogs get /aws/lambda/log-group-1  --filter-pattern '"ClinicID=7667"' --filter-pattern '" [email protected]"' --start "5h ago" --timestamp
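
If you would rather stay in boto3 than install a separate CLI, a roughly equivalent server-side filter can be done with the filter_log_events API. A sketch, reusing the log group name from the commands above (times are epoch milliseconds):

import boto3
from datetime import datetime, timedelta

client = boto3.client('logs')

# filter_log_events applies the filter pattern server-side, across all
# streams in the group. startTime/endTime are epoch milliseconds.
start = int((datetime.now() - timedelta(hours=5)).timestamp() * 1000)
end = int(datetime.now().timestamp() * 1000)

response = client.filter_log_events(
    logGroupName='/aws/lambda/log-group-1',
    filterPattern='"ClinicID: 7667"',   # quoted term = literal match
    startTime=start,
    endTime=end,
)

for event in response['events']:
    print(event['message'])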

References:

awslogs

awslogs on PyPI


Comments

  • systemdebt, about 3 years ago

    I have a Lambda function that writes metrics to CloudWatch. While it writes metrics, it generates some logs in a log group.

    INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local
    
    INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local
    
    INFO:: username: [email protected] ClinicID: 7668 nodename: MacBook-Pro-2.local
    
    INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local
    

    I would like to query AWS logs in the past x hours, where x could be anywhere between 12 and 24 hours, based on any of the params.

    For example:

    1. Query CloudWatch logs in the last 5 hours where ClinicID=7667

    or

    2. Query CloudWatch logs in the last 5 hours where ClinicID=7667 and username='[email protected]'

    or

    3. Query CloudWatch logs in the last 5 hours where username='[email protected]'

    I am using boto3 in Python. Can I get some direction on this, please?