Parsing a JSON file from an S3 Bucket

Solution 1

You can load the document with boto3.resource('s3').Object(...).get() and then parse it into Python objects with json.loads():

import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    bucket = 'finalyearpro-aws'
    key = 'StudentResults.json'

    # Fetch the object and decode its body to text
    obj = s3.Object(bucket, key)
    data = obj.get()['Body'].read().decode('utf-8')

    # Parse the JSON text into Python objects
    json_data = json.loads(data)

    print(json_data)
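
Since the question mentions an S3 trigger, the bucket and key could also be taken from the incoming event instead of being hard-coded. The following is only a sketch under that assumption: the helper name bucket_and_key_from_event and the sample event are hypothetical, and the event is assumed to follow the usual S3 notification layout, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def bucket_and_key_from_event(event):
    """Return (bucket, key) for the first record of an S3 notification event.

    Hypothetical helper: assumes the standard S3 event layout.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Keys are URL-encoded in S3 events (e.g. spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


# Hand-written sample event for illustration only.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "finalyearpro-aws"},
                "object": {"key": "Student+Results.json"},
            }
        }
    ]
}

bucket, key = bucket_and_key_from_event(sample_event)
print(bucket, key)
```

Inside lambda_handler, the returned pair would replace the hard-coded bucket and key values.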

Solution 2

json.loads(json_data) parses the JSON string and, for this data, returns a list of dicts. You can then iterate over the list and do whatever you want with it, e.g.:

data = json.loads(json_data)
min([r['Result'] for r in data])
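
Extending the min() example to the max, min and average asked for in the question might look like the sketch below. It assumes the file holds a bare JSON array of records; a "Student =" prefix, a trailing comma before the closing bracket, or a print() call would all make json.loads() raise an error. The function name summarize_results and the shortened sample string are hypothetical.

```python
import json
from statistics import mean


def summarize_results(json_text):
    """Parse a JSON array of student records and summarise the 'Result' field."""
    records = json.loads(json_text)
    results = [record["Result"] for record in records]
    return {
        "max": max(results),
        "min": min(results),
        "average": mean(results),
    }


# Shortened sample in strict JSON form (assumption: only the array is stored).
sample = (
    '[{"Student_ID": 1, "Result": 72.3},'
    ' {"Student_ID": 2, "Result": 71},'
    ' {"Student_ID": 3, "Result": 62}]'
)

summary = summarize_results(sample)
print(summary)
```

In the Lambda function, the decoded string read from S3 would be passed in place of sample, and the printed summary would show up in the function's log output.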
Author by

Nimra Sajid

Updated on June 04, 2022

Comments

  • Nimra Sajid
    Nimra Sajid almost 2 years

    I am relatively new to Amazon Web Services.

    I need help parsing a JSON file from an S3 bucket using Python. I was able to read the JSON file from S3 using the S3 trigger connected to the Lambda function and display it in CloudWatch as well. I need help with how to parse the "Result" values from the JSON file and calculate the max, min and average of them.

    Here is my JSON file:

    Student = [{"Student_ID": 1,
        "Name":"Erik",
        "ExamSubject": "English",
        "Result": 72.3,
        "ExamDate": "9/12/2020",
        "Sex": "M"},

    {"Student_ID": 2,
        "Name":"Daniel",
        "ExamSubject": "English",
        "Result": 71,
        "ExamDate": "9/12/2020",
        "Sex": "M"},

    {"Student_ID": 3,
        "Name":"Michael",
        "ExamSubject": "English",
        "Result": 62,
        "ExamDate": "9/12/2020",
        "Sex": "M"},

    {"Student_ID": 4,
        "Name":"Sven",
        "ExamSubject": "English",
        "Result": 73,
        "ExamDate": "9/12/2020",
        "Sex": "M"},

    {"Student_ID": 5,
        "Name":"Jake",
        "ExamSubject": "English",
        "Result": 84.15,
        "ExamDate": "9/12/2020",
        "Sex": "M"},
    ]

    print(Student)

    and here is the code I have used on the lambda function so far:

    import json
    import boto3

    s3 = boto3.client('s3')

    def lambda_handler(event, context):
        bucket = 'finalyearpro-aws'
        key = 'StudentResults.json'

        try:
            # Fetch the object and decode its body to text
            data = s3.get_object(Bucket=bucket, Key=key)
            json_data = data['Body'].read().decode('utf-8')

            print(json_data)
        except Exception as e:
            raise e
    

    How do I add to this code to make it read the "Result" values from the JSON file, run the analysis on them (max, min, average) and display the output in the Lambda console?

  • Nimra Sajid
    Nimra Sajid almost 4 years
    I get this error when I put it into the lambda console: Response: { "errorMessage": "Syntax error in module 'lambda_function': unindent does not match any outer indentation level (lambda_function.py, line 18)", "errorType": "Runtime.UserCodeSyntaxError", "stackTrace": [ " File \"/var/task/lambda_function.py\" Line 18\n data = json.loads(json_data)\n" ] }
  • Yaroslav Fyodorov
    Yaroslav Fyodorov almost 4 years
    @NimraSajid Indentation problem - you know, regular python stuff - check that spaces are correct on this line - should be indented as much as line above
  • Nimra Sajid
    Nimra Sajid almost 4 years
    Yeah, I'm new to Python so I'm just figuring it out as I go along. Python is really strict about indentation.
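
The indentation error discussed in the comments above can usually be caught locally before the code is pasted into the Lambda console. Below is a minimal sketch using Python's built-in compile(); the helper name check_syntax and the sample snippets are hypothetical, and the default file name simply mirrors the one in the error message.

```python
def check_syntax(source, filename="lambda_function.py"):
    """Return a short description of the first syntax error, or None if the source compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # IndentationError is a subclass of SyntaxError, so it is caught here too.
        return f"{type(exc).__name__} at line {exc.lineno}: {exc.msg}"
    return None


# The last line is indented less than the body but more than 'def'.
bad_source = "def handler():\n    x = 1\n  y = 2\n"
good_source = "def handler():\n    x = 1\n    y = 2\n"

print(check_syntax(bad_source))
print(check_syntax(good_source))
```

Running python -m py_compile lambda_function.py from a terminal is a comparable check for a file on disk.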