Open S3 object as a string with Boto3

Solution 1

read() returns bytes. In Python 3, if you want a string, you have to decode the bytes using the right encoding:

import boto3

s3 = boto3.resource('s3')

# bucket and key are the bucket name and the object key (both strings)
obj = s3.Object(bucket, key)
text = obj.get()['Body'].read().decode('utf-8')
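
The read-then-decode step above can be wrapped in a small helper. This is only a sketch, not part of the original answer: read_object_text and FakeObject are hypothetical names, and the fake object exists only so the helper can be tried without AWS credentials.

```python
import io


def read_object_text(s3_object, encoding="utf-8"):
    """Return the body of an S3 Object resource decoded to str.

    s3_object is assumed to behave like boto3.resource('s3').Object(bucket, key):
    get() returns a dict whose "Body" value has a read() method returning bytes.
    """
    body = s3_object.get()["Body"]
    return body.read().decode(encoding)


class FakeObject:
    """Hypothetical offline stand-in for an S3 Object, for demonstration only."""

    def __init__(self, payload):
        self._payload = payload

    def get(self):
        # A real response holds a botocore StreamingBody; BytesIO mimics read().
        return {"Body": io.BytesIO(self._payload)}


text = read_object_text(FakeObject("naïve café".encode("utf-8")))
print(text)
```

With real credentials, the same helper would be called as read_object_text(boto3.resource('s3').Object(bucket, key)).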

Solution 2

I had trouble reading and parsing the object from S3 with .get() when using Python 2.7 inside an AWS Lambda.

I added json to the example to show it became parsable :)

import boto3
import json

s3 = boto3.client('s3')

obj = s3.get_object(Bucket=bucket, Key=key)
j = json.loads(obj['Body'].read())

NOTE (for Python 2.7): My object is all ASCII, so I don't need .decode('utf-8').

NOTE (for Python 3.6+): We moved to Python 3.6 and discovered that read() now returns bytes, so if you want to get a string out of it, you must use:

j = json.loads(obj['Body'].read().decode('utf-8'))
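
Put together, the client-based approach might look like the sketch below. The names load_json_object and FakeClient are made up for illustration, and the fake client only imitates the shape of the get_object response so the code runs offline.

```python
import io
import json


def load_json_object(client, bucket, key, encoding="utf-8"):
    """Fetch an object with get_object and parse its body as JSON."""
    response = client.get_object(Bucket=bucket, Key=key)
    raw = response["Body"].read()  # bytes on Python 3
    return json.loads(raw.decode(encoding))


class FakeClient:
    """Hypothetical stand-in for boto3.client('s3'), for demonstration only."""

    def __init__(self, payload):
        self._payload = payload

    def get_object(self, Bucket, Key):
        # Mimic the response dict: "Body" is a file-like object yielding bytes.
        return {"Body": io.BytesIO(self._payload)}


config = load_json_object(
    FakeClient(b'{"retries": 3, "debug": false}'),
    "example-bucket",
    "config.json",
)
print(config["retries"])
```

Passing the client in as an argument is a design choice made here to keep the helper testable; with real credentials you would pass boto3.client('s3').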

Solution 3

This isn't in the boto3 documentation. This worked for me:

object.get()["Body"].read()

Here object is an S3 Object resource: http://boto3.readthedocs.org/en/latest/reference/services/s3.html#object

Solution 4

Python 3, using the boto3 client API.

Using the S3.Client.download_fileobj API with a Python file-like object, the content of an S3 object can be retrieved into memory.

The retrieved content is bytes, so it needs to be decoded to convert it to str.

import io
import boto3

client = boto3.client('s3')
bytes_buffer = io.BytesIO()
client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
byte_value = bytes_buffer.getvalue()
str_value = byte_value.decode()  # Python 3: decode() defaults to UTF-8
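
As a sketch, the same steps can be packaged as a function. download_text and FakeClient are hypothetical names introduced here; the fake client just writes canned bytes into the buffer so the example runs without AWS access.

```python
import io


def download_text(client, bucket_name, object_key, encoding="utf-8"):
    """Download an object into an in-memory buffer and decode it to str."""
    buffer = io.BytesIO()
    client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=buffer)
    return buffer.getvalue().decode(encoding)


class FakeClient:
    """Hypothetical stand-in for boto3.client('s3'), for demonstration only."""

    def __init__(self, payload):
        self._payload = payload

    def download_fileobj(self, Bucket, Key, Fileobj):
        # The real method writes the object's bytes into Fileobj.
        Fileobj.write(self._payload)


result = download_text(
    FakeClient(b"hello from S3"), "example-bucket", "example/key.txt"
)
print(result)
```

Note that the whole object still ends up in memory, once as bytes in the buffer and once as the decoded string.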

Solution 5

Decoding the whole object body to one string:

obj = s3.Object(bucket, key).get()
big_str = obj["Body"].read().decode("utf-8")

Decoding the object body to strings line-by-line:

import csv

obj = s3.Object(bucket, key).get()
reader = csv.reader(line.decode("utf-8") for line in obj["Body"].iter_lines())
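
A self-contained sketch of the line-by-line variant follows. read_csv_rows and FakeBody are hypothetical names; FakeBody imitates only the iter_lines() method of the response body, so the example can run offline.

```python
import csv


def read_csv_rows(body, encoding="utf-8"):
    """Yield CSV rows from a body that offers iter_lines(), decoding lazily."""
    lines = (line.decode(encoding) for line in body.iter_lines())
    yield from csv.reader(lines)


class FakeBody:
    """Hypothetical stand-in for the response body (iter_lines only)."""

    def __init__(self, payload):
        self._payload = payload

    def iter_lines(self):
        # Yield each line as bytes, without its line ending.
        yield from self._payload.splitlines()


rows = list(read_csv_rows(FakeBody(b"id,name\n1,Ada\n2,Grace\n")))
print(rows)
```

Because both the decoding and the CSV parsing are generators, rows are produced one at a time; the list() call is only there to show the result.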

When decoding JSON, there is no need to convert to a string first, because json.loads has accepted bytes too since Python 3.6:

obj = s3.Object(bucket, key).get()
json.loads(obj["Body"].read())
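
A quick offline check of that behaviour, using a literal bytes value in place of what read() would return (the sample payload is made up for illustration):

```python
import json

# Bytes such as read() would return for a UTF-8 encoded JSON document.
raw = '{"name": "café", "size": 3}'.encode("utf-8")

from_bytes = json.loads(raw)                # bytes accepted directly (Python 3.6+)
from_str = json.loads(raw.decode("utf-8"))  # explicit decode gives the same result

print(from_bytes == from_str)
```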

Author: Gahl Levy

Updated on January 07, 2022

Comments

  • Gahl Levy
    Gahl Levy over 2 years

    I'm aware that with Boto 2 it's possible to open an S3 object as a string with: get_contents_as_string()

    Is there an equivalent function in boto3?

  • roehrijn
    roehrijn over 8 years
    Assuming "Body" contains string data, you can use object.get()["Body"].read() to convert it to a Python string.
  • Andrew_1510
    Andrew_1510 about 8 years
    boto3 has terrible docs, as of 2016.
  • jeffrey
    jeffrey about 7 years
    boto3.readthedocs.io/en/latest/reference/services/… tells us the return value is a dict with a key "Body" of type StreamingBody. Searching for that on Read the Docs gets you to botocore.readthedocs.io/en/latest/reference/response.html, which tells you to use read().
  • Tzunghsing David Wong
    Tzunghsing David Wong over 6 years
    to get this answer to work, I had to import botocore as obj.get()['Body'] is of type <class 'botocore.response.StreamingBody'>
  • Ken Williams
    Ken Williams over 6 years
    @TzunghsingDavidWong you shouldn't have to import a package to call methods on an existing object, right? Was that maybe only necessary while experimenting?
  • Timo
    Timo over 6 years
    Worked for me! AWS Boto3 documentation is a mess
  • Amaresh Jana
    Amaresh Jana over 6 years
    What is the value of key in obj = s3.Object(bucket, key)? Is bucket the bucket name and key the file name? Please correct me if I'm wrong.
  • Tipster
    Tipster over 6 years
    @Amaresh yes, bucket = bucket name and key = filename
  • Arun Kumar
    Arun Kumar about 6 years
    If the key is a PDF file, will this work? If not, please suggest another useful way. I tried import textract and text = textract.process('path/to/a.pdf', method='pdfminer'), but it shows an import error.
  • lurscher
    lurscher over 5 years
    It seems that this now fails with "get expected at least 1 arguments, got 0". Remove the get() and access the "Body" object property directly.
  • Jakobovski
    Jakobovski over 3 years
    This is MUCH faster than object.get()["Body"].read() method.
  • Jakobovski
    Jakobovski over 3 years
    @gatsby-lee's answer below is MUCH faster than this. I get 120mb/s vs 24mb/s
  • Gatsby Lee
    Gatsby Lee almost 3 years
    FYI, if the content is large, this will put pressure on memory.