How to load image data from an S3 bucket into a SageMaker notebook?

10,600

Solution 1

You could use s3fs to easily access your bucket, as well as an image file in it.

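from PIL import Image
from IPython.display import display  # built into Jupyter; imported explicitly for clarity
import s3fs

fs = s3fs.S3FileSystem()  # picks up the notebook's IAM role / AWS credentials

# List the first 5 files in your accessible bucket
print(fs.ls('s3://bucket-name/data/')[:5])

# Open an image directly from S3 and show it inline
with fs.open('s3://bucket-name/data/image.png') as f:
    display(Image.open(f))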
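To bring in the whole train folder from the question rather than a single file, the same filesystem object can glob for the image keys and open each one in turn. This is a minimal sketch under stated assumptions: `iter_image_bytes` is a hypothetical helper, the default `*.jpeg` pattern assumes the images sit directly under the prefix with that extension, and any fsspec-style filesystem exposing `glob()` and `open()` can be passed in (s3fs.S3FileSystem is one). Files are read lazily, one per iteration, which helps when the folder holds tens of thousands of images.

```python
def iter_image_bytes(fs, prefix, pattern="*.jpeg"):
    """Yield (path, raw bytes) for each file under `prefix` matching `pattern`.

    Hypothetical helper. `fs` is any fsspec-style filesystem exposing
    glob() and open(), e.g. s3fs.S3FileSystem(). Files are read lazily,
    one per iteration, instead of loading the whole folder up front.
    """
    for path in sorted(fs.glob(prefix.rstrip("/") + "/" + pattern)):
        with fs.open(path, "rb") as f:
            yield path, f.read()


# Usage in the notebook (needs s3fs, Pillow, and read access to the bucket):
# import s3fs
# from io import BytesIO
# from PIL import Image
# fs = s3fs.S3FileSystem()
# for path, data in iter_image_bytes(fs, "s3://my_bucket/train"):
#     img = Image.open(BytesIO(data))
#     ...  # inspect or preprocess img here
```

Returning raw bytes keeps the helper independent of any particular image library; decoding with Pillow happens at the call site.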

Solution 2

You don't have to download images from the S3 bucket to the local SageMaker instance to train a model; SageMaker training jobs can read their input data directly from S3. If you want to pull images for data exploration or analysis, you can use the AWS CLI from your SageMaker notebook (in a notebook cell, prefix the command with !). The following command downloads a single sample image: it copies sample.jpg from the bucket into the images directory under your current working directory.

aws s3 cp s3://my_bucket/train/sample.jpg ./images/sample.jpg
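The command above copies one file. For the whole train folder, the CLI's --recursive flag (aws s3 cp s3://my_bucket/train ./train --recursive) or aws s3 sync does the same job. If you would rather stay in Python, the sketch below does the equivalent with boto3, assuming boto3 is installed in the kernel and the notebook's role can read the bucket. `download_prefix` is a hypothetical helper, not part of boto3; it takes the client as a parameter so it can be tried with a stand-in object.

```python
import os


def download_prefix(client, bucket, prefix, dest_dir):
    """Download every object under s3://bucket/prefix into dest_dir.

    Hypothetical helper. `client` is a boto3 S3 client (boto3.client("s3")).
    Keys keep their path relative to `prefix`, so train/cat/1.jpeg is
    written to dest_dir/cat/1.jpeg. Returns the local paths written.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    written = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
                continue
            local_path = os.path.join(dest_dir, *key[len(prefix):].split("/"))
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            client.download_file(bucket, key, local_path)
            written.append(local_path)
    return written


# Usage in the notebook:
# import boto3
# paths = download_prefix(boto3.client("s3"), "my_bucket", "train", "./train")
```

The paginator matters here: a single list_objects_v2 call returns at most 1,000 keys, so a folder of about 40k images spans many pages.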

Have a look at the amazon-sagemaker-examples repo to learn how to work with image formats on SageMaker.

Author: Admin

Updated on June 13, 2022

Comments

  • Admin (almost 2 years ago)

    I just started using AWS SageMaker. I tried to import images from my S3 bucket into a SageMaker notebook, but I can't get them into the notebook. My images are located at s3://my_bucket/train. How can I import the train folder from that path into my SageMaker notebook? I've gone through some of the solutions here, but they are for CSV files. All the images in my S3 bucket are in .jpeg format.

  • Admin (about 5 years ago)
    I've gone through the repo you mentioned. How can I convert my train folder, which has almost 40k images, to .lst format? They mention using im2rec.py for the conversion, but first I have to access the files from S3, right? Or should I download the files from S3 to the local machine, convert them, and then upload them to S3 again? That sounds tedious and inefficient. Could you explain further?
  • raj (about 5 years ago)
    What algorithm are you using with these images? If it is the SageMaker built-in Image Classification algorithm, the input can be in either RecordIO or image format. Refer to the Image Classification documentation and notebooks to see how to create the list file for the type of problem you are working on, e.g. binary or multi-label classification. im2rec.py runs locally (on the SageMaker instance), so it cannot take input from the S3 bucket. To generate the RecordIO file, you need to download the images and then run the im2rec tool.
  • Dev Vanana (about 4 years ago)
    Add this line at the top: from PIL import Image
  • Roko Mijic (almost 4 years ago)
    Thanks! Note that this also works for other file types: I had trouble with an XML file and this fixed it.
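On the .lst question raised in the comments: the SageMaker Image Classification documentation describes a .lst file as tab-separated, with three columns: an image index, a class label index, and the relative path of the image file. Because the list only records paths and labels, it can be built from an S3 key listing without downloading the images, which covers the image-format input; generating RecordIO with im2rec still needs local files, as raj points out. A minimal sketch, assuming a hypothetical layout of one subfolder per class (train/<class>/<image>.jpeg) with paths written relative to the train prefix; `build_lst` is a hypothetical helper, and numbering labels by sorted class name is an arbitrary choice.

```python
def build_lst(keys, prefix, suffixes=(".jpg", ".jpeg")):
    """Build .lst lines (index<TAB>label<TAB>relative path) from S3 keys.

    Hypothetical helper. Assumes the layout prefix/<class_name>/<image file>;
    keys outside that layout (other prefixes, files directly under the
    prefix, folder placeholders, non-image files) are skipped.
    Returns (lines, {class_name: label_index}).
    """
    prefix = prefix.rstrip("/") + "/"
    samples = []
    for key in keys:
        if not key.startswith(prefix) or not key.lower().endswith(suffixes):
            continue
        rel = key[len(prefix):]
        parts = rel.split("/")
        if len(parts) != 2:  # expect exactly <class>/<file>
            continue
        samples.append((parts[0], rel))
    # Map class names to 0..N-1 in sorted order so labels are stable across runs.
    classes = sorted({cls for cls, _ in samples})
    label_of = {cls: i for i, cls in enumerate(classes)}
    ordered = sorted(samples, key=lambda s: s[1])
    lines = [f"{i}\t{label_of[cls]}\t{rel}" for i, (cls, rel) in enumerate(ordered)]
    return lines, label_of


# Usage sketch (the train_lst/train.lst key is a hypothetical destination):
# import boto3
# s3 = boto3.client("s3")
# pages = s3.get_paginator("list_objects_v2").paginate(Bucket="my_bucket", Prefix="train/")
# keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]
# lines, label_of = build_lst(keys, "train")
# body = ("\n".join(lines) + "\n").encode("utf-8")
# s3.put_object(Bucket="my_bucket", Key="train_lst/train.lst", Body=body)
```

Keeping label_of alongside the list makes it easy to map predicted class indices back to folder names later.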