Pandas & AWS Lambda

Solution 1

After some tinkering and lots of googling, I was able to make everything work and set up a repo that can simply be cloned in the future.

Key takeaways:

  1. All packages with static binaries have to be compiled on an Amazon Linux EC2 instance.
  2. The Python code needs to load the libraries in the lib/ folder before executing.
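A minimal sketch of the second takeaway, assuming the compiled packages sit in a `lib/` folder at the root of the deployment package. `add_vendored_lib` and `handler` are hypothetical names, not taken from the repo; the only point is that `lib/` has to be on `sys.path` before `import pandas` executes.

```python
import os
import sys


def add_vendored_lib(base_dir, folder="lib"):
    """Prepend <base_dir>/<folder> to sys.path (hypothetical helper).

    Returns the absolute path of the directory that was added.
    """
    lib_dir = os.path.abspath(os.path.join(base_dir, folder))
    if lib_dir not in sys.path:
        # Insert at the front so the bundled Linux builds shadow anything else.
        sys.path.insert(0, lib_dir)
    return lib_dir


# Lambda sets LAMBDA_TASK_ROOT to the directory the deployment package was
# extracted to; fall back to the current directory when running locally.
add_vendored_lib(os.environ.get("LAMBDA_TASK_ROOT", os.getcwd()))


def handler(event, context):
    # pandas is imported only after lib/ is on sys.path.
    import pandas as pd

    df = pd.DataFrame(event.get("rows", []))
    return {"row_count": len(df)}
```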

GitHub repo: https://github.com/moesy/AWS-Lambda-ML-Microservice-Skeleton

Solution 2

I believe you should be able to use a recent pandas version (quite likely the one already on your machine, provided that machine runs Linux, since pandas contains compiled extensions). You can build a Lambda package with pandas yourself as follows:

  1. First, find where the pandas package is installed on your machine. Open a Python interpreter and type:

    import pandas
    pandas.__file__
    

    That should print something like '/usr/local/lib/python3.4/site-packages/pandas/__init__.py'

  2. Now copy the pandas folder from that location (in this case /usr/local/lib/python3.4/site-packages/pandas) into your repository.
  3. Package your Lambda code with pandas like this:

    zip -r9 my_lambda.zip pandas/
    zip -9 my_lambda.zip my_lambda_function.py
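The packaging steps can also be sketched with Python's standard `zipfile` module instead of the `zip` CLI. This is a sketch under assumptions: `build_package` is a hypothetical helper, and every folder you pass must already contain Linux-compatible builds. pandas alone is usually not enough; its dependencies (numpy, python-dateutil and pytz, as one of the comments lists) have to be bundled the same way. The handler file is written at the archive root, which is where Lambda looks for it.

```python
import os
import zipfile


def build_package(zip_path, handler_file, package_dirs):
    """Create a Lambda deployment zip (hypothetical helper).

    handler_file -- e.g. "my_lambda_function.py"; stored at the archive root
    package_dirs -- e.g. ["pandas", "numpy"]; each folder is stored recursively
                    under its own name, like `zip -r9 my_lambda.zip pandas/`
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        for pkg_dir in package_dirs:
            pkg_dir = os.path.normpath(pkg_dir)
            parent = os.path.dirname(os.path.abspath(pkg_dir))
            for root, _dirs, files in os.walk(pkg_dir):
                for name in files:
                    full = os.path.join(root, name)
                    # Archive name is relative to the package's parent,
                    # e.g. "pandas/__init__.py".
                    zf.write(full, os.path.relpath(full, parent))
        # The handler must sit at the root; otherwise Lambda reports
        # "Unable to import module ...".
        zf.write(handler_file, os.path.basename(handler_file))
    return zip_path
```

Usage would be something like `build_package("my_lambda.zip", "my_lambda_function.py", ["pandas", "numpy"])`, run from the folder that holds the copied packages.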
    

You can also upload the package to S3 and have your Lambda function load its code from there:

aws s3 cp my_lambda.zip s3://dev-code/projectx/lambda_packages/
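If you prefer to script the upload, here is a sketch using boto3 (the AWS SDK for Python). It assumes boto3 is installed, credentials are configured, the function already exists, and the bucket is in the same region as the function (Lambda requires that when loading code from S3). `split_s3_uri`, `target_key` and `deploy_from_s3` are hypothetical helper names.

```python
import os


def split_s3_uri(uri):
    """Split "s3://bucket/some/prefix/" into ("bucket", "some/prefix/")."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return bucket, key


def target_key(prefix, zip_path):
    """Mimic `aws s3 cp FILE s3://bucket/prefix/`: a trailing slash means "folder"."""
    if prefix == "" or prefix.endswith("/"):
        return prefix + os.path.basename(zip_path)
    return prefix


def deploy_from_s3(function_name, zip_path, s3_uri):
    """Upload the zip and point an existing Lambda function at it."""
    import boto3  # imported lazily; needs AWS credentials when called

    bucket, prefix = split_s3_uri(s3_uri)
    key = target_key(prefix, zip_path)
    boto3.client("s3").upload_file(zip_path, bucket, key)
    boto3.client("lambda").update_function_code(
        FunctionName=function_name, S3Bucket=bucket, S3Key=key
    )
    return bucket, key
```

For example: `deploy_from_s3("my-function", "my_lambda.zip", "s3://dev-code/projectx/lambda_packages/")`, where `my-function` is a placeholder name.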

Here's the repo that will get you started

Solution 3

The repo mthenw/awesome-layers lists several publicly available AWS Lambda layers.

In particular, keithrozario/Klayers provides pandas + numpy and was up to date with pandas 0.25 at the time of writing.

Its ARN is arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-pandas:1
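A sketch of attaching that layer with boto3, assuming credentials are configured and the function already exists. Layers are regional, so the ARN above only works for functions in us-east-1, and its name suggests it targets the python3.7 runtime. `parse_layer_arn` and `attach_layer` are hypothetical helpers; note that `update_function_configuration(Layers=...)` replaces the whole layer list, so the sketch re-sends the layers already attached.

```python
def parse_layer_arn(arn):
    """Split a layer *version* ARN such as
    arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-pandas:1
    """
    parts = arn.split(":")
    if len(parts) != 8 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "layer":
        raise ValueError(f"not a Lambda layer version ARN: {arn!r}")
    return {
        "region": parts[3],
        "account": parts[4],
        "name": parts[6],
        "version": int(parts[7]),
    }


def attach_layer(function_name, layer_arn, region):
    """Add layer_arn to a function's layers, keeping the ones already attached."""
    import boto3  # imported lazily; needs AWS credentials when called

    if parse_layer_arn(layer_arn)["region"] != region:
        raise ValueError("the layer must be in the same region as the function")
    client = boto3.client("lambda", region_name=region)
    config = client.get_function_configuration(FunctionName=function_name)
    arns = [layer["Arn"] for layer in config.get("Layers", [])]
    if layer_arn not in arns:
        arns.append(layer_arn)
    # Layers= replaces the whole list, so the existing ARNs are sent back too.
    client.update_function_configuration(FunctionName=function_name, Layers=arns)
    return arns
```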

Solution 4

I know the question was asked a couple of years ago, and Lambda was at a different stage back then.

I faced similar issues recently and thought it would be a good idea to add the newest solution here for future users facing the same problem.

It turns out that Amazon introduced Lambda Layers at re:Invent 2018. It is a great feature. This Medium post describes it much better than I could here: Creating New AWS Lambda Layer For Python Pandas Library
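As a rough illustration of what such a layer looks like: Lambda extracts layers into `/opt`, and for Python runtimes `/opt/python` is on `sys.path`, so the libraries must sit under a top-level `python/` folder inside the zip. The sketch below assumes you already have a folder of Linux-built packages; `build_layer_zip` and `publish_layer` are hypothetical helpers, and the `python3.7` runtime default is only an example.

```python
import os
import zipfile


def build_layer_zip(zip_path, site_packages_dir, prefix="python"):
    """Zip the contents of site_packages_dir under <prefix>/ (hypothetical helper).

    "pandas/__init__.py" in site_packages_dir becomes "python/pandas/__init__.py".
    """
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(site_packages_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # never add the archive to itself
                rel = os.path.relpath(full, site_packages_dir)
                zf.write(full, os.path.join(prefix, rel))
    return zip_path


def publish_layer(zip_path, layer_name, runtime="python3.7"):
    """Publish the zip as a new layer version and return its ARN."""
    import boto3  # imported lazily; needs AWS credentials when called

    with open(zip_path, "rb") as f:
        resp = boto3.client("lambda").publish_layer_version(
            LayerName=layer_name,
            Content={"ZipFile": f.read()},
            CompatibleRuntimes=[runtime],
        )
    return resp["LayerVersionArn"]
```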

Solution 5

The easiest way to get pandas working in a Lambda function is to utilize Lambda Layers and AWS Data Wrangler. A Lambda Layer is a zip archive that contains libraries or dependencies. According to the AWS documentation, using layers keeps your deployment package small, making development easier.

AWS Data Wrangler is an open-source package that extends the power of pandas to AWS services.

Follow the instructions (under AWS Lambda Layer) here.
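Once the Data Wrangler layer is attached, a handler can use `awswrangler` (and pandas) directly. The sketch below assumes, hypothetically, that the function is triggered by S3 upload notifications for CSV files; `s3_paths_from_event` is a made-up helper. Object keys in S3 event notifications arrive URL-encoded, hence `unquote_plus`.

```python
from urllib.parse import unquote_plus


def s3_paths_from_event(event):
    """Turn an S3 event notification into s3:// paths.

    Object keys arrive URL-encoded (spaces become '+'), so they are decoded.
    """
    return [
        "s3://{}/{}".format(
            record["s3"]["bucket"]["name"],
            unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in event.get("Records", [])
    ]


def handler(event, context):
    # awswrangler (and pandas with it) comes from the Data Wrangler layer.
    import awswrangler as wr

    frames = [wr.s3.read_csv(path) for path in s3_paths_from_event(event)]
    return {"files": len(frames), "rows": sum(len(df) for df in frames)}
```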

Author: Admin

Updated on January 04, 2022

Comments

  • Admin (over 2 years ago)

    Does anyone have a fully compiled version of pandas that is compatible with AWS Lambda?

    After searching around for a few hours, I cannot seem to find what I'm looking for and the documentation on this subject is non-existent.

    I need access to the package in a Lambda function; however, I have been unable to get the package to compile properly for use in a Lambda function.

    In lieu of a compiled package, can anyone provide reproducible steps to create the binaries?

    Unfortunately, I have not been able to reproduce any of the guides on the subject, as they mostly combine pandas with scipy, which I don't need and which adds an extra layer of burden.

  • Admin (over 7 years ago)
    @dsvensson please take a second look at the repo; it builds the binaries from source.
  • JohnAndrews (over 5 years ago)
    What the hell is sam?
  • JohnAndrews (over 5 years ago)
    But this requires an EC2 instance. How can I avoid that?
  • Aakash Basu (about 5 years ago)
    This is not working. I got pandas-0.24.2 and its dependencies (numpy-1.16.2, python-dateutil-2.8.0, pytz-2018.9, six-1.12.0), all as cp36-cp36m-manylinux1_x86_64.whl wheels from pypi.org, unzipped them and put them in a single Windows folder. I added the Python code, zipped the folder and uploaded it. I'm getting the error: Unable to import module 'lambda_function': No module named 'lambda_function'
  • Aakash Basu (about 5 years ago)
    @JohnAndrews If you have a Linux machine, the same steps can be done locally. The basic requirement is simply that Lambda runs on Linux, so the non-Amazon libraries with compiled code need to be built for Linux.
  • ashtonium (about 5 years ago)
    Sounds like it's expecting the default Python file name. Is your lambda_function.py file at the root level of your .zip file, alongside the various package folders?
  • ashtonium (about 5 years ago)
    Not sure this is an answer to the original question. You still need to create the Lambda Layer from a deployment package that contains the correctly compiled binaries. Lambda Layers just make those dependencies reusable across multiple functions.
  • aQ123 (about 4 years ago)
    On a Mac, here's the path: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas
  • mfcss (almost 4 years ago)
    This was exactly what I was looking for: a solution where I can simply pass an ARN to get access to numpy and pandas! Tested just now and it works like a charm.
  • B. Youngman (almost 4 years ago)
    This was perfect, thanks! I've been fighting this for a couple of days now and it was getting frustrating, to say the least. I've read articles that said pretty much the same thing as this, but with so much fluff around them that they were hard to follow. This was the first concise explanation I have found, and I was able to get my layer up and running in a couple of minutes.
  • user3661992 (over 3 years ago)
    No worries @B.Youngman! I just used the solution yesterday.
  • Jimbo (over 3 years ago)
    Interesting solution. Why do you use --no-deps for your second pip call? I assume this would not be correct if you were using something that had another dependency?
  • Jimbo (over 3 years ago)
    Just ran the code without it and got this nice message: "When restricting platform and interpreter constraints using --python-version, --platform, --abi, or --implementation, either --no-deps must be set, or --only-binary=:all: must be set and --no-binary must not be set (or must be set to :none:)."
  • Scott Brenstuhl (over 3 years ago)
    @Jimbo that sounds right. To be honest, I don't completely remember, but I think I had to go through a round or two of error messages to make sure I had all of the dependencies in one of the requirements files (a notable one being pytz, which also needs to be installed as the Linux version).
  • Chenna V (about 3 years ago)
    @hackwithharsha refer to this answer stackoverflow.com/a/57969190/358013
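On avoiding an EC2 instance, and the pip error quoted in the comments: one approach, sketched here under assumptions, is to have pip download prebuilt manylinux wheels for the Lambda platform from any operating system. Because `--only-binary=:all:` is set, pip can still resolve dependencies such as numpy and pytz, so `--no-deps` is not needed. The helper names and the defaults (`3.9`, `manylinux2014_x86_64`, the target folder) are illustrative; match them to your function's runtime and architecture, then zip the contents of the target folder together with your handler.

```python
import subprocess
import sys


def linux_wheel_install_cmd(packages, target_dir, python_version="3.9",
                            platform="manylinux2014_x86_64"):
    """Build a pip command that fetches Linux wheels, whatever OS runs it."""
    return [
        sys.executable, "-m", "pip", "install",
        "--platform", platform,
        "--implementation", "cp",
        "--python-version", python_version,
        # With --platform, pip requires either --no-deps or binary-only installs;
        # binary-only keeps dependency resolution (numpy, pytz, ...) switched on.
        "--only-binary=:all:",
        "--target", target_dir,
        "--upgrade",
        *packages,
    ]


def install_linux_wheels(packages, target_dir, **kwargs):
    """Run the command; the contents of target_dir then go into the zip or layer."""
    subprocess.run(linux_wheel_install_cmd(packages, target_dir, **kwargs), check=True)
```

For example, `install_linux_wheels(["pandas"], "package")` fills `package/` with pandas and its Linux dependencies.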