Best way to send data from Dynamodb to Amazon Elasticsearch


Solution 1

Follow this AWS blog. It describes in detail how this is and should be done.

https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/


edit

I'm assuming you use the AWS Elasticsearch managed service.

  1. You should use DynamoDB Streams to listen for changes (among them, events for new items added to the table).
  2. Create a new Kinesis Firehose delivery stream configured to deliver all records to your Elasticsearch domain.
  3. Create a new Lambda that is triggered by the new-item events on the DynamoDB stream.
  4. The Lambda receives the unique key of the DynamoDB record; fetch the record payload and ingest it into the Firehose delivery stream.
  5. Depending on your DynamoDB record size, you might enable the option to include the record's payload in the stream item, so you won't need to fetch it from the table and consume the provisioned read capacity you've set.
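The steps above can be sketched as a small Python Lambda. This is a minimal sketch, not a complete implementation: the delivery stream name `ddb-to-es` is hypothetical, and it assumes the stream view type includes `NEW_IMAGE` (step 5) so no table read is needed. The `deserialize` helper covers only the common AttributeValue types; boto3's `TypeDeserializer` handles the full set.

```python
import json


def deserialize(av):
    """Convert a DynamoDB AttributeValue dict (e.g. {'S': 'abc'}) to a
    plain Python value. Minimal sketch: covers S, N, BOOL, M, L only;
    boto3's TypeDeserializer supports every type (B, SS, NS, NULL, ...)."""
    if 'S' in av:
        return av['S']
    if 'N' in av:
        return float(av['N']) if '.' in av['N'] else int(av['N'])
    if 'BOOL' in av:
        return av['BOOL']
    if 'M' in av:
        return {k: deserialize(v) for k, v in av['M'].items()}
    if 'L' in av:
        return [deserialize(v) for v in av['L']]
    raise ValueError('unsupported AttributeValue: %r' % av)


def stream_record_to_payload(record):
    """Build a newline-delimited JSON payload for Firehose from one
    DynamoDB stream record (requires NEW_IMAGE in the stream view type)."""
    image = record['dynamodb']['NewImage']
    doc = {k: deserialize(v) for k, v in image.items()}
    return (json.dumps(doc) + '\n').encode('utf-8')


def handler(event, context):
    # boto3 ships with the Lambda runtime; imported lazily so the
    # transformation helpers above stay importable/testable anywhere.
    import boto3
    firehose = boto3.client('firehose')
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            firehose.put_record(
                DeliveryStreamName='ddb-to-es',  # hypothetical stream name
                Record={'Data': stream_record_to_payload(record)},
            )
```

Firehose then handles batching, retries, and delivery to the Elasticsearch domain, which is exactly the work you'd rather not reimplement inside the Lambda.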

Solution 2

I recommend enabling a DynamoDB stream on your table, triggering an AWS Lambda from that stream, and having the Lambda write the data into ElasticSearch.
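A minimal sketch of this direct approach, using only the standard library. The endpoint and index name are hypothetical, and it assumes the domain's access policy allows this Lambda's role without request signing; otherwise you would sign requests with SigV4 (e.g. via the `requests-aws4auth` package).

```python
import json
import urllib.request

# Hypothetical Amazon ES domain endpoint and index name.
ES_ENDPOINT = 'https://search-mydomain.us-east-1.es.amazonaws.com'
ES_INDEX = 'items'


def to_bulk_body(docs, index=ES_INDEX):
    """Build an Elasticsearch _bulk request body (NDJSON) from
    (doc_id, doc) pairs: one action line, then one source line, per doc."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({'index': {'_index': index, '_id': doc_id}}))
        lines.append(json.dumps(doc))
    return '\n'.join(lines) + '\n'


def handler(event, context):
    docs = []
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            image = record['dynamodb']['NewImage']
            doc_id = image['id']['S']  # assumes a string 'id' key
            # Crude AttributeValue unwrap; note numbers stay as strings.
            docs.append((doc_id,
                         {k: list(v.values())[0] for k, v in image.items()}))
    if docs:
        req = urllib.request.Request(
            ES_ENDPOINT + '/_bulk',
            data=to_bulk_body(docs).encode('utf-8'),
            headers={'Content-Type': 'application/x-ndjson'},
            method='POST',
        )
        urllib.request.urlopen(req)
```

As the comments below note, this skips the batching and retry behavior you get for free from Firehose, so it is best suited to low-volume tables.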

Author: JosepB

Updated on June 13, 2022

Comments

  • JosepB
    JosepB over 1 year

    I was wondering which is the best way to send data from DynamoDB to Elasticsearch.

    1. AWS SDK for JavaScript: https://github.com/Stockflare/lambda-dynamo-to-elasticsearch/blob/master/index.js

    2. DynamoDB Logstash plugin: https://github.com/awslabs/logstash-input-dynamodb

  • Will Barnwell
    Will Barnwell over 6 years
    Answers which are simply links to offsite blog posts are discouraged.
  • JosepB
    JosepB over 6 years
    Thanks @johni, but I think that this link is deprecated. I didn't find the dynamodb-to-elasticsearch Lambda blueprint. I followed that link and then I realized that the blueprint wasn't there. Then I saw this link: forums.aws.amazon.com/thread.jspa?threadID=240647. That's why I asked the question here :)
  • sam
    sam over 5 years
    Why do we need to use firehose? Why can't we directly use lambda to feed data into elasticsearch?
  • johni
    johni over 5 years
    A couple of reasons: 1. Scale: your Lambda may generate a huge number of records; if each Lambda invocation tries to feed ES in parallel, you'd get many failures. 2. Retry mechanism: Firehose does local retries on ingestion to ES and eventually puts failed records in backup storage if retries are exhausted; you don't want to implement that in your Lambda.
  • Simon Dragsbæk
    Simon Dragsbæk over 5 years
    what do you do about dumping the untouched data into elastic search?
  • jhilden
    jhilden over 5 years
    Can you explain more? I don't understand what you mean by "untouched".
  • Simon Dragsbæk
    Simon Dragsbæk over 5 years
    Let's say I set this up after my users added data to the table, and I want to sync the pre-existing data into elastic?
  • jhilden
    jhilden over 5 years
    Then you could create a script that reads each of the records and manually sends them through the Lambda. Alternatively, you could add a new DynamoDB field such as 'migrationDate' and update every record, which would cause each updated record to be sent to the Lambda via the stream.
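The first backfill option above (a one-off script that reads every record) can be sketched as a paginated DynamoDB Scan that pushes each pre-existing item into the same Firehose delivery stream the Lambda uses. Table and stream names are hypothetical; the paging logic is factored out so it works without AWS access.

```python
def scan_all(scan_fn):
    """Page through a DynamoDB Scan, following LastEvaluatedKey until the
    table is exhausted. scan_fn wraps table.scan so this stays testable."""
    kwargs = {}
    while True:
        page = scan_fn(**kwargs)
        for item in page['Items']:
            yield item
        last_key = page.get('LastEvaluatedKey')
        if not last_key:
            return
        kwargs['ExclusiveStartKey'] = last_key


def backfill(table_name, delivery_stream):
    """One-off backfill of pre-existing items into Firehose (hypothetical
    table and stream names; run once before or after enabling the stream)."""
    import json
    import boto3
    table = boto3.resource('dynamodb').Table(table_name)
    firehose = boto3.client('firehose')
    for item in scan_all(table.scan):
        firehose.put_record(
            DeliveryStreamName=delivery_stream,
            # default=str covers Decimal values returned by the resource API
            Record={'Data': (json.dumps(item, default=str) + '\n').encode()},
        )
```

A full Scan consumes read capacity, so for large tables you may want to rate-limit it or run it against on-demand capacity.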