Exporting an AWS Postgres RDS Table to AWS S3


Solution 1

There is a sample on GitHub: https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RDStoS3

Here is the code: https://github.com/awslabs/data-pipeline-samples/blob/master/samples/RDStoS3/RDStoS3Pipeline.json

Solution 2

You can define a Copy activity in the Data Pipeline interface to extract data from a Postgres RDS instance into S3:

  1. Create a data node of type SqlDataNode. Specify the table name and a select query.
  2. Set up the database connection by specifying the RDS instance ID (the instance ID is in your endpoint URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with the username, password and database name.
  3. Create a data node of the type S3DataNode.
  4. Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output.

Another option is to use an external tool like Alooma. Alooma can replicate tables from a PostgreSQL database hosted on Amazon RDS to Amazon S3 (https://www.alooma.com/integrations/postgresql/s3). The process can be automated, and you can run it once a week.

Solution 3

I built a pipeline from scratch using the MySQL sample and the documentation as references.

You need to have the roles in place: DataPipelineDefaultResourceRole and DataPipelineDefaultRole.

I haven't loaded the parameters, so you need to go into the Architect view and fill in your credentials and folder paths.

Hope it helps.

{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "#{myS3LogsPath}",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "database": {
        "ref": "DatabaseId_WC2j5"
      },
      "name": "DefaultSqlDataNode1",
      "id": "SqlDataNodeId_VevnE",
      "type": "SqlDataNode",
      "selectQuery": "#{myRDSSelectQuery}",
      "table": "#{myRDSTable}"
    },
    {
      "*password": "#{*myRDSPassword}",
      "name": "RDS_database",
      "id": "DatabaseId_WC2j5",
      "type": "RdsDatabase",
      "rdsInstanceId": "#{myRDSId}",
      "username": "#{myRDSUsername}"
    },
    {
      "output": {
        "ref": "S3DataNodeId_iYhHx"
      },
      "input": {
        "ref": "SqlDataNodeId_VevnE"
      },
      "name": "DefaultCopyActivity1",
      "runsOn": {
        "ref": "ResourceId_G9GWz"
      },
      "id": "CopyActivityId_CapKO",
      "type": "CopyActivity"
    },
    {
      "dependsOn": {
        "ref": "CopyActivityId_CapKO"
      },
      "filePath": "#{myS3Container}#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "DefaultS3DataNode1",
      "id": "S3DataNodeId_iYhHx",
      "type": "S3DataNode"
    },
    {
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "instanceType": "m1.medium",
      "name": "DefaultResource1",
      "id": "ResourceId_G9GWz",
      "type": "Ec2Resource",
      "terminateAfter": "30 Minutes"
    }
  ],
  "parameters": [
  ]
}
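
If you do want to parameterize the pipeline instead of editing values in Architect, a parameters array along the following lines should work. This is an untested sketch: the IDs mirror the #{...} references above, and all descriptions and example values are placeholders of mine.

"parameters": [
  {
    "id": "myS3LogsPath",
    "type": "AWS::S3::ObjectKey",
    "description": "S3 folder for pipeline logs, e.g. s3://your-bucket/logs/"
  },
  {
    "id": "myRDSSelectQuery",
    "type": "String",
    "description": "Select query to run against the table"
  },
  {
    "id": "myRDSTable",
    "type": "String",
    "description": "Name of the RDS table to export"
  },
  {
    "id": "myRDSId",
    "type": "String",
    "description": "RDS instance ID (the first part of the endpoint URL)"
  },
  {
    "id": "myRDSUsername",
    "type": "String",
    "description": "Database username"
  },
  {
    "id": "*myRDSPassword",
    "type": "String",
    "description": "Database password (the leading * keeps the value hidden)"
  },
  {
    "id": "myS3Container",
    "type": "AWS::S3::ObjectKey",
    "description": "S3 output folder, e.g. s3://your-bucket/exports/"
  }
]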

Solution 4

You can now do this with the aws_s3.query_export_to_s3 function within Postgres itself: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/postgresql-s3-export.html
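
A minimal sketch, assuming the aws_s3 extension can be installed on your instance, an IAM role allowing writes to the bucket is attached to it, and that sample_table, your-export-bucket and the region are placeholders for your own values:

-- One-time setup: install the extension (CASCADE also installs aws_commons)
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- Export the query result to S3 as CSV;
-- returns rows_uploaded, files_uploaded, bytes_uploaded
SELECT * FROM aws_s3.query_export_to_s3(
  'SELECT * FROM sample_table',
  aws_commons.create_s3_uri('your-export-bucket', 'exports/sample_table.csv', 'eu-west-1'),
  options := 'format csv'
);

Note that this only performs the export; for the weekly schedule the original question asks about, you would still need something like a cron job or a scheduled Lambda to run the query.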


Comments

  • error2007s, almost 2 years

    I want to use AWS Data Pipeline to pipe data from a Postgres RDS instance to AWS S3. Does anybody know how this is done?

    More precisely, I want to export a Postgres table to AWS S3 using Data Pipeline. The reason I am using Data Pipeline is that I want to automate this process, and the export is going to run once every week.

    Any other suggestions will also work.

  • user2976753, over 7 years
    This code is for MySQL; can it be used for Postgres, given the jdbc:mysql driver inside the pipeline?
  • AlexK, about 3 years
    Here is the link to Postgres outside of Aurora: docs.aws.amazon.com/AmazonRDS/latest/UserGuide/…