Compress file on S3


Solution 1

S3 does not support stream compression, nor is it possible to compress an uploaded file remotely.

If this is a one-time process, I suggest downloading it to an EC2 machine in the same region, compressing it there, and then uploading it to your destination.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
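
For example, a minimal sketch of that one-time approach using the AWS CLI on the EC2 instance (the bucket and object names below are placeholders):

# Copy the uncompressed object from S3 to the instance (same region, so the copy is fast).
aws s3 cp s3://your-bucket/hive-output.csv .

# Compress it locally; gzip replaces hive-output.csv with hive-output.csv.gz.
gzip hive-output.csv

# Upload the compressed copy to the destination bucket.
aws s3 cp hive-output.csv.gz s3://your-destination-bucket/hive-output.csv.gz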

If you need to do this more frequently, see:

Serving gzipped CSS and JavaScript from Amazon CloudFront via S3

Solution 2

Late answer, but I found this works perfectly.

aws s3 sync s3://your-pics .

find . -name "*.jpg" | while read -r file; do gzip "$file"; echo "$file"; done

aws s3 sync . s3://your-pics --content-encoding gzip --dryrun

This downloads all files in the S3 bucket to the machine (or EC2 instance), compresses the image files, and uploads them back to the S3 bucket. Verify the data before removing the --dryrun flag.
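
Note that gzip renames each foo.jpg to foo.jpg.gz, so the sync above uploads new .gz keys rather than replacing the originals. A sketch, assuming you want to keep the original key names (same placeholder bucket as above):

# Compress each file in place, keeping the original .jpg name so the S3 key stays the same.
find . -name "*.jpg" -print0 | while IFS= read -r -d '' file; do
  gzip -c "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done

# Re-run the sync; drop --dryrun once the output looks correct.
aws s3 sync . s3://your-pics --content-encoding gzip --dryrun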

Solution 3

There are now pre-built apps in Lambda that you can use to compress images and files in S3 buckets. Just create a new Lambda function, select a pre-built app of your choice, and complete the configuration.

  1. Create a new Lambda function.
  2. Search for a pre-built app.
  3. Select the app that suits your needs and complete the configuration process by providing the S3 bucket names.
Updated on July 09, 2021

Comments

  • Matt Joiner, almost 3 years ago

    I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.

    I know that by compressing it, it'll come down to about 2.2 GB (gzip). How can I download this file locally as quickly as possible, given that transfer is the bottleneck (250 kB/s)?

    I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.