Is it possible to write to S3 via a stream using the S3 Java SDK?

Solution 1

You don't say what language you're using, but I'll assume Java based on your capitalization. In which case the answer is yes: TransferManager has an upload() method that takes a PutObjectRequest, and you can construct that object around a stream.

However, there are two important caveats. The first is in the documentation for PutObjectRequest:

When uploading directly from an input stream, content length must be specified before data can be uploaded to Amazon S3

So you have to know how much data you're uploading before you start. If you're receiving an upload from the web and have a Content-Length header, then you can get the size from it. If you're just reading a stream of data that's arbitrarily long, then you have to write it to a file first (otherwise the SDK buffers the whole stream in memory to work out its length).
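Writing the stream to a temporary file, as suggested above, also tells you its length. Here is a minimal sketch of that step using only the JDK; the class and method names (StreamSpooler, spool) are made up for illustration, and the actual S3 upload call is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the AWS SDK.
public class StreamSpooler {

    /** The temporary file and the number of bytes written to it. */
    public static final class Spooled {
        public final Path file;
        public final long length;

        Spooled(Path file, long length) {
            this.file = file;
            this.length = length;
        }
    }

    /** Copies the whole stream to a temporary file and reports its size. */
    public static Spooled spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("s3-upload-", ".tmp");
        // createTempFile already created the file, so it must be replaced.
        // Files.copy returns the number of bytes it read from the stream.
        long length = Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return new Spooled(tmp, length);
    }
}
```

The resulting file (or a stream over it together with the length) can then be handed to the upload call. The caller is responsible for deleting the temporary file afterwards.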

The second caveat is that this really doesn't prevent data loss: your program can still crash in the middle of reading data. One thing that it will prevent is returning a success code to the user before storing the data in S3, but you could do that anyway with a file.

Solution 2

Surprisingly, this is not possible with the standard Java SDK (at the time of writing). However, thanks to this third-party library, you can at least avoid buffering huge amounts of data in memory or on disk: it buffers parts of about 5 MB internally and uploads them automatically as a multipart upload.

There is also an open GitHub issue in the SDK repository that you can follow for updates.
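The part-by-part buffering described above can be sketched with the JDK alone. This is not the library's code: StreamChunker and PartSink are hypothetical names, and a real PartSink would upload each part to S3 rather than just receive it. It assumes Java 9 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; only one part is held in memory at a time.
public class StreamChunker {

    /** Hypothetical callback; a real one would upload the part to S3. */
    public interface PartSink {
        void accept(int partNumber, byte[] buffer, int length) throws IOException;
    }

    /**
     * Splits the stream into parts of partSize bytes (the last part may be
     * shorter) and passes each to the sink. Returns the number of parts.
     */
    public static int forEachPart(InputStream in, int partSize, PartSink sink)
            throws IOException {
        byte[] buffer = new byte[partSize];
        int partNumber = 0;
        int filled;
        // readNBytes keeps reading until the buffer is full or the stream
        // ends, and returns 0 once nothing is left.
        while ((filled = in.readNBytes(buffer, 0, partSize)) > 0) {
            partNumber++;
            sink.accept(partNumber, buffer, filled);
        }
        return partNumber;
    }
}
```

The buffer is reused between parts, so a sink that needs the bytes after it returns has to copy them.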

Solution 3

It is possible:

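// Assumption: contentLength (the stream's size in bytes) is known in advance.
ObjectMetadata s3Metadata = new ObjectMetadata();
s3Metadata.setContentLength(contentLength);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
    .build();
s3Client.putObject("bucket", "key", yourInputStream, s3Metadata);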

AmazonS3.putObject

Solution 4

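import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import com.amazonaws.services.s3.model.*;

// Uploads a stream of unknown length as an S3 multipart upload, holding only
// one part in memory at a time. Assumes the enclosing class has the fields
// `s3` (an AmazonS3 client) and `bucketName` (a String).
public void saveS3Object(String key, InputStream inputStream) throws Exception {
        List<PartETag> partETags = new ArrayList<>();
        InitiateMultipartUploadRequest initRequest = new
                InitiateMultipartUploadRequest(bucketName, key);

        InitiateMultipartUploadResult initResponse =
                s3.initiateMultipartUpload(initRequest);

        // 5 MB is the minimum S3 accepts for every part except the last.
        int partSize = 5 * 1024 * 1024;

        try {
            byte[] b = new byte[partSize];
            int partNumber = 1;

            while (true) {
                // read() may return fewer bytes than requested, so keep
                // reading until the buffer is full or the stream ends.
                int len = 0;
                int n;
                while (len < partSize
                        && (n = inputStream.read(b, len, partSize - len)) != -1) {
                    len += n;
                }
                if (len == 0) {
                    break; // Nothing left to upload.
                }

                ByteArrayInputStream partInputStream = new ByteArrayInputStream(b, 0, len);

                UploadPartRequest uploadRequest = new UploadPartRequest()
                        .withBucketName(bucketName).withKey(key)
                        .withUploadId(initResponse.getUploadId()).withPartNumber(partNumber)
                        .withInputStream(partInputStream)
                        .withPartSize(len);

                partETags.add(
                        s3.uploadPart(uploadRequest).getPartETag());

                partNumber++;
                if (len < partSize) {
                    break; // A short part means the stream has ended.
                }
            }

            CompleteMultipartUploadRequest compRequest = new
                    CompleteMultipartUploadRequest(
                    bucketName,
                    key,
                    initResponse.getUploadId(),
                    partETags);

            s3.completeMultipartUpload(compRequest);
        } catch (Exception e) {
            // Abort so the parts uploaded so far are discarded, then rethrow
            // so the caller knows the upload failed.
            s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                    bucketName, key, initResponse.getUploadId()));
            throw e;
        }
}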
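One thing to watch in a loop like this: InputStream.read(byte[]) may return fewer bytes than the buffer holds even when more data is coming, and S3 rejects any part smaller than 5 MB unless it is the last one. The sketch below shows the effect with a deliberately slow stream; ShortReadDemo, TrickleStream and readFully are names invented for the illustration.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo, JDK only.
public class ShortReadDemo {

    /** Hands out at most maxPerRead bytes per call, like a slow socket. */
    public static final class TrickleStream extends FilterInputStream {
        private final int maxPerRead;

        public TrickleStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    /** Reads until the buffer is full or the stream ends; returns bytes read. */
    public static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        int n;
        while (total < buf.length
                && (n = in.read(buf, total, buf.length - total)) != -1) {
            total += n;
        }
        return total;
    }
}
```

With a TrickleStream capped at 3 bytes per call, a single read into a 10-byte buffer returns 3, while readFully fills all 10. Filling the buffer this way before each uploadPart call keeps every part except the last at full size.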

Author: Aditya Vivek

Updated on September 18, 2022

Comments

  • Aditya Vivek about 1 year

    Normally, when a file has to be uploaded to S3, it first has to be written to disk before something like the TransferManager API uploads it to the cloud. This can cause data loss if the upload does not finish in time (for example, the application goes down and restarts on a different server). So I was wondering whether it's possible to write a stream directly across the network, with the required cloud location as the sink.

  • Tyler2P about 2 years
    Please don't post only code as an answer; also provide an explanation of what your code does and how it solves the problem in the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes.