AWS: how to fix S3 event replacing spaces with '+' signs in object key names in JSON

java json amazon-web-services amazon-s3 aws-lambda

20,790

Solution 1

I fixed this by URL-decoding the key:

java.net.URLDecoder.decode(b.getS3().getObject().getKey(), "UTF-8")


{
    "Records": [
        {
            "s3": {
                "object": {
                    "key": "New+Text+Document.txt"
                }
            }
        }
    ]
}

Now the JSON value "New+Text+Document.txt" is correctly converted to "New Text Document.txt".

This fixed my issue. Please tell me whether this is the correct solution, and whether any corner case could break my implementation.
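
On the corner-case question, one thing worth checking is that the key is decoded exactly once. The sketch below is in Python rather than Java, purely for illustration, and it assumes the event key is URL-encoded a single time, so that a literal '%' in an object name arrives as '%25'. The object name used here is hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical event key for an object literally named 'report%2Bfinal.txt'.
# Assumption: the literal '%' in the name is delivered as '%25' in the event.
event_key = "report%252Bfinal.txt"

decoded_once = unquote_plus(event_key)      # intended object key
decoded_twice = unquote_plus(decoded_once)  # over-decoded; a different key

print(decoded_once)
print(decoded_twice)
```

If some other layer of the code has already decoded the key, decoding it again would turn '%2B' into '+' and the lookup would miss the object.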

Solution 2

I came across this while looking for a solution for a Lambda written in Python instead of Java. "urllib.parse.unquote_plus" worked for me; it properly handled a file name containing both spaces and + signs:

from urllib.parse import unquote_plus
import boto3


bucket = 'testBucket1234'
# Uploaded a file named 'foo + bar.txt' as a test; the S3 Put event passes the URL-encoded key below
object_key = 'foo+%2B+bar.txt'
print(object_key)  # encoded key as received in the event
object_key = unquote_plus(object_key)  # '+' becomes a space, '%2B' becomes '+'
print(object_key)  # decoded key: foo + bar.txt

# Fetch the object using the decoded key
client = boto3.client('s3')
client.get_object(Bucket=bucket, Key=object_key)
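
The same decoding can be applied to every record in an event. This is only a minimal sketch: the helper name decoded_objects is made up for illustration, and it assumes each record carries the bucket name under s3.bucket.name and the key under s3.object.key, as in the usual S3 notification layout. The sample event is hand-written and trimmed to those fields.

```python
from urllib.parse import unquote_plus


def decoded_objects(event):
    """Return (bucket, key) pairs with each key URL-decoded exactly once."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


# Hand-written sample event, trimmed to the fields this sketch reads
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "testBucket1234"},
                "object": {"key": "New+Text+Document.txt"},
            }
        }
    ]
}

print(decoded_objects(sample_event))
```

Each returned pair can then be passed to client.get_object(Bucket=..., Key=...) as in the snippet above.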

Solution 3

Node.js, JavaScript, or TypeScript

Since we are sharing solutions for other runtimes, here is how to do it in Node.js:

const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

I would say this is the official solution, since it comes from the AWS docs.

Solution 4

In Java, I think you should use the

getS3().getObject().getUrlDecodedKey()

method, which returns the decoded key, instead of

getS3().getObject().getKey()

Solution 5

ASP.NET has HttpUtility.UrlDecode. A sample is below:

HttpUtility.UrlDecode(s3entity.Object.Key, Encoding.UTF8)


Author by

ViS

Updated on July 09, 2022

Comments

ViS almost 2 years

I have a Lambda function that copies objects from bucket 'A' to bucket 'B'. Everything worked fine until an object named 'New Text Document.txt' was created in bucket 'A'. The JSON built for the S3 event contains the key as "key": "New+Text+Document.txt".

The spaces were replaced with '+'. From searching the web, I know this is a known issue, but I am not sure how to fix it: the incoming JSON itself contains '+', and '+' can also legitimately appear in a file name, as in 'New+Text Document.txt'.

So I cannot blindly replace '+' with ' ' in my Lambda function.

Because of this issue, when the code tries to find the file in the bucket, it fails to find it.

Please suggest.
Michael - sqlbot almost 7 years

This should be the correct solution. Unless there are edge/corner cases not handled in an expected/sensible fashion by java.net.URLDecoder.decode(), your solution seems exactly correct.
Ariel Araza about 5 years

The problem is that (1) "New+Text+Document.txt", (2) "New Text Document.txt", and (3) "New Text+Document.txt" will all look the same in the event (key: "New+Text+Document.txt"). Your code will fail on cases 1 and 3.
Scott over 4 years

The problem he's describing, and the one that led me here, is that the Lambda 'create object' event trigger is what includes the + for a space. That means you don't have an object yet, because the key (as returned by the event) doesn't match any object in the bucket.
Threadid almost 4 years

I have the exact same problem as the question. This solution solves it using a native method available on the object, which is simple and elegant. It returns the key without the encoding. The subsequent getObject operation finds the file key successfully and moves the file from Bucket A to Bucket B.
gipsh almost 4 years

Same issue in Go; fixed with url.QueryUnescape(s3key) from net/url.
alanning almost 3 years

@ArielAraza Decoding works because the key sent to Lambda is already URL-encoded. In the case of a file named "my file with spaces + and plus test.csv", the key sent to Lambda is "my+file+with+spaces+%2B+and+plus+test.csv". (Note that the "+" was replaced with "%2B".)
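
A quick way to check this reasoning against the three names from the earlier comment is sketched below. It uses Python's quote_plus as a stand-in for the encoding applied to the event key, which is an assumption rather than something taken from the S3 documentation; for the file name in this comment it does reproduce the key quoted above.

```python
from urllib.parse import quote_plus, unquote_plus

# The three object names from the earlier comment
names = [
    "New+Text+Document.txt",
    "New Text Document.txt",
    "New Text+Document.txt",
]

for name in names:
    encoded = quote_plus(name)           # stand-in for the key in the event
    assert unquote_plus(encoded) == name  # decoding once restores the name
    print(f"{name!r} -> {encoded!r}")

# A set drops duplicates, so its size shows whether any two names collide
encoded_keys = {quote_plus(name) for name in names}
print(len(encoded_keys))
```

Under that assumption, the three names map to three different encoded keys, so decoding once is unambiguous.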
Marcin over 2 years

Thanks. Exactly what I was looking for.