AWS: how to fix S3 event replacing space with '+' sign in object key names in json

20,790

Solution 1

I fixed this by URL-decoding the key taken from the event record:

java.net.URLDecoder.decode(b.getS3().getObject().getKey(), "UTF-8")


{
    "Records": [
        {
            "s3": {
                "object": {
                    "key": "New+Text+Document.txt"
                }
            }
        }
    ]
}

With this, the JSON value "New+Text+Document.txt" is correctly converted to New Text Document.txt.

This fixed my issue. Please say whether this is the correct solution, and whether there are any corner cases that could break my implementation.
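On the corner-case question, a quick round trip can help. The sketch below is in Python rather than Java, and it assumes `urllib.parse.quote_plus` is a reasonable stand-in for the encoding S3 applies to keys in event notifications (a space becomes `+`, a literal `+` becomes `%2B`). That stand-in is an assumption for illustration, not something taken from the AWS docs.

```python
from urllib.parse import quote_plus, unquote_plus

# File names that look alike once spaces and '+' are confused.
names = [
    "New Text Document.txt",
    "New+Text+Document.txt",
    "New Text+Document.txt",
]

# Assumption: quote_plus approximates how S3 encodes keys in the event JSON.
encoded = [quote_plus(name) for name in names]

# Decoding once should give back the original names.
decoded = [unquote_plus(key) for key in encoded]

for name, key in zip(names, encoded):
    print(f"{name!r} -> {key!r}")
```

If the stand-in holds, each name gets a distinct encoded key, so a single decode recovers the original name. The cases to watch for are decoding twice, or decoding a key that was never encoded in the first place.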

Solution 2

I came across this while looking for a solution for a Lambda written in Python rather than Java. urllib.parse.unquote_plus worked for me; it correctly handled a file name containing both spaces and + signs:

from urllib.parse import unquote_plus
import boto3


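bucket = 'testBucket1234'
# Uploaded a file named 'foo + bar.txt' as a test; the S3 Put event passes
# the key URL-encoded: spaces become '+', the literal '+' becomes '%2B'.
object_key = 'foo+%2B+bar.txt'
print(object_key)  # encoded key as received in the event
object_key = unquote_plus(object_key)
print(object_key)  # decoded key: 'foo + bar.txt'

# Fetch the object using the decoded key.
client = boto3.client('s3')
client.get_object(Bucket=bucket, Key=object_key)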
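In a real Lambda the key arrives inside the event rather than as a literal. A minimal sketch of pulling out decoded keys follows. The helper name `decoded_keys` and the sample event are made up for illustration, and the `s3.bucket.name` field is assumed from the usual S3 notification layout (the event excerpt in Solution 1 only shows `s3.object.key`).

```python
from urllib.parse import unquote_plus


def decoded_keys(event):
    """Return (bucket, decoded_key) pairs for every record in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys in the event are URL-encoded; decode exactly once.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical event, shaped like the S3 notification JSON.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "testBucket1234"},
                "object": {"key": "foo+%2B+bar.txt"}}}
    ]
}
print(decoded_keys(sample_event))
```

Decoding in one place, right where the event is read, keeps the rest of the handler working with real object keys only.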

Solution 3

Node.js, JavaScript or TypeScript

Since we are sharing solutions for other runtimes, here is how to do it in Node.js:

const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

I would say this is the official solution, since it comes from the AWS docs.
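The `.replace(/\+/g, " ")` step is needed because `decodeURIComponent` only handles percent escapes and leaves `+` untouched. Python's `urllib.parse` has the same split between `unquote` and `unquote_plus`, which the sketch below uses to show the difference. Python is used here only to keep the added examples in one language, and the key is a made-up example.

```python
from urllib.parse import unquote, unquote_plus

key = "my+file+%2B+plus.csv"  # hypothetical encoded key

# Percent-decoding alone keeps the '+' separators, like decodeURIComponent.
percent_only = unquote(key)

# Replacing '+' first and then percent-decoding mirrors the Node.js one-liner.
replace_then_decode = unquote(key.replace("+", " "))

# unquote_plus does both steps in one call.
one_call = unquote_plus(key)

print(percent_only)
print(replace_then_decode)
print(one_call)
```

The order matters: the `+` separators have to be replaced before percent-decoding, otherwise a literal plus restored from `%2B` would be turned into a space as well.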

Solution 4

I think in Java you should use:

getS3().getObject().getUrlDecodedKey()

which returns the decoded key, instead of

getS3().getObject().getKey()

Solution 5

ASP.NET has HttpUtility.UrlDecode. A sample is below:

HttpUtility.UrlDecode(s3entity.Object.Key, Encoding.UTF8)
Author by

ViS

Updated on July 09, 2022

Comments

  • ViS
    ViS almost 2 years

    I have a Lambda function that copies objects from bucket 'A' to bucket 'B'. Everything was working fine until an object named 'New Text Document.txt' was created in bucket 'A'. The JSON built for the S3 event contains the key as "key": "New+Text+Document.txt".

    The spaces were replaced with '+'. From searching the web I know this is a known issue, but I am not sure how to fix it: the incoming JSON itself has a '+', and a '+' can actually be part of the file name, as in 'New+Text Document.txt'.

    So I cannot blindly replace '+' with ' ' in my Lambda function.

    Because of this issue, when the code looks for the file in the bucket, it fails to find it.

    Please suggest.

  • Michael - sqlbot
    Michael - sqlbot almost 7 years
    This should be the correct solution. Unless there are edge/corner cases not handled in an expected/sensible fashion by java.net.URLDecoder.decode(), your solution seems exactly correct.
  • Ariel Araza
    Ariel Araza about 5 years
    The problem is that (1) "New+Text+Document.txt", (2) "New Text Document.txt", and (3) "New Text+Document.txt" will all look the same in the event (key: "New+Text+Document.txt"). Your code will fail on cases 1 and 3.
  • Scott
    Scott over 4 years
    The problem he's describing, and the one that led me here, is that the Lambda 'create object' event trigger is what includes the + for a space. That means you can't fetch the object yet, because the key (as returned by the event) doesn't match any object in the bucket.
  • Threadid
    Threadid almost 4 years
    I have the exact same problem as the question. This solution solves it using a native method available on the object: simple and elegant. It returns the key without the encoding. The subsequent getObject operation finds the file key successfully and moves the file from Bucket A to Bucket B.
  • gipsh
    gipsh almost 4 years
    Same issue in Go, fixed with url.QueryUnescape(s3key) from net/url.
  • alanning
    alanning almost 3 years
    @ArielAraza Decoding works because the key sent to lambda is already Url encoded. In the case of a file named, "my file with spaces + and plus test.csv", the key sent to lambda is "my+file+with+spaces+%2B+and+plus+test.csv". (Note the "+" was replaced with "%2B".)
  • Marcin
    Marcin over 2 years
    Thanks. Exactly what I was looking for.