Why are S3 and Google Storage bucket names a global namespace?


Solution 1

“The bucket namespace is global - just like domain names”

http://aws.amazon.com/articles/1109#02

That's more than coincidental.

The reason seems simple enough: buckets and their objects can be accessed through a custom hostname that's the same as the bucket name... and a bucket can optionally host an entire static web site -- with S3 automatically mapping requests from the incoming Host: header onto the bucket of the same name.

In S3, these variant URLs all reference the same object "foo.txt" in the bucket "bucket.example.com". The first one requires either static website hosting with a DNS CNAME (or an Alias record in Route 53) pointing to the website endpoint, or a DNS CNAME pointing to the regional REST endpoint; the others require no configuration:

http://bucket.example.com/foo.txt
http://bucket.example.com.s3.amazonaws.com/foo.txt
http://bucket.example.com.s3[-region].amazonaws.com/foo.txt
http://s3[-region].amazonaws.com/bucket.example.com/foo.txt   
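
To make the pattern in the list above concrete, here is a minimal Python sketch that builds the four URL forms for a given bucket and key. The function name `s3_url_variants` and the region value used in the usage note are illustrative assumptions, not part of any AWS SDK.

```python
from typing import List, Optional


def s3_url_variants(bucket: str, key: str, region: Optional[str] = None) -> List[str]:
    """Build the four URL forms listed above for one object.

    Hypothetical helper for illustration only; it performs no network
    calls and does not check that the bucket or object exists.
    """
    # "[-region]" in the list above is optional, so it collapses to "".
    suffix = f"-{region}" if region else ""
    return [
        f"http://{bucket}/{key}",                            # custom hostname
        f"http://{bucket}.s3.amazonaws.com/{key}",           # generic virtual-hosted style
        f"http://{bucket}.s3{suffix}.amazonaws.com/{key}",   # regional virtual-hosted style
        f"http://s3{suffix}.amazonaws.com/{bucket}/{key}",   # path style
    ]
```

Calling `s3_url_variants("bucket.example.com", "foo.txt", "us-west-2")` returns the four forms with `-us-west-2` substituted for `[-region]`. In the first three forms the bucket name is part of the hostname, which is why it has to behave like one.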

If an object store service needs a simple mechanism to resolve the Host: header of an incoming HTTP request into a bucket name, the bucket namespace also needs to be global. Anything else, it seems, would complicate the implementation significantly.

For hostnames to be mappable to bucket names, something has to be globally unique, since obviously no two buckets could respond to the same hostname. The restriction being applied to the bucket name itself leaves no room for ambiguity.
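
As a rough illustration of why a global namespace keeps this resolution simple, the sketch below maps a Host: header and request path to a (bucket, key) pair with no per-account lookup. It is not how S3 is actually implemented; the function name `resolve_bucket` and the endpoint patterns are assumptions based only on the URL forms shown earlier.

```python
import re
from typing import Tuple

# Generic endpoints from the URL list: s3.amazonaws.com or s3-<region>.amazonaws.com
_ENDPOINT = re.compile(r"^s3(-[a-z0-9-]+)?\.amazonaws\.com$")
# Virtual-hosted style: <bucket>.s3[-region].amazonaws.com
_VHOST = re.compile(r"^(?P<bucket>.+)\.s3(-[a-z0-9-]+)?\.amazonaws\.com$")


def resolve_bucket(host: str, path: str) -> Tuple[str, str]:
    """Resolve (bucket, key) from a Host header and request path.

    Hypothetical sketch: because bucket names are globally unique,
    the hostname alone is enough to identify the bucket.
    """
    host = host.lower().split(":")[0]  # drop any port
    key_path = path.lstrip("/")

    if _ENDPOINT.match(host):
        # Path style: the bucket is the first path segment.
        bucket, _, key = key_path.partition("/")
        return bucket, key

    match = _VHOST.match(host)
    if match:
        # Virtual-hosted style: the bucket is the hostname prefix.
        return match.group("bucket"), key_path

    # Custom hostname: the hostname itself is the bucket name.
    return host, key_path
```

With account-scoped names, the last branch would need an extra table mapping each custom hostname to an account and bucket; with a global namespace the hostname is the answer.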

It also seems likely that many potential clients wouldn't like to have their account identified in bucket names.

Of course, if the bucket name you wanted wasn't available, you could always add your account ID, or any random string, to your desired bucket name, e.g. jozxyqk-payroll, jozxyqk-personnel.
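
A minimal Python sketch of that workaround follows. The helper names are hypothetical, and the validation covers only a subset of the bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit), so treat it as an approximation rather than a complete check.

```python
import re
import secrets

# Subset of the bucket naming rules; assumed sufficient for this illustration.
_VALID_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def random_prefix(nbytes: int = 4) -> str:
    """Return a random lowercase hex string to use as a name prefix."""
    return secrets.token_hex(nbytes)


def prefixed_bucket_name(prefix: str, purpose: str) -> str:
    """Join a prefix and a purpose into a candidate bucket name.

    Hypothetical helper: it only checks the name's shape, not whether
    the name is actually still available in the global namespace.
    """
    name = f"{prefix}-{purpose}".lower()
    if not _VALID_NAME.match(name):
        raise ValueError(f"not a valid bucket name: {name!r}")
    return name
```

For example, `prefixed_bucket_name("jozxyqk", "payroll")` yields the name used above, and `prefixed_bucket_name(random_prefix(), "payroll")` produces a name that is unlikely to collide with anyone else's.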

Solution 2

The more I drink, the more sense the concept below makes, so I've elevated it from a comment on the accepted answer to its own entity:

An additional thought that popped into my head randomly tonight:

Given the ability to use the generic host names that the various object store services provide, one could easily obscure one's corporate (or other) identity as the owner of any given data resource.

So, let's say Black Hat Corp hosts a data resource at http://s3.amazonaws.com/obscure-bucket-name/something-to-be-dissassociated.txt.

It would be very difficult for any non-governmental entity to determine who the owner of that resource is without co-operation from the object store provider.

Not nefarious by design, just objective pragmatism.

And possibly a stroke of brilliance by the architects of this paradigm.

Author: AJB

Lover of the beach, sharp minds, and distributed systems.

Updated on June 04, 2022

Comments

  • AJB
    AJB almost 2 years

    This has me puzzled. I can obviously understand why account IDs are global, but why bucket names?

    Wouldn't it make more sense to have something like: https://accountID.storageservice.com/bucketName

    Which would namespace buckets under accountID.

    What am I missing? Why did these obviously elite architects choose to handle bucket names this way?

  • AJB
    AJB almost 10 years
    Thank you for the thoughtful, accurate, and informative answer @Michael
  • user3526
    user3526 over 8 years
    this link should be helpful to further understand this answer.
  • AJB
    AJB over 8 years
    An additional thought that popped into my head randomly tonight: Given the ability to use the generic host names that the various object store services provide, one could easily obscure one's corporate (or other) identity as the owner of any given data resource. So, let's say Black Hat Corp hosted a data resource at http://s3.amazonaws.com/obscure-bucket-name/something-to-be-dissassociated.txt. It would be very difficult to determine who the owner of that resource is without co-operation from the object store provider. Not nefarious by design, just objective pragmatism.
  • Michael - sqlbot
    Michael - sqlbot over 8 years
    I'm not gonna let you drink and post... Not voting here, but next time, I'm taking away your keys (from your keyboard). (lol). While true, it does allow anonymity that would appear to require legal intervention in order to pierce, the "reason" there's a global namespace seems more likely to be parallel with the global namespace of DNS hostnames, particularly in light of the fact that there's a close correlation between the valid characters in a hostname and the valid characters in a bucket name.
  • Michael - sqlbot
    Michael - sqlbot over 8 years
    @user3526 thanks, I've incorporated your link into the answer.
  • cdhowie
    cdhowie about 8 years
    I wonder what process AWS has to deal with the situation where someone else has created a bucket with a domain name under your control -- will AWS force them to relinquish that bucket name to you?
  • Michael - sqlbot
    Michael - sqlbot about 8 years
    @cdhowie I'm sure they have something, but I wouldn't worry about it. There is a default limit of 100 buckets per account, and increasing that number requires submitting a request to support, describing your use case. Also, there's an easy workaround -- using CloudFront in front of S3 removes the naming restriction, because a CloudFront distribution can respond to a domain name but fetch content from any bucket you configure it to use, regardless of name -- or even multiple buckets, by path patterns, and is priced such that the charges are negligible when used in conjunction with S3.
  • hqt
    hqt over 6 years
    I don't get the point here: "Anything else, it seems, would complicate the implementation significantly." We could map the hostname to user/bucket_name. In that case, I don't see any complication. Can you explain in more detail, please? Thanks.
  • Michael - sqlbot
    Michael - sqlbot over 6 years
    @hqt if the bucket namespace were not global, there would have to be a mapping internal to S3. Buckets in account A can be accessed by users in Account B, if both accounts grant the necessary permissions. Buckets can also be made public. How would these things work if the bucket namespace weren't global? How much more complicated would it be?
  • Zachary Weixelbaum
    Zachary Weixelbaum over 5 years
    I can understand why you needed to be drinking for this to make sense, because that is not at all the reason why buckets are unique.
  • AJB
    AJB over 5 years
    @ZacharyWeixelbaum This isn't about uniqueness, it's obvious why two buckets can't have the same name. This is about the ability to create a bucket name that's not associated with any given accountID, therefore it can be disassociated from the owner.
  • AJB
    AJB over 5 years
    Actually, @Michael-sqlbot, I think @hqt has a solid question. The idea you're putting forth that the "mapping" is overly complicated by not using a global namespace doesn't really make sense. Consider https://accountID.storageservice.com/bucketName. DNS itself would handle everything up to the path, and then that would need to be parsed with the same effort as any typical storage service URL. Honestly, I can't help but keep thinking that my theory of data-disassociation is making more and more sense ;)
  • Michael - sqlbot
    Michael - sqlbot over 5 years
    @AJB The point is that with a global namespace, additional mapping is unnecessary. But your suggestion also has no ability to handle geographically-independent systems. The hostname used to access the bucket needs to also route the request to the correct region, because you can't use the path to accomplish that... so a hostname tied to an account number is a non-starter.