Optional secondary indexes in DynamoDB


When you asked this question, DynamoDB did not have Global Secondary Indexes: http://aws.amazon.com/about-aws/whats-new/2013/12/12/announcing-amazon-dynamodb-global-secondary-indexes/

Now, it does.

A local secondary index is best thought of as, and functions as, a secondary range key. @andreimarinescu is right: you must still query by the item's hash key. What the local secondary index adds is the ability to apply a query's limited set of comparison operators (equal to, less than, greater than, etc.) to the indexed attribute instead of the table's range key. So you still need to know which "hash bucket" you're performing the comparison within.

Global secondary indexes are a bit of a different beast. They are more like a secondary version of your table (and Amazon charges you similarly in terms of provisioned throughput). In a global secondary index, attributes that are not part of your table's primary key can serve as the index's primary key, and you can query on them accordingly.

For example, if your table looks like:

| **Hash key**: Item ID | **Range key**: Serial No | **Attribute**: Business ID |
|-----------------------|--------------------------|----------------------------|
| 1                     | 12345                    | 1A                         |
| 2                     | 45678                    | 2B                         |
| 3                     | 34567                    | (empty)                    |
| 3                     | 12345                    | 2B                         |

Then, with a local secondary index on Business ID, you could perform queries like "find all the items with a hash key of 3 and a business ID equal to 2B", but you could not do "find all items with a business ID equal to 2B", because a local secondary index still requires the table's hash key.
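As a rough sketch of the first query, the request parameters for DynamoDB's Query operation might look like the following. The table name `Items`, the index name `BusinessIdLocalIndex`, and the attribute names `ItemId` and `BusinessId` are all hypothetical, as is the assumption that the item ID is stored as a number and the business ID as a string. The parameters are built as a plain dictionary so the example runs without AWS access.

```python
def build_local_index_query(item_id: int, business_id: str) -> dict:
    """Build Query parameters for a local secondary index.

    All table, index, and attribute names here are hypothetical.
    """
    return {
        "TableName": "Items",
        "IndexName": "BusinessIdLocalIndex",
        # The hash key condition is mandatory. The index only changes
        # which attribute plays the role of the range key.
        "KeyConditionExpression": "ItemId = :item AND BusinessId = :biz",
        "ExpressionAttributeValues": {
            ":item": {"N": str(item_id)},  # numbers are sent as strings
            ":biz": {"S": business_id},
        },
    }


params = build_local_index_query(3, "2B")

# With boto3 installed and credentials configured, these parameters
# could be passed along as: boto3.client("dynamodb").query(**params)
```

Dropping the `ItemId = :item` part of the key condition is not an option here, which is exactly the limitation described above.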

If you were to add a global secondary index using business ID, then you could perform such queries. You would essentially be providing an alternate primary key for the table. You could perform a query like "find all items with a business ID equal to 2B" and get items 2-45678 and 3-12345 as a response.
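To make the "alternate primary key" idea concrete, here is a toy in-memory model of the example table in plain Python. It is only an illustration of the lookup semantics, not how DynamoDB implements indexes, and the function names are made up for this sketch.

```python
# Rows from the example table: (item_id, serial_no, business_id).
TABLE = [
    (1, 12345, "1A"),
    (2, 45678, "2B"),
    (3, 34567, None),  # this item has no business ID
    (3, 12345, "2B"),
]


def build_global_index(rows):
    """Group rows by business ID, skipping rows that lack one."""
    index = {}
    for item_id, serial_no, business_id in rows:
        if business_id is None:
            continue
        index.setdefault(business_id, []).append((item_id, serial_no))
    return index


def query_global_index(index, business_id):
    """Look up items by business ID alone, with no table hash key needed."""
    return index.get(business_id, [])


index = build_global_index(TABLE)
matches = query_global_index(index, "2B")
print(matches)  # [(2, 45678), (3, 12345)]
```

The lookup is keyed on the business ID only, so the caller never has to supply an item ID, unlike a query against a local secondary index.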

Sparse indexes work fine with DynamoDB: it's perfectly acceptable for some items to lack a business ID. Depending on how many items you anticipate having a business ID, a sparse index can also let you keep the provisioned throughput on your index lower than that of the table.
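A table definition along these lines might look like the sketch below, which builds the parameters for DynamoDB's CreateTable operation as a plain dictionary. The table, index, and attribute names, the attribute types, the `KEYS_ONLY` projection, and the capacity numbers are all assumptions chosen for illustration.

```python
def build_table_definition(table_rcu: int, table_wcu: int,
                           index_rcu: int, index_wcu: int) -> dict:
    """Build CreateTable parameters with one global secondary index.

    Names, types, and the projection choice are hypothetical.
    """
    return {
        "TableName": "Items",
        "AttributeDefinitions": [
            {"AttributeName": "ItemId", "AttributeType": "N"},
            {"AttributeName": "SerialNo", "AttributeType": "N"},
            {"AttributeName": "BusinessId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "ItemId", "KeyType": "HASH"},
            {"AttributeName": "SerialNo", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": table_rcu,
            "WriteCapacityUnits": table_wcu,
        },
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "BusinessIdGlobalIndex",
                # Items without a BusinessId attribute simply do not
                # appear in this index, which keeps it sparse.
                "KeySchema": [
                    {"AttributeName": "BusinessId", "KeyType": "HASH"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # Index capacity is set independently of the table's.
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": index_rcu,
                    "WriteCapacityUnits": index_wcu,
                },
            }
        ],
    }


definition = build_table_definition(table_rcu=10, table_wcu=10,
                                    index_rcu=2, index_wcu=2)

# With boto3 installed and credentials configured, this could be sent
# as: boto3.client("dynamodb").create_table(**definition)
```

The capacity figures are placeholders; the right ratio depends on what fraction of writes actually carry a business ID and how often the index is read.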


Author by

nullPainter

Updated on June 04, 2022

Comments

  • nullPainter
    nullPainter almost 2 years

    I am migrating my persistence tier from Riak to DynamoDB. My data model contains an optional business identifier field, which I would like to be able to query as an alternative to the key.

    It appears that DynamoDB secondary indexes can't be null and require a range key, so despite sharing a name with Riak's secondary indexes, they appear to be quite a different beast.

    Is there an elegant way to efficiently query my optional field, short of throwing the data in an external search index?

    • andreimarinescu
      andreimarinescu over 10 years
      From my past experience, a query requires that the primary index is also queried, in addition to the secondary one. Point being that you won't be able to query by the secondary index alone. This might have changed in the meantime, but I doubt it. You can easily run a query to check this.
    • nullPainter
      nullPainter over 10 years
      Thanks Andrei; that was the conclusion that I came to. I ended up querying my Elasticsearch index for retrieving data by business identifier. It's kinda misusing a search index, but needs must.
  • Emil
    Emil over 7 years
    So a global secondary index is like making a copy of your table with a different hash key. That means your storage usage will increase. For example, if my table is 100 MB, using a secondary index will make it 200 MB. Is that true?
  • rpmartz
    rpmartz over 7 years
    @batmaci Sort of. What I mean by that is that you will have to pay for additional provisioned throughput on the table for a global secondary index. Note that you do not have to double your provisioned throughput for the table - you can set the main throughput capacities and the index throughput capacities independently of one another. I don't believe it would actually require doubling the actual storage size of the data, but that's an implementation detail that Amazon handles and you don't have to worry about.