Azure table storage querying partitionkey

Solution 1

Are you sure you want to use ticks as a partition key? That means every measurable 100 ns instant becomes its own partition. With time-based data you can use the partition key to specify an interval, such as every hour, minute or even second, and then a row key with the actual timestamp.
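For example, here is a minimal sketch of an hour-based scheme (the MeasurementEntity type and the SetKeys helper are illustrative, not from the question):

using System;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity for illustration; any TableEntity works the same way.
public class MeasurementEntity : TableEntity
{
    public double Value { get; set; }
}

public static class TimeSeriesKeys
{
    public static void SetKeys(MeasurementEntity entity, DateTime utcTimestamp)
    {
        // Partition key: the hour bucket the measurement falls into.
        long hours = utcTimestamp.Ticks / TimeSpan.TicksPerHour;
        entity.PartitionKey = hours.ToString("D8");

        // Row key: the exact timestamp, so rows sort chronologically
        // within their hour partition.
        entity.RowKey = utcTimestamp.Ticks.ToString("D18");
    }
}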

That problem aside, let me show you how to do the query. First, a comment on how you generate the partition key. I suggest you do it like this:

var partitionKey = DateTime.UtcNow.Ticks.ToString("D18");

Don't use DateTime.Now.ToUniversalTime() to get the current UTC time. DateTime.Now internally calls DateTime.UtcNow and converts the result to the local time zone; ToUniversalTime() then converts it back to UTC, which is just wasteful (and more time-consuming than you may think).

And your ConvertDateTimeToTicks() method serves no purpose other than reading the Ticks property, so it just makes your code more complex without adding any value.

Here is how to perform the query:

var days = 5;
var partitionKey = DateTime.UtcNow.AddDays(-days).Ticks.ToString("D18");
var query = new TableQuery<MyEntity>().Where(
  TableQuery.GenerateFilterCondition(
    "PartitionKey",
    QueryComparisons.GreaterThanOrEqual,
    partitionKey
  )
);

The partition key is formatted as a zero-padded, 18-character string, allowing a straightforward lexicographic comparison. (Without the fixed width, plain string comparison would go wrong, ordering e.g. "999" after "1000".)

I suggest that you move the code to generate the partition key (and row key) into a function to make sure that the keys are generated the same way throughout your code.
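A minimal sketch of such a function, assuming the tick-based scheme from the question (the PartitionKeys class name is mine):

using System;
using Microsoft.WindowsAzure.Storage.Table;

public static class PartitionKeys
{
    // Single source of truth for the key format,
    // shared by the insert path and the query path.
    public static string FromUtc(DateTime utc) => utc.Ticks.ToString("D18");
}

// Insert side:
//   entity.PartitionKey = PartitionKeys.FromUtc(DateTime.UtcNow);
//
// Query side:
//   var filter = TableQuery.GenerateFilterCondition(
//       "PartitionKey",
//       QueryComparisons.GreaterThanOrEqual,
//       PartitionKeys.FromUtc(DateTime.UtcNow.AddDays(-5)));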

Eighteen characters are used because the Ticks value of any DateTime from today until many thousands of years into the future has 18 decimal digits. If you base your partition key on hours, minutes or seconds instead of 100 ns ticks, you can shorten the partition key accordingly.

Solution 2

As Martin suggests, using a timestamp as your partition key is almost certainly not what you want to do.

Partitions are the unit of scale in Azure Table Storage and more or less represent physical segmentation of your data. They're a scalability optimization that allows you to "throw hardware" at the problem of storing more and more data while maintaining acceptable response times (something that is traditionally hard in data storage). You define the partitions in your data by assigning partition keys to each row. It's almost never desirable for each row to live in its own partition.

In ATS, the row key becomes your unique key within a given partition. So the combination of partition key + row key is the true unique key across the entire ATS table.
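A lookup that supplies both keys is therefore the cheapest possible query: a single-entity point read. A minimal sketch, with a placeholder connection string and table name:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

CloudStorageAccount account = CloudStorageAccount.Parse("<connection-string>");
CloudTable table = account.CreateCloudTableClient().GetTableReference("measurements");

// Retrieve one entity by its full unique key: partition key + row key.
TableOperation retrieve = TableOperation.Retrieve<MyEntity>("<partition-key>", "<row-key>");
TableResult result = table.Execute(retrieve);
var entity = (MyEntity)result.Result;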

There's lots of advice out there for choosing a partition key and row key, and none of it generalizes. It depends on the nature of your data, your anticipated query patterns, etc.

Choose a partition key that will aggregate your data into a reasonably well-distributed set of "buckets". All things being equal, if you anticipate having 1 million rows in your table, it's often useful to have, say, 10 buckets with 100,000 rows each... or maybe 100 buckets with 10,000 rows each. At query time you'll need to pick the partition(s) you're querying, so the number of buckets may matter to you. "Buckets" often correspond to a natural segmentation concept in your domain... a bucket to represent each US state, or a bucket to represent each department in your company, etc. Note that it's not necessary (or often possible) to have perfectly distributed buckets... get as close as you can with reasonable effort.
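To make the "bucket" idea concrete, here's a minimal sketch with one partition per US state (the CustomerEntity type is hypothetical):

using Microsoft.WindowsAzure.Storage.Table;

// One partition per US state yields roughly 50 buckets.
public class CustomerEntity : TableEntity
{
    public CustomerEntity() { } // parameterless constructor required by the SDK

    public CustomerEntity(string state, string customerId)
    {
        PartitionKey = state;  // the bucket, e.g. "WA"
        RowKey = customerId;   // must be unique within the state partition
    }

    public string Name { get; set; }
}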

One example of where you might intentionally have an uneven distribution is if you intend to vary query patterns by bucket... bucket A will receive lots of cheap, fast queries, bucket B fewer, more expensive queries, etc. Or perhaps bucket A data will remain static while bucket B data changes frequently. This can be accomplished with multiple tables, too... so there's no "one size fits all" answer.

Given the limited knowledge we have of your problem, I like Martin's advice of using a time span as your partition key. Small spans will result in many partitions and (among other things) make queries that cover multiple time spans relatively expensive. Larger spans mean fewer cross-span aggregations, but bigger partitions and thus more expensive queries within a partition (they also make identifying a suitable row key potentially more challenging).
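With an hour-based partition key (as suggested above), a five-day query becomes a range scan across roughly 120 hour partitions (5 days × 24). A minimal sketch of such a query; bounding both ends keeps the scan inside the range:

using System;
using Microsoft.WindowsAzure.Storage.Table;

long fromHours = DateTime.UtcNow.AddDays(-5).Ticks / TimeSpan.TicksPerHour;
long toHours = DateTime.UtcNow.Ticks / TimeSpan.TicksPerHour;

string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.GreaterThanOrEqual, fromHours.ToString("D8")),
    TableOperators.And,
    TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.LessThanOrEqual, toHours.ToString("D8")));

// Correct, but spanning ~120 partitions is more expensive than
// a point query against a single partition.
var query = new TableQuery<MyEntity>().Where(filter);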

Ultimately you'll likely need to experiment with a few options to find the most suitable one for your data and intended queries.

One other piece of advice... don't be afraid to consider duplicating data in multiple data stores to suit widely varying query types. Not every query will work effectively against a single schema or storage configuration. The effort needed to synchronize data across stores may be less than that needed to bend query technology X to your will.

Best of luck!


Comments

  • Kurkula
    Kurkula almost 2 years

    I am using Azure Table Storage and retrieving data through a timestamp filter. Execution is very slow because the timestamp is neither the partition key nor the row key. I researched on Stack Overflow and found that the timestamp should be converted to ticks and stored in the partition key. I did that: while inserting data I built the string below and stored the tick string in the partition key.

    string currentDateTimeTick = ConvertDateTimeToTicks(DateTime.Now.ToUniversalTime()).ToString();
    
    public static long ConvertDateTimeToTicks(DateTime dtInput)
    {
        long ticks = 0;
        ticks = dtInput.Ticks;
        return ticks;
    }
    

    This is fine up to this point. But when I try to retrieve the last 5 days of data, I am unable to query the ticks against the partition key. What is my mistake in the code below?

    int days = 5;
    TableQuery<MyEntity> query = new TableQuery<MyEntity>()
    .Where(TableQuery.GenerateFilterConditionForDate("PartitionKey", QueryComparisons.GreaterThanOrEqual, "0"+DateTimeOffset.Now.AddDays(days).Date.Ticks));
    
  • Kurkula
    Kurkula about 8 years
    Thanks for very useful stuff. Really appreciate your time.
  • JoshL
    JoshL about 8 years
    Thanks for the kind words, glad to help :-)
  • user1676558
    user1676558 about 7 years
    Choosing D18 would confuse me because, unless your dates go back to ~300 AD, it'll be equivalent to the default ToString(), which uses G19. I recommend using D19, so when people see it they can think 'ah, because longs are limited to 19 digits.'
  • Martin Liversage
    Martin Liversage about 7 years
    @user1676558: However, you will most likely not use ticks as partition keys, even though that is what the question does. If, say, you want to use hours as the partition key, the resulting key will be an 8-digit integer. Using D8 as the format ensures that numbers with fewer than 8 digits still yield an 8-character partition key, prefixed with zeros. However, you are right that for any reasonable date you can most likely just use ToString(), because all dates in the not-too-far future or past will always be 8 digits for hours, 18 digits for ticks, etc. My D# is meant as a general solution.
    @user1676558: However, you will most likely not use ticks as partition keys even though it is in the question. If say you want to use hours as the partition key the resulting key will be an 8 digit integer. Using D8 as the format will ensure that numbers with fewer than 8 digits still gives a 8 character partition key with zeros prefixed. However, you are right that for any reasonable date you can most likely just use ToString() because all dates in the not to far future or past will always be 8 digits for hours or 18 digits for ticks etc. My D# is supposed to be a general solution.