Count rows within partition in Azure table storage


Solution 1

As you may already know, there's no Count-like functionality available in Azure Tables. In order to get the total number of entities (rows) in a partition (or a table), you have to fetch all of them.

You can reduce the response payload by using a technique called query projection. A query projection allows you to specify the list of entity attributes (columns) that you want the table service to return. Since you're only interested in the total count of entities, I would recommend that you fetch only the PartitionKey attribute. You may find this blog post helpful for understanding query projection: https://blogs.msdn.microsoft.com/windowsazurestorage/2011/09/15/windows-azure-tables-introducing-upsert-and-query-projection/.

Solution 2

Azure Storage Explorer (https://azure.microsoft.com/en-gb/features/storage-explorer/) allows you to define a query, and you can use the Table Statistics toolbar item to get the total number of rows for the whole table or for your query.


Solution 3

I tested the speed using a Stopwatch, fetching and counting 100,000 entities in a partition; each entity has three fields in addition to the standard TableEntity properties.

I select just the PartitionKey and use a resolver to end up with a list of strings, which I count once the entire partition has been retrieved.

The fastest I have got it is around 6000 ms - 6500 ms. Here is the function:

public static async Task<int> GetCountOfEntitiesInPartition(string tableName, string partitionKey)
{
    // Assumes a CloudTableClient field named tableClient is in scope.
    CloudTable table = tableClient.GetTableReference(tableName);

    // Project only the PartitionKey to keep the response payload minimal.
    TableQuery<DynamicTableEntity> tableQuery = new TableQuery<DynamicTableEntity>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey))
        .Select(new string[] { "PartitionKey" });

    // Resolve each returned entity to just its PartitionKey string.
    EntityResolver<string> resolver = (pk, rk, ts, props, etag) =>
        props.ContainsKey("PartitionKey") ? props["PartitionKey"].StringValue : null;

    List<string> entities = new List<string>();

    // Page through every segment of the partition; each segment returns
    // at most 1,000 entities, so we must follow the continuation token.
    TableContinuationToken continuationToken = null;
    do
    {
        TableQuerySegment<string> tableQueryResult =
            await table.ExecuteQuerySegmentedAsync(tableQuery, resolver, continuationToken);

        continuationToken = tableQueryResult.ContinuationToken;

        entities.AddRange(tableQueryResult.Results);
    } while (continuationToken != null);

    return entities.Count;
}

This is a generic function; all you need are the table name and partition key.

Solution 4

You could achieve this quite efficiently by leveraging the atomic batch operations of the Azure Table storage service. For every partition, keep an additional entity with the same partition key and a well-known row key such as "PartitionCount". That entity has a single int (or long) property, Count.

Every time you insert a new entity, do an atomic batch operation that also increments the Count property of your partition counter entity. Your partition counter entity has the same partition key as your data entity, which is what allows you to do an atomic batch operation with guaranteed consistency.

Every time you delete an entity, decrement the Count property of the partition counter entity, again in a batch operation so that the two operations are consistent.

If you just want to read the partition count, all you need is a single point query for the partition counter entity; its Count property tells you the current count for that partition.
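The insert path described above can be sketched roughly as follows, assuming the legacy Microsoft.Azure.Cosmos.Table SDK; the `CounterEntity` class, the method name, and the "PartitionCount" row key are illustrative, not part of any official API:

```csharp
// Counter entity holding the running count for one partition.
public class CounterEntity : TableEntity
{
    public long Count { get; set; }
}

public static async Task InsertWithPartitionCount(CloudTable table, TableEntity entity)
{
    while (true)
    {
        // Point query for this partition's counter entity.
        TableResult read = await table.ExecuteAsync(
            TableOperation.Retrieve<CounterEntity>(entity.PartitionKey, "PartitionCount"));
        CounterEntity counter = read.Result as CounterEntity;

        TableBatchOperation batch = new TableBatchOperation();
        batch.Insert(entity);

        if (counter == null)
        {
            // First entity in this partition: create the counter alongside it.
            batch.Insert(new CounterEntity
            {
                PartitionKey = entity.PartitionKey,
                RowKey = "PartitionCount",
                Count = 1
            });
        }
        else
        {
            counter.Count++;
            // Replace carries the ETag from the read above, so the whole
            // batch fails if another writer touched the counter meanwhile.
            batch.Replace(counter);
        }

        try
        {
            await table.ExecuteBatchAsync(batch);
            return;
        }
        catch (StorageException ex) when (
            ex.RequestInformation.HttpStatusCode == 412 ||
            ex.RequestInformation.HttpStatusCode == 409)
        {
            // ETag mismatch, or the counter was created concurrently:
            // re-read the counter and retry the batch.
        }
    }
}
```

The retry loop is where the optimistic concurrency discussed in the comments comes in: on a conflict the batch is rejected as a whole, so the data entity and the counter can never get out of sync.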

Author: goelze

Updated on July 27, 2022

Comments

  • goelze
    goelze almost 2 years

    I've seen various questions around SO about how to get the total row count of an Azure storage table, but I want to know how to get the number of rows within a single partition.

    How can I do this while loading a minimal amount of entity data into memory?

  • Rambalac
    Rambalac almost 7 years
Azure Table Storage does not have atomic increment operations. Every "atomic" increment would require multiple requests to read and merge.
  • Dogu Arslan
    Dogu Arslan almost 7 years
Well, let's build up knowledge first before we post. See here: docs.microsoft.com/en-us/rest/api/storageservices/… and the comment: "Operations within a change set are processed atomically; that is, all operations in the change set either succeed or fail. Operations are processed in the order they are specified in the change set."
  • Rambalac
    Rambalac almost 7 years
If you have never used Azure Storage, please at least read the documentation carefully. Azure Storage does not have any atomic batch or increment operation over a single item. That "either succeed or fail" means you have to repeat retrieve, increment, merge continuously until success, which in a concurrent environment means that to increment one item you may have to send tens of requests.
  • Dogu Arslan
    Dogu Arslan almost 7 years
Surely you are not reading the one line I pasted from the documentation. Just search for the substring "atomic" in that sentence. From the client's side a batch operation is atomic: either all operations succeed or all fail. That is the general terminology used by the official documentation and the industry to refer to batch operations. I don't think you have ever used any batch operation, because then this would be simple to understand.
  • Rambalac
    Rambalac almost 7 years
Ironically, you did not read your own "proof". You did not read what a batch operation is. Batch operations are limited to storage operations, and one entity can appear only once in one batch. There is no Azure Storage operation that increments or modifies existing values in place, only ones that replace the whole entity. You cannot read a value, increment it, and update it in one atomic batch.
  • Dogu Arslan
    Dogu Arslan almost 7 years
OK, let me explain my answer in simpler terms; I should have written it for all levels of experience. When I say "increment" in the answer, it means first reading the partition count entity, which has the same PK as the item you want to insert, then incrementing its Count property on the client side, and then making a batch request that both inserts the new entity with that partition key and replaces the partition count entity with the incremented value, relying on optimistic concurrency.
  • Dogu Arslan
    Dogu Arslan almost 7 years
So if in the meantime another client inserted a new item with the same PK, it would have updated the partition count entity as well, and our entire batch request would fail with 409 because the ETags on the partition count entity would not match. You would then retry the same logic: read the latest partition count entity again, increment, and do another batch operation. Because the batch operation is atomic, if replacing the partition count entity fails, inserting the new entity also fails, so we remain in a consistent state and can safely retry.
  • Morten Holdflod Møller
    Morten Holdflod Møller over 6 years
So the client will 1) read the count record for PK X, 2) construct the new table entity with PK X, 3) increment the local count variable, and 4) create a TableBatchOperation with an insert of the new table entity and a merge on the count record. Is that right?
  • Dogu Arslan
    Dogu Arslan over 6 years
Yes. And a delete-entity operation is the same, except that instead of incrementing you obviously decrement the counter.
  • Morten Holdflod Møller
    Morten Holdflod Møller over 6 years
    How will Table Storage handle the case where two clients A and B interleave the 4 steps like A1, B1, B2, B3, B4, A2, A3, A4?
  • Dogu Arslan
    Dogu Arslan over 6 years
B will succeed; A4 will fail because the ETag of the count entity will have been updated by B4 and will not match the one in A4 (see optimistic concurrency in Azure Table Storage). I explained this in detail in the answer and comments.
  • George
    George about 6 years
This is a great thread, a++ would read again. While a great strategy for managing a count, this scenario would fail when using table storage's upsert operations, InsertOrReplace and InsertOrMerge. There's no way to know before the command is run whether the operation will be an insert or a replace/merge, so you can't know whether to run the increment or not. Sad!
  • Dogu Arslan
    Dogu Arslan about 6 years
Yes, for InsertOrMerge/Replace operations a trade-off can be to always Insert first (with the count increment operation) and, if that fails with an entity-already-exists error, do an unconditional merge in a second I/O call without any increment.
  • Emaborsa
    Emaborsa over 5 years
IMHO this function is useless. It counts the items of the query, which is also shown at the bottom of the page.
  • MBentley
    MBentley about 5 years
The number at the bottom of the page only counts up to 1,000, and the current version will not let you request more. So this is a useful way around that.
  • James Westgate
    James Westgate almost 5 years
Since you are only after the count, you do not need to add the fetched entities to a list; just increment a counter with the result count for each segment.
  • CraftyFella
    CraftyFella about 4 years
    Exactly what I was looking for.. nice one
  • Christian Ziegler
    Christian Ziegler about 4 years
    I suppose you meant RowKey instead of PartitionKey?
  • st0le
    st0le over 3 years
No, it won't. The query will have a continuation token; each query returns only up to 1,000 records.
  • jokab
    jokab over 2 years
After clicking the button I was waiting for something to happen. It turned out the result was being printed in the Activities window below. Thanks for this answer @Nigel.