Count rows within partition in Azure table storage


Solution 1

As you may already know, there's no Count-like functionality available in Azure Tables. In order to get the total number of entities (rows) in a partition (or a table), you have to fetch all of them.

You can reduce the response payload by using a technique called query projection. A query projection allows you to specify the list of entity attributes (columns) that you want the table service to return. Since you're only interested in the total count of entities, I would recommend that you fetch only the PartitionKey attribute. You may find this blog post helpful for understanding query projection: https://blogs.msdn.microsoft.com/windowsazurestorage/2011/09/15/windows-azure-tables-introducing-upsert-and-query-projection/.

Solution 2

Azure Storage Explorer (https://azure.microsoft.com/en-gb/features/storage-explorer/) allows you to define a query, and you can use the Table Statistics toolbar item to get the total number of rows for the whole table or for your query.


Solution 3

I tested the speed using a Stopwatch, fetching and counting 100,000 entities in a partition; each entity has three fields in addition to the standard TableEntity properties.

I select just the PartitionKey and use a resolver to end up with a list of strings, which I count once the entire partition has been retrieved.

The fastest I have got it is around 6000 ms - 6500 ms. Here is the function:

public static async Task<int> GetCountOfEntitiesInPartition(string tableName, string partitionKey)
{
    // Assumes a CloudTableClient field named tableClient is in scope.
    CloudTable table = tableClient.GetTableReference(tableName);

    // Project only the PartitionKey to keep the response payload minimal.
    TableQuery<DynamicTableEntity> tableQuery = new TableQuery<DynamicTableEntity>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey))
        .Select(new string[] { "PartitionKey" });

    // Resolve each returned entity to just its PartitionKey string.
    EntityResolver<string> resolver = (pk, rk, ts, props, etag) =>
        props.ContainsKey("PartitionKey") ? props["PartitionKey"].StringValue : null;

    List<string> entities = new List<string>();

    // Page through every segment of the partition; each segment returns
    // at most 1,000 entities, so we must follow the continuation token.
    TableContinuationToken continuationToken = null;
    do
    {
        TableQuerySegment<string> tableQueryResult =
            await table.ExecuteQuerySegmentedAsync(tableQuery, resolver, continuationToken);

        continuationToken = tableQueryResult.ContinuationToken;

        entities.AddRange(tableQueryResult.Results);
    } while (continuationToken != null);

    return entities.Count;
}

This is a generic function; all you need are the table name and partition key.

Solution 4

You could achieve this quite efficiently by leveraging the atomic batch operations of the Azure Table storage service. For every partition, keep an additional entity with the same partition key and a well-known row key such as "PartitionCount". That entity has a single int (or long) property, Count.

Every time you insert a new entity, do an atomic batch operation that also increments the Count property of your partition counter entity. Your partition counter entity has the same partition key as your data entity, which is what allows you to do an atomic batch operation with guaranteed consistency.

Every time you delete an entity, decrement the Count property of the partition counter entity, again in a batch operation so that the two operations are consistent.

If you just want to read the partition count, all you need is a single point query for the partition counter entity; its Count property tells you the current count for that partition.
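The insert path described above can be sketched roughly as follows, assuming the legacy Microsoft.Azure.Cosmos.Table SDK; the `CounterEntity` class, the method name, and the "PartitionCount" row key are illustrative, not part of any official API:

```csharp
// Counter entity holding the running count for one partition.
public class CounterEntity : TableEntity
{
    public long Count { get; set; }
}

public static async Task InsertWithPartitionCount(CloudTable table, TableEntity entity)
{
    while (true)
    {
        // Point query for this partition's counter entity.
        TableResult read = await table.ExecuteAsync(
            TableOperation.Retrieve<CounterEntity>(entity.PartitionKey, "PartitionCount"));
        CounterEntity counter = read.Result as CounterEntity;

        TableBatchOperation batch = new TableBatchOperation();
        batch.Insert(entity);

        if (counter == null)
        {
            // First entity in this partition: create the counter alongside it.
            batch.Insert(new CounterEntity
            {
                PartitionKey = entity.PartitionKey,
                RowKey = "PartitionCount",
                Count = 1
            });
        }
        else
        {
            counter.Count++;
            // Replace carries the ETag from the read above, so the whole
            // batch fails if another writer touched the counter meanwhile.
            batch.Replace(counter);
        }

        try
        {
            await table.ExecuteBatchAsync(batch);
            return;
        }
        catch (StorageException ex) when (
            ex.RequestInformation.HttpStatusCode == 412 ||
            ex.RequestInformation.HttpStatusCode == 409)
        {
            // ETag mismatch, or the counter was created concurrently:
            // re-read the counter and retry the batch.
        }
    }
}
```

The retry loop is where the optimistic concurrency discussed in the comments comes in: on a conflict the batch is rejected as a whole, so the data entity and the counter can never get out of sync.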

Author: goelze

Updated on July 27, 2022

Comments

  • goelze
    goelze almost 2 years

    I've seen various questions around SO about how to get the total row count of an Azure storage table, but I want to know how to get the number of rows within a single partition.

    How can I do this while loading a minimal amount of entity data into memory?

  • Rambalac
    Rambalac almost 7 years
Azure Table Storage does not have atomic increment operations. Every "atomic" increment would require multiple requests to read and merge.
  • Dogu Arslan
    Dogu Arslan almost 7 years
Well, let's build up knowledge first before we post. See here: docs.microsoft.com/en-us/rest/api/storageservices/… and the comment: "Operations within a change set are processed atomically; that is, all operations in the change set either succeed or fail. Operations are processed in the order they are specified in the change set."
  • Rambalac
    Rambalac almost 7 years
If you have never used Azure Storage, please at least read the documentation carefully. Azure Storage does not have any atomic batch or increment operation over a single item. That "either succeed or fail" means you have to repeat retrieve, increment, merge continuously until success, which in a concurrent environment means that to increment one item you may have to send tens of requests.
  • Dogu Arslan
    Dogu Arslan almost 7 years
Surely you are not reading the one line I pasted from the documentation. Just search for the substring "atomic" in that sentence. From the client's side a batch operation is atomic: either all operations succeed or all fail. That is the general terminology used by the official documentation and the industry to refer to batch operations. I don't think you have ever used any batch operation, because then this would be simple to understand.
  • Rambalac
    Rambalac almost 7 years
Ironically, you did not read your own "proof". You did not read what a batch operation is. Batch operations are limited to storage operations, and one entity can appear only once in one batch. There is no Azure Storage operation that increments or modifies existing values in place, only ones that replace the whole entity. You cannot read a value, increment it, and update it in one atomic batch.
  • Dogu Arslan
    Dogu Arslan almost 7 years
OK, let me explain my answer in simpler terms; I should have written it for all levels of experience. When I say "increment" in the answer, it means first reading the partition count entity, which has the same PK as the item you want to insert, then incrementing its Count property on the client side, and then making a batch request that both inserts the new entity with that partition key and replaces the partition count entity with the incremented value, relying on optimistic concurrency.
  • Dogu Arslan
    Dogu Arslan almost 7 years
So if in the meantime another client inserted a new item with the same PK, it would have updated the partition count entity as well, and our entire batch request would fail with 409 because the ETags on the partition count entity would not match. You would then retry the same logic: read the latest partition count entity again, increment, and do another batch operation. Because the batch operation is atomic, if replacing the partition count entity fails, inserting the new entity also fails, so we remain in a consistent state and can safely retry.
  • Morten Holdflod Møller
    Morten Holdflod Møller over 6 years
So the client will 1) read the count record for PK X, 2) construct the new table entity with PK X, 3) increment the local count variable, and 4) create a TableBatchOperation with an insert of the new table entity and a merge on the count record. Is that right?
  • Dogu Arslan
    Dogu Arslan over 6 years
Yes. And a delete-entity operation is the same, except that instead of incrementing you obviously decrement the counter.
  • Morten Holdflod Møller
    Morten Holdflod Møller over 6 years
    How will Table Storage handle the case where two clients A and B interleave the 4 steps like A1, B1, B2, B3, B4, A2, A3, A4?
  • Dogu Arslan
    Dogu Arslan over 6 years
B will succeed; A4 will fail because the ETag of the count entity will have been updated by B4 and will not match the one in A4 (see optimistic concurrency in Azure Table Storage). I explained this in detail in the answer and comments.
  • George
    George about 6 years
This is a great thread, a++ would read again. While a great strategy for managing a count, this scenario would fail when using table storage's upsert operations, InsertOrReplace and InsertOrMerge. There's no way to know before the command is run whether the operation will be an insert or a replace/merge, so you can't know whether to run the increment or not. Sad!
  • Dogu Arslan
    Dogu Arslan about 6 years
Yes, for InsertOrMerge/Replace operations a trade-off can be to always Insert first (with the count increment operation) and, if that fails with an entity-already-exists error, do an unconditional merge in a second I/O call without any increment.
  • Emaborsa
    Emaborsa over 5 years
IMHO this function is useless. It counts the items of the query, which is also shown at the bottom of the page.
  • MBentley
    MBentley about 5 years
The number at the bottom of the page only counts up to 1,000, and the current version will not let you request more. So this is a useful way around that.
  • James Westgate
    James Westgate almost 5 years
Since you are only after the count, you do not need to add the fetched entities to a list; just increment a counter with the result count for each segment.
  • CraftyFella
    CraftyFella about 4 years
    Exactly what I was looking for.. nice one
  • Christian Ziegler
    Christian Ziegler about 4 years
    I suppose you meant RowKey instead of PartitionKey?
  • st0le
    st0le over 3 years
No, it won't. The query will have a continuation token; each query returns only up to 1,000 records.
  • jokab
    jokab over 2 years
After clicking the button I was waiting for something to happen. It turned out the result was being printed in the Activities window below. Thanks for this answer @Nigel.