How can I update more than 500 docs in Firestore using Batch?


Solution 1

I also ran into the problem of updating more than 500 documents inside a Firestore collection, and I would like to share how I solved it.

I use Cloud Functions to update my collection inside Firestore, but this should also work in client-side code.

The solution counts every operation that is made on the batch, and after the limit is reached a new batch is created and pushed to the batchArray.

After all updates are completed, the code loops through the batchArray and commits every batch in the array.

It is important to count every operation (set(), update(), delete()) made on the batch, because they all count toward the 500-operation limit.

const documentSnapshotArray = await firestore.collection('my-collection').get();

const batchArray = [];
batchArray.push(firestore.batch());
let operationCounter = 0;
let batchIndex = 0;

documentSnapshotArray.forEach(documentSnapshot => {
    const documentData = documentSnapshot.data();

    // update document data here...

    batchArray[batchIndex].update(documentSnapshot.ref, documentData);
    operationCounter++;

    if (operationCounter === 499) {
      batchArray.push(firestore.batch());
      batchIndex++;
      operationCounter = 0;
    }
});

batchArray.forEach(async batch => await batch.commit());

return;

Solution 2

I liked this simple solution:

const users = await db.collection('users').get()

const batches = _.chunk(users.docs, 500).map(userDocs => {
    const batch = db.batch()
    userDocs.forEach(doc => {
        batch.set(doc.ref, { field: 'myNewValue' }, { merge: true })
    })
    return batch.commit()
})

await Promise.all(batches)

Just remember to add import * as _ from "lodash" at the top. Based on this answer.
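
If you prefer to avoid the lodash dependency, a plain chunking helper does the same job. A minimal sketch (the chunk helper below is an illustration, not part of the original answer):

const chunk = (arr, size) =>
    Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
        arr.slice(i * size, i * size + size)
    )

const users = await db.collection('users').get()

const batches = chunk(users.docs, 500).map(userDocs => {
    const batch = db.batch()
    userDocs.forEach(doc => {
        batch.set(doc.ref, { field: 'myNewValue' }, { merge: true })
    })
    return batch.commit()
})

await Promise.all(batches)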

Solution 3

As mentioned above, @Sebastian's answer is good and I upvoted it too. However, I faced an issue while updating 25,000+ documents in one go. The tweaked logic is below.

console.log(`Updating documents...`);
let collectionRef = db.collection('cities');
try {
  let batch = db.batch();
  const documentSnapshotArray = await collectionRef.get();
  const records = documentSnapshotArray.docs;
  const index = documentSnapshotArray.size;
  console.log(`TOTAL SIZE=====${index}`);
  for (let i=0; i < index; i++) {
    const docRef = records[i].ref;
    // YOUR UPDATES
    batch.update(docRef, {isDeleted: false});
    if ((i + 1) % 499 === 0) {
      await batch.commit();
      batch = db.batch();
    }
  }
  // For committing final batch
  if (index % 499 !== 0) {
    await batch.commit();
  }
  console.log('write completed');
} catch (error) {
  console.error(`updateWorkers() errored out : ${error.stack}`);
  reject(error); // assumes this block runs inside a Promise executor that provides reject()
}

Solution 4

You can use the built-in BulkWriter. By default it applies the 500/50/5 ramp-up rule.

Example:

let bulkWriter = firestore.bulkWriter();

bulkWriter.create(documentRef, {foo: 'bar'});
bulkWriter.update(documentRef2, {foo: 'bar'});
bulkWriter.delete(documentRef3);
await bulkWriter.close().then(() => {
  console.log('Executed all writes');
});
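
Applied to the original problem of stamping every document in a large collection, a BulkWriter-based sketch could look like the following (assuming the Node.js Admin SDK; the collection and field names are taken from the question, the function name is illustrative, and admin.initializeApp() is assumed to have been called already):

const admin = require('firebase-admin');

const firestore = admin.firestore();
const serverTimestamp = admin.firestore.FieldValue.serverTimestamp();

async function stampAllDocs() {
  const snapshot = await firestore.collection('My Collection').get();
  const bulkWriter = firestore.bulkWriter();

  snapshot.docs.forEach(doc => {
    // BulkWriter batches and throttles the writes internally,
    // so there is no 500-operation bookkeeping to do here.
    bulkWriter.set(doc.ref, { timestamp: serverTimestamp }, { merge: true });
  });

  // close() flushes all pending writes and resolves once they have completed.
  await bulkWriter.close();
}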

Solution 5

The explanations given in the previous answers and comments already cover the issue.

I'm sharing the final code that I built and that worked for me, since I needed something that works in a more decoupled manner than most of the solutions presented above.

import { FireDb } from "@services/firebase"; // = firebase.firestore();

type TDocRef = FirebaseFirestore.DocumentReference;
type TDocData = FirebaseFirestore.DocumentData;

let fireBatches = [FireDb.batch()];
let batchSizes = [0];
let batchIdxToUse = 0;

export default class FirebaseUtil {
  static addBatchOperation(
    operation: "create",
    ref: TDocRef,
    data: TDocData
  ): void;
  static addBatchOperation(
    operation: "update",
    ref: TDocRef,
    data: TDocData,
    precondition?: FirebaseFirestore.Precondition
  ): void;
  static addBatchOperation(
    operation: "set",
    ref: TDocRef,
    data: TDocData,
    setOpts?: FirebaseFirestore.SetOptions
  ): void;
  static addBatchOperation(
    operation: "create" | "update" | "set",
    ref: TDocRef,
    data: TDocData,
    opts?: FirebaseFirestore.Precondition | FirebaseFirestore.SetOptions
  ): void {
    // Lines below make sure we stay below the limit of 500 writes per
    // batch
    if (batchSizes[batchIdxToUse] === 500) {
      fireBatches.push(FireDb.batch());
      batchSizes.push(0);
      batchIdxToUse++;
    }
    batchSizes[batchIdxToUse]++;

    const batchArgs: [TDocRef, TDocData] = [ref, data];
    if (opts) batchArgs.push(opts);

    switch (operation) {
      // Specific case for "set" is required because of some weird TS
      // glitch that doesn't allow me to use the arg "operation" to
      // call the function
      case "set":
        fireBatches[batchIdxToUse].set(...batchArgs);
        break;
      default:
        fireBatches[batchIdxToUse][operation](...batchArgs);
        break;
    }
  }

  public static async runBatchOperations() {
    // The lines below clear the globally available batches so we
    // don't run them twice if we call this function more than once
    const currentBatches = [...fireBatches];
    fireBatches = [FireDb.batch()];
    batchSizes = [0];
    batchIdxToUse = 0;

    await Promise.all(currentBatches.map((batch) => batch.commit()));
  }
}
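
A usage sketch for the class above (the import path, collection name, and field value are illustrative):

import FirebaseUtil from "./FirebaseUtil"; // illustrative path
import { FireDb } from "@services/firebase";

async function flagAllUsers() {
  const snapshot = await FireDb.collection("users").get();

  // Queue as many operations as needed; the helper transparently
  // splits them into batches of at most 500 writes.
  snapshot.docs.forEach((doc) => {
    FirebaseUtil.addBatchOperation("update", doc.ref, { flagged: true });
  });

  // Commits every accumulated batch in parallel and resets the internal state.
  await FirebaseUtil.runBatchOperations();
}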


Comments

  • Utkarsh Bhatt
    Utkarsh Bhatt about 2 years

    I'm trying to update a field timestamp with the Firestore admin timestamp in a collection with more than 500 docs.

    const batch = db.batch();
    const serverTimestamp = admin.firestore.FieldValue.serverTimestamp();

    db
      .collection('My Collection')
      .get()
      .then((docs) => {
        docs.forEach((doc) => {
          batch.set(doc.ref, {
            timestamp: serverTimestamp,
          }, {
            merge: true,
          });
        });
        return batch.commit();
      })
      .then(() => res.send('All docs updated'))
      .catch(console.error);
    

    This throws an error

    { Error: 3 INVALID_ARGUMENT: cannot write more than 500 entities in a single call
        at Object.exports.createStatusError (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\common.js:87:15)
        at Object.onReceiveStatus (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:1188:28)
        at InterceptingListener._callNext (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:564:42)
        at InterceptingListener.onReceiveStatus (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:614:8)
        at callback (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:841:24)
      code: 3,
      metadata: Metadata { _internal_repr: {} },
      details: 'cannot write more than 500 entities in a single call' }
    

    Is there a way that I can write a recursive method which creates a batch object, updates 500 docs at a time, and repeats until all the docs are updated?

    From the docs I know that a delete operation is possible with a recursive approach, as mentioned here:

    https://firebase.google.com/docs/firestore/manage-data/delete-data#collections

    But, for updating, I'm not sure how to end the execution since the docs are not being deleted.

    • Borko Kovacev
      Borko Kovacev almost 6 years
      Why don't you iterate through all the 500 docs, update them, and use the last doc key to construct a startAt for a new query?
    • Stathis Ntonas
      Stathis Ntonas about 4 years
      You can limit and then batch recursively; I faced the same issue and this was my solution: stackoverflow.com/a/61639536/2195000
  • Adarsh
    Adarsh almost 5 years
    how do you ensure that all the batches are executed successfully, given that only the operations within a batch are atomic? It would lead to data inconsistency if some batches executed and some didn't
  • Sebastian Vischer
    Sebastian Vischer almost 5 years
    @Adarsh Yes, you are right. I have left out the error handling part. I will add this part to the answer soon. I have updated my database to a new data model, which was an idempotent operation in my case, so I could repeat the code until every batch succeeded.
  • Adarsh
    Adarsh almost 5 years
    So there are a couple of things you can do. You can check the retry option when creating the cloud function. This will make sure your cloud function re-executes on any exception, but you will have to decide which failures you consider transient, else it will turn into an endless loop. Also, some kind of state has to be maintained between cloud function executions so that the batches executed earlier aren't executed again. Maybe you can write to Realtime Database/Firestore on every successful batch operation and, on the next retry, carry on from the batches that didn't succeed
  • Adarsh
    Adarsh almost 5 years
    Or you could write the job details (update details) to, let's say, /queue/pendingUpdates/ and write a cloud function which runs on a schedule (say every 5 mins) and performs the updates. Once the operation is successful, you can delete/mark the job as completed; else it retries automatically in the next interval. This is a lot easier than the first one. Your thoughts?
  • Sebastian Vischer
    Sebastian Vischer almost 5 years
    I do not know your use case. Do you often write more than 500 documents?
  • Adarsh
    Adarsh almost 5 years
    consider this scenario: user details are denormalized into an audit trail collection. When a user makes any change, an entry is made in the audit trail. When the user updates their profile photo, username, phone number or email, it has to be updated in all documents containing the denormalized user data, which can eventually exceed 500 documents
  • Sebastian Vischer
    Sebastian Vischer almost 5 years
    I do not know your use case. Do you often write more than 500 documents? Maybe you could structure your data differently? Your solution with the state written to the database is OK, but these writes could also fail and mess up your data. I would consider a solution with a query for not-yet-updated documents: as soon as a document is updated it no longer matches the query, and you repeat this until the query returns nothing. But this depends on your use case. If you know what the updated data should look like, you could also use transactions.
  • Sebastian Vischer
    Sebastian Vischer almost 5 years
    I prefer not to denormalize data in a noSQL database. I only have one or a few documents per user and all other users get the data from these few documents. This way you can scale your app properly if you have a lot of users. With denormalized data your app will be very inefficient.
  • Adarsh
    Adarsh almost 5 years
    The reason data is denormalized is that reads happen far more often than writes. In your case, you will have to fetch the user details again (2 reads instead of 1 per user). NoSQL encourages denormalization of data as well. Is there any reason you haven't denormalized the data? What happens when your user base grows or users start sharing the same document, etc.?
  • Sebastian Vischer
    Sebastian Vischer almost 5 years
  • Mihae Kheel
    Mihae Kheel over 3 years
    @Sebe have you tested this in a real-life scenario? Does this create a new batch object whenever the batch writes reach 500? Thanks
  • Sebastian Vischer
    Sebastian Vischer over 3 years
    @Mihae Kheel Yes, the loop creates a new batch after it reaches 500 operations, but it is important to count every operation. Also you need some form of error handling.
  • Mihae Kheel
    Mihae Kheel over 3 years
    @SebastianVischer it seems the code and logic works fine when I used it. Thank you very much
  • Matt Fletcher
    Matt Fletcher over 3 years
    "using typescript" ... I don't see any typescript
  • saurabh
    saurabh over 3 years
    The answer is great, but I had 29000 documents in a collection to be updated and this failed. Unfortunately, as there is no exception handling, I was only getting the errors after 2-3 minutes and it was difficult to find the cause (an error with code 16 or something). So I tweaked the logic a bit to leave some gap between the batch commits (guess that worked). Will try to add that as another answer. Thanks.
  • Sebastian Vischer
    Sebastian Vischer over 3 years
    @saurabh I have never tried it with this many documents. Maybe there is some kind of limit for commits. I like your solution of committing the batch once you reach 500 operations; in my opinion it's the simpler solution.
  • optimista
    optimista about 3 years
    @Adarsh what about Promise.all(batchArray.map(batch => batch.commit())).then().catch(); ?
  • Michel K
    Michel K about 3 years
    This should be part of the official documentation. Or at least something similar with no dependency on lodash. Works like a charm! :)
  • tai C
    tai C almost 3 years
    @MattFletcher lodash is written in vanilla JS; if you want type support, install @types/lodash
  • jscuba
    jscuba about 2 years
    The code can return before the commits complete. The batchArray.forEach() line could be: await Promise.all(batchArray.map(batch => batch.commit()));