How to import large JSON datasets to Cloud Firestore periodically?


You appear to have hit the writes and transactions limit of 500.

The relevant limits are:

Maximum API request size: 10 MiB
Maximum number of writes that can be passed to a Commit operation or performed in a transaction: 500
Maximum number of field transformations that can be performed on a single document in a Commit operation or in a transaction: 500

A better way to do this is to use batched writes.

A batched write can contain up to 500 operations. Each operation in the batch counts separately towards your Cloud Firestore usage. Within a write operation, field transforms like serverTimestamp, arrayUnion, and increment each count as an additional operation.
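
For reference, a minimal batched write with the Node.js Admin SDK looks roughly like this (just a sketch; the collection name and document data here are made up, and it assumes a service key file like the one used in the uploader script below):

var admin = require("firebase-admin");

admin.initializeApp({
  credential: admin.credential.cert(require("./service_key.json"))
});

const firestore = admin.firestore();

// A batch groups up to 500 write operations and commits them together
const batch = firestore.batch();
batch.set(firestore.collection("menu").doc("item_1"), { name: "Item 1" });
batch.set(firestore.collection("menu").doc("item_2"), { name: "Item 2" });

batch
  .commit()
  .then(function() {
    console.log("Batch committed");
  })
  .catch(function(error) {
    console.error("Error committing batch: ", error);
  });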

You can structure your directories something like this:

batch_uploader\
    json_files\
        json_1.json
        json_2.json
        json_3.json
        json_4.json
    uploader.js
    ....

uploader.js

var admin = require("firebase-admin");

var serviceAccount = require("./service_key.json");

admin.initializeApp({
  credential: admin.credential.cert(serviceAccount),
  databaseURL: "YOUR_PROJECT_LINK"
});

const firestore = admin.firestore();
const path = require("path");
const fs = require("fs");
// Folder that holds the JSON files to upload (see the layout above)
const directoryPath = path.join(__dirname, "json_files");

fs.readdir(directoryPath, function(err, files) {
  if (err) {
    return console.log("Unable to scan directory: " + err);
  }

  // Each JSON file becomes a collection named after the file (without its extension)
  files.forEach(function(file) {
    var lastDotIndex = file.lastIndexOf(".");

    var menu = require("./json_files/" + file);

    // Write each record as its own document, keyed by its itemID
    menu.forEach(function(obj) {
      firestore
        .collection(file.substring(0, lastDotIndex))
        .doc(obj.itemID)
        .set(obj)
        .then(function() {
          console.log("Document written");
        })
        .catch(function(error) {
          console.error("Error adding document: ", error);
        });
    });
  });
});
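
The script above still issues one set() per document. Since each file can hold far more than 500 records (the question mentions 180k), one option is to split each file's array into chunks of at most 500 and commit one batch per chunk. A rough sketch of that idea, reusing the firestore instance from uploader.js (the helper name and logging are my own):

// Sketch: upload one file's records in batches of at most 500 documents,
// reusing the `firestore` instance initialized in uploader.js above
async function uploadInBatches(collectionName, items) {
  for (let i = 0; i < items.length; i += 500) {
    const chunk = items.slice(i, i + 500);
    const batch = firestore.batch();

    chunk.forEach(function(obj) {
      batch.set(firestore.collection(collectionName).doc(obj.itemID), obj);
    });

    await batch.commit();
    console.log("Committed " + (i + chunk.length) + " / " + items.length + " documents");
  }
}

// Example usage inside the readdir callback:
// uploadInBatches(file.substring(0, lastDotIndex), menu)
//   .catch(function(error) { console.error("Error uploading batches: ", error); });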

If you want to run the script periodically, you can use a scheduled Cloud Function. To run it every five minutes, for example, you can do something like this:

const functions = require('firebase-functions');

exports.scheduledFunction = functions.pubsub.schedule('every 5 minutes').onRun((context) => {
  console.log('This will be run every 5 minutes!');
  return null;
});
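
If the import itself should happen inside the scheduled function, the same chunked batch upload can be called from onRun. A rough sketch (the function name, file path, and collection name are placeholders; it assumes the JSON files are deployed alongside the function or fetched from somewhere like Cloud Storage):

const functions = require("firebase-functions");
const admin = require("firebase-admin");

admin.initializeApp();
const firestore = admin.firestore();

exports.scheduledImport = functions.pubsub.schedule("every 5 minutes").onRun(async (context) => {
  // Placeholder: a JSON file bundled with the deployed function
  const menu = require("./json_files/json_1.json");

  // Commit the records in chunks of at most 500 writes per batch
  for (let i = 0; i < menu.length; i += 500) {
    const batch = firestore.batch();
    menu.slice(i, i + 500).forEach(function(obj) {
      batch.set(firestore.collection("menu").doc(obj.itemID), obj);
    });
    await batch.commit();
  }

  return null;
});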

Comments

  • Emir Kutlugün over 1 year

    I am trying to import a JSON file with 180k records. As you can see in this code, I can upload 500 records per run, but I need to upload all 180k records periodically.

    What I am trying to achieve:

    1. Parse the JSON (DONE).
    2. Create a model from each JSON element (DONE).
    3. Upload this to Cloud Firestore (DONE, but only 500 documents at a time).

    Factory constructor of the model:

    factory Academician.fromJson(Map<String, dynamic> json) => Academician(
            // Using this to parse data from JSON;
            // rating and reviewList do not exist in the JSON,
            // which is why I need a custom model
            name: json["name"] ?? "",
            designation: json["designation"] ?? "",
            field: json["field"] ?? "",
            universityName: json["university"] ?? "",
            department: json["department"] ?? "",
            rating: 0,
            reviewList: [],
          );
    

    Parsing and uploading to Cloud Firestore:

    parseJsonFromAssets('json/akademik_kadro.json')
        .then((value) => value['Sayfa1'].forEach((value) {
              academicianList.add(Academician.fromJson(value));
            }))
        .then((value) {
          setState(() {
            for (int z = 0; z < 500; z++) {
              // 500 is the limit on writes per batch/transaction in Cloud Firestore,
              // so only the first 500 records are uploaded here
              dbOperation.addToCollection(academicianList[z]);
            }
          });
        });
    
  • Emir Kutlugün over 3 years
    So, I have to use Cloud Functions for this?
  • Andrew over 3 years
    If you want to schedule the import, then yes.