Insertion of thousands of contact entries using applyBatch is slow

20,958

Solution 1

Use ContentResolver.bulkInsert(Uri url, ContentValues[] values) instead of applyBatch().

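applyBatch() (1) runs the operations in a transaction (in providers that override it to do so, as the Contacts provider does) and (2) locks the ContentProvider once for the whole batch instead of locking and unlocking it once per operation. Because of this, it is slightly faster than performing the operations one at a time (non-batched).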

However, since each operation in the batch can have a different URI and so on, there's a huge amount of overhead. "Oh, a new operation! I wonder what table it goes in... Here, I'll insert a single row... Oh, a new operation! I wonder what table it goes in..." ad infinitum. Since most of the work of turning URIs into tables involves lots of string comparisons, it's very slow.

By contrast, bulkInsert applies a whole pile of values to the same table. It goes, "Bulk insert... find the table, okay, insert! insert! insert! insert! insert!" Much faster.

It will, of course, require the ContentProvider to implement bulkInsert efficiently. Most do; if you wrote the provider yourself, it will take a bit of coding, because the default ContentProvider.bulkInsert() simply calls insert() once per row.
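If your inserts currently target several URIs, one way to get bulkInsert's benefit is to group the rows by URI first and then make one bulkInsert(uri, values[]) call per group. The sketch below is plain Java so it runs off-device: strings stand in for android.net.Uri and Map<String, Object> stands in for ContentValues (both assumptions), and the PendingInsert class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch (not Android code): group pending single-row inserts by
 * target URI so each group could be handed to one bulkInsert(uri, values[]) call.
 */
final class InsertGrouping {

    /** Hypothetical holder for one pending insert: target URI plus column values. */
    static final class PendingInsert {
        final String uri;
        final Map<String, Object> values;

        PendingInsert(String uri, Map<String, Object> values) {
            this.uri = uri;
            this.values = values;
        }
    }

    /**
     * Groups rows by URI, keeping the first-seen order of URIs and the
     * original order of rows within each group.
     */
    static Map<String, List<Map<String, Object>>> groupByUri(List<PendingInsert> inserts) {
        Map<String, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (PendingInsert insert : inserts) {
            // The target is looked up once per distinct URI, not once per row.
            groups.computeIfAbsent(insert.uri, k -> new ArrayList<>()).add(insert.values);
        }
        return groups;
    }
}
```

Each group would then be converted to a ContentValues[] and passed to a single bulkInsert call. Note that grouping reorders rows across tables, so it only works for rows that don't depend on each other; operations that rely on back references (like the raw contact / data pairs in the question) can't be grouped this way without the follow-up query described in the comments.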

Solution 2

bulkInsert: for those interested, here is the code I experimented with. Note how it avoids some allocations for int/long/float values :) which can save a bit more time.

private int doBulkInsertOptimised(Uri uri, ContentValues[] values) {
    long startTime = System.currentTimeMillis();

    SQLiteDatabase db = mOpenHelper.getWritableDatabase();

    // Note: DatabaseUtils.InsertHelper is deprecated since API level 17;
    // SQLiteStatement is the suggested replacement.
    DatabaseUtils.InsertHelper inserter =
        new DatabaseUtils.InsertHelper(db, Tables.GUYS);

    // Look up the numeric index of each column once, outside the row loop
    final int guyStrColumn = inserter.getColumnIndex(Guys.STRINGCOLUMNTYPE);
    final int guyDoubleColumn = inserter.getColumnIndex(Guys.DOUBLECOLUMNTYPE);
    //...
    final int guyIntColumn = inserter.getColumnIndex(Guys.INTEGERCOLUMUNTYPE);

    // One transaction for all rows instead of one per insert
    db.beginTransaction();
    int numInserted = 0;
    try {
        int len = values.length;
        for (int i = 0; i < len; i++) {
            inserter.prepareForInsert();

            String guyID = (String) values[i].get(Guys.GUY_ID);
            inserter.bind(guyStrColumn, guyID);

            // Convert to double ourselves to save an allocation.
            double d = ((Number) values[i].get(Guys.DOUBLECOLUMNTYPE)).doubleValue();
            inserter.bind(guyDoubleColumn, d);

            // Getting the raw Object and converting it to int ourselves saves
            // an allocation (the alternative, ContentValues.getAsInteger,
            // returns an Integer object).
            int status = ((Number) values[i].get(Guys.INTEGERCOLUMUNTYPE)).intValue();
            inserter.bind(guyIntColumn, status);

            inserter.execute();
        }
        numInserted = len;
        db.setTransactionSuccessful();
    } finally {
        db.endTransaction();
        inserter.close();

        if (LOGV) {
            long timeTaken = System.currentTimeMillis() - startTime;
            Log.v(TAG, "Time taken to insert " + values.length + " records was "
                    + timeTaken + " milliseconds or " + (timeTaken / 1000) + " seconds");
        }
    }
    getContext().getContentResolver().notifyChange(uri, null);
    return numInserted;
}
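One caveat with the allocation trick: the bare ((Number) value) cast throws a ClassCastException if a caller stored the value as a String, whereas the ContentValues getAsXxx() methods convert non-Number values. A plain-Java sketch of a helper that keeps the fast unboxing path but falls back to parsing; a Map<String, Object> stands in for ContentValues (an assumption so it runs off-device), and RawValues is a hypothetical name.

```java
import java.util.Map;

/**
 * Plain-Java sketch of the primitive-conversion trick from doBulkInsertOptimised().
 * The fast path unboxes the stored Number directly; the fallback parses text.
 */
final class RawValues {

    static double asDouble(Map<String, Object> row, String key) {
        Object raw = row.get(key);
        if (raw instanceof Number) {
            return ((Number) raw).doubleValue();   // no new Double is created
        }
        if (raw instanceof CharSequence) {
            return Double.parseDouble(raw.toString());
        }
        throw new IllegalArgumentException("no numeric value for " + key);
    }

    static int asInt(Map<String, Object> row, String key) {
        Object raw = row.get(key);
        if (raw instanceof Number) {
            return ((Number) raw).intValue();      // no new Integer is created
        }
        if (raw instanceof CharSequence) {
            return Integer.parseInt(raw.toString());
        }
        throw new IllegalArgumentException("no numeric value for " + key);
    }
}
```

Unlike ContentValues, which returns null for a missing key, this sketch throws, since a primitive can't represent "no value"; pick whichever behavior suits your schema.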

Solution 3

An example of how to override bulkInsert() to speed up multiple inserts can be found here
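Whatever the provider, the part of a bulkInsert() override that gives the speed-up is wrapping all rows in one transaction. A plain-Java sketch of that shape: Db is a hypothetical interface that mirrors the four SQLiteDatabase methods involved (beginTransaction, setTransactionSuccessful, endTransaction, insert), not an Android type.

```java
import java.util.List;

/** Plain-Java sketch of the structure of a transactional bulkInsert override. */
final class TransactionalBulkInsert {

    /** Hypothetical stand-in for the SQLiteDatabase methods used below. */
    interface Db {
        void beginTransaction();
        void setTransactionSuccessful();
        void endTransaction();              // commits only if marked successful
        long insert(String table, Object row); // SQLiteDatabase.insert returns -1 on error
    }

    /** All rows share one transaction: either every row commits or none do. */
    static int bulkInsert(Db db, String table, List<?> rows) {
        db.beginTransaction();
        try {
            for (Object row : rows) {
                if (db.insert(table, row) < 0) {
                    // Leaving without setTransactionSuccessful() rolls everything back.
                    throw new IllegalStateException("insert failed");
                }
            }
            db.setTransactionSuccessful();
            return rows.size();
        } finally {
            db.endTransaction();
        }
    }
}
```

Failing the whole batch on the first -1 is a design choice of this sketch; Solution 2 does not check individual results, and a provider may prefer to skip bad rows and return a smaller count.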

Solution 4

The basic solution is to use "yield points" in batch operations.

The flip side of using batched operations is that a large batch may lock up the database for a long time, preventing other applications from accessing data and potentially causing ANRs ("Application Not Responding" dialogs).

To avoid such lockups of the database, make sure to insert "yield points" in the batch. A yield point indicates to the content provider that before executing the next operation it can commit the changes that have already been made, yield to other requests, open another transaction and continue processing operations.

A yield point will not automatically commit the transaction; it does so only if there is another request waiting on the database. Normally a sync adapter should insert a yield point at the beginning of each raw contact operation sequence in the batch. See ContentProviderOperation.Builder.withYieldAllowed(boolean).

I hope this is useful.
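Where the yield points land can be modeled without Android types. In the sketch below, each list element is the target of one operation and "raw_contacts" marks the first operation of a raw contact sequence; in real code those operations would be built with .withYieldAllowed(true) on their ContentProviderOperation.Builder. The YieldPoints class and the string targets are hypothetical.

```java
import java.util.List;

/** Plain-Java model of yield-point placement in a contacts batch. */
final class YieldPoints {
    static final String RAW_CONTACTS = "raw_contacts";

    /** Flags the first operation of every raw contact sequence as a yield point. */
    static boolean[] yieldFlags(List<String> ops) {
        boolean[] flags = new boolean[ops.size()];
        for (int i = 0; i < ops.size(); i++) {
            flags[i] = RAW_CONTACTS.equals(ops.get(i));
        }
        return flags;
    }

    /** Longest stretch of operations with no chance to yield to other requests. */
    static int longestRun(boolean[] flags) {
        int longest = 0;
        int current = 0;
        for (boolean yieldHere : flags) {
            if (yieldHere) {
                current = 0;    // a new sequence starts; the provider may yield here
            }
            current++;
            longest = Math.max(longest, current);
        }
        return longest;
    }
}
```

Because yield points sit between raw contacts, a single contact is never interrupted: the question's largest contact (one raw contact insert, one name row, 1800 phone rows) forms one run of 1802 operations with no opportunity to yield.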

Solution 5

Here is an example that inserts the same amount of data (600 contacts with 10 phone numbers each) in about 30 seconds.

public void testBatchInsertion() throws RemoteException, OperationApplicationException {
    // Format the elapsed time in UTC so the time-zone offset can't leak into mm:ss
    final SimpleDateFormat formatter = new SimpleDateFormat("mm:ss.SSS");
    formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
    long startTime = System.currentTimeMillis();
    Log.d("BatchInsertionTest", "Starting batch insertion on: " + new Date(startTime));

    // Flush only after a whole contact has been added, so back references stay within one batch
    final int MAX_OPERATIONS_FOR_INSERTION = 200;
    ArrayList<ContentProviderOperation> ops = new ArrayList<>();
    for (int i = 0; i < 600; i++) {
        generateSampleProviderOperation(ops);
        if (ops.size() >= MAX_OPERATIONS_FOR_INSERTION) {
            getContext().getContentResolver().applyBatch(ContactsContract.AUTHORITY, ops);
            ops.clear();
        }
    }
    if (ops.size() > 0) {
        getContext().getContentResolver().applyBatch(ContactsContract.AUTHORITY, ops);
    }
    Log.d("BatchInsertionTest", "End of batch insertion, elapsed: "
            + formatter.format(new Date(System.currentTimeMillis() - startTime)));
}
private void generateSampleProviderOperation(ArrayList<ContentProviderOperation> ops){
    int backReference = ops.size();
    ops.add(ContentProviderOperation.newInsert(ContactsContract.RawContacts.CONTENT_URI)
            .withValue(ContactsContract.RawContacts.ACCOUNT_NAME, null)
            .withValue(ContactsContract.RawContacts.ACCOUNT_TYPE, null)
            .withValue(ContactsContract.RawContacts.AGGREGATION_MODE, ContactsContract.RawContacts.AGGREGATION_MODE_DISABLED)
            .build()
    );
    ops.add(ContentProviderOperation.newInsert(ContactsContract.Data.CONTENT_URI)
                    .withValueBackReference(ContactsContract.Data.RAW_CONTACT_ID, backReference)
                    .withValue(ContactsContract.Data.MIMETYPE, ContactsContract.CommonDataKinds.StructuredName.CONTENT_ITEM_TYPE)
                    .withValue(ContactsContract.CommonDataKinds.StructuredName.GIVEN_NAME, "GIVEN_NAME " + (backReference + 1))
                    .withValue(ContactsContract.CommonDataKinds.StructuredName.FAMILY_NAME, "FAMILY_NAME")
                    .build()
    );
    for(int i = 0; i < 10; i++)
        ops.add(ContentProviderOperation.newInsert(ContactsContract.Data.CONTENT_URI)
                        .withValueBackReference(ContactsContract.Data.RAW_CONTACT_ID, backReference)
                        .withValue(ContactsContract.Data.MIMETYPE, ContactsContract.CommonDataKinds.Phone.CONTENT_ITEM_TYPE)
                        .withValue(ContactsContract.CommonDataKinds.Phone.TYPE, ContactsContract.CommonDataKinds.Phone.TYPE_MAIN)
                        .withValue(ContactsContract.CommonDataKinds.Phone.NUMBER, Integer.toString((backReference + 1) * 10 + i))
                        .build()
        );
}

The log:
02-17 12:48:45.496 2073-2090/com.vayosoft.mlab D/BatchInsertionTest﹕ Starting batch insertion on: Wed Feb 17 12:48:45 GMT+02:00 2016
02-17 12:49:16.446 2073-2090/com.vayosoft.mlab D/BatchInsertionTest﹕ End of batch insertion, elapsed: 00:30.951
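The batching loop in testBatchInsertion() can be modeled in plain Java to see what batches it actually produces. Operations are added one whole contact at a time and the batch is flushed once the running total reaches the limit, so each contact's operations, and their back references (indexes into the current batch), stay together. ContactBatcher is a hypothetical name; the per-contact counts are the only input.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the flush-after-whole-contact batching loop. */
final class ContactBatcher {

    /** Returns the size of each applyBatch() call given the operation count per contact. */
    static List<Integer> batchSizes(int[] opsPerContact, int maxOps) {
        List<Integer> sizes = new ArrayList<>();
        int current = 0;
        for (int ops : opsPerContact) {
            current += ops;              // the whole contact goes into the current batch
            if (current >= maxOps) {     // same test as ops.size() >= MAX_OPERATIONS_FOR_INSERTION
                sizes.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            sizes.add(current);          // final partial batch
        }
        return sizes;
    }
}
```

With 600 contacts of 12 operations each (1 raw contact, 1 name, 10 phones) and a limit of 200, this gives 35 batches of 204 operations and a final batch of 60. The limit is soft: for counts {12, 1802, 12} it returns [1814, 12], so a contact like the question's 1800-number one produces one oversized batch on its own.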

Author: Anders

Updated on November 24, 2021

Comments

  • Anders
    Anders over 2 years

    I'm developing an application where I need to insert lots of Contact entries. At the moment that is approx. 600 contacts with a total of 6000 phone numbers. The biggest contact has 1800 phone numbers.

    Status as of today is that I have created a custom Account to hold the Contacts, so the user can select to see the contact in the Contacts view.

    But the insertion of the contacts is painfully slow. I insert the contacts using ContentResolver.applyBatch. I've tried different sizes of the ContentProviderOperation list (100, 200, 400), but the total running time is approx. the same. Inserting all the contacts and numbers takes about 30 minutes!

    Most of what I've found about slow insertion in SQLite points to transactions. But since I use the ContentResolver.applyBatch method I don't control this, and I would assume that the ContentResolver takes care of transaction management for me.

    So, to my question: Am I doing something wrong, or is there anything I can do to speed this up?

    Anders

    Edit: @jcwenger: Oh, I see. Good explanation!

    So then I will have to first insert into the raw_contacts table, and then into the data table with the names and numbers. What I'll lose is the back reference to the raw_id which I use in the applyBatch.

    So I'll have to get all the IDs of the newly inserted raw_contacts rows to use as foreign keys in the data table?

  • jcwenger
    jcwenger about 13 years
    (Replying to the in-question edit, agree with @sarnold's comments above) Yes, that is the downside. You don't get the individual row IDs back, you only get a rollup of "number inserted". Keep in mind, of course, that depending on your table constraints it may not be an all-or-nothing answer. So, if you need to cross-reference a foreign key, yes, you'll need to go through and query afterward. Thankfully, queries are blazingly fast compared to insertions... Bulk insert plus a subsequent query should still be much faster overall.
  • Anders
    Anders about 13 years
    @jcwenger and @sarnold: I'm sorry, but I didn't get the add-comment option until now. I'm new here and made the mistake of creating the question as an unregistered user. Now, back to the original question: I implemented the solution, and at first it didn't seem to make a difference, on the emulator that is. Then I tried my device (HTC Desire), and I'm down to 3 minutes. A remarkable difference, but I want more! ;) I've seen some applications insert the same number of entries into "custom" SQLite databases in under a minute. Any hope of doing this with the Contacts database?
  • Austyn Mahoney
    Austyn Mahoney over 12 years
    I used transactions in an overridden bulkInsert method and it sped up my 600 inserts from 31 seconds to under 1 second. I definitely recommend this approach.
  • Frank Cheng
    Frank Cheng over 11 years
    Which Android version did you use? I looked up bulkInsert in the Android 2.3.7 ContactsProvider but didn't find it using a transaction.
  • IgorGanapolsky
    IgorGanapolsky over 10 years
    Is bulkInsert() better than a ContentProviderOperation here?
  • Anuj
    Anuj about 9 years
    @Igor Ganapolsky A big yes, according to @jcwenger's explanation.
  • Vasile Jureschi
    Vasile Jureschi about 9 years
    Can you point out where exactly transactions are started for applyBatch() ? The default applyBatch() method just calls apply on the passed ContentProviderOperations which in turn just call the insert/update/delete operations in the content provider.
  • Yan
    Yan about 8 years
    You are wrong. This operation is intended for processing large amounts of data. The reason it's slow for you is that you are not using it correctly. See the example in Solution 5 (the operation takes 30 seconds).
  • Om Infowave Developers
    Om Infowave Developers about 7 years
    I have multiple operations (insert, update and delete). How can I resolve the same issue? Currently I am using applyBatch(), but it takes approx. 45-60 seconds. Can you please suggest another solution for this?
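As the comments note, switching to bulkInsert loses the back references, so the raw contact IDs have to be queried afterward. One way to make that query easy to map back, assuming you control the data, is to give every raw contact a unique client-side key such as RawContacts.SOURCE_ID, bulk insert the raw contacts, query (SOURCE_ID, _ID), and then build the data rows. A plain-Java sketch of the last step: Maps stand in for ContentValues and for the query Cursor (assumptions), and RawContactIdMapper is hypothetical. The column strings are the values of Data.RAW_CONTACT_ID, Phone.CONTENT_ITEM_TYPE and Phone.NUMBER.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch: build phone data rows from queried raw contact IDs. */
final class RawContactIdMapper {

    /**
     * @param phonesBySourceId phone numbers keyed by each raw contact's SOURCE_ID
     * @param idsBySourceId    queried (SOURCE_ID -> _ID) pairs from raw_contacts
     * @return rows ready for a second bulkInsert into the data table
     */
    static List<Map<String, Object>> buildPhoneRows(
            Map<String, List<String>> phonesBySourceId,
            Map<String, Long> idsBySourceId) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : phonesBySourceId.entrySet()) {
            Long rawContactId = idsBySourceId.get(entry.getKey());
            if (rawContactId == null) {
                // bulkInsert only reports a count, so a row that failed shows up here.
                throw new IllegalStateException("raw contact not inserted: " + entry.getKey());
            }
            for (String number : entry.getValue()) {
                Map<String, Object> row = new LinkedHashMap<>();
                row.put("raw_contact_id", rawContactId);                  // Data.RAW_CONTACT_ID
                row.put("mimetype", "vnd.android.cursor.item/phone_v2");  // Phone.CONTENT_ITEM_TYPE
                row.put("data1", number);                                 // Phone.NUMBER
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Failing loudly on a missing ID is a choice of this sketch; since bulkInsert may not be all-or-nothing, you could instead log and skip the affected contact.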