Batch size of prepared statement in Spring Data Cassandra


Solution 1

This is just a warning, informing you that the query size exceeds a certain limit.

The query is still processed. The reasoning is that large batched queries are expensive and may cause cluster imbalance, so the warning alerts you (the developer) beforehand.

Look for batch_size_warn_threshold_in_kb in cassandra.yaml to adjust when this warning is produced.
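For illustration, the relevant line in cassandra.yaml looks roughly like the fragment below. The value 5 is consistent with the 5120-byte threshold reported in the warning from the question (5 KB = 5120 bytes); check your own file, since your value may differ.

```yaml
# cassandra.yaml (fragment)
# Log a WARN for any batch whose size exceeds this many kilobytes.
# 5 KB = 5120 bytes, matching the threshold printed in the warning.
batch_size_warn_threshold_in_kb: 5
```

Raising the value only silences the message; it does not make large batches any cheaper for the cluster.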

Here is the ticket where it was introduced: https://issues.apache.org/jira/browse/CASSANDRA-6487

Solution 2

I have done extensive performance testing and tuning on Cassandra, working closely with DataStax Support.

That is why I created the ingest() methods in SDC*, which are super fast in 1.0.4.RELEASE and higher.

This method caches the PreparedStatement for you, then loops over the individual sets of bind values and calls executeAsync for each insert. This sounds counterintuitive, but it is the fastest (and most balanced) way to insert into Cassandra.
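The pattern described above can be sketched in plain Java. This is not the actual ingest() source: AsyncInsert is a hypothetical stand-in for an already-prepared statement, so the example runs without a cluster or driver on the classpath. With the DataStax driver you would instead prepare the CQL once with Session.prepare(), then call Session.executeAsync(preparedStatement.bind(values)) for each row. Waiting for all futures at the end is an assumption of this sketch, not something the answer states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class IngestSketch {

    /**
     * Hypothetical stand-in for a statement that was prepared once:
     * binds one row of values and executes it asynchronously.
     */
    public interface AsyncInsert {
        CompletableFuture<Void> executeAsync(Object... boundValues);
    }

    /**
     * Issues one asynchronous insert per row through the same prepared
     * statement, so no batch is ever built. Blocks until every insert
     * has completed (an assumption of this sketch) and returns the count.
     */
    public static int ingest(AsyncInsert preparedInsert, List<Object[]> rows) {
        List<CompletableFuture<Void>> pending = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            // Each row is sent as its own request instead of being
            // added to one large batch.
            pending.add(preparedInsert.executeAsync(row));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return pending.size();
    }

    public static void main(String[] args) {
        // Fake "cluster": completes immediately without doing any I/O.
        AsyncInsert fake = values -> CompletableFuture.<Void>completedFuture(null);

        List<Object[]> rows = Arrays.asList(
                new Object[] {1, "alpha"},
                new Object[] {2, "beta"},
                new Object[] {3, "gamma"});

        System.out.println(ingest(fake, rows) + " inserts completed");
    }
}
```

Because each row travels as its own small request, no single request grows with the number of rows, which is why this approach does not run into the batch size threshold.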

Author: Oggie

Updated on June 15, 2022

Comments

  • Oggie
    Oggie almost 2 years

    I'm getting this warning in the log:

    WARN [Native-Transport-Requests:17058] 2014-07-29 13:58:33,776 BatchStatement.java (line 223) Batch of prepared statements for [keyspace.tablex] is of size 10924, exceeding specified threshold of 5120 by 5804.

    Is there a way in spring data cassandra to specify the size?

    Cassandra 2.0.9 and spring data cassandra 1.0.0-RELEASE

  • Oggie
    Oggie over 9 years
    But does this mean we're ok with the warning, or can we make adjustments to the code to stay below the threshold? I don't think it's wise to just start increasing the threshold in the cassandra.yaml just to get rid of the message.
  • Oggie
    Oggie over 9 years
    What's the best way to change the batch size? Just limit the number of items in the array to the call to template.insertAsynchronously()?
  • Jeffrey Zampieron
    Jeffrey Zampieron over 7 years
    What's the appropriate equivalent method to use on a query? I'm seeing a substantial number of batch size warnings and slow performance using CassandraOperations.stream() w/ Cassandra 3.0.7 and SDC* 1.5.0.M1