Write timeout thrown by the Cassandra DataStax driver

Solution 1

While I don't understand the root cause of this issue, I was able to solve the problem by increasing the timeout value in the conf/cassandra.yaml file.

write_request_timeout_in_ms: 20000

Solution 2

We experienced similar problems on a single node in an ESX cluster with SAN storage attached (which is not recommended by DataStax, but we have no other options at the moment).

Note: the settings below can significantly reduce the maximum performance Cassandra can achieve, but we chose a stable system over high performance.

While running iostat -xmt 1 we found high w_await times at the same moments the WriteTimeoutExceptions occurred. It turned out the memtable could not be written to disk within the default write_request_timeout_in_ms: 2000 setting.

We significantly reduced the memtable size from 512 MB (it defaults to 25% of heap space, which was 2 GB in our case) to 32 MB:

# Total permitted memory to use for memtables. Cassandra will stop
# accepting writes when the limit is exceeded until a flush completes,
# and will trigger a flush based on memtable_cleanup_threshold
# If omitted, Cassandra will set both to 1/4 the size of the heap.
# memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 32

We also slightly increased the write timeout to 3 seconds:

write_request_timeout_in_ms: 3000

Also make sure you write regularly to disk if you have high IO wait times:

#commitlog_sync: batch
#commitlog_sync_batch_window_in_ms: 2
#
# the other option is "periodic" where writes may be acked immediately
# and the CommitLog is simply synced every commitlog_sync_period_in_ms
# milliseconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

These settings allowed the memtable to remain small and be written often. The exceptions went away and we survived the stress tests that were run on the system.

Solution 3

It is the coordinator (i.e. the server) timing out while waiting for acknowledgements of the write.
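
Since the timeout is reported by the server, one possible client-side reaction (not something the answer above proposes) is to retry the write after a pause. The sketch below assumes the write is idempotent. Counter increments, as in the question, are not: a timed-out increment may still have been applied, so a retry can count it twice. WriteRetry, WriteTimedOut and withBackoff are hypothetical names; WriteTimedOut stands in for the driver's WriteTimeoutException so the example compiles without the driver.

```java
import java.util.concurrent.Callable;

/** Retries a write a bounded number of times, doubling the pause after each timeout. */
class WriteRetry {

    /** Hypothetical stand-in for the driver's WriteTimeoutException. */
    static class WriteTimedOut extends RuntimeException {
        WriteTimedOut(String message) {
            super(message);
        }
    }

    /**
     * Runs the write, retrying only when it times out.
     * Only safe for idempotent writes (so NOT for counter updates).
     */
    static <T> T withBackoff(Callable<T> write, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return write.call();
            } catch (WriteTimedOut e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last timeout to the caller
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // back off so a struggling node gets time to recover
            }
        }
    }
}
```

With the real driver, the Callable would wrap the call to session.execute(...) and the catch clause would name the driver's exception type instead.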

Solution 4

It's worth double-checking your GC settings for Cassandra.

In my case I was using a semaphore to throttle async writes and still (sometimes) getting timeouts.

It transpired that I was using unsuitable GC settings. I'd been using cassandra-unit for convenience, which had the unintended consequence of running with the default VM settings. Consequently we would eventually hit a stop-the-world GC, resulting in a write timeout. After applying the same GC settings as my running Cassandra Docker image, all is fine.

This might be an uncommon cause but it would have helped me so it seems worth recording here.
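
For readers unfamiliar with the semaphore throttling mentioned above, here is a minimal sketch of the idea, not the code this answer actually used. ThrottledWriter and its asyncWrite function are hypothetical names. asyncWrite stands in for an asynchronous driver call such as session.executeAsync(...), and is assumed to return a CompletableFuture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Caps the number of asynchronous writes that are in flight at once. */
class ThrottledWriter<T> {
    private final Semaphore permits;
    // Hypothetical stand-in for an async driver call (assumption, not the driver API).
    private final Function<T, CompletableFuture<Void>> asyncWrite;

    ThrottledWriter(int maxInFlight, Function<T, CompletableFuture<Void>> asyncWrite) {
        this.permits = new Semaphore(maxInFlight);
        this.asyncWrite = asyncWrite;
    }

    /** Blocks the caller until a permit is free, then starts the write. */
    CompletableFuture<Void> write(T item) throws InterruptedException {
        permits.acquire();
        CompletableFuture<Void> future;
        try {
            future = asyncWrite.apply(item);
        } catch (RuntimeException e) {
            permits.release(); // the write never started, so hand the permit back
            throw e;
        }
        // Release the permit whether the write succeeded or failed.
        return future.whenComplete((result, error) -> permits.release());
    }

    /** Submits every item and waits for all of them to finish. */
    void writeAll(List<T> items) throws InterruptedException {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (T item : items) {
            pending.add(write(item));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```

As this answer shows, a throttle like this only limits client-side pressure. It cannot prevent timeouts caused by long GC pauses on the server.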

Author by

Jay

Experienced Java developer. Keen go (golang) developer.

Updated on February 26, 2020

Comments

  • Jay
    Jay about 4 years

    While doing a bulk load of data, incrementing counters based on log data, I am encountering a timeout exception. I'm using the DataStax 2.0-rc2 Java driver.

    Is this an issue with the server not being able to keep up (i.e. a server-side config issue), or is this an issue with the client getting bored waiting for the server to respond? Either way, is there an easy config change I can make that would fix this?

    Exception in thread "main" com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
        at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
        at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
        at com.datastax.driver.core.Session.execute(Session.java:126)
        at jason.Stats.analyseLogMessages(Stats.java:91)
        at jason.Stats.main(Stats.java:48)
    Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:92)
        at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:224)
        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:373)
        at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:510)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
    Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:53)
        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:33)
        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:165)
        at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
        ... 21 more
    

    One of the nodes reports this at roughly the time it occurred:

    ERROR [Native-Transport-Requests:12539] 2014-02-16 23:37:22,191 ErrorMessage.java (line 222) Unexpected exception during request
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.read(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
    
  • abi_pat
    abi_pat over 8 years
    I faced the same problem once. I was using BatchStatement to write data to Cassandra. My batch size was 10000. After reducing the batch size, I no longer got the exception. So maybe you are trying to load too much data into Cassandra in a single request.
  • opstalj
    opstalj almost 8 years
    Hi Chris, how can I debug further to find out why the ACK did not come? I am facing a similar issue and am trying to find the root cause. Thanks.
  • iMajna
    iMajna about 6 years
    This is actually a very poor choice. Did you find out why this was happening? I'm facing the same error now.
  • Jay
    Jay about 6 years
    @Superbrain_bug Thanks for sharing your judgment of this workaround. I am sure that some people might find your judgment interesting. If you find an alternative solution to this issue, I am sure everyone would like to know about it.
  • walv
    walv over 3 years
    One of the reasons might be that Cassandra is running some memory-intensive internal processes such as compaction or repair, and you just don't have enough memory to complete a write within 2 s. This happens to me very often during development: it works well for 10-15 minutes and then this error appears, so I have to restart it. Very annoying.
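
abi_pat's suggestion above, writing smaller batches instead of one batch of 10000 rows, can be sketched as a plain list partition. Batches and partition are hypothetical helper names. Each sub-list would then be written as its own batch, and a suitable batch size is something to find by experiment.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a large list of rows into smaller batches that are written one at a time. */
class Batches {

    /** Returns consecutive slices of at most batchSize elements, preserving order. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }
}
```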