Cassandra ReadTimeout when querying existing data

Solution 1

For each zoom value, there could be tens of millions of small items. For zoom=11, the first idx is at around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases.

This sounds like a wide-partition issue. When you have many items in a single partition (zoom in your case), it can create problems for reads in Cassandra. In general, it's a good rule of thumb to keep partitions under ~100MB in size; do you think you may have partitions that large? On average, how many bytes is the 'tile' column? For example, with idx being a 4-byte int and an assumed blob size of 96 bytes, each row takes about 100 bytes, so ignoring any overhead, ~1,048,576 rows would add up to 100MB.
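
If you're not sure, one way to check (assuming you have shell access to a node) is the per-table statistics; a sketch, noting that on 2.1 the subcommand is cfstats (renamed tablestats in later releases):

nodetool cfstats v2.tiles
# In the output, look for lines such as:
#   Compacted partition maximum bytes: ...
#   Compacted partition mean bytes: ...
# A maximum far beyond ~100MB points at an oversized partition.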

Although your page size is small, there is still quite a bit of overhead on Cassandra's end to read the data and its indexes on disk. What seems to be happening is that your C* node is not able to read the data within read_request_timeout_in_ms (5 seconds by default in 2.1). When your queries do work, about how long do they take?
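
For reference, this parameter lives in cassandra.yaml on each node and needs a node restart to take effect; a sketch of the relevant line:

# cassandra.yaml (per-node setting)
read_request_timeout_in_ms: 5000    # 2.1 default; raise temporarily while debugging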

It may be worth enabling tracing ('TRACING ON' in a cqlsh session) to help understand what is taking so long when your queries do succeed. You could also consider increasing read_request_timeout_in_ms to some arbitrarily large value while debugging. A good article on tracing can be found here.
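
A minimal cqlsh sketch of what that looks like (the trace output here is abbreviated, not verbatim):

cqlsh:v2> TRACING ON
Now Tracing is enabled
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;
...
Tracing session: ...

 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 (each read step appears here with its timing)

The source_elapsed column shows microseconds per step, which usually makes the slow part obvious.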

If you find that your rows are too wide, you may consider partitioning your data further, for example by day:

CREATE TABLE v2.tiles (
    zoom int,
    day timestamp,
    idx int,
    tile blob,
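    -- (zoom, day) is now the composite partition key; idx remains the clustering column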
    PRIMARY KEY ((zoom, day), idx)
)

That said, without knowing more about your data model, time might not be a good way of partitioning.
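
Note that with a schema like this, every query must supply the full partition key, so iterating over a zoom level means issuing one query per day from the client. A sketch (the date is hypothetical):

cqlsh:v2> select idx, tile from tiles where zoom = 11 and day = '2022-06-04' limit 10;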

Solution 2

In my case, this error was resolved by increasing the "range_request_timeout_in_ms" parameter in the "cassandra.yaml" file. By default, this parameter is set to 10000 ms.
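
A sketch of the change (30000 is just an example value; each node needs the edit and a restart):

# cassandra.yaml
range_request_timeout_in_ms: 30000    # raised from the 10000 ms default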

Comments

  • Yuri Astrakhan, almost 2 years ago

    For my test server, I have a no-replication Cassandra 2.1.6 setup:

    CREATE KEYSPACE v2 WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = false;
    
    CREATE TABLE v2.tiles (
        zoom int,
        idx int,
        tile blob,
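        -- zoom alone is the partition key: all tiles for a zoom level share one partition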
        PRIMARY KEY (zoom, idx)
    )
    

    For each zoom value, there could be tens of millions of small items. For zoom=11, the first idx is at around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases:

    cqlsh:v2> select zoom,idx from tiles where zoom=11 limit 10;
    ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
    

    I get the same error for "zoom=11 and idx > 1000". For an idx value closer to the existing items, it returns the correct result:

    cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;
     zoom | idx
    ------+--------
       11 | 100352
    ...
    

    It also correctly returns an empty result when idx is compared with an extremely high value:

    cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 1000000 limit 10;                                       
     zoom | idx
    ------+--------
    (0 rows)