Cassandra ReadTimeout when querying existing data
Solution 1
For each zoom value, there could be tens of millions of small items. For zoom=11, the first idx is around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases.
This sounds like a wide-partition issue. When a single partition (zoom in your case) holds many rows, reads in Cassandra can become problematic. A good rule of thumb is to keep partitions under ~100MB in size; do you think you may have partitions that large? On average, how many bytes is the 'tile' column? For example, with idx being a 4-byte int and an assumed blob size of 96 bytes, each row is about 100 bytes; ignoring any overhead, ~1,048,576 rows would add up to 100MB.
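The back-of-envelope arithmetic above can be checked directly (the 4-byte idx and 96-byte blob sizes are the assumed figures from the example, not measured values):

```python
# Estimate how many rows fit under the ~100MB partition rule of thumb,
# using the assumed sizes from the example above.
ROW_BYTES = 4 + 96            # idx int (4 bytes) + assumed 96-byte tile blob
TARGET = 100 * 1024 * 1024    # 100 MiB rule-of-thumb partition ceiling

rows_at_limit = TARGET // ROW_BYTES
print(rows_at_limit)          # 1048576 rows per partition
```

In practice each row also carries clustering and cell overhead, so the real ceiling is somewhat lower than this idealized count.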
Although your page size is small, there is still quite a bit of overhead on Cassandra's end to read the data and its indexes on disk. What seems to be happening is that your C* node is not able to read the data within the request timeout (read_request_timeout_in_ms, which defaults to 5000ms in 2.1; for range scans, range_request_timeout_in_ms, which defaults to 10000ms). When your queries do work, about how long are they taking?
It may be worth enabling tracing ('TRACING ON' in a cqlsh session) to help understand what is taking so long when your queries do succeed. You could also consider increasing the timeout to some arbitrarily large value while debugging.
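A minimal cqlsh session along those lines might look like this (the table name matches the schema from the question):

```sql
cqlsh:v2> TRACING ON;
Now Tracing is enabled
cqlsh:v2> select zoom,idx from tiles where zoom=11 limit 10;
-- the trace printed after the result shows per-stage elapsed times,
-- e.g. how long sstable reads and merges took on each node
cqlsh:v2> TRACING OFF;
```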
If you find that your rows are too wide, you may consider partitioning your data further, for example by day:
CREATE TABLE v2.tiles (
    zoom int,
    day timestamp,
    idx int,
    tile blob,
    PRIMARY KEY ((zoom, day), idx)
)
That said, without knowing more about your data model, time may not be a good dimension to partition on.
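If you do partition by day, the writer has to derive the day bucket from each tile's timestamp so that all reads and writes for the same calendar day hit the same (zoom, day) partition. A minimal sketch in Python (the helper name day_bucket is illustrative, not from the answer):

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to UTC midnight so every row for the same
    calendar day lands in the same (zoom, day) partition."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

ts = datetime(2015, 7, 14, 9, 30, tzinfo=timezone.utc)
print(day_bucket(ts))  # 2015-07-14 00:00:00+00:00
```

Iterating over a whole zoom level then becomes one query per (zoom, day) partition rather than one huge scan.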
Solution 2
In my case, this error was resolved by increasing the "range_request_timeout_in_ms" parameter in the "cassandra.yaml" file. By default, the value of this parameter is 10000 ms.
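For reference, the relevant settings live in cassandra.yaml; the values shown are the stock defaults, and raising them is a stopgap rather than a fix for an oversized partition:

```yaml
# cassandra.yaml -- per-request timeouts (milliseconds)
read_request_timeout_in_ms: 5000     # single-partition reads
range_request_timeout_in_ms: 10000   # range scans (the setting raised here)
```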
Yuri Astrakhan
Maps/OpenStreetMap, Wikipedia, Wikidata, ElasticSearch, Kibana, Vega/DataViz, large datasets... Author of Wikipedia API/maps/graphs. DevOps and Maps Principal Engineer at Elastic.
Updated on June 04, 2022
Comments
-
Yuri Astrakhan almost 2 years
For my test server, I have no-replication Cassandra 2.1.6 setup:
CREATE KEYSPACE v2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = false;

CREATE TABLE v2.tiles (
    zoom int,
    idx int,
    tile blob,
    PRIMARY KEY (zoom, idx)
)
For each zoom value, there could be tens of millions of small items. For zoom=11, the first idx is around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases:
cqlsh:v2> select zoom,idx from tiles where zoom=11 limit 10;

ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
I get the same error for "zoom=11 and idx > 1000". For an idx value closer to the existing items, it gives the right result:
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;

 zoom | idx
------+--------
   11 | 100352
 ...
It also shows correct empty results when idx is compared with an extremely high value:
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 1000000 limit 10;

 zoom | idx | tile
------+-----+------

(0 rows)