How do I reduce Elasticsearch scroll response time?

java elasticsearch

14,682

Solution 1

.setSize(5000) means that each client.prepareSearchScroll call is going to retrieve 5000 records per shard. You are requesting the source back, and if your records are big, assembling 5000 records per shard in memory might take a while. I would suggest trying a smaller number. Try 100 and 10 to see whether you get better performance.

.setFrom(0) is not necessary.
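To see why the per-shard behaviour matters, here is a rough arithmetic sketch. The 7 indices come from the question; the figure of 5 shards per index is only an assumption for illustration (the question does not state it), and the class and method names are made up for this example.

```java
// Rough arithmetic only: when the size is applied per shard,
// one scroll round trip can return up to size * shards hits.
class ScrollBatchEstimate {

    /** Upper bound on the number of hits a single scroll call can return. */
    static long maxHitsPerCall(int sizePerShard, int totalShards) {
        return (long) sizePerShard * totalShards;
    }

    public static void main(String[] args) {
        int indices = 7;         // from the question
        int shardsPerIndex = 5;  // ASSUMPTION: not stated in the question
        int totalShards = indices * shardsPerIndex;

        for (int size : new int[] {5000, 100, 10}) {
            System.out.println("size=" + size + " -> up to "
                    + maxHitsPerCall(size, totalShards) + " hits per call");
        }
    }
}
```

Under that assumption, size 5000 allows up to 175000 hits in one call, which is close to the whole ~200K result set, while size 10 caps a call at 350 hits.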

Solution 2

I'm going to add another answer here, because I was very puzzled by this behaviour and it took me a long time to find the answer in the comments by @AaronM.

This applies to ES 1.7.2, using the Java API.

I was scrolling/scanning an index of 500m records, but with a query that returns about 400k rows.

I started off with a scroll size of 1,000 which seemed to me a reasonable trade-off in terms of network versus CPU.

This query ran terribly slowly, taking about 30 minutes to complete, with very long pauses between fetches from the cursor.

I worried that maybe it was just the query I was running and did not believe that decreasing the scroll size could help, as 1000 seemed tiny.

However, seeing AaronM's comment above, I tried a scroll size of 10.

The whole job completed in 30 seconds (and this was whether I had restarted ES or not, so presumably nothing to do with caching) - a speed-up of about 60x!!!

So if you're having performance problems with scroll/scan, I highly recommend trying a smaller scroll size. I couldn't find much about this on the internet, so I posted it here.
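One way to compare scroll sizes is a small timing harness like the sketch below. It uses only the JDK; the class name, the timeEach helper, and the empty placeholder job are all hypothetical, and the placeholder would have to be replaced with a real scroll loop that passes the given size to setSize.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

class ScrollSizeTiming {

    /** Runs the job once per candidate size and records elapsed milliseconds. */
    static Map<Integer, Long> timeEach(int[] sizes, IntConsumer scrollJob) {
        Map<Integer, Long> elapsedMs = new LinkedHashMap<>();
        for (int size : sizes) {
            long start = System.nanoTime();
            scrollJob.accept(size);  // the job receives the scroll size to use
            elapsedMs.put(size, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL placeholder: put the full scroll loop here,
        // using the given size in the initial search request.
        IntConsumer scrollJob = size -> { };
        System.out.println(timeEach(new int[] {1000, 100, 10}, scrollJob));
    }
}
```

Running each size more than once, and in a different order, helps rule out caching effects like the one mentioned above.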

Solution 3

Query a data node, not a client node or master node.
Select only the fields you need with the filter_path property.
Set the scroll size according to your document size. There is no magic rule: pick a value, measure, and adjust.
Monitor your network bandwidth.
If that's not enough, the next step is multi-threading:

Remember that an Elasticsearch index is composed of multiple shards. This design means you can parallelize operations.

Let's say your index has 3 shards and your cluster has 3 nodes (good practice to have more nodes than shards per index).

You could run 3 Java "workers", each in a separate thread, with each one scrolling a different shard and node, and use a queue to "centralize" the results.

This way, you should get good performance!

This is what the elasticsearch-hadoop library does.

To retrieve shard and node details for an index, use the search shards API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-shards.html
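The worker-per-shard idea above can be sketched with the JDK alone. Everything named here (ShardWorkers, ShardScroller, fakeScroller, collectAll) is hypothetical, and the fake scroller only simulates batches; in a real client, nextBatch would wrap a scroll request restricted to one shard, and how that restriction is expressed is left as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShardWorkers {

    /** HYPOTHETICAL stand-in for a scroll that reads from a single shard. */
    interface ShardScroller {
        /** Returns the next batch for the shard, or an empty list when exhausted. */
        List<String> nextBatch(int shardId);
    }

    /** Runs one worker thread per shard; all workers push into a shared queue. */
    static List<String> collectAll(int shardCount, ShardScroller scroller) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(shardCount);
        for (int shard = 0; shard < shardCount; shard++) {
            final int shardId = shard;
            pool.execute(() -> {
                List<String> batch;
                while (!(batch = scroller.nextBatch(shardId)).isEmpty()) {
                    queue.addAll(batch);  // "centralize" the results
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // Simplification: drain after the workers finish. A real consumer
        // would usually take from the queue while the workers are running.
        return new ArrayList<>(queue);
    }

    /** Fake scroller: each shard serves a fixed number of two-document batches. */
    static ShardScroller fakeScroller(int batchesPerShard) {
        ConcurrentHashMap<Integer, AtomicInteger> served = new ConcurrentHashMap<>();
        return shardId -> {
            int n = served.computeIfAbsent(shardId, k -> new AtomicInteger()).getAndIncrement();
            if (n >= batchesPerShard) {
                return Collections.emptyList();
            }
            return Arrays.asList("shard" + shardId + "-doc" + (2 * n),
                                 "shard" + shardId + "-doc" + (2 * n + 1));
        };
    }

    public static void main(String[] args) {
        List<String> docs = collectAll(3, fakeScroller(2));
        System.out.println(docs.size() + " documents collected");
    }
}
```

The order of documents in the result is not deterministic, since the workers interleave; only the total is stable.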


Author by

dranxo

Updated on June 23, 2022

Comments

dranxo almost 2 years

I have a query returning ~200K hits from 7 different indices distributed across our cluster. I process my results as:

while (true) {
    // Fetch the next batch and keep the scroll context alive for another 10 minutes
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();

    for (SearchHit hit : scrollResp.getHits()) {
        // process hit
    }

    // Break condition: no hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}

I'm noticing that the client.prepareSearchScroll line can hang for quite some time before returning the next set of search hits. This seems to get worse the longer I run the code for.

My setup for the search is:

SearchRequestBuilder searchBuilder = client.prepareSearch( index_names )
    .setSearchType(SearchType.SCAN)
    .setScroll(new TimeValue(60000)) //TimeValue?
    .setQuery( qb )
    .setFrom(0) //?
    .setSize(5000); //number of jsons to get in each search, what should it be? I have no idea.
SearchResponse scrollResp = searchBuilder.execute().actionGet();

Is it expected that scanning and scrolling just take a long time when examining many results? I'm very new to Elasticsearch, so keep in mind that I may be missing something very obvious.

My query:

QueryBuilder qb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("tweet", interesting_words));
