Random document in ElasticSearch

32,924

Solution 1

I know this is an old question, but it is now possible to use random_score with the following search query:

{
   "size": 1,
   "query": {
      "function_score": {
         "functions": [
            {
               "random_score": {
                  "seed": "1477072619038"
               }
            }
         ]
      }
   }
}

For me it is very fast with about 2 million documents.

I use the current timestamp as the seed, but you can use anything you like. The nice part is that the same seed always gives the same results, so you can use each user's session ID as the seed: every user gets a stable order, and different users get different orders.
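Here is a minimal Python sketch that builds that request body from a user's session ID. The helper `random_one_query` is hypothetical, and the crc32-based seed is only one way to turn a string into a stable integer. The optional `field` argument is an assumption based on newer `random_score` documentation, which pairs `seed` with a `field` (for example `_seq_no`) to get reproducible scores. Check the docs for your Elasticsearch version before relying on it.

```python
import json
import zlib
from typing import Optional


def random_one_query(session_id: str, field: Optional[str] = None) -> dict:
    """Hypothetical helper: build a search body that returns one document
    in a random order that stays stable for a given session id."""
    # crc32 is deterministic across processes (unlike Python's str hash(),
    # which is randomized per process); mask to a non-negative 31-bit int.
    seed = zlib.crc32(session_id.encode("utf-8")) & 0x7FFFFFFF
    random_score = {"seed": seed}
    if field is not None:
        # Assumption: newer versions want a field next to the seed.
        random_score["field"] = field
    return {
        "size": 1,
        "query": {
            "function_score": {
                "functions": [{"random_score": random_score}]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(random_one_query("session-abc", field="_seq_no"), indent=2))
```

The same session ID always yields the same seed, so a user keeps seeing the same random order across requests.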

Solution 2

The only way I know of to get random documents from an index (at least in versions <= 1.3.1) is to use a script:

"sort": {
  "_script": {
    "script": "Math.random() * 200000",
    "type": "number",
    "params": {},
    "order": "asc"
  }
}

You can also use that script to weight the random value by some field of the document.
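Multiplying Math.random() by a field does not, by itself, make the pick proportional to that field. With weights 1 and 2, for example, the heavier document wins 3/4 of the time, not 2/3. The question asks for weighted sampling with probability s_i / sum(s_j). One known technique for that is the Efraimidis–Spirakis key: draw u uniformly in (0, 1] per document, compute u ** (1 / w), and keep the largest. This technique is swapped in here and is not part of the original answer. Below is a Python simulation of it with hypothetical weights. In a sort script, the same key would presumably be something like `Math.pow(Math.random(), 1.0 / doc['s'].value)` with `"order": "desc"`, but that is an untested assumption.

```python
import random
from collections import Counter


def weighted_pick(weights: dict, rng: random.Random) -> str:
    """Pick one key with probability weight / sum(weights), using the
    Efraimidis-Spirakis key u ** (1 / w). Weights must be positive."""
    best_doc, best_key = None, -1.0
    for doc_id, w in weights.items():
        u = 1.0 - rng.random()  # uniform in (0, 1]
        key = u ** (1.0 / w)    # P(key <= x) = x ** w
        if key > best_key:
            best_doc, best_key = doc_id, key
    return best_doc


if __name__ == "__main__":
    weights = {"a": 1.0, "b": 2.0, "c": 7.0}  # hypothetical field values s_i
    rng = random.Random(42)
    trials = 100_000
    counts = Counter(weighted_pick(weights, rng) for _ in range(trials))
    total = sum(weights.values())
    for doc_id, w in weights.items():
        print(doc_id, counts[doc_id] / trials, "expected", w / total)
```

Since P(key_i <= x) = x ** w_i, key_i is the maximum with probability w_i / sum(w_j). That is exactly the weighting the question asks for.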

It's possible that in the future they might add something more complicated, but you'd likely have to request that from the ES team.

Solution 3

You can use random_score with a function_score query.

{
    "size": 1,
    "query": {
        "function_score": {
            "functions": [
                {
                    "random_score": {
                        "seed": 11
                    }
                }
            ],
            "score_mode": "sum"
        }
    }
}

The downside is that this applies a random score to every document, sorts the documents, and then returns the first one. I don't know of anything smart enough to just pick a random document.
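Here is a rough Python model of why the cost grows with the number of documents even when size is 1. This is a conceptual sketch of the behavior described above, not Elasticsearch's real implementation. Every candidate gets a random score and only the best is kept. For a single result that takes one linear pass and no full sort, but every document is still touched. The function name and `seed` handling are illustrative.

```python
import random
from typing import Iterable, Optional, Tuple


def pick_by_random_score(doc_ids: Iterable[str], seed: int) -> Optional[str]:
    """Give every document a random score and keep the highest one.
    One pass: O(N) time, O(1) extra memory."""
    rng = random.Random(seed)  # same seed -> same scores -> same pick
    best: Optional[Tuple[float, str]] = None
    for doc_id in doc_ids:
        score = rng.random()
        if best is None or score > best[0]:
            best = (score, doc_id)
    return None if best is None else best[1]


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(1000)]
    print(pick_by_random_score(docs, seed=11))
```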

Solution 4

You can use random_score to randomly order responses or retrieve a document with roughly 1/N probability.
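A quick Python simulation supports the 1/N claim under one assumption: each document gets an independent uniform score, and there are no ties. Under that assumption, every document is equally likely to come out on top. Documents that end up with identical scores would break the assumption. The function name and parameters below are illustrative.

```python
import random
from collections import Counter


def top_by_uniform_score(n_docs: int, rng: random.Random) -> int:
    """Index of the document with the highest independent uniform score."""
    scores = [rng.random() for _ in range(n_docs)]
    return max(range(n_docs), key=scores.__getitem__)


if __name__ == "__main__":
    n_docs, trials = 5, 50_000
    rng = random.Random(7)
    counts = Counter(top_by_uniform_score(n_docs, rng) for _ in range(trials))
    for doc in range(n_docs):
        # Each frequency should be close to 1 / n_docs = 0.2
        print(doc, counts[doc] / trials)
```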

Additional notes:

https://github.com/elastic/elasticsearch/issues/1170
https://github.com/elastic/elasticsearch/issues/7783

Solution 5

NEST way:

var result = _elastic.Search<dynamic>(s => s
        .Query(q => q
        .FunctionScore(fs => fs.Functions(f => f.RandomScore())
        .Query(fq => fq.MatchAll()))));

Raw query way:

GET index-name/_search
{
    "size": 1,
    "query": {
        "function_score": {
            "query": { "match_all": {} },
            "random_score": {}
        }
    }
}
Author: mitchus

Interested in algorithms, graph theory, probability and statistics.

Updated on October 24, 2021

Comments

  • mitchus
    mitchus over 2 years

    Is there a way to get a truly random sample from an elasticsearch index? i.e. a query that retrieves any document from the index with probability 1/N (where N is the number of documents currently indexed)?

    And as a follow-up question: if all documents have some numeric field s, is there a way to get a document through weighted random sampling, i.e. where the probability to get document i with value s_i is equal to s_i / sum(s_j for j in index)?

  • sudeepdino008
    sudeepdino008 over 6 years
    Can't use seed with this. n documents will be grouped and have the same score, where n is the shard size.
  • Praneeth Kumar
    Praneeth Kumar almost 4 years
    Does the Painless script Math.random() return a value between 0 and 1 inclusive?