Elasticsearch query to return all records

Solution 1

I think Lucene query syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).

BUT, the Elasticsearch documentation suggests using the scan search type for large result sets.

EG:

curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

and then keep issuing scroll requests, as the documentation linked above suggests.

EDIT: scan was deprecated in 2.1.0.

scan does not provide any benefits over a regular scroll request sorted by _doc. link to elastic docs (spotted by @christophe-roussy)
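
For reference, here is a minimal sketch of the replacement (a plain scroll sorted by _doc) using the Python client; the index name foo and the 10m/size=50 settings simply mirror the curl example above, and the client setup is an assumption:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# A regular scroll sorted by _doc replaces the deprecated scan search type.
page = es.search(
    index="foo",
    scroll="10m",
    size=50,
    body={"query": {"match_all": {}}, "sort": ["_doc"]},
)
while page["hits"]["hits"]:
    for hit in page["hits"]["hits"]:
        print(hit["_id"])
    page = es.scroll(scroll_id=page["_scroll_id"], scroll="10m")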

Solution 2

http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
                                   ^

Note the size param, which increases the hits returned from the default (10) to 1000. Keep in mind that from + size cannot exceed the index.max_result_window index setting, which defaults to 10,000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
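
If it helps, here is the same request from Python, using the requests library (my choice; any HTTP client works):

import requests

# Same URI search as above; size raises the cap from 10 to 1000 hits.
resp = requests.get(
    "http://127.0.0.1:9200/foo/_search",
    params={"size": 1000, "pretty": "true"},
).json()
print(len(resp["hits"]["hits"]))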

Solution 3

Elasticsearch (ES) supports both GET and POST requests for getting data from an index in the ES cluster.

When we do a GET:

http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*

When we do a POST:

http://localhost:9200/[your_index_name]/_search
{
  "size": [your value], //default 10
  "from": [your start index], //default 0
  "query": {
    "match_all": {}
  }
}
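
For illustration, here is a sketch of both styles with Python's requests library (my choice, not part of the original answer); your_index_name is the placeholder from above, and the concrete size/from values stand in for the bracketed placeholders, since comments are not valid JSON:

import requests

base = "http://localhost:9200/your_index_name/_search"

# GET: everything goes in the query string
get_resp = requests.get(base, params={"size": 100, "q": "*:*"}).json()

# POST: the query goes in a JSON body
post_resp = requests.post(
    base,
    json={"size": 100, "from": 0, "query": {"match_all": {}}},
).json()

print(get_resp["hits"]["total"], post_resp["hits"]["total"])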

I would suggest using a UI plugin with Elasticsearch: http://mobz.github.io/elasticsearch-head/ It will help you get a better feel for the indices you create and also let you test your indices.

Solution 4

Note: This answer relates to an older version of Elasticsearch (0.90). Versions released since then have updated syntax. Please refer to the other answers, which may be more accurate for the version you are using.

The query below would return the NO_OF_RESULTS you would like returned.

curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
  "query" : {
    "match_all" : {}
  }
}'

Now, the catch here is that you want all the records to be returned. So naturally, before writing the query, you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in your index? Simply type the query below:

curl -XGET 'localhost:9200/foo/_search'

This would give you a result that looks like the one below

{
  "hits" : {
    "total" : 2357,
    "hits" : [
      {
        ..................
The total field in the result tells you how many records are available in your index. So that's a nice way to find the value of NO_OF_RESULTS.

curl -XGET 'localhost:9200/_search'

Search all types in all indices

curl -XGET 'localhost:9200/foo/_search'

Search all types in the foo index

curl -XGET 'localhost:9200/foo1,foo2/_search'

Search all types in the foo1 and foo2 indices

curl -XGET 'localhost:9200/f*/_search'

Search all types in any indices beginning with f

curl -XGET 'localhost:9200/_all/type1,type2/_search'

Search types type1 and type2 in all indices
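
If you want the count-then-fetch idea above as a script, here is a sketch with Python's requests library (an assumption on my part, as is the foo index). Beware of the caveats raised in the comments: a record indexed between the two calls will be missed, and size cannot exceed index.max_result_window (10,000 by default), so a scroll is the safer tool for big indices:

import requests

base = "http://localhost:9200/foo/_search"

# Step 1: ask for zero hits, just to read the total.
total = requests.get(base, params={"size": 0}).json()["hits"]["total"]
if isinstance(total, dict):  # on ES >= 7, total is an object
    total = total["value"]

# Step 2: request exactly that many hits.
resp = requests.get(base, params={"size": total, "q": "*:*"}).json()
print(len(resp["hits"]["hits"]))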

Solution 5

This is the best solution I found, using the Python client:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# Initialize the scroll (search_type='scan' was deprecated in 2.1.0 and
# removed in 5.0; with a plain scroll the first response already carries
# the first batch of hits)
page = es.search(
    index='yourIndex',
    doc_type='yourType',  # no longer needed since ES 6.0
    scroll='2m',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = len(page['hits']['hits'])

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    # Do something with the current batch: page['hits']['hits']
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // a maximum of 100 hits will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
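
As a comment below points out, the Python client also ships a scan helper that drives this whole scroll loop for you (since 5.x.x at least); a minimal sketch, with the same placeholder index name:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

# scan() wraps search + scroll and yields every matching document.
for hit in scan(es, index="yourIndex", query={"query": {"match_all": {}}}):
    print(hit["_source"])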

Comments

  • John Livermore
    John Livermore over 2 years

    I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...

    http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}
    

    Can someone give me the URL you would use to accomplish this, please?

  • John Livermore
    John Livermore over 12 years
    Thanks. This was the final query I came up with that returns what I need for now... localhost:9200/foo/_search?size=50&pretty=true&q=*:*
  • Karthick
    Karthick almost 11 years
    Adding to @Steve's answer, you can find a list of parameters that elasticsearch understands in this link elasticsearch.org/guide/reference/api/search/uri-request
  • Churro
    Churro over 10 years
    Is it possible to run a scan search with a query other than a match_all query?
  • Steve Casey
    Steve Casey over 10 years
    @Churro u should post a question, not hide it in the comments. but short answer, yes. elasticsearch.org/guide/reference/api/search/query
  • Churro
    Churro over 10 years
    Thanks @Steve for your answer. I didn't think it was significant enough for a new question. It wasn't explicitly stated anywhere, so I figured I'd ask here just to verify.
  • Alex Brasetvik
    Alex Brasetvik over 10 years
    You should really use the scan+scroll-requests. If you do use size=BIGNUMBER, note that Lucene allocates memory for scores for that number, so don't make it exceedingly large. :)
  • lfender6445
    lfender6445 about 10 years
    By default ES will return 10 results unless a size param is included in the base query.
  • rakslice
    rakslice over 9 years
    Did you really mean to use -d with -XGET?
  • vjpandian
    vjpandian almost 9 years
    The previous response was three years old. Updated it to a current one.
  • Pierce
    Pierce almost 9 years
    I was unaware of the ?size=<N> query string parameter until your answer, @SteveCasey. Thank you so much for posting this. My use case just requires me to list all the documents in a small index (generally <200 items), so appending ?size=1000 to the query made it fire right up.
  • Chopra
    Chopra over 8 years
    hey @SteveCasey I am struggling to find this answer. Could you please help me - stackoverflow.com/questions/34481152/…
  • Aminah Nuraini
    Aminah Nuraini about 8 years
    But, from what I remember, ES only allows getting 16,000 records per request. So if the data is above 16,000, this solution is not enough.
  • Christophe Roussy
    Christophe Roussy about 8 years
    Scan was deprecated in 2.1.0: elastic.co/guide/en/elasticsearch/reference/current/…
  • Christophe Roussy
    Christophe Roussy almost 8 years
    @SteveCasey Ideally ES should respond with something special: stackoverflow.com/questions/13884141/…, another interesting problem ...
  • WoodyDRN
    WoodyDRN over 7 years
    Since which version does max size occur?
  • Will Barnwell
    Will Barnwell about 7 years
    Seeing as scan is deprecated, should this answer be updated to use scroll?
  • user732456
    user732456 about 7 years
    This will return accumulated information, but not the hits themselves
  • Joshlo
    Joshlo almost 7 years
    never use this method if the data contains many documents... Each time you go to "the next page" Elastic will be slower and slower! Use SearchAfter instead
  • user3078523
    user3078523 over 6 years
    One thing to keep in mind though (from Elasticsearch docs): Note that from + size can not be more than the index.max_result_window index setting which defaults to 10,000.
  • Christoph Schranz
    Christoph Schranz about 6 years
    Thanks Mark, that was exactly what I was looking for! In my case (ELK 6.2.1, python 3), the search_type argument was not valid and the document_type isn't needed any more since ELK 6.0
  • Stamos
    Stamos about 6 years
    While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
  • Usman Maqbool
    Usman Maqbool about 6 years
    Perfect solution! Thanks. I was using elasticsearch_dsl==5.4.0 and it works without search_type = 'scan'.
  • iclman
    iclman almost 6 years
    Also, this solution will not work if the overall data size is above 10 000. The option size=1000&from=10001 would fail.
  • stelios
    stelios almost 6 years
    This will return 1000, not all; user3078523 is right, this method has a limit of max_result_window
  • stelios
    stelios almost 6 years
    As another user mentioned: from + size can not be more than the index.max_result_window index setting which defaults to 10,000
  • stelios
    stelios almost 6 years
    Indeed fails. Parameters from + size can't be more than index.max_result_window index setting which defaults to 10,000
  • stelios
    stelios almost 6 years
    ES 6.3. This example makes my Elasticsearch service to crash, trying to scroll 110k documents with size=10000, at somewhere between 5th-7th iterations. with status=127, main ERROR Null object returned for RollingFile in Appenders, main ERROR Unable to locate appender "rolling" for logger config "root" No logs in /var/log/elasticsearch/elasticsearch.log
  • MCMZL
    MCMZL almost 6 years
    For the record, the Python client implements a scan helper that does the scroll under the hood (since version 5.x.x at least)
  • Harry Wood
    Harry Wood over 5 years
    That may be the "best" way up to a point, but a bit noddy really. If you have many thousands of records, then the best way is a "scroll" query.
  • Harry Wood
    Harry Wood over 5 years
    Yes. Well a scan is a type of scroll. The answer should not include the 'search_type=scan' parameter. You don't need it, and it is deprecated.
  • Harry Wood
    Harry Wood over 5 years
    It has a maximum, and also (if you have many thousands of records to get) it's a rather noddy heavy approach to be going up towards that maximum. Instead you should use a "scroll" query.
  • Harry Wood
    Harry Wood over 5 years
    This approach has a maximum, and also (if you have many thousands of records to get) it's a rather noddy heavy approach to be going up towards that maximum. Instead you should use a "scroll" query
  • Harry Wood
    Harry Wood over 5 years
    search_type = 'scan' is deprecated. Similar code will work without that, although there are some interesting differences which are well buried in the old documentation. elastic.co/guide/en/elasticsearch/reference/1.4/… In particular, when migrating to not use search_type=scan, that first 'search' query will come with the first batch of results to process.
  • Harry Wood
    Harry Wood over 5 years
    If the data contains many thousands of documents, the correct answer is to use a 'scroll' query.
  • Harry Wood
    Harry Wood over 5 years
    This answer needs more updates. search_type=scan is now deprecated. So you should remove that, but then the behaviour has changed a little. The first batch of data comes back from the initial search call. The link you provide does show the correct way to do it.
  • WoodyDRN
    WoodyDRN over 5 years
    My comment was really to note that you can't just add any number as size, as it would be quite a lot slower. So I removed the code example and people can follow the link to get correct code.
  • Harry Wood
    Harry Wood over 5 years
    Setting the size to X like this, might have a surprising concurrency glitch: Consider what happens if a record is added in between doing the count and setting the size on your next query... but also if you have many thousands of records to get, then it's the wrong approach. Instead you should use a "scroll" query.
  • Harry Wood
    Harry Wood over 5 years
    Actually I've just noticed "search_type:scan" is not only deprecated. It was removed in elasticsearch version 5.0: elastic.co/guide/en/elasticsearch/reference/5.0/…
  • Maarten00
    Maarten00 about 5 years
    The author of the question was asking for 'all' results, not a pre-defined amount of results. While it is helpful to post a link to the docs, the docs do not describe how to achieve that, neither does your answer.
  • Trisped
    Trisped almost 5 years
    @WoodyDRN It is better to have the code in your answer (even if it gets old) so it is still available when the link dies.
  • KarelHusa
    KarelHusa over 4 years
    The limitation of this query is that size + from must be lower than or equal to "index.max_result_window". For a large number of documents (10,000+ by default) this query is not applicable.
  • Jesse Chisholm
    Jesse Chisholm about 4 years
    Oddly enough, the official docs show curl -XGET ... -d '{...}' which is an unofficial mixed style of request. Thank you for showing the correct GET and POST formats.
  • Daniel Schneiter
    Daniel Schneiter about 4 years
    With the from and size-approach you will run into the Deep Pagination problem. Use the scroll API to make a dump of all documents.
  • Daniel Schneiter
    Daniel Schneiter about 4 years
    The scroll API should be used right from the start with the very first request.
  • Yar
    Yar almost 4 years
    you should pass pretty param as boolean: curl -XGET 'localhost:9200/logs/_search/?size=1000&pretty=true'
  • MarsAndBack
    MarsAndBack over 3 years
    Thanks, this is what I needed to understand (size); it helped me troubleshoot my empty ([ ]) returns.
  • asgs
    asgs almost 3 years
    this is the answer I'm looking for. the one without passing the request parameter q. thank you!
  • Florian Heigl
    Florian Heigl over 2 years
    this was very helpful - changed everything for me now i can actually hope to get results within the night.