Script-based sorting on Elasticsearch date field

16,103

Using scripts to sort documents is not really performant, especially if your document base is expected to grow over time. So I'm going to offer a solution for doing that and then suggest another option.

In order to sort using script, You need to transform your date into milliseconds so your sort can be run on a simple number (sort type can only be number or string).

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc[\"someTime\"].date.getMillisOfDay()",
         "lang": "groovy",
         "type": "number",       <----- make sure this is number
         "order": "asc"
      }
   }
}

Note that depending on the granularity you want, you can also use getSecondOfDay() or getMinuteOfDay(). That way, provided your queries and filters have selected documents for the right day, your sort script will sort documents based on the number of milliseconds (or seconds or minutes) within that day.

The second solution would imply to also index the number of milliseconds (or seconds or minutes) since the beginning of that day into another field and simply use it to sort, so that you don't need script. The bottom line is that whatever information you need at search time that can be known at index time should be indexed instead of computed in real-time.

For instance, if your someTime field contains the date 2015-10-05T05:34:12.276Z then you'd index the millisOfDay field with the value 20052276, which is

  • 5 hours * 3600000 ms
  • +34 minutes * 60000 ms
  • +12 seconds * 1000 ms
  • +276 ms

Then you can sort using

POST myIndex/_search
{
   "query": {
      "range": {
          "someTime": {
              "gt": "now"
          }
      }
   },
   "sort": {
      "millisOfDay": {
         "order": "asc"
      }
   }
}

Note that I've added a query to select only the documents whose someTime date is after now, so you'll get all documents in the future, but sorted by ascending millisOfDay, which means you'll get the nearest date from now first.

UPDATE

If someTime has the format HH:mm, then you can also store its millisOfDay value, e.g. if someTime = 17:30 then millisOfDay would be (17h * 3600000 ms) + (30 min * 60000 ms) = 63000000

Then, your query needs to be reworked a little bit using a script filter, like this:

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc.millisOfDay.value > new DateTime().millisOfDay"
        }
      }
    }
  },
  "sort": {
    "millisOfDay": {
      "order": "asc"
    }
  }
}
Share:
16,103
Michael Mueller
Author by

Michael Mueller

I'm a web / blockchain developer and financial advisor (Chartered Financial Analyst) from the US based in Barcelona. I'm comfortable developing in React (functional components) on the frontend and Node, Express and MongoDB on the backend. In 2020, I completed the ConsenSys Academy blockchain developer bootcamp and have built some projects with Solidity. Currently getting up to speed with TypeScript.

Updated on July 09, 2022

Comments

  • Michael Mueller
    Michael Mueller almost 2 years

    I am just getting started with Elasticsearch and would like to use script-based sorting on a field that is mapped as date, format hour_minute. There can be multiple instances of the field in each document.

    Before introducing expressions, as a first step I'm trying a simple sort (using the Sense plugin):

    POST myIndex/_search
    {
       "query": {
          "match_all": {}
       },
       "sort": {
          "_script": {
             "script": "doc[\"someTime\"].value",
             "lang": "groovy",
             "type": "date",
             "order": "asc"
          }
       }
    }
    

    I get this error (fragment):

    SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;
    shardFailures {[tjWL-zV5QXmGjNlXzLvrzw][myIndex][0]:
    SearchParseException[[myIndex][0]: 
    query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source…
    

    If I post the above query with "type": "number" there is no error, although this of course doesn't sort by date. The following works fine:

    POST myIndex/_search
    {
       "query": {
          "match_all": {}
       },
       "sort": {
          "someTime": {
             "order": "asc"
          }
       }
    }
    

    Ultimately I'd like to use script-based sorting since I will be trying to query, filter or sort using date and time conditions, like query for documents with today’s date, then sort them by the lowest time that is after the time now, etc.

    Any suggestions would be much appreciated.