Elasticsearch lowercase filter search


Solution 1

The problem is that the field is analyzed (lowercased) at index time, but you are querying it with a term filter, which is not analyzed:

Term Filter

Filters documents that have fields that contain a term (not analyzed). Similar to term query, except that it acts as a filter.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-filter.html
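To see why the term filter misses "Italian", here is a minimal Python sketch that models the question's lower_keyword analyzer (keyword tokenizer plus lowercase filter) in memory. It is only a model of the behaviour under that assumption, not Elasticsearch code; the helper names and sample values are hypothetical.

```python
from typing import List


def lower_keyword(text: str) -> List[str]:
    """Model of the question's custom analyzer: the keyword tokenizer emits
    the whole input as a single token, and the lowercase filter lowercases it."""
    return [text.lower()]


# Hypothetical source values for the "language" field.
documents = ["Mandarin", "Italian"]

# Index time: the analyzer runs, so only the lowercased terms are stored.
indexed_terms = {term for doc in documents for term in lower_keyword(doc)}


def term_filter(value: str) -> bool:
    """Term filter: the input is NOT analyzed, so it is looked up verbatim."""
    return value in indexed_terms


def analyzed_query(value: str) -> bool:
    """Query (e.g. match): the input goes through the field's analyzer first."""
    return all(token in indexed_terms for token in lower_keyword(value))


for value in ["mandarin", "Italian"]:
    print(value, term_filter(value), analyzed_query(value))
# mandarin True True
# Italian False True
```

The lowercased value is the only thing in the index, so any lookup that skips the analyzer has to supply the input already lowercased.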

I'd try using a query filter instead:

Query Filter

Wraps any query to be used as a filter. Can be placed within queries that accept a filter.

Example:

{
    "constant_score" : {
        "filter" : {
            "query" : {
                "query_string" : {
                    "query" : "this AND that OR thus"
                }
            }
        }
    }
}

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html#query-dsl-query-filter
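Applied to the question, each term filter can be replaced by a query filter wrapping a match query, which runs the field's lower_keyword analyzer on the input before looking it up. The Python below only builds the request body (no HTTP call); the choice of match rather than the query_string shown above and the helper name query_filter are assumptions for illustration, and the syntax targets Elasticsearch 1.x, where the filtered query and the query filter still exist.

```python
import json


def query_filter(field, value):
    """Hypothetical helper: wrap an analyzed match query in a query filter
    (Elasticsearch 1.x syntax) so the input is lowercased like the index."""
    return {"query": {"match": {field: value}}}


# Same structure as the question's query, with term filters swapped out.
body = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "should": [
                        query_filter("language", "mandarin"),
                        query_filter("language", "Italian"),
                    ]
                }
            }
        }
    }
}

print(json.dumps(body, indent=2))
```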

Solution 2

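This can be achieved by appending .keyword to the field name, which queries the keyword (not analyzed) version of the field. This assumes language has a keyword sub-field in the mapping; Elasticsearch 5.0 and later create one named keyword by default for dynamically mapped string fields.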

Note that matching is now exact and case-sensitive: "mandarin" won't match, while "Italian" will.

Your query would end up like this:

{
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "should": [
                        {
                            "term": {
                                "language.keyword": "mandarin" // Returns Empty
                            }
                        },
                        {
                            "term": {
                                "language.keyword": "Italian" // Returns Italian.
                            }
                        }
                    ]
                }
            }
        }
    }
}
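The difference between the two fields can be modelled in a few lines of Python. This is a sketch, not Elasticsearch code: it assumes the stored values are "Mandarin" and "Italian" (hypothetical, chosen to be consistent with the results reported in the question) and that a keyword sub-field indexes each value unchanged.

```python
# Hypothetical source values, chosen to match the results in the question.
documents = ["Mandarin", "Italian"]

# "language": analyzed with lower_keyword, so one lowercased term per value.
analyzed_terms = {doc.lower() for doc in documents}
# "language.keyword": keyword fields index the original value unchanged.
keyword_terms = set(documents)

for value in ["mandarin", "Italian"]:
    # A term query is never analyzed, so each lookup is exact and case-sensitive.
    print(value, value in analyzed_terms, value in keyword_terms)
# mandarin True False
# Italian False True
```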

Multiple values can also be combined in a single terms query (the term query accepts only one value):

{
    "query": {
        "bool": {
            "filter": {
                "terms": {
                    "language.keyword": ["mandarin", "Italian"]
                }
            }
        }
    }
}

Author: Maruf

Updated on April 02, 2020

Comments

  • Maruf
    Maruf about 4 years

    I'm trying to search my database with filter terms in either upper or lower case. I've noticed that queries apply analyzers, but I can't figure out how to apply a lowercase analyzer in a filtered search. Here's the query:

    {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "should": [
                            {
                                "term": {
                                    "language": "mandarin" // Returns a doc
                                }
                            },
                            {
                                "term": {
                                    "language": "Italian" // Does NOT return a doc, but will if lowercased
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
    

    I have a languages type that I lowercase using this analyzer:

    "analyzer": {
        "lower_keyword": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": "lowercase"
        }
    }
    

    and a corresponding mapping:

    "mappings": {
        "languages": {
            "_id": {
                "path": "languageID"
            },
            "properties": {
                "languageID": {
                    "type": "integer"
                },
                "language": {
                    "type": "string",
                    "analyzer": "lower_keyword"
                },
                "native": {
                    "type": "string",
                    "analyzer": "keyword"
                },
                "meta": {
                    "type": "nested"
                },
                "language_suggest": {
                    "type": "completion"
                }
            }
        }
    }
    
  • Maruf
    Maruf almost 10 years
    So if I wanted the term to be lowercased, would I change the tokenizer to a lowercase one and reindex everything?
  • John Petrone
    John Petrone almost 10 years
    No, it's already lowercased during indexing due to the lowercase filter. The problem is you need to use a query type that will also analyze - Term filters do not.
  • odyth
    odyth over 8 years
    Would you see better performance by lowercasing the input yourself, so you could keep using a Term Filter instead of a Query Filter?
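A minimal sketch of the approach asked about in the last comment, assuming the only analysis on language is the keyword tokenizer plus the lowercase filter: lowercase the values on the client and keep an unanalyzed terms filter. The helper name lowercase_terms_filter is hypothetical, the body uses Elasticsearch 1.x filtered syntax to match the question, and the sketch says nothing about relative performance. Python's str.lower() may also differ from Lucene's lowercase filter for some non-ASCII input.

```python
import json


def lowercase_terms_filter(field, values):
    """Hypothetical helper: lowercase user input on the client so it matches
    the terms lower_keyword stored at index time, then use a terms filter,
    which is not analyzed (Elasticsearch 1.x "filtered" syntax)."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "terms": {field: [v.lower() for v in values]}
                }
            }
        }
    }


body = lowercase_terms_filter("language", ["mandarin", "Italian"])
print(json.dumps(body))
```

This only stays correct while the client mirrors every analysis step on the field; if the analyzer changes, the client-side normalization has to change with it.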