analyzed vs not_analyzed, or ...?


By setting not_analyzed, you are only allowing exact matches (e.g. "SOMECODE/FRED" only, including case and special characters).
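For reference, a mapping along these lines marks a field as not_analyzed when the index is created (the index, type, and field names here are made up for illustration, and this uses the pre-2.x `string` mapping syntax that this question is about):

```shell
# Hypothetical index "myindex" with a "code" field stored as a single,
# untouched token -- only exact matches (case and slashes included) will hit.
curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings": {
    "mytype": {
      "properties": {
        "code": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
```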

My guess is that you are using the standard analyzer (it's the default if you don't specify one). If that's the case, the standard analyzer treats the slash as a token separator and generates two tokens, [somecode] and [fred]:

$ curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'SOMECODE/FRED'
{
  "tokens" : [ {
    "token" : "somecode",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "fred",
    "start_offset" : 9,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

If you don't want this behavior, you need to switch to a tokenizer that doesn't split on special characters. However, I would question the use-case for this. Generally, you'll want to split on those types of characters.
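One middle ground, sketched below, is a custom analyzer built from the `keyword` tokenizer plus the `lowercase` token filter: the whole value stays a single token (so the slash survives), but matching is no longer case-sensitive. The index, analyzer, type, and field names are all illustrative:

```shell
# Custom analyzer "keyword_lower": emits the field value as one token,
# lowercased. "SOMECODE/FRED" indexes as [somecode/fred].
curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lower": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "code": { "type": "string", "analyzer": "keyword_lower" }
      }
    }
  }
}'
```

With that in place, a wildcard query like `*/fred` has a single token to match against instead of two.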

Author: Jonesie

I'm a 35-year veteran (vintage?) programmer based in beautiful New Zealand. I work mostly on web sites but can do pretty much anything I'm asked to do. I particularly like: C#, ASP.NET Core / MVC, Angular, SQL Server and a few other things.

Updated on July 19, 2022

Comments

  • Jonesie
    Jonesie almost 2 years

    New to ES so maybe a dumb question but I am trying to search using a wildcard, e.g.: "SOMECODE*" and "*SOMECODE"

It works fine, but the value in the document may be "SOMECODE/FRED".
The problem is that * matches anything (including nothing),
so *SOMECODE gets a hit on SOMECODE/FRED.

I tried searching for */SOMECODE but this returns nothing.
I think the tokenization of the field is the root problem,
i.e., the / causes the value to be treated as two words.

I tried setting the map on the field to not_analyzed, but then I can't search on it at all.

    Am I doing it wrong?

    Thanks

  • Jonesie
    Jonesie over 11 years
    Thanks, this makes perfect sense now. Perhaps you should be writing the documentation for ES :) I just need to figure out how to set the tokenizer for a single field. I guess I do this in the map?
  • Zach
    Zach over 11 years
    Yep! You can set the mapping of a field either when you create the index (elasticsearch.org/guide/reference/api/…), or after the index is created with Put Mapping API (elasticsearch.org/guide/reference/api/…). You may have to delete your data or make a new index...ES doesn't allow you to alter the mapping of an existing field.
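For a brand-new field, the Put Mapping API call looks roughly like this (names are illustrative, and as noted above this only works for fields that don't already have a conflicting mapping):

```shell
# Add a not_analyzed "newcode" field to an existing type's mapping.
curl -XPUT 'localhost:9200/myindex/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "newcode": { "type": "string", "index": "not_analyzed" }
    }
  }
}'
```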
  • Jonesie
    Jonesie about 11 years
I've set the field in my index map to not_analyzed but it still won't find the values I want.
  • Zach
    Zach about 11 years
Are you searching with the exact case? Maybe put together a gist of the entire process (index creation, mapping, data, etc.)? It's a lot easier to help if we have all the steps needed to reproduce the problem.
  • Jonesie
    Jonesie about 11 years
    I managed to solve this by talking about it (stackoverflow.com/questions/14866727/…)
  • Jonesie
    Jonesie about 11 years
And sorry, I don't gist, but I may do a blog post about the whole thing at some point.
  • Jonathan Hendler
    Jonathan Hendler almost 7 years
    Now you can not analyze and still do a prefix search. see elastic.co/guide/en/elasticsearch/guide/current/…