How to configure synonyms_path in Elasticsearch


I don't know whether your problem comes from how you defined the synonyms for "bar". Since you said you are pretty new to Elasticsearch, here is a working example similar to yours. It shows how Elasticsearch handles synonyms at search time and at index time. Hope it helps.

First, create the synonyms file:

foo => foo bar, baz

Now I create the index with the particular settings you are trying to test:

curl -XPUT 'http://localhost:9200/test/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
              "type" : "synonym",
              "synonyms_path" : "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {

    "test" : {
      "properties" : {
        "text_1" : {
           "type" : "string",
           "analyzer" : "synonym"
        },
        "text_2" : {
           "search_analyzer" : "standard",
           "index_analyzer" : "standard",
           "type" : "string"
        },
        "text_3" : {
           "type" : "string",
           "search_analyzer" : "synonym",
           "index_analyzer" : "standard"
        }
      }
    }
  }
}'

Note that synonyms.txt must be in the same directory as the configuration file, because synonyms_path is resolved relative to the config directory.
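
As a minimal sketch of that placement, the commands below write the rule into the config directory. ES_CONFIG_DIR is a hypothetical variable introduced for this example, and the ./config default is only an assumption; point it at wherever your installation keeps elasticsearch.yml.

```shell
# ES_CONFIG_DIR is a hypothetical variable for this sketch; the ./config
# default is an assumption, not something Elasticsearch defines.
ES_CONFIG_DIR="${ES_CONFIG_DIR:-./config}"

# Make sure the directory exists, then write the single synonym rule.
mkdir -p "$ES_CONFIG_DIR"
printf '%s\n' 'foo => foo bar, baz' > "$ES_CONFIG_DIR/synonyms.txt"

# Show what the filter will read through "synonyms_path": "synonyms.txt".
cat "$ES_CONFIG_DIR/synonyms.txt"
```

The file has to be in place before the index is created, otherwise the filter has nothing to load.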

Now index a doc:

curl -XPUT 'http://localhost:9200/test/test/1' -d '{
  "text_3": "baz dog cat",
  "text_2": "foo dog cat",
  "text_1": "foo dog cat"
}'

Now the searches

Searching in field text_1

curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'

result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text_3": "baz dog cat",
          "text_2": "foo dog cat",
          "text_1": "foo dog cat"
        }
      }
    ]
  }
}

You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.

Searching in field text_2

curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'

result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

I don't get any hits because synonyms were not expanded while indexing (text_2 uses the standard analyzer). Since I'm searching for baz and baz is not in the text, there is no result.

Searching in field text_3

curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'

result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text_3": "baz dog cat",
          "text_2": "foo dog cat",
          "text_1": "foo dog cat"
        }
      }
    ]
  }
}

Note: text_3 is "baz dog cat"

text_3 was indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, the query is expanded at search time and I get the result.
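
The three cases above can be modelled with a toy script. This is only a sketch, not how Elasticsearch works internally: expand, matches and report are hypothetical helper names, the synonym rule is hard-coded, and token positions (so the multi-word synonym "foo bar") are flattened into plain tokens.

```shell
# Toy stand-in for the synonym filter with the rule: foo => foo bar, baz
expand() {
  for tok in $1; do
    if [ "$tok" = "foo" ]; then
      printf 'foo bar baz '
    else
      printf '%s ' "$tok"
    fi
  done
}

# Succeeds when any query token ($2) is among the indexed tokens ($1).
matches() {
  for q in $2; do
    case " $1 " in
      *" $q "*) return 0 ;;
    esac
  done
  return 1
}

report() {
  if matches "$2" "$3"; then echo "$1: hit"; else echo "$1: miss"; fi
}

# text_1: synonym analyzer at index time and at search time
report text_1 "$(expand 'foo dog cat')" "$(expand 'baz')"
# text_2: standard analyzer on both sides, nothing is expanded
report text_2 "foo dog cat" "baz"
# text_3: standard at index time, synonym analyzer at search time
report text_3 "baz dog cat" "$(expand 'foo')"
```

The point of the model is that a match needs the expansion to happen on at least one side, and on the side that holds "foo".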

If you want to debug, you can use the _analyze endpoint, for example:

curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'

result:

{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "baz",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "bar",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 2
    }
  ]
}
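
When the response gets longer, it can help to reduce it to token and position pairs. A minimal sketch, assuming the pretty-printed response was saved to a file; list_tokens is a hypothetical helper name, and the parsing relies on the one-field-per-line layout that pretty=true produces.

```shell
# Print "token position" pairs from a pretty-printed _analyze response.
# Assumes each "token" and "position" field sits on its own line.
list_tokens() {
  awk -F'"' '
    /"token":/    { tok = $4 }
    /"position":/ { gsub(/[^0-9]/, "", $3); print tok, $3 }
  ' "$1"
}
```

For example, redirect the output of the curl call above into analyze.json (an assumed file name) and run `list_tokens analyze.json`. Tokens that share a position are alternatives for the same word, while a higher position means the token comes later in the stream.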
Author by

Rachid O

web developer

Updated on June 25, 2022

Comments

  • Rachid O
    Rachid O almost 2 years

    I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:

    index :
        analysis :
            analyzer : 
                synonym :
                    type : custom
                    tokenizer : whitespace
                    filter : [synonym]
            filter :
                synonym :
                    type : synonym
                    synonyms_path: synonyms.txt
    

    Then I created an index named test:

    "mappings" : {
      "test" : {
         "properties" : {
            "text_1" : {
               "type" : "string",
               "analyzer" : "synonym"
            },
            "text_2" : {
               "search_analyzer" : "standard",
               "index_analyzer" : "synonym",
               "type" : "string"
            },
            "text_3" : {
               "type" : "string",
               "analyzer" : "synonym"
            }
         }
      }
    

    }

    and inserted a document of type test with this data:

    {
    "text_3" : "foo dog cat",
    "text_2" : "foo dog cat",
    "text_1" : "foo dog cat"
    }
    

    synonyms.txt contains "foo,bar,baz". When I search for foo it returns what I expected, but when I search for baz or bar it returns zero results:

    {
    "query":{
    "query_string":{
        "query" : "bar",
        "fields" : [ "text_1"],
        "use_dis_max" : true,
        "boost" : 1.0
    }}} 
    

    result:

    {
    "took":1,
    "timed_out":false,
    "_shards":{
    "total":5,
    "successful":5,
    "failed":0
    },
    "hits":{
    "total":0,
    "max_score":null,
    "hits":[
    ]
    }
    }