Change default mapping of string to "not analyzed" in Elasticsearch
Solution 1
You can query the .raw
version of your field. This was added in Logstash 1.3.1:
The logstash index template we provide adds a “.raw” field to every field you index. These “.raw” fields are set by logstash as “not_analyzed” so that no analysis or tokenization takes place – our original value is used as-is!
So if your field is called foo
, you'd query foo.raw
to return the not_analyzed
(not split on delimiters) version.
Solution 2
Just create a template. run
curl -XPUT localhost:9200/_template/template_1 -d '{
"template": "*",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"_default_": {
"_all": {
"enabled": true
},
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"index": "not_analyzed",
"omit_norms": true,
"type": "string"
}
}
}
],
"properties": {
"@version": {
"type": "string",
"index": "not_analyzed"
},
"geoip": {
"type": "object",
"dynamic": true,
"path": "full",
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}'
Solution 3
Make a copy of the lib/logstash/outputs/elasticsearch/elasticsearch-template.json from your Logstash distribution (possibly installed as /opt/logstash/lib/logstash/outputs/elasticsearch/elasticsearch-template.json), modify it by replacing
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
}
}
} ],
with
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "not_analyzed", "omit_norms" : true
}
}
} ],
and point template
for you output plugin to your modified file:
output {
elasticsearch {
...
template => "/path/to/my-elasticsearch-template.json"
}
}
You can still override this default for particular fields.
Solution 4
I think updating the mapping is wrong approach just to handle a field for reporting purposes. Sooner or later you may want to be able to search the field for tokens. If you are updating the field to "not_analyzed" and want to search for foo from a value "foo bar", you won't be able to do that.
A more graceful solution is to use kibana aggregation filters instead of terms. Something like below will search for the terms ivr04 and ivr02. So in your case you can have a filter "Hello I'm Sinha". Hope this helps.
Sagnik Sinha
Updated on November 02, 2020Comments
-
Sagnik Sinha over 3 years
In my system, the insertion of data is always done through csv files via logstash. I never pre-define the mapping. But whenever I input a string it is always taken to be
analyzed
, as a result an entry likehello I am Sinha
is split intohello
,I
,am
,Sinha
. Is there anyway I could change the default/dynamic mapping of elasticsearch so that all strings, irrespective of index, irrespective of type are taken to benot analyzed
? Or is there a way of setting it in the.conf
file? Say myconf
file looks likeinput { file { path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv" type => "promosms_dec15" start_position => "beginning" sincedb_path => "/dev/null" } } filter { csv { columns => ["Comm_Plan","Queue_Booking","Order_Reference","Multi_Ordertype"] separator => "," } ruby { code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);" } } output { elasticsearch { action => "index" host => "localhost" index => "promosms-%{+dd.MM.YYYY}" workers => 1 } }
I want all the strings to be
not analyzed
and I don't mind it being the default setting for all future data to be inserted into elasticsearch either -
Sagnik Sinha over 9 yearsThat doesn't seem to give me desired results... I have a field named
State
which has an instance calledWest Bengal
. When I plot a bar graph, I get 2 different legends namelywest
andbengal
which is wrong. That problem is still there -
Magnus Bäck over 9 yearsIs this in a new index? Keep in mind that changing the index template won't make a difference for existing data.
-
Sagnik Sinha over 9 yearsI deleted the previous index, refreshed it and then inserted again. Yes, its in a new index
-
Magnus Bäck over 9 yearsInteresting. What if you get the mapping of the index and have a look at the actual mapping being used?
-
Sagnik Sinha over 9 yearsIt shows
{"promosms-16.12.2014":{"mappings":{"promosms_dec15":{"properties":{..............,"State":{"type":"string"},.........}}}}}
there is no information as to whether it is analyzed or not -
Sagnik Sinha over 9 yearsI declared the whole thing as a template and it worked. I set
"template" : "*"
-
Brad over 8 yearsWhat exactly does this do? How does it work? Can you elaborate a bit more on what this template is and how it applies to the question?
-
Sagnik Sinha over 8 years@Brad as you can see here, inside
dynamic templates
andproperties
, theindex
is set asnot analyzed
. As a result, if we input a stringhello, I am Sinha
it will be treated as one string and not split intohello
I
am
Sinha
-
Roland Kofler over 8 yearsthe foo.raw fields are empty while the foo fields are filled. I don't get it
-
Matt Leonowicz about 8 years@Roland Kofler Did you find out why they were empty?. I have the same issue
-
Roland Kofler about 8 years@AviArro all I can remember it is intended that way. Sort of a hack. More I can't remember
-
AbtPst about 8 yearswhat is the index name that this mapping applies to?
-
Sagnik Sinha about 8 yearsit applies to all indices @AbtPst
-
AbtPst about 8 yearsok, that makes sense. what if i only want to apply to some fields of an index. lets say i have an index
ind
. the number of fields in the incoming data is not fixed but for sure there will be a string fieldd
. I only want fieldd
to be analyzed and all other string fields to be not analyzed. -
vvs14 almost 8 years@SagnikSinha what exactly your are doing to drop ".raw" fields of each fields? Did you mean dropping "fields" parameter in mapping file will drop ".raw" part of each field and save 50% of space? I do have mapping file in which I do have "fields" parameter for each field I am extracting out of log.
-
tadasajon almost 8 yearsIs it a given that everyone with this question will be using logstash?
-
Banjer almost 8 years@JonCrowell well yes and no. I stumbled back on my answer the other day when NOT using logstash. I couldn't rind the
.raw
field and realized it was something logstash created for me, so my answer here was not helpful. :) However, the question is tagged and refers to logstash, so makes sense that my answer is the accepted one. There are other related SO questions and answers about creatingnot_analyzed
fields with vanilla Elasticsearch that can be referred upon. -
Wex almost 7 yearsLogstash
5.x
with Elasticsearch5.x
changes the mapping from.raw
to.keyword
- elastic.co/guide/en/logstash/current/…