How to store data in elasticsearch _source but not index it?


By default the _source of the document is stored regardless of the fields that you choose to index. The _source is used to return the document in the search results, whereas the fields that are indexed are used for searching.

You can't set index: no on an object to prevent all of its fields from being indexed, but you can get the same effect with dynamic templates, using the path_match property to apply the index: no setting to every field within an object. Here is a simple example.

Create an index with your mapping that includes the dynamic templates for the author object and the nested categories object:

POST /shop
{
    "mappings": {
        "book": {
            "dynamic_templates": [
                {
                    "author_object_template": {
                        "path_match": "author.*",
                        "mapping": {
                            "index": "no"
                        }
                    }
                },
                {
                    "categories_object_template": {
                        "path_match": "categories.*",
                        "mapping": {
                            "index": "no"
                        }
                    }
                }
            ],
            "properties": {
                "categories": {
                    "type": "nested"
                }
            }
        }
    }
}

Index a document:

POST /shop/book/1
{
    "title": "book one",
    "author": {
        "first_name": "jon",
        "last_name": "doe"
    },
    "categories": [
        {
            "cat_id": 1,
            "cat_name": "category one"
        },
        {
            "cat_id": 2,
            "cat_name": "category two"
        }
    ]
}
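
As a quick check that nothing is lost from the stored document, you can fetch it by ID. This is a minimal sketch, assuming the shop index and document 1 created above and the type-based URLs used throughout this answer. The _source in the response should contain the complete original JSON, including author and categories, because _source is kept independently of the index setting.

```
GET /shop/book/1
```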

If you search on the title field with the term book, the document is returned. If you search on author.first_name or author.last_name, there won't be a match because these fields were not indexed:

POST /shop/book/_search
{
    "query": {
        "match": {
            "author.first_name": "jon"
        }
    }
}

The same would be the case for a nested query on the category fields:

POST /shop/book/_search
{
    "query": {
        "nested": {
            "path": "categories",
            "query": {
                "match": {
                    "categories.cat_name": "category"
                }
            }
        }
    }
}
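
For contrast, a match query on title should still return the document. This sketch assumes title was picked up by the default dynamic mapping (it matches neither template, so it is analyzed and indexed as usual):

```
POST /shop/book/_search
{
    "query": {
        "match": {
            "title": "book"
        }
    }
}
```

The hit carries the full _source, so the author and categories data come back with it even though those fields cannot be searched.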

You can also use the Luke tool to inspect the Lucene index and see which fields have been indexed.
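
A lighter-weight check, assuming the shop index from above, is to ask Elasticsearch for the mapping it generated. After the first document is indexed, the fields created under author and categories should be listed with "index": "no", while title should show the default settings:

```
GET /shop/_mapping
```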


Author: pinkeen

Updated on October 09, 2022

Comments

  • pinkeen
    pinkeen about 1 year

    I am searching by only a couple of fields, but I want to store the whole document in ES in order to avoid additional DB (MySQL) queries.

    I tried adding index: no, store: no to whole objects/properties in the mapping, but I'm still not sure whether the fields are being indexed and adding unnecessary overhead.

    Let's say I've got books and each has an author. I want to search only by book title, but I want to be able to retrieve the whole document.

    Is this okay:

    mappings:
        properties:
            title:
                type: string
                index: analyzed
            author:
                type: object
                index: no
                store: no
                properties:
                    first_name:
                        type: string
                    last_name:
                        type: string
    

    Or should I rather do:

    mappings:
        properties:
            title:
                type: string
                index: analyzed
            author:
                type: object
                properties:
                    first_name:
                        index: no
                        store: no
                        type: string
                    last_name:
                        index: no
                        store: no
                        type: string
    

    Or maybe I am doing it completely wrong? And what about nested properties that should not be indexed?

  • pinkeen
    pinkeen over 8 years
    Does "index": "no" imply "store": "no"? I've read that store means storing the original property's _source in Lucene, but I'm not sure how it relates to index. And just to make sure: I don't have to provide mappings for the non-indexed fields? ES won't throw errors if I put a document in which property X is an int and then a document in which the same property is a string?
  • Dan Tuffery
    Dan Tuffery over 8 years
    No, the setting for index does not determine the setting for store. The default for store is no, which is fine in your use case because the _source is enabled. If you disable the _source field and select the fields you want to store, the stored fields will only be returned in the search results when there is a match. You have to provide a mapping for non-indexed fields in order to tell Elasticsearch not to index them; otherwise Elasticsearch will use the default analyzer (Standard Analyzer) to index the field.
  • Dan Tuffery
    Dan Tuffery over 8 years
    However, in the above example the dynamic templates are used for the mappings of non indexed fields. If you don't have a mapping for a field an error won't be returned if you change the type of a property.