How to do a join in Elasticsearch -- or at the Lucene level


Solution 1

As already mentioned, the way to go is parent/child. Nested documents are extremely performant, but to update them you need to resubmit the whole structure (parent + nested documents). Although nested documents are implemented internally as separate Lucene documents, those nested docs are neither visible nor directly accessible. When using nested documents you therefore need the dedicated queries to reach them (nested query, nested filter, nested facet, etc.).
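
To illustrate the point about dedicated queries, here is a minimal sketch of a nested query body built as a plain Python dict. It assumes a hypothetical mapping in which each person document has a nested `items` field; the field names `city` and `items.color` are made up for the example and do not come from the question.

```python
import json


def nested_item_query(person_city, item_color):
    """Build a search body matching persons by city and by one nested item.

    Field names (city, items, items.color) are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Condition on the top-level (person) document.
                    {"term": {"city": person_city}},
                    # Condition on the hidden nested documents; a plain
                    # term query on items.color would not reach them.
                    {
                        "nested": {
                            "path": "items",
                            "query": {"term": {"items.color": item_color}},
                        }
                    },
                ]
            }
        }
    }


body = nested_item_query("london", "red")
print(json.dumps(body, indent=2))
```

The body would be sent to the search endpoint of the index holding the person documents. Updating a single item under this model still means reindexing the whole person document.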

On the other hand, parent/child lets you keep separate documents that refer to each other and can be updated independently. It has a cost in terms of performance and memory used, but it is far more flexible than nested documents.
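
As a rough sketch of how parent/child searches might look, the two helpers below build `has_parent` and `has_child` query bodies. They assume that `person` is the parent type and `item` the child type, and that `city` and `color` are fields on those types; all of these names are assumptions for illustration.

```python
import json


def items_owned_by(person_city, item_color):
    """Search items (children), filtered by a property of their parent person."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"color": item_color}},
                    {
                        "has_parent": {
                            "parent_type": "person",
                            "query": {"term": {"city": person_city}},
                        }
                    },
                ]
            }
        }
    }


def persons_owning(item_color):
    """Search persons (parents) that have at least one matching child item."""
    return {
        "query": {
            "has_child": {
                "type": "item",
                "query": {"term": {"color": item_color}},
            }
        }
    }


item_search = items_owned_by("london", "red")
person_search = persons_owning("red")
print(json.dumps(item_search, indent=2))
print(json.dumps(person_search, indent=2))
```

Each search returns only one side of the relation: `items_owned_by` returns items and `persons_owning` returns persons.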

As mentioned in this article, though, the fact that Elasticsearch helps you manage relations doesn't mean that you must use those features. In many complex use cases it is simply better to have custom logic in the application layer that handles the relations. In fact there are limitations with parent/child too: for instance, you can never get back both parent and children at the same time, whereas nested documents don't allow you to get back only the matching children (for now).
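
One possible shape for such application-layer logic is a two-step lookup: find the matching persons first, then fetch the items whose owner is among them, and stitch the two together in code. The sketch below uses in-memory lists as stand-ins for two separate indices; the data, the `owner_id` field, and the helper names are all hypothetical.

```python
# Hypothetical in-memory stand-ins for a "persons" and an "items" index.
PERSONS = [
    {"id": "p1", "city": "london"},
    {"id": "p2", "city": "paris"},
]
ITEMS = [
    {"id": "i1", "owner_id": "p1", "color": "red"},
    {"id": "i2", "owner_id": "p1", "color": "blue"},
    {"id": "i3", "owner_id": "p2", "color": "red"},
]


def find_persons(city):
    """Step 1: stands in for a search on the person index."""
    return [p for p in PERSONS if p["city"] == city]


def find_items(color, owner_ids):
    """Step 2: stands in for an item search restricted to a set of owner ids."""
    return [
        it for it in ITEMS
        if it["color"] == color and it["owner_id"] in owner_ids
    ]


def join_items_with_owners(city, color):
    """Join the two result sets in the application, returning both sides."""
    owners = {p["id"]: p for p in find_persons(city)}
    items = find_items(color, set(owners))
    return [{"item": it, "owner": owners[it["owner_id"]]} for it in items]


joined = join_items_with_owners("london", "red")
print(joined)
```

Against a real cluster, the second step would typically be a query that filters items by the collected owner ids. The trade-off is an extra round trip and a potentially large id list, in exchange for fully independent Person and Item documents and results that include both sides.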

Solution 2

Take a look at my answer for: In Elasticsearch, can multiple top-level documents share a single nested document?

It discusses using the _parent mapping to avoid having to update every Item when a Person is updated.
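
A minimal sketch of what such a `_parent` setup might look like, assuming an older Elasticsearch release that still supports the `_parent` mapping (later releases replaced it with a `join` field). The `shop` index name, the `person` and `item` types, and the helper functions are hypothetical; the helpers only build request paths and do not contact a server.

```python
import json
from urllib.parse import quote

# Mapping for the child type: every item declares person as its parent type.
ITEM_MAPPING = {"item": {"_parent": {"type": "person"}}}


def index_item_path(index, item_id, person_id):
    """Path for indexing an item; the parent id is passed as a URL parameter."""
    return f"/{index}/item/{quote(item_id)}?parent={quote(person_id)}"


def update_person_path(index, person_id):
    """Path for reindexing a person; none of its items need to be touched."""
    return f"/{index}/person/{quote(person_id)}"


print(json.dumps(ITEM_MAPPING))
print(index_item_path("shop", "i1", "p1"))
print(update_person_path("shop", "p1"))
```

Because the link lives on the item (child) side, changing a Person is a single-document write regardless of how many Items that person owns.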


Author by

Daniel Winterstein

I am co-founder and CTO for Good-Loop. I also have a personal homepage.

Updated on June 04, 2022

Comments

  • Daniel Winterstein almost 2 years

    What's the best way to do the equivalent of an SQL join in Elasticsearch?

    I have an SQL setup with two large tables: Persons and Items. A Person can own many items. Both Person and Item rows can change (i.e. be updated). I have to run searches which filter by aspects of both the person and the item.

    In Elasticsearch, it looks like you could make Person a nested document of Item, then use has_child.

    But: if you then update a Person, I think you'd need to update every Item they own (which could be a lot).

    Is that correct? Is there a nice way to solve this query in Elasticsearch?

    • javanna over 10 years
      Just a small terminology issue: if you use the has_child, person would be a child document, not a nested one (parent/child vs nested documents).
    • sumanth232 almost 9 years
      Can we use the Elasticsearch Hive connector to do a JOIN operation from Hive on an Elasticsearch data store? - github.com/elastic/elasticsearch-hadoop
  • Phil over 10 years
    +1 for the article you referenced. I hadn't seen that previously and it makes a great summary of the points.
  • Phil almost 9 years
    @krishna222 it's probably worth asking a new question to get an answer regarding the Hive connector