Join query in ElasticSearch

19,117

Solution 1

It depends what you intend when you say JOIN. Elasticsearch is not like regular database that supports JOIN between tables. It is a text search engine that manages documents within indexes.

On the other hand you can search within the same index over multiple types using a fields that are common to every type.

For example taking your data I can create an index with 2 types and their data like follows:

curl -XPOST localhost:9200/product -d '{
    "settings" : {
        "number_of_shards" : 5
    }
}'

curl -XPOST localhost:9200/product/type1/_mapping -d '{
        "type1" : {
            "properties" : {
                "product_id" : { "type" : "string" },
                "price" : { "type" : "integer" },
                "stock" : { "type" : "integer" }
            }
        }   
}'              

curl -XPOST localhost:9200/product/type2/_mapping -d '{
        "type2" : {
            "properties" : {
                "product_id" : { "type" : "string" },
                "category" : { "type" : "string" },
                "manufacturer" : { "type" : "string" }
            }
        }
}'  

curl -XPOST localhost:9200/product/type1/1 -d '{
        product_id: "1111", 
        price: "23",
        stock: "100"
}'

curl -XPOST localhost:9200/product/type2/1 -d '{
        product_id: "1111",
        category: "iPhone case",
        manufacturer: "Belkin"
}'

I effectively created one index called product with 2 type type1 and type2. Now I can do the following query and it will return both documents:

curl -XGET 'http://localhost:9200/product/_search?pretty=1' -d '{
    "query": {
        "query_string" : {
            "query" : "product_id:1111"
        }
    }
}'

{
  "took" : 95,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5945348,
    "hits" : [ {
      "_index" : "product",
      "_type" : "type1",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {
    product_id: "1111",
    price: "23",
    stock: "100"
}
    }, {
      "_index" : "product",
      "_type" : "type2",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {
    product_id: "1111",
    category: "iPhone case",
    manufacturer: "Belkin"
}
    } ]
  }
}

The reason is because Elasticsearch will search over all documents within that index regardless of their type. This is still different than a JOIN in the sense Elasticsearch is not going to do a Cartesian product of the documents that belong to each type.

Hope that helps

Solution 2

isaac.hazan's answer works quite well, but I would like to add a few points that helped me with this kind of situation:

I landed on this page when I was trying to solve a similar problem, in that I had to exclude multiple records of one index based on documents of another index. The lack of relationships is one of the main downsides of unstructured databases.

The elasticsearch documentation page on Handling Relationships explains a lot.

Four common techniques are used to manage relational data in Elasticsearch:

  • Application-side joins
  • Data denormalization
  • Nested objects
  • Parent/child relationships

Often the final solution will require a mixture of a few of these techniques.

I've used nested objects and application-side joins, mostly. While using the same field name could momentarily solve the problem, I think it is better to rethink and create best-suited mapping for your application.

For instance, you might find that you want to list all products with price greater than x, or list all products that are not in stock anymore. To deal with such scenarios it helps if you are using one of the solutions mentioned above.

Share:
19,117
Fawad
Author by

Fawad

Updated on July 27, 2022

Comments

  • Fawad
    Fawad almost 2 years

    Is there any way (query) to join 2 JSONs below in ElasticSearch

    {
    product_id: "1111",
    price: "23.56",
    stock: "100"
    }
    
    {
    product_id: "1111",
    category: "iPhone case",
    manufacturer: "Belkin"
    }
    

    Above 2 JSONs processed (input) under 2 different types in Logstash, so their indexes are available in different 'type' filed in Elasticsearch.

    What I want is to join 2 JSONs on product_id field.

  • Fawad
    Fawad about 10 years
    Thank you so much for your reply, can you please tell me what is the best possible/alternative solution in such case. As you mentioned above I do have one index 'product' and 2 types 'type1' and 'type2' with one common field 'product_id' in both types. Now what is the possible option for me to have both the JSONs merged and indexed in new index or type in Elasticsearch in case of same 'product_id'. FYI, I am using ELK stack (Elasticsearch, Logstash and Kibana).
  • isaac.hazan
    isaac.hazan about 10 years
    I am not sure i understand what you are trying to achieve, what is your end goal? What do you want to display in Kibana at the end?
  • Fawad
    Fawad about 10 years
    Your answer is useful for sure. Actually @isaac.hazan my end goal is to display 'product_id', 'price', 'stock', 'category' and 'manufacturer' under single event in Kibana.
  • Fawad
    Fawad about 10 years
    To upvote your answer I need 15 reputation, but I am just about three days old in Stake Overflow :(
  • isaac.hazan
    isaac.hazan about 10 years
    Regarding your end goal, i don't think that is possible as a built-in feature. I dealt with the same kind of issues on another project and the way i overcame that was by developing the layer on top that takes the 2 events, merge them and save them into elasticsearch as a single event. That being said remember that in Kibana if you query just the index w/o the type you will get the whole data set which technically mean that you would be able show some control but i agree that this is not ideal for example for an histogram as you will 2 events instead of 1.
  • Fawad
    Fawad about 10 years
    can you please tell me how can we merge 2 events and save them in Elasticsearch as single event, are you using logstash for this ?
  • isaac.hazan
    isaac.hazan about 10 years
    No logstash as far as i know does not do that. I am using the Elasticsearch JAVA API for doing that.