How to use Apache Avro to serialize the JSON document and then write it into Cassandra?


Since you already use Jackson, you could try the Jackson dataformat module for Avro (jackson-dataformat-avro), which adds support for reading and writing Avro-encoded data.

Author by

Admin

Updated on June 09, 2022


    I have been reading a lot about Apache Avro lately, and I am inclined to use it instead of JSON. Currently, we serialize the JSON document using Jackson and write that serialized JSON document into Cassandra for each row key/user id. We also have a REST service that reads the whole JSON document by row key, deserializes it, and uses it further.

    We write into Cassandra like this:

    user-id   column-name   serialize-json-document-value
    

    Below is an example of the JSON document that we are writing into Cassandra. This JSON document is for a particular row key/user id.

    {
      "lv" : [ {
        "v" : {
          "site-id" : 0,
          "categories" : {
            "321" : {
              "price_score" : "0.2",
              "confidence_score" : "0.5"
            },
            "123" : {
              "price_score" : "0.4",
              "confidence_score" : "0.2"
            }
          },
          "price-score" : 0.5,
          "confidence-score" : 0.2
        }
      } ],
      "lmd" : 1379214255197
    }
    

    Now we are thinking of using Apache Avro so that we can compact this JSON document by serializing it with Avro before storing it in Cassandra. I have a couple of questions about this:

    1. Is it possible to serialize the above JSON document using Apache Avro and then write it into Cassandra? If yes, how can I do that? Can anyone provide a simple example?
    2. We also need to deserialize it when reading it back from Cassandra in our REST service. Is that possible as well?
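To give a feel for why an Avro binary payload tends to be smaller than the JSON text, here is a small hand-rolled sketch of how the Avro specification encodes a `long` such as `lmd` (zig-zag, then variable-length bytes). It is only an illustration: it does not use the Avro library, and the class and method names (`AvroLongSketch`, `encodeLong`, `decodeLong`) are made up for this example. In real code the library's encoders and decoders would do this work.

```java
import java.io.ByteArrayOutputStream;

public class AvroLongSketch {

    // Zig-zag maps signed values to unsigned ones so small magnitudes stay small,
    // then 7 bits are written per byte, lowest group first; the high bit means "more bytes follow".
    public static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    // Reverses encodeLong: reassemble the 7-bit groups, then undo the zig-zag mapping.
    public static long decodeLong(byte[] bytes) {
        long z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        long lmd = 1379214255197L; // the "lmd" value from the sample document
        byte[] encoded = encodeLong(lmd);
        System.out.println("JSON text digits: " + Long.toString(lmd).length());
        System.out.println("Encoded bytes:    " + encoded.length);
        System.out.println("Round trip ok:    " + (decodeLong(encoded) == lmd));
    }
}
```

For the sample `lmd` value this takes 6 bytes instead of 13 characters of JSON text. Field names are also absent from the binary form, because they live in the schema rather than in each stored value.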

    Below is my simple code, which serializes the JSON document and prints it to the console.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Attribute, AttributeValue, Category and JsonMapperFactory are our own classes (not shown).
    public static void main(String[] args) {

        final long lmd = System.currentTimeMillis();

        // Properties of a single attribute value; the keys vary per column name.
        Map<String, Object> props = new HashMap<String, Object>();
        props.put("site-id", 0);
        props.put("price-score", 0.5);
        props.put("confidence-score", 0.2);

        // Per-category scores, keyed by category id.
        Map<String, Category> categories = new HashMap<String, Category>();
        categories.put("123", new Category("0.4", "0.2"));
        categories.put("321", new Category("0.2", "0.5"));
        props.put("categories", categories);

        AttributeValue av = new AttributeValue();
        av.setProperties(props);

        Attribute attr = new Attribute();
        attr.instantiateNewListValue();
        attr.getListValue().add(av);
        attr.setLastModifiedDate(lmd);

        // serialize it
        try {
            String jsonStr = JsonMapperFactory.get().writeValueAsString(attr);

            // then write into Cassandra
            System.out.println(jsonStr);
        } catch (IOException e) {
            // Jackson's JsonGenerationException and JsonMappingException both extend IOException.
            e.printStackTrace();
        }
    }
    

    The serialized JSON document will look something like this:

    {"lv":[{"v":{"site-id":0,"categories":{"321":{"price_score":"0.2","confidence_score":"0.5"},"123":{"price_score":"0.4","confidence_score":"0.2"}},"price-score":0.5,"confidence-score":0.2}}],"lmd":1379214255197}
    

    The AttributeValue and Attribute classes use Jackson annotations.

    One important note: the properties inside the above JSON document change depending on the column name. Different column names have different properties; some have two properties, some have five. So each JSON document carries the properties and values that our metadata defines for that column.
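One possible way to describe such a document in Avro, given that the property set varies per column, is to model `v` as a map instead of a record with fixed fields. The schema below is only a sketch under that assumption. The record names are borrowed from the Java classes above, and the union of value types covers just what appears in the sample document. A map also sidesteps the fact that Avro field names may contain only letters, digits, and underscores, so keys like `site-id` cannot be record fields but are fine as map keys.

```json
{
  "type": "record",
  "name": "Attribute",
  "fields": [
    {
      "name": "lv",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "AttributeValue",
          "fields": [
            {
              "name": "v",
              "type": {
                "type": "map",
                "values": [
                  "int",
                  "double",
                  {
                    "type": "map",
                    "values": {
                      "type": "record",
                      "name": "Category",
                      "fields": [
                        {"name": "price_score", "type": "string"},
                        {"name": "confidence_score", "type": "string"}
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    },
    {"name": "lmd", "type": "long"}
  ]
}
```

If other columns carry value types not listed here (for example strings or booleans), the union would need to be extended accordingly.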

    I hope the question is clear enough. Can anyone provide a simple example of how to achieve this with Apache Avro? I am just starting out with Avro, so I am running into a lot of problems.