Avro schema definition nesting types


Solution 1

There are 4 possible ways:

  1. Include it in the pom file, as mentioned in this ticket.
  2. Declare all your types in a single avsc file.
  3. Use a single static parser that first parses all the imports and then parses the actual data types.
  4. (This is a hack) Use an avdl file with imports, as described at https://avro.apache.org/docs/1.7.7/idl.html#imports . Note, though, that IDL is intended for RPC calls.

Example for 2. Declare all your types in a single avsc file. This also answers the question about declaring an array of addresses.

[
{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {
            "name": "streetaddress",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "zip",
            "type": "string"
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "phone",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer2",
    "fields": [
        {
            "name": "x",
            "type": "string"
        },
        {
            "name": "y",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}
]

Example for 3. Using a single static parser

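import java.io.File;
import org.apache.avro.Schema.Parser;

Parser parser = new Parser(); // Make this static and reuse it
// Parse address.avsc first: the other files refer to AddressRecord by name.
parser.parse(new File("<location of address.avsc file>"));
parser.parse(new File("<location of customer.avsc file>"));
parser.parse(new File("<location of customer2.avsc file>"));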

If we want to get hold of a Schema, for example to create new records, we can either use the getTypes() method (https://avro.apache.org/docs/1.5.4/api/java/org/apache/avro/Schema.Parser.html#getTypes()) to look the schema up, or keep the Schema returned by each parse call:

Parser parser = new Parser(); // Make this static and reuse it
Schema addressSchema = parser.parse(new File("<location of address.avsc file>"));
Schema customerSchema = parser.parse(new File("<location of customer.avsc file>"));
Schema customer2Schema = parser.parse(new File("<location of customer2.avsc file>"));
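
For this approach, the files might be split as follows. This is only a sketch: it assumes address.avsc holds nothing but AddressRecord and that customer.avsc refers to it by its full name, reusing the definitions from the Example for 2.

address.avsc:

{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {"name": "streetaddress", "type": "string"},
        {"name": "city", "type": "string"},
        {"name": "state", "type": "string"},
        {"name": "zip", "type": "string"}
    ]
}

customer.avsc:

{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "phone", "type": "string"},
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}

Parsing customer.avsc on its own with a fresh Schema.Parser is expected to fail, because AddressRecord is not yet known to that parser. That is why the same parser instance has to see address.avsc first.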

Solution 2

Just to add to @Princey James's answer: the nested type must be defined before it is used.
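
The same ordering rule applies inside a single record schema. A minimal sketch, with the Customer fields shortened: AddressRecord is defined inline at its first use and referred to by name afterwards. The billingaddress field is hypothetical, added only to show a later reference by name; it is not part of the original model.

{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "firstname", "type": "string"},
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "AddressRecord",
                    "fields": [
                        {"name": "streetaddress", "type": "string"},
                        {"name": "city", "type": "string"},
                        {"name": "state", "type": "string"},
                        {"name": "zip", "type": "string"}
                    ]
                }
            }
        },
        {
            "name": "billingaddress",
            "doc": "Hypothetical field, for illustration only",
            "type": "AddressRecord"
        }
    ]
}

Since the root of this file is a single record rather than a JSON array, the parsed schema is a record and not a union. Swapping the two address fields would put the reference before the definition, which is the case this answer warns about.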

Solution 3

Another addition to @Princey James's answer.

With the Example for 2 (declare all your types in a single avsc file), serializing and deserializing with code generation works, but serializing and deserializing without code generation does not.

You will get org.apache.avro.AvroRuntimeException: Not a record schema: [{"type":" ... because parsing a file whose root is a JSON array yields a union schema, and GenericData.Record only accepts a record schema.

Working example with code generation:

  @Test
  public void avroWithCode() throws IOException {

    UserPerso UserPerso3 = UserPerso.newBuilder()
                                    .setName("Charlie")
                                    .setFavoriteColor("blue")
                                    .setFavoriteNumber(null)
                                    .build();

    AddressRecord adress = AddressRecord.newBuilder()
                                        .setStreetaddress("mo")
                                        .setCity("Paris")
                                        .setState("IDF")
                                        .setZip("75")
                                        .build();

    ArrayList<AddressRecord> li = new ArrayList<>();
    li.add(adress);

    Customer cust = Customer.newBuilder()
                            .setUser(UserPerso3)
                            .setPhone("0101010101")
                            .setAddress(li)
                            .build();

    String fileName = "cust.avro";

    File a = new File(fileName);

    DatumWriter<Customer> customerDatumWriter = new SpecificDatumWriter<>(Customer.class);
    DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(customerDatumWriter);
    dataFileWriter.create(cust.getSchema(), new File(fileName));
    dataFileWriter.append(cust);
    dataFileWriter.close();

    DatumReader<Customer> custDatumReader = new SpecificDatumReader<>(Customer.class);
    DataFileReader<Customer> dataFileReader = new DataFileReader<>(a, custDatumReader);
    Customer cust2 = null;
    while (dataFileReader.hasNext()) {
      cust2 = dataFileReader.next(cust2);
      System.out.println(cust2);
    }
    dataFileReader.close();
  }

Failing example without code generation:

  @Test
  public void avroWithoutCode() throws IOException {

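    // user.avsc declares all types in one JSON array (as in the Example for 2),
    // so each parse call returns the same union schema, not a record schema.
    Schema schemaUserPerso = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));
    Schema schemaAdress = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));
    Schema schemaCustomer = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));

    System.out.println(schemaUserPerso);

    // Fails here with AvroRuntimeException: Not a record schema
    GenericRecord UserPerso3 = new GenericData.Record(schemaUserPerso);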
    UserPerso3.put("name", "Charlie");
    UserPerso3.put("favorite_color", "blue");
    UserPerso3.put("favorite_number", null);

    GenericRecord adress = new GenericData.Record(schemaAdress);

    adress.put("streetaddress", "mo");
    adress.put("city", "Paris");
    adress.put("state", "IDF");
    adress.put("zip", "75");

    ArrayList<GenericRecord> li = new ArrayList<>();
    li.add(adress);

    GenericRecord cust = new GenericData.Record(schemaCustomer);

    cust.put("user", UserPerso3);
    cust.put("phone", "0101010101");
    cust.put("address", li);

    String fileName = "cust.avro";

    File file = new File(fileName);

    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schemaCustomer);
    DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
    dataFileWriter.create(schemaCustomer, file);
    dataFileWriter.append(cust);
    dataFileWriter.close();

    File a = new File(fileName);

    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schemaCustomer);
    DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(a, datumReader);
    GenericRecord cust2 = null;
    while (dataFileReader.hasNext()) {
      cust2 = dataFileReader.next(cust2);
      System.out.println(cust2);

    }
  }

Author: derdc

Updated on July 09, 2022

Comments

  • derdc
    derdc almost 2 years

    I am fairly new to Avro and going through documentation for nested types. I have the example below working nicely but many different types within the model will have addresses. Is it possible to define an address.avsc file and reference that as a nested type? If that is possible, can you also take it a step further and have a list of Addresses for a Customer? Thanks in advance.

    {"namespace": "com.company.model",
      "type": "record",
      "name": "Customer",
      "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "phone", "type": "string"},
        {"name": "address", "type":
          {"type": "record",
           "name": "AddressRecord",
           "fields": [
             {"name": "streetaddress", "type": "string"},
             {"name": "city", "type": "string"},
             {"name": "state", "type": "string"},
             {"name": "zip", "type": "string"}
           ]}
        }
      ]
    }
    
  • RedBullet
    RedBullet about 8 years
    Not clear on how to use the parser in example #3. Once it is created, how does one go about creating a record (a blank record, not deserialize)
  • Princey James
    Princey James about 8 years
    @RedBullet I have edited my answer to clarify your doubt. Hope it is clear now.
  • Shannon
    Shannon almost 7 years
    In #2, your root type is a UNION, right? So that would allow users to serialize any of those top-level types as the root object? That's a little unfortunate because if you only want to serialize Customer objects at the top-level, you can't really get it to work that way.
  • userMod2
    userMod2 about 6 years
    I'm trying Option 2. However, when I use it I get an error. Can you confirm it should look like {"name": "address", "type": "com.company.model.AddressRecord"}
  • Antoine Boisier-Michaud
    Antoine Boisier-Michaud over 5 years
    You should leave that message as a comment under Princey James's answer, since it is not a complete answer to the question.
  • Ming
    Ming over 5 years
    I don't have the 50 reputation needed to add a comment to that answer.
