How to generate fields of type String instead of CharSequence using Avro?


Solution 1

If you want all your string fields to be instances of java.lang.String, you only have to configure the compiler:

java -jar /path/to/avro-tools-1.7.7.jar compile -string schema <schema-file> <output-dir>

Or, if you are using the Maven plugin:

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.7</version>
  <configuration>
    <stringType>String</stringType>
  </configuration>
  [...]
</plugin>        

If you want one specific field to be of type java.lang.String then... you can't. It is not supported by the compiler. You can use "java-class" with the reflect API, but the compiler ignores it.

If you want to learn more, you can set a breakpoint in SpecificCompiler at line 372 (Avro 1.7.7). You can see that before the call to addStringType() the schema has the required information in its props field. If you pass this schema to SpecificCompiler.javaType(), it will do what you want. But addStringType() then replaces your schema with a static one. I will most likely ask about this on the mailing list, since I don't see the point of this behavior.

Solution 2

You can set it at the field level: change the field's type to an object that includes "type" : "string" and "avro.java.string" : "String".

For example:

{
    "type": "record",
    "name": "test",
    "fields": [
        {
            "name": "name",
            "type": {
                "type": "string",
                "avro.java.string": "String"
            }
        }
    ]
}
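
Applied to the two fields from the question, Solution 2 might look like the sketch below. The record name `test` is simply carried over from the example above, since the question does not show the enclosing record. The point to notice is that `"avro.java.string"` sits inside the type object, whereas both attempts in the question put the property next to `"type"` at the field level.

```json
{
    "type": "record",
    "name": "test",
    "fields": [
        {
            "name": "startTime",
            "type": {
                "type": "string",
                "avro.java.string": "String"
            }
        },
        {
            "name": "endTime",
            "type": {
                "type": "string",
                "avro.java.string": "String"
            }
        }
    ]
}
```

Fields left as a plain `"type": "string"` keep whatever string type the compiler is configured with.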
Author by

Shekhar


Updated on June 03, 2022

Comments

  • Shekhar almost 2 years

    I wrote an Avro schema in which some of the fields need to be of type String, but Avro generated those fields as CharSequence.

    I am not able to find any way to tell Avro to make those fields of type String.

    I tried to use

    "fields": [
        {
            "name":"startTime",
            "type":"string",
            "avro.java.stringImpl":"String"
        },
        {
            "name":"endTime",
            "type":"string",
            "avro.java.string":"String"
        }
    ]

    but Avro still generates both fields as CharSequence.

    Is there any other way to make those fields of type String?