AVRO Validation


Solution 1

Not that I'm aware of. I wrote this little Python script that will tell you whether a JSON file matches a schema, but it won't tell you where the error is if there is one.

It depends on the Python avro library.

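#!/usr/bin/env python3
# Report which JSON files validate against an Avro schema.
# Requires the Python avro package; it does not report where a mismatch is.

import json
import sys

from avro.io import validate
from avro.schema import parse


def main(argv):
    valid = set()
    invalid_avro = set()
    invalid_json = set()

    if len(argv) < 3:
        print("Give me an Avro schema file and a whitespace-separated list "
              "of JSON files to validate against it.")
        return 1

    # Parse the schema once; every input file is checked against it.
    with open(argv[1]) as schema_file:
        schema = parse(schema_file.read())

    for arg in argv[2:]:
        try:
            with open(arg) as json_file:
                datum = json.load(json_file)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            invalid_json.add(arg)
            continue
        # validate() only answers yes/no; it does not say which field failed.
        if validate(schema, datum):
            valid.add(arg)
        else:
            invalid_avro.add(arg)

    print(' Valid files:\n\t' + '\n\t'.join(sorted(valid)))
    print('Invalid avro:\n\t' + '\n\t'.join(sorted(invalid_avro)))
    print('Invalid json:\n\t' + '\n\t'.join(sorted(invalid_json)))
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))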

Solution 2

I am not sure your question makes sense: since an Avro schema is MANDATORY when processing Avro data, the data is effectively always validated by default. Put another way, the act of parsing Avro data necessarily validates it.

Unfortunately, because Avro data carries very little metadata, any incompatible change essentially amounts to data corruption, and you may well just get garbage. This is because there are no field ids or separators: all data is interpreted based on what the schema says must follow. This lack of redundancy makes the data very compact, but it also means that even the smallest corruption may make the whole data stream useless.
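To make the "no field ids or separators" point concrete, here is a minimal pure-Python sketch of Avro's binary encoding for just two primitive types: longs (zig-zag encoded, then written as base-128 varints) and strings (a long byte length followed by UTF-8 bytes), as described in the Avro specification. The list-of-(name, type)-pairs "schema", the helper names, and the width/height/label record are all hypothetical, made up for illustration; this is not the avro library's API.

```python
"""Sketch of Avro's binary layout for two primitive types (long, string).

The record "schema" here is a hypothetical list of (field name, type) pairs,
not the avro library's API. It shows that encoded bytes hold only values,
in schema order, with no field names, ids, or separators.
"""
from typing import Any, Dict, List, Tuple

Field = Tuple[str, str]  # (name, "long" | "string")


def encode_long(n: int) -> bytes:
    """Zig-zag encode n, then write 7 bits per byte, lowest group first."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    z = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def encode_record(fields: List[Field], record: Dict[str, Any]) -> bytes:
    out = bytearray()
    for name, kind in fields:
        if kind == "long":
            out += encode_long(record[name])
        else:  # string: byte length as a long, then UTF-8 bytes
            data = record[name].encode("utf-8")
            out += encode_long(len(data)) + data
    return bytes(out)


def decode_record(fields: List[Field], buf: bytes) -> Dict[str, Any]:
    # The reader trusts its schema completely: nothing in buf says which field is which.
    record: Dict[str, Any] = {}
    pos = 0
    for name, kind in fields:
        value, pos = decode_long(buf, pos)  # a long value, or a string's length
        if kind == "string":
            if value < 0 or pos + value > len(buf):
                raise ValueError(f"bad string length {value} for field {name!r}")
            value, pos = buf[pos:pos + value].decode("utf-8"), pos + value
        record[name] = value
    return record


WRITER: List[Field] = [("width", "long"), ("height", "long"), ("label", "string")]
READER: List[Field] = [("height", "long"), ("width", "long"), ("label", "string")]

if __name__ == "__main__":
    data = encode_record(WRITER, {"width": 3, "height": -2, "label": "hi"})
    print(data)                         # b'\x06\x03\x04hi'
    print(decode_record(READER, data))  # {'height': 3, 'width': -2, 'label': 'hi'}
```

The encoded record is five bytes with no names in it, so a reader whose schema has the first two fields swapped silently gets width and height reversed. Setting the high bit of the first byte (b"\x86" instead of b"\x06") makes the first varint swallow the next byte, which shifts every following field; in this sketch the string length then runs past the end of the buffer and decoding fails.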

Solution 3

I wrote an Avro validator in JavaScript that you can run against JSON. It's not yet part of an Avro release, but it should be committed soon. You can find the patch at https://issues.apache.org/jira/browse/AVRO-485.

Author: airboss

Updated on August 01, 2022

Comments

  • airboss, almost 2 years
    I am fairly new to Avro, so please excuse me if I am missing anything obvious. Is there an Avro validator or command-line utility that validates input against an Avro schema, or, better still, one that points to where the error is in the JSON input?

  • airboss, about 12 years
    I get what you are saying, but the Avro exceptions raised when the input doesn't match the schema are vague and don't point to the actual issue in the input. I'm looking for something more user-friendly.
  • airboss, about 12 years
    Thanks for the script, I will try it out. But the fact that nothing points to the actual issue is bugging me.
  • kojiro, about 12 years
    @Anup You'd have to break down the schema and the input into chunks and validate those chunks. Breaking up a RecordSchema is easy enough, but if the JSON input is invalid, it's hard to know how to break it up. If you have any suggestions, let me know.
  • StaxMan, about 12 years
    Ah, yeah, understood. You could probably suggest improvements to the default parser to the Avro team, though. I agree that the errors tend not to be very useful... :-/