Dynamically create Hive external table with Avro schema on Parquet Data


The DDL below works. The trick is to first create an Avro-backed table, letting Hive derive the column list from the Avro schema, and then create the Parquet external table LIKE it, so the columns are copied while the files are read as Parquet:

CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');
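A quick way to confirm the columns Hive inferred from the schema file (a minimal sketch, using the table name above):

-- Columns should match the fields declared in myAvroSchema.avsc,
-- even though none were spelled out in the DDL
DESCRIBE avro_test;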

CREATE EXTERNAL TABLE parquet_test
LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
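As a sanity check (a sketch, reusing the names above), verify that the copied columns and the Parquet SerDe were applied, then read the data:

-- The "SerDe Library" row should show ParquetHiveSerDe, not AvroSerDe
DESCRIBE FORMATTED parquet_test;

-- Reads now go through the Parquet reader rather than the AvroSerDe
SELECT * FROM parquet_test LIMIT 10;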
Author: tmouron · Updated on June 07, 2022

Comments

  • tmouron, almost 2 years ago

    I'm trying to create a Hive external table over Parquet data files dynamically, i.e. without listing column names and types in the Hive DDL. I have the Avro schema of the underlying Parquet files.

    My attempt uses the DDL below:

    CREATE EXTERNAL TABLE parquet_test
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS PARQUET
    LOCATION 'hdfs://myParquetFilesPath'
    TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');
    

    My Hive table is created successfully with the right schema, but when I try to read the data:

    SELECT * FROM parquet_test;
    

    I get the following error:

    java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable
    

    Is there a way to successfully create and read Parquet files without listing column names and types in the DDL?
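
    For context on the error above: the DDL registers the AvroSerDe on a table whose files Hive reads with the Parquet input format, so the SerDe is handed Parquet records instead of the AvroGenericRecordWritable it expects. A minimal way to check which SerDe a table ended up with (a sketch, using the table name from the question):

    -- If "SerDe Library" shows AvroSerDe while "InputFormat" is a Parquet
    -- input format, reads will fail exactly as above
    DESCRIBE FORMATTED parquet_test;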