Case sensitive column names in Hive

This is an old question, but the partition column name is effectively case sensitive, because it becomes part of a directory name on the Unix-style filesystem where the data is stored, and paths there are case sensitive.

The path "/columnname=value/" is always different from the path "/columnName=value/" on Unix.

So relying on case-insensitive column names in Hive should be considered bad practice.
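
To make the path argument concrete, here is a minimal Python sketch of the comparison. It is only an illustration: the table root /warehouse/tablename and the helper names partition_dir and normalized_partition_dir are hypothetical, and the code models the "column=value" directory layout rather than calling Hive itself.

```python
from pathlib import PurePosixPath


def partition_dir(table_root: str, column: str, value: str) -> PurePosixPath:
    """Build a Hive-style partition directory: <table_root>/<column>=<value>."""
    return PurePosixPath(table_root) / f"{column}={value}"


def normalized_partition_dir(table_root: str, column: str, value: str) -> PurePosixPath:
    """Same as partition_dir, but lower-cases the column name first.

    Only the column name is normalized; the partition value is left as-is.
    """
    return partition_dir(table_root, column.lower(), value)


# Hypothetical table root, used only for illustration.
root = "/warehouse/tablename"

lower = partition_dir(root, "columnname", "value")
mixed = partition_dir(root, "columnName", "value")

# POSIX path comparison is case sensitive, so these are two different directories.
print(lower == mixed)

# Lower-casing the column name before building the path makes them agree.
print(normalized_partition_dir(root, "columnName", "value") == lower)
```

Keeping column names lower case everywhere, in the DDL and in the directory layout, avoids depending on how any particular tool resolves mixed-case names.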

Author: Raymond26

Updated on June 09, 2022

Comments

  • Raymond26, almost 2 years

    I am trying to create an external Hive table with partitions. Some of my column names contain upper-case letters. This caused a problem when creating tables, because the values of columns with upper-case letters in their names were returned as NULL. I then modified the ParquetSerDe to handle this through SERDEPROPERTIES, and that worked for external tables without partitions. Now I am trying to create an external table with partitions, and whenever I try to access an upper-case column (e.g. FieldName) I get an error. When I run select FieldName from tablename; I get this:

        FAILED: RuntimeException java.lang.RuntimeException: cannot find field
        FieldName from
        [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@4f45884b,
        org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@8f11f27,
        org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@77e8eb0e,
        org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@1dae4cd,
        org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@623e336d
        ]
    

    Are there any suggestions you can think of? I cannot change the schema of the data source.

    This is the command I use to create the table:

        CREATE EXTERNAL TABLE tablename (fieldname string)
        PARTITIONED BY (partition_name string)
        ROW FORMAT SERDE 'path.ModifiedParquetSerDeLatest'
        WITH SERDEPROPERTIES ("casesensitive"="FieldName")
        STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
        OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    

    And then add partition:

        ALTER TABLE tablename ADD PARTITION (partition_name='partitionvalue')
        LOCATION '/path/to/data'