Spark2 can't write dataframe to parquet Hive table: the format of the existing table is `HiveFileFormat`; it doesn't match the specified format `ParquetFileFormat`


After getting this error, I tried using `.format("hive")` with `saveAsTable`, and it worked.

I would also not recommend using `insertInto`, as the author suggests, because it is not type-safe (as much as that term applies to the SQL API) and error-prone: it ignores column names and resolves columns by position.
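A minimal sketch of the working write, assuming the question's `blocs` dataset and `saveMode` variable, a hypothetical target table name `db.tbl`, and a `SparkSession` built with Hive support:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("write-to-hive")
  .enableHiveSupport() // needed so saveAsTable targets the Hive metastore
  .getOrCreate()
import spark.implicits._

blocs
  .toDF()
  .repartition($"col1", $"col2", $"col3", $"col4")
  .write
  .format("hive") // "hive" instead of "parquet": matches the existing table's HiveFileFormat
  .mode(saveMode)
  .partitionBy("col1", "col2", "col3", "col4")
  .saveAsTable("db.tbl") // hypothetical table name
```

The point is that `saveAsTable` on an existing table checks the declared format: a table created through Hive DDL is registered as `HiveFileFormat` (even if its storage is parquet), so writing with `.format("parquet")` trips the mismatch check, while `.format("hive")` defers to the table's own serde.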

Author: youssef grati

Updated on June 17, 2022

Comments

  • youssef grati, almost 2 years ago

    I'm trying to save a dataframe to a Hive table.

    In Spark 1.6 it works, but after migrating to 2.2.0 it no longer does.

    Here's the code:

    blocs
          .toDF()
          .repartition($"col1", $"col2", $"col3", $"col4")
          .write
          .format("parquet")
          .mode(saveMode)
          .partitionBy("col1", "col2", "col3", "col4")
          .saveAsTable("db.tbl")
    

    org.apache.spark.sql.AnalysisException: The format of the existing table project_bsc_dhr.bloc_views is HiveFileFormat. It doesn't match the specified format ParquetFileFormat.;

  • Ak777, over 3 years ago
    How do I insert only specific columns from the dataframe into the Hive table? Say my table has 50 columns, but my DF has only the 20 that I want to update/insert; consider those 20 required and the others optional. With the above, I get a position/column mismatch error.