How to rename a column in Databricks

14,937

Solution 1

You can't rename or change a column datatype in Databricks, only add new columns, reorder them or add column comments. To do this you must rewrite the table using the overwriteSchema option.

Take this example below from this documentation:

spark.read.table(...)
  .withColumnRenamed("date", "date_created")
  .write
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .table(...)

Solution 2

To be able to rename the column, overwriteSchema with saveAsTable should be used:

spark.read.table(Table_Name)
  .withColumnRenamed("currentName", "newName")
  .write
  .format("delta")
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .saveAsTable("Table_Name")
Share:
14,937
David Maddox
Author by

David Maddox

I am a Data Engineer and MBA who strives to figure out the business problem BEFORE putting in the hard work of making data systems do something cool. I love optimizing SQL and getting people answers before they get twitchy or forget what they were asking for in the first place. Beyond coding, I'm a family man first and foremost. I belong to the Church of Jesus Christ of Latter-Day Saints, and it makes me so happy.

Updated on July 29, 2022

Comments

  • David Maddox
    David Maddox over 1 year

    How do you rename a column in Databricks?

    The following does not work:

    ALTER TABLE mySchema.myTable change COLUMN old_name new_name int
    

    It returns the error:

    ALTER TABLE CHANGE COLUMN is not supported for changing column 'old_name' with type 'IntegerType >(nullable = true)' to 'new_name' with type 'IntegerType (nullable = true)';

    If it makes a difference, this table is using Delta Lake, and it is NOT partitioned or z-ordered by this "old_name" column.