Trouble when writing the data to Delta Lake in Azure databricks (Incompatible format detected)


Solution 1

This error message is telling you that there is already data at the destination path (in this case dbfs:/user/[email protected]/delta/customer-data/), and that the existing data is not in the Delta format (i.e. there is no transaction log). You can either choose a new path (which, based on the comments above, it seems you did) or delete that directory and try again.
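The "no transaction log" check boils down to whether a `_delta_log` directory exists under the destination path. A minimal local sketch of that idea (plain Python, with a temporary directory standing in for the DBFS path; the helper name is mine, not a Databricks API):

```python
import os
import shutil
import tempfile

def looks_like_delta_table(path):
    """A Delta table keeps its transaction log in a _delta_log subdirectory;
    its absence at a non-empty destination is what triggers the
    'Incompatible format detected' error on write."""
    return os.path.isdir(os.path.join(path, "_delta_log"))

# Demo with a local temporary directory standing in for the DBFS path.
dest = tempfile.mkdtemp()
assert not looks_like_delta_table(dest)        # plain directory: not a Delta table
os.makedirs(os.path.join(dest, "_delta_log"))  # simulate a transaction log
assert looks_like_delta_table(dest)
shutil.rmtree(dest)                            # "delete that directory and try again"
```

On Databricks itself you would inspect and remove the real path with `dbutils.fs.ls(...)` and `dbutils.fs.rm(..., recurse=True)` rather than local filesystem calls.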

Solution 2

I found this Question with this search: "You are trying to write to *** using Databricks Delta, but there is no transaction log present."

In case someone else searches for the same thing: for me the solution was to explicitly specify

.write.format("parquet")

because

.format("delta")

is the default in Databricks Runtime 8.0 and above, and I need "parquet" for legacy reasons.
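Put together, the explicit override would look something like this (a sketch for a Databricks notebook, not run here: `df`, the `spark` session, and the output path are assumed to exist in your environment):

```python
# Sketch for a Databricks notebook; `df` and the destination path are assumed.
# On Databricks Runtime 8.0+, .save() with no .format(...) writes Delta,
# because the runtime's default source format was changed to "delta".
(df.write
   .format("parquet")       # explicit override of the Delta default
   .mode("overwrite")
   .save("dbfs:/mnt/legacy/customer-data/"))  # hypothetical path
```

You can check what the runtime will use when no format is given via `spark.conf.get("spark.sql.sources.default")`.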

Author: Themis

Updated on June 13, 2022

Comments

  • Themis, almost 2 years ago:

    I need to read a dataset into a DataFrame, then write the data to Delta Lake. But I get the following exception:

    AnalysisException: 'Incompatible format detected.
    
    You are trying to write to `dbfs:/user/[email protected]/delta/customer-data/` using Databricks Delta, but there is no
    transaction log present. Check the upstream job to make sure that it is writing
    using format("delta") and that you are trying to write to the table base path.
    
    To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
    To learn more about Delta, see https://docs.azuredatabricks.net/delta/index.html
    ;
    

    Here is the code preceding the exception:

    from pyspark.sql.types import StructType, StructField, DoubleType, IntegerType, StringType
    
    inputSchema = StructType([
      StructField("InvoiceNo", IntegerType(), True),
      StructField("StockCode", StringType(), True),
      StructField("Description", StringType(), True),
      StructField("Quantity", IntegerType(), True),
      StructField("InvoiceDate", StringType(), True),
      StructField("UnitPrice", DoubleType(), True),
      StructField("CustomerID", IntegerType(), True),
      StructField("Country", StringType(), True)
    ])
    
    rawDataDF = (spark.read
      .option("header", "true")
      .schema(inputSchema)
      .csv(inputPath)
    )
    
    # write to Delta Lake
    rawDataDF.write.mode("overwrite").format("delta").partitionBy("Country").save(DataPath) 
    
  • Themis, almost 5 years ago:
    Thanks for these clarifications, @Michael :)