Drop table or truncate table in Cassandra, which is better
Solution 1
I think for almost all cases Truncate is a safer operation than a drop recreate. There have been several issues with dropping/recreating in the past with ghost data, schema disagreement, ect... Although there have been a number of fixes to try to make drop/recreate more stable, if its an operation you are performing every day Truncate should be much cheaper and more stable.
Solution 2
Drop table drops the table and all data. Truncate clears all data in the table, and by default creates a snapshot of the data (but not the schema). Efficiency wise, they're close - though truncate will create the snapshot. You can disable this by setting auto_snapshot to false in cassandra yaml config, but it is server wide. If it's not too much trouble, I'd drop and recreate table - but I've seen issues if you don't wait a while after drop before recreating.
Solution 3
NOTE: By default, snapshots are created when tables are dropped or truncated. This will need to be cleaned out manually to reclaim disk space.
Tested manually as well.
Truncate will keep the schema though, drop will not.
Solution 4
Beware!
From datastax documentation: https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlTruncate.html
Note: TRUNCATE sends a JMX command to all nodes, telling them to delete SSTables that hold the data from the specified table. If any of these nodes is down or doesn't respond, the command fails and outputs a message like the following: truncate cycling.user_activity; Unable to complete request: one or more nodes were unavailable.
Unfortunately, there is nothing on the documentation saying if DROP
behaves differently
Ankur
Updated on June 05, 2022Comments
-
Ankur almost 2 years
We have a use case where we need to re-create a table every day with current data in Cassandra. For this should we use drop table or truncate table, which would be efficient? We do not want the data to be backed up etc?
Thanks Ankur
-
Ankur over 9 yearsThanks a lot RussS! One question I have is if we use truncate will it create tombstones or remove the data immediately?
-
Ankur over 9 yearsThanks a lot Ashic! As you mentioned we also face this issue that even after dropping the table when we try to re-create it will not let us do that for significant time ~5 mins or so.
-
ashic over 9 yearsIt's immediate. No tombstones.
-
ashic over 9 yearsI recommended dropping as it being a one a day op, I imagined you might be able to afford a long delay between drop and create. If not, do a truncate, but remember to clear the snapshot, unless you disable auto snapshots.
-
Ankur over 9 yearsThank you Ashic. I tried using truncate table but even after the cql completed successfully, I was able to see records in the table. Then I tried drop table and as you mentioned even that was not effective even after an hour. I am willing to use truncate but it doesn't seeem to be immediate.
-
ashic over 9 yearsHow much data are you dropping? Truncate should be immediate. Are there records, say five minutes after truncating? When u say drop wasn't effective, what do you mean?
-
Ankur over 9 yearsWe have close to 4 billion rows but the last run was not successful so I am not really sure how many rows were in the file. I had waited for 15-20 mins before running the drop table. Also with drop table, I meant it was not immediate too. I waited for 30 mins after running drop table and still was not able to create the table, so I ran nodetool repair.
-
ashic over 9 yearsWhat happens if you drop the table, run a repair on the cluster, and then create the table?
-
Ankur over 9 yearsThat works but when I ran repair on my cluster for some reason some of my nodes went down. I had to re-start them.
-
ashic over 9 yearsIs your OS forcing the process to die? It's likely the OOM killer.