Add partitions to an existing Hive table
Solution 1
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
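-- new_table is a placeholder name for a table declared with PARTITIONED BY (`date` STRING); an existing unpartitioned table cannot be the target of a PARTITION insert
INSERT OVERWRITE TABLE new_table PARTITION (`date`) SELECT nom, prenom, ..., `date` FROM table_name;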
Note: In an INSERT statement for a partitioned table, make sure the partition columns come last in the SELECT clause.
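As a rough sketch of Solution 1 end to end, the statements below create a partitioned copy of the table and fill it with a dynamic-partition insert. The target name T_part is hypothetical, only the two columns named in the question are listed (the elided ones would have to be added in the same order), and `date` is back-quoted on the assumption that the Hive version in use treats DATE as a reserved keyword.

```sql
-- Hypothetical partitioned copy of T. The partition column is declared
-- in PARTITIONED BY, not in the regular column list.
CREATE TABLE T_part (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING);

-- Allow Hive to derive the partition values from the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column must be the last column in the SELECT list.
INSERT OVERWRITE TABLE T_part PARTITION (`date`)
SELECT nom, prenom, `date`
FROM T;
```

With many distinct dates, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised before the insert succeeds.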
Solution 2
You have to restructure the table. Here are the steps:
1. Make sure no other process is writing to the table.
2. Create a new external table that uses partitioning.
3. Insert into the new table by selecting from the old table.
4. Drop the new (external) table. Only the table definition is dropped; the data stays in place.
5. Drop the old table.
6. Create the table with the original name, pointing to the location used in step 2.
7. Run the repair command (MSCK REPAIR TABLE) to register the partitions in the metastore.
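The steps above could be sketched in HiveQL roughly as follows. The table name T_new and the HDFS location are made up for illustration, and only the two columns named in the question are shown; the elided columns would need to be added to both CREATE statements and to the SELECT.

```sql
-- Create a new external, partitioned table (hypothetical name and location).
CREATE EXTERNAL TABLE T_new (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/warehouse/t_partitioned';

-- Copy the data; this is the single expensive job.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE T_new PARTITION (`date`)
SELECT nom, prenom, `date`
FROM T;

-- Dropping an external table removes only its metadata; the files stay.
DROP TABLE T_new;

-- Drop the old table. If T is a managed table, this also deletes its
-- data, so check the copy before running it.
DROP TABLE T;

-- Recreate the table under the original name on top of the same files.
CREATE EXTERNAL TABLE T (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/warehouse/t_partitioned';

-- Register the partition directories found under the location.
MSCK REPAIR TABLE T;
```

Until the final MSCK REPAIR TABLE runs, the recreated table knows about no partitions and queries against it return no rows.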
Alternative to steps 4, 5, 6 and 7:
- Create the table with the original name by running SHOW CREATE TABLE on the new table and replacing the table name with the original one.
- Run the LOAD DATA INPATH command to move the files under the partitions to the new partitions of the new table.
- Drop the external table created.
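One possible reading of this alternative is sketched below. It assumes the external table from the earlier steps is called T_new and lives at a made-up location, that the old table has already been dropped or renamed so the original name is free, and that the LOCATION clause is removed from the copied DDL so the recreated table gets its own directory. The partition value is only an example, and one LOAD DATA statement is needed per partition.

```sql
-- Print the DDL of the new table; copy it, change the table name to the
-- original name (T), adjust it as described above, and run it.
SHOW CREATE TABLE T_new;

-- Move the files of one partition directory of T_new into the matching
-- partition of T (hypothetical path and partition value).
LOAD DATA INPATH '/data/warehouse/t_partitioned/date=2020-01-09'
INTO TABLE T PARTITION (`date` = '2020-01-09');

-- Once every partition has been moved, drop the external table.
DROP TABLE T_new;
```

LOAD DATA INPATH moves files within HDFS rather than copying them, which is why no second MapReduce job is needed.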
Both approaches achieve the restructuring with a single insert/MapReduce job.
Author: Shakile
Updated on January 09, 2020
Shakile, over 4 years ago:
I'm processing a big Hive table (more than 500 billion records). The processing is too slow and I would like to make it faster. I think that adding partitions could make the process more efficient.
Can anybody tell me how I can do that? Note that my table already exists.
My table :
create table T( nom string, prenom string, ... date string)
I want to partition it on the date field.
Thx