Add partitions to an existing Hive table


Solution 1

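SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- table_name_partitioned is a placeholder for a copy of the table declared with PARTITIONED BY (`date` string);
-- an unpartitioned table cannot be the target of a PARTITION insert.
INSERT OVERWRITE TABLE table_name_partitioned PARTITION (`date`)
SELECT nom, prenom, ..., `date` FROM table_name;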

Note: in an insert statement for a partitioned table, make sure the partition columns come last in the select clause, in the same order as in the PARTITION clause.
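A minimal end-to-end sketch of this approach, assuming the data goes into a new partitioned table rather than back into the original one. The name `t_partitioned` is hypothetical, and only the `nom` and `prenom` columns shown in the question are listed; the columns elided there would have to be added to both the DDL and the SELECT. `date` is backquoted because it is a reserved word in recent Hive versions.

```sql
-- Hypothetical partitioned copy of T; add the columns elided in the question.
CREATE TABLE t_partitioned (
  nom    string,
  prenom string
)
PARTITIONED BY (`date` string);

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column goes last in the SELECT list.
INSERT OVERWRITE TABLE t_partitioned PARTITION (`date`)
SELECT nom, prenom, `date`
FROM T;
```

If the table holds many distinct dates, the limits `hive.exec.max.dynamic.partitions` and `hive.exec.max.dynamic.partitions.pernode` may need to be raised before the insert.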

Solution 2

You have to restructure the table. Here are the steps:

  1. Make sure no other process is writing to the table.
  2. Create a new external table that uses partitioning.
  3. Insert into the new table by selecting from the old table.
  4. Drop the new (external) table; only the table definition is dropped, the data stays in place.
  5. Drop the old table.
  6. Create the table under its original name, pointing it to the location used in step 2.
  7. Run the repair command (MSCK REPAIR TABLE) to register the partitions in the metastore.
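Steps 2 to 7 could look roughly like this. It is a sketch under assumptions: `t_new` and the path `/data/t_partitioned` are placeholders, the recreated table is made external as well, and only the two columns visible in the question are listed (the elided ones must be added).

```sql
-- Step 2: new external, partitioned table (name and path are placeholders).
CREATE EXTERNAL TABLE t_new (
  nom    string,
  prenom string
)
PARTITIONED BY (`date` string)
LOCATION '/data/t_partitioned';

-- Step 3: a single insert with dynamic partitioning.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE t_new PARTITION (`date`)
SELECT nom, prenom, `date` FROM T;

-- Step 4: dropping an external table removes only its metadata.
DROP TABLE t_new;

-- Step 5: drop the old table (for a managed table this also deletes its data).
DROP TABLE T;

-- Step 6: recreate T over the files written in step 3.
CREATE EXTERNAL TABLE T (
  nom    string,
  prenom string
)
PARTITIONED BY (`date` string)
LOCATION '/data/t_partitioned';

-- Step 7: register the existing partition directories in the metastore.
MSCK REPAIR TABLE T;
```

It is worth checking row counts in `t_new` before step 5, since dropping a managed `T` cannot be undone.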

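Alternative to steps 4, 5, 6 and 7

  1. Create the table under its original name: run SHOW CREATE TABLE on the new table and replace the table name with the original one (the old table has to be dropped or renamed first, since the name is still in use).
  2. Run the LOAD DATA INPATH command to move the files under the partitions of the external table into the partitions of the table just created.
  3. Drop the external table.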
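A sketch of this alternative, with several assumptions: `t_new` is the external table from step 2, `t_old` and the path `/data/t_partitioned` are placeholders, the old table is renamed rather than dropped, and the partition value `2020-01-01` is only an example (the date format is not given in the question).

```sql
-- Print the DDL of the new table, then reuse it with the original name.
SHOW CREATE TABLE t_new;

-- Free the original name first (t_old is a placeholder).
ALTER TABLE T RENAME TO t_old;

-- Same definition as t_new, without EXTERNAL and LOCATION.
CREATE TABLE T (
  nom    string,
  prenom string
)
PARTITIONED BY (`date` string);

-- Move the files of one partition; repeat for each partition directory.
LOAD DATA INPATH '/data/t_partitioned/date=2020-01-01'
INTO TABLE T PARTITION (`date` = '2020-01-01');

-- Drop the external table; any files not yet moved stay on disk.
DROP TABLE t_new;
```

Because LOAD DATA INPATH moves files instead of rewriting them, no second insert job is needed, but it has to be issued once per partition.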

Both approaches restructure the table with a single insert (one MapReduce job).

Author by

Shakile

Updated on January 09, 2020

Comments

  • Shakile
    Shakile over 4 years

    I'm processing a big Hive table (more than 500 billion records). The processing is too slow and I would like to make it faster. I think that adding partitions could make the process more efficient.

    Can anybody tell me how I can do that? Note that my table already exists.

    My table :

    create table T(
    nom string,
    prenom string,
    ...
    date string)
    

    Partitioning on date field.

    Thx