Insert overwrite partition in Hive table - Values getting duplicated

Solution 1

It seems like you forgot the WHERE clause in your last INSERT OVERWRITE:

INSERT INTO TABLE Unm_Parti_Trail PARTITION (Department = 'A')
SELECT employeeid, firstname, designation,
       CASE WHEN employeeid = 19 THEN 50000 ELSE salary END AS salary
FROM Unm_Parti_Trail
WHERE department = 'A';
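
To see why the WHERE clause matters: with a static partition value, every row the SELECT returns is written into partition A, so an unfiltered SELECT also copies rows that belong to B. The checks below are a minimal sketch for confirming that, assuming employeeid is meant to be unique within a department (the thread does not state this). Note also that INSERT INTO appends to the partition while INSERT OVERWRITE replaces its contents, so the row counts after each form will differ.

```sql
-- Row count per partition; run before and after the insert and compare.
SELECT department, COUNT(*) AS row_count
FROM Unm_Parti_Trail
GROUP BY department;

-- Employee IDs stored more than once in the same partition.
-- Assumes employeeid should be unique per department (an assumption, not from the thread).
SELECT department, employeeid, COUNT(*) AS copies
FROM Unm_Parti_Trail
GROUP BY department, employeeid
HAVING COUNT(*) > 1;
```

If the second query returns rows after an insert, that insert wrote duplicates into the partition.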

Solution 2

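One possible solution.

For a dynamic-partition insert, the partitioning columns must be the last ones in the SELECT list, and the PARTITION clause names the column without a value (this needs dynamic partitioning enabled in nonstrict mode). With a static value such as PARTITION (department='A'), leave the partition column out of the SELECT list instead, otherwise Hive reports a column-count mismatch. E.g.:

INSERT INTO TABLE Unm_Parti_Trail PARTITION (department)
SELECT EmployeeID, FirstName, Designation, Salary, Department
FROM Unm_Dup_Parti_Trail
WHERE department = 'A';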
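
For reference, here is a hedged sketch of the two insert forms Hive distinguishes. It assumes Unm_Dup_Parti_Trail is the non-partitioned source table and that it has a department column; neither is confirmed in the thread. The two statements are alternatives, so running both would load the same rows twice.

```sql
-- Session settings normally needed when every partition column is dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Option 1, dynamic partition: no value in PARTITION (...),
-- and the partition column is the last column selected.
INSERT INTO TABLE Unm_Parti_Trail PARTITION (department)
SELECT employeeid, firstname, designation, salary, department
FROM Unm_Dup_Parti_Trail;

-- Option 2, static partition: the value is fixed in PARTITION (...),
-- so the partition column is left out of the SELECT list.
INSERT INTO TABLE Unm_Parti_Trail PARTITION (department = 'A')
SELECT employeeid, firstname, designation, salary
FROM Unm_Dup_Parti_Trail
WHERE department = 'A';
```

Mixing the two, a fixed partition value together with the partition column in the SELECT list, is what produces the "column number/types are different" error quoted in the comments.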

See this link for more info.

Author by

USB

Updated on June 05, 2022

Comments

  • USB
    USB almost 2 years

    I created a non-partitioned Hive table and used a SELECT query to insert its data into a partitioned Hive table.

    Referred site

    1. After following the link above, my partitioned table contains duplicate values. Below are the steps.

    This is my sample employee dataset: link1

    I tried the following queries: link2

    But after updating a value in the Hive table (setting the salary of Steven, EmployeeID 19, to 50000):

    INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A')
    SELECT employeeid, firstname, designation,
           CASE WHEN employeeid = 19 THEN 50000 ELSE salary END AS salary
    FROM Unm_Parti_Trail;

    the values are getting duplicated.

    7       Nirmal  Tech    12000   A
    7       Nirmal  Tech    12000   B
    

    Nirmal belongs only to Department A, but the row also shows up under Department B.

    Am I doing anything wrong?

    Please suggest.

  • sfotiadis
    sfotiadis over 9 years
    Yes exactly, I'm sorry I thought it wasn't confusing. I've updated the answer.
  • USB
    USB over 9 years
    But if we run the query you suggested, it fails with "FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different ''A'': Table insclause-0 has 4 columns, but query has 5 columns.". We don't need to include department in our SELECT statement. And the link you provided only explains the insert, not the update statement.
  • sfotiadis
    sfotiadis over 9 years
    Hmm, yes, you're right. It probably refers to an earlier version of Hive. Which Hive version are you using? Could you put the contents of your final table after each of your four inserts into another pastebin?
  • USB
    USB over 9 years
    Thanks, I got it working. Instead of INSERT OVERWRITE we need to do an INSERT INTO, and I missed a WHERE clause too.
  • sfotiadis
    sfotiadis over 9 years
    Nice. I noticed the missing WHERE clause too; see the other solution I proposed below. If you think it solves the problem, you could accept it.
  • USB
    USB over 9 years
    INSERT OVERWRITE deletes the entire Dept A partition and then inserts only the one record we are updating. INSERT INTO is fine. See this post: unmeshasreeveni.blogspot.in/2014/11/…
  • USB
    USB over 9 years
    You can combine all of these into one answer. There is no need to post them as separate answers. You can label them Edit 1, Edit 2, ...
  • vikrant rana
    vikrant rana about 5 years
    @sfotiadis, you forgot to select the partition column in your query.