Insert overwrite partition in Hive table - Values getting duplicated
Solution 1
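It seems like you forgot the WHERE clause in your last INSERT OVERWRITE. Without it, the SELECT reads every partition of Unm_Parti_Trail, so rows from other departments are also written into partition A: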
INSERT INTO TABLE Unm_Parti_Trail PARTITION (Department = 'A')
SELECT employeeid, firstname, designation,
       CASE WHEN employeeid = 19 THEN 50000 ELSE salary END AS salary
FROM Unm_Parti_Trail
WHERE department = 'A';
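Keep in mind that INSERT INTO appends to a partition while INSERT OVERWRITE replaces its contents. When the SELECT returns every row of department A (one of them modified), an OVERWRITE of that single partition may be closer to an "update". The following is only a sketch, assuming the table and column names from the question and that your Hive version allows overwriting a partition that the same query reads from. The second query is a hypothetical check for duplicated rows.

```sql
-- Sketch: rewrite only partition A, changing the salary of employee 19.
-- The WHERE clause keeps rows from other departments out of the result.
INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (department = 'A')
SELECT employeeid, firstname, designation,
       CASE WHEN employeeid = 19 THEN 50000 ELSE salary END AS salary
FROM Unm_Parti_Trail
WHERE department = 'A';

-- Check (assumes employeeid should be unique within a department):
-- lists any employee that appears more than once in a partition.
SELECT department, employeeid, COUNT(*) AS copies
FROM Unm_Parti_Trail
GROUP BY department, employeeid
HAVING COUNT(*) > 1;
```

If the check returns no rows, the partition holds a single copy of each employee.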
Solution 2
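One possible solution.
For a dynamic partition insert, the partitioning columns must be the last ones in the SELECT list, and the PARTITION clause names the column without a value. With a static spec such as PARTITION (department = 'A'), the partition column must instead be left out of the SELECT list; otherwise Hive rejects the statement because the column counts differ. E.g.:
INSERT INTO TABLE Unm_Parti_Trail PARTITION (department)
SELECT EmployeeID, FirstName, Designation, Salary, Department
FROM Unm_Dup_Parti_Trail
WHERE department = 'A';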
See this link for more info.
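Selecting the partition column last only applies when Hive derives the partition from the data (dynamic partitioning). Depending on the Hive version and configuration, this may have to be enabled for the session first. The following is a minimal sketch, assuming the unpartitioned source table Unm_Dup_Parti_Trail from the query above and that loading all departments in one statement is acceptable:

```sql
-- Assumption: dynamic partitioning is not already enabled in this session.
SET hive.exec.dynamic.partition=true;
-- nonstrict mode allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (Department) comes last in the SELECT list;
-- Hive routes each row to the partition matching its value.
INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (department)
SELECT EmployeeID, FirstName, Designation, Salary, Department
FROM Unm_Dup_Parti_Trail;
```

With this form, only the partitions that receive rows from the SELECT are rewritten.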
USB
Updated on June 05, 2022
Comments
-
USB almost 2 years
I created a non-partitioned Hive table and, using a SELECT query, inserted its data into a partitioned Hive table.
After following the link above, my partitioned table contains duplicate values. Below are the steps.
This is my sample employee dataset: link1
I tried the following queries: link2
But after updating a value in the Hive table (setting the salary of Steven, EmployeeID 19, to 50000):
INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A')
SELECT employeeid, firstname, designation,
       CASE WHEN employeeid = 19 THEN 50000 ELSE salary END AS salary
FROM Unm_Parti_Trail;
the values are getting duplicated.
7 Nirmal Tech 12000 A
7 Nirmal Tech 12000 B
Nirmal is placed in Department A only but it is duplicated to department B.
Am I doing anything wrong?
Please suggest.
-
sfotiadis over 9 years
Yes exactly, I'm sorry, I thought it wasn't confusing. I've updated the answer.
-
USB over 9 years
But if we run the query as you suggested, it shows "FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different ''A'': Table insclause-0 has 4 columns, but query has 5 columns.". We don't need to give department along with our SELECT statement. And the link you provided only explains the insert, not the update statement.
-
sfotiadis over 9 years
Hmm yes, you're right. It probably refers to an earlier version of Hive. Which Hive version are you using? Could you put the contents of your final table after each one of your four inserts into another pastebin?
-
USB over 9 years
Thanks, got it working. Instead of INSERT OVERWRITE we need to do an INSERT INTO, and I missed a WHERE clause too.
-
sfotiadis over 9 years
Nice. I've noticed the WHERE clause missing too, see the other solution I had proposed below. If you think it solves the problem you could accept it.
-
USB over 9 years
INSERT OVERWRITE deletes the entire Dept A and then inserts only the one record we are updating. INSERT INTO is fine. See this post: unmeshasreeveni.blogspot.in/2014/11/…
-
USB over 9 years
You can add all these into one answer. No need to add them as different answers. You can label them as Edit 1, Edit 2...
-
vikrant rana about 5 years
@sfotiadis, you have missed selecting the partition column in your query.