Apache hive MSCK REPAIR TABLE new partition not added
For the MSCK
to work, naming convention /partition_name=partition_value/
should be used. For example in the root directory of table;
# hadoop fs -ls /user/hive/root_of_table/*
/user/hive/root_of_table/day=20200101/data1.parq
/user/hive/root_of_table/day=20200101/data2.parq
/user/hive/root_of_table/day=20200102/data3.parq
/user/hive/root_of_table/day=20200102/data4.parq
When you run msck repair table <tablename>
partitions of day
; 20200101
and 20200102
will be added automatically.
Green
Updated on July 09, 2022Comments
-
Green almost 2 years
I am new for Apache Hive. While working on external table partition, if I add new partition directly to HDFS, the new partition is not added after running MSCK REPAIR table. Below are the codes I tried,
-- creating external table
hive> create external table factory(name string, empid int, age int) partitioned by(region string) > row format delimited fields terminated by ',';
--Detailed Table Information
Location: hdfs://localhost.localdomain:8020/user/hive/warehouse/factory Table Type: EXTERNAL_TABLE Table Parameters: EXTERNAL TRUE transient_lastDdlTime 1438579844
-- creating directory in HDFS to load data for table factory
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1' [cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
-- Table data
cat factory1.txt emp1,500,40 emp2,501,45 emp3,502,50 cat factory2.txt EMP10,200,25 EMP11,201,27 EMP12,202,30
-- copying from local to HDFS
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1' [cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
-- Altering table to update in the metastore
hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2'; hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1'; hive> select * from factory; OK emp1 500 40 northregion emp2 501 45 northregion emp3 502 50 northregion EMP10 200 25 southregion EMP11 201 27 southregion EMP12 202 30 southregion
Now I created new file factory3.txt to add as new partition for the table factory
cat factory3.txt user1,100,25 user2,101,27 user3,102,30
-- creating the path and copying table data
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2' [cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
now I executed the below query to update the metastore for the new partition added
MSCK REPAIR TABLE factory;
Now the table is not giving the new partition content of factory3 file. Can I know where I am doing mistake while adding partition for table factory?
whereas, if I run the alter command then it is showing the new partition data.
hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';
Can I know why the MSCK REPAIR TABLE command is not working?
-
user1735921 over 2 yearsdidn't understand, what if there are 1000s of values ?
-
HakkiBuyukcengiz over 2 yearsAdded example to answer.
-
Purushothaman Srikanth over 2 yearsWhat if the partition directories are empty?