Apache hive MSCK REPAIR TABLE new partition not added

37,173

For the MSCK to work, naming convention /partition_name=partition_value/ should be used. For example in the root directory of table;

# hadoop fs -ls /user/hive/root_of_table/*
 /user/hive/root_of_table/day=20200101/data1.parq
 /user/hive/root_of_table/day=20200101/data2.parq
 /user/hive/root_of_table/day=20200102/data3.parq
 /user/hive/root_of_table/day=20200102/data4.parq

When you run msck repair table <tablename> partitions of day; 20200101 and 20200102 will be added automatically.

Share:
37,173
Green
Author by

Green

Updated on July 09, 2022

Comments

  • Green
    Green almost 2 years

    I am new for Apache Hive. While working on external table partition, if I add new partition directly to HDFS, the new partition is not added after running MSCK REPAIR table. Below are the codes I tried,

    -- creating external table

    hive> create external table factory(name string, empid int, age int) partitioned by(region string)  
        > row format delimited fields terminated by ','; 
    

    --Detailed Table Information

    Location:  hdfs://localhost.localdomain:8020/user/hive/warehouse/factory     
    Table Type:             EXTERNAL_TABLE           
    Table Parameters:        
        EXTERNAL                TRUE                
        transient_lastDdlTime   1438579844  
    

    -- creating directory in HDFS to load data for table factory

    [cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
    [cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
    

    -- Table data

    cat factory1.txt
    emp1,500,40
    emp2,501,45
    emp3,502,50
    
    cat factory2.txt
    EMP10,200,25
    EMP11,201,27
    EMP12,202,30
    

    -- copying from local to HDFS

    [cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
    [cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
    

    -- Altering table to update in the metastore

    hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
    hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';            
    hive> select * from factory;                                                                      
    OK
    emp1    500 40  northregion
    emp2    501 45  northregion
    emp3    502 50  northregion
    EMP10   200 25  southregion
    EMP11   201 27  southregion
    EMP12   202 30  southregion
    

    Now I created new file factory3.txt to add as new partition for the table factory

    cat factory3.txt
    user1,100,25
    user2,101,27
    user3,102,30
    

    -- creating the path and copying table data

    [cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
    [cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
    

    now I executed the below query to update the metastore for the new partition added

    MSCK REPAIR TABLE factory;
    

    Now the table is not giving the new partition content of factory3 file. Can I know where I am doing mistake while adding partition for table factory?

    whereas, if I run the alter command then it is showing the new partition data.

    hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';
    

    Can I know why the MSCK REPAIR TABLE command is not working?

  • user1735921
    user1735921 over 2 years
    didn't understand, what if there are 1000s of values ?
  • HakkiBuyukcengiz
    HakkiBuyukcengiz over 2 years
    Added example to answer.
  • Purushothaman Srikanth
    Purushothaman Srikanth over 2 years
    What if the partition directories are empty?