In Hive, does "Load data local inpath" overwrite existing data or append?


This site http://wiki.apache.org/hadoop/Hive/LanguageManual is your friend when dealing with Hive. :)

The page that covers loading data into Hive is http://wiki.apache.org/hadoop/Hive/LanguageManual/DML and it states:

if the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced with the files referred to by filepath. Otherwise the files referred by filepath will be added to the table. Note that if the target table (or partition) already has a file whose name collides with any of the filenames contained in filepath - then the existing file will be replaced with the new file.

In your case, you are not using the OVERWRITE keyword, so the files will be added to the table (unless a file with the same name already exists there, in which case, per the documentation quoted above, the existing file is replaced).
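As a rough sketch of the two behaviours described in the quoted documentation, the statements below use the path '/tmp/data/x' and table X from the question. The partitioned table X_daily, its partition column ds, and the file path in the last statement are hypothetical names introduced only for illustration. Name-collision handling may differ between Hive versions, so it is worth checking on your own installation.

```sql
-- Without OVERWRITE: the files under /tmp/data/x are added to table X,
-- alongside whatever files the table already contains.
LOAD DATA LOCAL INPATH '/tmp/data/x' INTO TABLE X;

-- With OVERWRITE: the existing contents of table X are deleted first,
-- then replaced with the files under /tmp/data/x.
LOAD DATA LOCAL INPATH '/tmp/data/x' OVERWRITE INTO TABLE X;

-- Hypothetical partitioned table: OVERWRITE replaces only the named
-- partition, leaving the other partitions untouched.
-- Assumes X_daily was created with PARTITIONED BY (ds STRING).
LOAD DATA LOCAL INPATH '/tmp/data/20130808.csv'
OVERWRITE INTO TABLE X_daily PARTITION (ds = '2013-08-08');
```

For a recurring import that should accumulate data, the first form matches the statement in the question. The partition form is one possible way to make a re-import of a single day replace that day's data instead of duplicating it.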

Author by

CMaury


Updated on June 18, 2022

Comments

  • CMaury
    CMaury almost 2 years

    I am hoping to run an import into Hive from a cron job, and was hoping that just running

    "load data local inpath '/tmp/data/x' into table X"

    would be sufficient.

    Will subsequent commands overwrite what's already in the table, or will they append?

  • webDEVILopers
    webDEVILopers almost 11 years
    Suppose I have files that are created daily, e.g. 20130808.csv, and I have to re-import the data for one of those days. Is all I have to do import the same file with the same filename 20130808.csv again, so that the updated version is added to the table? Or do I have to remove the old rows myself with some kind of query?
  • Indrajeet Gour
    Indrajeet Gour over 6 years
    Just an update: if you load the same file again and again, its data will be added to the table again and again. Using the same file name for the next load does not overwrite the existing file, so beware of this.