storing pig output into Hive table in a single instance
Solution 1
Ok. Create a external hive table with a schema layout somewhere in HDFS directory. Lets say
create external table emp_records(id int,
name String,
city String)
row formatted delimited
fields terminated by '|'
location '/user/cloudera/outputfiles/usecase1';
Just create a table like above and no need to load any file into that directory.
Now write a Pig script that we read data for some input directory and then when you store the output of that Pig script use as below
A = LOAD 'inputfile.txt' USING PigStorage(',') AS(id:int,name:chararray,city:chararray);
B = FILTER A by id > = 678933;
C = FOREACH B GENERATE id,name,city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');
Ensure that destination location and delimiter and schema layout of final FOREACH statement in you Pigscript matches with Hive DDL schema.
Solution 2
There are two approaches explained below with 'Employee' table example to store pig output into hive table. (Prerequisite is that hive table should be already created)
A = LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS(EMP_NUM:int,EMP_NAME:chararray,EMP_PHONE:int);
Approach 1: Using Hcatalog
// dump pig result to Hive using Hcatalog
store A into 'Empdb.employee' using org.apache.hive.hcatalog.pig.HCatStorer();
(or)
Approach 2: Using HDFS physical location
// dump pig result to external hive warehouse location
STORE A INTO 'hdfs://<<nmhost>>:<<port>>/user/hive/warehouse/Empdb/employee/' USING PigStorage(',')
;
Kirthika
Updated on June 13, 2022Comments
-
Kirthika almost 2 years
I would like to insert the pig output into Hive tables(tables in Hive is already created with the exact schema).Just need to insert the output values into table. I dont want to the usual method, wherein I first store into a file, then read that file from Hive and then insert into tables. I need to reduce that extra hop which is done.
Is it possible. If so please tell me how this can be done ?
Thanks