How to insert direct values into a hive table?

sql hadoop insert hive

21,811

Solution 1

You can use hive's table generating functions,like exlode() or stack()

Example

Table struct as (name String, age Int)

INSERT INTO TABLE target_table
SELECT STACK(
    2,                 # Amount of record
    'John', 80,        # record 1
    'Bill', 61         # record 2
    ) 
FROM dual              # Any table already exists
LIMIT 2;               # Amount of record! Have to add this line!

That will add 2 records in your target_table.

Solution 2

As of latest version of Hive, insert into .. values (...)is not supported. The enhancement to insert/update/delete syntax is under development. Please look at the Implement insert, update, and delete in Hive with full ACID support

Solution 3

Inserting values into a table is now supported by HIVE from the version Hive 0.14.

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2)) CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

More can be found at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

21,811

Author by

Kaushik

Updated on September 16, 2020

Comments

Kaushik over 3 years
I am new to hive. I just wanted to know how I can insert data into Hive table directly
```
Create table t1 ( name string)

and I want to insert a value eg name = 'John'
```
But I have seen so many documentation there isn't any example that inserts data directly into the table. Either I need to create a file internal or externally and add the value 'John' and load this data into the table or i can load data from another table.

My goal is to add data directly into the hive table by providing a values directly? I have provided an oracle example of a sql query I want to achieve:
```
INSERT INTO t1 (name)
values ('John')
```
I want an equivalent statement as above in Hive ?
Rick Moritz over 6 years

It should be noted, that unless the transactional feature is being used, or the VALUES array is very large, this is usually an anti-pattern, since each insert generates at least one new file in HDFS. Queries over thousands of files are very inefficient.