Sqoop import as ORC file


Solution 1

At least since Sqoop 1.4.5 there is HCatalog integration that supports the ORC file format (among others).

For example, there is the option

--hcatalog-storage-stanza

which can be set to

stored as orc tblproperties ("orc.compress"="SNAPPY")

Example:

sqoop import \
 --connect jdbc:postgresql://foobar:5432/my_db \
 --driver org.postgresql.Driver \
 --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
 --username foo \
 --password-file hdfs:///user/foobar/foo.txt \
 --table fact \
 --hcatalog-home /usr/hdp/current/hive-webhcat \
 --hcatalog-database my_hcat_db \
 --hcatalog-table fact \
 --create-hcatalog-table \
 --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
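
As an optional follow-up check (my addition, assuming the table names above), you can verify from Hive that the created table is really stored as ORC:

# Inspect the table's storage metadata; the grep pattern is just a convenience
hive -e "DESCRIBE FORMATTED my_hcat_db.fact" | grep -iE 'inputformat|outputformat|orc.compress'

The output should mention org.apache.hadoop.hive.ql.io.orc.OrcInputFormat and OrcOutputFormat, plus the orc.compress=SNAPPY table property.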

Solution 2

Sqoop import supports only the formats below (a usage sketch follows the list).

--as-avrodatafile   Imports data to Avro Data Files

--as-sequencefile   Imports data to SequenceFiles

--as-textfile   Imports data as plain text (default)

--as-parquetfile    Imports data as Parquet files (from Sqoop 1.4.6 onward)
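
For illustration only (not in the original answer), an import using one of these flags might look like the following; it reuses the placeholder connection details from Solution 1, and the target directory is hypothetical:

# Import the table as Parquet files into an HDFS directory
sqoop import \
 --connect jdbc:postgresql://foobar:5432/my_db \
 --username foo \
 --password-file hdfs:///user/foobar/foo.txt \
 --table fact \
 --target-dir /user/foobar/fact_parquet \
 --as-parquetfile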

Solution 3

In the versions of Sqoop currently available, it is not possible to import data from an RDBMS to HDFS in ORC format with a single command. This is a known issue in Sqoop; it is tracked at https://issues.apache.org/jira/browse/SQOOP-2192

I think the only alternative available for now is the one you mentioned. I came across a similar use case and used the same two-step approach; a sketch follows.
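
A minimal sketch of that two-step workaround, assuming the placeholder connection details from Solution 1, hypothetical columns (id, name), and Sqoop's default comma field delimiter:

# Step 1: import as plain text (the default) into an HDFS staging directory
sqoop import \
 --connect jdbc:postgresql://foobar:5432/my_db \
 --username foo \
 --password-file hdfs:///user/foobar/foo.txt \
 --table fact \
 --target-dir /user/foobar/fact_staging \
 --as-textfile

# Step 2: in Hive, expose the text files as a staging table and copy into an ORC table
hive -e "
CREATE EXTERNAL TABLE fact_staging (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/foobar/fact_staging';
CREATE TABLE fact_orc (id INT, name STRING)
STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY');
INSERT OVERWRITE TABLE fact_orc SELECT * FROM fact_staging;
"

Since fact_staging is external, dropping it afterwards does not delete the staged text files; only fact_orc holds the converted data.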

Author by

Rajashekar Reddy Peta

Updated on June 04, 2022

Comments

  • Rajashekar Reddy Peta almost 2 years

    Is there any option in Sqoop to import data from an RDBMS and store it in ORC file format in HDFS?

    Alternatives tried: imported in text format, then used a temp table in Hive to read the input as a text file and write it to HDFS as ORC.

  • Arun almost 8 years
    I am using Sqoop 1.4.6.2.3.4.0 to move a table from Netezza to Hive in ORC format, but I see the following issue: stackoverflow.com/questions/36782519/…
  • Edi Bice almost 8 years
    I do something similar (except I do not specify hcatalog-home and I do specify direct mode) to import from Oracle to Hive. My resulting Hive table is okay at first, but after some time I can't query it - I get {"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException: Error reading file: hdfs://host:port/apps/hive/warehouse/db/table/part-m-0000