Sqoop import as ORC file
Solution 1
At least as of Sqoop 1.4.5 there is HCatalog integration that supports the ORC file format (amongst others).
For example, you have the option
--hcatalog-storage-stanza
which can be set to
stored as orc tblproperties ("orc.compress"="SNAPPY")
Example:
sqoop import \
--connect jdbc:postgresql://foobar:5432/my_db \
--driver org.postgresql.Driver \
--connection-manager org.apache.sqoop.manager.GenericJdbcManager \
--username foo \
--password-file hdfs:///user/foobar/foo.txt \
--table fact \
--hcatalog-home /usr/hdp/current/hive-webhcat \
--hcatalog-database my_hcat_db \
--hcatalog-table fact \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
Solution 2
Sqoop import supports only the formats below.
--as-avrodatafile Imports data to Avro Data Files
--as-sequencefile Imports data to SequenceFiles
--as-textfile Imports data as plain text (default)
--as-parquetfile Imports data as Parquet files (from Sqoop 1.4.6 onwards; see the sketch after this list)
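If Parquet is an acceptable columnar alternative to ORC, a minimal sketch of such an import might look like the following (the connection string, credentials, table name, and target directory are placeholders borrowed from the earlier example, not values from this answer):

sqoop import \
--connect jdbc:postgresql://foobar:5432/my_db \
--username foo \
--password-file hdfs:///user/foobar/foo.txt \
--table fact \
--target-dir /user/foobar/fact_parquet \
--as-parquetfile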
Solution 3
In the currently available version of Sqoop, it is not possible to import data from an RDBMS to HDFS in ORC format with a single command. This is a known issue in Sqoop. Reference link for the issue: https://issues.apache.org/jira/browse/SQOOP-2192
I think the only alternative available for now is the one you mentioned. I came across a similar use case and used the same alternative two-step approach, sketched below.
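A minimal sketch of that two-step approach, assuming a PostgreSQL source, a staging directory, and a simple two-column schema (the table names, paths, and columns are illustrative assumptions, not from the original posts): first import the table as plain text with Sqoop, then let Hive rewrite the staged data into an ORC table.

# Step 1: import the source table as delimited text into a staging directory
sqoop import \
--connect jdbc:postgresql://foobar:5432/my_db \
--username foo \
--password-file hdfs:///user/foobar/foo.txt \
--table fact \
--target-dir /user/foobar/staging/fact \
--as-textfile \
--fields-terminated-by ','

# Step 2: expose the staged text as an external table, then copy it into an ORC table
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS fact_staging (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/foobar/staging/fact';
CREATE TABLE IF NOT EXISTS fact_orc (id INT, name STRING)
  STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY');
INSERT OVERWRITE TABLE fact_orc SELECT * FROM fact_staging;
"

Once the INSERT has completed, the staging directory can be dropped; the ORC data lives under the fact_orc table's warehouse location.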
Rajashekar Reddy Peta
Updated on June 04, 2022
Comments
-
Rajashekar Reddy Peta almost 2 years
Is there any option in Sqoop to import data from an RDBMS and store it in ORC file format in HDFS?
Alternative tried: imported in text format, then used a temp table in Hive to read the text input and write it to HDFS as ORC.
-
Arun almost 8 years
I am using Sqoop 1.4.6.2.3.4.0 to move a table from Netezza to Hive in ORC format, but I see the following issue: stackoverflow.com/questions/36782519/…
-
Edi Bice almost 8 years
I do something similar (except I do not specify hcatalog-home and I do specify direct mode) to import from Oracle to Hive. My resulting Hive table is okay at first, but after some time I can't query it - I get {"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException: Error reading file: hdfs://host:port/apps/hive/warehouse/db/table/part-m-0000