hive-drop-import-delims not removing newline while using HCatalog in Sqoop
Solution 1
Use --map-column-java
option to explicitly state the column is of type String
. Then --hive-drop-import-delims
works as expected (to remove \n
from data).
Changed Sqoop Command :
sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String
Solution 2
sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""
You can try this --hive-delims-replacement "anything" this will replace all \n , \t , and \01 characters with the string you provided(in this case replace with string "anything").
Suraj Nayak
Big Data Enthusiast using, developing and promoting big data using distributed platform to solve business problems, reduce cost and time in crunching enterprise data. Integrating analytics to unveil deep insights in the enterprise data.
Updated on June 04, 2022Comments
-
Suraj Nayak almost 2 years
Sqoop while used with HCatalog import not able to remove new line (\n) from column data even after using --hive-drop-import-delims option in the command when running Apache Sqoop with Oracle.
Sqoop Query:
sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \ --username user123 --password passwd123 -table SCHEMA.TBL_2 \ --hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ --split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \ --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \ --null-string ""
Data in Oracle Column col4 as below: (Data has control characters such as ^M)
<li>Details:^M <ul>^M <li>
Does Control character causing this problem?
Am I missing anything ? Is there any workaround or solution for this problem?
-
Suraj Nayak about 9 yearsBoth --hive-delims-replacement "null" and --hive-drop-import-delims did not work untill --map-column-java col4=String was added. That mean whichever column you want delims switch to work, it has to be java String type.
-
sudheer over 7 yearsHow can I do the same if I need to dump data into HDFS?
-
Kumar over 5 yearsI had a column with data type CLOB so i used --map-column-java to convert it to string then both hive-drop-import-delims and --hive-delims-replacement "|" worked