hive-drop-import-delims not removing newline while using HCatalog in Sqoop


Solution 1

Use the --map-column-java option to explicitly map the column to the Java String type. With that mapping in place, --hive-drop-import-delims works as expected and removes \n from the data.

Changed Sqoop command:

sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 --table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String
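To see what the option does to a String column, here is a local shell sketch that approximates its effect on a value like the one in the question. It assumes the behavior described in the Sqoop user guide (--hive-drop-import-delims drops \n, \r and \01 from string fields); the function name drop_hive_delims is hypothetical, and this uses tr rather than Sqoop's own code.

```shell
#!/bin/sh
# Hypothetical helper: approximates the effect of Sqoop's
# --hive-drop-import-delims on a single string value by deleting every
# \n (newline), \r (carriage return, displayed as ^M) and \001 (Ctrl-A,
# Hive's default field delimiter). This is not Sqoop's implementation.
drop_hive_delims() {
    printf '%s' "$1" | tr -d '\n\r\001'
}

# Sample value modelled on col4 from the question (CRLF line endings).
sample=$(printf '<li>Details:\r\n    <ul>\r\n        <li>')
drop_hive_delims "$sample"
echo
```

The result is a single line, which is why the ^M in the question's data also disappears: \r is one of the dropped characters.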

Solution 2

sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

Alternatively, try --hive-delims-replacement "anything". It replaces every \n, \r, and \01 character with the string you provide (in this case, "anything").
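The replacement variant can be approximated the same way. This is an assumption-based emulation, not Sqoop's code: the hypothetical replace_hive_delims function replaces each \n, \r and \001 with the given string, one copy per character, so a CRLF pair (^M followed by a newline) becomes two copies. tr first collapses \n and \r to \001, so sed sees a single line, and sed then substitutes the string.

```shell
#!/bin/sh
# Hypothetical helper: approximates Sqoop's --hive-delims-replacement on a
# single string value. Each \n, \r and \001 is replaced by "$2".
# Assumption: "$2" contains no '/', '&' or '\', which are special in a
# sed replacement.
replace_hive_delims() {
    ctrl_a=$(printf '\001')
    # Map \n and \r to \001 so the value becomes one line, then replace
    # every \001 with the replacement string.
    printf '%s' "$1" | tr '\n\r' '\001\001' | sed "s/$ctrl_a/$2/g"
}

sample=$(printf 'Details:\r\n<ul>')
replace_hive_delims "$sample" "anything"
echo
```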

Author: Suraj Nayak


Updated on June 04, 2022

Comments

  • Suraj Nayak
    almost 2 years ago

    When I run Apache Sqoop against Oracle with an HCatalog import, newline characters (\n) are not removed from the column data, even though the command includes the --hive-drop-import-delims option.

    Sqoop command:

        sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
        --username user123 --password passwd123 --table SCHEMA.TBL_2 \
        --hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
        --split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
        --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
        --null-string ""
    

    The data in Oracle column col4 looks like this (it contains control characters such as ^M):

    <li>Details:^M
        <ul>^M
            <li>
    

    Is the control character causing this problem?

    Am I missing anything? Is there a workaround or solution for this problem?

  • Suraj Nayak
    about 9 years ago
    Neither --hive-delims-replacement "null" nor --hive-drop-import-delims worked until --map-column-java col4=String was added. So for the delimiter options to take effect on a column, that column must be mapped to the Java String type.
  • sudheer
    over 7 years ago
    How can I do the same if I need to dump data into HDFS?
  • Kumar
    over 5 years ago
    I had a column of type CLOB, so I used --map-column-java to map it to String. After that, both --hive-drop-import-delims and --hive-delims-replacement "|" worked.
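The comments report that the delimiter options only took effect once the affected column (including a CLOB column) was mapped to the Java String type. When several columns need this, the Sqoop user guide documents --map-column-java as a comma-separated list of column=Type pairs. The sketch below builds that value. The function name string_column_mapping and the second column col5 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: builds a --map-column-java value that maps every
# given column to the Java String type, e.g. "col4=String,col5=String".
string_column_mapping() {
    mapping=""
    for col in "$@"; do
        # Separate entries with a comma, but not before the first one.
        if [ -n "$mapping" ]; then
            mapping="$mapping,"
        fi
        mapping="${mapping}${col}=String"
    done
    printf '%s\n' "$mapping"
}

# col5 is a hypothetical second CLOB/text column.
mapping=$(string_column_mapping col4 col5)
printf '%s\n' "--map-column-java $mapping"
```

The printed argument would then be added to the import command alongside --hive-drop-import-delims or --hive-delims-replacement.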