Trying to connect to Oracle from Spark

Solution 1

This one got worked out with the following change (using spark.read from the SparkSession instead of sqlContext.read):

    sqlContext = SQLContext(sc)
    ora_tmp = spark.read.format('jdbc').options(
        url=Oracle_CONNECTION_URL,
        dbtable="tablename",
        driver="oracle.jdbc.OracleDriver"
    ).load()
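
Note that the Oracle JDBC driver also has to be on Spark's classpath, otherwise you get the ClassNotFoundException from the question. A minimal sketch of creating the session with the jar attached, assuming the ojdbc6.jar path from the question (the names here are just placeholders):

    from pyspark.sql import SparkSession

    # path to the Oracle JDBC driver, taken from the question's setup; adjust as needed
    ORACLE_JAR = r"C:\Oracle\Product\11.2.0\client_1\jdbc\lib\ojdbc6.jar"

    spark = (SparkSession.builder
             .appName("Oracle_imp_exp")
             .master("local[*]")
             # make the driver jar visible to the driver and the executors
             .config("spark.jars", ORACLE_JAR)
             .getOrCreate())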

Solution 2

Using this setup:

spark_session = ...

emDF = spark_session.read \
        .format("jdbc") \
        .option("url", "jdbc:oracle:thin:@your_aliastns?TNS_ADMIN=path/to/wallet") \
        .option("dbtable", 'table_name or query') \
        .option("user", "user") \
        .option("password", "password") \
        .option("driver", "oracle.jdbc.driver.OracleDriver") \
        .load()
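
Since dbtable can be "table_name or query", a query can also be pushed down by wrapping it as an aliased subquery. A small sketch reusing the options above (the table and column names are made up):

    # dbtable accepts either a table name or an aliased subquery
    query = "(SELECT col1, col2 FROM some_table WHERE col2 > 100) tmp"

    queryDF = spark_session.read \
        .format("jdbc") \
        .option("url", "jdbc:oracle:thin:@your_aliastns?TNS_ADMIN=path/to/wallet") \
        .option("dbtable", query) \
        .option("user", "user") \
        .option("password", "password") \
        .option("driver", "oracle.jdbc.driver.OracleDriver") \
        .load()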

Remember to get the ojdbcX.jar and put it in the first folder of the wallet path, i.e. for firstfolder/secondfolder/walletfolder, put the jar in firstfolder.

The wallet folder contains the alias definitions (tnsnames.ora). Verify this!
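
If you want to check that programmatically, a tiny sketch (the wallet path and alias are the placeholders from this answer):

    # verify that the TNS alias used in the JDBC URL is defined in the wallet
    with open("firstfolder/secondfolder/walletfolder/tnsnames.ora") as f:
        print("your_aliastns" in f.read().lower())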

Solution 3

I followed the code below and it works for me. Import the JDBC driver (ojdbc6).

    import org.apache.spark.sql.SparkSession

    object ConnectingOracleDatabase {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ConnectingOracleDatabase")
          .master("local")
          .getOrCreate()

        val jdbcDF = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@localhost:1521:xe")
          .option("dbtable", "ADDRESS")
          .option("user", "system")
          .option("password", "oracle")
          .option("driver", "oracle.jdbc.OracleDriver")
          .load()

        jdbcDF.show()
      }
    }
Author: Ramsey

Updated on December 10, 2021

Comments

  • Ramsey over 2 years

    I am trying to connect from Spark to Oracle and pull data from some tables and SQL queries, but I am not able to connect. I have tried different workaround options, but no luck. I have followed the steps below. Please correct me if I need to make any changes.

    I am using a Windows 7 machine and a Jupyter notebook to run PySpark. I have Python 2.7 and Spark 2.1.0. I have set a Spark classpath in the environment variables:

      SPARK_CLASS_PATH = C:\Oracle\Product\11.2.0\client_1\jdbc\lib\ojdbc6.jar
    

    jdbcDF = sqlContext.read.format("jdbc") \
        .option("driver", "oracle.jdbc.driver.OracleDriver") \
        .option("url", "jdbc:oracle://dbserver:port#/database") \
        .option("dbtable", "Table_name") \
        .option("user", "username") \
        .option("password", "password") \
        .load()

    Errors:

    1. Py4JJavaError:

    An error occurred while calling o148.load.
    : java.sql.SQLException: Invalid Oracle URL specified
    

    2. Py4JJavaError:

    An error occurred while calling o114.load. : java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
    

    Another Scenario:

        from pyspark import SparkContext, SparkConf
        from pyspark.sql import SQLContext

        ORACLE_DRIVER_PATH = "C:\Oracle\Product\11.2.0\client_1\jdbc\lib\ojdbc7.jar"
        Oracle_CONNECTION_URL = "jdbc:oracle:thin:username/password@servername:port#/dbservicename"

        conf = SparkConf()
        conf.setMaster("local")
        conf.setAppName("Oracle_imp_exp")

        sqlContext = SQLContext(sc)
        ora_tmp = sqlContext.read.format('jdbc').options(
            url=Oracle_CONNECTION_URL,
            dbtable="tablename",
            driver="oracle.jdbc.OracleDriver"
        ).load()

    I am getting the error below.

    Error: IllegalArgumentException: u"Error while instantiating org.apache.spark.sql.hive.HiveSessionState':"
    

    Please help me with this.

  • CodeRunner about 5 years
    @Ramsey - can you please check this and provide solution stackoverflow.com/questions/56151363/…