Error creating transactional connection factory while running a Spark on Hive project in IDEA


Solution 1

I believe adding a HiveContext may help, if it is not already in your code.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)   // create the context first
import hiveContext.implicits._          // then import its members
import hiveContext.sql

A Hive context adds support for finding tables in the MetaStore and writing queries using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When not configured by the hive-site.xml, the context automatically creates metastore_db and warehouse in the current directory. -- from the Spark SQL documentation
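
For context, a minimal end-to-end sketch of the pattern the question describes. This assumes a running SparkContext named sc and an existing Hive table testtable with a matching two-column schema; the table name comes from the question, the column names are made up:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Build a small DataFrame and append it to the Hive table, mirroring the
// question's DataFrame.write.insertInto("testtable") call.
val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "value")
df.write.insertInto("testtable")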

The issue was ultimately resolved as the asker describes:

Actually, I created metastore_db manually, so if Spark connected to MySQL it would pass the direct-SQL DbType test. Since it did not pass the MySQL direct-SQL test, Spark fell back to Derby as the default metastore DB, as shown in the log4j output. This implies Spark was not connected to metastore_db in MySQL even though hive-site.xml was configured correctly. I found a solution for this, described in the question comments. Hope it helps.
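
A quick way to check which backend Spark actually picked up is the MetaStoreDirectSql line in the log (it appears in the question's output below): when the MySQL connection works, it should read "Using direct SQL, underlying DB is MYSQL" rather than "... underlying DB is DERBY".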

Solution 2

If you are using a MySQL database, specify the MySQL driver path in the spark-env.sh file (a programmatic alternative for IDE runs is sketched after these exports):

export SPARK_CLASSPATH="/home/${user.name}/work/apache-hive-2.0.0-bin/lib/mysql-connector-java-5.1.38-bin.jar"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LOG_DIR=/home/hadoop/work/spark-2.0.0-bin-hadoop2.7/logs
export SPARK_WORKER_DIR=/home/hadoop/work/spark-2.0.0-bin-hadoop2.7/logs/worker_dir
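
When launching from IDEA rather than spark-submit, a sketch of a programmatic alternative is to ship the connector jar to the executors with SparkConf.setJars, which is what the question's log shows happening. The master URL and jar path below are taken from the question's log and are only examples:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("StreamingIntoHive")
  .setMaster("spark://10.1.50.71:7077")
  // Example path: point this at your local copy of the MySQL connector.
  .setJars(Seq("/home/stdevelop/SparkDll/mysql-connector-java-5.1.35-bin.jar"))
val sc = new SparkContext(conf)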

More info: https://www.youtube.com/watch?v=y3E8tUFhBt0

Comments

  • Shirui Tang almost 2 years

    I am trying to set up a development environment for a Spark Streaming project that needs to write data into Hive. I have a cluster with 1 master, 2 slaves, and 1 development machine (coding in IntelliJ IDEA 14).

    Within the Spark shell, everything seems to work fine, and I am able to store data into the default database in Hive via Spark 1.5 using DataFrame.write.insertInto("testtable").

    However, when I create a Scala project in IDEA and run it against the same cluster with the same settings, an error is thrown while creating the transactional connection factory in the metastore database, which is supposed to be "metastore_db" in MySQL.

    Here's the hive-site.xml:

    <configuration>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://10.1.50.73:9083</value>
    </property>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://10.1.50.73:3306/metastore_db?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>sa</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>huanhuan</value>
    </property>
    </configuration>
    

    The machine on which I run IDEA can remotely log in to MySQL and Hive to create tables, so there should be no problem with permissions. Here's the log4j output:

    > /home/stdevelop/jdk1.7.0_79/bin/java -Didea.launcher.port=7536 -Didea.launcher.bin.path=/home/stdevelop/idea/bin -Dfile.encoding=UTF-8 -classpath /home/stdevelop/jdk1.7.0_79/jre/lib/plugin.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/deploy.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/jfxrt.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/charsets.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/javaws.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/jfr.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/jce.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/jsse.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/rt.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/resources.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/management-agent.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/ext/zipfs.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/ext/sunec.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/ext/sunpkcs11.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/ext/sunjce_provider.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/ext/localedata.jar:/home/stdevelop/jdk1.7.0_79/jre/lib/ext/dnsns.jar:/home/stdevelop/IdeaProjects/StreamingIntoHive/target/scala-2.10/classes:/root/.sbt/boot/scala-2.10.4/lib/scala-library.jar:/home/stdevelop/SparkDll/spark-assembly-1.5.0-hadoop2.5.2.jar:/home/stdevelop/SparkDll/datanucleus-api-jdo-3.2.6.jar:/home/stdevelop/SparkDll/datanucleus-core-3.2.10.jar:/home/stdevelop/SparkDll/datanucleus-rdbms-3.2.9.jar:/home/stdevelop/SparkDll/mysql-connector-java-5.1.35-bin.jar:/home/stdevelop/idea/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain StreamingIntoHive 10.1.50.68 8080
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/09/22 19:43:18 INFO SparkContext: Running Spark version 1.5.0
    15/09/22 19:43:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/09/22 19:43:22 INFO SecurityManager: Changing view acls to: root
    15/09/22 19:43:22 INFO SecurityManager: Changing modify acls to: root
    15/09/22 19:43:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/09/22 19:43:26 INFO Slf4jLogger: Slf4jLogger started
    15/09/22 19:43:26 INFO Remoting: Starting remoting
    15/09/22 19:43:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:58070]
    15/09/22 19:43:26 INFO Utils: Successfully started service 'sparkDriver' on port 58070.
    15/09/22 19:43:26 INFO SparkEnv: Registering MapOutputTracker
    15/09/22 19:43:26 INFO SparkEnv: Registering BlockManagerMaster
    15/09/22 19:43:26 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e7fdc896-ebd2-4faa-a9fe-e61bd93a9db4
    15/09/22 19:43:26 INFO MemoryStore: MemoryStore started with capacity 797.6 MB
    15/09/22 19:43:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-fb07a3ad-8077-49a8-bcaf-12254cc90282/httpd-0bb434c9-1418-49b6-a514-90e27cb80ab1
    15/09/22 19:43:27 INFO HttpServer: Starting HTTP Server
    15/09/22 19:43:27 INFO Utils: Successfully started service 'HTTP file server' on port 38865.
    15/09/22 19:43:27 INFO SparkEnv: Registering OutputCommitCoordinator
    15/09/22 19:43:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    15/09/22 19:43:29 INFO SparkUI: Started SparkUI at http://10.1.50.68:4040
    15/09/22 19:43:29 INFO SparkContext: Added JAR /home/stdevelop/SparkDll/mysql-connector-java-5.1.35-bin.jar at http://10.1.50.68:38865/jars/mysql-connector-java-5.1.35-bin.jar with timestamp 1442922209496
    15/09/22 19:43:29 INFO SparkContext: Added JAR /home/stdevelop/SparkDll/datanucleus-api-jdo-3.2.6.jar at http://10.1.50.68:38865/jars/datanucleus-api-jdo-3.2.6.jar with timestamp 1442922209498
    15/09/22 19:43:29 INFO SparkContext: Added JAR /home/stdevelop/SparkDll/datanucleus-rdbms-3.2.9.jar at http://10.1.50.68:38865/jars/datanucleus-rdbms-3.2.9.jar with timestamp 1442922209534
    15/09/22 19:43:29 INFO SparkContext: Added JAR /home/stdevelop/SparkDll/datanucleus-core-3.2.10.jar at http://10.1.50.68:38865/jars/datanucleus-core-3.2.10.jar with timestamp 1442922209564
    15/09/22 19:43:30 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
    15/09/22 19:43:30 INFO AppClient$ClientEndpoint: Connecting to master spark://10.1.50.71:7077...
    15/09/22 19:43:32 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150922062654-0004
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor added: app-20150922062654-0004/0 on worker-20150921191458-10.1.50.71-44716 (10.1.50.71:44716) with 1 cores
    15/09/22 19:43:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150922062654-0004/0 on hostPort 10.1.50.71:44716 with 1 cores, 1024.0 MB RAM
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor added: app-20150922062654-0004/1 on worker-20150921191456-10.1.50.73-36446 (10.1.50.73:36446) with 1 cores
    15/09/22 19:43:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150922062654-0004/1 on hostPort 10.1.50.73:36446 with 1 cores, 1024.0 MB RAM
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor added: app-20150922062654-0004/2 on worker-20150921191456-10.1.50.72-53999 (10.1.50.72:53999) with 1 cores
    15/09/22 19:43:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150922062654-0004/2 on hostPort 10.1.50.72:53999 with 1 cores, 1024.0 MB RAM
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor updated: app-20150922062654-0004/1 is now LOADING
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor updated: app-20150922062654-0004/0 is now LOADING
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor updated: app-20150922062654-0004/2 is now LOADING
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor updated: app-20150922062654-0004/0 is now RUNNING
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor updated: app-20150922062654-0004/1 is now RUNNING
    15/09/22 19:43:32 INFO AppClient$ClientEndpoint: Executor updated: app-20150922062654-0004/2 is now RUNNING
    15/09/22 19:43:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60161.
    15/09/22 19:43:33 INFO NettyBlockTransferService: Server created on 60161
    15/09/22 19:43:33 INFO BlockManagerMaster: Trying to register BlockManager
    15/09/22 19:43:33 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.50.68:60161 with 797.6 MB RAM, BlockManagerId(driver, 10.1.50.68, 60161)
    15/09/22 19:43:33 INFO BlockManagerMaster: Registered BlockManager
    15/09/22 19:43:34 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
    15/09/22 19:43:35 INFO SparkContext: Added JAR /home/stdevelop/Builds/streamingintohive.jar at http://10.1.50.68:38865/jars/streamingintohive.jar with timestamp 1442922215169
    15/09/22 19:43:39 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:40110/user/Executor#-132020084]) with ID 2
    15/09/22 19:43:39 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:38248/user/Executor#-1615730727]) with ID 0
    15/09/22 19:43:40 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.50.72:37819 with 534.5 MB RAM, BlockManagerId(2, 10.1.50.72, 37819)
    15/09/22 19:43:40 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.50.71:48028 with 534.5 MB RAM, BlockManagerId(0, 10.1.50.71, 48028)
    15/09/22 19:43:42 INFO HiveContext: Initializing execution hive, version 1.2.1
    15/09/22 19:43:42 INFO ClientWrapper: Inspected Hadoop version: 2.5.2
    15/09/22 19:43:42 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.2
    15/09/22 19:43:42 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:56385/user/Executor#1871695565]) with ID 1
    15/09/22 19:43:43 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.50.73:43643 with 534.5 MB RAM, BlockManagerId(1, 10.1.50.73, 43643)
    15/09/22 19:43:45 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    15/09/22 19:43:45 INFO ObjectStore: ObjectStore, initialize called
    15/09/22 19:43:47 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    15/09/22 19:43:47 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    15/09/22 19:43:47 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/09/22 19:43:48 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/09/22 19:43:58 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    15/09/22 19:44:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/09/22 19:44:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/09/22 19:44:10 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/09/22 19:44:10 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/09/22 19:44:12 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    15/09/22 19:44:12 INFO ObjectStore: Initialized ObjectStore
    15/09/22 19:44:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    15/09/22 19:44:14 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    15/09/22 19:44:15 INFO HiveMetaStore: Added admin role in metastore
    15/09/22 19:44:15 INFO HiveMetaStore: Added public role in metastore
    15/09/22 19:44:16 INFO HiveMetaStore: No user is added in admin role, since config is empty
    15/09/22 19:44:16 INFO HiveMetaStore: 0: get_all_databases
    15/09/22 19:44:16 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_all_databases   
    15/09/22 19:44:17 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    15/09/22 19:44:17 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    15/09/22 19:44:17 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    15/09/22 19:44:18 INFO SessionState: Created local directory: /tmp/9ee94679-df51-46bc-bf6f-66b19f053823_resources
    15/09/22 19:44:18 INFO SessionState: Created HDFS directory: /tmp/hive/root/9ee94679-df51-46bc-bf6f-66b19f053823
    15/09/22 19:44:18 INFO SessionState: Created local directory: /tmp/root/9ee94679-df51-46bc-bf6f-66b19f053823
    15/09/22 19:44:18 INFO SessionState: Created HDFS directory: /tmp/hive/root/9ee94679-df51-46bc-bf6f-66b19f053823/_tmp_space.db
    15/09/22 19:44:19 INFO HiveContext: default warehouse location is /user/hive/warehouse
    15/09/22 19:44:19 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    15/09/22 19:44:19 INFO ClientWrapper: Inspected Hadoop version: 2.5.2
    15/09/22 19:44:19 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.2
    15/09/22 19:44:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/09/22 19:44:22 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    15/09/22 19:44:22 INFO ObjectStore: ObjectStore, initialize called
    15/09/22 19:44:23 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    15/09/22 19:44:23 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    15/09/22 19:44:23 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/09/22 19:44:25 WARN HiveMetaStore: Retrying creating default database after error: Error creating transactional connection factory
    javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:227)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:186)
        at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:393)
        at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:175)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:178)
        at StreamingIntoHive$.main(StreamingIntoHive.scala:42)
        at StreamingIntoHive.main(StreamingIntoHive.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
    NestedThrowablesStackTrace:
    java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
        at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
        at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
        at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
        at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:227)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:186)
        at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:393)
        at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:175)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:178)
        at StreamingIntoHive$.main(StreamingIntoHive.scala:42)
        at StreamingIntoHive.main(StreamingIntoHive.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
    Caused by: java.lang.OutOfMemoryError: PermGen space
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:165)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:153)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at org.datanucleus.store.rdbms.connectionpool.DBCPConnectionPoolFactory.createConnectionPool(DBCPConnectionPoolFactory.java:59)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
        at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
        at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
    ………………
    ………………
    Process finished with exit code 1
    

    Can anyone help me find out the reason? Thanks.

    • Shirui Tang over 8 years
      Question solved: this happened because Spark, while trying to identify the database type of metastore_db, failed to read the hive-site.xml placed in the conf directory, and therefore used the default DbType, Derby, instead of the correct type, MySQL. Open the driver UI on the driver machine (port 4040) and find the Classpath Entries; copy hive-site.xml into that directory, and make sure every machine in the cluster has hive-site.xml in the same directory. hive.metastore.uris also needs to be set. (A quick classpath sanity check is sketched below.)
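      For an sbt project in IDEA, placing hive-site.xml under src/main/resources is one standard way to get it onto the driver's classpath. A quick sanity check (a sketch) to run before constructing the HiveContext:

      // Fails fast if hive-site.xml is not visible on the classpath.
      val url = getClass.getResource("/hive-site.xml")
      require(url != null, "hive-site.xml not found on the classpath")
      println(s"hive-site.xml loaded from: $url")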
    • zero323 over 8 years
      If you solved this problem by yourself, please post it as an answer and accept it. There is no reason to keep this question open.