Failing to connect to Spark driver when submitting a job to Spark in YARN mode
Running Spark in YARN mode (which is what I was doing) is the right way to use Spark in HDP, as stated here: https://community.hortonworks.com/questions/52591/standalone-spark-using-ambari.html
This means I should not specify a master or use the start-master / start-slave commands.
The problem was that the driver IP was resolved as 0.0.0.0 for some reason, so all the cluster nodes were trying to contact the driver through their local interface and failing. I fixed this by setting the following configuration in conf/spark-defaults.conf:
```
spark.driver.port=20002
spark.driver.host=HOST_NAME
```
and by changing the deploy mode to client to make it deploy the driver locally.
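With that configuration in place, a client-mode submission looks roughly like the sketch below. The class name, jar name, and HOST_NAME are placeholders, not exact values from my cluster; passing the driver settings as `--conf` flags is equivalent to putting them in spark-defaults.conf.

```shell
# Sketch of a client-mode submission; class name, jar, and host are placeholders.
./bin/spark-submit \
  --class example.Hello \
  --master yarn \
  --deploy-mode client \
  --conf spark.driver.host=HOST_NAME \
  --conf spark.driver.port=20002 \
  app.jar
```

In client mode the driver runs on the submitting machine, so the executors connect back to the host named here instead of 0.0.0.0.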
tariq abughofa
Updated on June 22, 2022

Comments
-
tariq abughofa almost 2 years
When I submit a Spark job to the cluster, it fails with the following exception in the shell:
```
Exception in thread "main" org.apache.spark.SparkException: Application application_1497125798633_0065 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1244)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1290)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/06/29 10:25:36 INFO ShutdownHookManager: Shutdown hook called
```
This is what it gives in the YARN logs:
```
Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
Which I guess means it failed to connect to the driver. I tried increasing the "spark.yarn.executor.memoryOverhead" parameter, but that didn't work.
This is the submit command I use:
```
/bin/spark-submit \
  --class example.Hello \
  --jars ... \
  --master yarn \
  --deploy-mode cluster \
  --supervise \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  ...(jar file path)
```
I am using HDP-2.6.1.0 and Spark 2.1.1.
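As an aside on how I pulled the YARN log output above: assuming log aggregation is enabled on the cluster, the full container logs for a failed application can be fetched with the YARN CLI, using the application id from the exception:

```shell
# Fetch aggregated container logs for the failed application
# (application id taken from the exception above).
yarn logs -applicationId application_1497125798633_0065
```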
-
tariq abughofa almost 7 years
I tried that and I got this exception:
```
17/06/29 23:35:30 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /127.0.0.1:6066 is closed
17/06/29 23:35:30 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 127.0.0.1:6066
org.apache.spark.SparkException: Exception thrown in awaitResult
Caused by: java.io.IOException: Connection from /127.0.0.1:6066 closed
```
This is what I got from the master log:
```
WARN HttpParser: Illegal character 0x0 in state=START for buffer
```
-
Binita Bharati almost 6 years
I was having the same issue on Spark 2.3.1. This solution worked in my case too, i.e. adding the spark.driver.host property in conf/spark-defaults.conf.