Zeppelin: How to restart sparkContext in zeppelin

17,734

Solution 1

You can restart the interpreter for the notebook from the interpreter bindings (the gear in the upper right-hand corner) by clicking the restart icon to the left of the interpreter in question (in this case, the spark interpreter).

https://i.stack.imgur.com/MAm7a.png

Solution 2

While working with Zeppelin and Spark I stumbled upon the same problem and did some investigating. After some time, my first conclusion was that:

  • Stopping the SparkContext can be accomplished by using sc.stop() in a paragraph
  • Restarting the SparkContext only works by using the UI (Menu -> Interpreter -> Spark Interpreter -> click on restart button)

However, since the UI lets you restart the Spark interpreter with a button press, why not just reverse engineer the API call behind that button? It turns out that restarting the Spark interpreter sends the following HTTP request:

PUT http://localhost:8080/api/interpreter/setting/restart/spark
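If that request returns a 404, the last path segment ("spark" above) may not match your interpreter setting. Zeppelin's REST API can list the available settings; a minimal sketch, assuming Zeppelin is reachable on localhost:8080 (adjust the host and port to your installation):

```shell
# Assumed host/port -- adjust to your Zeppelin installation.
ZEPPELIN="http://localhost:8080"

# Each entry in the JSON response carries an "id" and a "name"; depending on
# the Zeppelin version, the restart URL may want one or the other.
curl -s --max-time 10 "${ZEPPELIN}/api/interpreter/setting" \
  || echo "Zeppelin not reachable at ${ZEPPELIN}"
```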

Fortunately, Zeppelin can work with multiple interpreters, one of which is a shell interpreter. So I created two paragraphs:

The first paragraph was for stopping the SparkContext whenever needed:

%spark
// stop SparkContext
sc.stop()

The second paragraph was for restarting the SparkContext programmatically:

%sh
# restart SparkContext
curl -X PUT http://localhost:8080/api/interpreter/setting/restart/spark

After stopping and restarting the SparkContext with these two paragraphs, I ran another paragraph to check whether the restart worked... and it did! So while this is no official solution and more of a workaround, it is still legitimate, since we do nothing more than "press" the restart button from within a paragraph.
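For scripted use, the restart paragraph can also check the HTTP status of the call itself. A sketch, assuming the same endpoint as above; note that a 200 only means Zeppelin accepted the restart request:

```shell
# (run in a %sh paragraph; host/port are assumptions -- adjust as needed)
ZEPPELIN="http://localhost:8080"

# -o /dev/null discards the response body; -w prints only the HTTP status code.
STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 \
  -X PUT "${ZEPPELIN}/api/interpreter/setting/restart/spark" || true)

# "000" means Zeppelin was not reachable at all.
if [ "$STATUS" = "200" ]; then
  echo "spark interpreter restarted"
else
  echo "restart failed (HTTP ${STATUS})"
fi
```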

Zeppelin version: 0.8.1

Solution 3

I investigated why the SparkContext stops when Spark runs in yarn-client mode, and found that it is an issue in Spark itself (Spark version >= 1.6). In client mode, the ApplicationMaster (AM) connects to the driver over two RPC connections: it sets up a NettyRpcEndpointRef to the driver's 'YarnSchedulerBackend' service on the 'SparkDriver' server, and a second connection to the 'YarnAM' endpoint.

There are no heartbeats on these RPC connections between the AM and the driver. So the only way the AM learns whether the driver is still connected is through the onDisconnected method of the 'YarnAM' endpoint: the disconnect message for the driver/AM connection through the NettyRpcEndpointRef is 'postToAll'-ed through the RPC handler to the 'YarnAM' endpoint. When the TCP connection between them drops, or a keep-alive probe finds it dead (which can take about two hours on a Linux system), the AM marks the application as SUCCESS.

So when the driver's monitor process sees the YARN application state change to SUCCESS, it stops the SparkContext.

The root cause, then, is that in Spark client mode the AM makes no attempt to reconnect to the driver to check whether it is still alive; it just marks the YARN application finished as quickly as possible. Perhaps Spark could fix this.

Author: eatSleepCode

Updated on June 19, 2022

Comments

  • eatSleepCode
    eatSleepCode almost 2 years

    I am using the isolated mode of Zeppelin's Spark interpreter; in this mode it starts a new job for each notebook on the Spark cluster. I want to kill the job via Zeppelin when the notebook execution is complete. For this I did sc.stop(), which stopped the SparkContext, and the job was also stopped on the Spark cluster. But the next time I try to run the notebook, it does not start the SparkContext again. How can I do that?

  • conradlee
    conradlee almost 4 years
    This didn't work for me on EMR. When running the curl command via the shell interpreter, I got the following error: curl: (7) Failed to connect to localhost port 8080: Connection refused
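On EMR the connection refusal is usually a port issue: EMR's Zeppelin typically listens on 8890 rather than 8080 (the zeppelin.server.port property in zeppelin-site.xml is authoritative for your cluster). A hedged sketch of the adjusted call:

```shell
# 8890 is the usual Zeppelin port on EMR; 8080 on a stock install.
# Verify against zeppelin.server.port in zeppelin-site.xml.
PORT=8890
curl --max-time 10 -X PUT \
  "http://localhost:${PORT}/api/interpreter/setting/restart/spark" \
  || echo "Zeppelin not reachable on port ${PORT}"
```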