How to debug a Scala-based Spark program in IntelliJ IDEA


Solution 1

First, define an environment variable as below:

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777 

Then create a remote debug configuration in IntelliJ IDEA as follows:

Run -> Edit Configurations -> click "+" in the top-left corner -> Remote -> set the port and name

After the above configuration, run the Spark application with spark-submit or sbt run, then start the debug configuration you created and set breakpoints where you want to debug.
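If you want to confirm that the agent string from SPARK_SUBMIT_OPTS actually reached the driver JVM, one option (not part of the original answer; the DebugCheck object name is made up for illustration) is to print the JVM's startup arguments:

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    object DebugCheck {
      def main(args: Array[String]): Unit = {
        // Print the flags the driver JVM was started with; the -agentlib:jdwp
        // string from SPARK_SUBMIT_OPTS should show up here if the agent loaded.
        ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.foreach(println)
      }
    }

Keep in mind that with suspend=y the JVM pauses before running any application code, so attach the IntelliJ remote configuration first.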

Solution 2

If you're using the Scala plugin and have your project configured as an sbt project, it should basically work out of the box.

Go to Run->Edit Configurations... and add your run configuration normally.

Since you have a main class, you probably want to add a new Application configuration.

You can also just click the blue square icon to the left of your main method.

Once your run configuration is set up, you can use the Debug feature.
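Note that running the class straight from an Application configuration bypasses spark-submit, so a master has to be set in code. Below is a minimal sketch based on the program from the question, assuming a local master and the same input path:

    import org.apache.spark.{SparkConf, SparkContext}

    object MySpark {
      def main(args: Array[String]): Unit = {
        // "local[*]" runs Spark inside the IDE's JVM, so breakpoints in this
        // code are hit directly; with spark-submit the master is passed on the
        // command line instead and this setMaster call should be removed.
        val conf = new SparkConf()
          .setAppName("Simple Application")
          .setMaster("local[*]")
        val sc = new SparkContext(conf)
        try {
          val logData = sc.textFile("/IdeaProjects/hello/testfile.txt", 2).cache()
          val numAs = logData.filter(_.contains("a")).count()
          val numBs = logData.filter(_.contains("b")).count()
          println(s"Lines with a: $numAs, Lines with b: $numBs")
        } finally {
          sc.stop()
        }
      }
    }

Remove the setMaster call before packaging the jar for spark-submit, as noted in the comments below.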

Solution 3

I've run into this when switching between Scala 2.10 and 2.11. SBT expects the primary object to be in src/main/scala-2.10 or src/main/scala-2.11, depending on your version.
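For illustration, a cross-built project typically declares both versions in build.sbt and keeps version-specific sources next to the shared ones; the exact version numbers below are assumptions, not taken from the question:

    // build.sbt (sketch, versions assumed for illustration)
    scalaVersion := "2.11.7"

    crossScalaVersions := Seq("2.10.6", "2.11.7")

    // Version-specific sources:
    //   src/main/scala-2.10/...   compiled only for Scala 2.10
    //   src/main/scala-2.11/...   compiled only for Scala 2.11
    // Shared sources stay in src/main/scala.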

Author by lserlohn

Updated on June 23, 2022

Comments

  • lserlohn
    lserlohn almost 2 years

    I am currently setting up my development environment in IntelliJ IDEA. I followed exactly the same steps as http://spark.apache.org/docs/latest/quick-start.html

    build.sbt file

    name := "Simple Project"
    
    version := "1.0"
    
    scalaVersion := "2.11.7"
    
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
    

    Sample Program File

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    
    object MySpark {
    
        def main(args: Array[String]){
            val logFile = "/IdeaProjects/hello/testfile.txt" 
            val conf = new SparkConf().setAppName("Simple Application")
            val sc = new SparkContext(conf)
            val logData = sc.textFile(logFile, 2).cache()
            val numAs = logData.filter(line => line.contains("a")).count()
            val numBs = logData.filter(line => line.contains("b")).count()
            println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
        }
    }
    

    If I use the command line:

    sbt package
    

    and then

    spark-submit --class "MySpark" --master local[4] target/scala-2.11/myspark_2.11-1.0.jar
    

    I am able to generate the jar package and Spark runs well.

    However, I want to use IntelliJ IDEA to debug the program in the IDE. How can I set up the configuration so that if I click "debug", it will automatically generate the jar package and automatically launch the task by executing the "spark-submit" command line?

    I just want everything to be as simple as "one click" on the debug button in IntelliJ IDEA.

    Thanks.

  • lserlohn
    lserlohn over 7 years
    Thanks. I created an sbt project and put the source file there. If I run it directly, it says "Exception in thread "main" java.lang.ClassNotFoundException: MySpark". Could you let me know exactly how I should set the parameters?
  • lserlohn
    lserlohn over 7 years
    Thanks for answering. How can I find the port and name of my local Spark?
  • lserlohn
    lserlohn over 7 years
    Is the port and name "localhost:7077"? I got "Error running Spark: Unable to open debugger port (localhost:7077): java.net.ConnectException "Connection refused""
  • Sandeep Purohit
    Sandeep Purohit over 7 years
    You should not use the Spark port; simply use the same port as the address in SPARK_SUBMIT_OPTS. In the SPARK_SUBMIT_OPTS above you can see address=7777, so set that port in the remote configuration and add a breakpoint in your code. Now run spark-submit; it shows a message that it is listening on port 7777. Then go to IntelliJ and start the debug configuration you created.
  • Krishna Pandey
    Krishna Pandey over 6 years
    I used the following to get it working: export SPARK_DAEMON_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=<port_no>
  • ecoe
    ecoe about 6 years
    This solution has a full step-by-step tutorial (with pictures) here: bigendiandata.com/…
  • ammills01
    ammills01 almost 6 years
    @lserlohn when you set up your SparkConf(), you have to set the master to "local[2]" before you pass your SparkConf into the SparkContext. If your SparkConf were named sparkConf, it would look like this: sparkConf.setMaster("local[2]"). This only applies to debugging through the IDE; if you leave it in by accident when you deploy your code to the server, it will not behave correctly.
  • Dims
    Dims about 3 years
    How do you ensure that the source code of the installed Spark is the same as the one used inside the JAR?