How do I measure the runtime of code in Scala?


Solution 1

Based on the discussion here, you'll want to use System.nanoTime to measure elapsed time:

val t1 = System.nanoTime

/* your code */

val duration = (System.nanoTime - t1) / 1e9d // elapsed time in seconds, as a Double
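To avoid repeating the two `System.nanoTime` calls around every block, the same idea can be wrapped in a small helper. This is a minimal sketch; the names `Timing` and `timed` are hypothetical, not from any library. It returns the block's result together with the elapsed time in seconds as a `Double`:

```scala
object Timing {
  /** Evaluates `block` once and returns its result plus the elapsed time in seconds.
    * `block` is by-name, so it is evaluated inside this method, between the two clock reads. */
  def timed[T](block: => T): (T, Double) = {
    val start = System.nanoTime()
    val result = block
    val elapsedSeconds = (System.nanoTime() - start) / 1e9d // nanoseconds -> seconds (Double)
    (result, elapsedSeconds)
  }

  def main(args: Array[String]): Unit = {
    val (sum, seconds) = timed((1L to 1000000L).sum)
    println(f"sum = $sum, took $seconds%.6f s")
  }
}
```

Returning the result alongside the duration means the timed code does not have to be rewritten to store its value somewhere else.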

Solution 2

Starting with Spark 2.1.0, we can use spark.time(<command>) (available only in Scala so far) to get the time taken to execute an action or transformation.

Example:

Finding the count of records in a DataFrame:

scala> spark.time(
                 sc.parallelize(Seq("foo","bar")).toDF().count() //create df and count
                 )
Time taken: 54 ms //total time for the execution
res76: Long = 2  //count of records
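`spark.time` only times what its argument actually evaluates. Spark transformations are lazy: they describe work, and the computation runs when an action such as `count()` is called, so timing a transformation on its own does not include the computation itself. The same effect can be seen in plain Scala with a lazy collection view. This is only an analogy (no Spark involved), and `LazyDemo`, `transform` and `evaluations` are illustrative names:

```scala
object LazyDemo {
  var evaluations = 0 // counts how many times the mapping function has run

  /** Builds a lazy view; nothing is computed yet (analogous to a Spark transformation). */
  def transform(xs: Seq[Int]) = xs.view.map { x => evaluations += 1; x * 2 }

  def main(args: Array[String]): Unit = {
    val doubled = transform(1 to 5)
    println(s"after defining the view: $evaluations evaluations")

    // Forcing the view (analogous to a Spark action) is what runs the mapping function.
    val total = doubled.sum
    println(s"after sum: $evaluations evaluations, total = $total")
  }
}
```

Running `main` prints 0 evaluations after the view is defined and 5 evaluations (total 30) after `sum`, which is why the `spark.time` example above wraps the whole expression ending in `count()`.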

Solution 3

The most basic approach is to record the start and end times and subtract them.

val startTimeMillis = System.currentTimeMillis()

/* your code goes here */

val endTimeMillis = System.currentTimeMillis()
val durationSeconds = (endTimeMillis - startTimeMillis) / 1000.0 // divide by a Double to keep fractional seconds
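Because both timestamps are `Long`, the type of the divisor decides whether fractional seconds survive: dividing by the integer literal `1000` truncates to whole seconds, while `1000.0` yields a `Double`. A minimal illustration with a fixed, made-up duration of 1,500 ms (the object and method names are illustrative):

```scala
object MillisToSeconds {
  // Whole seconds: Long / Int is integer division, so the remainder is dropped.
  def wholeSeconds(durationMillis: Long): Long = durationMillis / 1000

  // Fractional seconds: dividing by a Double literal promotes the result to Double.
  def fractionalSeconds(durationMillis: Long): Double = durationMillis / 1000.0

  def main(args: Array[String]): Unit = {
    val durationMillis = 1500L // made-up example duration
    println(wholeSeconds(durationMillis))      // 1
    println(fractionalSeconds(durationMillis)) // 1.5
  }
}
```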

Solution 4

You can use scalameter: https://scalameter.github.io/

Just put your block of code in the braces:

import org.scalameter._

val executionTime = measure {
  // code goes here
}

You can configure it to warm up the JVM so that the measurements are more reliable:

val executionTime = withWarmer(new Warmer.Default) measure {
  //code goes here
}
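To see roughly what warming up buys you without adding a dependency, here is a hand-rolled sketch: run the block a few times untimed so the JIT has a chance to compile the hot path, then average several timed runs. This is not ScalaMeter's algorithm (its `Warmer.Default` is more elaborate), and `WarmupTiming`, `measureWithWarmup`, `warmupRuns` and `measuredRuns` are hypothetical names:

```scala
object WarmupTiming {
  /** Runs `block` `warmupRuns` times untimed, then returns the mean wall time in
    * milliseconds over `measuredRuns` timed runs. */
  def measureWithWarmup[T](warmupRuns: Int, measuredRuns: Int)(block: => T): Double = {
    require(measuredRuns > 0, "measuredRuns must be positive")
    for (_ <- 1 to warmupRuns) block // warm-up: results and timings are discarded
    val start = System.nanoTime()
    for (_ <- 1 to measuredRuns) block
    (System.nanoTime() - start) / 1e6d / measuredRuns // total ns -> ms, averaged per run
  }

  def main(args: Array[String]): Unit = {
    val ms = measureWithWarmup(warmupRuns = 10, measuredRuns = 20) {
      (1 to 100000).map(_ * 2).sum
    }
    println(f"average: $ms%.3f ms")
  }
}
```

Averaging over several runs smooths out one-off pauses such as garbage collection, which a single `System.nanoTime` measurement cannot do.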

Solution 5

  • Case: before Spark 2.1.0

Before Spark 2.1.0, you can define this function in your own code to measure time in milliseconds:

/**
 * Executes some code block and prints to stdout the time taken to execute the block. This is
 * available in Scala only and is used primarily for interactive testing and debugging.
 */
def time[T](f: => T): T = {
  val start = System.nanoTime()
  val ret = f
  val end = System.nanoTime()
  println(s"Time taken: ${(end - start) / 1000 / 1000} ms")
  ret
}

Usage:

  time {
    Seq("1", "2").toDS().count()
  }
//Time taken: 3104 ms
  • Case: Spark 2.1.0 and later

From Spark 2.1.0 onward, SparkSession has a built-in time function, so you can use spark.time.

Usage:

  spark.time {
    Seq("1", "2").toDS().count()
  }
//Time taken: 3104 ms
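The hand-written `time` helper for older Spark versions works because `f: => T` is a by-name parameter: the block is evaluated inside `time`, between the two `System.nanoTime()` calls. With an ordinary by-value parameter (`f: T`), the argument would be evaluated before `time` starts the clock, so nothing useful would be measured. A small sketch that records the order of events instead of timing them (the names are illustrative):

```scala
import scala.collection.mutable.ListBuffer

object ByNameVsByValue {
  val events = ListBuffer.empty[String]

  // By-name: `f` is evaluated where it is used, i.e. after "start" is recorded.
  def byName[T](f: => T): T = {
    events += "start"
    val ret = f
    events += "end"
    ret
  }

  // By-value: the argument is evaluated before the method body runs.
  def byValue[T](f: T): T = {
    events += "start"
    val ret = f
    events += "end"
    ret
  }

  def body(): Int = { events += "body"; 42 }

  def main(args: Array[String]): Unit = {
    byName(body())
    println(events.mkString(" -> ")) // start -> body -> end
    events.clear()
    byValue(body())
    println(events.mkString(" -> ")) // body -> start -> end
  }
}
```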


Author: David Rebe Garcia

Updated on May 20, 2020

Comments

  • David Rebe Garcia
    David Rebe Garcia almost 4 years

    I need to calculate the runtime of some code in Scala. The code is:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val data = sc.textFile("/home/david/Desktop/Datos Entrada/household/household90Parseado.txt")
    
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
    
    val numClusters = 5
    val numIterations = 10 
    val clusters = KMeans.train(parsedData, numClusters, numIterations)
    

    I need to know how long this code takes to run, in seconds.

  • evan.oman
    evan.oman almost 8 years
    I think I read that you want to use System.nanoTime rather than System.currentTimeMillis().
  • cduhn
    cduhn almost 8 years
    If he just wants to measure wall time to the nearest second, I personally wouldn't sweat it. Based on the question itself, I'm guessing the intent isn't to do intense, high-precision profiling.
  • evan.oman
    evan.oman almost 8 years
    From the answer: 'The purpose of nanoTime is to measure elapsed time, and the purpose of currentTimeMillis is to measure wall-clock time. You can't use the one for the other purpose ... You may say, "this doesn't sound like it would ever really matter that much," to which I say, maybe not, but overall, isn't correct code just better than incorrect code? Besides, nanoTime is shorter to type anyway.'
  • cduhn
    cduhn almost 8 years
    I appreciate the thought, but in the scenario described in the question, it really won't matter. If this question about nanoTime vs currentTimeMillis was asked by a junior developer sitting next to me, I'd tell him to not sweat it and worry about more important things.
  • evan.oman
    evan.oman almost 8 years
    Sure, there are more important things, but why not use the more correct version if there is absolutely no cost to switch to it?
  • S12000
    S12000 over 7 years
    Since the value is in nanoseconds, we need to divide it by 1,000,000,000. In 1e9d, the "d" stands for double. (My aim is just to clarify what 1e9d means.)
  • evan.oman
    evan.oman over 7 years
    Yep, 1e9d is 10^9 as a double. We want a double so that the result will be a double rather than a long (integer division vs double division)
  • evan.oman
    evan.oman over 7 years
    scala> 1 / 3 res0: Int = 0 vs scala> 1 / 3d res1: Double = 0.333...
  • S12000
    S12000 over 7 years
    Thanks evan058. That adds useful info.
  • Yeikel
    Yeikel over 5 years
    What about using TimeUnit, like TimeUnit.NANOSECONDS.toMinutes(total), instead of that manual division?
  • user3190018
    user3190018 almost 4 years
    Clear example for both older and newer versions of Spark with Scala.
  • panc
    panc over 3 years
    Can I use spark.time to time training a model and saving it?
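Regarding the TimeUnit suggestion in the comments: `java.util.concurrent.TimeUnit` works for this, with one caveat. Its conversion methods return a `long` and truncate when converting to a coarser unit, so `TimeUnit.NANOSECONDS.toSeconds` drops the fractional part. If fractional seconds are wanted, Scala's `scala.concurrent.duration.Duration` can convert to a `Double`. A sketch with a fixed, made-up elapsed time of 2.5 seconds (`ElapsedConversions` and its methods are illustrative names):

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.Duration

object ElapsedConversions {
  val elapsedNanos: Long = 2500000000L // made-up example: 2.5 s in nanoseconds

  // TimeUnit truncates when converting to a coarser unit.
  def wholeSeconds(nanos: Long): Long = TimeUnit.NANOSECONDS.toSeconds(nanos)

  // Duration.toUnit returns a Double, so the fraction is kept.
  def fractionalSeconds(nanos: Long): Double = Duration.fromNanos(nanos).toUnit(TimeUnit.SECONDS)

  def main(args: Array[String]): Unit = {
    println(wholeSeconds(elapsedNanos))                  // 2
    println(TimeUnit.NANOSECONDS.toMillis(elapsedNanos)) // 2500
    println(fractionalSeconds(elapsedNanos))             // 2.5
  }
}
```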
