Difference when serializing a lazy val with or without @transient

13,853

see here - http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/

In Scala lazy val denotes a field that will only be calculated once it is accessed for the first time and is then stored for future reference. With @transient on the other hand one can denote a field that shall not be serialized.

Share:
13,853

Related videos on Youtube

Hao Ren
Author by

Hao Ren

Data Engineer @ leboncoin.fr Paris, France

Updated on May 31, 2020

Comments

  • Hao Ren
    Hao Ren about 4 years

    Working on spark, sometimes I need to send a non-serializable object in each task.

    A common pattern is @transient lazy val, e.g

    class A(val a: Int)
    
    def compute(rdd: RDD[Int]) = {
      // lazy val instance = {
      @transient lazy val instance = {
        println("in lazy object")
        new A(1)
      }
      val res = rdd.map(instance.a + _).count()
      println(res)
    }
    
    compute(sc.makeRDD(1 to 100, 8))
    

    I found that @transient is not necessary here. lazy val can already create the non-serializable upon each task is executed. But people suggest using @transient.

    1. What is the advantage, if we set @transient on the non-initialized lazy val when serializing it ?

    2. Does it make sense to make a non-initialized val transient for serialization, knowing that nothing will be serialized, just like in the example above ?

    3. How is a @transient lazy val serialized ? Is it treated as a method or something else ?

    Some details on serializing @transient lazy val and the compiled java bytecode is awesome.