How to make shark/spark clear the cache?

Solution 1

To remove all cached data:

sqlContext.clearCache()

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

If you want to remove a specific DataFrame from the cache:

df.unpersist()
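
For context, here is a minimal sketch of both calls together (assuming Spark 1.6/2.x with an existing SQLContext named sqlContext; the dataset itself is just an illustration):

val df = sqlContext.range(0, 1000)   // any DataFrame/Dataset will do
df.cache()                           // mark df for caching
df.count()                           // run an action so the cache is actually populated
df.unpersist()                       // evict only this DataFrame
sqlContext.clearCache()              // evict everything cached through this context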

Solution 2

Are you using the cache() method to persist RDDs?

cache() just calls persist(), so to remove the cache for an RDD, call unpersist().
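
For example, a minimal sketch with a throwaway RDD (assuming an existing SparkContext named sc):

val rdd = sc.parallelize(1 to 100)
rdd.cache()                      // same as rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()                      // the first action populates the cache
rdd.unpersist(blocking = true)   // drop the cached blocks; blocking waits for removal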

Solution 3

This is weird. The question asked has nothing to do with the answers. The cache the OP posted is owned by the operating system, not by Spark. It is an optimization of the OS, and we shouldn't worry about that particular cache.

And Spark's cache usually lives in memory, but it shows up in the RSS (resident set size) of the JVM process, not in the OS's "cached" column.
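
(If the goal is just to reclaim the OS page cache without rebooting, Linux can drop it with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches, but that is an OS-level knob and normally unnecessary: the kernel hands this memory back automatically whenever applications ask for it.)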

Solution 4

I followed this approach and it worked fine for me:

for ((k, v) <- sc.getPersistentRDDs) {
  v.unpersist()
}

sc.getPersistentRDDs returns a Map of the RDDs currently marked as persistent, keyed by RDD id:

scala> sc.getPersistentRDDs
res48: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

(An empty Map here means nothing is currently cached.)
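
Two notes on this approach: unpersist() also takes a blocking flag, so v.unpersist(blocking = true) will wait until the blocks are actually removed; and on Spark 2.x with a SparkSession named spark, spark.catalog.clearCache() is an alternative that clears everything cached through the SQL layer.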

Comments

  • venkat
    venkat over 2 years

    When I run my Shark queries, memory gets hoarded in main memory. This is my top command result:


    Mem:  74237344k total, 70080492k used,  4156852k free,   399544k buffers
    Swap:  4194288k total,      480k used,  4193808k free, 65965904k cached


    This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

    Has anyone faced this issue before? Is it some configuration problem or a known issue in Spark/Shark?

    • fabrizioM
      fabrizioM about 10 years
      What version of Shark/Spark are you using?
    • Henrique Florencio
      Henrique Florencio about 6 years
      Possible duplicate of How to uncache RDD?