How to make shark/spark clear the cache?

Solution 1

To remove all cached data:

sqlContext.clearCache()

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

If you want to remove a specific DataFrame from the cache:

df.unpersist()
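
For context, here is a minimal sketch of both calls together (assuming Spark 1.6/2.x with an existing SQLContext named sqlContext; the dataset itself is just an illustration):

val df = sqlContext.range(0, 1000)   // any DataFrame/Dataset will do
df.cache()                           // mark df for caching
df.count()                           // run an action so the cache is actually populated
df.unpersist()                       // evict only this DataFrame
sqlContext.clearCache()              // evict everything cached through this context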

Solution 2

Are you using the cache() method to persist RDDs?

cache() just calls persist(), so to remove the cache for an RDD, call unpersist().
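
For example, a minimal sketch with a throwaway RDD (assuming an existing SparkContext named sc):

val rdd = sc.parallelize(1 to 100)
rdd.cache()                      // same as rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()                      // the first action populates the cache
rdd.unpersist(blocking = true)   // drop the cached blocks; blocking waits for removal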

Solution 3

This is weird. The question asked has nothing to do with the answers. The cache the OP posted is owned by the operating system, not by Spark. It is an optimization of the OS, and we shouldn't worry about that particular cache.

And Spark's cache usually lives in memory, but it shows up in the RSS (resident set size) of the JVM process, not in the OS's "cached" column.
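
(If the goal is just to reclaim the OS page cache without rebooting, Linux can drop it with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches, but that is an OS-level knob and normally unnecessary: the kernel hands this memory back automatically whenever applications ask for it.)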

Solution 4

I followed this approach and it worked fine for me:

for ((k, v) <- sc.getPersistentRDDs) {
  v.unpersist()
}

sc.getPersistentRDDs returns a Map of the RDDs currently marked as persistent, keyed by RDD id:

scala> sc.getPersistentRDDs
res48: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

(An empty Map here means nothing is currently cached.)
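
Two notes on this approach: unpersist() also takes a blocking flag, so v.unpersist(blocking = true) will wait until the blocks are actually removed; and on Spark 2.x with a SparkSession named spark, spark.catalog.clearCache() is an alternative that clears everything cached through the SQL layer.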

Comments

  • venkat
    venkat over 2 years

    When I run my Shark queries, memory gets hoarded in main memory. This is my top command result:


    Mem:  74237344k total, 70080492k used,  4156852k free,   399544k buffers
    Swap:  4194288k total,      480k used,  4193808k free, 65965904k cached


    This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

    Has anyone faced this issue before? Is it some configuration problem or a known issue in Spark/Shark?

    • fabrizioM
      fabrizioM about 10 years
      What version of Shark/Spark are you using?
    • Henrique Florencio
      Henrique Florencio about 6 years
      Possible duplicate of How to uncache RDD?