Is it really my job to clean up ThreadLocal resources when classes have been exposed to a thread pool?

17,600

Solution 1

Sigh, this is old news

Well, a bit late to the party on this one. In October 2007, Josh Bloch (co-author of java.lang.ThreadLocal along with Doug Lea) wrote:

"The use of thread pools demands extreme care. Sloppy use of thread pools in combination with sloppy use of thread locals can cause unintended object retention, as has been noted in many places."

People were complaining about the bad interaction of ThreadLocal with thread pools even then. But Josh did sanction:

"Per-thread instances for performance. Aaron's SimpleDateFormat example (above) is one example of this pattern."

Some Lessons

  1. If you put any kind of objects into any object pool, you must provide a way to remove them 'later'.
  2. If you 'pool' using a ThreadLocal, you have limited options for doing that. Either: a) you know that the Thread(s) where you put values will terminate when your application is finished; OR b) you can later arrange for same thread that invoked ThreadLocal#set() to invoke ThreadLocal#remove() whenever your application terminates
  3. As such, your use of ThreadLocal as an object pool is going to exact a heavy price on the design of your application and your class. The benefits don't come for free.
  4. As such, use of ThreadLocal is probably a premature optimization, even though Joshua Bloch urged you to consider it in 'Effective Java'.

In short, deciding to use a ThreadLocal as a form of fast, uncontended access to "per thread instance pools" is not a decision to be taken lightly.

NOTE: There are other uses of ThreadLocal other than 'object pools', and these lessons do not apply to those scenarios where the ThreadLocal is only intended to be set on a temporary basis anyway, or where there is genuine per-thread state to keep track of.

Consequences for Library implementors

Threre are some consequences for library implementors (even where such libraries are simple utility classes in your project).

Either:

  1. You use ThreadLocal, fully aware that you might 'pollute' long-running threads with extra baggage. If you are implementing java.util.concurrent.ThreadLocalRandom, it might be appropriate. (Tomcat might still whinge at users of your library, if you aren't implementing in java.*). It's interesting to note the discipline with which java.* makes sparing use of the ThreadLocal technique.

OR

  1. You use ThreadLocal, and give clients of your class/package: a) the opportunity to choose to forego that optimization ("don't use ThreadLocal ... I can't arrange for cleanup"); AND b) a way to clean up ThreadLocal resources ("it's OK to use ThreadLocal ... I can arrange for all Threads which used you to invoke LibClass.releaseThreadLocalsForThread() when I am finished with them.

Makes your library 'hard to use properly', though.

OR

  1. You give your clients the opportunity to supply their own object-pool impelementation (which might use ThreadLocal, or synchronization of some sort). ("OK, I can give you a new ExpensiveObjectFactory<T>() { public T get() {...} } if you think it is really neccesasry".

Not so bad. If the object are really that important and that expensive to create, explicit pooling is probably worthwhile.

OR

  1. You decide it's not worth that much to your app anyway, and find a different way to approach the problem. Those expensive-to-create, mutable, non-threadsafe objects are causing you pain ... is using them really the best option anyway?

Alternatives

  1. Regular object pooling, with all its contended synchronization.
  2. Not pooling objects - just instantiate them in a local scope and discard later.
  3. Not pooling threads (unless you can schedule cleanup code when you like) - don't use your stuff in a JaveEE container
  4. Thread pools which are smart enough to clean-up ThreadLocals without whinging at you.
  5. Thread pools which allocate Threads on a 'per application' basis, and then let them die when the application is stopped.
  6. A protocol between thread-pooling containers and applications which allowed registration of a 'application shutdown handler', which the container could schedule to run on Threads which had been used to service the application ... at some point in the future when that Thread was next available. Eg. servletContext.addThreadCleanupHandler(new Handler() {@Override cleanup() {...}})

It'd be nice to see some standardisation around the last 3 items, in future JavaEE specs.

Bootnote

Actually, instantiation of a GregorianCalendar is pretty lightweight. It's the unavoidable call to setTime() which incurs most of the work. It also doesn't hold any significant state between different points of a thread's exeuction. Putting a Calendar into a ThreadLocal is unlikely to give you back more than it costs you ... unless profiling definitely shows a hot spot in new GregorianCalendar().

new SimpleDateFormat(String) is expensive by comparison, because it has to parse the format string. Once parsed, the 'state' of the object is significant to later uses by the same thread. It's a better fit. But it might still be 'less expensive' to instantiate a new one, than give your classes extra responsibilities.

Solution 2

Since the thread was not created by you, it only was rented by you, I think it is fair to require to clean it before stop using - just as you fills up the tank of a rented car when returning. Tomcat could just clean everything itself, but it does you a favor, reminding of forgot things.

ADD: The way you use prepared GregorianCalendar is simply wrong: since service requests can be concurrent, and there is no synchronization, doCalc can take getTime ater setTime invoked by another request. Introducing synchronization would make things slow, so that creating a new GregorianCalendar could be a better option.

In other words, your question should be: how to keep pool of prepared GregorianCalendar instances so that its number is adjusted to request rate. So as a minimum, you need a singleton which contains that pool. Each Ioc container has means to manage a singleton, and most have ready object pool implementations. If you do not yet use an IoC container, start to use one (String, Guice), rather than reinvent the wheel.

Solution 3

If its any help I use a custom SPI (an interface) and the JDK ServiceLoader. Then all of my various internal libraries (jars) that need to do unloading of threadlocals just follow the ServiceLoader pattern. So if a jar needs threadlocal cleanup it will automatically get picked if it has the appropriate /META-INF/services/interface.name.

Then I do the unloading in a filter or listener (I had some issues with listeners but I can't recall what).

It would be ideal if the the JDK/JEE came with a standard SPI for clearing threadlocals.

Solution 4

After thinking about this for a year, I've decided it is not acceptable for a JavaEE container to share pooled worker threads between instances of un-related applications. This is not 'enterprise' at all.

If you're really going to share threads around, java.lang.Thread (in a JavaEE environment, at least) should support methods like setContextState(int key) and forgetContextState(int key) (mirroring setClasLoaderContext()), which permit the container to isolate application-specific ThreadLocal state, as it hands the thread between various applications.

Pending such modifications in the java.lang namespace, it is only sensible for application deployers to adopt a 'one thread-pool, one instance of related applications' rule, and for application developers to assume that 'this thread is mine until ThreadDeath us do part'.

Share:
17,600

Related videos on Youtube

David Bullock
Author by

David Bullock

Updated on June 02, 2022

Comments

  • David Bullock
    David Bullock almost 2 years

    My use of ThreadLocal

    In my Java classes, I sometimes make use of a ThreadLocal mainly as a means of avoiding unnecessary object creation:

    @net.jcip.annotations.ThreadSafe
    public class DateSensitiveThing {
    
        private final Date then;
    
        public DateSensitiveThing(Date then) {
            this.then = then;
        }
    
        private static final ThreadLocal<Calendar> threadCal = new ThreadLocal<Calendar>()   {
            @Override
            protected Calendar initialValue() {
                return new GregorianCalendar();
            }
        };
    
        public Date doCalc(int n) {
            Calendar c = threadCal.get();
            c.setTime(this.then):
            // use n to mutate c
            return c.getTime();
        }
    }
    

    I do this for the proper reason - GregorianCalendar is one of those gloriously stateful, mutable, non-threadsafe objects, which provides a service across multiple calls, rather than representing a value. Further, it is considered to be 'expensive' to instantiate (whether this is true or not is not the point of this question). (Overall, I really admire it :-))

    How Tomcat Whinges

    However, if I use such a class in any environment which pools threads - and where my application is not in control of the lifecycle of those threads - then there is the potential for memory leaks. A Servlet environment is an good example.

    In fact, Tomcat 7 whinges like so when a webapp is stopped:

    SEVERE: The web application [] created a ThreadLocal with key of type [org.apache.xmlbeans.impl.store.CharUtil$1] (value [org.apache.xmlbeans.impl.store.CharUtil$1@2aace7a7]) and a value of type [java.lang.ref.SoftReference] (value [java.lang.ref.SoftReference@3d9c9ad4]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak. Dec 13, 2012 12:54:30 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks

    (Not even my code doing it, in that particular case).

    Who is to blame?

    This hardly seems fair. Tomcat is blaming me (or the user of my class) for doing the right thing.

    Ultimately, that's because Tomcat wants to reuse threads it offered to me, for other web apps. (Ugh - I feel dirty.) Probably, it's not a great policy on Tomcat's part - because threads actually do have/cause state - don't share 'em between applications.

    However, this policy is at least common, even if it is not desirable. I feel that I'm obliged - as a ThreadLocal user, to provide a way for my class to 'release' the resources which my class has attached to various threads.

    But what to do about it?

    What is the right thing to do here?

    To me, it seems like the servlet engine's thread-reuse policy is at odds with the intent behind ThreadLocal.

    But maybe I should provide a facility to allow users to say "begone, evil thread-specific state associated with this class, even though I am in no position to let the thread die and let GC do its thing?". Is it even possible for me to do this? I mean, it's not like I can arrange for ThreadLocal#remove() to be called on each of the Threads which saw ThreadLocal#initialValue() at some time in the past. Or is there another way?

    Or should I just say to my users "go and get yourself a decent classloader and thread pool implementation"?

    EDIT#1: Clarified how threadCal is used in a vanailla utility class which is unaware of thread lifecycles EDIT#2: Fixed a thread safety issue in DateSensitiveThing

    • schtever
      schtever over 11 years
    • David Bullock
      David Bullock over 11 years
      @schtever Erm, I know that one certainly would not use a ThreadLocal them to store per-request information in a Servlet. However, there are other reasons to use them, and they still interact poorly with servlet engines. My question is about what to do about it, not whether it should be done.
  • David Bullock
    David Bullock over 11 years
    Any ideas what my DateSensitiveThing could actually do to clean up?
  • David Bullock
    David Bullock over 11 years
    The afterExecution hook would make sense when I am in control of starting and stopping the Thread. But when I'm the tenant in a servlet container, I don't (nor should I) have any control over the lifecycle of threads in the pool. Code which uses a ThreadLocal presumes that the currently-executing thread is owned by the currently-executing application. In a servlet container, that's unfortunately not true.
  • David Bullock
    David Bullock over 11 years
    OK, I fixed the code so that it is really thread-safe now (multiple threads in doCalc are fine now). You have a good point though - I'm basically using 'ThreadLocal' as a simple (efficient, low-contention) object pool, but I'm not willing to 'pay the price' and manage the lifecycle of objects I'm putting into that pool.
  • Adam Gent
    Adam Gent about 9 years
    The only way you could achieve what you want is for Tomcat to have a separate thread pool for each web app. I don't know of a single servlet container that does this nor do I think its a good idea especially considering that hot deploy and multi-war deployments are generally dead (ie spring-boot and dropwizard uber jars are becoming the norm). Its also pretty much impossible in general because the request has to be initial processed to figure out which webapp (war) to dispatch to...
  • David Bullock
    David Bullock about 9 years
    @Adam Yep, the entire notion of hosting more than 1 app in an app-container is simply not something that Java EE is able to deliver with any kind of resource-usage-safety. (Which is a blight on the entire concept that Java EE was originally marketed with). Tomcat's approach (detecting the leak and bleeding the threadpool over time is about all a container can really do about it.
  • David Bullock
    David Bullock about 9 years
    In the end, I think the implications are that library developers should offer an API for cleaning up threads which were previously presented to the library but the user now wants 'cleaned up'. And future editions of the Java EE specification should provide that application containers should offer a shutdown hook such that 'all the threads previously presented to this app' are presented to the app itself for cleanup.
  • Adam Gent
    Adam Gent about 9 years
    I did exactly that for my own internal libs. See my answer: stackoverflow.com/a/28945239/318174
  • David Bullock
    David Bullock about 9 years
    OK, I understand how using ServiceLoader can help with obtaining a ThreadLocalScrubberService and if library authors committed to providing one, then framework authors could make use of it. (To that end, would you mind posting your service Interface?) What I don't understand is how you as an application author are able to coax the servlet container to schedule the cleanup work on the threads which which need cleaning? Or do you assiduously scrub after each HTTP request, condemning the library to re-initialize its ThreadLocals, essentially thwarting performance for safety sake?
  • Adam Gent
    Adam Gent about 9 years
    Bingo on the safetysake... If your using threadlocals for object pooling, cacheing or because your datastructure isn't threadsafe IMO your doing it wrong. You should stop using whatever library does that... with some exceptions being ThreadLocalRandom. We use ThreadLocals for passing context down the request or off the message bus. So the cleaning is really to prevent accidentally context reuse but also happens to prevent your problem.