Synchronizing on String objects in Java

53,945

Solution 1

Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings:

final String firstkey = "Data-" + email;
final String key = firstkey.intern();

Two Strings with the same value are otherwise not necessarily the same object.

Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.

I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.

Response to question update:

I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.

Edit

Others have pointed out that synchronizing on intern strings is actually a really bad idea - partly because creating intern strings is permitted to cause them to exist in perpetuity, and partly because if more than one bit of code anywhere in your program synchronizes on intern strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.

Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.

Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:

  • If the CPU is busy then this approach may not be sufficient and you need another approach.
  • If the CPU is not busy, and access to server is not a bottleneck, then this approach is overkill, and you might as well forget both this and per-key locking, put a big synchronized(StaticCache) around the whole operation, and do it the easy way.

Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.

This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.

IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.

// do not attempt double-check locking here. I mean it.
synchronized(StaticObject) {
    data = StaticCache.get(key);
    while (data == IN_PROGRESS) {
        // another thread is getting the data
        StaticObject.wait();
        data = StaticCache.get(key);
    }
    if (data == null) {
        // we must get the data
        StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);
    }
}
if (data == null) {
    // we must get the data
    try {
        data = server.DoSlowThing(key);
    } finally {
        synchronized(StaticObject) {
            // WARNING: failure here is fatal, and must be allowed to terminate
            // the app or else waiters will be left forever. Choose a suitable
            // collection type in which replacing the value for a key is guaranteed.
            StaticCache.put(key, data, CURRENT_TIME);
            StaticObject.notifyAll();
        }
    }
}

Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.

This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.

Solution 2

Synchronizing on an intern'd String might not be a good idea at all - by interning it, the String turns into a global object, and if you synchronize on the same interned strings in different parts of your application, you might get really weird and basically undebuggable synchronization issues such as deadlocks. It might seem unlikely, but when it happens you are really screwed. As a general rule, only ever synchronize on a local object where you're absolutely sure that no code outside of your module might lock it.

In your case, you can use a synchronized hashtable to store locking objects for your keys.

E.g.:

Object data = StaticCache.get(key, ...);
if (data == null) {
  Object lock = lockTable.get(key);
  if (lock == null) {
    // we're the only one looking for this
    lock = new Object();
    synchronized(lock) {
      lockTable.put(key, lock);
      // get stuff
      lockTable.remove(key);
    }
  } else {
    synchronized(lock) {
      // just to wait for the updater
    }
    data = StaticCache.get(key);
  }
} else {
  // use from cache
}

This code has a race condition, where two threads might put an object into the lock table after each other. This should however not be a problem, because then you only have one more thread calling the webservice and updating the cache, which shouldn't be a problem.

If you're invalidating the cache after some time, you should check whether data is null again after retrieving it from the cache, in the lock != null case.

Alternatively, and much easier, you can make the whole cache lookup method ("getSomeDataByEmail") synchronized. This will mean that all threads have to synchronize when they access the cache, which might be a performance problem. But as always, try this simple solution first and see if it's really a problem! In many cases it should not be, as you probably spend much more time processing the result than synchronizing.

Solution 3

Strings are not good candidates for synchronization. If you must synchronize on a String ID, it can be done by using the string to create a mutex (see "synchronizing on an ID"). Whether the cost of that algorithm is worth it depends on whether invoking your service involves any significant I/O.

Also:

  • I hope the StaticCache.get() and set() methods are threadsafe.
  • String.intern() comes at a cost (one that varies between VM implementations) and should be used with care.

Solution 4

Here is a safe short Java 8 solution that uses a map of dedicated lock objects for synchronization:

private static final Map<String, Object> keyLocks = new ConcurrentHashMap<>();

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    final String key = "Data-" + email;
    synchronized (keyLocks.computeIfAbsent(key, k -> new Object())) {
        SomeData[] data = StaticCache.get(key);
        if (data == null) {
            data = service.getSomeDataForEmail(email);
            StaticCache.set(key, data);
        }
    }
    return data;
}

It has a drawback that keys and lock objects would retain in map forever.

This can be worked around like this:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    final String key = "Data-" + email;
    synchronized (keyLocks.computeIfAbsent(key, k -> new Object())) {
        try {
            SomeData[] data = StaticCache.get(key);
            if (data == null) {
                data = service.getSomeDataForEmail(email);
                StaticCache.set(key, data);
            }
        } finally {
            keyLocks.remove(key); // vulnerable to race-conditions
        }
    }
    return data;
}

But then popular keys would be constantly reinserted in map with lock objects being reallocated.

Update: And this leaves race condition possibility when two threads would concurrently enter synchronized section for the same key but with different locks.

So it may be more safe and efficient to use expiring Guava Cache:

private static final LoadingCache<String, Object> keyLocks = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES) // max lock time ever expected
        .build(CacheLoader.from(Object::new));

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    final String key = "Data-" + email;
    synchronized (keyLocks.getUnchecked(key)) {
        SomeData[] data = StaticCache.get(key);
        if (data == null) {
            data = service.getSomeDataForEmail(email);
            StaticCache.set(key, data);
        }
    }
    return data;
}

Note that it's assumed here that StaticCache is thread-safe and wouldn't suffer from concurrent reads and writes for different keys.

Solution 5

Others have suggested interning the strings, and that will work.

The problem is that Java has to keep interned strings around. I was told it does this even if you're not holding a reference because the value needs to be the same the next time someone uses that string. This means interning all the strings may start eating up memory, which with the load you're describing could be a big problem.

I have seen two solutions to this:

You could synchronize on another object

Instead of the email, make an object that holds the email (say the User object) that holds the value of email as a variable. If you already have another object that represents the person (say you already pulled something from the DB based on their email) you could use that. By implementing the equals method and the hashcode method you can make sure Java considers the objects the same when you do a static cache.contains() to find out if the data is already in the cache (you'll have to synchronize on the cache).

Actually, you could keep a second Map for objects to lock on. Something like this:

Map<String, Object> emailLocks = new HashMap<String, Object>();

Object lock = null;

synchronized (emailLocks) {
    lock = emailLocks.get(emailAddress);

    if (lock == null) {
        lock = new Object();
        emailLocks.put(emailAddress, lock);
    }
}

synchronized (lock) {
    // See if this email is in the cache
    // If so, serve that
    // If not, generate the data

    // Since each of this person's threads synchronizes on this, they won't run
    // over eachother. Since this lock is only for this person, it won't effect
    // other people. The other synchronized block (on emailLocks) is small enough
    // it shouldn't cause a performance problem.
}

This will prevent 15 fetches on the same email address at one. You'll need something to prevent too many entries from ending up in the emailLocks map. Using LRUMaps from Apache Commons would do it.

This will need some tweaking, but it may solve your problem.

Use a different key

If you are willing to put up with possible errors (I don't know how important this is) you could use the hashcode of the String as the key. ints don't need to be interned.

Summary

I hope this helps. Threading is fun, isn't it? You could also use the session to set a value meaning "I'm already working on finding this" and check that to see if the second (third, Nth) thread needs to attempt to create the or just wait for the result to show up in the cache. I guess I had three suggestions.

Share:
53,945
matt b
Author by

matt b

Hello, world!

Updated on October 07, 2020

Comments

  • matt b
    matt b over 3 years

    I have a webapp that I am in the middle of doing some load/performance testing on, particularily on a feature where we expect a few hundred users to be accessing the same page and hitting refresh about every 10 seconds on this page. One area of improvement that we found we could make with this function was to cache the responses from the web service for some period of time, since the data is not changing.

    After implementing this basic caching, in some further testing I found out that I didn't consider how concurrent threads could access the Cache at the same time. I found that within the matter of ~100ms, about 50 threads were trying to fetch the object from the Cache, finding that it had expired, hitting the web service to fetch the data, and then putting the object back in the cache.

    The original code looked something like this:

    private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    
      final String key = "Data-" + email;
      SomeData[] data = (SomeData[]) StaticCache.get(key);
    
      if (data == null) {
          data = service.getSomeDataForEmail(email);
    
          StaticCache.set(key, data, CACHE_TIME);
      }
      else {
          logger.debug("getSomeDataForEmail: using cached object");
      }
    
      return data;
    }
    

    So, to make sure that only one thread was calling the web service when the object at key expired, I thought I needed to synchronize the Cache get/set operation, and it seemed like using the cache key would be a good candidate for an object to synchronize on (this way, calls to this method for email [email protected] would not be blocked by method calls to [email protected]).

    I updated the method to look like this:

    private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    
      
      SomeData[] data = null;
      final String key = "Data-" + email;
      
      synchronized(key) {      
        data =(SomeData[]) StaticCache.get(key);
    
        if (data == null) {
            data = service.getSomeDataForEmail(email);
            StaticCache.set(key, data, CACHE_TIME);
        }
        else {
          logger.debug("getSomeDataForEmail: using cached object");
        }
      }
    
      return data;
    }
    

    I also added logging lines for things like "before synchronization block", "inside synchronization block", "about to leave synchronization block", and "after synchronization block", so I could determine if I was effectively synchronizing the get/set operation.

    However it doesn't seem like this has worked. My test logs have output like:

    (log output is 'threadname' 'logger name' 'message')  
    http-80-Processor253 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
    http-80-Processor253 jsp.view-page - getSomeDataForEmail: inside synchronization block  
    http-80-Processor253 cache.StaticCache - get: object at key [[email protected]] has expired  
    http-80-Processor253 cache.StaticCache - get: key [[email protected]] returning value [null]  
    http-80-Processor263 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
    http-80-Processor263 jsp.view-page - getSomeDataForEmail: inside synchronization block  
    http-80-Processor263 cache.StaticCache - get: object at key [[email protected]] has expired  
    http-80-Processor263 cache.StaticCache - get: key [[email protected]] returning value [null]  
    http-80-Processor131 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
    http-80-Processor131 jsp.view-page - getSomeDataForEmail: inside synchronization block  
    http-80-Processor131 cache.StaticCache - get: object at key [[email protected]] has expired  
    http-80-Processor131 cache.StaticCache - get: key [[email protected]] returning value [null]  
    http-80-Processor104 jsp.view-page - getSomeDataForEmail: inside synchronization block  
    http-80-Processor104 cache.StaticCache - get: object at key [[email protected]] has expired  
    http-80-Processor104 cache.StaticCache - get: key [[email protected]] returning value [null]  
    http-80-Processor252 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
    http-80-Processor283 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
    http-80-Processor2 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
    http-80-Processor2 jsp.view-page - getSomeDataForEmail: inside synchronization block  
    

    I wanted to see only one thread at a time entering/exiting the synchronization block around the get/set operations.

    Is there an issue in synchronizing on String objects? I thought the cache-key would be a good choice as it is unique to the operation, and even though the final String key is declared within the method, I was thinking that each thread would be getting a reference to the same object and therefore would synchronization on this single object.

    What am I doing wrong here?

    Update: after looking further at the logs, it seems like methods with the same synchronization logic where the key is always the same, such as

    final String key = "blah";
    ...
    synchronized(key) { ...
    

    do not exhibit the same concurrency problem - only one thread at a time is entering the block.

    Update 2: Thanks to everyone for the help! I accepted the first answer about intern()ing Strings, which solved my initial problem - where multiple threads were entering synchronized blocks where I thought they shouldn't, because the key's had the same value.

    As others have pointed out, using intern() for such a purpose and synchronizing on those Strings does indeed turn out to be a bad idea - when running JMeter tests against the webapp to simulate the expected load, I saw the used heap size grow to almost 1GB in just under 20 minutes.

    Currently I'm using the simple solution of just synchronizing the entire method - but I really like the code samples provided by martinprobst and MBCook, but since I have about 7 similar getData() methods in this class currently (since it needs about 7 different pieces of data from a web service), I didn't want to add almost-duplicate logic about getting and releasing locks to each method. But this is definitely very, very valuable info for future usage. I think these are ultimately the correct answers on how best to make an operation like this thread-safe, and I'd give out more votes to these answers if I could!

  • Dave Costa
    Dave Costa over 15 years
    quite right, and the reason it works when you have a constant key is that string literals are automatically intern'd.
  • Steve Jessop
    Steve Jessop over 15 years
    Ah, that answers my vague 'I can't remember' ramble. Thanks.
  • matt b
    matt b over 15 years
    Question - do you mean if I synchronize on the SAME interned string elsewhere in the application?
  • Tim Frey
    Tim Frey over 15 years
    Agreed. Props for pointing out the problem and providing a pretty good solution. Maybe edit this response to include actual code to seal the deal?
  • Steve Jessop
    Steve Jessop over 15 years
    Would it be theft if I incorporated this into my answer, to give a single complete criticism? You deserve the rep points, though, for the most advanced answer, so feel free to turn yours into the all-singing all-dancing.
  • McDowell
    McDowell over 15 years
    Your emailLocks leaks - this approach needs some sort of reference count and release mechanism.
  • McDowell
    McDowell over 15 years
    Calling String.intern() on a web server where the String cache might live indefinitely is not a good idea. Not saying this post is incorrect, but it is not a good solution to the original question.
  • Steve Jessop
    Steve Jessop over 15 years
    "ints don't need to be interned" - but don't have monitors. Your summary hints at the exciting world of wait/notifyAll, which is always a laugh :-)
  • Steve Jessop
    Steve Jessop over 15 years
    @McDowell: I think a WeakHashMap might actually be appropriate here. However, that's such a rare occurrence than I'm hesitant to say any more...
  • Steve Jessop
    Steve Jessop over 15 years
    I agree. I'll mention it, and I've asked martinprobst below whether he wants to do anything about merging the various relevant points.
  • Alexander
    Alexander over 15 years
    You're right, I forgot about ConcurrentHashmap! However, judging by the update of the question, StaticCache doesn't really support concurrent modification.
  • Tim Frey
    Tim Frey over 15 years
    The example here is too compicated. I think it should just show the Map of String -> Mutex, fetching the mutex and synchronizing on it. Not sure what this StaticCache thing is but I don't think it's needed to illustrate the point.
  • palantus
    palantus over 15 years
    The StaticCache is from the question.
  • Jedihomer Townend
    Jedihomer Townend over 15 years
    @McDowell: That's why I said the LRUMap in the comments, it would prevent you from ever having more than say 30 entries. You could also setup a thread to do it.
  • Jedihomer Townend
    Jedihomer Townend over 15 years
    @onebyone: Good point. Integers do, but an Integer doesn't equal an Integer like int, you have to use .intValue() or .equals(). LIke I said, they were quick suggestions. I would have caught that when I noticed the code didn't work :)
  • Steve Jessop
    Steve Jessop over 15 years
    I think I've worked up your third suggestion into a solution for medium-load applications. The code is astonishingly similar to the time I once implemented pthread_once() in C in an embedded implementation of part of POSIX. So I sincerely hope it's not buggy...
  • Steve Jessop
    Steve Jessop over 15 years
    At least, I hope it's similar, because any differences are almost certainly bugs in what I've posted above. It's supposed to be a bog-standard condition variable pattern.
  • Martin Probst
    Martin Probst over 15 years
    I think this is a very nice solution. From an algorithmic standpoint, it's quite similar to the table of locks method, the difference being that you use the static cache as the lock-table, and do not actually store locks but just the flag. So, this is the best solution.
  • Jared
    Jared over 15 years
    Just a note to indicate that synchronizing on a string is ALWAYS a bad idea: there is nothing keeping some random third-party jar on your classpath from synchronizing on the same interned String. That will ALWAYS yield undesired behavior.
  • Steve Jessop
    Steve Jessop about 15 years
    Well, you could not put random third-party jars on your classpath ;-). Agreed, though. My initial answer to the question was of the kind "answer the actual question, but fail to spot a terrible mistake being committed". My code solution avoids the problems.
  • Holger
    Holger over 7 years
    interned strings don’t live forever; that’s a tenacious myth. They get garbage collected just like any other string. However, it’s still not a good idea. As correctly pointed out, the global visibility opens the door to hell. Further, the intern() operation doesn’t come for free, its required global cache synchronization might destroy any benefit that the idea of using (different) strings as synchronization key promised.
  • Steve Jessop
    Steve Jessop over 7 years
    @Holger: since I wrote, "creating intern strings is permitted to cause them to exist in perpetuity", I deduce that my thinking at the time was that it's not guaranteed by the VM spec that interned strings are GCed normally. By all means a particular implementation can do so, and probably should.
  • Holger
    Holger over 7 years
    @Steve Jessop: I would see it the other way round, as long as there is no explicit statement about it in the specification, there is no reason to think that certain objects are allowed to be exempt from garbage collection. But what’s more important, I think, it’s dangerous to name something as first (or main) reason that isn’t an issue on common/real life JVMs. The other reasons are, JVM independently, a thing…
  • Krease
    Krease over 7 years
    Liking this solution a lot - no need to keep a separate in-progress sentinel or separate map of keys to locks - it either contains the Future (another thread is working on it, just call get), or it doesn't (no other threads working on it, do the expensive calculation)
  • Vadzim
    Vadzim over 6 years
    This has the same drawbacks as String.intern with higher probability of getting into trouble.
  • Vadzim
    Vadzim over 6 years
    ConcurrentHashMap can be used for map of locks to avoid the first synchronized section like in my answer.
  • Vadzim
    Vadzim over 6 years
    Take a look at properly synchronized lock table in my answer.
  • Kimball Robinson
    Kimball Robinson about 6 years
    I recommend this answer, using Google Guava's Striped<Lock> type: stackoverflow.com/a/11125602/116810
  • Eric B.
    Eric B. over 5 years
    I'm very interested by your ValueLock object, but confused by it. The way I read it, if Thread 2 calls` lock()` while Thread1 is already doing something, it will wait at the conditions.awaitUninteruptibly(). Which will block T1 from calling the unlock() method since the ReentrantLock is being held by T2 at that point. Am I mis-reading something?
  • igor.zh
    igor.zh over 5 years
    @Eric B, Neither of Threads 1 or 2 of yours holds an inner (Reentrant) lock except for very short period of time while operating Value-to-Condition map. When Thread 1 holds an Value lock it means it holds a lock on certain Value-bound Condition, not on inner (Reentrant) Lock. When Thread 2 awaitsUninterruptibly on this Value-bound Condition it actually releases the inner(Reentrant) Lock. Until it acquires this Condition, then acquires an inner (Reentrant) Lock, puts the Condition into map and immediately releases inner (Reentrant) Lock. Please let me know if I will have to rephrase it
  • knittl
    knittl over 5 years
    You could in theory use the calculated key as lock-object, then you avoid re-allocations of new objects (the key needs to be calculated either way). If a ConcurrentHashMap is used for keyLocks that shouldn't leave room for race conditions? computeIfAbsent would only insert the first one, later attempts would return the identical instance?
  • Vadzim
    Vadzim over 5 years
    @knittl, yes, but it would not be reliable without intern(). And it would suffer of all downsides mentioned in Martin Probst's answer here.
  • knittl
    knittl over 5 years
    why would it not be reliable without interning the string? As far as I see all synchronize block would use the same string instance (because it is put in the map and reused). The string is local, because it is concatenated inside the method.
  • Vadzim
    Vadzim over 5 years
    @knittl, there is no guarantee that subsequent concatenations would return the same object. And there is no guarantee that concatenated string would not be the same as interned instance that may be synchronized in some unrelated place elsewhere.
  • knittl
    knittl over 5 years
    You don't need identical objects. You are using the concatenated strings as key in the hashmap. They provide hash and value equality, so you will find the correct value for equal keys. But having strings auto-interned is a good point! (I doubt it though, because the emitted byte code is usually a regular string builder (or fancy invokedynamic code in newer JDK versions))
  • Vadzim
    Vadzim over 5 years
    @knittl, now I get the point of the first comment of using keyLocks.computeIfAbsent(key, key). But I think that my second snippet would be still vulnerable to race conditions as other thread's keyLocks.remove may sneak between computeIfAbsent and containing synchronized.
  • Bret Royster
    Bret Royster almost 4 years