Optimized implementations of java.util.Map and java.util.Set?


Solution 1

Normally these methods are pretty quick. There are a couple of things you should check: have you implemented hashCode() on your keys? Is it sufficiently uniform? Otherwise you'll get rubbish performance.
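For illustration, here's a sketch of what "sufficiently uniform" means in practice, using a hypothetical two-field key (the class and fields are made up for this example): every significant field feeds the hash, so distinct instances spread evenly across buckets.

```java
// Hypothetical key class illustrating a uniform hashCode: each
// significant field is mixed in with a prime multiplier, so distinct
// instances spread evenly across hash buckets.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y; // uniform for typical coordinate ranges
    }
}
```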

http://trove4j.sourceforge.net/ <-- this is a bit quicker and saves some memory. I saved a few ms on 50,000 updates

Are you sure that you're using maps/sets correctly, i.e. not trying to iterate over all the values or something similar? Also, don't do a contains() followed by a remove(); just check the return value of remove().
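A minimal sketch of the contains-then-remove point: remove() already returns whether the element was present, so the extra contains() lookup buys nothing.

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveCheck {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("key");

        // Wasteful: two hash lookups where one would do.
        if (set.contains("key")) {
            set.remove("key");
        }

        set.add("key");

        // Better: remove() returns true if the element was present.
        boolean wasPresent = set.remove("key");
        System.out.println(wasPresent); // true
    }
}
```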

Also check whether you're using Double where double would do. I noticed a few ms of performance improvement over tens of thousands of checks.
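To illustrate the boxing cost (the numbers here are illustrative, not a benchmark): every operation on a Map with wrapper types allocates or unwraps objects, while a primitive array, or a primitive-keyed map from a library such as Trove, avoids them entirely.

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingExample {
    public static void main(String[] args) {
        // Boxed: every put() allocates wrapper objects, and every
        // lookup pays for unboxing and equals() calls on the wrappers.
        Map<Integer, Double> boxed = new HashMap<>();
        for (int i = 0; i < 10000; i++) {
            boxed.put(i, i * 0.5); // autoboxes both key and value
        }

        // Primitive: when keys are dense ints, a plain double[] avoids
        // the per-entry objects entirely.
        double[] primitive = new double[10000];
        for (int i = 0; i < primitive.length; i++) {
            primitive[i] = i * 0.5;
        }
    }
}
```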

Have you also set the initial capacity appropriately?
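On initial capacity: a HashMap rehashes every entry each time it outgrows its load factor, so pre-sizing for the expected entry count avoids all of that work. A sketch, assuming the default 0.75 load factor:

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityExample {
    public static void main(String[] args) {
        int expectedEntries = 50000;

        // Default: starts at capacity 16 and rehashes every entry each
        // time the table grows past its load factor.
        Map<Integer, String> resized = new HashMap<>();

        // Pre-sized: pick a capacity under which expectedEntries stays
        // below the default 0.75 load factor, so no rehash ever happens.
        int capacity = (int) (expectedEntries / 0.75f) + 1;
        Map<Integer, String> preSized = new HashMap<>(capacity);

        for (int i = 0; i < expectedEntries; i++) {
            preSized.put(i, "v" + i);
        }
    }
}
```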

Solution 2

Have you looked at Trove4J? From the website:

Trove aims to provide fast, lightweight implementations of the java.util.Collections API.

Benchmarks provided here.

Solution 3

Here are the ones I know, in addition to Google and Commons Collections:

Of course, you can always implement your own data structures, optimized for your use cases. To be able to help better, we would need to know your access patterns and what kind of data you store in the collections.

Solution 4

Try improving the performance of your equals() and hashCode() methods; this can speed up the standard containers' use of your objects.
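One common way to do that, if your keys are immutable, is to precompute the hash once in the constructor (a suggestion that also appears in the comments below). The class and fields here are purely illustrative:

```java
// Hypothetical immutable key that precomputes its hash once, so heavy
// get()/put()/contains() traffic never recomputes it.
public final class CachedKey {
    private final String name;
    private final long id;
    private final int hash; // safe to cache: the fields never change

    public CachedKey(String name, long id) {
        this.name = name;
        this.id = id;
        this.hash = 31 * name.hashCode() + (int) (id ^ (id >>> 32));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedKey)) return false;
        CachedKey k = (CachedKey) o;
        return id == k.id && name.equals(k.name);
    }

    @Override
    public int hashCode() {
        return hash; // constant-time, no per-call work
    }
}
```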

Solution 5

You can extend AbstractMap and/or AbstractSet as a starting point. I did this not too long ago to implement a binary-trie-based map (the key was an integer, and each "level" of the tree was a bit position: the left child was 0 and the right child was 1). This worked out well for us because the keys were EUI-64 identifiers, and most of the time the top 5 bytes were the same.

To implement an AbstractMap, you need at the very least to implement the entrySet() method, returning a set of Map.Entry objects, each of which is a key/value pair.

To implement a set, you extend AbstractSet and supply implementations of size() and iterator().

That's the bare minimum, however. You will also want to implement get and put, since the default map is unmodifiable, and the default implementation of get iterates through the entrySet looking for a match.
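Here is a minimal sketch of that recipe; it is not the trie from this answer, just an illustrative array-backed map showing the bare contract: entrySet() built on an AbstractSet with size() and iterator(), plus get() and put() overrides.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a tiny array-backed Map built on AbstractMap.
// Not the trie described above; it just shows the minimum contract:
// entrySet() (via an AbstractSet with size() and iterator()), plus
// get() and put() so the map is modifiable and lookups are direct.
public class ArrayBackedMap<K, V> extends AbstractMap<K, V> {
    private Object[] keys = new Object[8];
    private Object[] values = new Object[8];
    private int count = 0;

    @Override
    @SuppressWarnings("unchecked")
    public V get(Object key) {
        for (int i = 0; i < count; i++) {
            if (keys[i].equals(key)) {
                return (V) values[i];
            }
        }
        return null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        for (int i = 0; i < count; i++) {
            if (keys[i].equals(key)) {  // key already present: replace
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
        }
        if (count == keys.length) {     // grow when full
            keys = Arrays.copyOf(keys, count * 2);
            values = Arrays.copyOf(values, count * 2);
        }
        keys[count] = key;
        values[count] = value;
        count++;
        return null;
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override
            public int size() {
                return count;
            }

            @Override
            public Iterator<Map.Entry<K, V>> iterator() {
                return new Iterator<Map.Entry<K, V>>() {
                    private int i = 0;

                    @Override
                    public boolean hasNext() {
                        return i < count;
                    }

                    @Override
                    @SuppressWarnings("unchecked")
                    public Map.Entry<K, V> next() {
                        return new AbstractMap.SimpleEntry<K, V>(
                                (K) keys[i], (V) values[i++]);
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

With just those pieces, everything else (containsKey, size, toString, and so on) falls out of the AbstractMap defaults, which is the appeal of this starting point.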

Author: Sean Owen (Data Science @ Databricks)

Updated on June 16, 2022

Comments

  • Sean Owen almost 2 years

    I am writing an application where memory, and to a lesser extent speed, are vital. I have found from profiling that I spend a great deal of time in Map and Set operations. While I look at ways to call these methods less, I am wondering whether anyone out there has written, or come across, implementations that significantly improve on access time or memory overhead? Or, at least, implementations that can improve these things given some assumptions?

    From looking at the JDK source I can't believe that it can't be made faster or leaner.

    I am aware of Commons Collections, but I don't believe it has any implementation whose goal is to be faster or leaner. Same for Google Collections.

    Update: Should have noted that I do not need thread safety.

  • brady about 15 years
    Commons Collections doesn't support generics and is old. Google Collections has been through a lot of scrutiny by a lot of smart people. I'd look there first.
  • Sean Owen about 15 years
    I don't see how the nature of the objects changes how fast a Set or Map can look them up, or why a database would be leaner and faster than a Map implementation.
  • Sean Owen about 15 years
    Yeah, hashCode() is OK, as is equals(), and I'm not being too dumb (e.g. I use entrySet() where applicable). trove4j is a good lead.
  • Sean Owen about 15 years
    Yeah they are as fast as possible -- merely comparing / returning ints in my case. Good point though.
  • Sean Owen about 15 years
    May have to go that way -- was hoping to hear someone had nailed this already.
  • Sean Owen about 15 years
    Yeah good lead here but these implementations are trying to optimize away thread contention in a thread-safe implementation, in a mostly read-only environment. I should have noted I don't need thread-safety.
  • Sean Owen about 15 years
    Sorted trees should be slower for general lookup since their structure is oriented to maintaining ordering. Hash-based implementations ought to be O(1) in comparison. You are right to think about overhead in the data structures -- that is exactly what I am concerned about. Both TreeMap and HashMap use a Map.Entry object internally for each key. HashMap I suppose has a little more overhead due to empty hash table slots but it's minor. But yeah I want to avoid all those Map.Entry objects for instance.
  • Brett about 15 years
    More memory means harder worked cache. Implementation of equals and hashCode is also important. If equals has to chase down various data in different allocations of memory, that is going to be slow. If hashCode causes collisions that's going to be slow.
  • Sean Owen about 15 years
    Profiling shows significant time spent within methods of HashMap, HashSet, etc. Their absolute speed is irrelevant compared to the relative amount of time spent there. I can look at arrays and Map.Entry objects allocated from HashMap, for instance, to get a sense of the memory overhead of the data structure.
  • Neil Coffey about 15 years
    Nowadays, I'd really just use the concurrent collections introduced in Java 5.
  • Sean Owen about 15 years
    Yep, on Java 6, and passing -server for sure.
  • Neil Coffey about 15 years
    N.B. Strictly speaking, this example isn't linear probing. We actually allocate mini-lists at each "bucket"; it's just that those mini-lists are allocated from an array.
  • Egwor about 15 years
    Just a thought: have you considered making your objects immutable and then pre-computing the hash code?
  • maaartinus about 13 years
    "which you'd expect after hashing in the order of 2^32 or a couple of billion items if you have a good hash function" - No matter how good your hash function is, with about 64-bit long hashes and 2**32 keys conflicts are nearly sure. You'd need much less keys to make their probability lower. IMHO, it's unusable as a Map, but it may be good enough for caches.
  • Neil Coffey about 13 years
    @maaartinus That was my point: even with a theoretically "perfect" hash function, you would on average expect to have a collision after inserting 2^32 keys. And yes, that doesn't mean you should add 2^32 keys, because the chance of a collision is obviously not an all-or-nothing thing. However, it means that you can add a few million keys with the chance of a collision being negligible. For many applications, a few million keys is enough.