Which one is faster? List.contains() or Map.containsKey()

39,722

Solution 1

I later realized that I don't really use the values contained in my Map so a List will suffice.

Map isn't just a list of key-value pairs, it is a unique mapping from keys to values. So when you change from Map to List, you are allowing duplicates where you previously didn't. On the other hand, a Set is exactly a Map without the values. So consider using a HashSet.

As for the search complexities:

list.contains is O(n), hashSet.contains is O(1), and treeSet.contains is O(log n).

For general information on now HashMap works, google for "hashtable". For TreeMap, google for "binary tree" or similar. Wikipedia has good entries on these subjects.

Be careful, however, to avoid the class Hashtable. It's an archaeological artefact in the modern library. For your case HashSet is probably the best choice.

Solution 2

Mapand List are interfaces, so there is no information on their implementation nor their performance. But if you use the most current implementations (LinkedList or ArrayList for List, and HashMap for Map), the contains() method must, in the worst case, go through the entire list, and compare your element with each entry. It is an O(n) operation.

If you use an HashMap, the implementation is radically different : the HashMap contains an array with more entries than elements in it (in practice, you have an array size of between 4n/3 an 3n/2 for n elements in the map). It computes the hash of the key, which is an int, and wrap it between 0 and your array size (let's say this number is i). Then it will put the element at the index i of the array (or i+1, i+2… if previous indexes are already taken). So, when you check for the key presence with containsKey, it will re-compute the hash and the i value, and check the i, i+1… indexes until it finds an empty array cell. Theorically, you can have an O(n) worst-case, if the array is almost full, are all the keys have almost identicals i values, but with a good hash function, you have constant-time contains and get functions. (However, adding elements is fast if you don't need to resize the array, which is REALLY slow - I think you need to recompute the indexes of each key).

So a map is really faster if you need to check the key appearance in a collection, and do not need to keep the order (there is a SortedHashMap for that, but I don't know it's performance), but it will take more memory.

Also, if you don't need the key-value thing, you can use a HashSet (which is internally the same as an HashMap).

Solution 3

HashSet seems to be faster:

  • HashMap: 267
  • ArrayList: 2183
  • HashSet: 57

Also note that .contains() normally does not need to be called on HashMap and HashSet, but I kept it on the code to more accurately answer your question:

    long t = System.currentTimeMillis();
    HashMap<String, Boolean> map = new HashMap<>();
    for (int i = 0; i < 10000; i++) {
        String s = (Math.random() * 100) + "";
        if (!map.containsKey(s)) {
            map.put(s, true);
        }
    }
    System.out.println("HashMap: " + (System.currentTimeMillis() - t));

    t = System.currentTimeMillis();
    ArrayList<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        String s = (Math.random() * 100) + "";
        if (!list.contains(s)) {
            list.add(s);
        }
    }
    System.out.println("ArrayList: " + (System.currentTimeMillis() - t));

    t = System.currentTimeMillis();
    HashSet<String> set = new HashSet<>();
    for (int i = 0; i < 10000; i++) {
        String s = (Math.random() * 100) + "";
        if (!set.contains(s)) {
            set.add(s);
        }
    }
    System.out.println("HashSet: " + (System.currentTimeMillis() - t));
Share:
39,722
Adam Arold
Author by

Adam Arold

I do programming for a living. I like tinkering with stuff from bare metal to designing architectures. I also contribute to the Open Source community. Currently I'm based on the JVM (Kotlin/Clojure/Java) but I'm still looking for an Elixir (pun intended). Currently I'm hacking simulation-based games and world/flora/fauna/civilization generation (Dwarf Fortress style). If you think we can work together (programming/pixel art) drop me a mail. Check out my Hexagonal Grid library if interested: Hexameter If you need a multiplatform messaging bus: Riptide or a multiplatform terminal emulator for games: Zircon You can find my LinkedIn profile here I also created the unicorns tag on meta and #SOreadytohelp.

Updated on July 09, 2022

Comments

  • Adam Arold
    Adam Arold almost 2 years

    I'm writing an algorithm where I look for pairs of values which when added together results in another value I'm looking for.

    I figured out that using a Map will speed up my algorithm from O(n²). I later realized that I don't really use the values contained in my Map so a List will suffice.

    I did a power search on Google but I did not find any information on the asymptotic running time of those methods in the title of my question.

    Can you point out where should I look for such information?