Using char[] array in HashMap in java

11,877

Solution 1

I have a huge GBs of data to be read and if I use String, I am running out of the heap size.

You will have to think of a different way then.

  • You could use consider using a TreeMap<char[], V> with a custom Comparator ... but this is a stop-gap measure.

  • You could implement a custom hash table-based Map class that used char[] as the key, but did the array hashing and equality tests without using the key object's equals(Object) and hashcode() methods. This also is a stop-gap measure.

  • You could get a bigger machine ... another stop-gap measure.

  • You could redesign the algorithm so that it doesn't have to put all of the data into a big in-memory hash table in the address space of one Java program.

What you are currently doing doesn't scale. Even if you manage to implement a map using char[] keys instead of String keys, you probably only reduce the space used to hold the keys by half. The best this is going to give you is the ability to handle roughly twice the number of keys you can currently handle. Then you hit the wall again.

In the long term, the last option is the best if you want to keep scaling up.


Incidentally @Sean Patrick Floyd's suggestion of interning the key strings will probably only make matters worse. First, you won't ever get Strings that are equal but not == as keys in the same map. So interning doesn't save anything. Second, interning is performed using a private hash table, and the JVM needs to allocate space to represent this table.

The only scenario where interning is potentially worthwhile is if the strings that you are using to to do map lookups are likely to survive multiple GC cycles. Only then could interning save space.


Finally, there's one scenario which may cause your key strings to use far more memory than you realize. Consider this:

BufferedReader br = ...
Map<String, Value> map = new HashMap<String, Value>();

String line;
while ((line = br.readLine()) != null) {
    ...
    String key = line.substring(...);
    map.put(key, ...);
}

The problem is that the substring method shares the same backing char[] as the original String. If key is long-lived (which it probably will be), this means that the original large backing array will also be long lived, even though we're only ever going to refer to a slice of that array it via the key object.

The solution is to write this:

    String key = new String(line.substring(...));

which forces the copying of the characters into a new (smaller) character array.

UPDATE - Changes to the implementation of java.lang.String in Java 7 addressed this problem. The substring methods now make a copy of the relevant slice of the backing array.

Solution 2

Printing is the least of your poblems, really. The real problem is that char[] does not have meaningful implementation of the equals() and hashCode() methods that HashMap relies on. Therefore the map will not recognize different arrays with the same content as the same keys, which in all probablity defeats your purpose of using a hash map in the first place.

So you need to wrap your arrays in something that does have equals() and hashCode(). May I suggest wrapping them in java.lang.String?

Solution 3

That's because arrays don't have a toString() method. What you are seeing is the Output of Object.toString():

return getClass().getName() + "@" + Integer.toHexString(hashCode());

In your case:

[C     = internal name of char[].class
@      = "@"
31ff23 = your array's identity hash code as hey String
=1     = the value

BTW, they don't have a hashCode() implementation either, so they make bad keys for HashMaps (see this similar question about using byte arrays as HashMap keys).

Solution 4

char[]'s .toString() method (which is used when printing map contents) is not implemented to display the contents of the array - it uses the default toString() implementation which includes the class name ([C in this case) followed by the hashcode, which is typically the (hex) memory address in the JVM

You shouldn't use arrays as keys in HashMap, because they don't implement hashCode() and equals() based on the elements. Prefer String.

Share:
11,877
vamsi360
Author by

vamsi360

Engineer

Updated on June 19, 2022

Comments

  • vamsi360
    vamsi360 about 2 years

    I am using a HashMap with a char array as the key. But when I put the key, value into the HashMap and print it the value printed as the key is some strange value. If I use a string instead, it is working fine. Please let me know how to make this work for char[] values.

                while((ch = br.read()) != -1) {
                s = new char[10];
                s[i++] = (char)ch;
    
                while( (i < 10) && ((ch = br.read()) != -1)) {
                    s[i++] = (char)ch;                  
                }
    
                //System.out.println(s);
                hmap.put(s, 1); 
                //System.out.println(hmap);
                i = 0;              
            }
    

    Key: BZOUA1578L Value: 1

    The contents in the hashmap are {[C@31ff23=1} instead of {BZOUA1578L, 1}

  • jmj
    jmj almost 13 years
    char[] is an Object which has .toString() but not implemented to show the content as OP is expecting
  • vamsi360
    vamsi360 almost 13 years
    I have a huge GBs of data to be read and if I use String, I am running out of the heap size.
  • vamsi360
    vamsi360 almost 13 years
    But if I have a string of length 10 and if I use a string, it is not that efficient.
  • Sean Patrick Floyd
    Sean Patrick Floyd almost 13 years
    @Subhash ok, but why do you need to use them as HashMap keys in the first place? Perhaps you could say what you are doing, so we can provide an alternative strategy.
  • vamsi360
    vamsi360 almost 13 years
    I am having a fixed length string of 10 chars and I have lots of records of them. If I use a string, I am unable to process all of those as I am running out of heap space. I used a String Buffer to manipulate the data on each iteration but it prints the same values as the char array.
  • Bozho
    Bozho almost 13 years
    why do you think it is not efficient? I can assure you it is. The string takes up almost as much memory as the array. If you need more, increase it with -Xmx
  • Sean Patrick Floyd
    Sean Patrick Floyd almost 13 years
    @vamsi360 try interning your Strings, that way the non-interned Strings can be garbage collected: map.put(string.intern(),1)
  • vamsi360
    vamsi360 almost 13 years
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOfRange(Unknown Source) at java.lang.String.<init>(Unknown Source) at java.lang.StringBuilder.toString(Unknown Source)
  • Stephen C
    Stephen C almost 13 years
    Interning won't help; see my answer.