HashMap serialization and deserialization changes

12,687

Solution 1

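You are doing nothing wrong; it just can't be done with a HashMap. A HashMap does not guarantee any iteration order, so the order of its entries can change after deserialization. Use a TreeMap instead, which always iterates in sorted key order.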

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

Source: HashMap
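As a rough illustration of the TreeMap suggestion, here is a minimal sketch that serializes a TreeMap, deserializes it, serializes the copy again, and compares the two byte arrays. The class `TreeMapRoundTrip` and its helper methods are hypothetical names made up for this example; they are not part of the original test or of `SpaceObjectUtils`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, used only for this illustration.
class TreeMapRoundTrip {

    // Serializes an object with standard Java serialization.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads back an object written by serialize().
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the object serializes to the same bytes before and after a round trip.
    static boolean bytesSurviveRoundTrip(Object obj) {
        byte[] first = serialize(obj);
        byte[] second = serialize(deserialize(first));
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        // TreeMap iterates in sorted key order, regardless of insertion order.
        Map<String, Object> map = new TreeMap<>();
        map.put("name", 4);
        map.put("theme", "sdfsd");
        map.put("accountId", 7);
        System.out.println(bytesSurviveRoundTrip(map));
    }
}
```

Because the checksum in the question is computed over the serialized bytes, this byte-for-byte comparison is the property the checksum depends on.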

Solution 2

Your checksum cannot depend on the order of entries, because HashMap is not ordered. An alternative to using TreeMap is LinkedHashMap (which retains insertion order), but the real solution is to use a hash code that doesn't depend on the order of the entries.
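One way to get a checksum that ignores entry order is to feed the entries to the digest in sorted key order instead of serializing the map as a whole. The sketch below is only one possible approach, not the original `calculateChecksum`; the class name `OrderIndependentChecksum` is hypothetical, and it assumes non-null String keys and Serializable values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used only for this illustration.
class OrderIndependentChecksum {

    // Serializes a single value with standard Java serialization.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Assumes non-null String keys and Serializable values.
    static String checksum(Map<String, ?> map) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Algorithm MD5 is not present", e);
        }
        // Sort the keys so the digest input does not depend on iteration order.
        List<String> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);
        for (String key : keys) {
            digest.update(key.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0); // separator between key and value
            digest.update(serialize(map.get(key)));
        }
        // Render the digest as lowercase hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

With this approach, a HashMap and a LinkedHashMap holding the same entries in different insertion orders should produce the same checksum, so the map type in `TestClass` would not need to change.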

Author by

dgaviola

Java Developer

Updated on June 18, 2022

Comments

  • dgaviola
    dgaviola about 2 years

    We are working with an in-memory data grid (IMDG) and we have a migration tool. To verify that all the objects are migrated successfully, we calculate the checksum of each object from its serialized form.

    We are seeing a problem with HashMap: after we serialize a map and deserialize it, the checksum changes. Here is a simple test case:

    @Test
    public void testMapSerialization() throws IOException, ClassNotFoundException {
        TestClass tc1 = new TestClass();
        tc1.init();
        String checksum1 = SpaceObjectUtils.calculateChecksum(tc1);
    
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutput out = null;
        byte[] objBytes = null;
        out = new ObjectOutputStream(bos);
        out.writeObject(tc1);
        objBytes = bos.toByteArray();
        out.close();
        ByteArrayInputStream bis = new ByteArrayInputStream(objBytes);
        ObjectInputStream in = new ObjectInputStream(bis);
        TestClass tc2 = (TestClass) in.readObject();
        String checksum2 = SpaceObjectUtils.calculateChecksum(tc2);
    
        assertEquals(checksum1, checksum2);
    }
    

    The TestClass looks like this:

    class TestClass implements Serializable {
        private static final long serialVersionUID = 5528034467300853270L;
    
        private Map<String, Object> map;
    
        public TestClass() {
        }
    
        public Map<String, Object> getMap() {
            return map;
        }
    
        public void setMap(Map<String, Object> map) {
            this.map = map;
        }
    
        public void init() {
            map = new HashMap<String, Object>();
            map.put("name", Integer.valueOf(4));
            map.put("type", Integer.valueOf(4));
            map.put("emails", new BigDecimal("43.3"));
            map.put("theme", "sdfsd");
            map.put("notes", Integer.valueOf(4));
            map.put("addresses", Integer.valueOf(4));
            map.put("additionalInformation", new BigDecimal("43.3"));
            map.put("accessKey", "sdfsd");
            map.put("accountId", Integer.valueOf(4));
            map.put("password", Integer.valueOf(4));
            map.put("domain", new BigDecimal("43.3"));
        }
    }
    

    And this is the method to calculate the checksum:

    public static String calculateChecksum(Serializable obj) {
        if (obj == null) {
            throw new IllegalArgumentException("The object cannot be null");
        }
        MessageDigest digest = null;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (java.security.NoSuchAlgorithmException nsae) {
            throw new IllegalStateException("Algorithm MD5 is not present", nsae);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutput out = null;
        byte[] objBytes = null;
        try {
            out = new ObjectOutputStream(bos);
            out.writeObject(obj);
            objBytes = bos.toByteArray();
            out.close();
        } catch (IOException e) {
            throw new IllegalStateException(
                    "There was a problem trying to get the byte stream of this object: " + obj.toString(), e);
        }
        digest.update(objBytes);
        byte[] hash = digest.digest();
        StringBuilder hexString = new StringBuilder();
        for (int i = 0; i < hash.length; i++) {
            String hex = Integer.toHexString(0xFF & hash[i]);
            if (hex.length() == 1) {
                hexString.append('0');
            }
            hexString.append(hex);
        }
        return hexString.toString();
    }
    

    If you print the maps of tc1 and tc2, you can see that the elements are not in the same order:

    {accessKey=sdfsd, accountId=4, theme=sdfsd, name=4, domain=43.3, additionalInformation=43.3, emails=43.3, addresses=4, notes=4, type=4, password=4}
    {accessKey=sdfsd, accountId=4, name=4, theme=sdfsd, domain=43.3, emails=43.3, additionalInformation=43.3, type=4, notes=4, addresses=4, password=4}
    

    I would like to be able to serialize the HashMap and get the same checksum when I deserialize it. Do you know if there is a solution or if I'm doing something wrong?

    Thanks!

    Diego

  • Sean Patrick Floyd
    Sean Patrick Floyd about 13 years
    I was also thinking about suggesting LinkedHashMap, but is the order guaranteed even through Deserialization? Must be, I guess. This isn't really clear: download.oracle.com/javase/6/docs/api/…
  • Vishy
    Vishy about 13 years
    @Sean, I am not sure it's documented anywhere, but I have found it to be. I tend to use LHM as a matter of course because it tends to make debugging easier, but I would avoid relying on its order for production purposes.
  • dgaviola
    dgaviola about 13 years
    The checksum depends on the serialization of HashMap and apparently that depends on the order. I ended up changing the type to TreeMap because it also helped us to solve other issues when persisting from the IMDG to a relational database.
  • dgaviola
    dgaviola about 13 years
    Thanks! I wanted to avoid changing it to another implementation, but it ended up being the better solution, as it also addressed some other issues we were having when persisting from the IMDG to a relational database.
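
On the question raised in the comments of whether LinkedHashMap keeps its order through deserialization, a small round-trip check like the following can be run on the JDK in use. It is only a sketch: the class name `LinkedHashMapOrderCheck` is hypothetical, and the check compares iteration order only. It says nothing about whether the serialized bytes (and therefore a byte-based checksum) are identical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;

// Hypothetical helper class, used only for this illustration.
class LinkedHashMapOrderCheck {

    // Serializes the object and reads it straight back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the deserialized copy iterates its keys in the same order as the original.
    static boolean keepsOrder(LinkedHashMap<String, Integer> map) {
        LinkedHashMap<String, Integer> copy = roundTrip(map);
        return new ArrayList<>(map.keySet()).equals(new ArrayList<>(copy.keySet()));
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();
        map.put("zeta", 1);
        map.put("alpha", 2);
        map.put("mid", 3);
        System.out.println(keepsOrder(map));
    }
}
```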