GetHashCode() on byte[] array

Solution 1

Like other non-primitive built-in types, it just returns something arbitrary. It definitely doesn't try to hash the contents of the array. See this answer.
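
As a small illustration of the point above, the following sketch checks that an array's GetHashCode() is the identity-based hash from System.Object and ignores the contents. The class and method names are made up for this example; RuntimeHelpers.GetHashCode is the framework method that returns the default object hash.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class DefaultArrayHashDemo
{
    // True when the array's GetHashCode() is simply the identity-based hash
    // that System.Object provides, i.e. the array type does not override it.
    public static bool UsesIdentityHash(byte[] array)
    {
        return array.GetHashCode() == RuntimeHelpers.GetHashCode(array);
    }

    public static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 }; // same contents, different instance

        Console.WriteLine(UsesIdentityHash(x)); // True
        Console.WriteLine(x.Equals(y));         // False: reference equality

        // Almost always False, but two distinct instances are not
        // guaranteed to get different hash codes.
        Console.WriteLine(x.GetHashCode() == y.GetHashCode());
    }
}
```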

Solution 2

Arrays in .NET don't override Equals or GetHashCode, so the value you'll get is basically based on reference equality (i.e. the default implementation in Object) - for value equality you'll need to roll your own code (or find some from a third party). You may want to implement IEqualityComparer<byte[]> if you're trying to use byte arrays as keys in a dictionary etc.

EDIT: Here's a reusable array equality comparer which should be fine so long as the array element handles equality appropriately. Note that you mustn't mutate the array after using it as a key in a dictionary, otherwise you won't be able to find it again - even with the same reference.

using System;
using System.Collections.Generic;

public sealed class ArrayEqualityComparer<T> : IEqualityComparer<T[]>
{
    // You could make this a per-instance field with a constructor parameter
    private static readonly EqualityComparer<T> elementComparer
        = EqualityComparer<T>.Default;

    public bool Equals(T[] first, T[] second)
    {
        if (first == second)
        {
            return true;
        }
        if (first == null || second == null)
        {
            return false;
        }
        if (first.Length != second.Length)
        {
            return false;
        }
        for (int i = 0; i < first.Length; i++)
        {
            if (!elementComparer.Equals(first[i], second[i]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(T[] array)
    {
        unchecked
        {
            if (array == null)
            {
                return 0;
            }
            int hash = 17;
            foreach (T element in array)
            {
                hash = hash * 31 + elementComparer.GetHashCode(element);
            }
            return hash;
        }
    }
}

class Test
{
    static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };
        byte[] z = { 4, 5, 6 };

        var comparer = new ArrayEqualityComparer<byte>();

        Console.WriteLine(comparer.GetHashCode(x));
        Console.WriteLine(comparer.GetHashCode(y));
        Console.WriteLine(comparer.GetHashCode(z));
        Console.WriteLine(comparer.Equals(x, y));
        Console.WriteLine(comparer.Equals(x, z));
    }
}
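
The warning about mutating a key can be seen in a short sketch. It assumes a hypothetical ByteArrayComparer, a compact byte-only stand-in for the generic comparer above, so that the example compiles on its own. After the key array is changed, the stored entry can no longer be found by either the new or the old contents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical byte-only comparer, used only to keep this example self-contained.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] a, byte[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.Length == b.Length && a.SequenceEqual(b);
    }

    public int GetHashCode(byte[] array)
    {
        if (array == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (byte b in array) hash = hash * 31 + b;
            return hash;
        }
    }
}

public static class MutatedKeyDemo
{
    public static void Main()
    {
        var map = new Dictionary<byte[], string>(new ByteArrayComparer());
        byte[] key = { 1, 2, 3 };
        map[key] = "payload";

        // A different instance with equal contents finds the entry.
        Console.WriteLine(map.ContainsKey(new byte[] { 1, 2, 3 })); // True

        key[0] = 42; // mutate the key after insertion

        // The new contents hash differently from the hash stored with the entry.
        Console.WriteLine(map.ContainsKey(key)); // False
        // The old contents hash to the stored value, but Equals now fails.
        Console.WriteLine(map.ContainsKey(new byte[] { 1, 2, 3 })); // False
        Console.WriteLine(map.Count); // 1: the entry is still there, just unreachable
    }
}
```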

Solution 3

byte[] inherits GetHashCode() from object; it doesn't override it. So what you get is basically object's implementation.

Solution 4

Simple solution

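    // Requires: using System.Numerics;
    public static int GetHashFromBytes(byte[] bytes)
    {
        // BigInteger hashes by value, so equal contents give equal hashes.
        return new BigInteger(bytes).GetHashCode();
    }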
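
If copying the array into a BigInteger is a concern, one possible alternative is the System.HashCode struct. This is a sketch that assumes .NET 6 or later, where HashCode.AddBytes is available; the class name ByteHash is made up for the example. HashCode is seeded randomly per process, so the values are only meaningful within a single run and should not be persisted.

```csharp
using System;

public static class ByteHash
{
    // Assumes .NET 6 or later (HashCode.AddBytes).
    // Hashes the contents without copying the array.
    public static int GetHashFromBytes(ReadOnlySpan<byte> bytes)
    {
        var hash = new HashCode();
        hash.AddBytes(bytes);
        return hash.ToHashCode();
    }

    public static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };

        // Equal contents give equal hashes within the same process.
        Console.WriteLine(GetHashFromBytes(x) == GetHashFromBytes(y)); // True
    }
}
```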

Solution 5

If it's not the same instance, it will generally return a different hash. I'm guessing it is somehow based on the memory address where the array is stored.

Author: Chesnokov Yuriy

Updated on July 08, 2022

Comments

  • Chesnokov Yuriy
    Chesnokov Yuriy almost 2 years

    What does GetHashCode() calculate when invoked on a byte[] array? Two arrays with equal content do not produce the same hash.

  • Chesnokov Yuriy
    Chesnokov Yuriy almost 13 years
    No, it is not the same instance; I presume the hashes would be equal in that case.
  • Jon Skeet
    Jon Skeet almost 13 years
    @Chesnokov Yuriy: Okay, I've edited some code into my answer.
  • Chesnokov Yuriy
    Chesnokov Yuriy almost 13 years
    Thank you very much for the useful snippet. A bit off topic, if you please: your C# in Depth book is very interesting, and I'm going to read it. The g+ idea of introducing circles is superb compared to Facebook, where you cannot separate your contacts. It would be great to show different account page content and information to each circle, e.g. one would not be happy to show some work-circle content to friends and vice versa. Can you advise whether we will be able to register there soon?
  • Jon Skeet
    Jon Skeet almost 13 years
    @Chesnokov: That's a bit off-topic for here, I'm afraid - and I wouldn't be able to tell you about any upcoming features anyway.
  • Douglas
    Douglas over 8 years
    There seems to be some debate on whether GetHashCode should scan over the entire sequence. Interestingly, the internal implementation for Array.IStructuralEquatable.GetHashCode only considers the last eight items of an array, sacrificing hash uniqueness for speed.
  • Peter - Reinstate Monica
    Peter - Reinstate Monica over 7 years
    I did something similar using Enumerable.SequenceEqual(). Is there a particular reason to hand-code the element comparison? (Admittedly it's probably a bit faster.)
  • Jon Skeet
    Jon Skeet over 7 years
    @PeterA.Schneider: I don't think SequenceEqual is optimized to compare lengths first if the source implements appropriate interfaces.
  • Guy Langston
    Guy Langston about 4 years
    Seeing this solution made me smile. Clean, elegant. Digging deeper the hash implementation ends up calling github.com/microsoft/referencesource/blob/master/…
  • Xeorge Xeorge
    Xeorge Xeorge about 4 years
    cough cough GetHashCode(); returns int32.
  • Erusso87
    Erusso87 over 3 years
    @JonSkeet Since we have new primitives like Memory<T>, Span<T> or Sequence<T> can this code be optimised in any way? For example we do have SequenceEqual for ReadOnlySpan<T> now.
  • Jon Skeet
    Jon Skeet over 3 years
    @bitbonk: I don't know whether that would be any faster - maybe; you'd need to run actual benchmarks for it. (It's possible that SequenceEqual is optimized to compare 8 bytes at a time, for example.)
  • Dave Jellison
    Dave Jellison over 3 years
    @XeorgeXeorge so?
  • fjch1997
    fjch1997 over 3 years
    @DaveJellison There is a 1 in 2^32 chance of collision, which is negligible for most scenarios but must be kept in mind whenever a hash code is involved.
  • Dave Jellison
    Dave Jellison over 3 years
    Agreed, but this is inherent in hashing as a rule. It's like going to dictionary.com to complain about the definition of a word.
  • Steve Pick
    Steve Pick almost 3 years
    Note that this method incurs a copy of the whole byte array, so it may not be efficient. Also, it's important to understand the purpose of GetHashCode(): it's not intended to produce a unique value but rather a well-distributed value for allocating buckets in a Dictionary or HashSet, which benefit from each bucket being roughly equal in size. Both types use a combination of GetHashCode() and Equals() to determine whether a collision has really occurred.
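
Two of the comments above mention built-in alternatives: the structural comparer behind Array's IStructuralEquatable implementation, and SequenceEqual over spans. The sketch below shows how each might be called. The class and method names are made up for the example, and no performance claim is intended; as one comment says, that would need actual benchmarks.

```csharp
using System;
using System.Collections;

public static class BuiltInComparisonDemo
{
    // Content comparison via MemoryExtensions.SequenceEqual on spans.
    // Note: a null array converts to an empty span, so null and an empty
    // array compare as equal here.
    public static bool SpanEquals(byte[] a, byte[] b)
    {
        ReadOnlySpan<byte> left = a;
        ReadOnlySpan<byte> right = b;
        return left.SequenceEqual(right);
    }

    // Content comparison via the non-generic structural comparer.
    public static bool StructuralEquals(byte[] a, byte[] b)
    {
        return StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
    }

    public static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };
        byte[] z = { 4, 5, 6 };
        IEqualityComparer structural = StructuralComparisons.StructuralEqualityComparer;

        Console.WriteLine(SpanEquals(x, y));       // True
        Console.WriteLine(SpanEquals(x, z));       // False
        Console.WriteLine(StructuralEquals(x, y)); // True
        Console.WriteLine(StructuralEquals(x, z)); // False

        // Equal contents give equal structural hash codes.
        Console.WriteLine(structural.GetHashCode(x) == structural.GetHashCode(y)); // True
    }
}
```

The structural comparer works through the non-generic IEqualityComparer interface, so each element is boxed; for large byte arrays the span-based comparison is likely the better starting point.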