GetHashCode() on byte[] array
Solution 1
Like other types that don't override GetHashCode(), it just returns an arbitrary, identity-based value. It definitely doesn't try to hash the contents of the array. See this answer.
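A minimal sketch of that behaviour, assuming nothing beyond the base class library (the class name DefaultHashDemo is made up for illustration). It avoids printing the raw hash values, since those are arbitrary and differ between runs:

```csharp
using System;
using System.Collections.Generic;

class DefaultHashDemo
{
    static void Main()
    {
        byte[] a = { 1, 2, 3 };
        byte[] b = { 1, 2, 3 };

        // Same contents, different instances: Equals is reference equality.
        Console.WriteLine(a.Equals(b));                         // False

        // The same instance keeps reporting the same hash code.
        Console.WriteLine(a.GetHashCode() == a.GetHashCode());  // True

        // With the default comparer, a HashSet only finds the exact instance.
        var set = new HashSet<byte[]> { a };
        Console.WriteLine(set.Contains(a));                     // True
        Console.WriteLine(set.Contains(b));                     // False
    }
}
```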
Solution 2
Arrays in .NET don't override Equals or GetHashCode, so the value you'll get is basically based on reference equality (i.e. the default implementation in Object) - for value equality you'll need to roll your own code (or find some from a third party). You may want to implement IEqualityComparer<byte[]> if you're trying to use byte arrays as keys in a dictionary etc.
EDIT: Here's a reusable array equality comparer which should be fine so long as the array element handles equality appropriately. Note that you mustn't mutate the array after using it as a key in a dictionary, otherwise you won't be able to find it again - even with the same reference.
using System;
using System.Collections.Generic;

public sealed class ArrayEqualityComparer<T> : IEqualityComparer<T[]>
{
    // You could make this a per-instance field with a constructor parameter
    private static readonly EqualityComparer<T> elementComparer
        = EqualityComparer<T>.Default;

    public bool Equals(T[] first, T[] second)
    {
        if (first == second)
        {
            return true;
        }
        if (first == null || second == null)
        {
            return false;
        }
        if (first.Length != second.Length)
        {
            return false;
        }
        for (int i = 0; i < first.Length; i++)
        {
            if (!elementComparer.Equals(first[i], second[i]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(T[] array)
    {
        unchecked
        {
            if (array == null)
            {
                return 0;
            }
            int hash = 17;
            foreach (T element in array)
            {
                hash = hash * 31 + elementComparer.GetHashCode(element);
            }
            return hash;
        }
    }
}

class Test
{
    static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };
        byte[] z = { 4, 5, 6 };

        var comparer = new ArrayEqualityComparer<byte>();

        Console.WriteLine(comparer.GetHashCode(x));
        Console.WriteLine(comparer.GetHashCode(y));
        Console.WriteLine(comparer.GetHashCode(z));
        Console.WriteLine(comparer.Equals(x, y));
        Console.WriteLine(comparer.Equals(x, z));
    }
}
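To illustrate the dictionary use case and the mutation warning, here is a rough sketch. ByteArrayComparer is a condensed, byte-only stand-in for the comparer above (same hash formula), included only so the snippet compiles on its own; the class names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed stand-in for ArrayEqualityComparer<byte>; not a replacement for it.
sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));

    public int GetHashCode(byte[] array)
    {
        if (array == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (byte b in array) hash = hash * 31 + b;
            return hash;
        }
    }
}

class DictionaryDemo
{
    static void Main()
    {
        var map = new Dictionary<byte[], string>(new ByteArrayComparer());
        byte[] key = { 1, 2, 3 };
        map[key] = "first";

        // A different instance with the same contents finds the entry.
        Console.WriteLine(map.ContainsKey(new byte[] { 1, 2, 3 })); // True

        // Mutating the key changes its hash, so the entry is stranded:
        // neither the mutated reference nor the old contents find it.
        key[0] = 9;
        Console.WriteLine(map.ContainsKey(key));                    // False
        Console.WriteLine(map.ContainsKey(new byte[] { 1, 2, 3 })); // False
        Console.WriteLine(map.Count);                               // 1
    }
}
```

If keys might be modified elsewhere, storing a private copy of the array as the key is one way to avoid the stranded-entry problem.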
Solution 3
byte[] inherits GetHashCode() from object, it doesn't override it. So what you get is basically object's implementation.
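One way to check this, as a small sketch: RuntimeHelpers.GetHashCode always computes object's identity-based hash, bypassing any override, so it should agree with what the array reports. The class name InheritedHashDemo is made up for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

class InheritedHashDemo
{
    static void Main()
    {
        byte[] data = { 1, 2, 3 };

        // Identity-based hash, computed without virtual dispatch.
        int identityHash = RuntimeHelpers.GetHashCode(data);
        Console.WriteLine(data.GetHashCode() == identityHash); // True

        // Changing the contents does not change the hash code.
        data[0] = 42;
        Console.WriteLine(data.GetHashCode() == identityHash); // True
    }
}
```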
Solution 4
Simple solution
using System.Numerics;

// BigInteger hashes by numeric value, so equal contents give equal hashes.
public static int GetHashFromBytes(byte[] bytes)
{
    return new BigInteger(bytes).GetHashCode();
}
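Two things worth knowing about this approach, shown as a sketch rather than a recommendation. First, BigInteger reads the bytes as a little-endian two's complement number, so arrays that differ only in sign-extension bytes (such as { 1 } and { 1, 0 }) produce the same hash. Second, on .NET 6 or later, HashCode.AddBytes is an alternative that hashes the bytes directly; the helper name ByteHash.FromBytes is made up for illustration, and its results are only stable within a single process.

```csharp
using System;
using System.Numerics;

public static class ByteHash
{
    // Hypothetical helper; assumes .NET 6+ for HashCode.AddBytes.
    public static int FromBytes(byte[] bytes)
    {
        var hash = new HashCode();
        hash.AddBytes(bytes); // byte[] converts implicitly to ReadOnlySpan<byte>
        return hash.ToHashCode();
    }
}

class ByteHashDemo
{
    static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };

        // Equal contents give equal hashes within the same process.
        Console.WriteLine(ByteHash.FromBytes(x) == ByteHash.FromBytes(y)); // True

        // { 1 } and { 1, 0 } both represent the number 1 as a BigInteger.
        int h1 = new BigInteger(new byte[] { 1 }).GetHashCode();
        int h2 = new BigInteger(new byte[] { 1, 0 }).GetHashCode();
        Console.WriteLine(h1 == h2);                                       // True
    }
}
```

The shared hash for { 1 } and { 1, 0 } is harmless as long as Equals still compares the actual contents; it only costs an extra comparison.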
Solution 5
If it's not the same instance, it will return different hashes. I'm guessing it is based on the memory address where it is stored somehow.
Chesnokov Yuriy
Updated on July 08, 2022
Comments
-
Chesnokov Yuriy almost 2 years
What does GetHashCode() calculate when invoked on the byte[] array? The 2 data arrays with equal content do not provide the same hash.
-
Chesnokov Yuriy almost 13 years
no, it is not the same instance, I presume in that case hashes would be equal
-
Jon Skeet almost 13 years
@Chesnokov Yuriy: Okay, I've edited some code into my answer.
-
Chesnokov Yuriy almost 13 years
Thank you very much for the useful snippet. A bit off topic, if you please: your C# in Depth book is very interesting, and I'm going to read it. The g+ idea of introducing circles is superb compared to Facebook, where you cannot separate your contacts. It would be great to show different account page content and information to each circle, e.g. one would not be happy to show some work-circle page content to friends and vice versa. Can you advise whether we will soon be able to register there?
-
Jon Skeet almost 13 years
@Chesnokov: That's a bit off-topic for here, I'm afraid - and I wouldn't be able to tell you about any upcoming features anyway.
-
Douglas over 8 years
There seems to be some debate on whether GetHashCode should scan over the entire sequence. Interestingly, the internal implementation for Array.IStructuralEquatable.GetHashCode only considers the last eight items of an array, sacrificing hash uniqueness for speed.
-
Peter - Reinstate Monica over 7 years
I did something similar using Enumerable.SequenceEqual(). Is there a particular reason to hand-code the element comparison? (Admittedly it's probably a bit faster.)
-
Jon Skeet over 7 years
@PeterA.Schneider: I don't think SequenceEqual is optimized to compare lengths first if the source implements appropriate interfaces.
-
Guy Langston about 4 years
Seeing this solution made me smile. Clean, elegant. Digging deeper, the hash implementation ends up calling github.com/microsoft/referencesource/blob/master/…
-
Xeorge Xeorge about 4 years
cough cough GetHashCode(); returns int32.
-
Erusso87 over 3 years
@JonSkeet Since we have new primitives like Memory<T>, Span<T> or Sequence<T> can this code be optimised in any way? For example we do have SequenceEqual for ReadOnlySpan<T> now.
-
Jon Skeet over 3 years
@bitbonk: I don't know whether that would be any faster - maybe; you'd need to run actual benchmarks for it. (It's possible that SequenceEqual is optimized to compare 8 bytes at a time, for example.)
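For reference, a span-based comparison along the lines discussed here could look roughly like the sketch below. It makes no performance claim; as noted, that would need benchmarking. The helper name ContentEquals is made up for illustration, and it keeps the same null handling as the hand-written loop.

```csharp
using System;

public static class SpanEqualityDemo
{
    // Hypothetical helper: two nulls are equal, null vs non-null is not.
    public static bool ContentEquals(byte[] first, byte[] second)
    {
        if (ReferenceEquals(first, second)) return true;
        if (first == null || second == null) return false;

        // MemoryExtensions.SequenceEqual compares lengths and contents.
        return ((ReadOnlySpan<byte>)first).SequenceEqual(second);
    }

    static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };
        byte[] z = { 4, 5, 6 };

        Console.WriteLine(ContentEquals(x, y));    // True
        Console.WriteLine(ContentEquals(x, z));    // False
        Console.WriteLine(ContentEquals(x, null)); // False
    }
}
```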
-
Dave Jellison over 3 years
@XeorgeXeorge so?
-
fjch1997 over 3 years
@DaveJellison There is a 1 in 2^32 chance of collision, which is negligible for most scenarios but is something that must be kept in mind whenever there's a hash code.
-
Dave Jellison over 3 years
Agreed, but this is inherent with hashing as a rule. It's like going to dictionary.com to complain about the definition of a word.
-
Steve Pick almost 3 years
Note this method incurs a copy of the whole byte array, so it may not be efficient. Also, it's important to understand the purpose of GetHashCode() - it's not intended to produce a unique value but rather a well-distributed value for allocating buckets in a Dictionary or HashSet, which benefit from each bucket being roughly equal in size. Both types use a combination of GetHashCode() and Equals() to determine whether a collision has really occurred.