Skip to content
View Article Network

A Brief Discussion on C#'s GetHashCode()

Last week, when a colleague was sharing about Bloom Filters, it reminded me of key comparisons in Dictionary, as both rely on hash values to quickly determine if an element exists. Of course, while both use an array structure and hash values as indices, their specific usage differs.

GetHashCode()

GetHashCode() is a method used to generate an integer hash value, which is primarily used to compare whether object values are equal. Implementing GetHashCode() must adhere to the following principles:

  • If two objects are considered equal (Equals() returns true), GetHashCode() must return the same value.
  • If two objects are not equal, GetHashCode() does not necessarily have to return different values.
  • GetHashCode() should not throw exceptions.

The GetHashCode() method is commonly used in the following scenarios:

  • Implementations of IDictionary<TKey, TValue> and Dictionary: Such as Dictionary<TKey, TValue> and HashTable, these classes use GetHashCode() for fast key comparison.
  • Implementations of ISet<T>: Such as HashSet<T>, which uses GetHashCode() to determine element uniqueness.
  • LINQ's Distinct() method: Uses GetHashCode() to remove duplicate elements.

If you override Equals() but do not override GetHashCode(), these classes may still consider objects unequal even if they are equal under Equals(), leading to incorrect behavior. In fact, when only Equals() is overridden, Visual Studio will display the following warning.

gethashcode overwrite warning

Use the following code to test:

csharp
Test test1 = new() {
    Name = "王大明",
    Age = 10
};

Test test2 = new() {
    Name = "王大明",
    Age = 10
};

Dictionary<Test, string> dic = new() {
    [test1] = "測試"
};

Console.WriteLine(test1.Equals(test2));
Console.WriteLine(dic.ContainsKey(test2));


public class Test {
    public string Name { get; set; }

    public int Age { get; set; }

    public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;
}

The results are as follows:

text
True
False

Implementation of GetHashCode()

The simplest way to implement it is to use the refactoring feature in Visual Studio. Take the following code as an example:

csharp
public class Test {
    public string Name { get; set; }

    public int Age { get; set; }
}

Click on the Test class, press ALT + Enter to open the refactoring options, and select "Generate Equals and GetHashCode...".

vs generate equals menu

Select the members to be used for Equals() judgment. You can also choose to implement IEquatable<T> and operators (== and !=) at the same time. We will not check these extra items here:

vs generate equals dialog

The generated code is as follows:

csharp
public class Test {
    public string Name { get; set; }

    public int Age { get; set; }

    public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;
    public override int GetHashCode() => HashCode.Combine(Name, Age);
}

However, since HashCode.Combine() was added in .NET Core 2.1, it is not supported in .NET Framework. For .NET Framework versions, the generated code might look like this:

csharp
 public class Test {
    public string Name { get; set; }

    public int Age { get; set; }

    public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;

    public override int GetHashCode() {
        int hashCode = -1360180430;
        hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Name);
        hashCode = hashCode * -1521134295 + Age.GetHashCode();
        return hashCode;
    }
}

Code explanation:

  • -1360180430 and -1521134295 are constants used to initialize and combine hash codes to produce a more uniformly distributed hash value. These constants are usually large prime numbers because prime numbers help reduce the probability of collisions in hash functions.
    • -1360180430: This is the initial hash code value. By using a non-zero starting value (usually a prime number), you can ensure that the object's hash code is different from the default value and that it will not return zero as a hash code even if the object has no properties.
    • -1521134295: This is the multiplier used to combine hash values. Using a prime multiplier to mix the hash values of individual fields can increase the randomness of the result, thereby reducing the probability of hash collisions.
  • The reason for using EqualityComparer<T>.Default to calculate hash values for reference objects:
    • To avoid null value cases; when the value is null, it returns 0.
    • To ensure that the same hash calculation rules are used for objects of different classes.

Of course, this .NET Framework version is not easy to remember. For .NET Framework versions that do not support HashCode.Combine(), some developers might use anonymous objects to get GetHashCode(). Note that this approach might slightly affect performance due to the creation of temporary anonymous objects.

csharp
public override int GetHashCode() => new { Name, Age }.GetHashCode();

Implementation of ContainsKey(TKey key)

I will briefly explain this here. If you want to know the complete approach, the implementation from .NET 8 is excerpted below for your own research.

The ContainsKey() method internally uses the FindValue() method to search for the existence of the specified TKey. If the corresponding value is found, it means the TKey exists in the Dictionary<TKey, TValue>. Dictionary has two main array fields:

  • _entries: Type Entry[], where Entry contains TKey, TValue, and the index of the next entry with the same bucket value.
  • _buckets: Type int[], which stores the mapping table of each bucket value (the value calculated from the TKey's HashCode after modulo operations, etc.) and the corresponding entry index.

The implementation flow of FindValue() is as follows:

  1. Calculate the HashCode of the passed key. This HashCode is used to quickly locate the corresponding entry index in the _buckets array. Through this index, you can directly obtain the corresponding entry from the _entries array, avoiding a linear search and thus improving search efficiency.
  2. First, compare the TKey's HashCode with the HashCode stored in the entry. This allows for quick filtering of non-matching items, reducing the number of calls to the Equals() method. In most cases, the computational cost of Equals() is higher than that of a hash comparison, so this can further improve performance.
  3. Since different objects may have the same HashCode, and because the index in _buckets is the result of a calculation, the entry index obtained from _buckets may not necessarily point to the desired item. If a mismatch is found, entry.next is used to point to the index of the next item with the same bucket, continuing the comparison until a match is found or all possible items have been checked.

The following is the implementation code of FindValue() in .NET 8:

csharp
internal ref TValue FindValue(TKey key) {
    if (key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    ref Entry entry = ref Unsafe.NullRef<Entry>();
    if (_buckets != null) {
        Debug.Assert(_entries != null, "expected entries to be != null");
        IEqualityComparer<TKey>? comparer = _comparer;
        if (typeof(TKey).IsValueType && // comparer can only be null for value types; enable JIT to eliminate entire if block for ref types
            comparer == null) {
            uint hashCode = (uint)key.GetHashCode();
            int i = GetBucket(hashCode);
            Entry[]? entries = _entries;
            uint collisionCount = 0;

            // ValueType: Devirtualize with EqualityComparer<TKey>.Default intrinsic
            i--; // Value in _buckets is 1-based; subtract 1 from i. We do it here so it fuses with the following conditional.
            do {
                // Should be a while loop https://github.com/dotnet/runtime/issues/9422
                // Test in if to drop range check for following array access
                if ((uint)i >= (uint)entries.Length) {
                    goto ReturnNotFound;
                }

                entry = ref entries[i];
                if (entry.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entry.key, key)) {
                    goto ReturnFound;
                }

                i = entry.next;

                collisionCount++;
            } while (collisionCount <= (uint)entries.Length);

            // The chain of entries forms a loop; which means a concurrent update has happened.
            // Break out of the loop and throw, rather than looping forever.
            goto ConcurrentOperation;
        } else {
            Debug.Assert(comparer is not null);
            uint hashCode = (uint)comparer.GetHashCode(key);
            int i = GetBucket(hashCode);
            Entry[]? entries = _entries;
            uint collisionCount = 0;
            i--; // Value in _buckets is 1-based; subtract 1 from i. We do it here so it fuses with the following conditional.
            do {
                // Should be a while loop https://github.com/dotnet/runtime/issues/9422
                // Test in if to drop range check for following array access
                if ((uint)i >= (uint)entries.Length) {
                    goto ReturnNotFound;
                }

                entry = ref entries[i];
                if (entry.hashCode == hashCode && comparer.Equals(entry.key, key)) {
                    goto ReturnFound;
                }

                i = entry.next;

                collisionCount++;
            } while (collisionCount <= (uint)entries.Length);

            // The chain of entries forms a loop; which means a concurrent update has happened.
            // Break out of the loop and throw, rather than looping forever.
            goto ConcurrentOperation;
        }
    }

    goto ReturnNotFound;

ConcurrentOperation:
    ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
ReturnFound:
    ref TValue value = ref entry.value;
Return:
    return ref value;
ReturnNotFound:
    value = ref Unsafe.NullRef<TValue>();
    goto Return;
}

Change Log

  • 2024-09-01 Initial document creation.