A Brief Introduction to C#'s GetHashCode()
TLDR
- When overriding
Equals(), you must also overrideGetHashCode(); otherwise, hash-based collections likeDictionaryorHashSetwill fail to correctly identify equal objects. - Core principle of
GetHashCode(): Equal objects must return the same hash code; unequal objects are not strictly required to return different values; the method itself should not throw exceptions. - In .NET Core 2.1+ environments, it is recommended to use
HashCode.Combine()to implementGetHashCode(). - During
Dictionarylookups, the HashCode is compared first, followed byEquals(). This mechanism effectively improves performance and reduces unnecessary object comparisons.
Implementation Principles and Importance of GetHashCode()
When do you encounter this issue: When a custom class is used as a Dictionary key or stored in a HashSet, overriding Equals() without also handling GetHashCode() will cause the collection to fail to correctly determine if objects are duplicates.
GetHashCode() is used to generate an integer hash value and is the foundation for methods like Dictionary<TKey, TValue>, HashSet<T>, and LINQ's Distinct(). When implementing it, you must adhere to the following principles:
- If two objects are determined to be equal by
Equals(),GetHashCode()must return the same value. - If two objects are not equal,
GetHashCode()is not strictly required to return different values (hash collisions are allowed). GetHashCode()should not throw exceptions.
If these principles are not followed, even logically equal objects will be treated as distinct entities within a collection. Below is an incorrect example:
public class Test {
public string Name { get; set; }
public int Age { get; set; }
public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;
}
// Test results
Test test1 = new() { Name = "王大明", Age = 10 };
Test test2 = new() { Name = "王大明", Age = 10 };
Dictionary<Test, string> dic = new() { [test1] = "測試" };
Console.WriteLine(test1.Equals(test2)); // True
Console.WriteLine(dic.ContainsKey(test2)); // FalseImplementation Recommendations
When do you encounter this issue: During development, when you need to manually implement GetHashCode() to ensure the correctness of object comparisons.
Depending on the .NET version, the following approaches are recommended:
.NET Core 2.1 and later: Use
HashCode.Combine(), which is the most concise and performant way.csharppublic override int GetHashCode() => HashCode.Combine(Name, Age);.NET Framework versions: Since
HashCode.Combine()is not supported, it is recommended to manually combine field hash values and use prime numbers as multipliers to reduce collisions.csharppublic override int GetHashCode() { int hashCode = -1360180430; hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Name); hashCode = hashCode * -1521134295 + Age.GetHashCode(); return hashCode; }
TIP
Using EqualityComparer<T>.Default to calculate hash values properly handles null values and ensures consistent calculation rules across different classes.
Dictionary Search Mechanism and Performance Optimization
When do you encounter this issue: When you need to understand why Dictionary maintains high performance during comparisons even when processing large amounts of data.
Dictionary internally uses two array structures, _buckets and _entries, to store data. The workflow of its FindValue() search method is as follows:
- Calculate Hash Value: First, calculate the HashCode of the Key to locate the index in
_buckets, allowing direct access to the corresponding_entriesand avoiding a global search. - Preliminary Filtering: Compare whether the HashCodes are identical. If the HashCodes differ, they are immediately determined to be unequal, avoiding the need to call the more costly
Equals()method. - Handle Collisions: If the HashCodes are the same but the object contents differ (hash collision), compare the objects in the chain one by one using the
entry.nextpointer until a match is found.
This mechanism ensures that in most cases, collection operations can be completed with a time complexity close to O(1).
Change Log
- 2024-09-01 Initial documentation created.
