Friday, June 6, 2008

Six C# features Java developers will kill for...

Using GetHashCode and Equals

Every object lets you override GetHashCode, but doing so is seldom required for POCOs; the Equals method usually provides all of the comparison functionality we'll ever need. When using an ORM such as NHibernate, though, GetHashCode takes a more prominent role because it helps NHibernate determine whether an object already exists in a collection. Not overriding GetHashCode, or doing so inappropriately, may lead to duplicate objects showing up in collections, objects going missing altogether, or both. When it is needed, most people implement both methods and end up with similar code in each. To ease the burden of maintaining the two, we can exploit the overlap between Equals and GetHashCode and kill two birds with one stone.
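
To see why hash-based collections care about GetHashCode, here's a minimal sketch that uses a plain HashSet rather than NHibernate itself. The Sku class and its Code property are hypothetical, invented purely for illustration; the only point is that Equals and GetHashCode are overridden together and built from the same field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-like class, used only for this illustration.
public class Sku
{
    private readonly string code;

    public Sku(string code) { this.code = code; }

    public string Code { get { return code; } }

    // Two Skus are equal when their codes match.
    public override bool Equals(object obj)
    {
        Sku other = obj as Sku;
        return other != null && code == other.code;
    }

    // Built from the same field as Equals, so equal objects
    // always produce the same hash code.
    public override int GetHashCode()
    {
        return code == null ? 0 : code.GetHashCode();
    }
}

public static class HashSetDemo
{
    public static void Main()
    {
        HashSet<Sku> skus = new HashSet<Sku>();
        skus.Add(new Sku("A-100"));
        bool addedAgain = skus.Add(new Sku("A-100"));

        // The second Add is rejected because both methods agree.
        Console.WriteLine(addedAgain);   // False
        Console.WriteLine(skus.Count);   // 1
    }
}
```

If GetHashCode were left at its default while Equals was overridden, the two instances would almost always hash differently and the set would typically keep both, which is the duplicate-in-collection symptom described above.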


In my project work, I consider the following to be true when comparing two objects:

* If two objects have IDs, and have the same IDs, then they may be considered equal without examining them further. (I'm assuming ID to be an identity field, or equivalent, in the DB.)
* If two objects have IDs, but have different IDs, then they may be considered not equal without examining them further. E.g., if customer A has an ID of 4 while customer B has an ID of 5, then they are not equal, QED.
* If neither object has an ID or only one of them has an ID, but their "business value signatures" are identical, then they're equal. E.g., customer A has an ID of 4 and a social-security-number of 123-45-6789 while customer B has no ID but also has a social-security-number of 123-45-6789. In this case, customer A and customer B are equal. By "business value signature," I mean the combination of properties that makes an entity unique, regardless of its ID.

* If one of them is null, then they are not equal. (Whoo, that was easy.)

With the above considerations in mind, we'll want to write code to make the following unit test pass. Note that Customer takes a company name in its constructor. It also has a settable contact name. The combination of its company name and contact name gives Customer its unique business signature.

    [Test]
    public void CanCompareCustomers() {
        Customer customerA = new Customer("Acme");
        Customer customerB = new Customer("Anvil");

        Assert.AreNotEqual(customerA, null);
        Assert.AreNotEqual(customerA, customerB);

        customerA.SetIdTo(1);
        customerB.SetIdTo(1);

        // Even though the signatures are different,
        // the persistent IDs were the same. Call me
        // crazy, but I put that much trust into IDs.
        Assert.AreEqual(customerA, customerB);

        Customer customerC = new Customer("Acme");

        // Since customerA has an ID but customerC
        // doesn't, their signatures will be compared
        Assert.AreEqual(customerA, customerC);

        customerC.ContactName = "Coyote";

        // Signatures are now different
        Assert.AreNotEqual(customerA, customerC);

        // customerA.Equals(customerB) because they
        // have the same ID.
        // customerA.Equals(customerC) because they
        // have the same signature.
        // customerB.DoesNotEquals(customerC) because
        // we can't compare their IDs, since
        // customerC is transient, and their
        // signatures are different.
        Assert.AreNotEqual(customerB, customerC);
    }

Although some argue against a single base class from which all other persistable domain objects inherit, I use one nonetheless and ingeniously call it "DomainObject." (Those are Dr. Evil quotes there.) "DomainObject," in its entirety, contains the following:

    public abstract class DomainObject<IdT>
    {
        /// <summary>
        /// ID may be of type string, int,
        /// custom type, etc.
        /// </summary>
        public IdT ID {
            get { return id; }
        }

        public override sealed bool Equals(object obj) {
            DomainObject<IdT> compareTo =
                obj as DomainObject<IdT>;

            return (compareTo != null) &&
                (HasSameNonDefaultIdAs(compareTo) ||
                // Since the IDs aren't the same, either
                // of them must be transient to compare
                // business value signatures
                (((IsTransient()) || compareTo.IsTransient()) &&
                 HasSameBusinessSignatureAs(compareTo)));
        }

        /// <summary>
        /// Transient objects are not associated with an
        /// item already in storage. For instance, a
        /// Customer is transient if its ID is 0.
        /// </summary>
        public bool IsTransient() {
            return ID == null || ID.Equals(default(IdT));
        }

        /// <summary>
        /// Must be implemented to compare two objects
        /// </summary>
        public abstract override int GetHashCode();

        private bool HasSameBusinessSignatureAs(DomainObject<IdT> compareTo) {
            return GetHashCode().Equals(compareTo.GetHashCode());
        }

        /// <summary>
        /// Returns true if self and the provided domain
        /// object have the same ID values and the IDs
        /// are not of the default ID value
        /// </summary>
        private bool HasSameNonDefaultIdAs(DomainObject<IdT> compareTo) {
            return (ID != null &&
                ! ID.Equals(default(IdT))) &&
                (compareTo.ID != null &&
                ! compareTo.ID.Equals(default(IdT))) &&
                ID.Equals(compareTo.ID);
        }

        /// <summary>
        /// Set to protected to allow unit tests to set
        /// this property via reflection and to allow
        /// domain objects more flexibility in setting
        /// this for those objects with assigned IDs.
        /// </summary>
        protected IdT id = default(IdT);
    }

Note that Equals is sealed and cannot be overridden by a DomainObject implementation. I suppose it could be unsealed, but since I put a lot of work into that method, I don't want anyone mucking it up!
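
The unit test calls SetIdTo, and the comment on the id field mentions setting it via reflection, but that helper isn't shown. Here's a rough sketch of one way it could be written as a test-only extension method. Everything in it is an assumption: the trimmed stand-in for DomainObject keeps only the ID property and the backing field, Widget is a hypothetical entity, and the body of SetIdTo is a guess, not the original helper.

```csharp
using System;
using System.Reflection;

// Trimmed stand-in for the DomainObject base class: only the ID
// property and the protected backing field are reproduced here.
public abstract class DomainObject<IdT>
{
    public IdT ID { get { return id; } }
    protected IdT id = default(IdT);
}

// Hypothetical entity used only to exercise the helper.
public class Widget : DomainObject<int> { }

// Assumed test-only helper. The name SetIdTo comes from the unit
// test; this body is one guess at how it might be implemented.
public static class DomainObjectTestExtensions
{
    public static void SetIdTo<IdT>(this DomainObject<IdT> target, IdT newId)
    {
        // Look the field up on the base class that declares it, which
        // works whatever the concrete type of the entity is.
        FieldInfo field = typeof(DomainObject<IdT>).GetField(
            "id", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(target, newId);
    }
}

public static class SetIdDemo
{
    public static void Main()
    {
        Widget widget = new Widget();
        Console.WriteLine(widget.ID);   // 0
        widget.SetIdTo(42);
        Console.WriteLine(widget.ID);   // 42
    }
}
```

Keeping a helper like this in the test assembly means production code has no public way to reassign an ID, while tests can still simulate a persisted object.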

Now assume that Customer inherits from DomainObject. As mentioned above, the combination of its company name and contact name gives it its unique signature, so its GetHashCode would be as follows:

    public override int GetHashCode() {
        return (GetType().FullName + "|" +
            CompanyName + "|" +
            ContactName).GetHashCode();
    }

You'll notice that the string being hashed starts with the full name of the class type itself. With this in place, two objects of different classes will not produce the same signature string, even if their other properties match. (You'll have to reconsider how GetHashCode is implemented to handle inheritance structures; e.g., a Customer and an Employee both inherit from a Person class, but a Customer and an Employee may be equal in some instances. For this, I'd probably only add GetHashCode to the Person class.) Additionally, note that GetHashCode should only contain the "business signature" of the object and not include its ID. Including the ID in the signature would make it impossible to find equality between a transient object and a non-transient object. (Equality for all regardless of transience, I say!)
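
For the inheritance case in that parenthetical, here's a minimal sketch of what "only add GetHashCode to the Person class" might look like. The classes and the SocialSecurityNumber property are hypothetical, and the DomainObject base is left out to keep the example short. The one change worth noticing is typeof(Person) in place of GetType(), since GetType() would return the subclass and give Customer and Employee different signatures again.

```csharp
using System;

// Hypothetical hierarchy for illustration only.
public abstract class Person
{
    private readonly string socialSecurityNumber;

    protected Person(string socialSecurityNumber)
    {
        this.socialSecurityNumber = socialSecurityNumber;
    }

    public string SocialSecurityNumber
    {
        get { return socialSecurityNumber; }
    }

    // Defined once on the base class and sealed, using typeof(Person)
    // rather than GetType(), so every subclass shares one signature.
    public sealed override int GetHashCode()
    {
        return (typeof(Person).FullName + "|" +
            socialSecurityNumber).GetHashCode();
    }
}

public class Customer : Person
{
    public Customer(string ssn) : base(ssn) { }
}

public class Employee : Person
{
    public Employee(string ssn) : base(ssn) { }
}

public static class SignatureDemo
{
    public static void Main()
    {
        Person customer = new Customer("123-45-6789");
        Person employee = new Employee("123-45-6789");

        // Same signature string, so the same hash code within this run.
        Console.WriteLine(
            customer.GetHashCode() == employee.GetHashCode());   // True
    }
}
```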

Pramod Gupta