The issues of hash codes, equals methods, and what object identity really means is quite fundamental, and quite fraught. Certainly the Java solution of programmers defining equals and hash methods – according to some rather strict guidelines – has proven to have many problems in practice, and James has found it an annoying two lectures to teach for Java or C++.
William Cook (OOPSLA 2009) has argued that if objects are truly abstractions, equals and hash-code should be left up to each object to implement: there should be “no built-in notion of equality” because this would break the abstractions offered by objects. On the other hand, it’s not clear a language would be useful without some kind of equals for at least some kind of objects. Haskell offers another option: make an explicit interface (like the Eq type class) and only objects that provide equality should confirm to that interface. Hashcodes could be treated in the same way. While this would mean not every object had to participate in equality, we would still have to teach students about the necessarily difficult contracts of those methods. We could be more extreme: following Derek Rayside & Daniel Jackson and generate equality methods automatically based on explicit class invariants. But that seems a little far for a practical language today.
We propose that Grace will adopt a much more straightforward approach. Grace’s equals will be based on Henry Baker’s Egal predicate – structural equality of immutable state – thus making it similar to Clojure. Any mutable object (an object with at least one field) will only equal itself. Immutable objects (comprising only constants and methods, but no writeable fields) will be equal to other objects with the same structure (constants and methods) and the same values for all of those constants. Grace will provide just one equality predicate, written = (or !=) using this definition, and programmers will not be able to override it. Compatible hashcodes will also be provided.
Compared with Java or Smalltalk, this is hopefully much simpler, “does the right thing” for many common cases, and encourages immutable objects. The two main difficulties are first, that it cannot cope with objects whose abstract state may be equal but their “concrete” state is not. A linked list and an array list with identical contents will not be =, even if they are immutable. The second difficulty is that two mutable objects, two array lists, say, will never be = even if they have identical contents and are two instances of the same class. = in Grace is stable, if e1 = e2 is ever true, it will always be true. This cannot apply to mutable objects (unless e1 and e2 are actually the same object) . To compare or index using collections of objects, programmers will need use some kind of immutable objects: Grace will provide immutable tuple literals and methods to convert collections into tuples.
A key advantage of this definition of = is that immutable objects can act as pure values. Especially for (so-called) value objects — immutable objects containing only other immutable objects — Grace programs cannot distinguish between different instances with the same value, just as in most languages integers are compared by value, not by reference (you can’t tell the difference between the integer “1” even at different memory addresses). A Grace implementation can support value objects using whatever implementation is most efficient: either passing by reference always, by passing some types by value, or even by inlining fields into their containing objects, and doing a field-update if the containing object assigns in a new value.
In this way, Grace hopes to support a simple and stable definition of object equality, hashcodes, and value types – all without requiring Grace programmers to write any code.
Some references:
- William Cook’s essay http://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf
- http://c2.com/cgi/wiki?ValueObjectHypotheses
- http://c2.com/cgi/wiki?CanValueObjectsContainReferenceObjects
- Dirk Riehle’s Value Objects http://dirkriehle.com/computer-science/research/2006/plop-2006-value-object.pdf
- Value Objects in Smalltalk http://www.esug.org/data/ESUG2009/IWST/iwst09_submission_3.pdf
- Henry Baker’s Equal Rights for Functional Objects http://home.pipeline.com/~hbaker1/ObjectIdentity.html
- Derek Rayside & Daniel Jackson on generating equals and hashes from abstraction functions in Java http://sdg.csail.mit.edu/pubs/2009/icse09-object-contract.pdf
- Stephen Nelson’s study of Java equals and collections: http://ecs.vuw.ac.nz/~stephen/papers/tools10-paper.pdf
The idea of making equality built-in: identity for mutable objects and deep for immutable objects is attractive, but has some problems. On the plus side, the fact that Java has unstable equality (since equality is mandated for mutable collections) inconvenient at first blush but very confusing in the end. So making mutable objects use identity is the right idea. On the negative side: deep equality won’t work for most clever immutable data structures. Not only (as the article concedes) two lists with different implementations never be equal, but even ADTs such as immutable binary search trees would not be able to support equality unless they always canonicalized their representations. In essence, equality would blow the cover on data hiding.
Whether equality should be user-defined:
Equality should be symmetric, which is difficult to guarantee in an object-oriented.language which privileges teh first operand of the operation.
The issue of collections is interesting: teh definition of equality becomes frozen at one level of the hierarchy — subclasses may which to optimize equality testing for particular arguments types (a la parasitic methods) but should always defer to a generic implementation for types that they are not particularly aware of.
If Grace does make equality testing (partly?) built-in, it may need to add a special Eq “interface” to mark places where an ADT is being defined. Not precisely sure of the semantics, and what happens if an object
wishes to implement multiple ADTs at once.
I have always liked the way CLU handled this. The main feature CLU had that your design doesn’t is that the built in = operator was sugar for a method name (equals), and there was a notion of a “similar” method that was designed to compare as true when both arguments had the same abstract value. Thus two set objects could be similar if they are observably equivalent, even if their representations (such as lists) are different (not =).
These were matched in CLU with the copy operator, so that
T$similar(x, T$copy(x))
should always be true.
Further, there was a useful notion of shallow copy with copy1 and a shallow similarity with similar1, in CLU.
But all these were conventions, and types might not have a copy or similar operator if the designer didn’t want to provide them. in fact some built in types, in particular all the function types, have no similar operator, and = for function typesion CLU is object identity, even those the objects of these types are immutable.