Sets are not functors

I was a bit surprised to learn from Mark Seeman's blog that sets aren't functors. But, after all, I have almost no knowledge of category theory and, perhaps like the fox of the fairy tale, little interest in getting more acquainted with it. I get reassured by authors like Tomas Petricek who certainly do understand the formal laws, yet deemphasize their prominence in programming.

Gen is not a monad

In fact, also QuickCheck generators violate monad laws, as acknowledged by the authors themselves in the original paper. The details are not relevant for this post, except maybe the following quote:

"We cannot fix the problem just by reinterpreting equality for the Gen type claiming the two sides are just different representations of the same abstract generator. This isn't good enough because we can actually observe the difference at other types ..."

This post is about equality and abstraction. The main point is that a proper abstraction should not let us observe any difference between two values considered equal.

DateTimeOffset breaks substitution

What I find really surprising and interesting in Mark Seeman's post is the example used to break the functor laws: DateTimeOffset overrides equality and, quite reasonably, conflates values with different offsets but representing the same point in time. The problem is that the offset remains visible, hence violating the substitution property of equality.

I've always been aware that any override of Equals should be reflexive, symmetric and transitive (the properties of an equivalence relation), but only now I realize the importance of this even more fundamental property of mathematical equality, substitution: x = y => f(x) = f(y)

DateTimeOffset clearly violates this property since we can observe different offsets on equal values.

This hampers equational reasoning, the hallmark of functional programming.

My gut reaction is that we should put such poor abstractions in the same league as of impure stuff. At least, side effects, mutable state and randomness can be useful to get things done, and there's growing awareness on how to avoid or limit their use.

Abstract types to the rescue

This way of messing up things, choosing the easy route, is reminiscent of implementation inheritance: it's tempting to derive from a class instead of composing it, except that once you specialize a class you are committed to that abstraction, which is not limited to a signature but includes semantics (invariants, pre and post conditions); otherwise you end up breaking the Liskov substitution principle. I digress, but Barbara Liskov reminds us of abstract data types. Maybe I'm obsessed with this concept but it's a real pity that our industry care so little about it or at least some lightweight variant like Design by Contract. And, yes, certainly DateTimeOffset is not a proper ADT, as substitution is an essential property to qualify as such.

Laws (called axioms) defining the formal semantics of an abstract data type are expressed with equational logic, which is based on substitution; they determine a congruence on ADT expressions (terms). In my attempt at explaining a bit of this initial algebra approach I refer to it as an equivalence relation, but actually it's a congruence, that is a stronger notion encompassing substitution. In the final semantics approach, two terms denote the same value if they can be substituted for each other and no difference can be observed. The last sentence in this nice summary of the above concepts clearly states that ADT consistency requires the substitution property.

What about Haskell?

Armed with this theoretical evidence I turned to Haskell and, to my bigger surprise, I discovered that there's no agreement on the need for the Eq type class to satisfy the substitution property. I understand = and == are not the same but I do not dare to even start any philosophical nor technical discussion about it. Fortunately I'm not alone in being confused.

What seems uncontroversial is that the substitution property is compromised by lack of encapsulation.

This article explains pretty well why also HashSet is a leaky abstraction, in Haskell too: it may yield elements in different orders for equal instances.

A draconian fix would be to not expose any enumeration of elements: a set should be used only to check if it contains an element. Such a minimalist kind of set should satisfy the functor's laws when the element type is a proper ADT. Some evidence is provided in the following academic paper which explores the relationship between functors and ADTs (not an easy read, at least for me).

Conclusion

Maybe I gave the impression of dismissing abstractions like functor and monad. Actually I don't. I just don't get (and I'm more than happy to read comments about it) why wrapping our heads around non trivial abstractions while caring so little about simpler ones like dates and sets.