The lost art of data abstraction
Two types of ADT
Algebraic Data Types and Abstract Data Types share the same acronym but are very different concepts. Both are clever ideas but with opposite aims. In this post I will use F# to show abstract data types hinting at old and fascinating techniques for their algebraic specification.
Algebraic Data Types
Algebraic Data Types are the workhorses of statically typed functional languages like F#. They are also known as sum types or discriminated unions, so I will refer to them as DUs here, reserving the ADT acronym for abstract data types which is the main subject of the post.
There's no need to introduce DUs because I assume they are well known since they're often praised as a great tool for domain modeling and making illegal states unrepresentable. In this post we will see that they are not completely unrelated to abstract data types: in fact DUs are useful as internal representation of ADT values and they also play a role in the algebraic specification of ADTs (as the algebraic adjective may suggest).
Abstract Data Types
While DUs are concrete data structures, abstraction is (unsurprisingly) the defining trait of abstract data types. ADTs are the ultimate tool for encapsulation and information hiding: their behavior can be described only implicitly, relating values obtained applying the ADT operations without referring to their internal representation.
Objects and ADTs
There are important differences between objects and ADTs but even Bertrand Meyer advocates conflating the two concepts and I sympathize a lot with this view because for me the similarities outweigh the differences.
It's a fact of (computing) history that objects won over ADTs and even a functional first language like F# embraces objects. That's why I'm using F# objects to show ADTs (modules are used in the ML tradition). This is also an occasion to show that immutable objects are very nice and close to the mathematical notion of abstract data type.
The obligatory stack example
The standard pedagogical example of ADT is the stack, whose signature
features the operations New
, Push
, Pop
and Top
:
1: 2: 3: 4: 5: |
|
Unfortunately neither constructors nor static members are allowed
in .NET interfaces (Java 8 allows static methods) so we cannot
express the New
operation in the above signature. Let's then
move to a proper class, although with a dummy implementation:
1: 2: 3: 4: 5: 6: 7: 8: |
|
The class constructor is private and takes a private representation
object. For now all the ADT constructors (New
, Push
and Pop
)
necessarily use the same dummy representation value (TODO
).
A side note on terminology: in ADT parlance a constructor is any
operation returning a value of the ADT type. That's why in our stack
all operations (except Top
) qualify as constructors.
The above implementation is clearly wrong but, to claim that, we first need to specify what is the correct behavior and we can do this with axioms stating the properties of the ADT operations.
We can use FsCheck to verify the correctness of our implementation (hence we won't prove it but at least test it thoroughly with many random values).
Stack Axioms
Here are the stack axioms (we use int
for the generic parameter
but any type will do):
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: |
|
The meaning of the axioms is fairly readable:
- popping an empty stack should leave it empty
- there's no top of an empty stack
- pop undo the effect of a push
- pushing an item make it the top
Different API's are possible, but this one has the advantage of
defining total functions: Top
is defined (albeit with value None
)
also on the empty stack, and popping an empty stack is possible
although useless because it will leave it empty.
Equality
Before running FsCheck we need to fix a couple of things: the first
one is that since axioms are expressed as equations comparing ADT
values, to compare stacks we have to override Equals
so that stack
values with the same internal representation (whatever it is) are
equivalent:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: |
|
Term Algebra
Now we are equipped with stack equality but the remaining issue is
that FsCheck can't generate instances of the Stack
type (because
it is not a concrete data type). The trick is to define the so
called term algebra as a DU:
1: 2: 3: 4: |
|
Notice how data constructors in StackTerm
correspond to ADT
constructors in the Stack
API and how straightforward is to
create stack instances from terms:
1: 2: 3: 4: 5: |
|
The stack
function highlights the correspondence between the
term algebra and the stack API. Terms are just stack expressions,
for example term1
corresponds to stack1
:
1: 2: |
|
and applying the stack
function to term1
we expect to get
a stack instance equivalent to stack1
:
1:
|
|
Random terms can now be generated by FsCheck, and we just need to adapt our properties a little:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: |
|
We added also a new axiom for equality (axiom0
). This is not
specific for stacks, in every ADT we expect two values to be
equivalent if they are constructed with the same sequence of
operations.
We can finally run FsCheck and see our tests fail. Now it's time to get back to the implementation and make it compliant with the axioms.
Initial Algebra
There are of course multiple implementation options, but first we will pursue one of theoretical interest only. We use terms also for the internal representation:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: |
|
Clearly this implementation is not good from a performance point of view:
the representation object always keeps growing, even when Pop
is called,
and a lot of processing is needed (the reduce
function) for Top
, Equals
and GetHashCode
. But it works, and it's derived almost
mechanically from the axioms.
In fact there are specification languages in which this kind of implementation can often be automatically derived from axioms.
Axioms determine an equivalence relation on terms (two terms are equivalent if the axioms allow to reduce them to the same term) and the ADT values are conceptually the equivalence classes of this relation. This theoretical model is a sort of reference implementation (quotient term algebra) and this approach (initial algebra) has strong connections with category theory (so it has to be fancy!). There is also the dual approach of final algebra, where two terms are considered equivalent unless the axioms forbid it. There's a lot of beautiful theory behind abstract data types.
Canonical Constructors
The key insight to improve the implementation is nicely explained
here. The idea is to identify canonical constructors,
the ones intuitively sufficient to construct all values. In our
example New
and Push
are enough because every stack involving
Pop
can be expressed in a simpler form using only New
and
Push
. For example:
1:
|
|
This approach provides a precise guideline to specify axioms: one axiom is needed for each combination of a non-canonical operation applied to the result of a canonical operation. Our stack axioms happen to follow this pattern:
- Pop(New) = ...
- Top(New) = ...
- Pop(Push(x, s)) = ...
- Top(Push(x, s)) = ...
The first two axioms specify the behavior of the non-canonical
operations (Pop
and Top
) when applied to the empty stack;
the other two axioms specify the behavior of the non-canonical
operations when applied to a stack obtained with a push operation.
There are no more cases to cover.
Since only New
and Push
are needed to build stacks,
we can simplify our representation type:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: |
|
Also this implementation just mirrors the axioms, but this is a realistic one. If you squint a bit you notice that our final representation type is isomorphic to the good old F# (linked) list, which is in fact a reasonable internal representation for a stack.
Conclusion
The purpose of this post is to present a few old but interesting concepts about data abstraction. I'm not claiming that specifying and implementing an ADT is always as nice and simple as in the stack example, nor I'm advocating formal methods; but I think these ideas may still provide a useful intellectual guideline.
<summary>The type 'unit', which has only one value "()". This value is special and always uses the representation 'null'.</summary>
<category index="1">Basic Types</category>
<summary>Contains operations for working with options.</summary>
<category>Options</category>
type Stack<'a> = private new : repr:DummyRepresentation -> Stack<'a> member Pop : unit -> Stack<'a> member Push : x:'a -> Stack<'a> static member New : unit -> Stack<'a> member Top : Option<'a>
--------------------
private new : repr:DummyRepresentation -> Stack<'a>
<summary>The representation of "No value"</summary>
val int : value:'T -> int (requires member op_Explicit)
<summary>Converts the argument to signed 32-bit integer. This is a direct conversion for all primitive numeric types. For strings, the input is converted using <c>Int32.Parse()</c> with InvariantCulture settings. Otherwise the operation requires an appropriate static conversion method on the input type.</summary>
<param name="value">The input value.</param>
<returns>The converted int</returns>
--------------------
[<Struct>] type int = int32
<summary>An abbreviation for the CLI type <see cref="T:System.Int32" />.</summary>
<category>Basic Types</category>
--------------------
type int<'Measure> = int
<summary>The type of 32-bit signed integer numbers, annotated with a unit of measure. The unit of measure is erased in compiled code and when values of this type are analyzed using reflection. The type is representationally equivalent to <see cref="T:System.Int32" />.</summary>
<category>Basic Types with Units of Measure</category>
<summary>The representation of "Value of type 'T"</summary>
<param name="Value">The input value.</param>
<returns>An option representing the value.</returns>
static member Check.Quick : name:string * property:'Testable -> unit
type Stack<'a> = private new : repr:DummyRepresentation -> Stack<'a> override Equals : obj:obj -> bool override GetHashCode : unit -> int member Pop : unit -> Stack<'a> member Push : x:'a -> Stack<'a> static member New : unit -> Stack<'a> member Top : Option<'a> member private repr : DummyRepresentation
--------------------
private new : repr:DummyRepresentation -> Stack<'a>
val obj : obj
--------------------
type obj = System.Object
<summary>An abbreviation for the CLI type <see cref="T:System.Object" />.</summary>
<category>Basic Types</category>
type Stack<'a (requires equality)> = private new : repr:StackTerm<'a> -> Stack<'a> override Equals : obj:obj -> bool override GetHashCode : unit -> int member Pop : unit -> Stack<'a> member Push : x:'a -> Stack<'a> static member New : unit -> Stack<'a0> (requires equality) member Top : Option<'a> member private repr : StackTerm<'a>
--------------------
private new : repr:StackTerm<'a> -> Stack<'a>
type Stack<'a (requires equality)> = private new : repr:Representation<'a> -> Stack<'a> override Equals : obj:obj -> bool override GetHashCode : unit -> int member Pop : unit -> Stack<'a> member Push : x:'a -> Stack<'a> static member New : unit -> Stack<'a0> (requires equality) member Top : Option<'a> member private repr : Representation<'a>
--------------------
private new : repr:Representation<'a> -> Stack<'a>