what happens when computer: early algebraic data types

Algebraic data types are a very convenient and expressive way of modeling data. You're familiar with them if you've ever used any ML-like language: you have products, which are pairs or records:

("hello", 7) : String * Int

and sums, which are discriminated unions:

Left "hello" : String + Int

Right 7      : String + Int

Usually, modern ML-like languages provide a mechanism for creating named sums, where you provide your own names for each discriminant, or constructor:

data IntList = Cons (Int * IntList) | Nil

You also have the so-called unit or top type, which is the type with a single possible value, usually written (), and—at least in the underlying theory—the so-called bottom type, which has no values. The top and bottom types are sometimes written with the symbols \(\top\) and \(\bot\), and sometimes with the numerals 1 and 0. This is because they act as the identity elements in the algebra of types. The types a+0 and 0+a are both isomorphic to a. Likewise, a*1 and 1*a are also both isomorphic to a.

Algebraic data types first appeared as a usable feature in the Hope programming language, which was a very early statically typed functional language. Despite some syntactic weirdness, they worked very much like modern algebraic data types: the operator for product types was written as # instead of *, and sum constructors were written with ++ in between variants, but Hope type declarations still look remarkably familiar 45 years later:

data numlist == cons ( num # numlist ) ++ nil ;

Pattern-matching on those types also looks remarkably similar to modern functional languages, complete with definition-level pattern-matching as found in languages like Haskell:

dec sum : numlist -> num ;
--- sum ( nil )            <= 0 ;
--- sum ( cons ( x, xs ) ) <= x + sum xs ;

Algebraic data types also featured in early ML implemenations, but in a very awkward way. The first DEC-10 ML implementation didn't have a mechanism for defining named sums; instead, you had to use the built-in sum operator + with its constructors: The symbols * and ** in these signatures are type variables, like 'a in modern versions of ML.

inl : *  -> * + **
inr : ** -> * + **

You couldn't pattern-match on those values: you had to use the accessor functions outl and outr instead, which would throw exceptions if you tried to access the wrong value. You could explicitly check the tag with the function isl:

outl : * + ** -> *
outr : * + ** -> **
isl  : * + ** -> bool

We could create a data type by giving the algebraic representation of the type in terms of products, sums, and existing types. Declarations used the abstype or absrectype keywords, depending on whether the type was recursive or not. Like Hope, ML used # for pairs, and also used the symbol . to stand for unit:

absrectype intlist = (int # intlist) + . ;;

This definition would create a new type as well as two functions, absintlist and repintlist, with the following types:

absintlist : ((int # intlist) + .) -> intlist
repintlist : intlist -> ((int # intlist) + .)

Whenever we defined a type T as isomorphic to a representation R, the language would define the functions absT : R -> T and repT : T -> R, filling in whatever particular type name and representation we give. Those functions allow us to, for example, construct values of type intlist by "wrapping" a value of the representation type:

val mylist = absintlist(inl(1, absintlist(inr ())));;

Of course, constructing all our values like that would be a nightmare, so DEC-10 ML programs often used the with construct. The with keyword is preceeded by a set of declarations and followed by a different set of declarations. The value definitions before the with keyword are available to the rest of the declarations in the with construct, but only the ones after the with keyword are introduced into the surrounding scope. In our case, this allows us to write a type with a set of constructor and accessor functions, but keep the abs and rep functions hidden:

absrectype intlist = (int # intlist) + .
   with cons(x, xs) = absintlist(inl(x, xs))
    and nil         = absintlist(inr())
    and head lst    = fst(outl(repintlist lst))
    and tail lst    = snd(outl(repintlist lst))
    and isempty lst = not(isl(repintlist lst));;

The semantics of with ensure that the absintlist and repintlist functions don't escape the declaration block, and lists can only be constructed the user-written constructor functions. We can then construct our values much more tersely:

val mylist = cons(1, nil)

The successor to DEC-10 ML, VAX ML, did include the same datatype DEC-10 ML and VAX ML are both collectively referred to as Edinburgh ML. The names come via Luca Cardelli, who wrote VAX ML as a successor to and improvement on the original DEC-10 ML. mechanism as described above, but also included both named sums and named products. Named products corresponded to what are called structs or records, but unlike structs in languages like C or Pascal, they didn't need to have their types named or declared, as structural descriptions of the types would suffice. The VAX ML syntax used so-called "decorated parentheses":

(| a = "foo"; b = 7 |) : (| a: string; b: int |)

The named sum mechanism also did not need any declarations: the type instead was written as the set of possible constructor names along with their argument types:

[| c = "foo" |] : [| c: string; d: int |]
[| d = 7     |] : [| c: string; d: int |]

Unlike sums and products in DEC-10 ML, these sums and products could be picked apart by pattern matching, using a syntax very similar to the syntax used to create the values:

case e : [| c: string; d: int |] of
  [| c = s . 0;
     d = n . n + n
   |]

The effort to create Standard ML started in the early 80's, and it drew on features from both Hope and the existing ML languages. In A Proposal for Standard ML, Robin Milner explains that:

…a difficult decision had to be made concerning HOPE's treatment of data types—present only in embryonic form in the original ML—and the labelled records and variants which Cardelli introduced in his VAX version. The latter have definite advantages which the former lack; on the other hand, the HOPE treatment is well-rounded in its own terms. Though a combination of these features is possible, it seemed (at least to me, but some disagreed!) to entail too rich a language for the present proposal. Thus the HOPE treatment is fully adopted here.

Consequently, the first versions of SML drew their algebraic data types from Hope's implemenation, abandoning the tedious and difficult DEC-10 ML versions. Later versions did end up drawing on Cardelli's named products to create records, with a change in notation from (| ... |) to { ... } , and thus we get the algebraic data types we all know and love.

programming