-
Notifications
You must be signed in to change notification settings - Fork 54
Discriminated unions
Discriminated unions allow the definition of new, structured types. They encompass both enumeration and record types in other languages.
The simplest example of a discriminated union type is an enumeration. Here is an example:
:- type fruit
---> apple
; banana
; cherry.
The definition begins with :- type
followed by the type name fruit
.
Following the arrow (containing three dashes) is a list of one or more
data constructors separated by semicolons. Each constructor represents
a possible term of the type. This type definition makes it possible to write
code involving terms apple
, banana
and cherry
.
The compiler will reject code which tries to make use of constructors which were not defined or imported, preventing mistakes.
Each data constructor in the type definition may have arguments.
Here is a type to represent binary trees of int
values.
:- type tree
---> leaf(int)
; branch(tree, int, tree).
Terms of the type tree
can come in two variants, distinguished by
the data constructor. Terms with leaf
as the top-level constructor
must have a single argument of type int
. Terms with branch
as the
top-level constructor must have three arguments of the right types,
including two sub-trees. A term matching the type is branch(leaf(1), 2, branch(leaf(3), 4, leaf(5)))
.
An enumeration is just a discriminated union where none of the constructors have arguments.
Types may be parameterised with type variables. Type variables begin with an uppercase letter or underscore, just like program variables.
We can generalise the binary tree type like this:
:- type tree(T)
---> leaf(T)
; branch(tree(T), T, tree(T)).
tree(T)
is a polymorphic type. We can substitute any type for T
to
make more types. tree(string)
is the type of binary trees of strings,
tree(tree(int))
is the type of binary trees of binary trees of ints,
and so on.
Ordinarily, type variables occurring in the body of the type definition must occur as arguments on the left side of the arrow. This is invalid:
:- type invalid_type
---> foo(T).
A term of type invalid_type
must have a top-level constructor foo
but
what would be the type of the argument? We do not have a substitution for
T
, so that information would have to carried around with the term at
runtime. It is possible using a feature called existentially quantified
type variables, which deserves its own chapter.
The fields of a data constructor may be labelled using the syntax
FIELDNAME :: FIELDTYPE
. One reason to label fields is to make the type
definition more obvious.
:- type circle
---> circle(
x :: float,
y :: float,
radius :: float
).
Note that two fields in the same module cannot have the same name.
Field names allow you to select individual fields of a term using an
expression of the form TERM ^ FIELDNAME
. Actually, the right side is
a list that can be chained, so TERM ^ FIELDNAME1 ^ ... ^ FIELDNAME
can select a field in a nested term.
Radius = Circle ^ radius
achieves the same thing as
Circle = circle(_, _, Radius)
Field names also allow you to "update" individual fields of a term
with an expression of the form TERM ^ FIELD_LIST := FIELDVALUE
.
The expression returns a copy of TERM
with the value of the field
specified replaced by FIELDVALUE
.
UnitCircle = Circle ^ radius := 1.0,
achieves the same thing as
Circle = circle(X, Y, _),
UnitCircle = circle(X, Y, 1.0)
We will have more to say about named fields and record syntax after we introduce more of the language.
A single type can have multiple constructors with the same name but different arities (number of arguments). Different types may have constructors of the same name and arity.
This is valid:
:- type foo1
---> foo
; foo(int)
; foo(int, int)
; foo(int, int, int).
:- type foo2
---> foo
; foo(int)
; foo(int, int)
; foo(int, int, int).
Excessive overloading of constructors can slow down type checking and can make the program confusing for human readers, so overloading should not be over-used.
Due to overloading or polymorphism, sometimes it will be necessary to
resolve ambiguities when a term could match multiple types using a type
qualification expression of the form TERM : TYPE
.
It usually arises when calling a predicate or function that can accept any type.
It might be instructive to have a look at some of the types defined in the standard library.
In the bool
module we have a boolean type.
:- type bool
---> no
; yes.
The constants are named yes
and no
to avoid confusion with the
predicates true
and false
that exist in Mercury. It is often a good
idea to use purpose-specific types instead of bool
, as the meaning of
a boolean parameter can be unclear, or confused with unrelated parameters.
In the maybe
module we have a type for representing optional values.
:- type maybe(T)
---> no
; yes(T).
The pair
module defines a type to pair up two terms into a compound term,
similar to a 2-tuple.
:- type pair(T1, T2)
---> (T1 - T2).
An example of a pair is "abc" - 123
where -
is the infix constructor.
The -
symbol is easily confused for a minus operator so we would
not recommend it in new code (it comes from Prolog, which doesn't have
functions).
A very common type is list(T)
defined in the list
module.
:- type list(T)
---> []
; [T | list(T)].
Because lists are so common, the parser supports some special syntax that makes working with lists more convenient. (The syntax can be used for other types, though it may be confusing.)
We can also write the list(T)
type definition as:
:- type list(T)
---> '[]'
; '[|]'(T, list(T)).
That might be a little clearer. A list is either the empty list
[]
, or a non-empty list with a top-level constructor [|]
with two
arguments, the head of the list and the rest of the list.
We will probably devote a chapter to working with lists.