Skip to content
Julien Fischer edited this page Aug 19, 2017 · 4 revisions

Lists are a common data structure in Mercury, as in most functional programming languages.

list(T) is a discriminated union type defined in the standard list module:

:- type list(T)
    --->    []
    ;       [T | list(T)].

All elements of a list must have the same type T.

[] denotes the empty list, also pronounced "nil".

If you have an element X and another list Xs then [X | Xs] is a list with X at the head of the list, followed by the tail of the list Xs. The [|]/2 constructor is pronounced "cons".

List terms are written as comma-separated terms between square brackets. [a, b, c] is short for [a | [b | [c | []]]].

You can write multiple comma-separated values before the vertical bar. [a, b, c | Tail] is short for [a | [b | [c | Tail]]].

The list module has a lot of useful operations on lists. We'll look at a few of them now, and consider how they are implemented.

length

The list.length/1 function return the lengths of a list; there is also a predicate version. They are declared as:

:- func length(list(T)) = int.

:- pred length(list(T), int).
:- mode length(in, out) is det.

We can implement the function straightforwardly:

length([]) = 0.
length([_X | Xs]) = 1 + length(Xs).

That works, but there is a small problem. Each call to + cannot proceed until both its arguments have been computed. To compute the right operand we make a recursive call to length/1. For each element in the list, the call stack will grow by one stack frame. For a very long input list, the program may run out of stack space and crash.

Fortunately, there is a simple solution. We can take advantage of the associativity of addition and write the function in an "accumulator style", where we evaluate the addition first and make the recursive call afterwards.

length(Xs) = Len :-
    length_acc(Xs, 0, Len).

:- pred length_acc(list(T), int, int).
:- mode length_acc(in, in, out) is det.

length_acc([], N, N).
length_acc([_X | Xs], N0, N) :-
    length_acc(Xs, N0 + 1, N).

The length_acc procedure is tail recursive. The recursive call (in the second clause) is the final action before returning to its caller. The return values of the recursive call (in this case, just N) will be exactly the values, in the right order, that the current call needs to return to its caller. Therefore, there is no need for the recursive call to return to the current call; it could return the values directly to the parent caller. In such a case, the compiler can perform tail call elimination: the tail call will be compiled to a jump that reuses the stack frame of the current call, and when it returns will return directly to the current call's caller.

The accumulator version of length/1 uses constant amount of stack space regardless of the length of the input list.

The Mercury compiler has an option --warn-non-tail-recursion to help detect non-tail recursive procedures.

append

Another useful operation on lists is to append them together. This can be done with the predicate append(Start, End, List), which is true if and only if List is the result of concatenating Start and End.

:- pred append(list(T), list(T), list(T)).
:- mode append(di, di, uo) is det.
:- mode append(in, in, out) is det.
:- mode append(in, in, in) is semidet.    % implied
:- mode append(in, out, in) is semidet.
:- mode append(out, out, in) is multi.

There are also function versions:

:- func append(list(T), list(T)) = list(T).
:- func list(T) ++ list(T) = list(T).

so you can write [1, 2, 3] ++ [4, 5, 6] or append([1, 2, 3], [4, 5, 6]).

Let's consider the implementation of the predicate. There are two cases. Appending an empty list to another list just produces that second list. To append a non-empty list [X | Xs] to another list Ys, we just need to "cons" X to the result of appending Xs and Ys.

append([], Ys, Ys).
append([X | Xs], Ys, [X | Zs]) :-
    append(Xs, Ys, Zs).

That's all there is to it.

What about all those modes?

:- mode append(in, in, out) is det.

This means append/3 can append two input lists to produce an output list, as expected.

:- mode append(in, in, in) is semidet.

Given three input lists, this mode can tell you if the third list is the result of appending the first two lists. If not, the predicate fails, so this mode is semidet. You could append two lists then compare it with the third, but this mode avoids building the temporary list.

:- mode append(in, out, in) is semidet.

For a prefix list Xs and a list Zs, this mode can produce a list Ys such that appending Xs and Ys produces Zs. If Xs is not a prefix of Zs then the predicate can fail, so this mode is semidet.

:- mode append(out, out, in) is multi.

For a list Zs, append/3 can generate the possible pairs of Xs and Ys such that appending Xs and Ys produces Zs. So append(Xs, Ys, [1, 2, 3]) has these solutions:

append([], [1, 2, 3], [1, 2, 3])
append([1], [2, 3], [1, 2, 3])
append([1, 2], [3], [1, 2, 3])
append([1, 2, 3], [], [1, 2, 3])

Since this mode must have at least one solution, and possibly more, its determinism is multi.

:- mode append(di, di, uo) is det.

In theory, you could provide two unique lists to append/3. Unique means there are no other references to either list, so append could destructively update them or reuse their memory cells to produce the result, which will also be unique. Unfortunately, it doesn't work; the result list is allocated afresh. It's also very hard to provide unique arguments of the input arguments in the first place.

Now, is the append(in, in, out) procedure tail recursive? If we rearrange the second clause a little:

append([], Ys, Ys).
append([X | Xs], Ys, Result) :-
    append(Xs, Ys, Zs),
    Result = [X | Zs].

It's clear that the recursive call is not is not in a tail position. After the recursive call returns Zs, we still need to produce the Result term. If we could construct most of the result term first, leaving a space for the recursive call to fill in, then we could move the recursive call into a tail position and thus enable tail call elimination. This is called the tail call modulo cons transformation (or last call modulo cons), enabled by the --optimise-constructor-last-call option.

The append(in, in, out) procedure will be compiled as if it were written something like this (if Mercury had C's & and * operators):

% not actual Mercury code

append([], Ys, Ys).
append([X | Xs], Ys, Result) :-
    % allocate cell for Result, but fill in only X
    Result = [X | Zs],
    % call helper predicate, pass address of the second field of Result cell
    append_lcmc(Xs, Ys, &Zs).

append_lcmc([], Ys, AddrOfZs) :-
    % place Ys at location given by AddrOfZs
    *AddrOfZs = Ys.
append_lcmc([X | Xs], Ys, AddrOfZs) :-
    % allocate cell but fill in only X
    Result = [X | Zs1],
    % place Result at location given by AddrOfZs
    *AddrOfZs = [X | Zs1],
    % tail call, pass address of the second field of Result cell
    append_lcmc(Xs, Ys, &Zs1).

The recursive call to append_lcmc is in a tail position and, after tail call elimination, this version of the append(in, in, out) procedure uses constant stack space.

member

member(Element, List) is true if and only if Element is a member of List. It can be called in the forward direction to test if Element occurs in List. Or it can be called in the reverse direction, to generate all the elements which occur in the list. The predicate is declared and defined thus:

:- pred member(T, list(T)).
:- mode member(in, in) is semidet.
:- mode member(out, in) is nondet.

member(X, [X | _]).
member(X, [_ | Xs]) :-
    member(X, Xs).

Perhaps the only thing you might get wrong in the implementation is if you wrote the first clause as:

member(X, [X]).

That clause would only succeed if the second argument is unified with a list of exactly one element X, not what was intended.

reverse

reverse(List, Reverse) is true if and only Reverse is a list containing the same elements as List but in the reverse order. The predicate and its functional counterpart are declared thus:

:- pred reverse(list(T), list(T)).
:- mode reverse(in, out) is det.

:- func reverse(list(T)) = list(T).

One way to implement reverse is: to reverse a non-empty list, remove the head of the list, reverse the rest of the list, then append the head element to the end of the reversed list.

reverse([], []).
reverse([X | Xs], Reverse) :-
    reverse(Xs, RevXs),
    append(RevXs, [X], Reverse).

It's very inefficient. The call to append(RevXs, [X], Rev) calls the procedure with mode append(in, in, out) which cannot destructively update its input arguments. The tail part of the appended list will be exactly the second input list, so the result can share memory with the second input list. However, the procedure must allocate a cons cell for every element in the first input list to construct the result list.

We can count how many memory allocations will be incurred:

  • reverse([], []) allocates 0 cons cells. [] is a constant term that does not require memory allocation.

  • reverse([3], Reverse) will

    • call reverse([], RevXs) - allocates 0 cons cells
    • create [3] - allocates 1 cons cell (remember [3] is shorthand for [3 | []])
    • call append([], [3], Reverse) - allocates 0 cons cells
    • for a total of 1 cons cell
  • reverse([2, 3], Reverse) will

    • call reverse([3], RevXs) - allocates 1 cons cell
    • create [2] - allocates 1 cons cell
    • call append([3], [2], Reverse) - allocates 1 cons cell
    • for a total of 3 cons cells
  • reverse([1, 2, 3], Reverse) will

    • call reverse([2, 3], RevXs) - allocates 3 cons cells
    • create [1] - allocates 1 cons cell
    • call append([3, 2], [1], Reverse) - allocates 2 cons cells
    • for a total of 6 cons cells

In general, this definition of reverse/2 will allocate (n+1)*n/2 memory cells for an input and output list of length n, a lot of unnecessary garbage.

A better implementation of reverse/2 builds up the output list as we go down the input list. This implementation allocates one cons cell for each element of the input list, and is also tail recursive.

reverse(List, Rev) :-
    reverse_acc(List, [], Rev).

:- pred reverse_acc(list(T), list(T), list(T)).
:- mode reverse_acc(in, in, out) is det.

reverse_acc([], L, L).
reverse_acc([X | Xs], L0, L) :-
    reverse_acc(Xs, [X | L0], L).

It turns out the helper predicate is quite useful in of itself, so the standard library exports it under the name list.reverse_prepend/3.

Different clauses for different modes

We previously declared a single mode for reverse/2. If B is the reverse of A then A is the reverse of B so, logically, we ought to be able to call reverse/2 with the second argument as input and get the reversed list out as the first argument. We can try to declare a second mode for reverse/2:

:- mode reverse(out, in) is det.

Alas, it doesn't work (for either definition of reverse). For the accumulator version you get this error message:

In clause for `reverse(out, in)':
  in argument 1 of call to predicate `rev.reverse_acc'/3:
  mode error: variable `List' has instantiatedness `free',
  expected instantiatedness was `ground'.

We're missing a mode for reverse_acc/3 where the third argument is input (ground). Try as we might, we can't get such a mode to compile. reverse_acc doesn't work in reverse.

In this case, reordering conjunctions is insufficient to get the reverse(out, in) mode that we want. In other cases, reordering conjunctions might produce a workable but inefficient procedure.

To handle both cases, Mercury allows you to annotate clauses with modes so that the clause is only used for that mode of the predicate or function. Then you can provide different implementations for different modes, such as for reverse/2:

reverse(List::in, Rev::out) :-
    reverse_acc(List, [], Rev).
reverse(Rev::out, List::in) :-
    reverse_acc(List, [], Rev).

There is one more piece of administravia. When you provide different clauses for the same predicate or function, you may have inadvertently (or deliberately!) written a predicate or function that does not have the same logical meaning for all modes. By default, the compiler will assume that's the case and treat the predicate or function as "impure" (we will discuss the purity system later).

Since we're sure reverse/2 means the same thing in both modes, we can go ahead and declare that to the compiler:

:- pragma promise_equivalent_clauses(reverse/2).

reverse(List::in, Rev::out) :-
    reverse_acc(List, [], Rev).
reverse(Rev::out, List::in) :-
    reverse_acc(List, [], Rev).

merge sort

The list module provides some predicates to sort lists. The basic one has the signature:

:- pred sort(list(T)::in, list(T)::out) is det.

The sorting algorithm of choice for linked lists is merge sort, and that's the algorithm used by the standard library. We'll implement it now.

Recall the merge sort algorithm: to sort a list, split the list into two sub-lists (of approximately equal lengths), recursively sort each sub-list, then merge the two sorted sub-lists. In code:

:- pred merge_sort(list(T)::in, list(T)::out) is det.

merge_sort(List, SortedList) :-
    length(List, Length),
    ( if Length > 1 then
        HalfLength = Length // 2,
        det_split_list(HalfLength, List, Front, Back),
        merge_sort(Front, SortedFront),
        merge_sort(Back, SortedBack),
        merge(SortedFront, SortedBack, SortedList)
    else
        SortedList = List
    ).

int.// performs truncating integer division (but so does int./).

det_split_list(N, List, Start, End) splits List into two lists Start and End, where Start has length N. You might wonder why it has a "det" prefix. There is a semidet predicate split_list/4 which fails if List has fewer than N elements. det_split_list/4 throws an exception instead of failing. We don't want to deal with a failure here (it can't happen anyway) so we use det_split_list/4.

Next, we make two recursive calls to sort the sub-lists. Notice merge_sort/2 always recomputes the length of its input, but we already know the length of the sub-lists. You may want to improve on this.

Once we have the two sorted sub-lists, we need to merge them together. This is the job of merge/3.

:- pred merge(list(T)::in, list(T)::in, list(T)::out) is det.

Let's get the base cases out of the way. Merging an empty list with another list just produces the other list:

merge([], [], []).
merge([A | As], [], [A | As]).
merge([], [B | Bs], [B | Bs]).

To merge two non-empty lists, we just need to compare the heads of both lists and place the lesser element at the start of the (recursively) merged list:

merge([A | As], [B | Bs], Cs) :-
    ( if A < B then
        merge(As, [B | Bs], Cs0),
        Cs = [A | Cs0]
    else
        merge([A | As], Bs, Cs0),
        Cs = [B | Cs0]
    ).

Oops! There is a int.< predicate and there is a float.< predicate, but we want merge to work for elements of any type. How can we compare two elements of some arbitrary type? Is there even an ordering on elements of arbitrary types?

The reference manual says:

For every Mercury type there exists a standard ordering; any two values of the same type can be compared under this ordering by using the builtin.compare/3 predicate. The ordering is total, meaning that the corresponding binary relations are reflexive, transitive and anti-symmetric.

The existence of this ordering makes it possible to implement generic data structures such as sets and maps, without needing to know the specifics of the ordering. Furthermore, different platforms often have their own natural orderings which are not necessarily consistent with each other. As such, the standard ordering for most types is not fully defined.

So if we don't mind the standard ordering, we can use builtin.compare/3 to compare elements to define the ordering of our sort.

The builtin module is implicitly imported into every Mercury module. It defines, amongst other things, the compare/3 predicate and the comparison_result type:

:- type comparison_result
    --->    (=)
    ;       (<)
    ;       (>).

:- pred compare(comparison_result, T, T).
:- mode compare(uo, in, in) is det.
% some more modes

Then merge can be written as:

merge([], [], []).
merge([A | As], [], [A | As]).
merge([], [B | Bs], [B | Bs]).
merge([A | As], [B | Bs], Cs) :-
    compare(R, A, B),
    (
        ( R = (<)
        ; R = (=)
        ),
        merge(As, [B | Bs], Cs0),
        Cs = [A | Cs0]
    ;
        R = (>),
        merge([A | As], Bs, Cs0),
        Cs = [B | Cs0]
    ).

The builtin module also defines some inequality predicates @<, @=<, @>, @>= defined in terms of compare/3, but some people may consider those symbols a bit cryptic.

Higher-order predicates

The list module also provides the usual higher-order predicates and functions that you'd expect: map, filter, and way too many folds/reduce. Since they make for good examples, we'll discuss them when we introduce higher-order programming in Mercury.

Sample code

Here is the code from this page. Some names have been prefixed to disambiguate them from names in the list module.

:- module lists.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module int.
:- import_module list.

% length

:- func my_length(list(T)) = int.

my_length(Xs) = Len :-
    length_acc(Xs, 0, Len).

:- pred length_acc(list(T), int, int).
:- mode length_acc(in, in, out) is det.

length_acc([], N, N).
length_acc([_X | Xs], N0, N) :-
    length_acc(Xs, N0 + 1, N).

% append

:- pred my_append(list(T), list(T), list(T)).
:- mode my_append(di, di, uo) is det.
:- mode my_append(in, in, out) is det.
:- mode my_append(in, in, in) is semidet.    % implied
:- mode my_append(in, out, in) is semidet.
:- mode my_append(out, out, in) is multi.

my_append([], Ys, Ys).
my_append([X | Xs], Ys, [X | Zs]) :-
    my_append(Xs, Ys, Zs).

% member

:- pred my_member(T, list(T)).
:- mode my_member(in, in) is semidet.
:- mode my_member(out, in) is nondet.

my_member(X, [X | _]).
my_member(X, [_ | Xs]) :-
    my_member(X, Xs).

% reverse

:- pred my_reverse(list(T), list(T)).
:- mode my_reverse(in, out) is det.
:- mode my_reverse(out, in) is det.

:- pragma promise_equivalent_clauses(my_reverse/2).

my_reverse(List::in, Rev::out) :-
    reverse_acc(List, [], Rev).
my_reverse(Rev::out, List::in) :-
    reverse_acc(List, [], Rev).

:- pred reverse_acc(list(T), list(T), list(T)).
:- mode reverse_acc(in, in, out) is det.

reverse_acc([], L, L).
reverse_acc([X | Xs], L0, L) :-
    reverse_acc(Xs, [X | L0], L).

% merge sort

:- pred merge_sort(list(T)::in, list(T)::out) is det.

merge_sort(List, SortedList) :-
    length(List, Length),
    ( if Length > 1 then
        HalfLength = Length // 2,
        det_split_list(HalfLength, List, Front, Back),
        merge_sort(Front, SortedFront),
        merge_sort(Back, SortedBack),
        merge(SortedFront, SortedBack, SortedList)
    else
        SortedList = List
    ).

:- pred my_merge(list(T)::in, list(T)::in, list(T)::out) is det.

my_merge([], [], []).
my_merge([A | As], [], [A | As]).
my_merge([], [B | Bs], [B | Bs]).
my_merge([A | As], [B | Bs], Cs) :-
    compare(R, A, B),
    (
        ( R = (<)
        ; R = (=)
        ),
        my_merge(As, [B | Bs], Cs0),
        Cs = [A | Cs0]
    ;
        R = (>),
        my_merge([A | As], Bs, Cs0),
        Cs = [B | Cs0]
    ).

% main

main(!IO) :-
    merge_sort([3, 1, 4, 1, 5, 9], SortedList),
    write(SortedList, !IO),
    nl(!IO).
Clone this wiki locally