This is a fun project to implement LISP 1.5 as described in "LISP 1.5 Programmer’s Manual". It’s not a complete LISP 1.5 environment emulator; instead, it is to get a glimpse of how it is constructed.
When LISP was in its infancy, it wasn’t wrapped with so many parentheses which would later become its most memorable characteristic. At least in front of people’s eyes.
An early LISP program looked like this:
member[a;l] = [null[l] → NIL; eq[a;car[l]] → l; T → member[a;cdr[l]]]
S-expressions were also used, but only to denote data.
member[C;(A B C D E)]
LISP programmers back then read and wrote programs in this form, on paper. Yes, program sources were meant to be read by humans. In order for computers to understand programs, one must have prepared a deck of cards with holes punched according to BCD character code. The character set didn’t have many punctuations and it was impossible to punch M-expressions directly. Instead, people translated M-expressions into S-expressions, which was already used to describe nested list structures. In other words, people fed the machine the internal representation of syntax tree directly.
Eventually, people got used to write programs in S-expresions directly, and M-expressions were faded into oblivion.
However, when we read papers of that era, we see M-expressions everywhere. Now that we can write M-expressions in a text file, let’s make the computer understand it directly!
Here’s a summary of M-expression sytnax:
-
Identifiers are in all lowercase. Literal symbols are in all uppercase. No
quote
in M-expression. Literal list can be written using parentheses, e.g.(A B (C D . E))
, without a quote. -
Function call is written as
fn[arg1;arg2;…]
-
Conditional is
[test1 -> expr1; test2 -> expr2;…]
. It works just likecond
. Unicode arrow character→
(U+2192) can be used in place of->
. -
Symbol
NIL
is used for list terminator. Hencecons[A;NIL]
yields(A)
. -
Literal function is written as
lambda[[arg;..];expr]
. This lambda form itself doesn’t have a value — it must be called with arguments to take effect. -
Local recursive function can be written as
label[name;lambda[[arg;…];expr]]
, where you can usename
in theexpr
to refer to itself. -
In toplevel, you can define a function by
name[arg;…] = expr
.
An M-expression can be translated into an S-expression as follows:
-
Literal data (symbols and lists) are embedded in the output by
(QUOTE <literal>)
. -
Function call
fn[arg1;arg2;…]
becomes (FN ARG1 ARG2 …)`. We didn’t have lower case characters on computers back then. -
Conditional
[test1 -> expr1; test2 -> expr2;…]
becomes(COND (TEST1 EXPR1) (TEST2 EXPR2) …)
. -
Literal function
lambda[[arg;..];expr]
becomes(LAMBDA (ARG …) EXPR)
. -
Local recursive function
label[name;lambda[[arg;…];expr]]
becomes(LABEL NAME (LAMBDA (ARG …) EXPR))
.
Module LISP1.5.mexpr
implements parsers. A procedure
parse-mexpr
takes a string or an input port, and parses one M-expression
and returns an S-expression.
gosh> ,u LISP1.5.mexpr gosh> (parse-mexpr "cons[(A . B);C]") (CONS (QUOTE (A . B)) (QUOTE C)) gosh> (parse-mexpr "lambda[[x];[eq[NIL;x]→T; T→F]]") (LAMBDA (X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))
See how liteals (upper-case names and lists) are QUOTE
-d
once parsed.
💡
|
If you try the above example on your REPL, make sure your terminal character encoding matches your Gauche setting (it’s utf-8 by default). |
Interestingly, there seemed no direct translation of definition
syntax name[arg;…] = expr
into an S-expression.
When a user translates a prorgam in M-expressions into S-expressions
to punch the cards,
she gathers all the definitions into a call of DEFINE
pseudo function
(see the note below).
For our purpose, we want the parser to yield a single S-expression
from one M-expression. So, we return ($= …)
form
as the result of parsing a defintion:
gosh> (parse-mexpr "null[x] = [eq[NIL;x]→T; T→F]]") ($= (NULL X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))
(The preceding $
of $=
indicates it is an internal or
implementation-dependent feature.)
The parse-mexpr
function only parses one M-expression.
If the input may contain more than one M-expression, use parse-mexprs
instead, which reads input up to EOF and returns an lseq of result
S-expressions.
We also add these extensions, for the convenience.
There’s no comment syntax defined in M-expressions. Since
it’s for humans to read, you could freely intermix code
and natural language descriptions. For our purpose,
we make a hash sign #
to the end of line is a comment.
(We avoid ;
, for it is used as a separator.)
In Appendix B of LISP 1.5 Programmer’s Manual, they use a pseudo extension of conditional expression for concise explanation, in which you can access the result of test expression from the expression in that branch. Such extension wasn’t formalized and the actual code is written in assembly language instead of M-expressions. But for our purpose it’ll be convnient to support such extension.
It is to allow a conditional expression to have the following clause:
test => fun
Here, fun
must be a LAMBDA form that takes one argument,
or an expression that yield a function.
First, test
is evaluated, and if it yiels a true value
(a value neither NIL
nor F
), the value is passed
to the function. It’s the same as Scheme’s cond
feature with =>
.
We’ll explain the actual use case and implementation of this extension when we get to the full toplevel environment support.
With Gauche’s reader directive feature, you can write source in M-expressions, as follows:
;; ;; Scheme comments ;; (use LISP1.5.mexpr) #!m-expr # M-expression function definitions function1[arg;...] = expression1 function2[arg;...] = expression2 ...
For our purpose, we want to treat M-expressions as our source code, and the parser returns a single S-expression as a result. So we introduce our own extension.
($TOPLEVEL <toplevel-form> ...) <toplevel-form> : ($= <expr> <expr> [<key>]) | <expr>
When read, the entire source is wrapped in $TOPLEVEL
form.
Inside it, each toplevel form becomes either
($= <expr> <expr>)
(in case of definition) or just an <expr>
in case of toplevel function call. This $TOPLEVEL
form is
merely our parser’s way to wrap the result, and its interpretation
depends on the caller of the parser; it doesn’t mean we’ll have
a special form called $TOPLEVEL
.
ℹ️
|
In the actual use case, all definitions in a program were gathered and translated into the following form to be punched: DEFINE (( (NAME (LAMBDA (ARG ...) EXPR)) (NAME (LAMBDA (ARG ...) EXPR)) ... )) This is actially a special syntax to execute a function call on toplevel.
It takes a form If you want to perform some calculation, you list the call of
the function after the DEFINE (( ... definitions .. (DOSOMETHING (LABMDA (ARG ...) EXPR)) )) DOSOMETHING (ARG ...) Examples are shown in p.15 and pp.48-51 of LISP1.5 Programmer’s Manual. |
The #!m-expr
directive translates those M-expressions into
a LISP1.5 DEFINE
form:
($TOPLEVEL ($= (FUNCTION1 ARG ...) EXPRESSION1) ($= (FUNCTION2 ARG ...) EXPRESSION2) ...)
Note that you have to have definitions of $TOPLEVEL
and other primitive
LISP1.5 forms before loading the source file; The LISP1.5.mexpr
module
only handles parsing.
We provide several implementations of those LISP1.5 primitives, which we’ll show you in the following chapters.
Section 1.6 of "LISP 1.5 Programmer’s Manual" is one of the pinnacles of the document. They show how to implement Lisp interpreter on top of Lisp systems. They call it "a Universal LISP function".
We write out their code in mx/eval.mx.
What’s interesting about it is that you only need a handful of
functions and syntaxes to run the interpreter. We define those
minimal set of primitives in LISP1/5/axioms.scm.
It provides the definition of the following primitives:
CAR
, CDR
, CONS
, ATOM
, EQ
, QUOTE
, and COND
, as well as
a definition of $TOPLEVEL
to handle toplevel forms.
To try the eval function, first use
the axioms module, then
load the eval.mx
file. Assuming you have
load path set to the top directory of LISP1.5 source,
you can say the following in the gosh REPL:
gosh> ,u LISP1.5.axioms gosh> ,l mx/eval.mx #t
Or, you can start gosh with loading necessary modules (this assumes you’re in the top directory of LISP1.5 source):
$ gosh -I. -u LISP1.5.axioms -l mx/eval.mx
On the gosh prompt, you can call EVAL
. The first argument
is the S-expression to evaluate, and the second argument
is the environment (assoc list of symbols and values):
gosh> (EVAL '(CONS (CAR (QUOTE (X . Y))) (QUOTE Z)) 'NIL) (X . Z)
Be aware of the difference of '
(quote
) and QUOTE
.
The former one is recognized by Gauche. The latter one is recognized by
EVAL
.
If you prefer, you can write M-expressions using
read-time constructor #,(m-expr "…")
:
gosh> (EVAL '#,(m-expr "cons[car[(X . Y)];Z]") 'NIL) (X . Z)
Following is a bit more convoluted example. It defines append
as a recursive funciton using LABEL
, and calls it with
two arguments, (A B C)
and (X Y Z)
:
gosh> (EVAL '#,(m-expr "label[append;lambda[[xs;r];\ [eq[xs;NIL] -> r;\ T -> cons[car[xs];append[cdr[xs];r]]]]]\ [(A B C);(X Y Z)]") 'NIL) (A B C X Y Z)
This interpreter only knows the minimal 7 primitives:
CAR
, CDR
, CONS
, ATOM
, EQ
, QUOTE
, and COND
.
To refer to anything other than that,
you have to pass them in the environment argument.
The following example reverses a list, using the
definition of NULL
, APPEND
and REVERSE
given to the environment:
gosh> (EVAL '#,(m-expr "reverse[(A B C D E F G)]") '((NULL . #,(m-expr "lambda[[x];[eq[x;NIL] -> T; T -> F]]")) (APPEND . #,(m-expr "lambda[[xs;r];\ [eq[xs;NIL] -> r;\ T -> cons[car[xs];append[cdr[xs];r]]]]")) (REVERSE . #,(m-expr "lambda[[xs];\ [null[xs] -> NIL;\ T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]")) )) (G F D C B A)
ℹ️
|
We need to provide the function |
💡
|
When you refer to an identifier that’s neither one of the built-in primitive nor the one given in the environment, you’ll get an error like the following: *** ERROR: pair required, but got NIL Stack Trace: _______________________________________ 0 (car x) at "./LISP1/5/axioms.scm":9 1 (CAR X) [unknown location] 2 (CAAR A) [unknown location] 3 (EQUAL (CAAR A) X) [unknown location] 4 (ASSOC E A) [unknown location] 5 (EVAL FN A) [unknown location] ... The code searches the environment alist by |
Since the universal LISP function defined in eval.mx
understands
the primitives required to interpret functions in eval.mx
, you can use
our EVAL
to evaluate eval.mx
to run EVAL
on top of
EVAL
— now you’re running a metacircular interpreter!
You might have noticed though, that axioms.scm
provides $TOPLEVELS
,
which is missing in eval.mx
. In our context of discussing
metacircular interpreter, $TOPLEVELS
appears as a result of
parsing M-expression definitions, and should be understood
as a meta-language to direct the set-up, rather than an integrated
part of the language (one way to think of it is that if other primitives
are C built-ins then $TOPLEVELS
is #pragma
or Makefile
— they belong
to a different layer.)
Of course, it is more convenient to have an ability in the core language to add new toplevel definitions, and we’ll deal with it later. For now, let’s stick to the 7 primitives.
In order to run EVAL
inside EVAL
, we need to prepare the definitions
in eval.mx
as an environment alist passed to outer EVAL
.
Run the following command in the toplevel source directory:
$ gosh tools/mexpr-env.scm mx/eval.mx
It reads eval.mx
and prints the definitions in an alist. Copy the output,
then start gosh
again, read axioms
and load eval.mx
, and evaluate
the EVAL
expression, passing the copied alist as the environment
(don’t forget the quote before the alist!):
gosh> ,u LISP1.5.axioms gosh> ,l mx/eval.mx #t gosh> (EVAL '(EVAL (QUOTE (CAR (QUOTE (X . Y)))) (QUOTE NIL)) '...<<here, copy & paste the output of mexpr-env.scm>>) X
The result X
is the result of (CAR (QUOTE (X . Y)))
, computed
by the EVAL
function implemented in LISP1.5, not the underlying Gauche.
If cut&pasting the environment alist is too tedious, mexpr-env.scm
can
create a definition of an auxiliary function EVAL*
, which calls EVAL
with the environment that has all the definitions in the given source file.
Run mexpr-env.scm
with -e
option, and save the result in lisp/eval.lisp
:
$ gosh tools/mexpr-env.scm -e mx/eval.mx > lisp/eval.lisp
💡
|
Instead of manually executing |
We use suffix lisp
to indicate it is not a Scheme code (even though
Gauche can understand it after using LISP1.5.axioms
).
The created lisp/eval.lisp
looks as follows:
($TOPLEVELS ($= (EVAL* X) (EVAL X '...<<environment defined in eval.mx>>... )))
That is, it defines EVAL*
which takes one LISP1.5 expression and
evaluates it under the enviornment where all the definitions in eval.mx
is visible.
The created eval.lisp
can be loaded to gosh
after using LISP1.5.axioms
.
Together with mx/eval.mx
, you can run EVAL
on top of EVAL
:
$ gosh -I. -uLISP1.5.axioms -lmx/eval.mx -leval-star.lisp gosh> (EVAL* '#,(m-expr"eval[(CONS (QUOTE X) (QUOTE Y));NIL]")) (X . Y)
This time we used M-expression in the inner call. It’s the same
as writing '(EVAL (QUOTE (CONS (QUOTE X) (QUOTE Y))) (QUOTE NIL))
.
Let’s recap what’s happening. The outer EVAL
(via EVAL*
) is
executed by Gauche, using the initially loaded eval.mx
. The
inner EVAL
is interpreted by the outer EVAL
, using the
enviornment created by mexpr-env.scm
.
And the expression (CONS (QUOTE X) (QUOTE Y))
is interpreted by
the inner EVAL
:
+----------------------------+ | (CONS (QUOTE X) (QUOTE Y)) | +----------------------------+ | EVAL | ; inner EVAL +----------------------------+ | EVAL | ; outer EVAL +----------------------------+ | Gauche | +----------------------------+
If it is not obvious, try it with an altered environment.
For example, edit the eval.lisp
created above
to change the inner EVAL
recognizes KWOTE
instead of QUOTE
.
There’s only one place to change:
(EVAL LAMBDA (E A) (COND ((ATOM E) (CDR (ASSOC E A))) ((ATOM (CAR E)) (COND ((EQ (CAR E) (QUOTE KWOTE)) (CADR E)) ^^^^^ ((EQ (CAR E) (QUOTE COND)) (EVCON (CDR E) A)) ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A)))) ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))
(Leave other QUOTE
intact, for they are recognized by the outer EVAL
).
Now, try it:
(EVAL* '(EVAL (QUOTE (CONS (KWOTE X) (KWOTE Y))) (QUOTE NIL))) => (X . Y)
The two QUOTE
s are recognized by the outer EVAL
, and the two
KWOTE
s are recognized by the inner EVAL
. Furthermore,
the '
(quote
) is recognized by Gauche.
(If you know what we’ll talk about from the section title, you can skip this section. Yes, it’s just about that.)
One advantage of having a simple language with a concise interpreter is that we can tweak it easily.
In the universal EVAL
, a function is represented as a literal list
whose car is LAMBDA
. It is a powerful idea—now you can have
a function as a first-class citizen of the language, that you can
construct it, pass it to another function, and return it from another
funciton. However, it has a flaw.
Let’s try a failure case and see if we can fix it.
Consider MAPCAR
function, which takes a function and a list, and
returns a list of results of the function applied to each element of the
given list (that is, Scheme’s map
function):
mapcar[fn;x] = [null[x] -> NIL; T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]
It is in mx/mapcar.mx. You can’t load it directly
into Gauche, however. Treating a list starting with LAMBDA
as
a function is a feature of EVAL
, not Gauche.
We have to make EVAL
understand the above definition.
We can use the same technique we used in the metacircular interpreter — that is, translate the definition of MAPCAR
above into an enviroment
alist. We also need the definition of NULL
, so let’s combine
eval.mx
together with mapcar.mx
. It can be done with the following
command line:
$ gosh tools/mexpr-env.scm -e mx/eval.mx mx/mapcar.mx > lisp/mapcar.lisp
Alternatively, run ./configure
then make
in the toplevel source directory.
Once you have lisp/mapcar.lisp
, you can load it (after mx/eval.mx
)
and you can call MAPCAR
inside EVAL*
:
$ gosh -I. -uLISP1.5.axioms gosh> ,l mx/eval.mx #t gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (QUOTE (LAMBDA (X) (CONS X (QUOTE Y)))) (QUOTE (A B C)))) ((A . Y) (B . Y) (C . Y)) gosh> (EVAL* '#,(m-expr "mapcar[(LAMBDA (X) (CONS X (QUOTE Y)));(A B C)]")) ((A . Y) (B . Y) (C . Y))
So far, so good.
Now, Let’s try nesting MAPCAR
. We’ll do equivalent to the following
Scheme code:
(map (lambda (x) (map (lambda (y) (cons x y)) '(p q r))) '(a b c)) => (((a . p) (a . q) (a . r)) ((b . p) (b . q) (b . r)) ((c . p) (c . q) (c . r)))
Here’s LISP1.5 version:
(EVAL* '(MAPCAR (QUOTE (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (QUOTE (A B C)))) => ((((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)))
Oops, what happened? Let’s examine the details.
Outer MAPCAR
receives two actual parameters, (LAMBDA (X) …)
and (A B C)
(QUOTE
s are stripped when arguments are evaluated
by evlis
before calling the function). They are bound to the
local parameters, FN
and X
, respectively. In other words,
the body of MAPCAR
:
[null[x] -> NIL; T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]
is evaluated with the following environment:
((FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C)))
Since X
is not NIL
, evaluation goes to cons[…]
branch.
The first argument is fn[car[x]]
, so first car[x]
is evaluated
and yields A
, fn
evaluated to the outer LAMBDA
form
and we call it with A
. The body of inner LAMBDA
form, which
is the inner MAPCAR
call, is evaluated with the following environment
(Keep in mind that the new local bindings are inserted in front of
outer environment):
((X . A) (FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C)))
Inner MAPCAR
gets (LAMBDA (Y) (CONS X Y))
and (P Q R)
as two
actual parameters, which are bound to MAPCAR
's formal paramter
FN
and X
again, and the environment under which innter MAPCAR
's
body is evaluated looks like this:
((FN . (LAMBDA (Y) (CONS X Y))) (X . (P Q R)) (X . A) (FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C)))
Finally, innter LAMBDA
is called — first, P
as the
actual parameter, which is bound to Y
. Hence the body
of the inner LAMBDA
, which is (CONS X Y)
, is evaluated
under the following environment:
((Y . P) (FN . (LAMBDA (Y) (CONS X Y))) (X . (P Q R)) (1) (X . A) (2) (FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C))) (3)
-
Argument for the inner
MAPCAR
-
Argument for the outer
LAMBDA
-
Argument for the outer
MAPCAR
Now it is clear why it didn’t work. When we write the
initial nested MAPCAR
form, we expect that X
in the
innermost expression (CONS X Y)
refer to the formal parameter of the
outer LAMBDA
. But it is shadowed by the formal parameter of the
MAPCAR
.
This is a well-known problem, and in lambda calculus it is avoided
by renaming the parameter names to avoid conflict. In our case,
if we rename the formal parameter of inner LAMBDA
to something
different from the formal parameter of MAPCAR
, it works as expected:
(EVAL* '(MAPCAR (QUOTE (LAMBDA (Z) (1) (MAPCAR (QUOTE (LAMBDA (Y) (CONS Z Y))) (QUOTE (P Q R))))) (QUOTE (A B C)))) => (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
-
We use
Z
to avoid confclit withMAPCAR
'sX
.
However, we can’t possibly avoid all potential conflict manually, and renaming all formal parameters programatically to unique ones can be costly.
LISP1.5 employed another way to solve this problem. Instead of passing
LAMBDA
form quoted, it introduced another form, called FUNCTION
.
The rule is that whenever you pass a function as an argument,
you wrap it with FUNCTION
instead of QUOTE
. With this rule,
our call of nested MAPCAR
would look like this:
(EVAL* '(MAPCAR (FUNCTION (LAMBDA (X) (MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (QUOTE (A B C))))
Now we modify our universal LISP function to deal with FUNCTION
.
We only need to change two lines. First, make EVAL
understand
(FUNCTION <fn>)
form. Whenver it sees the form, it just
returns a list (FUNARG <fn> <env>)
, where <env>
is the evaluation
enviornment:
eval[e;a] = [atom[e] -> cdr[assoc[e;a]]; atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e]; eq[car[e];FUNCTION] -> cons[FUNARG;cons[cadr[e];cons[a;NIL]]]; (1) eq[car[e];COND] -> evcon[cdr[e];a]; T -> apply[car[e];evlis[cdr[e];a];a]]; T -> apply[car[e];evlis[cdr[e];a];a]]
-
If we see
(FUNCTION <fn>)
form, wrap the function and the current environment inFUNARG
form, as(FUNARG <fn> <env>)
.
Then, in APPLY
, we call <fn>
with the rememberd <env>
instead of
the passed environment:
apply[fn;x;a] = [atom[fn] -> [eq[fn;CAR] -> caar[x]; eq[fn;CDR] -> cdar[x]; eq[fn;CONS] -> cons[car[x];cadr[x]]; eq[fn;ATOM] -> atom[car[x]]; eq[fn;EQ] -> eq[car[x];cadr[x]]; T -> apply[eval[fn;a];x;a]]; eq[car[fn];FUNARG] -> apply[cadr[fn];x;caddr[fn]]; (1) eq[car[fn];LAMBDA] -> eval[caddr[fn];pairlis[cadr[fn];x;a]]; eq[car[fn];LABEL] -> apply[caddr[fn];x;cons[cons[cadr[fn];caddr[fn]];a]]]
-
Apply the wrapped function in the rememberd environment
The changed definitions are in mx/funarg.mx. You can load it and see it addresses the issue (which has been called FUNARG problem).
$ gosh -I. -u LISP1.5.axioms -l mx/funarg.mx gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (FUNCTION (LAMBDA (X) (MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (QUOTE (A B C)))) (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
ℹ️
|
Did you notice that you actually did’t need eval[e;a] = [atom[e] -> cdr[assoc[e;a]]; atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e]; eq[car[e];LAMBDA] -> cons[FUNARG;cons[e;cons[a;NIL]]]; eq[car[e];COND] -> evcon[cdr[e];a]; T -> apply[car[e];evlis[cdr[e];a];a]]; T -> apply[car[e];evlis[cdr[e];a];a]] The updated definition is in mx/funarg-lambda.mx. Using it,
calling $ gosh -I. -u LISP1.5.axioms -l mx/funarg-lambda.mx gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (LAMBDA (X) (MAPCAR (LAMBDA (Y) (CONS X Y)) (QUOTE (P Q R)))) (QUOTE (A B C)))) (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R))) This idea was realized by Sussman and Steele in 1975, as a dialect Scheme. The first paper of Scheme stated it at the beginning: SCHEME is essentially a full-funarg LISP. LAMBDA expressions need not be QUOTEd, FUNCTIONed, or *FUNCTIONed when passed as arguments or returned as values; they will evaluate to closures themselves. |
So far, our EVAL
requires any bindings to be provided
via the environment argument. Preprocessing the source with mexpr-env.scm
was a remedy, but it’s still troublesome. So our next step is to
add a toplevel environment, that keeps global bindings of symbols.
The easiest way is to keep a global table, and when we search
a variable binding via ASSOC
(in the first branch of EVAL
),
we also look up the table when we didn’t find any local bindings.
However, LISP1.5 took a bit different approach. Since its symbol had a property list, or plist, which could hold arbitrary key-value pairs, so I suspect it was natural to store the global value of the symbol in its plist. In fact, even the name of a symbol was merely one of its properties. In LISP1.5, a symbol was just another type of list where the car of its head was marked with a special value (-1).
ℹ️
|
A property list (plist) associates keys to values, much like
an associative list (alist),
but its structure alternates keys and values. For example, if
key ;; alist ((A . APPLE) (B . BANANA)) ;; plist (A APPLE B BANANA) The number of cons cells used are the same. We’re not sure why LISP1.5
creators used plist for symbol properties, while they used
alist for environment in |
In our minimal infrastructure (LISP1/5/axioms.scm) we just used Gauche symbols for LISP symbols. It might be interesting, though, to reproduce what LISP1.5 did — using a list to implement symbols!
That is, from now on, our LISP symbol is a pair whose car is
a special marker. We use Gauche symbol ATOM
. From LISP world,
a LISP symbol is an unbreakable unit (hence it is called atom), so
the marker is never be visible. Under the hood, in Gauche level,
we can break an atom to access its internal structure. It is as
if LISP world deals with chemical reactions and Gauche world deals
with nuclear reactions.
In LISP symbols, its name is stored as a value of the property
PNAME
. Since the property list is scanned by LISP function,
we have to use LISP symbols as the property key. For the name itself,
we use a Scheme string; in real LISP1.5, the name is stored
in a special way and treated specially (there wasn’t a string type).
Thus, LISP symbol PNAME
has the following structure in Gauche:
(define *PNAME* '#0=(ATOM #0# "PNAME"))
The #0=
notation is a Scheme way to write a circular structure.
The symbol PNAME
has a propoerty list, in which the key PNAME
is associated to the name "PNAME"
. Note that they LISP symbol
PNAME
itself doesn’t have a global value.
The global value of symbols is stored as a propery value with
the key APVAL
. So we need the LISP symbol APVAL
, which looks
like the following in Gauche. APVAL
itself doesn’t have a global
value either:
(define *APVAL* `(ATOM ,*PNAME* "APVAL"))
Once we have PNAME
and APVAL
, we can define NIL
, whose name
is "NIL"
and value is itself. We can’t use #0=
notation this time,
since we have to construct the list using values of *PNAME\*
etc.
(define *NIL* (rlet1 nil (list 'ATOM *PNAME* "NIL" *APVAL*)
(set! (cddddr nil) (list nil))))
Here’s how *NIL*
looks like in Gauche world.
1=(ATOM #1 "PNAME")
is LISP symbol PNAME
, and
(ATOM 1 "APVAL")
is LISP symbol APVAL
. Remember we’re looking
at the internal of atoms — from LISP world, this is just a symbol
NIL
.
gosh> *NIL* #0=(ATOM #1=(ATOM #1# "PNAME") "NIL" (ATOM #1# "APVAL") #0#)
We can define several symbols in this way. See LISP1/5/runtime.scm for all the predefined symbols.
Let’s start building infrastructure. Our LISP world only have symbols
and cons cells so far (we’ll add numbers later). We can define $atom?
and $cons?
as follows (The $
indicates it deals with LISP objects):
(define ($atom? obj) (and (pair? obj) (eq? (car obj) 'ATOM)))
(define ($cons? obj) (and (pair? obj) (not (eq? (car obj) 'ATOM))))
Then we can define $lisp->scheme
, which converts LISP data structure
into Scheme data structure, handy for debugging.
We map NIL
inside the structure into Scheme empty list, so that
list structure can be printed naturally (instead of having . NIL)
at the end.) We also convert non-LISP object into a string #[…]
.
(define ($lisp->scheme obj)
(define (rec obj)
(cond [(eq? obj *NIL*) '()]
[($atom? obj) (string->symbol (cadr (member *PNAME* (cdr obj))))]
[(pair? obj) (cons (rec (car obj)) (rec (cdr obj)))]
[else (format "#[~s]" obj)]))
(if (eq? obj *NIL*)
'NIL
(rec obj)))
It’s also handy to have $scheme->lisp
, which converts Scheme
structure into LISP structure. One important point: We want to keep
symbol’s eq
-ness, that is, LISP symbols with the same name
can be compared with eq
. So we keep a hashtable to map Scheme
symbol to LISP symbols.
(define *obtable* (hash-table-r7 eq-comparator
'NIL *NIL*
'PNAME *PNAME*
'APVAL *APVAL*))
(define ($scheme->lisp obj)
(cond [(null? obj) *NIL*]
[(symbol? obj) (or (hash-table-get *obtable* obj #f)
(rlet1 s (list 'ATOM *PNAME* (symbol->string obj))
(hash-table-put! *obtable* obj s)))]
[(pair? obj) (cons ($scheme->lisp (car obj))
($scheme->lisp (cdr obj)))]
[else (errorf "Cannot convert ~s to LISP" obj)]))
Let’s try them. Converting Scheme (A B C D E)
into LISP results
somewhat scary structure, but converting it back shows it’s nothing
to be afraid of:
gosh> ($scheme->lisp '(A B C D E)) ((ATOM #0=(ATOM #0# "PNAME") "A") (ATOM #0# "B") (ATOM #0# "C") (ATOM #0# "D") (ATOM #0# "E") . #1=(ATOM #0# "NIL" (ATOM #0# "APVAL") #1#)) gosh> ($lisp->scheme *1) (A B C D E)
Not all global values are stored in APVAL
property. LISP1.5 uses
several different keys, depending on the type of the value. APVAL
is used when a symbol is used as a variable, and other keys are
used when a symbol is used in the function position of the function call.
Key | Value |
---|---|
|
The value is a LISP object. |
|
The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments are evaluated before passed to it. |
|
The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments are not evaluated, and passed as a single list. |
|
The value is a native function (written in assembly in the acutal LISP1.5, written in Gauche in our case). The arguments are evaluated before passed it. |
|
The value is a native function (written in assembly in the acutal LISP1.5, written in Gauche in our case). The arguments are not evaluated, and passed as a single list. |
It is worth to mention that EXPR form receives fixed-number of arguments. If you want to write a function in LISP that takes variable number of arguments, you have to make it FEXPR, and evaluate the given list of arguments by yourself.
ℹ️
|
Lisp dialects can be categorized to either Lisp-1 or Lisp-2. They are not versions, but about namespaces. Lisp-1 unifies function and variable namespaces, so in the function call syntax, the function name is looked up the same way as variable look-up. Scheme is Lisp-1. Lisp-2 have separate namespaces for functions and variables.
You can use the argument named This design of having different keys for function call and
variable makes LISP1.5 a Lisp-2. However, interestingly,
to call a function stored in a variable you can place the variable
in the function position, without |
The EVAL
procedure that uses symbol’s property lists are
shown in Appending B of "LISP1.5 Programmer’s Manual". However,
it contains some pseudo code which were actually implemented
in the assembly. Although we can rewrite the code in pure
LISP1.5, it would be pretty verbose; instead, we enhance our M-expressions
a bit so that the pseudo code in Appendix B can be written naturally
in our implementation.
Specifically, we allow this clause in the cond
form:
test => proc
The test
expression is evaluated, and if it yields neither NIL
nor F
,
the procedure proc
is called with the result of test
as the sole
argument. It is the same as Scheme’s cond
. You can also
use Unicode character ⇒
(U+21d2) in place of ⇒
.
We also allow λ
(U+03bb) in place of lambda
, for conciseness
and similarity to the listing in "LISP1.5 Programmer’s Manual".
Here’s a Scheme’s filter-map
written in our M-expression:
filtermap[pred;lis] = [null[lis] → NIL; pred[car[lis]] ⇒ λ[[x];cons[x;filtermap[pred;cdr[lis]]]]; T → filtermap[pred;cdr[lis]]]
It works as follows:
filtermap[atom;(A (B) C (D) E)] ⇒ (A C E)
As we did in our first version with axioms.scm and eval.mx, we want to keep Scheme code minimal and write the rest of the system in LISP itself. We also want to write so-called standard libraries in LISP, too.
When you write language X in the language X itself, you have to be epecially careful which world you’re dealing with. Before proceeding, let’s recap the layered structure we saw in the previous sections.
-
In
axioms.scm
, we defined minimal operators in Scheme to run LISP 1.5. It is the bottom world, or the Basement. We can see all the mechanics that runs the LISP system from the Basenment. -
Then we loaded
eval.mx
, which is written in LISP 1.5 itself. At this time though, the functions ineval.mx
, such asNULL
,ASSOC
orEVAL
, are actually Gauche variables, bound to Gauche procedures; TheDEFINE
macro inaxioms.scm
translates LISP 1.5 definitions into Gauche definitons. The functions ineval.mx
doesn’t know about Gauche, even though they themselves are running as Gauche procedures. We’re in the Ground Floor. -
Then we processed
eval.mx
withmexpr-env.scm
to produceeval.lisp
. It hasEVAL*
, which is still Ground Floor function. It takes a LISP1.5 expression and evaluates it. The expression passed toEVAL*
lives in the First Floor, above the Ground Floor. As we’ve seen, the habitants in the First Floor knows nothing about the Ground Floor or the Basement, except the bindings passed as the environment.
Now, in our revised runtime, difference between the Basement and the Ground Floor becomes wider: A LISP symbol is an unbreakable atom in the Ground Floor, but it’s just a pair in the Basement.
We do need a few conduits between the floors, so that the upper floor can access the functionality of lower floor:
-
SUBR
andFSUBR
are functions implemented in the basement. In the original LISP1.5, they were implemented in assembly language. In our case, they are written in Gauche. In order to invoke those functions from the ground floor, we need an additional primitive,CALLSUBR
, which takes the instance ofSUBR
orFSUBR
and arguments, to call it. -
Atoms in LISP world now has structure---somewhat like that an atom in original sense was the minimal unbreakable building block of the universe, then mankind found it has electrons and nucleus in it. We do want to treat LISP atoms as unbreakable entity for most of the time, except when we want to access its property list.
-
The previous version of
EVAL
doesn’t have error handling mechanism. For usability, we need some minimal mechanism to signal an error. Theoretically, a sophisticated error handling mechanism can be implemented fully in LISP layer---e.g. we can define a special "error" value, and whenever something yields an error value, we let all the expressions that got the error value just returns it, so that the error value propagates to the final result. However, it is convenient to stop and examine the evaluator at the moment when error occurs, and such a mechanism needs access to the basement. For now, we provideERROR
procedure as another primitive. -
The previous version of
EVAL
didn’t have literal fuction (lambda form) inEVAL
code itself---the lambda form is dealt byEVAL
, but the Basement that executesEVAL
doesn’t need to know that. Now that, for our convenience, we use several lambda forms in the definition ofEVAL
, so the Basement needs to deal with them, too.
The revised runtime Basement routines---replacing axioms.scm
is
in LISP1/5/runtime.scm
:
(define (CAR x) (if (null? (car x)) *NIL* (car x))) (define (CDR x) (if (null? (cdr x)) *NIL* (cdr x))) (define (CONS x y) (cons x (if (eq? y *NIL*) '() y))) (define (ATOM x) (if ($atom? x) *T* *F*)) (define (EQ x y) (if (eq? x y) *T* *F*)) (define (CALLSUBR subr args) (apply subr args)) (define (ERROR obj) (error "Meta*LISP Error:" ($lisp->scheme obj))) (define T *T*) (define F *NIL*) (define NIL *NIL*) (define-syntax LAMBDA lambda) (define-syntax QUOTE (syntax-rules () [(_ x) ($scheme->lisp 'x)])) (define-syntax COND (syntax-rules (=>) [(_) *NIL*] [(_ (test expr) . more) (let ([t test]) (if (or (eq? t *NIL*) (eq? t *F*)) (COND . more) expr))] [(_ (test => proc) . more) ; extension (let ([t test]) (if (or (eq? t *NIL*) (eq? t *F*)) (COND . more) (proc t)))])) (define-syntax $TOPLEVELS (syntax-rules ($=) [(_ ($= (name args ...) expr) ...) (begin (define name (let ([lsym ($scheme->lisp 'name)] [lfn ($scheme->lisp '(LAMBDA (args ...) expr))]) (set! (cdr lsym) `(,($scheme->lisp 'EXPR) ,lfn ,@(cdr lsym))) (lambda (args ...) expr))) ...)]))
When we read the definition of new EVAL
in Gauche, the literals
are passed as Gauche’s liteals. We treat it as LISP1.5 literals,
hence our QUOTE
form in the basement translates Gauche literals
to LISP1.5 literals by $scheme→lisp
.
The COND
is also enhanced to handle our extension.
The $TOPLEVELS
now not only defines Gauche procedures,
but also registers the defined form to the LISP1.5 symbol’s EXPR
property. That is, if we load this M-expression on top of the
Basement:
caar[x] = car[car[x]]
It defines (1) a Gauche procedure CAAR
, and (2) it registers EXPR
property of LISP1.5 symbol CAAR
with S-expression
(LAMBDA (X) (CAR (CAR X)))
.
The runtime.scm
also defines SUBR
property of
several symbols that are used as primitives:
(define-syntax defglobal (syntax-rules () [(_ var key val) (let1 lsym ($scheme->lisp 'var) (set! (cdr lsym) `(,($scheme->lisp key) ,val ,@(cdr lsym))))])) (defglobal CAR 'SUBR CAR) (defglobal CDR 'SUBR CDR) (defglobal CONS 'SUBR CONS) (defglobal ATOM 'SUBR ATOM) (defglobal EQ 'SUBR EQ) (defglobal ERROR 'SUBR ERROR) (defglobal CALLSUBR 'SUBR CALLSUBR)
Now we write EVAL
that understands the global environment.
First we need a couple of auxiliary procedures.
Previously, we used assoc
to search local bindings in the
local environment. We didn’t consider an error there, so if you
use undefined variable it yielded Gauche error. The following sassoc
takes an extra thunk (a procedure with no arguments) and invokes it
when the item x
isn’t found in an associative list a
:
sassoc[x;a;thunk] = [null[a] → thunk[]; equal[caar[a];x] → car[a]; T → sassoc[x;cdr[a];thunk]]
Another function is to get the property value with the key y
in
the symbol x
. As we explained above, we access symbol’s property list
just using car
and cdr
:
get[x;y] = [null[x] → NIL; eq[car[x];y] → cadr[x]; T → get[cdr[x];y]]
Note: Since a property list alternates keys and values, it must loop
with cddr
---skipping a value. However, LISP1.5 Programmer’s Manual
lists the above code, just looping with cdr
. I don’t know if it
was just an overlook or they just reused exisitng get
procedure.
One consequence of that choice is that we can’t simply store the
symbol’s global value as APVAL’s value, since if that value happens
to be a symbol such as `EXPR
, it’ll confuse get
procedure.
So the `APVAL’s value is wrapped with a list.
Now, we can write revised APPLY
and EVAL
that maintaines
the global environment in symbols' propetry lists:
apply[fn;args;a] = [null[fn] → NIL; atom[fn] → [get[fn;EXPR] ⇒ λ[[e];apply[e;args;a]]; get[fn;SUBR] ⇒ λ[[s];callsubr[s;args]]; T → apply[cdr[sassoc[fn;a;λ[[];error[A2]]]];args;a]]; eq[car[fn];LABEL] → apply[caddr[fn];args;cons[cons[cadr[fn];caddr[fn]];a]]; eq[car[fn];FUNARG] → apply[cadr[fn];args;caddr[fn]]; eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];args;a]]; T → apply[eval[fn;a];args;a]] eval[form;a] = [null[form] → NIL; atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]]; T → cdr[sassoc[form;a;λ[[];error[A8]]]]]; eq[car[form];QUOTE] → cadr[form]; eq[car[form];FUNCTION] → cons[FUNARG;cons[cadr[form];cons[a;NIL]]]; eq[car[form];COND] → evcon[cdr[form];a]; atom[car[form]] → [get[car[form];EXPR] ⇒ λ[[e];apply[e;evlis[cdr[form];a];a]]; get[car[form];FEXPR] ⇒ λ[[f];apply[f;cons[cdr[form];cons[a;NIL]];a]]; get[car[form];SUBR] ⇒ λ[[s];callsubr[s;evlis[cdr[form];a]]]; get[car[form];FSUBR] ⇒ λ[[f];callsubr[f;cons[cdr[form];cons[a;NIL]]]]; T → eval[cons[cdr[sassoc[car[form];a;λ[[];error[A9]]]]; cdr[form]]; a]]; T → apply[car[form];evlis[cdr[form];a];a]]
This is pretty close to what is shown in the Appendix B of "LISP1.5 Programmer’s Manual".
Note that We have a couple of calls of ERROR
. The argument is an
error code.
-
error[A2]
: A symbol is used as a procedure but doesn’t have a binding. -
error[A8]
: A symbol is used as a variable but doesn’t have a binding. -
error[A9]
: A symbol is used as a procedure or syntax but doesn’t have a binding.
Another curious point. Check the atom
branch of eval
:
atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]]; T → cdr[sassoc[form;a;λ[[];error[A8]]]]];
It first accesses symbol’s APVAL
property, then
searches the environment. That is, symbol’s global value takes
precedence from local values.
You can run this version of EVAL
by loading
mx/genv.mx
into Gauche.
Note that EVAL
now accepts LISP1.5 data, and returns
LISP1.5 data. You have to convert them to Gauche’s data back and forth.
Here, we just evaluate the global variable F
, whose value is
NIL
:
gosh> ($lisp->scheme (EVAL ($scheme->lisp 'F) ($scheme->lisp '()))) NIL
The following code defines REVERSE
function locally and calls it.
Note that null
and append
are already stored as EXPR
property
of those symbols when we loaded genv.mx
, so we don’t need
to provide them in the environment:
gosh> ($lisp->scheme (EVAL ($scheme->lisp '#,(m-expr "reverse[(A B C D E F G)]")) ($scheme->lisp '((REVERSE . #,(m-expr "lambda[[xs];\ [null[xs] -> NIL;\ T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]")))))) (G F E D C B A)