purelisp: object types

As was noted in the previous post, objects in purelisp are partitioned into disjoint types.

types of objects

Some LISP object types have read syntax, which is a way to generate string representation of a given object. Built-in function (read) builds object given its string representation. Read syntax is given together with description of object types below.

Number. Represent arbitrary range integers (implemented in local.purelisp.eval.LInt by java.math.BigInteger). Number evaluates to itself, cannot be applied to anything, supports basic arithmetic operations.

read syntax for integers
all represent number 10

String. Evaluates to itself, cannot be applied. Implemented on top of java.lang.String.

Environment. Environment is used to evaluate objects of type symbol. Specifically, each environment contains bindings from symbols to objects. Binding can be thought of as a pair (s, o), where s is a symbol, and o is an object s is bound to. Environment, therefore, is a partial function from symbols to objects. New bindings can be installed and value symbol is bound to can be changed.

Environments are arranged into tree-like hierarchy: each environment (except the root of the tree) has parent environment, and if symbol binding is not found in the environment, search is repeated in the parent recursively. Environment is said to extend its parent. At the top of the tree is global environment that contains bindings for standard LISP symbols. Interactive interpreter maintains its own environment where user adds new or modifies existing symbol bindings. There is second environment hierarchy, not affected by LAMBDA evaluation, used to implement traditional LISP dynamic scoping, but it is not current used in the language. Dynamic parent environment can be accessed through (env-dynamic env) built-in function.

During computation there always is so-called current environment. Initially it is environment of interactive interpreter, later it can be replaced when applying LAMBDA functions (see below) or evaluating (eval-in env o) built-in function (see below). Evaluation is always performed in the current environment, and, therefore, there is no need to explicitly mention evaluation environment.

Environments have no read syntax.

Symbol. Symbol is a LISP object with unique name. Name is the only identity and state that symbol has. Symbols are used to point to other objects. It has to be stressed that while superficially similar to variables in other languages, symbols are quite different:

  • symbol has NO value attached to it. It can be used as a key to look up value in environment, but in different environments it can have different values.
  • symbol is first-class object itself: it can be stored in data-structures including environments (so that value of symbol can be another symbol).
Symbols cannot be applied and their evaluation was described above (see Environment).

Read syntax for a symbol is just its name. If unknown yet valid identifier is read, new symbol is created.

Symbols, integers, and strings are collectively known as atoms.

Cons cell. Cons cell is a pair of references to other LISP objects. First and seconds references are idiosyncratically known as CAR and CDR, and are accessed through (car c) and (cdr c) built-in functions respectively. Cons cells are used to build linked lists: CAR of the first cell in a list points to the first object in the list, CDR point to the second cell in the list, whose CAR points to the second object in the list, and CDR points to... List it terminated by the pointer to NIL (see below).

Obviously, much more general possibly cyclic data-structures, can be built from cons cells. We shall call NIL terminated lists described above well-formed lists. If list is terminated by reference to an object that is neither cons cell nor NIL, it will be called dotted list, otherwise data-structure rooted at the cons cell is called improper list

read syntax for well-formed lists
(1 2 3)
(1 (1 2 "foo") "bar")
read syntax for dotted lists
(1 2 . 3)
Note that dotted list notation can be used to represent any cons cell whose CAR and CDR have read syntax: (A . B) builds cons cell with A as CAR, and B as CDR.

Cons cell cannot be applied, and has peculiar and very important evaluation rule. To evaluate cons cell following is done in order:

  • if CDR of cell it not well-formed list, abort evaluation;
  • cell's CAR is evaluated, let's call resulting object F;
  • create copy of CDR list, i.e., create new well-formed list containing pointers to the same objects and in the same order as CDR of cell being evaluated. Call first cons cell of resulting list A0;
  • if F is built-in function or LAMBDA function, evaluate all objects in A0 and create a list of evaluation results called A1. Otherwise let A1 be A0;
  • apply F to A1.
This obviously accounts for neither concurrency nor re-entrancy issues (i.e., structure of argument list could change while evaluated, either as result of said evaluation, or due to some concurrent activity).

Basically, to evaluate well-formed list, its CAR is evaluated and applied to the remaining elements of list as arguments. Arguments are evaluated if CAR evaluates to function (either built-in or LAMBDA), and are not evaluated otherwise (which leaves us with CAR evaluating to special form).

NIL. NIL is a special object the denotes empty list. It is used to define well-formed lists above. NIL object is bound to symbol "nil" in the global environment. Cannot be applied, evaluates to itself. As a special exception, built-in functions (car) and (cdr) can be applied to NIL and return NIL in this case.

read syntax for NIL

LAMBDA. LAMBDA a is LISP object (special form) used to create lambda-functions. First, the notion of lambda-form is needed. Lambda form is a cons cell at which following well-formed list is rooted:

(lambda (P1 ... PM) E1 ... EN)
  • First element of lambda form can be anything that evaluates to LAMBDA (e.g., "lambda" symbol). Note that this makes notion of lambda-form dependent on the environment in which form is evaluated.
  • Second element of lambda-form is well-formed list called list of parameters. Elements of this list are called parameters. In traditional LISP, parameters have to be symbols. Purelisp has some extension (to be described later.) Parameters list may be empty.
  • Remaining elements of lambda-form are said to constitute its body. Body may be empty.
LAMBDA form examples
(lambda (x y) y)
(lambda (f) ((lambda (x) (f (x x))) (lambda (y) (f (y y)))))
(lambda nil 3)

Lambda-form is not treated in any special way by the interpreter. Instead its evaluation is done as for any cons cell (see above), and results in application of LAMBDA to the list ((P1 ... PM) E1 ... EN). As LAMBDA is special form, elements of this list are not evaluated. Result of this application is new LISP object: lambda-function.

Lambda-function. Lambda-function can be thought as a pair (ENV, ((P1 ... PM) E1 ... EN)), where env is an environment in which lambda-form was evaluated. Lambda-function evaluates to itself. Its importance stems from its application rule. To apply lambda-function (ENV, ((P1 ... PM) E1 ... EN)) to the list of arguments (A1 ... AK) do the following:

  • if K != M abort application;
  • create new environment ELOCAL extending ENV;
  • for each i: 1 <= i <= M, bind Pi to Ai in ELOCAL;
  • for each i: 1 <= i <= N, evaluate Ei in ELOCAL;
  • result of application is the result of EN evaluation, or NIL if body is empty.

Again, this doesn't account for pathological cases where structure of lambda-form recorded in lambda-function changes during evaluation.

As one can see lambda-functions allow programmer to construct objects with arbitrary application rules, and, therefore (through lists of the form (lambda-function A1 ... AM)), objects with arbitrary evaluation rules. Lambda-functions are main computation-structuring mechanism in LISP.

[in next installation: what makes purelisp different?]

1 comment: