Berlin Brown and Software Development: July 2008

Wednesday, July 30, 2008

Neophyte Scheme/Lisp Interpreter in Erlang

Overview

This is a follow-up to my earlier entry on Lisp in Java; the entry described implementing a Lisp interpreter in Java. This blog posts covers implementing a similar approach using Erlang. In essence, a port of the JSceme interpreter to Erlang. So you can see compare and contrast the differences in Erlang and Java code. Like I always do, I am going to first introduce Erlang with simple examples and then go over the simple Lisp implementation. All of the source is provided at the end of the entry.

What is Erlang?

Erlang is a general purpose programming language that is great for writing concurrent applications and components. That is accomplished by some of the slimmed-down language features including immutable code, simple message passing primitives, and lightweight processes. It was created by Ericsson in 1987 and later open sourced in 1998. Erlang language syntax is based on Prolog.

Why Erlang?
I don't know why the software community can't accept another programming language. Actually, it doesn't matter if X language is accepted or not. The developers will continue to work on it and their users will continue to enjoy the applications that they write. In 2008 and into the future, I can't see a new programming language gaining the kind of adoption and popularity that Java or .NET have received. But, in the next ten to twenty years, a majority of developers will continue to work with one or more of the following general purpose programming languages. With that being said, Erlang has a lot of features that other more popular languages like Java also have. This includes but is not limited to: Erlang OTP, Mnesia database, XML parsing libraries, capable web frameworks. Several killer applications have become mainstream including CouchDB, Yahoo's SimpleDB, Tsung, RabbitMQ. On top of that, Joe Armstrong released a very thorough book on Erlang that includes an overview of many practical applications.

Python/Jython - Easy to learn, dynamic programming language with some advanced constructs.
Ruby/JRuby - Easy to learn, built-in regex, dynamic programming language with some advanced constructs.
Erlang - Concurrent programming, functional programming, fast of multicore machines, rich enterprise libraries (including Erlang OTP, Mnesia, etc), mature, strong community, immutable variables.
Haskell - Strong, static typing, functional programming, advanced functional programming features, GHC compiler compiles to native code.
Scala -Strong, static typing, runs on the Java virtual machine, functional programming, advanced functional programming features.
Clojure - Lisp dialect, advanced programming features, runs on the Java virtual machine
Smalltalk/Seaside - Objected oriented development, strong community
Common Lisp (SBCL, Clisp) - Simple syntax, fast, advanced functional programming features, dynamic typing
Scheme - Simple syntax, fast, advanced functional programming features, dynamic typing
Factor - Concatenative langauge, influenced by Forth the stack language, Joy and others, strong dynamic typing, simple syntax, many advanced libraries.
OCaml
C - Fast - mostly compiles to native code, imperative programming, static weak, popular
C++ - Fast - mostly compiles to native code, imperative programming, object oriented
C# (and other .NET languages) - Built by Microsoft
F# - Built by Microsoft

Immutable Variables and Single Assignment

If you are used to imperative programming languages; you are normally familiar with the ability to change the state of your variables. In Erlang, this is now allowed.

Getting used to Erlang

Notice how I use the "term 'getting' used to" as opposed to saying something like, "Erlang has the following problems". After working with the language, I can say the the syntax, architecture and libraries appear well planned out but the beginning Erlang programmer will need to get used to the language and understand the reasons for single assignment variables. Here are some of the quirks of the language especially if you are more familiar with an imperative style of programming.

The BEAM Interpreter

BEAM/Erl is Erlang's virtual machine. It interprets beam compiled bytecode. Normally, an Erlang developer will compile erl file source code with erlc and erlc generates the BEAM bytecode. Like the Java class file; there can be one beam file for one erl module source.

In the output below, I compile the Erlang source file and then run the module.


%% compile1.erl
-module(compile1).
-export([main/0]).
main() ->
    io:format("Hello World~n").


$ erlc compile1.erl

$ ls
compile1.beam  compile1.erl
$ erl -noshell -pa . -s compile1 main -s erlang halt
Hello World

Or you can compile and run the 'compile1' module within the Erlang shell like so.


Eshell V5.6  (abort with ^G)
1> c(compile1).
{ok,compile1}
2> compile1:main().
Hello World
ok
3>q().  %%% I am done, I quit

Using a Makefile
As you see, it really isn't that difficult to build your application. If you have built Java bytecode class files using some Makefile system, you could probably use similar techniques to work with your Erlang application; you could just use erlc over javac.


Create the 'src' and 'ebin' binary directories.
$ mkdir -p src ebin test

And then create a ./src/compile1.erl hello world file and copy
the erlang code into this file.

I have been using some variant of this Makefile over the last couple of weeks. It is pretty basic, it detects changes to the Erlang source and at the 'make' command will output the beam bytecode files to the ebin directory.


## Makefile
TOPDIR := $(shell pwd)
DATE   = $(shell date +%Y%m%d)

APPLICATION = compile1 

ESRC  = ./src
EBIN  = ./ebin

ERLC  = erlc
ERL   = erl

OPT   = -W

SRC         = $(wildcard $(ESRC)/*.erl)
TARGET      = $(addsuffix .beam, $(basename \
                 $(addprefix $(EBIN)/, $(notdir $(SRC)))))

all: clean ${APPLICATION}

LIB_TARGET_OBJS = $(EBIN)/compile1.beam

${APPLICATION}: $(TARGET) $(LIB_TARGET_OBJS) 

ebin/%.beam: $(ESRC)/%.erl
    $(ERLC) $(OPT) -o ebin $<

run: $(APPLICATION)
    $(ERL) -noshell -pa $(TOPDIR)/ebin/ -s $(APPLICATION) main -s erlang halt

clean:
    -rm -vf $(TARGET)
    -rm -vf erl_crash.dump
    -rm -vf ebin/*.beam

To compile our compile1 we need to just enter make at the command prompt as shown in the following output.


$ make
rm -vf ./ebin/compile1.beam
rm -vf erl_crash.dump
rm -vf ebin/*.beam
erlc -W -o ebin src/compile1.erl

And to run the application enter make and use the run target.


$ make run
erl -noshell -pa /ebin/ -s compile1 main -s erlang halt

Note: the argument compile1 is associated with the module/beam file name and main is the name of the exported function.

Basic Examples in Erlang

The earlier section described how to compile and run Erlang code. Now, lets look at some actual examples. Generally, most of the code can be run within the Erlang REPL shell but can also be compiled through source file. Functions on the other hand are defined within module source and must be exported before they can be called from outside the module. But there isn't much difference between functions defined as funs compared to the ones defined in your module source. This is shown in the output below, notice the error after my attempt at declaring a function.


Eshell V5.6.3  (abort with ^G)
1> MyFunc = fun() -> io:format("Hello World~n") end. 
#Fun
2> MyFunc().
Hello World
ok
3> myfunc() -> io:format("Ooops") end.
* 1: syntax error before: '->'
3>

There are a couple of things to note with this output and I will go over these concepts repeatedly in my blog post. MyFunc is an anonymous function and the body includes printing Hello World to the console. We assign the 'fun' to the variable MyFunc and then invoke the fun.


SingleAssignmentMyFunc = fun() -> ...BODY SPACE_TOKEN END_TOKEN PERIOD_TOKEN

Every character and even the case of those characters is important. A misplaced period or semi-colon or extraneous character may generate a misleading error message. Damien Katz in his "What Sucks about Erlang" article, he makes note of the Erlang terminators, "In Algol based languages the statement terminators are the same everywhere (usually semi-colon, newline or both)", in Erlang "you can have different character ending each line: bar(); bar(), bar()" depending on the context of your code. If you have worked with Java, C or C#, it may take a bit of experience to get used to the character endings. Here is some INVALID and VALID Erlang expressions. Take note of the error messages.

Erlang Session: testing invalid variable assignments


Eshell V5.6.3  (abort with ^G)
1> %% Store the value 3 into this variable
1> %% Actually this is invalid code, because ATOMs start with lowercase text
1> x = 3.
** exception error: no match of right hand side value 3
2> %% Let create a variable correctly
2> X = 3.
3
3> %% All variables must start with a uppercase letter.
3>                                                 
3> %% What if I leave off the period character
3> X = 3
3> Y = 3
3> .
* 5: syntax error before: Y
3> q().  %% What if I want to exit this session?
ok
4>  %% What if I want to exit this session?
bbrown@houston:~$

Erlang Session: testing invalid single assignments


Eshell V5.6.3  (abort with ^G)
1> %% What happens if I try to break the single assignment rule?
1> X = 3.
3
2> X = 4.
** exception error: no match of right hand side value 4
3> %% Not that friendly an error message is it? 
3> q().
ok
4> bbrown@houston:~$

Erlang Session: Case statement and various end of expression characters.


Eshell V5.6.3  (abort with ^G)
1> %% Even though we haven't introduced if or case statements
1> %% I will want to show you the differences in the character
1> %% terminators.
1> 
1> %% We can use case or if statements to match against a set of patterns 
1> %% In theory, similar to 'switch' statements in Java/C
1> 
1> Expression1 = 3.
3
2> case Expression1 of
2>   4 -> io:format("I don't think 3 == 4~n"),
2>        io:format("No, you are right, 3 != 4~n");
2>   3 -> io:format("What 3 == 3~n"),
2>        io:format("YES!!!~n"),
2>        io:format("Note: no COMMA, SEMICOLON,only space 'end' keyword~n")    2> end.
What 3 == 3
YES!!!
Note: no COMMA, SEMICOLON,only space 'end' keyword
ok
3>

Erlang Module (mytest.erl): What happens in error?
Now, what does Erlang return if we aren't used to the language yet and add the wrong character ending? And it is better to see this error in a Erlang module.


%% Filename: mytest.erl
-module(mytest).
-export([mymain/0]).

mymain() ->
  %% Invalid code, comma should be a semicolon
  Expression = 3,
  case Expression of
    4 -> io:format("Oops, there shouldn't be a COMMA here~n"),
    3 -> io:format("Doesn't matter, error in last line~n")
  end.

bbrown@houston:~/tmp/tmp5$ erlc mytest.erl
./mytest.erl:10: syntax error before: '->'
./mytest.erl:3: function mymain/0 undefined

Erlang Session: Valid if example
And for the sake of completeness, here is a valid use of the if expression. Here is a snippet from Joe Armstrong's "Programming Erlang".


Eshell V5.6.3  (abort with ^G)
1> Expression = 3.
3
2> ForGuard1 = 4.
4
3> ForGuard2 = 3.
3
4> if
4>   ForGuard1 == Expression ->
4>     io:format("Nope, not going to happen");
4>   ForGuard2 == Expression ->
4>     io:format("Yes, I think this is it")
4> end.
Yes, I think this is it
5> 

From the Erlang book:
if
  Guard1 ->
    Expr_seq1;
  Guard2 ->
    Expr_seq2;
  ...
  ...
end

"Guards are constructs that we can use to increase the power of pattern matching. Using guards, we can perform simple tests and comparisons on the variables in a pattern" [1] -- Erlang Book.

Another distinct facet of Erlang is pattern matching, "In Erlang, = denotes a pattern matching expression. Lhs = Rhs really means this: evaluate the right side (Rhs), and then match the result against the pattern on the left side (Lhs)". And you can bind variables to the left side.

Here are some valid operations.


Eshell V5.6.3  (abort with ^G)
1> %% Define a tuple of numbers and bind to the variable X
1> X = { 1, 2, 3 }.
{1,2,3}
2> X
2> . 
{1,2,3}
3> {A, B, _ } = X.
{1,2,3}
4> io:format("Print the value of A: ==>~p", [A]).
Print the value of A: ==>1ok
5> io:format("Print the value of A: ==>~p~n", [A]).
Print the value of A: ==>1
ok
6>

List Operations and Strings


Eshell V5.6.3  (abort with ^G)
1> A = [ 1, 2, 3].
[1,2,3]
2> length(A).
3
3> [Head|Tail] = [ 1, 2, 3, 4 ].
[1,2,3,4]
4> Head
4> .
1
5> Tail.
[2,3,4]
6> %% To access some element by index, use lists module, nth function.
6> lists:nth(2, A).   
2
7> %% Or to reverse a list.
7> lists:reverse(A).
[3,2,1]
8> %% Print the list.
8> io:format("==>~p~n", [A]).
==>[1,2,3]
ok
10> io:format("==>~p~p~n", [A, Tail]).  
==>[1,2,3][2,3,4]
ok
11>

Basic Data Structures, Records and Immutable


Eshell V5.6.3  (abort with ^G)
1> UpdateDict = fun(D) ->
1>    io:format("Can I update this dict? pass by value~n"),
1>    D2 = dict:store("key", 999, D),
1>    io:format("Note: set updated dict to D2, Dict ==>~n~p~n", [D2])
1> end.
#Fun
2> %% Create an empty dict
2> D = dict:new().
{dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
3> UpdateDict(D).
Can I update this dict? pass by value
Note: set updated dict to D2, Dict ==>
{dict,1,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],
        [["key"|999]],
        [],[],[]}}}
ok
4>


Eshell V5.6.3  (abort with ^G)
1> UpdateDict = fun(D) ->
1>    io:format("Update the dict, but NOT return the updated value~n"),
1>    D2 = dict:store("key", 999, D),
1>    some_atom_not_dict
1> end.
#Fun
2> %% This time, create a new dict; "create/update" another
2> %% dict with the key=999, but do NOT return the update/newly created
2> %% dict.  Essentially, we are going to get a code logic error.
2> MyNewDict = dict:new().
{dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
3> UpdateDict(MyNewDict).
Update the dict, but NOT return the updated value
some_atom_not_dict
4> %% Is the dict updated?  pass by value like with Java.
4> io:format("==>~p~n", [MyNewDict]).
==>{dict,0,16,16,8,80,48,
         {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
         {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
ok
5> %% Hmm, no 'key'?
5> q(). 
ok
6> bbrown@houston:~$

So here is a variation of the anonymous function in the code snippet above. The function sets the key to 999 and the dict store function returns a newly created dict with those values. It is sort of way to update the dict by creating an exact copy of the current one. Because Erlang "is a functional programming language that has a non-mutable state" [1], you can't simply create a variable even a hash table and expect the values within the structure to change.


Eshell V5.6.3  (abort with ^G)
1>  UpdateDict = fun(D) -> 
1>    D2 = dict:store("key", 999, D) 
1> end.
#Fun
2> D = dict:new().
{dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
3> D2X = UpdateDict(D). 
{dict,1,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],
        [["key"|999]],
        [],[],[]}}}

With Java, the state of the hash table is changed from calling the method 'test'. This would not be the case with Erlang.


import java.util.HashMap;
import java.util.Map;
public class A {
    public static void test(final Map map) {
        map.put("key", "999");
    }
    public static void main(String [] args) {
        Map mymap = new HashMap();
        // In this instance, key is still kept.
        test(mymap);
        System.out.println("==>" + mymap.get("key"));
    }
}


%% Simple test of using record
%% Also note, ever character and CASE is important
-module(for_record).
-export([main/0]).

%% Simple record definition
-record(the_record, {
          xkey = 1.0,
          ykey = 3.0,
          buffer = "dogs and cats",
          status = not_done_yet,
          got_update
          }).

simple_record_test(OrigRecord) ->
    XVal = OrigRecord#the_record.xkey,
    XBuf = OrigRecord#the_record.buffer,
    %% Update a value
    R2 = OrigRecord#the_record{got_update="MyData"},
    io:format("New Record =>~p~n", [R2]),
    %% Return an updated record
    OrigRecord#the_record{buffer="hahah"}.
    
with_pattern_match(#the_record{buffer=Buf} = Rcrd) ->
    io:format("==> Buffer: ~p~n", [Buf]),
    io:format("==> Record: ~p~n", [Rcrd]),
    Rcrd.

common_idiom_tupl(#the_record{buffer=Buf} = Rcrd) ->
    %% A common Erlang pattern, return a TUPLE with the
    %% status of the function call.  Not an ATOM is
    %% set to the first value of the tuple.
    { ok_atom, Rcrd }.
    
main() ->
    FirstRec = #the_record{status=new_rec},
    io:format("First Record =>~p~n", [FirstRec]),
    %% Notice that we are passing the updated records to the
    %% next function call.
    R2 = simple_record_test(FirstRec),
    R3 = with_pattern_match(R2),
    R4WithTuple = common_idiom_tupl(R3),
    io:format("Record AND Tuple =>~p~n", [R4WithTuple]).


$ erl -s for_record main
Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0] [hipe] [kernel-poll:false]

First Record =>{the_record,1.0,3.0,"dogs and cats",new_rec,undefined}
New Record =>{the_record,1.0,3.0,"dogs and cats",new_rec,"MyData"}
==> Buffer: "hahah"
==> Record: {the_record,1.0,3.0,"hahah",new_rec,undefined}
Record AND Tuple =>{ok_atom,{the_record,1.0,3.0,"hahah",new_rec,undefined}}
Eshell V5.6.3  (abort with ^G)
1> q().
ok

Overview of Mini JScheme

In my last entry about my Java Scheme implementation, "Peter Norvig created JScheme, a Lisp Scheme dialect written in Java. It was created in 1998 and it has since been forked, extended and used quite extensively in other modern projects for the web and elsewhere. I mention JScheme because the code I am about present is all based on the JScheme interpreter." Here is a description of the Erlang version of that implementation. A lot function names are the same as well as the logic used. Note: The approach used may not represent the Erlang best practices but there is a large correlation between the two versions. And so far, we have covered most of the Erlang idioms that will be used in our Mini Scheme implementation. If you are more familiar with imperative languages, you should be able to go back and forth between the Java and Erlang code.

Input Reader Parser Functionality in Erlang

The input reader module (mel_input_reader.erl) contains functions for parsing the scheme data. In the Java implementation, we read one character at a time and depending on the character,


while (isWhitespace((char) ch)) {
    ch = inputReader.read();
}
switch (ch) {
  case -1:
    return EOF;
  case TOK_LEFT_PAREN:
    return "(";
  case TOK_RIGHT_PAREN:
    return ")";             
  default:
    buff.setLength(0);              
    int c = ch;
    buildGenericToken(ch);
    if (c == '.' || c == '+' || c == '-' 
        || (c >= '0' && c <= '9')) {
        try {
            // Number type is currently in the buffer queue
            return new Double(buff.toString());
        } catch (NumberFormatException e) {
            ;
        }
    } // End of If
    return buff.toString().toLowerCase();
} // End of the Switch

I hope the Java scheme input parser code is fairly straightforward. We read and collect the next character from the file input stream and then determine if the character is the left of right character token or an alpha numeric character which we define as just another symbol to be used for further processing. I tried to keep some of that same logic for the Erlang implementation. There are two major differences.

We implemented our own simple "input stream reader" using Erlang's list and string operations. This approach is a quick hack that is only ten lines of code but in the long run is not going to be very robust. Our stream is simply a collection of characters that we "pop" off the stack buffer.

The "reader_info" record is used to keep the "state" of our "input stream" as well as other important variables. With the Java implementation, this was easily done in the Input Reader, Pair and BuiltInFunction classes with a collection of various class fields. With the Erlang implementation, the reader_info record is all that is needed for keeping track of the state of the buffer, the scheme character stack, and the token stack.

For example, shown here is the reader_info and the stream implementation:


%% Where status = init|ok|done
-record(reader_info, {status=init, last_ch=-1,
                      reader_strm="", token_buffer=[], 
                      char_stack=[], token_stack=[]}).


%%----------------------------------------------------------------------
%% Function: stream_read/2
%% Purpose:  Read the next character/value in the stream.  In this case, 
%%           a stream is just a "list" of values and we pop the next
%%           value in the list until the list is empty.
%% Note:   Reader only returns a 'Reader Info' record
%% Args:   Reader is an instance of the record "Reader" record.
%% Returns: The status of the read, the Top/Head character
%%          updated in the Reader record.
%%          { Reader, status = ok } (Reader read successfully)
%%          { Reader, status = done } (Reader reached EOF)
%%----------------------------------------------------------------------
stream_read(Reader) ->
    Str = Reader#reader_info.reader_strm,
    { Done, H, T } = stream_read(Str, ok),
                                                % Where tail can be an empty list of []                      
                                                % Update the record info with; 
                                                % set the status, last char
    { Reader#reader_info{status=Done, last_ch=H,
                         reader_strm = T} }.
stream_read([], _) -> { done, -1, [] };
stream_read(-1, _) -> { done, -1, [] };
stream_read(Str, _) ->
    [ Head | Tail ] = Str,
    { ok, Head, Tail }.

Once again, it is not very robust but I could write comparable stream reader code in Erlang as I could in Java. And here is the Erlang implementation of the "nextToken" Java code shown above. The Erlang syntax is considerably different from Java but a lot of the logic is nearly the same including the order of operations.


next_token_build(Reader) ->
    ChStk = Reader#reader_info.char_stack,
                                                % Task 1: Pop the token 
                                                % and character stacks
    { Ch, RUpt } = 
        case (not is_empty(ChStk)) of
            true ->
                                                % Pop char off char stack
                { _, Top, Rest } = pop_stack(ChStk),    
                                                % Return top of stack, 
                                                % and current reader,
                                                % with the updated stack
                { Top, Reader#reader_info{char_stack=Rest} };
            false ->
                                                % The state of the reader has changed, 
                                                % mutated char reader stream.
                { Rx } = stream_read(Reader),
                { peek_char_reader(Rx), Rx }
        end,
                                                % Task 2: Check for and 
                                                % ignore whitespace
    { Ch2, R2Upt } = case is_whitespace(Ch) of
                         true -> 
                             { R2x } = while_whitespace(RUpt),
                             { peek_char_reader(R2x), R2x };
                         false -> { Ch, RUpt }
                     end,
                                                % Build exit points
    if
        Ch2 == -1   -> { "#!EOF", R2Upt };
        Ch2 == $(  -> { "(",     R2Upt };
        Ch2 == $)  -> { ")",     R2Upt };
        true ->
                                                % Reset the atom buffer
                                                % Set the buffer for the NEW reader
            BufRst = new_stack(),
            RResetBufUpt = R2Upt#reader_info{token_buffer=BufRst},
                                                % Buffer is updated by build token
            { R3Uptx } = build_generic_token(RResetBufUpt, Ch2),
            case convert_tok_float(R3Uptx, Ch2) of
                { ok, FloatVal } -> 
                                                % Return <float, Reader>
                    { FloatVal, R3Uptx };
                { _, _ } ->
                                                % Error case for convert attempt, 
                                                % act like normal
                                                % Return <string, Reader>
                    { peek_buf_reader(R3Uptx), R3Uptx }
            end
    end.

Erlang already has the Lisp operations we need?

It may not be evident at first but Erlang has a many similarities with many Lisp dialects. Differences are merely in the appearance of the syntax. For example here is a session to test various List construction routines.



Simple SBCL Common Lisp operations:
* (let ((A (list 1 2 3 4)))
    (print A))
(1 2 3 4) 
* (let ((A (list 1 2 3 4)))
   (print (first A)))
1 
* (let ((A (list 1 2 3 4))) (print (cdr A)))
(2 3 4) 
;; Build a list with CONS, constructor function.
* (let ((A (cons 1 (cons 2 (cons 3 '()))))) (print A))
(1 2 3) 
(1 2 3)

Erlang:
Eshell V5.6.3  (abort with ^G)
1> A = [1, 2, 3, 4]. 
[1,2,3,4]
2> [HeadFirst|Tail] = A.
[1,2,3,4]
3> io:format("==>~p~n", [HeadFirst]).
==>1
ok
4> io:format("==>~p~n", [Tail]).     
==>[2,3,4]
ok
5> io:format("==>~p~n", [hd(A)]). 
==>1
ok
6> io:format("==>~p~n", [tl(A)]).
==>[2,3,4]
ok
7> Cons = fun(First, Rest) -> [First|Rest].
* 1: syntax error before: '.'
7> Cons = fun(First, Rest) -> [First|Rest] end.
#Fun
8> Cons(1, Cons(2, Cons(3, []))).
[1,2,3]
9>

Various Linked List/Lisp Operations in Java

Because Java does not even closely resemble a functional programming language, we had to retrofit a lot the Lisp operations using Java linked list routines where the Pair class is associated with the linked list node and most of the methods are for operations on the list.


/** Build a pair from two components. * */
public Pair(Object first, Object rest) {
 this.first = first;
 this.rest = rest;
}
public static Pair cons(Object a, Object b) {
 return new Pair(a, b);
}
public static Object rest(Object x) {
 return (x instanceof Pair) ? ((Pair) x).getRest() : null;
}
public static int length(Object x) {
 int len = 0;
 while (x instanceof Pair) {
  len++;
  x = ((Pair)x).getRest();
 }
 return len;
}


// Java Pair Test Cases:
Pair pair = new Pair("+", new Pair("1", new Pair("2", null)));

Pair pair = SchemeUtil.cons("1", "2");

Pair pair = SchemeUtil.cons("1", SchemeUtil.cons("2", null));

For the Erlang implementation, here are the Lisp functions used in the Scheme interpreter:


cons(Fst, Rest) -> [Fst|Rest].

%% Like Common Lisp first; car of a Pair, or null for anything else.
first([Fst|Rest]) -> Fst;
first(_) -> [].

%% Like Common Lisp rest; car of a Pair, or null for anything else.
%% "A cons cell is composed of two pointers; the car operation 
%% extracts the first pointer, and the cdr operation extracts the second."
%% 
%% "Thus, the expression (car (cons x y)) evaluates to x, 
%% and (cdr (cons x y)) evaluates to y."
rest([Fst|Rest]) -> Rest;
rest(_) -> [].

second(X) ->
    first(rest(X)).

Implementing Eval

The scheme parser and the expression evaluation meets in this read_tail function. Once the token parser determines the beginning of a list denoted by the left parenthesis, the next set of symbols will normally consist of a function name and the list of arguments. The call to cons does the building on the linked list pair data. Once all tokens have been read, the linked list pair node is returned to the evaluation routines.

Figure: singly-linked list containing two values: the value of the current node and a link to the next node


read_tail(#reader_info{token_stack=TokStk}=Reader) ->
    { Tok, R } = next_token(Reader),
    if 
        Tok == "#!EOF" ->
            io:format("ERROR: readTail() - EOF during read.~n"),
            { [], R };
        Tok == ")" ->
            { [], R };
        true ->
            T2Stk = push_stack(TokStk, Tok),
            R2x = R#reader_info{token_stack=T2Stk},
                                                % R[2]'x' is the current Reader info
                                                % Build a pair out of the token from read
                                                % and read_tail.
            { PairRead, R3x} = read(R2x),
            { PairTail, R4x } = read_tail(R3x),
            { cons(PairRead, PairTail), R4x }
    end.

Implementing Basic Numeric Operations and Built In Functions

A simple scheme lisp implementation like ours is simply a wrapper over the host language. Parse the input code, restructure the code as a linked list of symbols and then iterate over the sequence and perform various operations when they are found. Our Erlang based Scheme only supports numeric scheme operations like ADD, MULTIPLY and DIVIDE. First, we detect if an ADD or other numeric operation is being invoked and then supply the arguments to the function. In the Java implementation, the function name (+, -, /) are stored within a hash table and are associated with the Java code for actually applying the numeric operations against the supplied arguments.

The Erlang version uses a similar approach. Instead of a HashMap, we use Erlang's dict.


public class Environment {
    
    private Map mapDataVars = new HashMap();
    
    public Environment() {
    }

    /** 
     * Find the value of a symbol, in this environment or a parent. 
     */
    public Object lookup(String symbol) {
        Object o = this.mapDataVars.get(symbol);
        if (o != null) {
            return o;
        } else {
            return SchemeUtil.error("Unbound variable: [" + symbol + "]");
        }
    }

    /** Add a new variable,value pair to this environment. */
    public Object define(Object var, Object val) {
        
        this.mapDataVars.put(var, val);         
        if (val instanceof Procedure
                && ((Procedure) val).getName().equals(Procedure.DEFAULT_NAME)) {
            ((Procedure) val).setName(var.toString());
        }
        return var;
    }   
    public Environment defineBuiltInProc(String name, int id, int minArgs) {
        define(name, new BuiltInFunction(id, minArgs, minArgs));
        return this;
    }   
}

Here is the code and data structures for defining variables and functions on the Erlang side.


new_environment() ->
    dict:new().
lookup(Env, Key, Default) ->
    case dict:find(Key, Env) of
        {ok, Value} -> Value;
        error       -> Default
    end.
define(Env, Key, Value) ->
    dict:store(Key, Value, Env).
define_procedure(Env, Name, Id, MinArgs, MaxArgs) ->
    define(Env, Name, new_procedure(Id, Name, MinArgs, MaxArgs)).
define_procedure(Env, Name, Id, MinArgs) ->
    define(Env, Name, new_procedure(Id, Name, MinArgs, MinArgs)).

NumCompute is a relevant method and leads us into our Lisp implementation in Erlang. Essentially, the numCompute method is a traversal method to sum all of the Nodes in the list chain. This is kind of important, Lisp is a List processing language. Our traversal method is an operation on a singly linked list.


num_compute_while(Args, Op, Result) ->
                                                % Perform the operation against the linked list
    case (get_type(Args) == pair) of
        false -> 
                                                % Exit with the result
            Result;
        true ->
            X = first(Args),
                                                % Traverse      
            MoreArgs = rest(Args),
            NewRes = case Op of
                         '+' -> X + Result;
                         '-' -> X - Result;
                         '*' -> X * Result;
                         '/' -> X / Result;
                         _ ->
                             io:format("Internal Error: unrecognized op: [~p]~n",[Op]),
                             -1
                                                % End of Op Case
                     end,
                                                % Continue with loop
                                                % and return some result
            num_compute_while(MoreArgs, Op, NewRes)
    end. % End Pair Type Check

Running and testing the Interpreter

I want to make it clear, the scheme interpreter operates on long strings. The scheme code is contained within the string. So, we can use Erlang's one call function to read an entire file into binary data and then covert the data into a string. Then pass the string to the scheme interpreter.


eval_file(Filename) ->
    {ok, Bin} = file:read_file(Filename),
    io:format("Binary Data [ ~p ]~n", [Bin]),
    EvalRes = eval_all_expr(binary_to_list(Bin)),
    io:format("Pair => [~p]", [EvalRes]),
    [].


Parsing the simple test.scm file.  This file contains a set of 
simple scheme numeric operations.
( + 2 2 1 )
( + 2 2 2 )
( + 2 2 3 )
...
...
...

Outputs:

trace: eval[t0] - eval type=> [float]/{5.0}
trace: eval[t0] - eval type=> [float]/{5.0}

trace: proc_apply => [{proc_info,"+",0,9999,27,undefined}]
last-eval =>10.0
trace: ERR convert =>"+"
trace: eval[t0] - eval type=> [pair]/{["+",5.0,5.0]}
trace: eval[t1] - instance of procedure call ["+"]
trace: eval[t0] - eval type=> [string]/{"+"}
trace: eval[t0] - eval type=> [float]/{5.0}
trace: eval[t0] - eval type=> [float]/{5.0}

trace: proc_apply => [{proc_info,"+",0,9999,27,undefined}]
last-eval =>10.0
trace: ERR convert =>"+"
trace: eval[t0] - eval type=> [pair]/{["+",9.0,9.0]}
trace: eval[t1] - instance of procedure call ["+"]
trace: eval[t0] - eval type=> [string]/{"+"}
trace: eval[t0] - eval type=> [float]/{9.0}
trace: eval[t0] - eval type=> [float]/{9.0}

trace: proc_apply => [{proc_info,"+",0,9999,27,undefined}]
last-eval =>18.0
Pair => [true]bbrown@houston:~/workspace_octane/octane_mel/src/erlang$


%% Example Test Cases
-module(test_input_reader).
-include_lib("eunit/include/eunit.hrl").

-include("mel_records.hrl").
-import(mel_input_reader, [is_whitespace/1, next_token/1]).
-import(mel_input_reader, [new_reader/1, stream_read/1, while_whitespace/1]).
-import(mel_input_reader, [new_stack/0, push_stack/2, pop_stack/1, peek_stack/1]).
-import(mel_input_reader, [read/1]).
-import(mel_scheme, [eval/1, eval/2, eval_all_expr/1]).

whitespace_test_() -> 
[
 ?_assert(2 == 2),
 ?_assert(is_whitespace(1)  == false)
].
test_compute_main() ->
    R = new_reader("   (cons 1 (cons 2 3))  (+ 1 2)"), 
    { R2 } = while_whitespace(R),
    io:format("Test While White Space =>~p~n", [R2]),
    { Pair, RUpt } = read(R),
    io:format("Test Read=>~p~n", [Pair]),
    Pair2 = eval(Pair),
    io:format("Pair => [~p]~n", [Pair2]),
    X = new_reader("   "), 
    { YPair, RYUpt } = read(X),
    io:format("~nTest EOF=>~p, ~p~n", [YPair, RYUpt]),  
    EvalRes = eval_all_expr("    (+ 1 2) (+ 4 5) (* 3 (+ 4 9)) (cons 1 2) "),
    io:format("~nTest Eval All=>~p~n", [EvalRes]),  
    [].

References and Full Source

http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/erlang_scheme/ - Browse subversion repository.

http://haskellnotebook.googlecode.com/files/erlang_scheme.tar.gz

http://norvig.com/jscheme.html
http://norvig.com/license.html
http://factorcode.org/
http://www-formal.stanford.edu/jmc/recursive/node3.html
http://www-formal.stanford.edu/jmc/

Java JScheme Source Code

http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/
http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/src/org/bot/jscheme/linkedlist/
http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/src/org/bot/jscheme/
http://jvmnotebook.googlecode.com/files/mini_jscheme.tar.gz

(FAQ) Frequently Asked Questions

Q. I noticed that the Erlang and even the Java implementation is a little bit verbose with some Erlang best practices ignored? Why should I take interest in a neophyte example?

Neophyte, Lazy, Implemented very fast for the intent of a blog entry...same thing. I implemented the Java and Erlang Scheme implementations over the course of about 5 days with 3-5 hours on each day. My intent was to implement Lisp based on JScheme. I have accomplished most of what I needed during that time. However, there are already best practices that I ignored for the purpose of highlighting some of the Erlang syntax. The code doesn't represent a best approach or the cleanest functional programming code. But it does work. Why write about it at all then? Well, I could have picked a more pedestrian topic like the ongoings of Paris Hilton or Kim Kardashian. I figure that a Scheme in Erlang is more interesting. And if you don't like my approach of describing Erlang and then a short description of the Scheme implementation then you can just browse all of the source code. That is what I normally end up doing anyway, breezing through the writing and then downloading the source code and going back to blog if something isn't fully clear in the code.
Q. This blog post sure is long. What is wrong with you?

With my longer programming related blogs, I like to do two things. These things are normally missing in a lot of other blogs. But generally the good blogs adopt this policy and provide some good writing.

Provide a light introduction (with historical footnotes) to some of the introductory programming concepts used in more complex examples. For whatever reason, I like to look at the full use of basics and then go back and forth between the more complicated syntax. Reintroducing basic concepts gives those unfamiliar with the language a better look into the world of Erlang. I mentioned compiling Erlang code, a simple Makefile, running Erlang through the Erlang shell. Hopefully this will help people visualize the step-by-step approach for more complex concepts like the use of the Y Combinator.
Provide all of the source and make sure that it works with most recent compilers and runtime. If you can't take away anything from the blog, at least you have the code.

Tuesday, July 29, 2008

Botlist Updates to Documentation

"Google recently released news that Google's index contains one trillion URLs. This is great news that the Internet has grown to such enormous size but not that great for users that want to traverse all of that information. Google and other search engines provide applications for users to "search" the web and get to information they are interested based on keywords and other queries. Social networking sites turns that paradigm around a bit. Applications like Del.icio.us, Digg, Reddit and Botlist provide user portal sites for users to post their favorite links. As opposed to users querying and searching for interesting links. Users submit their links to sites like Reddit and tag them with interesting titles and keywords. Hopefully the user submitted links fall inline with content that you are also interested in. Botlist attempted to go a step further and focus on the bot/agent generated links. News aggregation and user submitted links.

Botlist contains an open source suite of software applications for social bookmarking and collecting online news content for use on the web. Multiple web front-ends exist based on Django (through Google AppEngine), Rails, and J2EE. Users and remote agents are allowed to submit interesting articles. There are additional remote agent libraries for back-end text mining operations. The system is broken up by the back-end specification and front-end specification. "

http://code.google.com/p/openbotlist/

----

Wednesday, July 16, 2008

Simple Lisp Implementation in Java (Ode to the Linked List)

Overview
I have never had much affinity for writing (in natural language); I have grown up and enjoy programming and that is what I continue to do 20 years later. But I really wanted to get this blog entry out and it helped me rethink and reframe how I look at a particular piece of code. So I am going to move in all different directions starting with one of the most elementary data structures in Computer Science, the linked list. Instead of thinking of Lisp code (for now anyway) as a "listing" of elements. Think of Lisp code as a linked list of elements.

How do you implement a linked list in C (or Java or Fortran)?
How would you print all of the elements of the linked list?
How would you sum all of the elements?

After thinking about the last numeric operation? Can you think of possible ways to implement various List operations, Lisp operations like 'map'?

The Linked List

Why are linked lists important to Lisp? "The name Lisp derives from "List Processing Language". Linked lists are one of Lisp languages' major data structures"

In a Java/C/C++ or other procedural language program, the linked list is normally composed of two fields, the data itself (composed of any Object type) and a "pointer" or reference to the next element A list that is linked. In the Java code snippet below, data refers to the element at the current node in the list. The Node object "next" is a reference to the next node in the list chain.


package org.bot.jscheme.linkedlist;

/**
 * A Node has two fields, data and next (or car and cdr).
 * This forms the basis of a simple, singly-linked list structure 
 * whose contents can be manipulated with cons, car, and cdr. 
 */
public class Node {

    /**
     * The first element of the pair.
     */
    public Object data;
    /**
     *  The other element of the pair.
     */
    public Node next;

    /** Build a pair from two components. * */
    public Node(Object data, Node next) {
        this.data = data;
        this.next = next;
    }
    /**
     * Return a String representation of the pair.
     */
    public String toString() {
        return "" + data;
    }
}

The linked list Java class is also as simple as the Node class. There is one field "head" which is a Node type. The head is the first Node in the list chain. The insert at head operation inserts a newly created Node object. It replaces the head Node with the new node and the head's next node becomes the previous node. For example, the code below contains a complete implementation.


package org.bot.jscheme.linkedlist;

public class LinkedList {
    /**
     * Head element in linked list.
     */
    public Node head;

    private LinkedList() { }
    
    public LinkedList(Node head) {
        this.head = head;
    }
    public LinkedList(String data) {
        this.head = new Node(data, null);
    }   
    /**
     * Add a new node and replace the "root" node with the new one.
     * 
     * @param node
     * @return
     */
    public LinkedList insertHead(final Node node) {
        node.setNext(head);
        head = node;        
        return this;
    }   
    public void insertTail(final Node node) {
        if (head == null) {
            head = node;
        } else {
            Node p, q;
            // Traverse to the end of the list
            for (p = head; (q = p.getNext()) != null; p = q) {
                ;
            }               
            p.setNext(node);
        }
    }
}

The insertTail operation is more inline with the lisp "cons" constructor operation. Traverse to the last Node in the list chain and set the next node reference to the new Node we want to add to the end of the chain.

If we needed to print the data at each element in the Node, we create a reference to the head node and then iterate to each subsequent "next" Node until the end Node contains a null reference.


public String traverse() {
        // Append the contents to a string
        StringBuffer buf = new StringBuffer();
        buf.append("(");    
        buf.append(this.getHead());
        for (Node node = this.getHead().getNext(); node != null; node = node.getNext()) {
            buf.append(' ');
            buf.append(node.getData());         
        }        
        buf.append(")");
        return buf.toString();
    }

Once we start getting into the Lisp functionality we will cover the List operations in more detail but here is another traversal example that is similar to the Lisp lisp Pair to string methods. The other difference between the Lisp implementation and this code are the member names.


public String stringifyNode(StringBuffer buf) {
        buf.append('(');
        buf.append(data);
        Object tail = this.getNext();
        while (tail instanceof Node) {
            buf.append(' ');
            buf.append(((Node) tail).getData());
            tail = ((Node) tail).getNext();
        }
        if (tail != null) {
            buf.append(" . ");
            ((Node) tail).stringifyNode(buf);
        }
        buf.append(')');
        return buf.toString();
    }

Figure: singly-linked list containing two values: the value of the current node and a link to the next node

Linked List Test Case

Here are a set of tests for our Linked List. The toString method for our LinkedList returns a String composed of a Lisp symbolic expression left parenthesis, list arguments and a right parenthesis.


    public static void main(String [] args) {
        System.out.println("Running linked list test");
        
        final LinkedList list = new LinkedList("+");
        list.assertEquals("" + list, "(+)");
        
        // Start off backwards; insert using insertHead.
        list.insertHead(new Node("1", null));       
        list.insertHead(new Node("2", null));
        list.insertHead(new Node("3", null));
        list.assertEquals("" + list, "(3 2 1 +)");
        list.testStatus();
        list.assertEquals(
                "(3 2 1 +)",
                list.getHead().stringifyNode());
        
        final Node node = new Node("+", new Node("1", null));
        list.assertEquals(
                "(+ 1)",
                node.stringifyNode());
        
        final Node node2 = new Node("+", new Node("1", new Node("2", null)));
        list.assertEquals(
                "(+ 1 2)",
                node2.stringifyNode());
        
        final Node node3 = new Node("+", new Node("1", new Node("2", null)));
        final LinkedList list2 = new LinkedList(node3);
        list.assertEquals(
                "(+ 1 2)",
                list2.getHead().stringifyNode());
        list.assertEquals(
                "(+ 1 2)", "" + list2);             
                        
        final LinkedList list3 = new LinkedList("+");
        list3.assertEquals("" + list3, "(+)");
        list3.insertTail(new Node("1", null));      
        list3.insertTail(new Node("2", null));
        list3.insertTail(new Node("3", null));
        list.assertEquals("" + list3, "(+ 1 2 3)");     
        list.testStatus();
                
        final LinkedList list4 = new LinkedList("1");       
        list4.insertTail(new Node("2", null));      
        list4.insertTail(new Node("3", null));
        list4.insertTail(new Node("4", null));
        list.assertEquals("" + list4, "(1 2 3 4)");
        list.assertEquals("" + list4.numCompute('+'), "10.0");
        list.assertEquals("" + list4.numCompute('*'), "24.0");
    }

Ding, Ding, Ding, *DING*

You may have noticed that I left out some implementation details on some of the methods used in the test case. They aren't totally relevant, use the source download at the bottom of the page to get all of the source. NumCompute is a relevant method and leads us into our Lisp implementation in Java. Essentially, the numCompute method is a traversal method to sum all of the Nodes in the list chain. This is kind of important, Lisp is a List processing language. Our traversal method is an operation on a singly linked list. I hope the light-bulb is going off for you.


    /**
     * Perform an arithmetic operation on the List.
     * @param operation
     * @return
     */
    public double numCompute(final char operation) {
        double result = 0.0;
        if (operation == '+' || operation == '-') {
            result = 0.0;
        } else if (operation == '*' || operation == '/') {
            result = 1.0;
        } else {
            throw new RuntimeException("Invalid Operation: try +, -, *, /");
        }
        for (Node node = this.getHead(); node != null; node = node.getNext()) {
            Double d = new Double((String) node.getData());
            double x = d.doubleValue();
            switch (operation) {            
            case '+':
                result += x;
                break;
            case '-':
                result -= x;
                break;
            case '*':
                result *= x;
                break;
            case '/':
                result /= x;
                break;
            default:
                throw new RuntimeException("Invalid Operation: try +, -, *, /");                
            }
        }               
        return result;
    }

Figure: SBCL prompt with the Lisp call (+ 1 2 3 5 5 6 7 9 10)


The lisp code:
(+ 1 2 3 4)

Is essentially a representation of the following Java code:


final LinkedList list4 = new LinkedList("1");       
list4.insertTail(new Node("2", null));      
list4.insertTail(new Node("3", null));
list4.insertTail(new Node("4", null));
list4.numCompute('+')

On Lisp

"On LISP's approximate 21st anniversary, no doubt something could be said about coming of age, but it seems doubtful that the normal life expectancy of a programming language is three score and ten. In fact, LISP seems to be the second oldest surviving programming language after Fortran, so maybe we should plan on holding one of these newspaper interviews in which grandpa is asked to what he attributes having lived to 100." -- John McCarthy

I don't want to delve too much into Lisp history or even the pros and cons; essentially Lisp has a rich history and is a very simple, powerful language. Lisp programs are represented by symbolic expressions as a simple linked list structure that is stored memory. We can determine the lisp function by the first ATOM in a list and perform that operation on the rest of the arguments. And there are a small set of selector and constructor operations expressed as functions (car, cdr and cons).

Lisp in Java, Mini JScheme

Figure: Picture of Peter Norvig, inventor of JScheme

Peter Norvig created JScheme, a Lisp Scheme dialect written in Java. It was created in 1998 and it has since been forked, extended and used quite extensively in other modern projects for the web and elsewhere. I mention JScheme because the code I am about present is all based on the JScheme interpreter. The code is copyrighted to Peter Norvig (including the picture of him above), I have only made modifications to make our Lisp even lighter than it was originally. Norvig's most recent implementation of JScheme contains 1905 lines of Java code. My implementation contains 905 lines wrapped in additional Javadoc comments.

There are eight classes in this smaller implementation. The core classes are shown in the UML class diagram. The InputReader is the most interesting class; if you are interested in learning about basic character/token parsing routines, you should look at this code. The read method is the first core method invoked. It is needed to read each character from the InputStream (E.g. an implementation of a FileInputStream) and return the tokens to the eval method.


    public Object read() {
        try {
            Object token = nextToken();
            if (token == "(") {
                return readTail();
            } else if (token == ")") {
                System.out.println("WARN: Extra ')' ignored.");
                return read();
            } else {
                return token;
            } // End of the if - else
        } catch (IOException e) {
            System.out.println("WARN: On input, exception: " + e);
            return EOF;
        } // End try - catch
    }
    private Object readTail() throws IOException {
        Object token = nextToken();
        System.out.println("trace: readTail(): " + token);
        if (token == EOF) {
            final String msg = "ERROR: readTail() - EOF during read.";
            System.err.println(msg);
            throw (new RuntimeException(msg));
        } else if (token == ")") {
            return null;
        } else if (token == ".") {
            Object result = read();
            token = nextToken();
            if (token != ")") {
                System.out.println("WARN: Missing ')'? Received " + token + " after .");
            }
            return result;
        } else {
            tokenStack.push(token);
            return SchemeUtil.cons(read(), readTail());
        }
    }
    private Object nextToken() throws IOException {
        int ch;

        // See if we should re-use a pushed char or token
        // Task 1: Pop the token and character stacks
        if (!this.tokenStack.empty() && (this.tokenStack.peek() != null)) {
            return this.tokenStack.pop();
        } else if (!this.charStack.empty() && (this.charStack.peek() != null)) {
            ch = ((Character) this.charStack.pop()).charValue();
        } else {
            ch = inputReader.read();
        }

        // Ignore whitespace
        // Task 2: Check for and ignore whitespace
        while (isWhitespace((char) ch)) {
            ch = inputReader.read();
        }
        System.out.println("trace: nextToken() -> " + (char) ch + " $" + ch);

        // See what kind of non-white character we got
        // Task 3: Check if the character is of various token types.
        switch (ch) {
            case -1:
                return EOF;
            case TOK_LEFT_PAREN:
                return "(";
            case TOK_RIGHT_PAREN:
                return ")";
            ...
            ...
            default:
                buff.setLength(0);              
                int c = ch;
                buildGenericToken(ch);
                // Try potential numbers, but catch any format errors.
                if (c == '.' || c == '+' || c == '-' || (c >= '0' && c <= '9')) {
                    try {
                        // Number type is currently in the buffer queue
                        return new Double(buff.toString());
                    } catch (NumberFormatException e) {
                        ;
                    }
                } // End of If
                return buff.toString().toLowerCase();
        } // End of the Switch
    }

Linked Lists, Cons and Pairs,

In that cluster of code, this small snippet should stick out in your mind:


readTail:

tokenStack.push(token);
return SchemeUtil.cons(read(), readTail());

Invocation of our cons operator. Cons, short for the term constructor, returns a new Pair object based on some data Node and possibly some pointer to another Pair. Notice that I am interchanging the term Pair and Node; where Node is the class in our LinkedList implementation. Essentially, a Pair is a Node and most of our Scheme processing is some variation of the LinkedList operations.


    /**
     * cons(x, y) is the same as new Pair(x, y).
     * 
     * Cons presents and interesting function that is fundamental to lisp.
     * Here are some examples of cons usage (tested in common lisp).
     * <code>
     * (cons 1 2):
     * Pair pair = SchemeUtil.cons("1", "2");
     * assertEquals("" + pair, "(1 . 2)");
     * 
     * (cons 1 nil):
     * Pair pair = SchemeUtil.cons("1", null);
     * assertEquals("" + pair, "(1)");
     * 
     * (cons 1 (cons 2 nil)):
     * 
     * Pair pair = SchemeUtil.cons("1", SchemeUtil.cons("2", null));
     * assertEquals("" + pair, "(1 2)");
     *  
     * </code>
     */
    public static Pair cons(Object a, Object b) {
        return new Pair(a, b);
    }

///
/// WHAT IS NEW PAIR????
///

/**
 * A Pair has two fields, first and rest (or car and cdr).
 * This forms the basis of a simple, singly-linked list structure 
 * whose contents can be manipulated with cons, car, and cdr. 
 * 
 * "One mathematical consideration that influenced LISP was to 
 * express programs as applicative expressions built up 
 * from variables and constants using functions" -- John McCarthy
 * http://www-formal.stanford.edu/jmc/history/lisp/node3.html
 */
public class Pair {
    /**
     * The first element of the pair.
     */
    private Object first;
    /**
     *  The other element of the pair.
     */
    private Object rest;

    /** Build a pair from two components. * */
    public Pair(Object first, Object rest) {
        this.first = first;
        this.rest = rest;
    }
    /**
     * Return a String representation of the pair.
     */
    public String toString() {
        return SchemeUtil.stringify(this);
    }
    /**
     * Build up a String representation of the Pair in a StringBuffer.
     */
    void stringifyPair(StringBuffer buf) {
        buf.append('(');
        SchemeUtil.stringify(first, buf);
        Object tail = rest;
        while (tail instanceof Pair) {
            buf.append(' ');
            SchemeUtil.stringify(((Pair) tail).first, buf);
            tail = ((Pair) tail).rest;
        }
        if (tail != null) {
            buf.append(" . ");
            SchemeUtil.stringify(tail, buf);
        }
        buf.append(')');
    }
}

The input reader is our IO class for reading from the Stream and returning a PAIR to to the calling method. Once we have a Pair object (essentially a LinkedList structure), we can then perform Lisp operations on the given list elements. For example, here are two Test Case methods:


public void testInputReaderPair() {
        ByteArrayInputStream stream = 
            new ByteArrayInputStream("(+ 1 2 3)".getBytes());
        InputReader reader = new InputReader(stream);
        Object o = reader.read();
        Pair p = (Pair) o;
        assertEquals("(+ 1.0 2.0 3.0)", "" + p);
    }
    public void testEval1() {
        Environment globalEnvironment = new Environment();
        BuiltInFunction.installBuiltInFunctions(globalEnvironment);
        ByteArrayInputStream stream = new ByteArrayInputStream("(+ 1 2 3)".getBytes());
        InputReader reader = new InputReader(stream);
        Object o = reader.read();
        Scheme scheme = new Scheme();
        Object res = scheme.eval(o, globalEnvironment);
        double d = ((Double) res).doubleValue();
        assertEquals(d, 6.0d, 0.01d);
    }

How do we get to performing Lisp operations on Lists?

In the testInputReaderPair test case, we convert a String representation of Lisp code and then create an InputReader object. Invoke the read method to return a Pair object. In this case, Pair is the head node in the LinkedList. The data in this Node is the String representation of the PLUS/ADD/+ operation. The rest of the elements are arguments to the function. The testEval1 test adds one more call to our first test. Eval. The Pair class is in memory, now we actually have to invoke Lisp operations to get any work done. In my example, I want to add those numbers to together to return a sum. The first element in the list is the function, the rest are my arguments.

"Based on the "token" and the parse rule being matched, the parser does something -- push the token on the parse stack, pop the previous token and emit some code, implement some "Semantic Action" based on the tokens and rules matched so far, etc. etc" -- SaveTheHubble (CrazyOnTap.com)

The Scheme.eval method is a driver method where the execution on the Pair begins.


    public Object eval(Object x, Environment env) { 
        while (true) {
            if (x instanceof String) {          
                // VARIABLE
                System.out.println("trace: eval - instance of String: " + x);
                // Look up a variable or a procedure (built in function).
                return env.lookup((String) x);              
            } else if (!(x instanceof Pair)) { 
                // CONSTANT
                System.out.println("trace: eval - instance of Pair =>" + x);
                return x;
            } else {
                // Procedure Call
                System.out.println("trace: eval[t1] - instance of procedure call");
                Object fn = SchemeUtil.first(x);
                Object args = SchemeUtil.rest(x);
                System.out.println("trace: eval[t2] - fn => [" + fn + "] args => " + args);
                fn = eval(fn, env);
                // Coerce the object to a procedure (E.g. '+/add' procedure)
                Procedure p = Procedure.proc(fn);               
                return p.apply(this, evalList(args, env));                                  
            } // End of if - else
        } // End of While
    } // End of Method

What about the environment?

The environment class is essentially a Hashtable data-structure for storing variable definitions. The key represents the name of the function and the value is an instance of the BuiltInFunction class. Other user defined variables are also stored here (this feature is not implemented in MiniJScheme). There is a major variation from Norvig's JScheme implemention of the Environment class. He used a LinkedList structure to store all of the variables. To lookup a variable by name, you traverse the list until the variable/function is found. I used simple Hashtable get and put operations. It is the least interesting of all of the classes, but probably the most important. If you don't have a reference to the ADD operation, then you can't a sum on a set of numbers.


public class Environment {
    
    private Map mapDataVars = new HashMap();
    
    public Environment() {
    }

    /** 
     * Find the value of a symbol, in this environment or a parent. 
     */
    public Object lookup(String symbol) {
        Object o = this.mapDataVars.get(symbol);
        if (o != null) {
            return o;
        } else {
            return SchemeUtil.error("Unbound variable: [" + symbol + "]");
        }
    }

    /** Add a new variable,value pair to this environment. */
    public Object define(Object var, Object val) {
        
        this.mapDataVars.put(var, val);         
        if (val instanceof Procedure
                && ((Procedure) val).getName().equals(Procedure.DEFAULT_NAME)) {
            ((Procedure) val).setName(var.toString());
        }
        return var;
    }   
    public Environment defineBuiltInProc(String name, int id, int minArgs) {
        define(name, new BuiltInFunction(id, minArgs, minArgs));
        return this;
    }   
}

Installing/Loading the Built-In Functions

We are finally at the point to talk about calling Lisp functions. Here are some basic "Scheme" operations. The cons/constructor(new Pair) operation should be pretty straightforward. Same for the "first" function. I didn't include the code for numCompute because the implementation is the same as the LinkedList code shown earlier.


public static Environment installBuiltInFunctions(Environment env)  {
        int n = Integer.MAX_VALUE;
        env
        
         .defineBuiltInProc("cons",     CONS,      2)        
        
         .defineBuiltInProc("*",        TIMES,     0, n)         
         .defineBuiltInProc("+",        PLUS,      0, n)
         .defineBuiltInProc("-",        MINUS,     1, n)
         .defineBuiltInProc("/",        DIVIDE,    1, n)
        
         .defineBuiltInProc("caaaar",         CXR,       1)
         .defineBuiltInProc("caar",           CXR,       1)      
         .defineBuiltInProc("cadr",         SECOND,      1)          
         .defineBuiltInProc("cdar",           CXR,       1)
         .defineBuiltInProc("cddr",           CXR,       1)
         
         .defineBuiltInProc("car",            CAR,       1)
         .defineBuiltInProc("cdr",            CDR,       1)     
                    
         .defineBuiltInProc("*",        TIMES,     0, n)         
         .defineBuiltInProc("+",        PLUS,      0, n)
         .defineBuiltInProc("-",        MINUS,     1, n)
         .defineBuiltInProc("/",        DIVIDE,    1, n);
        return env;
    }   
/** 
     * Apply a primitive built-in-function to a list of arguments.
     */
    public Object apply(Scheme interp, Object args) {
        // First make sure there are the right number of arguments. 
        int nArgs = SchemeUtil.length(args);
        if (nArgs < minArgs) {
            return SchemeUtil.error("too few args, " + nArgs + ", for " + this.getName() + ": "
                    + args);
        } else if (nArgs > maxArgs) {
            return SchemeUtil.error("too many args, " + nArgs + ", for " + this.getName()
                    + ": " + args);
        } // End of the If
        Object x = SchemeUtil.first(args);
        Object y = SchemeUtil.second(args);
        switch (idNumber) {
            case PLUS:
                return numCompute(args, '+', 0.0);
            case MINUS:
                return numCompute(SchemeUtil.rest(args), '-', num(x));
            case TIMES:
                return numCompute(args, '*', 1.0);
            case DIVIDE:
                return numCompute(SchemeUtil.rest(args), '/', num(x));          
            case THIRD:
                return SchemeUtil.third(x);
            case CONS:
                return SchemeUtil.cons(x, y);
            case CAR:
                return SchemeUtil.first(x);
            case CDR:
                return SchemeUtil.rest(x);
            case CXR:
                for (int i = this.getName().length() - 2; i >= 1; i--) {
                    x = (this.getName().charAt(i) == 'a') ? SchemeUtil.first(x) : SchemeUtil.rest(x);
                }
                return x;
            default:
                return SchemeUtil.error("internal error: unknown primitive: " + this
                        + " applied to " + args);
        }
    } // End of the Apply

More Test Cases

Here are some more Java test cases and their Scheme/Common Lisp equivalent:


public class TestPairList extends TestCase {
    
    public static final String test1 = "(cons 1 2)";
    public static final String test2 = "(cons 1 (cons 2 ()))";
    
    public void testBasicBuildPair() {
        // Construct a linked list/pair of
        // a function '+', 1 and 2
        // (+ 1 2)
        Pair pair = new Pair("+", new Pair("1", new Pair("2", null)));
        assertEquals("" + pair, "(+ 1 2)");
    }   
    public void testPair1() {
        // Construct a linked list/pair of
        // a function '+', 1 and 2
        // (+ 1 2)
        Pair pair = new Pair("+", new Pair("1", new Pair("2", ":test")));       
        assertEquals("" + pair, "(+ 1 2 . :test)");
    }   
    /**
     * (cons 1 2).  
     */
    public void testCons() {
        Pair pair = SchemeUtil.cons("1", "2");
        assertEquals("" + pair, "(1 . 2)");
    }
    /**
     * (cons 1 nil).
     */
    public void testConsOne() {
        Pair pair = SchemeUtil.cons("1", null);
        assertEquals("" + pair, "(1)");
    }
    /**
     * (cons 1 (cons 2 nil)).    
     */
    public void testConsTwo() {
        Pair pair = SchemeUtil.cons("1", SchemeUtil.cons("2", null));
        assertEquals("" + pair, "(1 2)");
    }
    
    public void testEvalCons1() {
        Environment globalEnvironment = new Environment();
        BuiltInFunction.installBuiltInFunctions(globalEnvironment);
        ByteArrayInputStream stream = new ByteArrayInputStream(test1.getBytes());
        InputReader reader = new InputReader(stream);
        Object o = reader.read();
        Scheme scheme = new Scheme();
        Object res = scheme.eval(o, globalEnvironment);
        System.out.println(res);
    }
    public void testEvalCons2() {
        Environment globalEnvironment = new Environment();
        BuiltInFunction.installBuiltInFunctions(globalEnvironment);
        ByteArrayInputStream stream = new ByteArrayInputStream(test2.getBytes());
        InputReader reader = new InputReader(stream);
        Object o = reader.read();
        Scheme scheme = new Scheme();
        Object res = scheme.eval(o, globalEnvironment);
        assertEquals("" + res, "(1.0 2.0)");
    }
}

Conclusion

This concludes my run down of a Scheme/Lisp implementation in Java starting from Linked List data structures and continuing to the interpreter code. There is a lot that can be added to Norvig's implementation. Add full support for a R5RS Scheme Implementation. Add better input/output support. Copy interesting functions from Arc or Common Lisp. Write a Scheme compiler. In fact, the source code in my implementation and Norvig's is fairly straightforward, I was planning on writing a Forth interpreter based on some of this code. Lisp's defining data-structure is the List. Forth's is the stack. It won't be nearly as complete as the Factor programming language, but it would still be a fun project.

Source Code

http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/
http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/src/org/bot/jscheme/linkedlist/
http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/src/org/bot/jscheme/
http://jvmnotebook.googlecode.com/files/mini_jscheme.tar.gz

References

http://norvig.com/jscheme.html
http://norvig.com/license.html
http://factorcode.org/
http://www-formal.stanford.edu/jmc/recursive/node3.html
http://www-formal.stanford.edu/jmc/

Addendum: Implementation of Closures, introducing lambda

I did not add Closures to my implementation of JScheme, but it can be added in twenty lines of code. Here is Norvig's implementation of the Closure class:

"A closure is a user-defined procedure. It is "closed" over the environment in which it was created. To apply the procedure, bind the parameters to the passed in variables, and evaluate the body."


public class Closure extends Procedure {
    Object parms;
    Object body;
    Environment env;    
    /** Make a closure from a parameter list, body, and environment. **/
    public Closure (Object parms, Object body, Environment env) {
        this.parms = parms;
    this.env = env;
    this.body = (body instanceof Pair && rest(body) == null)
        ? first(body) : cons("begin", body);
    }
    /**
     * Apply a closure to a list of arguments.
     */
    public Object apply(Scheme interpreter, Object args) {
        return interpreter.eval(body, new Environment(parms, args, env));
    }
}

The Scheme.eval method; the lambda List closure is a block/list of Lisp code that contains a set of operations that are defined without a name (anonymous function call). In common lisp, we would do the following:


Break 1 [5]> (defun abc (x) (funcall #'(lambda (y) (+ x y)) 5))
ABC
Break 1 [5]> (abc 4)
9
Break 1 [5]>


/** 
 * Evaluate an object, x, in an environment. 
 */
public Object eval(Object x, Environment env) {
  Object fn = first(x);
  Object args = rest(x); 
  if (fn == "define") { // DEFINE
      if (first(args) instanceof Pair)
          return env.define(first(first(args)), eval(cons(
                                "lambda", cons(rest(first(args)), 
                                               rest(args))),
                                env));
      else
          return env.define(first(args), eval(second(args), env));
  } else if (fn == "lambda") { // LAMBDA
    return new Closure(first(args), rest(args), env);
  }