Wednesday, July 30, 2008

Neophyte Scheme/Lisp Interpreter in Erlang

Overview

This is a follow-up to my earlier entry on Lisp in Java; the entry described implementing a Lisp interpreter in Java. This blog posts covers implementing a similar approach using Erlang. In essence, a port of the JSceme interpreter to Erlang. So you can see compare and contrast the differences in Erlang and Java code. Like I always do, I am going to first introduce Erlang with simple examples and then go over the simple Lisp implementation. All of the source is provided at the end of the entry.

What is Erlang?

Erlang is a general purpose programming language that is great for writing concurrent applications and components. That is accomplished by some of the slimmed-down language features including immutable code, simple message passing primitives, and lightweight processes. It was created by Ericsson in 1987 and later open sourced in 1998. Erlang language syntax is based on Prolog.

Why Erlang?
I don't know why the software community can't accept another programming language. Actually, it doesn't matter if X language is accepted or not. The developers will continue to work on it and their users will continue to enjoy the applications that they write. In 2008 and into the future, I can't see a new programming language gaining the kind of adoption and popularity that Java or .NET have received. But, in the next ten to twenty years, a majority of developers will continue to work with one or more of the following general purpose programming languages. With that being said, Erlang has a lot of features that other more popular languages like Java also have. This includes but is not limited to: Erlang OTP, Mnesia database, XML parsing libraries, capable web frameworks. Several killer applications have become mainstream including CouchDB, Yahoo's SimpleDB, Tsung, RabbitMQ. On top of that, Joe Armstrong released a very thorough book on Erlang that includes an overview of many practical applications.

  1. Python/Jython - Easy to learn, dynamic programming language with some advanced constructs.
  2. Ruby/JRuby - Easy to learn, built-in regex, dynamic programming language with some advanced constructs.
  3. Erlang - Concurrent programming, functional programming, fast of multicore machines, rich enterprise libraries (including Erlang OTP, Mnesia, etc), mature, strong community, immutable variables.
  4. Haskell - Strong, static typing, functional programming, advanced functional programming features, GHC compiler compiles to native code.
  5. Scala -Strong, static typing, runs on the Java virtual machine, functional programming, advanced functional programming features.
  6. Clojure - Lisp dialect, advanced programming features, runs on the Java virtual machine
  7. Smalltalk/Seaside - Objected oriented development, strong community
  8. Common Lisp (SBCL, Clisp) - Simple syntax, fast, advanced functional programming features, dynamic typing
  9. Scheme - Simple syntax, fast, advanced functional programming features, dynamic typing
  10. Factor - Concatenative langauge, influenced by Forth the stack language, Joy and others, strong dynamic typing, simple syntax, many advanced libraries.
  11. OCaml
  12. C - Fast - mostly compiles to native code, imperative programming, static weak, popular
  13. C++ - Fast - mostly compiles to native code, imperative programming, object oriented
  14. C# (and other .NET languages) - Built by Microsoft
  15. F# - Built by Microsoft

Immutable Variables and Single Assignment

If you are used to imperative programming languages; you are normally familiar with the ability to change the state of your variables. In Erlang, this is now allowed.

Getting used to Erlang


Notice how I use the "term 'getting' used to" as opposed to saying something like, "Erlang has the following problems". After working with the language, I can say the the syntax, architecture and libraries appear well planned out but the beginning Erlang programmer will need to get used to the language and understand the reasons for single assignment variables. Here are some of the quirks of the language especially if you are more familiar with an imperative style of programming.

The BEAM Interpreter

BEAM/Erl is Erlang's virtual machine. It interprets beam compiled bytecode. Normally, an Erlang developer will compile erl file source code with erlc and erlc generates the BEAM bytecode. Like the Java class file; there can be one beam file for one erl module source.

In the output below, I compile the Erlang source file and then run the module.


%% compile1.erl
-module(compile1).
-export([main/0]).
main() ->
io:format("Hello World~n").


$ erlc compile1.erl

$ ls
compile1.beam compile1.erl
$ erl -noshell -pa . -s compile1 main -s erlang halt
Hello World

Or you can compile and run the 'compile1' module within the Erlang shell like so.

Eshell V5.6 (abort with ^G)
1> c(compile1).
{ok,compile1}
2> compile1:main().
Hello World
ok
3>q(). %%% I am done, I quit

Using a Makefile
As you see, it really isn't that difficult to build your application. If you have built Java bytecode class files using some Makefile system, you could probably use similar techniques to work with your Erlang application; you could just use erlc over javac.

Create the 'src' and 'ebin' binary directories.
$ mkdir -p src ebin test

And then create a ./src/compile1.erl hello world file and copy
the erlang code into this file.

I have been using some variant of this Makefile over the last couple of weeks. It is pretty basic, it detects changes to the Erlang source and at the 'make' command will output the beam bytecode files to the ebin directory.

## Makefile
TOPDIR := $(shell pwd)
DATE = $(shell date +%Y%m%d)

APPLICATION = compile1

ESRC = ./src
EBIN = ./ebin

ERLC = erlc
ERL = erl

OPT = -W

SRC = $(wildcard $(ESRC)/*.erl)
TARGET = $(addsuffix .beam, $(basename \
$(addprefix $(EBIN)/, $(notdir $(SRC)))))

all: clean ${APPLICATION}

LIB_TARGET_OBJS = $(EBIN)/compile1.beam

${APPLICATION}: $(TARGET) $(LIB_TARGET_OBJS)

ebin/%.beam: $(ESRC)/%.erl
$(ERLC) $(OPT) -o ebin $<

run: $(APPLICATION)
$(ERL) -noshell -pa $(TOPDIR)/ebin/ -s $(APPLICATION) main -s erlang halt

clean:
-rm -vf $(TARGET)
-rm -vf erl_crash.dump
-rm -vf ebin/*.beam

To compile our compile1 we need to just enter make at the command prompt as shown in the following output.

$ make
rm -vf ./ebin/compile1.beam
rm -vf erl_crash.dump
rm -vf ebin/*.beam
erlc -W -o ebin src/compile1.erl

And to run the application enter make and use the run target.

$ make run
erl -noshell -pa /ebin/ -s compile1 main -s erlang halt

Note: the argument compile1 is associated with the module/beam file name and main is the name of the exported function.

Basic Examples in Erlang


The earlier section described how to compile and run Erlang code. Now, lets look at some actual examples. Generally, most of the code can be run within the Erlang REPL shell but can also be compiled through source file. Functions on the other hand are defined within module source and must be exported before they can be called from outside the module. But there isn't much difference between functions defined as funs compared to the ones defined in your module source. This is shown in the output below, notice the error after my attempt at declaring a function.

Eshell V5.6.3 (abort with ^G)
1> MyFunc = fun() -> io:format("Hello World~n") end.
#Fun
2> MyFunc().
Hello World
ok
3> myfunc() -> io:format("Ooops") end.
* 1: syntax error before: '->'
3>

There are a couple of things to note with this output and I will go over these concepts repeatedly in my blog post. MyFunc is an anonymous function and the body includes printing Hello World to the console. We assign the 'fun' to the variable MyFunc and then invoke the fun.

SingleAssignmentMyFunc = fun() -> ...BODY SPACE_TOKEN END_TOKEN PERIOD_TOKEN

Every character and even the case of those characters is important. A misplaced period or semi-colon or extraneous character may generate a misleading error message. Damien Katz in his "What Sucks about Erlang" article, he makes note of the Erlang terminators, "In Algol based languages the statement terminators are the same everywhere (usually semi-colon, newline or both)", in Erlang "you can have different character ending each line: bar(); bar(), bar()" depending on the context of your code. If you have worked with Java, C or C#, it may take a bit of experience to get used to the character endings. Here is some INVALID and VALID Erlang expressions. Take note of the error messages.

Erlang Session: testing invalid variable assignments


Eshell V5.6.3 (abort with ^G)
1> %% Store the value 3 into this variable
1> %% Actually this is invalid code, because ATOMs start with lowercase text
1> x = 3.
** exception error: no match of right hand side value 3
2> %% Let create a variable correctly
2> X = 3.
3
3> %% All variables must start with a uppercase letter.
3>
3> %% What if I leave off the period character
3> X = 3
3> Y = 3
3> .
* 5: syntax error before: Y
3> q(). %% What if I want to exit this session?
ok
4> %% What if I want to exit this session?
bbrown@houston:~$

Erlang Session: testing invalid single assignments

Eshell V5.6.3 (abort with ^G)
1> %% What happens if I try to break the single assignment rule?
1> X = 3.
3
2> X = 4.
** exception error: no match of right hand side value 4
3> %% Not that friendly an error message is it?
3> q().
ok
4> bbrown@houston:~$

Erlang Session: Case statement and various end of expression characters.

Eshell V5.6.3 (abort with ^G)
1> %% Even though we haven't introduced if or case statements
1> %% I will want to show you the differences in the character
1> %% terminators.
1>
1> %% We can use case or if statements to match against a set of patterns
1> %% In theory, similar to 'switch' statements in Java/C
1>
1> Expression1 = 3.
3
2> case Expression1 of
2> 4 -> io:format("I don't think 3 == 4~n"),
2> io:format("No, you are right, 3 != 4~n");
2> 3 -> io:format("What 3 == 3~n"),
2> io:format("YES!!!~n"),
2> io:format("Note: no COMMA, SEMICOLON,only space 'end' keyword~n") 2> end.
What 3 == 3
YES!!!
Note: no COMMA, SEMICOLON,only space 'end' keyword
ok
3>

Erlang Module (mytest.erl): What happens in error?
Now, what does Erlang return if we aren't used to the language yet and add the wrong character ending? And it is better to see this error in a Erlang module.

%% Filename: mytest.erl
-module(mytest).
-export([mymain/0]).

mymain() ->
%% Invalid code, comma should be a semicolon
Expression = 3,
case Expression of
4 -> io:format("Oops, there shouldn't be a COMMA here~n"),
3 -> io:format("Doesn't matter, error in last line~n")
end.

bbrown@houston:~/tmp/tmp5$ erlc mytest.erl
./mytest.erl:10: syntax error before: '->'
./mytest.erl:3: function mymain/0 undefined

Erlang Session: Valid if example
And for the sake of completeness, here is a valid use of the if expression. Here is a snippet from Joe Armstrong's "Programming Erlang".

Eshell V5.6.3 (abort with ^G)
1> Expression = 3.
3
2> ForGuard1 = 4.
4
3> ForGuard2 = 3.
3
4> if
4> ForGuard1 == Expression ->
4> io:format("Nope, not going to happen");
4> ForGuard2 == Expression ->
4> io:format("Yes, I think this is it")
4> end.
Yes, I think this is it
5>

From the Erlang book:
if
Guard1 ->
Expr_seq1;
Guard2 ->
Expr_seq2;
...
...
end

"Guards are constructs that we can use to increase the power of pattern matching. Using guards, we can perform simple tests and comparisons on the variables in a pattern" [1] -- Erlang Book.

Another distinct facet of Erlang is pattern matching, "In Erlang, = denotes a pattern matching expression. Lhs = Rhs really means this: evaluate the right side (Rhs), and then match the result against the pattern on the left side (Lhs)". And you can bind variables to the left side.

Here are some valid operations.

Eshell V5.6.3 (abort with ^G)
1> %% Define a tuple of numbers and bind to the variable X
1> X = { 1, 2, 3 }.
{1,2,3}
2> X
2> .
{1,2,3}
3> {A, B, _ } = X.
{1,2,3}
4> io:format("Print the value of A: ==>~p", [A]).
Print the value of A: ==>1ok
5> io:format("Print the value of A: ==>~p~n", [A]).
Print the value of A: ==>1
ok
6>

List Operations and Strings

Eshell V5.6.3 (abort with ^G)
1> A = [ 1, 2, 3].
[1,2,3]
2> length(A).
3
3> [Head|Tail] = [ 1, 2, 3, 4 ].
[1,2,3,4]
4> Head
4> .
1
5> Tail.
[2,3,4]
6> %% To access some element by index, use lists module, nth function.
6> lists:nth(2, A).
2
7> %% Or to reverse a list.
7> lists:reverse(A).
[3,2,1]
8> %% Print the list.
8> io:format("==>~p~n", [A]).
==>[1,2,3]
ok
10> io:format("==>~p~p~n", [A, Tail]).
==>[1,2,3][2,3,4]
ok
11>

Basic Data Structures, Records and Immutable

Eshell V5.6.3 (abort with ^G)
1> UpdateDict = fun(D) ->
1> io:format("Can I update this dict? pass by value~n"),
1> D2 = dict:store("key", 999, D),
1> io:format("Note: set updated dict to D2, Dict ==>~n~p~n", [D2])
1> end.
#Fun
2> %% Create an empty dict
2> D = dict:new().
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
3> UpdateDict(D).
Can I update this dict? pass by value
Note: set updated dict to D2, Dict ==>
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],
[["key"|999]],
[],[],[]}}}
ok
4>


Eshell V5.6.3 (abort with ^G)
1> UpdateDict = fun(D) ->
1> io:format("Update the dict, but NOT return the updated value~n"),
1> D2 = dict:store("key", 999, D),
1> some_atom_not_dict
1> end.
#Fun
2> %% This time, create a new dict; "create/update" another
2> %% dict with the key=999, but do NOT return the update/newly created
2> %% dict. Essentially, we are going to get a code logic error.
2> MyNewDict = dict:new().
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
3> UpdateDict(MyNewDict).
Update the dict, but NOT return the updated value
some_atom_not_dict
4> %% Is the dict updated? pass by value like with Java.
4> io:format("==>~p~n", [MyNewDict]).
==>{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
ok
5> %% Hmm, no 'key'?
5> q().
ok
6> bbrown@houston:~$

So here is a variation of the anonymous function in the code snippet above. The function sets the key to 999 and the dict store function returns a newly created dict with those values. It is sort of way to update the dict by creating an exact copy of the current one. Because Erlang "is a functional programming language that has a non-mutable state" [1], you can't simply create a variable even a hash table and expect the values within the structure to change.

Eshell V5.6.3 (abort with ^G)
1> UpdateDict = fun(D) ->
1> D2 = dict:store("key", 999, D)
1> end.
#Fun
2> D = dict:new().
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
3> D2X = UpdateDict(D).
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],
[["key"|999]],
[],[],[]}}}

With Java, the state of the hash table is changed from calling the method 'test'. This would not be the case with Erlang.

import java.util.HashMap;
import java.util.Map;
public class A {
public static void test(final Map map) {
map.put("key", "999");
}
public static void main(String [] args) {
Map mymap = new HashMap();
// In this instance, key is still kept.
test(mymap);
System.out.println("==>" + mymap.get("key"));
}
}


%% Simple test of using record
%% Also note, ever character and CASE is important
-module(for_record).
-export([main/0]).

%% Simple record definition
-record(the_record, {
xkey = 1.0,
ykey = 3.0,
buffer = "dogs and cats",
status = not_done_yet,
got_update
}).

simple_record_test(OrigRecord) ->
XVal = OrigRecord#the_record.xkey,
XBuf = OrigRecord#the_record.buffer,
%% Update a value
R2 = OrigRecord#the_record{got_update="MyData"},
io:format("New Record =>~p~n", [R2]),
%% Return an updated record
OrigRecord#the_record{buffer="hahah"}.

with_pattern_match(#the_record{buffer=Buf} = Rcrd) ->
io:format("==> Buffer: ~p~n", [Buf]),
io:format("==> Record: ~p~n", [Rcrd]),
Rcrd.

common_idiom_tupl(#the_record{buffer=Buf} = Rcrd) ->
%% A common Erlang pattern, return a TUPLE with the
%% status of the function call. Not an ATOM is
%% set to the first value of the tuple.
{ ok_atom, Rcrd }.

main() ->
FirstRec = #the_record{status=new_rec},
io:format("First Record =>~p~n", [FirstRec]),
%% Notice that we are passing the updated records to the
%% next function call.
R2 = simple_record_test(FirstRec),
R3 = with_pattern_match(R2),
R4WithTuple = common_idiom_tupl(R3),
io:format("Record AND Tuple =>~p~n", [R4WithTuple]).


$ erl -s for_record main
Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0] [hipe] [kernel-poll:false]

First Record =>{the_record,1.0,3.0,"dogs and cats",new_rec,undefined}
New Record =>{the_record,1.0,3.0,"dogs and cats",new_rec,"MyData"}
==> Buffer: "hahah"
==> Record: {the_record,1.0,3.0,"hahah",new_rec,undefined}
Record AND Tuple =>{ok_atom,{the_record,1.0,3.0,"hahah",new_rec,undefined}}
Eshell V5.6.3 (abort with ^G)
1> q().
ok

Overview of Mini JScheme


In my last entry about my Java Scheme implementation, "Peter Norvig created JScheme, a Lisp Scheme dialect written in Java. It was created in 1998 and it has since been forked, extended and used quite extensively in other modern projects for the web and elsewhere. I mention JScheme because the code I am about present is all based on the JScheme interpreter." Here is a description of the Erlang version of that implementation. A lot function names are the same as well as the logic used. Note: The approach used may not represent the Erlang best practices but there is a large correlation between the two versions. And so far, we have covered most of the Erlang idioms that will be used in our Mini Scheme implementation. If you are more familiar with imperative languages, you should be able to go back and forth between the Java and Erlang code.

Input Reader Parser Functionality in Erlang


The input reader module (mel_input_reader.erl) contains functions for parsing the scheme data. In the Java implementation, we read one character at a time and depending on the character,

while (isWhitespace((char) ch)) {
ch = inputReader.read();
}
switch (ch) {
case -1:
return EOF;
case TOK_LEFT_PAREN:
return "(";
case TOK_RIGHT_PAREN:
return ")";
default:
buff.setLength(0);
int c = ch;
buildGenericToken(ch);
if (c == '.' || c == '+' || c == '-'
|| (c >= '0' && c <= '9')) {
try {
// Number type is currently in the buffer queue
return new Double(buff.toString());
} catch (NumberFormatException e) {
;
}
} // End of If
return buff.toString().toLowerCase();
} // End of the Switch

I hope the Java scheme input parser code is fairly straightforward. We read and collect the next character from the file input stream and then determine if the character is the left of right character token or an alpha numeric character which we define as just another symbol to be used for further processing. I tried to keep some of that same logic for the Erlang implementation. There are two major differences.

  1. We implemented our own simple "input stream reader" using Erlang's list and string operations. This approach is a quick hack that is only ten lines of code but in the long run is not going to be very robust. Our stream is simply a collection of characters that we "pop" off the stack buffer.

  2. The "reader_info" record is used to keep the "state" of our "input stream" as well as other important variables. With the Java implementation, this was easily done in the Input Reader, Pair and BuiltInFunction classes with a collection of various class fields. With the Erlang implementation, the reader_info record is all that is needed for keeping track of the state of the buffer, the scheme character stack, and the token stack.

For example, shown here is the reader_info and the stream implementation:

%% Where status = init|ok|done
-record(reader_info, {status=init, last_ch=-1,
reader_strm="", token_buffer=[],
char_stack=[], token_stack=[]}).

%%----------------------------------------------------------------------
%% Function: stream_read/2
%% Purpose: Read the next character/value in the stream. In this case,
%% a stream is just a "list" of values and we pop the next
%% value in the list until the list is empty.
%% Note: Reader only returns a 'Reader Info' record
%% Args: Reader is an instance of the record "Reader" record.
%% Returns: The status of the read, the Top/Head character
%% updated in the Reader record.
%% { Reader, status = ok } (Reader read successfully)
%% { Reader, status = done } (Reader reached EOF)
%%----------------------------------------------------------------------
stream_read(Reader) ->
Str = Reader#reader_info.reader_strm,
{ Done, H, T } = stream_read(Str, ok),
% Where tail can be an empty list of []
% Update the record info with;
% set the status, last char
{ Reader#reader_info{status=Done, last_ch=H,
reader_strm = T} }.
stream_read([], _) -> { done, -1, [] };
stream_read(-1, _) -> { done, -1, [] };
stream_read(Str, _) ->
[ Head | Tail ] = Str,
{ ok, Head, Tail }.

Once again, it is not very robust but I could write comparable stream reader code in Erlang as I could in Java. And here is the Erlang implementation of the "nextToken" Java code shown above. The Erlang syntax is considerably different from Java but a lot of the logic is nearly the same including the order of operations.

next_token_build(Reader) ->
ChStk = Reader#reader_info.char_stack,
% Task 1: Pop the token
% and character stacks
{ Ch, RUpt } =
case (not is_empty(ChStk)) of
true ->
% Pop char off char stack
{ _, Top, Rest } = pop_stack(ChStk),
% Return top of stack,
% and current reader,
% with the updated stack
{ Top, Reader#reader_info{char_stack=Rest} };
false ->
% The state of the reader has changed,
% mutated char reader stream.
{ Rx } = stream_read(Reader),
{ peek_char_reader(Rx), Rx }
end,
% Task 2: Check for and
% ignore whitespace
{ Ch2, R2Upt } = case is_whitespace(Ch) of
true ->
{ R2x } = while_whitespace(RUpt),
{ peek_char_reader(R2x), R2x };
false -> { Ch, RUpt }
end,
% Build exit points
if
Ch2 == -1 -> { "#!EOF", R2Upt };
Ch2 == $( -> { "(", R2Upt };
Ch2 == $) -> { ")", R2Upt };
true ->
% Reset the atom buffer
% Set the buffer for the NEW reader
BufRst = new_stack(),
RResetBufUpt = R2Upt#reader_info{token_buffer=BufRst},
% Buffer is updated by build token
{ R3Uptx } = build_generic_token(RResetBufUpt, Ch2),
case convert_tok_float(R3Uptx, Ch2) of
{ ok, FloatVal } ->
% Return <float, Reader>
{ FloatVal, R3Uptx };
{ _, _ } ->
% Error case for convert attempt,
% act like normal
% Return <string, Reader>
{ peek_buf_reader(R3Uptx), R3Uptx }
end
end.

Erlang already has the Lisp operations we need?


It may not be evident at first but Erlang has a many similarities with many Lisp dialects. Differences are merely in the appearance of the syntax. For example here is a session to test various List construction routines.


Simple SBCL Common Lisp operations:
* (let ((A (list 1 2 3 4)))
(print A))
(1 2 3 4)
* (let ((A (list 1 2 3 4)))
(print (first A)))
1
* (let ((A (list 1 2 3 4))) (print (cdr A)))
(2 3 4)
;; Build a list with CONS, constructor function.
* (let ((A (cons 1 (cons 2 (cons 3 '()))))) (print A))
(1 2 3)
(1 2 3)

Erlang:
Eshell V5.6.3 (abort with ^G)
1> A = [1, 2, 3, 4].
[1,2,3,4]
2> [HeadFirst|Tail] = A.
[1,2,3,4]
3> io:format("==>~p~n", [HeadFirst]).
==>1
ok
4> io:format("==>~p~n", [Tail]).
==>[2,3,4]
ok
5> io:format("==>~p~n", [hd(A)]).
==>1
ok
6> io:format("==>~p~n", [tl(A)]).
==>[2,3,4]
ok
7> Cons = fun(First, Rest) -> [First|Rest].
* 1: syntax error before: '.'
7> Cons = fun(First, Rest) -> [First|Rest] end.
#Fun
8> Cons(1, Cons(2, Cons(3, []))).
[1,2,3]
9>

Various Linked List/Lisp Operations in Java


Because Java does not even closely resemble a functional programming language, we had to retrofit a lot the Lisp operations using Java linked list routines where the Pair class is associated with the linked list node and most of the methods are for operations on the list.

/** Build a pair from two components. * */
public Pair(Object first, Object rest) {
this.first = first;
this.rest = rest;
}
public static Pair cons(Object a, Object b) {
return new Pair(a, b);
}
public static Object rest(Object x) {
return (x instanceof Pair) ? ((Pair) x).getRest() : null;
}
public static int length(Object x) {
int len = 0;
while (x instanceof Pair) {
len++;
x = ((Pair)x).getRest();
}
return len;
}

// Java Pair Test Cases:
Pair pair = new Pair("+", new Pair("1", new Pair("2", null)));

Pair pair = SchemeUtil.cons("1", "2");

Pair pair = SchemeUtil.cons("1", SchemeUtil.cons("2", null));

For the Erlang implementation, here are the Lisp functions used in the Scheme interpreter:

cons(Fst, Rest) -> [Fst|Rest].

%% Like Common Lisp first; car of a Pair, or null for anything else.
first([Fst|Rest]) -> Fst;
first(_) -> [].

%% Like Common Lisp rest; car of a Pair, or null for anything else.
%% "A cons cell is composed of two pointers; the car operation
%% extracts the first pointer, and the cdr operation extracts the second."
%%
%% "Thus, the expression (car (cons x y)) evaluates to x,
%% and (cdr (cons x y)) evaluates to y."
rest([Fst|Rest]) -> Rest;
rest(_) -> [].

second(X) ->
first(rest(X)).

Implementing Eval


The scheme parser and the expression evaluation meets in this read_tail function. Once the token parser determines the beginning of a list denoted by the left parenthesis, the next set of symbols will normally consist of a function name and the list of arguments. The call to cons does the building on the linked list pair data. Once all tokens have been read, the linked list pair node is returned to the evaluation routines.

Figure: singly-linked list containing two values: the value of the current node and a link to the next node


read_tail(#reader_info{token_stack=TokStk}=Reader) ->
{ Tok, R } = next_token(Reader),
if
Tok == "#!EOF" ->
io:format("ERROR: readTail() - EOF during read.~n"),
{ [], R };
Tok == ")" ->
{ [], R };
true ->
T2Stk = push_stack(TokStk, Tok),
R2x = R#reader_info{token_stack=T2Stk},
% R[2]'x' is the current Reader info
% Build a pair out of the token from read
% and read_tail.
{ PairRead, R3x} = read(R2x),
{ PairTail, R4x } = read_tail(R3x),
{ cons(PairRead, PairTail), R4x }
end.

Implementing Basic Numeric Operations and Built In Functions


A simple scheme lisp implementation like ours is simply a wrapper over the host language. Parse the input code, restructure the code as a linked list of symbols and then iterate over the sequence and perform various operations when they are found. Our Erlang based Scheme only supports numeric scheme operations like ADD, MULTIPLY and DIVIDE. First, we detect if an ADD or other numeric operation is being invoked and then supply the arguments to the function. In the Java implementation, the function name (+, -, /) are stored within a hash table and are associated with the Java code for actually applying the numeric operations against the supplied arguments.

The Erlang version uses a similar approach. Instead of a HashMap, we use Erlang's dict.

public class Environment {

private Map mapDataVars = new HashMap();

public Environment() {
}

/**
* Find the value of a symbol, in this environment or a parent.
*/

public Object lookup(String symbol) {
Object o = this.mapDataVars.get(symbol);
if (o != null) {
return o;
} else {
return SchemeUtil.error("Unbound variable: [" + symbol + "]");
}
}

/** Add a new variable,value pair to this environment. */
public Object define(Object var, Object val) {

this.mapDataVars.put(var, val);
if (val instanceof Procedure
&& ((Procedure) val).getName().equals(Procedure.DEFAULT_NAME)) {
((Procedure) val).setName(var.toString());
}
return var;
}
public Environment defineBuiltInProc(String name, int id, int minArgs) {
define(name, new BuiltInFunction(id, minArgs, minArgs));
return this;
}
}

Here is the code and data structures for defining variables and functions on the Erlang side.

new_environment() ->
dict:new().
lookup(Env, Key, Default) ->
case dict:find(Key, Env) of
{ok, Value} -> Value;
error -> Default
end.
define(Env, Key, Value) ->
dict:store(Key, Value, Env).
define_procedure(Env, Name, Id, MinArgs, MaxArgs) ->
define(Env, Name, new_procedure(Id, Name, MinArgs, MaxArgs)).
define_procedure(Env, Name, Id, MinArgs) ->
define(Env, Name, new_procedure(Id, Name, MinArgs, MinArgs)).

NumCompute is a relevant method and leads us into our Lisp implementation in Erlang. Essentially, the numCompute method is a traversal method to sum all of the Nodes in the list chain. This is kind of important, Lisp is a List processing language. Our traversal method is an operation on a singly linked list.

num_compute_while(Args, Op, Result) ->
% Perform the operation against the linked list
case (get_type(Args) == pair) of
false ->
% Exit with the result
Result;
true ->
X = first(Args),
% Traverse
MoreArgs = rest(Args),
NewRes = case Op of
'+' -> X + Result;
'-' -> X - Result;
'*' -> X * Result;
'/' -> X / Result;
_ ->
io:format("Internal Error: unrecognized op: [~p]~n",[Op]),
-1
% End of Op Case
end,
% Continue with loop
% and return some result
num_compute_while(MoreArgs, Op, NewRes)
end. % End Pair Type Check

Running and testing the Interpreter


I want to make it clear, the scheme interpreter operates on long strings. The scheme code is contained within the string. So, we can use Erlang's one call function to read an entire file into binary data and then covert the data into a string. Then pass the string to the scheme interpreter.

eval_file(Filename) ->
{ok, Bin} = file:read_file(Filename),
io:format("Binary Data [ ~p ]~n", [Bin]),
EvalRes = eval_all_expr(binary_to_list(Bin)),
io:format("Pair => [~p]", [EvalRes]),
[].


Parsing the simple test.scm file. This file contains a set of
simple scheme numeric operations.
( + 2 2 1 )
( + 2 2 2 )
( + 2 2 3 )
...
...
...

Outputs:

trace: eval[t0] - eval type=> [float]/{5.0}
trace: eval[t0] - eval type=> [float]/{5.0}

trace: proc_apply => [{proc_info,"+",0,9999,27,undefined}]
last-eval =>10.0
trace: ERR convert =>"+"
trace: eval[t0] - eval type=> [pair]/{["+",5.0,5.0]}
trace: eval[t1] - instance of procedure call ["+"]
trace: eval[t0] - eval type=> [string]/{"+"}
trace: eval[t0] - eval type=> [float]/{5.0}
trace: eval[t0] - eval type=> [float]/{5.0}

trace: proc_apply => [{proc_info,"+",0,9999,27,undefined}]
last-eval =>10.0
trace: ERR convert =>"+"
trace: eval[t0] - eval type=> [pair]/{["+",9.0,9.0]}
trace: eval[t1] - instance of procedure call ["+"]
trace: eval[t0] - eval type=> [string]/{"+"}
trace: eval[t0] - eval type=> [float]/{9.0}
trace: eval[t0] - eval type=> [float]/{9.0}

trace: proc_apply => [{proc_info,"+",0,9999,27,undefined}]
last-eval =>18.0
Pair => [true]bbrown@houston:~/workspace_octane/octane_mel/src/erlang$


%% Example Test Cases
-module(test_input_reader).
-include_lib("eunit/include/eunit.hrl").

-include("mel_records.hrl").
-import(mel_input_reader, [is_whitespace/1, next_token/1]).
-import(mel_input_reader, [new_reader/1, stream_read/1, while_whitespace/1]).
-import(mel_input_reader, [new_stack/0, push_stack/2, pop_stack/1, peek_stack/1]).
-import(mel_input_reader, [read/1]).
-import(mel_scheme, [eval/1, eval/2, eval_all_expr/1]).

whitespace_test_() ->
[
?_assert(2 == 2),
?_assert(is_whitespace(1) == false)
].
test_compute_main() ->
R = new_reader(" (cons 1 (cons 2 3)) (+ 1 2)"),
{ R2 } = while_whitespace(R),
io:format("Test While White Space =>~p~n", [R2]),
{ Pair, RUpt } = read(R),
io:format("Test Read=>~p~n", [Pair]),
Pair2 = eval(Pair),
io:format("Pair => [~p]~n", [Pair2]),
X = new_reader(" "),
{ YPair, RYUpt } = read(X),
io:format("~nTest EOF=>~p, ~p~n", [YPair, RYUpt]),
EvalRes = eval_all_expr(" (+ 1 2) (+ 4 5) (* 3 (+ 4 9)) (cons 1 2) "),
io:format("~nTest Eval All=>~p~n", [EvalRes]),
[].

References and Full Source


http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/erlang_scheme/ - Browse subversion repository.

http://haskellnotebook.googlecode.com/files/erlang_scheme.tar.gz

http://norvig.com/jscheme.html
http://norvig.com/license.html
http://factorcode.org/
http://www-formal.stanford.edu/jmc/recursive/node3.html
http://www-formal.stanford.edu/jmc/

Java JScheme Source Code

http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/
http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/src/org/bot/jscheme/linkedlist/
http://octane-lang.googlecode.com/svn/trunk/misc/tutorial/jscheme_new/src/org/bot/jscheme/
http://jvmnotebook.googlecode.com/files/mini_jscheme.tar.gz

(FAQ) Frequently Asked Questions



Q. I noticed that the Erlang and even the Java implementation is a little bit verbose with some Erlang best practices ignored? Why should I take interest in a neophyte example?

Neophyte, Lazy, Implemented very fast for the intent of a blog entry...same thing. I implemented the Java and Erlang Scheme implementations over the course of about 5 days with 3-5 hours on each day. My intent was to implement Lisp based on JScheme. I have accomplished most of what I needed during that time. However, there are already best practices that I ignored for the purpose of highlighting some of the Erlang syntax. The code doesn't represent a best approach or the cleanest functional programming code. But it does work. Why write about it at all then? Well, I could have picked a more pedestrian topic like the ongoings of Paris Hilton or Kim Kardashian. I figure that a Scheme in Erlang is more interesting. And if you don't like my approach of describing Erlang and then a short description of the Scheme implementation then you can just browse all of the source code. That is what I normally end up doing anyway, breezing through the writing and then downloading the source code and going back to blog if something isn't fully clear in the code.
Q. This blog post sure is long. What is wrong with you?

With my longer programming related blogs, I like to do two things. These things are normally missing in a lot of other blogs. But generally the good blogs adopt this policy and provide some good writing.

  1. Provide a light introduction (with historical footnotes) to some of the introductory programming concepts used in more complex examples. For whatever reason, I like to look at the full use of basics and then go back and forth between the more complicated syntax. Reintroducing basic concepts gives those unfamiliar with the language a better look into the world of Erlang. I mentioned compiling Erlang code, a simple Makefile, running Erlang through the Erlang shell. Hopefully this will help people visualize the step-by-step approach for more complex concepts like the use of the Y Combinator.
  2. Provide all of the source and make sure that it works with most recent compilers and runtime. If you can't take away anything from the blog, at least you have the code.

3 comments:

Raoul Duke said...

very very nice post. thank you for your efforts from this erlang newbie.

Berlin Brown said...

Thanks. I really wanted to cover some more topics like loops, networking, and processes. But hopefully I can add that in another entry.

Raoul Duke said...

as an aside: any idea if there's a scheme that is 99% functions and macros and 1% a core kernel that you have to port? such a thing could easily be ported to jvm, clr, c, etc. (ideally on things like jvm it would use jvm's gc.)