Saturday, December 27, 2008

Simple lisp interpreters in C, Java and Erlang


Overview

I have already covered lisp interpreters in Erlang and Java. I have now created another example based on C, so you have three different implementations in three different languages. All of these interpreters are based on Peter Norvig's JScheme implementation, which makes a fourth project you can compare and contrast.

Peter Norvig created JScheme, a Scheme dialect of Lisp written in Java. It was created in 1998 and has since been forked, extended, and used quite extensively in other modern projects, on the web and elsewhere. The C lisp interpreter here is based on the JScheme interpreter. The JScheme code is copyright Peter Norvig; this is only a set of modifications to make our Lisp even lighter
than it was originally. Norvig's most recent implementation of JScheme contains 1905 lines of Java code. This implementation contains 2000 lines of C code wrapped in additional documentation comments, including implementations of the list and hashtable data structures.

There is a lot that could be added to Norvig's implementation: full support for an R5RS Scheme implementation, better input/output support, interesting functions copied from Arc or Common Lisp, even a Scheme compiler. In any case, the source code in both my implementation and Norvig's is fairly straightforward.
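To give a feel for how small such an interpreter can be, here is a minimal Python sketch of the tokenize/read/eval cycle for the arithmetic subset exercised below. It is my own illustration, not code from any of the three implementations.

```python
import operator

def tokenize(src):
    # Surround parens with spaces, then split on whitespace.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    # Read one expression, consuming tokens left to right.
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return float(token)
    except ValueError:
        return token  # a symbol such as "+"

PRIMITIVES = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}

def eval_expr(expr):
    # Numbers evaluate to themselves; lists apply a primitive
    # to the evaluated arguments, folding left to right.
    if isinstance(expr, float):
        return expr
    op = PRIMITIVES[expr[0]]
    args = [eval_expr(a) for a in expr[1:]]
    result = args[0]
    for a in args[1:]:
        result = op(result, a)
    return result

print(eval_expr(read(tokenize("(+ 4 5 6)"))))            # 15.0
print(eval_expr(read(tokenize("(+ 1 (+ 1 (* 1 1)))"))))  # 3.0
```

The real implementations do the same thing with explicit token streams and tagged object structs, but the read-then-eval shape is identical.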

C implementation

To Compile:
type 'make' at the command line prompt
To Run:
type 'make run' to run the tests or ./octanec

For Example:

test.lisp contains the following:
-------------
(+ 4 5 6)
(+ 6 6 6)
(+ 1 2 3 )
(+ 1 (+ 1 (* 1 1)))
-------------

./octanec test.lisp

Example Output

...
...
trace: eval[t1] - instance of procedure call
trace: eval - instance of String: data=[+]
trace: eval - instance of non-pair =>
trace: eval - instance of non-pair =>
trace: eval - instance of non-pair =>
at apply id= 27 4.000000 x==>97125c0 16
at plus 27

test eval [result after eval]: object=9712918 type=16 [15.000000]
...
...

Object Oriented C

The code is written with an object oriented approach. There is one generic object definition that contains data for different types.
octane_object.h

/*
 * The default object type for the scheme parser.
 * The obj_type field determines the type of the object.
 * 
 * If the struct is of a particular type (e.g. String),
 * then we need to reference that particular pointer:
 *
 * String = string_data;
 * Char   = char_data;
 */
typedef struct Object_struct {
    int    obj_type;    
    char   *string_data;
    double double_data;
    int    boolean_data;
    char   char_data;

    struct Pair_struct      *pair_data;
    struct Procedure_struct *proc_data;
} Object;

To create a new string object, we would do something like the following:
struct Object_struct *new_empty_object(int obj_type) {
    struct Object_struct *new_obj = (struct Object_struct *) malloc(sizeof(struct Object_struct));
    new_obj->obj_type = obj_type;
    return new_obj;
}
...
...
...

struct Object_struct *new_str_object(char *string_data) {
    struct Object_struct* obj = new_object(OBJ_STRING_TYPE, string_data, 0, 0);
    return obj;
}
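For comparison, the same tagged-object idea can be sketched in a few lines of Python (the class and tag values here are illustrative, not taken from the C source):

```python
# Illustrative tag values -- the real constants live in octane_object.h.
OBJ_STRING_TYPE, OBJ_DOUBLE_TYPE = 1, 16

class Object:
    """One generic object; obj_type says which field holds the payload."""
    def __init__(self, obj_type, string_data=None, double_data=0.0):
        self.obj_type = obj_type
        self.string_data = string_data
        self.double_data = double_data

def new_str_object(s):
    # Analogue of new_str_object() in the C code: tag plus payload.
    return Object(OBJ_STRING_TYPE, string_data=s)

def new_double_object(d):
    return Object(OBJ_DOUBLE_TYPE, double_data=d)

obj = new_str_object("(")
print(obj.obj_type == OBJ_STRING_TYPE, obj.string_data)  # True (
```

In C the tag is needed because the struct cannot carry run-time type information; the eval loop switches on obj_type to decide which data field is meaningful.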

Compare and Contrast with Java
The C implementation is an almost exact copy of the Java implementation. Of course, for the C implementation we had to create the list, hashtable, and buffer libraries that come for free in Java, along with some other modifications. But the logic is essentially the same as the JScheme and mini scheme code. For example, here are the C and Java versions of the lisp 'read' routine.
/**
 * Read and return a Scheme expression, or EOF.
 */
struct Object_struct *read(struct Octane_struct *octane) {
    struct Object_struct *token;
    char *str;
    printf("trace: read()\n");

    token = next_token(octane);
    if (token == NULL) {
        return NULL;
    }
    if (token->obj_type == OBJ_STRING_TYPE) {
        str = token->string_data;
        if (strcmp("(", str) == 0) {
            return read_tail(octane);
        } else if (strcmp(")", str) == 0) {
            printf("WARN: Extra ')' ignored.");
            return read(octane);
        } else {
            return token;
        } // End of the if - else
    } else {
        return token;
    } // End of the if - else
}

And the Java version...
    /**
     * Read and return a Scheme expression, or EOF.
     */
    public Object read() {
        System.out.println("trace: read()");
        try {
            Object token = nextToken();
            System.out.println("trace: read() token=" + token);
            if (token == "(") {
                return readTail();
            } else if (token == ")") {
                System.out.println("WARN: Extra ')' ignored.");
                return read();
            } else {
                return token;
            } // End of the if - else
        } catch (IOException e) {
            System.out.println("WARN: On input, exception: " + e);
            return EOF;
        } // End try - catch
    }


References

Full Source for all three implementations, all_simple_lisp
Description of the Java Implementation

Description of the Erlang Implementation

http://jvmnotebook.googlecode.com/files/mini_jscheme.tar.gz

Bugs

In the load_scheme function, there is an error in the use of 'strcmp'; change the EOF check to the following.
load_scheme():
...
    if (object->obj_type == OBJ_STRING_TYPE) {
                if (strcmp(object->string_data, OCT_EOF) == 0) {
                    printf("exiting read loop (EOF)\n");
                    return;
                }
            }
...

-----------
Addendum(2) - Simple RPN Calculator.

With very little effort, I changed the lisp interpreter into a concatenative/stack language interpreter. In the update, instead of parsing the lisp syntax, the parser detects the various tokens and pushes values onto or pops them off the stack. With Lisp, the main data structure is the 'list'; with stack languages, the core data structure is a LIFO stack.

"In Reverse Polish notation the operators follow their operands; for instance, to add three and four, one would write "3 4 +" rather than "3 + 4". If there are multiple operations, the operator is given immediately after its second operand; so the expression written "3 − 4 + 5" in conventional infix notation would be written "3 4 − 5 +" in RPN: first subtract 4 from 3, then add 5 to that. An advantage of RPN is that it obviates the need for parentheses that are required by infix"

http://en.wikipedia.org/wiki/Reverse_Polish_notation
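The evaluation rule described in the quote fits in a few lines. The following Python is my own illustration of the stack discipline, not the octanec code:

```python
def eval_rpn(program):
    # Evaluate a whitespace-separated RPN/concatenative program,
    # returning the final stack (top of stack is the last element).
    stack = []
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    for tok in program.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()  # top of stack is the 2nd operand
            stack.append(ops[tok](a, b))
        elif tok == "dup":
            stack.append(stack[-1])          # push an extra copy of the top
        elif tok == "pop":
            stack.pop()                      # discard the top
        else:
            stack.append(float(tok))         # numbers are pushed as operands
    return stack

print(eval_rpn("3 4 - 5 +"))          # [4.0]
print(eval_rpn("10 20 + 4 * dup *"))  # [14400.0]
```

Note the operand order in the binary case: the top of the stack is the second operand, which is what makes "3 4 -" evaluate to -1 rather than 1.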

Here is an example source file (test.joy):
10 20 +
;; Result after call should be 30
4 * dup *

The values 10 and 20 are the first values put on the stack; these numbers are the input parameters to the '+' operation. The result, 30, is placed onto the stack. Then the value 4 is placed on the stack, and the '*' multiplication operator is called with 4 and 30 as parameters. And so on. Here is the output from these operations:
Usage: octanec <source file>
trace: [EVAL RESULT]:  object=661ef8 type=16 [10.000000]

+- STACK -- top ---------+
| id:0 10.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662060 type=16 [20.000000]

+- STACK -- top ---------+
| id:0 20.000000
| id:1 10.000000
+--------- bottom -------+
        $ function input => (+)
+- STACK -- top ---------+
| id:0 20.000000
| id:1 10.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662160 type=16 [30.000000]

+- STACK -- top ---------+
| id:0 30.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662228 type=16 [4.000000]

+- STACK -- top ---------+
| id:0 4.000000
| id:1 30.000000
+--------- bottom -------+
        $ function input => (*)
+- STACK -- top ---------+
| id:0 4.000000
| id:1 30.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662328 type=16 [120.000000]

+- STACK -- top ---------+
| id:0 120.000000
+--------- bottom -------+
        $ function input => (dup)
trace: [EVAL RESULT]:  object=662328 type=16 [120.000000]

+- STACK -- top ---------+
| id:0 120.000000
| id:1 120.000000
+--------- bottom -------+
        $ function input => (*)
+- STACK -- top ---------+
| id:0 120.000000
| id:1 120.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662570 type=16 [14400.000000]

+- STACK -- top ---------+
| id:0 14400.000000
+--------- bottom -------+
        $ function input => (print_stack)
+- STACK -- top ---------+
| id:0 14400.000000
+--------- bottom -------+
trace: [EVAL RESULT]:  object=662570 type=16 [14400.000000]

+- STACK -- top ---------+
| id:0 14400.000000
+--------- bottom -------+
building (EOF)
;;*************************************
;; Result after call should be = '14400.000000'
;; In Joy, we get the following as well:
;; 10 20 + 4 * dup *.
;;*************************************

Most of the modifications from the original Lisp interpreter code were made to the apply function. Instead of constructing a linked-list pair structure, all operations are made against the main stack:
...
...
    if (minArgs == 2) {
        switch (idNumber) {
        case PLUS:
        case MINUS:
        case DIVIDE:
        case TIMES:
            compute_res = stack_compute(idNumber, main_stack);
            if (compute_res != NULL) {
                stack_push(main_stack, compute_res);
            }
            return compute_res;
        case SWAP:
            // swap      :   X Y  ->   Y X
            // Interchanges X and Y on top of the stack.
            last_obj  = stack_pop(main_stack);
            last_obj2 = stack_pop(main_stack);
            stack_push(main_stack, last_obj);
            stack_push(main_stack, last_obj2);
            return last_obj2;
        default:
            printf("Invalid id number=%d\n", idNumber);
            error("Internal error: unknown primitive: ");
            return NULL;
        }; // End of Switch
    } else if (minArgs == 1) {
        switch(idNumber) {
        case DUP:
            // dup      :   X  ->   X X
            // Pushes an extra copy of X onto stack.
            last_obj = stack_pop(main_stack);
            stack_push(main_stack, last_obj);
            stack_push(main_stack, last_obj);
            return last_obj;
        case ID:
            last_obj = stack_pop(main_stack);
            stack_push(main_stack, last_obj);
            return last_obj;
        case POP:
            // pop      :   X  ->
            // Removes X from top of the stack.
            last_obj = stack_pop(main_stack);
            return last_obj;
        default:
            printf("Invalid id number=%d\n", idNumber);
            error("Internal error: unknown primitive: ");
            return NULL;
        }; // End of Switch
...
...

More RPN/Concatenative language resources:

http://www.latrobe.edu.au/philosophy/phimvt/joy.html

http://haskellnotebook.googlecode.com/files/simple_rpncalc.tar.gz

http://math-services.appspot.com/lang2.jsp
-----------

Thursday, December 25, 2008

Factor tech talk at Google by Slava

Slava does a good job with this Factor Google tech talk. If you haven't already heard about Factor, here you go.

http://www.youtube.com/watch?v=f_0QlhYlS8g

Tuesday, December 23, 2008

Great article on the Hotspot JVM JIT compiler

http://ssw.jku.at/Research/Papers/Ko08/

Abstract

"Version 6 of Sun Microsystems' Java HotSpot™ VM ships with a redesigned version of the client just-in-time compiler that includes several research results of the last years. The client compiler is at the heart of the VM configuration used by default for interactive desktop applications. For such applications, low startup and pause times are more important than peak performance. This paper outlines the new architecture of the client compiler and shows how it interacts with the VM. It presents the intermediate representation that now uses static single assignment (SSA) form and the linear scan algorithm for global register allocation. Efficient support for exception handling and deoptimization fulfills the demands that are imposed by the dynamic features of the Java programming language. The evaluation shows that the new client compiler generates better code in less time. The popular SPECjvm98 benchmark suite is executed 45% faster, while the compilation speed is also up to 40% better. This indicates that a carefully selected set of global optimizations can also be integrated in just-in-time compilers that focus on compilation speed and not on peak performance. In addition, the paper presents the impact of several optimizations on execution and compilation speed. As the source code is freely available, the Java HotSpot™ VM and the client compiler are the ideal basis for experiments with new feedback-directed optimizations in a production-level Java just-in-time compiler. The paper outlines research projects that add fast algorithms for escape analysis, automatic object inlining, and array bounds check elimination."

Monday, December 22, 2008

Basic tools for monitoring garbage collection, and more on JVM memory analysis (draft)

(Draft)
I was having a discussion online. C is better than Java; Java is better than C. You know, the conversation you have every day on the web that always reaches the same conclusions. When people mention the benefits of Java, they normally mention WORA (write once, run anywhere) and Java's garbage collection: the Java developer does not have to worry about his application crashing, and so on. And then it occurred to me: I see more crashes with Java servers and systems than I could ever imagine with systems built in C++ or C. I don't think I have worked at a Java shop that didn't periodically reboot its Java application servers. Does this happen with Apache/PHP driven applications? Shrug. I don't want to get into that debate, but it is a reality in the J2EE/Java world that the Java virtual machine is not immune to crashes or the dreaded 'Out of Memory' errors. But like everything in life, there are no guarantees, right? The Java developer is supposed to know what he or she is doing. I am posting this entry to find out a little bit more about why this happens and what is going on with the garbage collector at the time, along with a couple of tools to help monitor the JVM.

From wikipedia:

"Java uses an automatic garbage collector to manage memory in the object lifecycle. The programmer determines when objects are created, and the Java runtime is responsible for recovering the memory once objects are no longer in use."

Here is an example C program with a very obvious error.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    char *buffer = (char *) malloc(4 * sizeof(char));
    buffer[0] = 'a';
    buffer[1] = 'b';
    buffer[2] = 'c';
    buffer[3] = '\0';
    buffer[1000099] = 'd';
    printf("%s\n", buffer);
    free(buffer);
    return -1;
}

Here is the output:
Segmentation fault

Obviously, this is a simple application. Imagine instead that you are writing a C based web application server. Somewhere in millions of lines of code, you didn't properly handle HTTPS requests for clients in France (a deliberately silly example) and you have an error just like this one. What happens? The server crashes and you have to call somebody in the middle of the night to reboot it. Does something similar happen in the J2EE world? It may happen more often than you might think.

Here is an example of poor Java programming. I am using the String concatenation operator, which creates N new strings.

public CrashJVM test_SmallString1() throws Exception {
    final int size = 1000;
    stime();
    printMemoryStats("test small string, begin");
    for (int i = 0; i < size; i++) {
        this.data_append += data_small1;
    }
    etime();
    printMemoryStats("test small string, done");
    return this;
}


This is one method, a setter-style method in my simple bean class. If we ignore Java's caching and interning of String data, we might end up with a String that is 200k chars in size, and creating all of those new objects gets expensive very quickly.
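The cost is easy to demonstrate in any language where strings are immutable. Here is a Python sketch of the same pattern (my own illustration, not the CrashJVM source); in the worst case the total number of characters copied grows quadratically with the number of appends:

```python
# Each += builds a new string in the worst case, so the total work is
# 200 + 400 + ... + 200*n characters copied over n appends.
# (CPython can sometimes resize in place; Java's + operator cannot.)
data_small1 = "x" * 200   # a 200-character chunk, like the bean's field
data_append = ""
for _ in range(1000):
    data_append += data_small1

print(len(data_append))  # 200000 -- the "200k chars" mentioned above

# Worst-case characters copied across all 1000 appends:
copied = sum(200 * i for i in range(1, 1001))
print(copied)  # 100100000 -- roughly 100 million character copies
```

This is exactly why StringBuilder (or StringBuffer) exists in Java: one growable buffer instead of a thousand intermediate String objects.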

Here is code that I used to crash the JVM due to out of memory errors:

public CrashJVM test_SmallString1() throws Exception {
    final int size = 1000;
    stime();
    printMemoryStats("test small string, begin");
    for (int i = 0; i < size; i++) {
        this.data_append += data_small1;
    }
    etime();
    printMemoryStats("test small string, done");
    return this;
}

public CrashJVM [] test_SmallObjects1() throws Exception {
    stime();
    printMemoryStats("test small objects, begin");
    final int size = 100000;
    CrashJVM [] items = new CrashJVM[size];
    for (int i = 0; i < size; i++) {
        items[i] = new CrashJVM();
    }
    Thread.sleep(100);
    printMemoryStats("test small objects, end");
    for (int i = 0; i < size; i++) {
        //System.out.println(items[i]);
    }
    etime();
    printMemoryStats("null");
    return items;
}

I reduced the amount of load that the JVM was under, so I could get a snapshot of the heap usage. Here is a chart on what happens during the various garbage collections.

Running
6.084: [GC 6.084: [DefNew: 896K->64K(960K), 0.0037010 secs] 896K->196K(5056K), 0.0037720 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
6.309: [GC 6.309: [DefNew: 960K->64K(960K), 0.0047180 secs] 1092K->348K(5056K), 0.0047850 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
6.610: [GC 6.610: [DefNew: 960K->64K(960K), 0.0054940 secs] 1244K->456K(5056K), 0.0055600 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
6.979: [GC 6.979: [DefNew: 960K->64K(960K), 0.0041030 secs] 1352K->626K(5056K), 0.0041770 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
7.190: [GC 7.190: [DefNew: 960K->64K(960K), 0.0030750 secs] 1522K->742K(5056K), 0.0031420 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
10.850: [GC 10.850: [DefNew: 960K->18K(960K), 0.0013450 secs] 1638K->759K(5056K), 0.0014110 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
************************

The output is generated with the 'verbosegc' and other JVM options.
(Note: duplicating the 'verbosegc' option gives more information about the garbage collection. Also note that the -Xms/-Xmx options must appear before the class name; anything after the class name is passed to the program as arguments.)

java -verbosegc -verbosegc -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xms24m -Xmx32m CrashJVM > gc_anal.txt

There are 1800 data points in the output file and in the image. Each GC line in the gc_anal analysis file contains GC information about a minor garbage collection.
[GC [<collector>: <starting occupancy1> -> <ending occupancy1>, <pause time1> secs] 
<starting occupancy3> -> <ending occupancy3>, <pause time3> secs]

"If the incremental collection of the tenured generation cannot be completed before tenured generation is full, a major collection occurs and is indicated in the verbosegc output by the 'MSC' in the output."


<starting occupancy1> is the occupancy of the young generation before the collection
<ending occupancy1> is the occupancy of the young generation after the collection
<starting occupancy3> is the occupancy of the entire heap before the collection
<ending occupancy3> is the occupancy of the entire heap after the collection

see http://java.sun.com/docs/hotspot/gc1.4.2/example.html
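Assuming the log format above, a single minor-collection line can be pulled apart with a short regular expression. This is my own sketch, separate from the gc_parse.py script discussed later:

```python
import re

# Matches: "<ts>: [GC <ts>: [DefNew: <y0>K-><y1>K(<ycap>K), <t1> secs]
#           <h0>K-><h1>K(<hcap>K), <t3> secs]" (K-sized fields assumed)
GC_LINE = re.compile(
    r'(\d+\.\d+): \[GC \d+\.\d+: \[DefNew: (\d+)K->(\d+)K\((\d+)K\), '
    r'([\d.]+) secs\] (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]')

line = ('6.084: [GC 6.084: [DefNew: 896K->64K(960K), 0.0037010 secs] '
        '896K->196K(5056K), 0.0037720 secs] '
        '[Times: user=0.01 sys=0.00, real=0.00 secs]')

m = GC_LINE.match(line)
ts, y0, y1, ycap, t1, h0, h1, hcap, t3 = m.groups()
print(y0, y1)   # young generation occupancy: 896 -> 64
print(h0, h1)   # entire heap occupancy:      896 -> 196
```

Lines from other collectors (or major collections) will not match this pattern, which is a convenient way to filter the log down to minor collections only.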

For this simple example, the chart is not that interesting. During the first 20 seconds, there is a Thread.sleep(2000) call. After that, we allocate many new objects, especially string objects, and wait for the JVM to give out. It is interesting that we get close to the maximum heap size of 32MB with just this simple application.

Simple Server Application

The first 'String/Object' test was not that complex. I wrote the most basic HTTP server, tested the JVM with large objects, and also varied how often the client would connect to the server.


public void run() {
    running = true;
    String line;
    try {
        BufferedReader bufReader = new BufferedReader(new InputStreamReader(in));
        while (running) {
            // read in a line from the client
            line = bufReader.readLine();
            if (line == null)
                break;
            // and echo the line to the console
            System.out.println("[server/" + line.length() + "]" + line);
            if (line.length() == 0)
                break;
        }
        // Write a html response back
        StringBuffer buf = new StringBuffer();
        buf.append("HTTP/1.1 200 Ok\r\n");
        buf.append("Server: Apache-Test\r\n");
        buf.append("Connection: close\r\n");
        buf.append("Content-Type: text/html\r\n");
        buf.append("\r\n");
        ...
        out.println(buf);

        this.printMemoryStats("incoming request");
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (out != null) out.close();
            if (in != null) in.close();
            client.close();
        } catch (IOException e2) {;}
        System.out.println("[server] closing connection");
    } // End of Try - Catch - Finally
}

The code above represents the 'run' method in the server thread. Once a client connects to the server, the server returns a valid HTTP response. The HTML output is like a tabulated report you might find in a Struts or Spring JSP application. For each row in the report, the following bean is instantiated. Essentially, it is your run-of-the-mill Java POJO bean.
public class SimpleReportBean implements Serializable {
    private static final long serialVersionUID = 3415594062848249479L;

    private String someNo = "234234...";
    private Date sigDate = new Date();
    private String userId = "AB9303...";

    private BigDecimal purchaseAmount = new BigDecimal(0);
    private BigDecimal cPaid = new BigDecimal(0);
    private Date bDate = new Date();
    private Date dPropDate = new Date();
    private Date rDate = new Date();
    private String userTitle = "Very Important";

    private String mailingAddr1 = "1023 Scary Hollow Ln";
    private String mailingAddr2 = "Next to the dumpster, APT 030303030K";
    private String mailingCity = "Austin";
    private String mailingState = "Texas";
    ...



The garbage collection chart above is based on light load. I used 'curl' to generate a request every 2 seconds. There are 10 rows in the report.



The chart above depicts output under heavy load. I removed the delay between requests; there are 500 requests, and I increased the number of rows in the report to 100. The heap is not reaching its maximum at all, but the JVM is performing minor garbage collections much more rapidly.



Sun recently added a useful application to the 1.6 toolset: the heap analysis tool, or 'hat'. If you generate the heap profile binary output, you can run this tool on the output file to get better insight into what is happening on the heap. What objects are being created? There is a histogram of the total number of allocated objects and the total size. There is also a sophisticated JavaScript query language for finding troublesome objects. The hat tool reads the input file and then launches an HTTP server application. I used the following script to generate and analyze the heap dump.

#!/bin/sh
javac CrashJVM.java
FORMAT=,format=b

# Add profiling and verbose garbage collection
# (note: the heap size options use 'm', not 'MB', and must
#  come before the class name)
java -verbosegc -Xms24m -Xmx32m -Xrunhprof:file=dump.txt${FORMAT} Main > d.txt
# To run the server:
jhat dump.txt




In the figure above, we are looking at the heap histogram in the browser on port 7000.



Here is one variation of the Java code used to test against:

import server.BetterReportBean;
public class Main {
    public static void main(String [] args) throws Exception {
        System.out.println("Running");
        BetterReportBean bean [] = new BetterReportBean[3000];
        for (int i = 0; i < 3000; i++) {
            bean[i] = new BetterReportBean();
        }
        Thread.sleep(2000);
        for (int i = 0; i < 3000; i++) {
            bean[i].toString();
        }
        System.out.println("Done");
    }
}

With the better report bean, we used constant strings and referenced those constants from the class fields, as opposed to creating new String, Date, or BigDecimal objects per instance. Essentially, we avoided creating a bunch of new objects that we didn't need. Notice the interesting statistics a little further down.
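The idea translates directly to other languages: give every instance a reference to one shared constant instead of constructing a fresh object per instance. A small Python sketch with hypothetical bean classes (not the Java source):

```python
SHARED_ADDR = ["1023 Scary Hollow Ln"]  # one constant, built once

class WastefulBean:
    def __init__(self):
        # A fresh object per instance -- like "new Date()" in every bean.
        self.addr = list(SHARED_ADDR)

class BetterBean:
    def __init__(self):
        # A reference to the shared constant -- like the static finals.
        self.addr = SHARED_ADDR

wasteful = [WastefulBean() for _ in range(3000)]
better = [BetterBean() for _ in range(3000)]

print(len({id(b.addr) for b in wasteful}))  # 3000 distinct objects
print(len({id(b.addr) for b in better}))    # 1 shared object
```

Fewer live objects means less allocation and less work per minor collection, which is exactly the effect the numbers below show. The caveat, in Java and Python alike, is that shared objects must be immutable or treated as such.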

With a main class and the better report bean (class server.BetterReportBean), 36 garbage collections were performed.

Here is some output from the garbage collection report:
...
[GC 1277K->390K(5056K), 0.0006050 secs]
[GC 1286K->399K(5056K), 0.0005540 secs]
[GC 1294K->407K(5056K), 0.0005580 secs]
[GC 1303K->416K(5056K), 0.0009580 secs]
[GC 1311K->424K(5056K), 0.0006540 secs]
[GC 1320K->431K(5056K), 0.0007520 secs]
[GC 1327K->431K(5056K), 0.0012980 secs]
...
...
Time to garbage collect:
(* 39 0.00070)
0.027 seconds.

The histogram data from the heap analysis tool shows a total size of:
647,190 bytes

With the normal class, Simple Report Bean, there were 468 garbage collections.

Here is some output from the garbage collection report:
...
[GC 15539K->14179K(20844K), 0.0034280 secs]
[GC 15587K->14221K(20844K), 0.0033610 secs]
[GC 15629K->14269K(20844K), 0.0033300 secs]
[GC 15677K->14311K(20844K), 0.0033950 secs]
[GC 15719K->14323K(20844K), 0.0033050 secs]
...
...
Time to garbage collect:
(* 470 0.0033050)
1.55 seconds.

The histogram data from the heap analysis tool shows a total size of:
1,516,712 bytes

Improvement
With a small set of changes to this object, we decreased the amount of memory allocated by more than half. We decreased the number of garbage collections from roughly 470 minor collections to around 40. And garbage collection took 0.027 seconds as opposed to 1.5 seconds.

public class BetterReportBean implements Serializable {
    public static final String a = "23423...";
    public static final String b = "AB...";
    public static final String c = "Very Important";
    public static final String d = "1023 Scary Hollow Ln";
    public static final String ca = "CA";
    public static final BigDecimal bd = new BigDecimal(0);
    public static final Date dd = new Date();

    private String someNo = a;
    private Date sigDate = dd;
    private final String userId = b;
    ...

Here is the full source for the typical bean object and a better one that makes more use of constants.
Simple Report Bean (basic POJO)
Better Report Bean

Additional Tools (jconsole and IBM's Diagnostic Tool for Java GC)

JConsole is provided with JDK 1.5 and later. The screenshot depicts one of the jconsole views. Simply type jconsole at the OS command-line prompt and you are given the option to connect to a running Java application by process id. I can see this tool being useful for debugging local or dev J2EE servers.



The Diagnostic Tool for Java Garbage Collector is provided by IBM for monitoring the WebSphere server. WebSphere writes custom GC statistics, typically to a file called native_stderr.log. The diagnostic tool can be used to analyze that file and generate charts. Even though the format is amazingly verbose, you may also be able to interpret the raw output. Here is example XML output.

<af type="tenured" id="111" timestamp="Dec 11 08:37:03 2008" intervalms="873.459">
<minimum requested_bytes="131088" />
<time exclusiveaccessms="0.237" />
<tenured freebytes="43965512" totalbytes="176996864" percent="24" >
<soa freebytes="43693384" totalbytes="175227392" percent="24" />
<loa freebytes="272128" totalbytes="1769472" percent="15" />
</tenured>
<gc type="global" id="111" totalid="111" intervalms="876.833">
<refs_cleared soft="0" threshold="32" weak="123" phantom="0" />
<finalization objectsqueued="31" />
<timesms mark="166.795" sweep="7.041" compact="0.000" total="174.089" />
<tenured freebytes="101955256" totalbytes="176996864" percent="57" >
<soa freebytes="98864376" totalbytes="173457408" percent="56" />
<loa freebytes="3090880" totalbytes="3539456" percent="87" />
</tenured>
</gc>
<tenured freebytes="101824168" totalbytes="176996864" percent="57" >
<soa freebytes="98733288" totalbytes="173457408" percent="56" />
<loa freebytes="3090880" totalbytes="3539456" percent="87" />
</tenured>
<time totalms="177.606" />
</af>



Resources

http://java.sun.com/docs/hotspot/gc1.4.2/example.html - Diagnosing a Garbage Collection problem
Most Comprehensive Java Perf Guide - http://www.javaperformancetuning.com/
http://oreilly.com/catalog/javapt/chapter/ch04.html - More good performance tips.
https://hat.dev.java.net/ - Heap analysis tool. This tool is included with JDK6.

Java technology, IBM style: Garbage collection policies, Part 1

http://jvmnotebook.googlecode.com/svn/trunk/misc/heap_analysis/ - Browse the SVN repository.

GC Parse Python Script, tool used to parse the GC minor collection output

http://jvmnotebook.googlecode.com/files/heap_analysis.tar.gz - Full source download.

Addendum (gnuplot and regex)

I always feel that charts and graphs are great visuals for displaying simple or complex concepts. I only know the surface-level details of gnuplot, but here are some notes on creating the images above. I used a simple python script to analyze the GC statements in the JVM verbosegc output and then created a tabulated data file that can be used with gnuplot.



I normally create a shell script to set all the gnuplot commands and launch the application:

#!/bin/sh
# Also see:
# http://www.duke.edu/~hpgavin/gnuplot.html
#
# Use 'eog' eye of gnome to view the images

OUTPUT_IMG_FILE=image_gc_analysis.png
INPUT_PLOT_FILE=image_gc_analysis.txt

# Output these commands to a temp file and then run gnu plot
echo "set terminal png
set output '${OUTPUT_IMG_FILE}'
set title 'GC results'
set size 1,1
set key left top
set autoscale
set xlabel 'gc_iter'
set ylabel 'gc_main_paus_time'
plot '${INPUT_PLOT_FILE}' using 1:6 title 'GC Results'
"
> gnuplot_tmp_cmd.tmp

# Run the gnu plot command
gnuplot gnuplot_tmp_cmd.tmp > /dev/null

# For future reference, add a command and title to compare results
# E.g.
# '/tmp/data2' using 10 with lines title 'Benchmark 2',


On Regex
In the gc_parse.py python application, I used a couple of regular expressions to find the GC patterns in the verbose gc JVM output.

def get_gc_stats(line):
    # Example Output java 1.5:
    # 20.049: [GC 20.049: [DefNew: 1941K->137K(1576K), 0.0014153 secs] 11194K->11117K(1984K), 0.0014628 secs]
    # Example Output java 1.6:
    # 20.049: [GC 28.200: [DefNew: 1921K->137K(1984K), 0.0006890 secs] 23030K->21247K(27320K), 0.0007550 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

    # Group1 = First Iter Value
    # Group2 = Second 'GC' Iter value
    regex_begin_iter = '^(\S*):\s*\[GC\s*(\S*):\s*'
    # Group 3 = Young Gen Start Occupancy
    # Group 5 = Young Gen End Occupancy
    regex_def_new = '\[DefNew:\s*(\S*)(K|M)\-\>(\S*)(K|M)\((.*)$'
    regex_full_str = '%s%s' % (regex_begin_iter, regex_def_new)
    pattr = re.compile(regex_full_str)
    # Group 1 = Main Start Occupancy
    # Group 2 = Main End Occupancy
    pattr_sub = re.compile('^(.*) (\S*)(K|M)\-\>(\S*)(K|M)\((.*)$')
    ...

------

Thursday, December 11, 2008

Working on a new programming language (octanelang)

I have had it. I am done with the current set of programming languages and am writing my own. Maybe it is an exercise in futility, but I am frustrated enough to give it a go. There is always something missing from the language I am using.

I don't have the full details yet, but here are some must haves:

1. Must be both a scripting language and a compilable/bundled language.
2. Must have a separate virtual machine runtime with a garbage collector.
3. Must have a command console.
4. Must be well documented and have a documentation generator.

Default External libraries and applications:

1. Application Web Server
2. Games
3. (optional) Text Editor, Development Environment
4. Network and web libraries

-----

Thursday, October 16, 2008

JVM Notebook New Release Set 2

New release including updates to projects for the following JVM languages: abcl, antlr, ASM, clojure, groovy, jruby, SISC, jython, scala (jvmnotebook_set2_20081016.zip)

http://code.google.com/p/jvmnotebook/downloads/list - jvmnotebook_set2_20081016.zip

----

Saturday, August 9, 2008

Application server performance testing, includes Django, ErlyWeb, Rails and others

The following is a set of toy benchmark results for various web application servers including ErlyWeb, Hunchentoot, Django, and others. The goal was to test out-of-the-box performance through a simple view page. In particular, I was trying to avoid static files by going through the controller and the view. In the Rails example, there is a simple controller that then activates the view template. The same approach is used in the ErlyWeb example. Ideally, we are trying to avoid the server loading static files, but in our case the server may be serving cached content. Overall, Erlang's ErlyWeb/Yaws had great performance, and out of the box Django did very well. Tomcat and my various low-level simple JSP/Servlet apps also had stable performance.

Software and Hardware Configuration used in the Benchmark

Two machines were used in the benchmark, a Win32 dual core machine and a single CPU Linux machine.
Win32 Box:

  • Microsoft Windows XP, Vers 2002, Service Pack 2
  • Intel Core 2 CPU
  • 6300 @ 1.86GHz, 3.49 GB of Ram
  • Tomcat: Apache Tomcat Version 5.5.26
  • Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
  • Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode)
  • Ruby: ruby 1.8.7 (2008-06-20 patchlevel 22) [i386-cygwin]
  • Ruby: Rails 2.1.0
  • JRuby: 1.1.13
  • Python: Python 2.5.1, Cygwin
  • Python: Django 0.96
  • Erlang: Eshell V5.6, R12B-3 Jun 9, 2008
  • Erlang: Yaws/Erlyweb: 1.77 ; erlyweb-0.7.2
  • Lisp: SBCL 1.0.13 Win32 (WITHOUT THREAD SUPPORT)
  • Lisp: Hunchentoot: 0.15.7

Linux Box:

  • Linux houston 2.6.24-19-generic #1 SMP Fri Jul 11 23:41:49 UTC 2008 i686 GNU/Linux
  • Ubuntu Hardy Heron 8.04
  • GNOME 2.22.2
  • 3.0 GB of Ram
  • AMD Athlon 64 Processor 3200+ (running at 2.0GHz)

  • Python: Python 2.5.2
  • Ruby: ruby 1.8.6 (2007-09-24 patchlevel 111) [i486-linux]
  • java version "1.6.0_06"
  • Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
  • Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode, sharing)
  • Apache Tomcat Version 5.5.23
  • Erlang: Eshell V5.6, R12B-3 Jun 9, 2008
  • Lisp: SBCL 1.0.14
  • Lisp: Hunchentoot: 0.15.7

Overview of the tests run

There were 10 tests run on Windows and 5 tests run on Linux. Most of the source and setups for each test are at the bottom of this entry. The tests only demonstrate out-of-the-box configurations; no optimizations or additional proxy servers were used. I literally downloaded the compiler and interpreter environments and had the servers set up 20 minutes later. (In the event you can't read the bar graphs, use the following order from left to right for those charts.)

  1. Erlyweb_erl-1 - Yaws and Erly Web Test, simple controller with Lorem Ipsum data.
  2. Django_py-2 - Simple view Lorem Ipsum, using built-in python Django server
  3. JSP-3 - Simple JSP page on Tomcat (Lorem Ipsum data)
  4. Hunch_Lisp-4 - Simple Hunchentoot template (Lorem Ipsum data). No threading support.
  5. Rails-5 - With WebBrick, simple controller/view with Lorem Ipsum data
  6. JRuby-6 - Simple Botlist JRuby Framework, Simple View.
  7. JRuby_Rails_Tom-7 - JRuby Rails running under Tomcat (similar to Rails-5)
  8. JRuby_Rails_Brick-8 - JRuby Rails running under Web Brick (similar to Rails-5)
  9. JRuby_Rails_Jetty-9 - JRuby Rails running under Jetty (similar to Rails-5)
  10. Simple_HTTP-10 - Custom "Do it yourself" web server in 150 lines of Java IO code. This test should give us the best results because of the low overhead. You wouldn't want to use this as your default application server but could be used for other messaging or middleware purposes.

Linux Tests:

  1. LIN_HTTP-1 - Simple HTTP server, run on linux
  2. LIN_PHP-2 - Apache/PHP page
  3. LIN_LISP-3 - Hunchentoot on Linux with threading support
  4. LIN_JRUBY-4 - Simple Botlist JRuby/JSP page
  5. LIN_BOTLIST-5 - Botlist web-application listing page (demonstrates more complete application). Botlist uses JRuby, Spring, Hibernate with MySQL.

Testing software used

Two pieces of software were used to test these servers, Apache's JMeter and Apache Bench. Apache Bench and JMeter were used on the Linux box.

  • JMeter 2.3.2 - Generates the summary report shown below. Most of these tests were run with 20 concurrent users, a slight delay of 20ms between each request, and 1000-2000 requests per user. Slower servers on the Win32 side (like Hunchentoot) could only handle 5 users and fewer requests.
  • Apache Bench
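As a rough back-of-the-envelope check on these settings (my own simplification, not part of the original test plan): in a closed-loop load test, N users that each wait D ms between requests against a server averaging A ms per response can generate at most N / (A + D) requests per second. A sketch:

```java
// Closed-loop throughput ceiling for a JMeter-style test plan.
// This is an illustrative model, not taken from the original post.
public class JMeterModel {

    // Upper bound on requests/sec: N users, each cycling through
    // (avg response time + think-time delay) per request.
    static double maxThroughput(int users, double avgMs, double delayMs) {
        return users / ((avgMs + delayMs) / 1000.0);
    }

    public static void main(String[] args) {
        // e.g. 20 users, 10 ms average response, 20 ms delay
        System.out.println(maxThroughput(20, 10, 20)); // ~666 req/sec ceiling
    }
}
```

Measured throughput will sit below this ceiling whenever the server queues requests under load.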

Table Summary of Results, Win32 and Linux

Win32 results:

Label               | # Samples | Average | Min | Max  | Std. Dev. | Error %   | Throughput | KB/sec  | Avg. Bytes
--------------------|-----------|---------|-----|------|-----------|-----------|------------|---------|-----------
Erlyweb_erl-1       | 24000     | 0       | 0   | 42   | 1         | 0         | 360.57     | 2236.28 | 6351
Django_py-2         | 30000     | 24      | 4   | 1123 | 84.59     | 6.67E-004 | 152.89     | 1121.16 | 7508.88
JSP-3               | 24000     | 0       | 0   | 28   | 1.03      | 0         | 259.35     | 1556.86 | 6147
Hunch_Lisp-4        | 6000      | 772     | 267 | 7234 | 136.62    | 0         | 13.02      | 104.39  | 8208
Rails-5             | 30000     | 170     | 7   | 1059 | 56.18     | 0         | 85.53      | 616.81  | 7385
JRuby-6             | 24000     | 10      | 3   | 84   | 10.04     | 0         | 217.33     | 1304.62 | 6147
JRuby_Rails_Tom-7   | 4500      | 935     | 2   | 9509 | 514.56    | 0         | 17.33      | 22.57   | 1334
JRuby_Rails_Brick-8 | 27000     | 458     | 3   | 2104 | 866.83    | 0         | 30.18      | 2       | 68
JRuby_Rails_Jetty-9 | 3600      | 175     | 1   | 1327 | 257.83    | 0         | 60.33      | 4.01    | 68
Simple_HTTP-10      | 40000     | 6       | 1   | 284  | 8.69      | 0         | 402.76     | 663.92  | 1688

Linux results:

Label               | # Samples | Average | Min | Max   | Std. Dev. | Error %   | Throughput | KB/sec  | Avg. Bytes
--------------------|-----------|---------|-----|-------|-----------|-----------|------------|---------|-----------
LIN_HTTP-1          | 30000     | 129     | 1   | 1010  | 92.63     | 0         | 94.31      | 155.44  | 1688
LIN_PHP-2           | 30000     | 3       | 1   | 136   | 4.93      | 0         | 215.21     | 486.96  | 2317
LIN_LISP-3          | 30000     | 115     | 5   | 21653 | 348.18    | 0         | 98.14      | 786.68  | 8208
LIN_JRUBY-4         | 30000     | 45      | 3   | 1811  | 74.31     | 0         | 152.67     | 916.45  | 6147
LIN_BOTLIST-5       | 30000     | 195     | 8   | 5203  | 156.65    | 0         | 71.08      | 4080.69 | 58789


  • # Samples - Total number of HTTP requests
  • Average - Average time to complete request in ms
  • Min - Min time to complete request in ms
  • Max - Max time to complete request in ms
  • Throughput - Number of requests per second
  • KB/sec - KB output per second
  • Avg. Bytes - Average bytes per request. There is some variance in the data that is output for each view page.
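The last three columns are related: KB/sec is just throughput times average bytes per request, divided by 1024. A quick sanity check (my own, with a hypothetical `TableCheck` helper), using the Rails-5 row:

```java
// Cross-check the JMeter summary columns:
// KB/sec = Throughput (req/sec) * Avg. Bytes / 1024.
// Helper name and structure are my own, for illustration only.
public class TableCheck {

    static double kbPerSec(double throughputReqPerSec, double avgBytes) {
        return throughputReqPerSec * avgBytes / 1024.0;
    }

    public static void main(String[] args) {
        // Rails-5 row: 85.53 req/sec at 7385 bytes/request
        System.out.println(kbPerSec(85.53, 7385)); // ~616.8 KB/sec, matching the table
    }
}
```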

Charts and Graphs

Note: Click on the chart thumbnail for the full-size image (thanks for ruining my images, blogger.com!)

Chart-1: Throughput on Win32 Box. From left to right: Erlyweb_erl-1 | Django_py-2 | JSP-3 | Hunch_Lisp-4 | Rails-5 | JRuby-6 | JRuby_Rails_Tom-7 | JRuby_Rails_Brick-8 | JRuby_Rails_Jetty-9 | Simple_HTTP-10


Chart-2: KB per second on Win32 Box. From left to right: Erlyweb_erl-1 | Django_py-2 | JSP-3 | Hunch_Lisp-4 | Rails-5 | JRuby-6 | JRuby_Rails_Tom-7 | JRuby_Rails_Brick-8 | JRuby_Rails_Jetty-9 | Simple_HTTP-10



Chart-3: Average time per request on Win32 Box. From left to right: Erlyweb_erl-1 | Django_py-2 | JSP-3 | Hunch_Lisp-4 | Rails-5 | JRuby-6 | JRuby_Rails_Tom-7 | JRuby_Rails_Brick-8 | JRuby_Rails_Jetty-9 | Simple_HTTP-10



Chart-9: Throughput (Req/Sec) on Linux. From left to right: LIN_HTTP-1 | LIN_PHP-2 | LIN_LISP-3 | LIN_JRUBY-4 | LIN_BOTLIST-5



Chart-4: Lisp throughput over time on Linux Box.



Chart-5: Apache/PHP time per request over time on Linux Box.



Chart-6: Lisp/Hunchentoot time per request over time on Linux Box.



Chart-7: Django Time per Requests Summary Win32



Chart-8: Rails/Webbrick Time per Requests Summary Win32

Apache Bench Output (Hunchentoot, PHP, and SimpleHTTP)

Here is console output from Apache Bench for some of the tests above. I wanted to use three different load testing suites, ApacheBench, JMeter, and Erlang's Tsung, to see if they returned the same results, but only AB and JMeter were used.

------------------------------------
Server Software: Hunchentoot
------------------------------------
Server Hostname: localhost
Server Port: 4242
Document Length: 8208 bytes
Concurrency Level: 20
Time taken for tests: 47.605169 seconds
Complete requests: 5000
Total transferred: 41890000 bytes
HTML transferred: 41040000 bytes
Requests per second: 105.03 [#/sec] (mean)
Time per request: 190.421 [ms] (mean)
Time per request: 9.521 [ms] (mean, across all concurrent requests)
Transfer rate: 859.32 [Kbytes/sec] received

--------------------------------------
Server Software: Apache/2.2.8 (PHP)
--------------------------------------
Server Hostname: localhost
Server Port: 80
Document Length: 2317 bytes
Concurrency Level: 20
Time taken for tests: 5.182021 seconds
Complete requests: 5000
Total transferred: 12747822 bytes
HTML transferred: 11601219 bytes
Requests per second: 964.87 [#/sec] (mean)
Time per request: 20.728 [ms] (mean)
Time per request: 1.036 [ms] (mean, across all concurrent requests)
Transfer rate: 2402.34 [Kbytes/sec] received

--------------------------------------
Server Software: Simple-HTTP-Server-In-Java
--------------------------------------
Server Hostname: localhost
Server Port: 9980
Document Length: 1688 bytes
Concurrency Level: 20
Time taken for tests: 19.169574 seconds
Complete requests: 5000
Total transferred: 8860000 bytes
HTML transferred: 8440000 bytes
Requests per second: 260.83 [#/sec] (mean)
Time per request: 76.678 [ms] (mean)
Time per request: 3.834 [ms] (mean, across all concurrent requests)
Transfer rate: 451.34 [Kbytes/sec] received
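The ab summary figures can be re-derived from the raw counts. As a sanity check (my own arithmetic, plugging in the Hunchentoot numbers above):

```java
// Re-derive Apache Bench's summary lines from its raw counts,
// using the Hunchentoot run above. The formulas are how ab
// computes its figures; the wrapper class is my own.
public class AbCheck {

    // Requests per second (mean)
    static double reqPerSec(int requests, double seconds) {
        return requests / seconds;
    }

    // Time per request in ms (mean), i.e. per concurrent user
    static double msPerRequest(double seconds, int concurrency, int requests) {
        return seconds * concurrency * 1000.0 / requests;
    }

    // Transfer rate in KB/sec received
    static double transferKbPerSec(long bytes, double seconds) {
        return bytes / 1024.0 / seconds;
    }

    public static void main(String[] args) {
        double t = 47.605169;                               // Time taken for tests
        System.out.println(reqPerSec(5000, t));             // ~105.03
        System.out.println(msPerRequest(t, 20, 5000));      // ~190.42
        System.out.println(transferKbPerSec(41890000L, t)); // ~859.32
    }
}
```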

Environment Setups and Source Code

Setting up the different environments involved out-of-the-box (download and run) configurations.

Yaws and ErlyWeb (Erlyweb_erl-1):

1. Download and install the most recent version of Erlang. For Windows, I went through
the one-click installer. For Ubuntu-Linux, I compiled the most recent source.
2. Download Yaws; I installed Yaws using the steps detailed in the README files:
tar -xzf yaws-1.76.tar.gz
./configure --prefix=/cygdrive/c/erl_stuff/yaws
make
make install
3. Download Erlyweb (as of Aug 10, 0.7.2) and follow the install instructions:
http://code.google.com/p/erlyweb/wiki/IntroductionTutorial

(Ease of Setup: 2.5/4.0, where 4.0 is the easiest to configure) - ErlyWeb is easy to set up, but I needed to make some code changes to get the crypto library to compile. Also, the MySQL driver did not work on Windows.

Here is the simple view code:
http://haskellnotebook.googlecode.com/svn/trunk/benchmarks/crud_blog/erlyweb/riki/src/components/

Python and Django (Django_py-2):

I assume that you have Python already installed; it is available on most Linux systems. For Win32, there is a one-click installer executable as well as the Cygwin version. For this test, I used the Cygwin version.

For the Django install, I used the step-by-step guide.
http://www.djangoproject.com/documentation/install/

To run the server, I used the built-in server:
python manage.py runserver 9980


The django view code:
http://haskellnotebook.googlecode.com/svn/trunk/benchmarks/crud_blog/django/ghostblog/templates/

Tomcat (JSP-3):

On Win32, I didn't perform a full installation. Download Tomcat and unzip the archive file, enter the bin directory and click on "startup.bat".

All web applications (archived war files) are deployed to the TOMCAT_HOME/webapps/ directory.


Rails Setup for running Webbrick (Rails-5):

1. Install Ruby
2. You can install Rails with Gems as described on the Rails homepage.
http://www.rubyonrails.org/down

3. gem install rails --http-proxy http://proxy.com:9999 (if you are behind a web proxy)
4. Run the rails command to generate the skeleton code:
rails blah

5. I created a generic controller:
ruby script/generate controller home index

6. Run the WebBrick server:
ruby script/server

Additional steps to run the JRuby applications:
To run the JRuby WebBrick application, use the JRuby commands and executables instead of the (C)Ruby commands.

To build a JRuby/Rails web-application, use the Warbler plugin:
(http://weblogs.java.net/blog/arungupta/archive/2008/04/rails_and_java_1.html)

./gem.bat install warbler --http-proxy http://localhost:9999
../jruby.bat -S warble

Copy the jblah.war file to the tomcat or jetty webapps directory


Hunchentoot on SBCL (Common Lisp) (Hunch_Lisp-4)

1. Install SBCL
2. Hunchentoot:
I created a setup guide for Hunchentoot; you can use these steps, or if you are on Linux you can just use ASDF-INSTALL:
Enterprise Common Lisp Setup With Hunchentoot

3. After Hunchentoot is installed, I used the following commands to compile and load my application:
* (asdf:operate 'asdf:compile-op 'ghost-trinity)
* (asdf:operate 'asdf:load-op 'ghost-trinity)
4. To run the server:

* (hunchentoot:start-server :port 4242)


Simple JRuby Interpreter Application through Botlist (JRuby-6):



Simple HTTP Server in Java (Simple_HTTP-10):

The multi-threaded Java HTTP server is 150 lines of code. It is very basic code that reads a web client's request and then returns a single response.

http://haskellnotebook.googlecode.com/svn/trunk/benchmarks/crud_blog/simple_server/


BufferedReader bufReader = new BufferedReader(new InputStreamReader(in));
while (running) {
    // read in a line from the client
    line = bufReader.readLine();
    if (line == null)
        break;
    //System.out.println("[server/" + line.length() + "]" + line);
    // an empty line marks the end of the request headers
    if (line.length() == 0)
        break;
}
// Write an HTML response back
StringBuffer buf = new StringBuffer();
buf.append("HTTP/1.1 200 Ok\r\n");
buf.append("Server: Apache-Test\r\n");
buf.append("Connection: close\r\n");
buf.append("Content-Type: text/html\r\n");
...

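For comparison, here is a minimal, self-contained sketch of the same idea. This is not the author's 150-line server (the full source is at the link above); the class name, body text, and helper methods are my own, and the port matches the 9980 used in the tests:

```java
import java.io.*;
import java.net.*;

// Minimal single-response HTTP server sketch (illustrative only, not
// the original 150-line implementation): accept a connection, read
// headers until the blank line, write one fixed HTML response, close.
public class SimpleHttpServer {

    static final String BODY = "<html><body>Hello</body></html>";

    // Build the one response this server ever sends.
    static String response() {
        return "HTTP/1.1 200 Ok\r\n"
             + "Server: Apache-Test\r\n"
             + "Connection: close\r\n"
             + "Content-Type: text/html\r\n"
             + "Content-Length: " + BODY.length() + "\r\n"
             + "\r\n" + BODY;
    }

    static void handle(Socket client) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(client.getInputStream()));
             OutputStream out = client.getOutputStream()) {
            String line;
            // consume the request line and headers up to the empty line
            while ((line = in.readLine()) != null && line.length() > 0) { }
            out.write(response().getBytes("US-ASCII"));
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9980)) {
            while (true) {
                Socket client = server.accept();
                // one thread per connection, as in the original test server
                new Thread(() -> {
                    try { handle(client); } catch (IOException ignored) { }
                }).start();
            }
        }
    }
}
```

A thread-per-connection design like this is fine for a toy benchmark; the low per-request overhead is why the Simple_HTTP numbers above beat the framework-based servers.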
Conclusion Bullet Points


  • Erlang, ErlyWeb and Yaws had the best performance. I performed other side tests at 100 and 300 concurrent connections and there was no lapse in performance. Others have tested Yaws with 50,000-80,000 connections and seen similar results.
  • Django had solid performance with the default server: 152.89 requests per second, with an average of 24 milliseconds per request.
  • Rails with WebBrick did OK. Normally this configuration is used for development only: 85.5 requests per second, and an average of 170 ms to complete a request.
  • Based on "my" tests, the JRuby/Rails setup had the lowest performance, especially with a J2EE server like Tomcat or Jetty. With the JRuby WebBrick server, each request took 458 ms (about half a second) at 30 requests per second.
  • I had trouble testing JRuby Rails with Tomcat and Jetty. Jetty gave the better performance of the two, but it was hard to run more than 5000 requests. I also had to reduce the number of concurrent users to four or five.
    For example, Tomcat would output the following errors after so many requests:

    Aug 9, 2008 11:02:03 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor processChildren
    SEVERE: Exception invoking periodic operation:
    java.lang.OutOfMemoryError: Java heap space

Full Source Download and Data Results

http://haskellnotebook.googlecode.com/files/benchmarks_webapps.tar.gz

Resources

http://yarivsblog.com/articles/2007/12/09/erlyweb-vs-ruby-on-rails-ec2-performance-showdown/

http://docs.codehaus.org/display/GRAILS/Grails+vs+Rails+Benchmark

----