Berlin Brown and Software Development: 2009

Tuesday, December 15, 2009

Most recent software projects and why

Here are my recent open source projects: (Most of these are proof of concepts, source included projects. Use subversion to check out any project).

botnodetoolkit - A complete Clojure/J2EE framework that makes it easy for developers to taking care of browser and server-side components. And other server applications.

haskellnotebook - The Haskell Notebook (another Haskell Cookbook) contains some common and not so common Haskell programming idioms that are contained in a collection of working/compilable examples. There is also a collection of source code from other functional programming languages including Lisp and Erlang. This project is similar to other cookbook projects that you may find on the web for various other programming languages.

iphonenotebook - Octanemech - simple opengl iphone demo. keywords: objectivec, opengl, iphone3, xcode

jvmnotebook - The Java Virtual Machine ( Sun's JVM is called HotSpot ) is a java bytecode interpreter which is fast, portable and secure. Jython, JRuby, Scala, ABCL (Common Lisp) are popular language implementations that run on the JVM that allow for the jvmcookbook sugar of their particular languages.

lightscalaplugin - A Light Eclipse Plugin for the Scala Programming Language

lighttexteditor - Light (also known as LTEC) is a text editor application for editing, viewing and searching through raw text documents.

websecuritynotebook - The web is one of the most pervasive technologies created. The hacker community knows this and will take advantage of web application vulnerabilities. This set of cookbook examples provide tips, tools and utilities to aid against cross-site scripting attacks, injection attacks and other resources.

openbotlist - Botlist is an example application, a proof of concept demo to demonstrate how to use a JVM Language (JRuby) in a J2EE environment. It could possibly work as a complete solution, but it is mostly an example implementation of a web application. The source is fully available and I encourage J2EE web application developers to browse the source.

octanemech - Octane Mech is a 3D, OpenGL based Mech game written in Haskell.

octaneos - The Octane OS Project started with the intent to design a lightweight Operating System. It has a small kernel for an x86 system. The host development environment is designed to work with the Bochs x86 emulator or VMWare. The base shell environment will be with the Factor programming language.

-----

The phantoms, for so they then seemed, were flitting on the other side of the deck, and, with a noiseless celerity, were casting loose the tackles and bands of the boat which swung there. This boat had always been deemed one of the spare boats, though technically called the captain's, on account of its hanging from the starboard quarter. The figure that now stood by its bows was tall and swart, with one white tooth evilly protruding from its steel-like lips. A rumpled Chinese jacket of black cotton funereally invested him, with wide black trowsers of the same dark stuff. But strangely crowning this ebonness was a glistening white plaited turban, the living hair braided and coiled round and round upon his head. Less swart in aspect, the companions of this figure were of that vivid, tiger-yellow complexion peculiar to some of the aboriginal natives of the Manillas;—a race notorious for a certain diabolism of subtilty, and by some honest white mariners supposed to be the paid spies and secret confidential agents on the water of the devil, their lord, whose counting-room they suppose to be elsewhere.

Thursday, November 12, 2009

Google Testability Blog

http://misko.hevery.com/

Great testability blog

Monday, October 5, 2009

Healthcare Reform in America is about public safety not about taking access

What is wrong with the debate?

One issue I have with the entire health care reform debate is not with any particular bill or any particular person that is for or against reform. I take issue with how they frame the debate. They never look at the crisis with our health care system as a safety issue. People will get sick and people will die prematurely if they don't have the proper insurance or access to health care. The alarms should go off at this startlingly revelation. People will die and could potentially spread disease to others if they are hesitant about seeking medical care. According to the Department of Medicine at Cambridge Health Alliance, 46 million people will die because of lack of insurance.

"The Harvard report, finding 45,000 excess deaths linked to uninsurance, made news partly because it was so much larger than past estimates. Why the big difference between the Urban Institute/IOM numbers and the latest report? Dr. Woolhandler explains that the Harvard researchers aimed to replicate what IOM had done back in 2002 but with more recent data" -- factcheck.org

Why should the federal government be concerned with our safety with these indirect dangers but don't jump at the oppurtunity to remedy some of the issues that we know are directly causing the death of many Americans right now. Why spend billions for wars in Iraq in Afghanistan? Terrorists might potentially attack us at home from the foreign land. Why go there when we know tens of thousands will die to heart disease and cancer and other ailments, some prematurely because they don't have adequate health care insurance. Or their health insurer will deny their claims. These problems that exist in the health care industry aren't a fuzzy satellite photo where a potential threat might exist. These are people in the US that have diseases and have to wait out for their cancer or diabetes or other chronic illness because the companies that can protect them are out to make a profit.

FactCheck.org Related Posts:

Dying from Lack of Insurance

"A new study from researchers with the Harvard Medical School found that 46,000 deaths a year can be attributed to the lack of health insurance."

http://www.factcheck.org/2009/09/dying-from-lack-of-insurance/

----

"From 2002 through June 30, 2009, six of the largest insurers operating in California rejected 47.7 million claims for care – 22 percent of all claims.""

http://www.factcheck.org/2009/09/denial-of-claims/

----

"A study determined that 46.2 percent of bankruptcies were attributable to a major medical reason. Debtors cited at least one of the following specific causes: illness or injury (28.3 percent of respondents), uncovered medical bills exceeding $1,000 in the past two years (27 percent)"

http://www.factcheck.org/2008/12/health-care-bill-bankruptcies/

Thursday, September 24, 2009

Modify javap to output to String, view decompiled information at runtime.

I asked the following question on Stackoverflow:

"I am working with an application webserver and have complicated classloading issues. I want to be able to download or print information that would normally get printed by javap (the methods, etc).
I may also need to get the raw binary class data, to perform a binary diff.
How would you do this?"

I was not satisfied with the responses from stack. Sure, I could have used reflection to print information on a particular class from the web, but I was looking for the output from javap.

Javap is essentially a decompiler that comes with Sun's set of Java tools, it is used to print the method names and other information for an associated class.

It is a useful tool, how do you run the tool on code that is deployed on a server, at runtime?

It seemed obvious that javap was written in Java, so I naturally assumed that the source was available with openjdk. Source for javap is available GPL licensed in the mercurial repository. There is no use of the java.lang.Reflection API, all data collected on the class is based on the binary content of the class.

http://hg.openjdk.java.net/jdk7/jaxp/langtools/

I used the source to add small modifications to run javap in a web environment, simply converting the data to a string.

I made two major changes to extract the class information dynamically. One: return the current thread, current classloader and then find the class resource. Two: instantiate a bytearrayoutputstream object and use the outputstream to write the javap information. Then use the bytearrayoutputstream to convert the data to a string object. After performing those steps, use the string data in any environment you need to.

   
    public void displayResults(InputStream classinMem) {        
        final InputStream classin = classinMem;
        try {
            // actual do display
            JavapPrinter printer = new JavapPrinter(classin, out, env);
            printer.print();

        } catch (IllegalArgumentException exc) {
            error(exc.getMessage());
        }     
    }
    public void entry(String[] argv) {        
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        this.out = new PrintWriter(baos);
        try {            
            this.perform();
        } finally {
            out.close();
        } // End of the try - Catch Finally //                
    }
    public void perform() {
        final InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream(classNameParm);                
        if (input == null) {
            throw new IllegalStateException("Invalid class resource, classname=" + classNameParm);
        }
        displayResults(input);
    }

Shown in the screenshot below, we get similar output as String content.

Download the Full Source
Download the source, tested with Java 1.5 compiler and runtime.

http://jvmnotebook.googlecode.com/svn/trunk/misc/javap/
http://jvmnotebook.googlecode.com/files/javap_to_string.zip

Disclaimer: I didn't have time to re-compile the code after some changes. So, you may have to make obvious edits to get all of the code to work. And the build.xml is not working.

------------------------

Thursday, September 17, 2009

Random Software Developer Thoughts - Entry Number One

I have these thoughts that pop into my head related to software. I always say I need to write these downs. Here is my first entry, writing down these thoughts. They are amazingly random, in a list format. Sometimes I may need to elaborate. I write software for a living, if you don't work in software, maybe this will help you get into the mind of a software developer.
Note: a lot of the comments are geared towards J2EE java development.

Entry Number One

Software development can be very boring sometimes. Think about it, you sit in a cubicle/office eight hours a day, staring at a computer screen. For major portions of your life.
Software development can be very interesting sometimes. You write software that thousands or millions or no one uses regularly. The user normally doesn't have a clue how the software gets written and there are all kinds of fixes, hacks, and easter eggs left in code. You as a developer may have minutes or hours that can effect a major release. A change that can cost millions of dollars. It kind of sucks that developers don't get good coffee.
No one understands the importance of quiet. A noisy office is an unproductive office.
I don't understand how git is so much better than svn. I guess Linus says so, so it must be true. Hasn't svn been used in production environments? Will git make a project better? Shrug
Mac hardware is pretty nice. Mac software isn't so bad either, the software just runs
Microsoft Windows software is functional. There are definitely times when Microsoft Windows XP needs to be slapped around. Why did it take 15 minutes for my machine to boot. It booted in 2 minutes a couple of months ago? Why did my software just lockup?
I haven't ever seen the blue screen of death on a WindowsXP, Win2k machine.
I rarely saw the blue screen of death on a Win98 machine.
Linux is also functional. Distros like Ubuntu Linux could work on polishing some of the user interface, making things easier. Sure Ubuntu Linux is a lot easier than Redhat 5.0 back in the mid 90s. But is usability as easy as on Mac? Software evolves. Ubuntu Linux will get better over time. Right now, it is not amazingly great. Think about it. Is Ubuntu Linux 9.04 better than Redhat5? Yes. Will Ubuntu Linux version 20 work better than Ubuntu 9.04? Yes. Is Apple's Snow Leopard a better user experience than Ubuntu Linux 9.04? Maybe. For me it is.
Eclipse IDE can suck
Eclipse IDE sucks
Why does Eclipse IDE take 1.5 gigs of ram to run, according to the Windows Task Manager? There is only one project.
Java development can suck.
Companies pay money for Java developers to write code
There are a lot of companies looking for Java developers
Can Java developers program?
What about Python? Can they program? Are a lot of companies paying python programmers?
I hate the idea of contracting. Maybe I am too lazy to convince people to personally give me money.
Contractors make a lot of money
Most contractors I have met aren't worth a percent of a percent what they are paid
Especially Java Contractors
Why are programmers so egotistical and angry (this is based on my observations on Internet forums). "How do I read a file in Java?" "You suck, go read the Java API" "Get out of here". Never heard that in ##java.
I wondering if you asked "Where are the political books in a bookstore"? Would the salesperson tell you to RTFM? Would Gandhi?
Lisp
Common Lisp and Scheme are interesting syntactically
Scala, Haskell, Clojure are interesting languages.
ObjectiveC is pretty fast
There are a lot of women programmers. Everywhere I have worked, the percentage has been 40%/60% female/male or 30%/70%.
Yet another myth
I am male, I don't see any differences in the way that womem work? Why would there be?
Developers at startups are normally young.
Developers at startups are young, dress like hippies and hover around major California cities.
Startup developers always seem distracted and unprofessional
Developers are major companies are older
Is web application programming, building web systems really programming?
Is making a simple configuration change, calling a Java API programming?
Why does lunch for software developers always include Chinese food?
Software developers are diverse, where are all the American born programmers?
J2EE Application Servers are Slow.
J2EE Application Servers are slow and eat up a lot of memory. Maybe that is why they are slow? Too many garbage collections. Garbage collects are slow.
Public Internet service sites like google, yahoo are pretty fast. Must be heavy used of Memcache type systems.
Web development can suck
Java Server Pages are horrible templating systems. They allow a developer to break many fundamental programming paradigms. Do you use jsp:include? How do you structure your document? What about all of these taglibs that do the same thing? Scriptlets. We should sue Sun.
There are a lot of Java frameworks out there? But not that many. There are only a dozen or so popular ones. Hibernate, Spring, JSF, Struts.
Game Development with OpenGL is hard
But not that difficult.
I never did learn Perl but I did OK with scripting in Python
Ruby Language
I wonder what the software development world is like in Japan
Are most software developer atheist?
Isn't programming a lot easier now than 40 years ago?
Bill Gates helped work on Microsoft Basic
Am I done yet?

Those are today's thoughts.

Sunday, September 13, 2009

Practical Unix/Linux GNU Find Command Examples

Find is a very popular program for searching for a particular set of files. If you combine the find command with other applications like 'grep' or any other command then you will be even more productive. Here are some practical examples in the listings below. The commands are run Mac OSX, Darwin Kernel Version 9.8.0.

Find Man Page (bsd unix)


AME
     find -- walk a file hierarchy

SYNOPSIS
     find [-H | -L | -P] [-EXdsx] [-f pathname] pathname ... expression
     find [-H | -L | -P] [-EXdsx] -f pathname [pathname ...] expression

DESCRIPTION
     The find utility recursively descends the directory tree for each pathname
     listed, evaluating an expression (composed of the ``primaries'' and
     ``operands'' listed below) in terms of each file in the tree.
...
...

Basic Find

Here the most basic usage for find.

After running the command, a listing of the files with the extension .m (objc code) are displayed.

Basic Find, other extensions

Here is an example of find with .m files and .h files.


227:octanemech berlinbrown$ find . \( -name '*.m' -or -name '*.h' \)
./Classes/EAGLView.h
./Classes/EAGLView.m

Find and executing commands on each file
The find command is not as powerful by itself. You can execute commands on each file that will be returned by the find command. Here is a useful example, find a set of files and perform grep on each file.


27:octanemech berlinbrown$ find . -name '*.m' -exec grep -H Berlin {} \;
./Classes/EAGLView.m://  Created by Berlin Brown on 9/2/09.
./Classes/octanemechAppDelegate.m://  Created by Berlin Brown on 9/2/09.
./main.m://  Created by Berlin Brown on 9/2/09.
227:octanemech berlinbrown$

The {} argument expands to a set of filenames.


Here is another common one-liner, remove all of the subversion .svn system directories.  This command will turn your directory back to the way it was before you used it for your subversion project.

find . -name '.svn' -exec rm -Rvf {} \;

Find, exec and wc (line/character count)
I am software developer and I am need to use the wc command to count the number of lines in a source file. Here is an example:


227:octanemech berlinbrown$ find . -name '*.m' -exec wc -l {} \;
     276 ./Classes/EAGLView.m
     542 ./Classes/octanemechAppDelegate.m
     323 ./Classes/Texture2D.m
      18 ./main.m


227:octanemech berlinbrown$ wc `find . -name '*.m'`
     276     826    8426 ./Classes/EAGLView.m
     542    1830   17171 ./Classes/octanemechAppDelegate.m
     323    1216   10513 ./Classes/Texture2D.m
      18      48     364 ./main.m
    1159    3920   36474 total
227:octanemech berlinbrown$ wc -l `find . -name '*.m'`
     276 ./Classes/EAGLView.m
     542 ./Classes/octanemechAppDelegate.m
     323 ./Classes/Texture2D.m
      18 ./main.m
    1159 total

There is one issue with running find this way, you may not get the totals, I want to use awk in one of the find/wc examples:


227:octanemech berlinbrown$ find . \( -name '*.h' -or -name '*.m' \) -exec wc -l {} \; | awk '{s+=$1} END {print s}'
1385

In natural language, after running the command above, it might read as; Find all of the files with the extension .h or .m, execute the wc command on the files, pipe the output to awk and sum the total.
Adding more filters
Here is the one-liner to find files that or at least 5k.


find . -name '*.m' -size +10k


27:octanemech berlinbrown$ find . -name '*.m' -size +1k
./Classes/EAGLView.m
./Classes/octanemechAppDelegate.m
./Classes/Texture2D.m

To find the files that have been modified within the last day:


find . -mtime -1
or 60 minutes...
find . -mmin -60

More Find Examples


Find then do a search and replace with 'sed' (replace with empty string):

find -name '*.java' -exec sed -i 's/SOME_KEYWORD//g' {} \;

Find files with *.h,*.m extensions and copy to a directory:

find -name '*.h' -or -name '*.m' -exec cp -vf {} ./tmp \;

Count number of total lines for a set of files using awk:

find . -name '*.d' -exec awk '{print NR}' {} +

$ wc -l `find . -name '*.m'` | sort
 ...
   370 ./Inputs/Gesture.m
   378 ./Models/k_wing.m
   422 ./TouchFighterAppDelegate.m
   444 ./GameObjects/Bullets.m
   473 ./Core/ParticleSystemObject.m
   572 ./Support/OpenGL/Texture2D.m
   837 ./GameObjects/SpaceShip.m
  1343 ./TouchFighterView.m
 12745 total

Thursday, September 3, 2009

iPhone Development on Win32 Redux (build error log)

I was trying to build the iPhone toolchain according to "PJ Cabrera's - Write native iPhone applications using Eclipse CDT" article. I made it pretty far with some minor hiccups. I know there is a lot of noise in this blog entry, but I wanted to post the error in case it comes up again. As of yet, I still haven't gotten the win32/iphone/opengl examples to work. Plus, I have the issue of a simulator. The simulator doesn't exist for win32.

Yea, your best bet is to just get a Mac and do iPhone development on it.

You can view the current source here:

http://haskellnotebook.googlecode.com/svn/trunk/iphone/gl/HelloWorldWin32/

Error 2 (OpenGL)

OK, I was able to build an application, I had some issues with OpenGL apps.


/usr/local/bin/arm-apple-darwin9-gcc -c -I/usr/local/lib/gcc/arm-apple-darwin9/4.2.1/include -isysroot /usr/local/iphone-sysroot  HelloWorld.m -o HelloWorld.o
In file included from HelloWorldApp.h:11,
                 from HelloWorld.m:11:
/usr/local/iphone-sysroot/usr/include/OpenGLES/EAGL.h:27: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'EAGLRenderingAPI'
/usr/local/iphone-sysroot/usr/include/OpenGLES/EAGL.h:59: error: expected ')' before 'EAGLRenderingAPI'
/usr/local/iphone-sysroot/usr/include/OpenGLES/EAGL.h:60: error: expected ')' before 'EAGLRenderingAPI'
/usr/local/iphone-sysroot/usr/include/OpenGLES/EAGL.h:65: error: expected specifier-qualifier-list before 'EAGLRenderingAPI'
make: *** [HelloWorld.o] Error 1

Error 1:

OK, to fix the one error there, I need to make sure that the symbolic links for some of the gcc libraries were correct.

Here is the script to correct this problem:

http://haskellnotebook.googlecode.com/svn/trunk/iphone/gl/HelloWorldWin32/ScriptTryFixLib_Links.sh

But, now I am stumped again, here is the trace of the Makefile output: Anyone have any ideas?


/usr/local/bin/arm-apple-darwin9-gcc -framework CoreFoundation -framework Foundation -framework UIKit -lobjc -bind_at_load -isysroot /usr/local/iphone-sysroot -o HelloWorld HelloWorld.o HelloWorldApp.o
/usr/local/bin/arm-apple-darwin9-ld: /usr/local/iphone-sysroot/usr/lib/libstdc++.6.dylib truncated or malformed object (mach header extends past the end of the file)
/usr/local/bin/arm-apple-darwin9-ld: warning multiple definitions of symbol _NSStreamDataWrittenToMemoryStreamKey
make: *** [HelloWorld] Error 1

Tuesday, August 11, 2009

Great site, political science and statistics.

http://www.drewconway.com/zia/

-----

"It must be borne in mind that all this time we have a Sperm Whale's prodigious head hanging to the Pequod's side. But we must let it continue hanging there a while till we can get a chance to attend to it. For the present other matters press, and the best we can do now for the head, is to pray heaven the tackles may hold."

-- Ron Paul 2012

Saturday, August 8, 2009

TeX article/book documents to PDF

Here is a collection of *.tex scripts and makefile for creating article documents. Just an example. Supports output to PDF, ps, dvi, bibliography example, tested with cygwin.

http://jvmnotebook.googlecode.com/svn/trunk/book/articles/example/

Win32 one instance setup for emacs

http://botnode.com/emacs.html - simple setup for running one instance of emacs on windows.

Emacs is a powerful text editor. Some might consider Emacs one of the most powerful editors in computing history. It is highly configurable, ported to many different systems and once you get familiar with how to use it, you can be very productive. Emacs is also useful for working with many different programming languages. Emacs is built on top of a Lisp dialect called Emacs Lisp. Other Lisp dialects benefit from the the built-in Lisp syntax highlighting. That is just one example of a non-mainstream language that has support in Emacs.

Emacs is especially popular in an Unix/Linux environment. Because Emacs is GNU and opensource, a free version is available for the Windows platform as well. One issue that new users to emacs may encounter is getting the editor to work like other Windows editors. This small document is how I configure emacs to only allow one instance to run on Windows. This function is not configured by default. This is not about the best setup for Emacs or a robust emacs configuration file, merely a setup for allowing you to have only one instance of emacs running while you open multiple files.

I downloaded the emacs windows binary, version emacs-23.1 and unzipped the archive file to my local drive. I didn't use an installer. I simply unzipped the file and set the emacs bin directory in the system PATH environment variable.

There are two executables that we are most concerned about , runemacs.exe and emacsclientw.exe. The emacsclient is the socket based client application that checks for a running emacsserver instance. If one is available then open the file in a running instance of emacs.

There are two other files that we are going to add to the bin directory. Add the files vb_emacs_start.vbs and run_emacs.bat to the bin directory. Note: the filename and path of these files really aren't important. I used the emacs bin directory for convience. The vb_emacs_start.vbs is the visual basic script for detecting if an instance of emacs is already running. If one is running than run emacsclientw. If one is not running then run the default runemacs.exe.

Cscript.exe is a tool that allows you to run a script file at the command line. You can run VBScript or JScript. The full script to check for emacs or launch a new instance of emacs is shown in the screenshot above.

For convience, I normally include a wrapper batch script to invoke cscript. Here is the content in run_emacs.bat:

@echo off
cscript "F:\downloads\emacs-23.1-bin-i386\emacs-23.1\bin\vb_emacs_start.vbs" %1 %2 %3 %4
exit

-------------------------

Download and review the scripts

run_emacs.bat.txt - simple emacs wrapper script
vb_emacs_start.vbs.txt - vbscript for checking/launching emacs
emacs_conf.emacs.txt - bare bones, emacs configuration. Notice the emacs server line at the bottom
-------------------------

-- Ron Paul 2012

Friday, August 7, 2009

Polyworld: Using Evolution to Design Artificial Intelligence

http://www.youtube.com/watch?v=_m97_kL4ox0

"Polyworld: Using Evolution to Design Artificial Intelligence"

Wednesday, August 5, 2009

MATCH.COM FRAUD ALERT!!! MATCH.COM FRAUD

So, I signed up for match.com. I knew even before the get go that they were going to be a fraudulent company. I knew it. And I had no intention to use the service at all. Curiosity got the better of me. So, I signed up and paid for the ONE month service. Notice ONE month service. Let me say it again. ONE month service. You would think that a ONE month service would only charge you or keep you active for ONE month. No apparently, I have been charged month after month with the ONE month fee. It was my fault, I used an account that I explicitly use for online transactions like this one and didn't notice the charges. Well, I did happen to notice match.com and there it was.

Ask yourself, why would I get charged several months, for a one month service. There is nothing explicit on the site at all about automatic renewals. It is so unfortunate that a company has to resort to these tactics.

Here are the prices for the service.


$34.99 -- 1 momth
$19.99 -- 3 months
$16.99 -- 6 months
$17.99 -- 6 months

So, if I sign up for the 1 month service, that is $419.88 a year if I get charged for the entire year. $36 for the most expensive 6 month service.

Saturday, August 1, 2009

Vacation, AI, and Haskell

Well, I have time off. Normally what that means is, I spend many late nights studying constantly. Some people go to the beach. I like to research.

This time I am going back to AI and Haskell. Maybe combining the two. I have a couple of themes in my AI research. I have followed the same theme since I was young. Biology, Physics and how it relates to AI. Essentially, you need to design AI from the ground up, create an environment, a virtual environment for this entity to exist and then let the intelligence flourish within that environment. Then make connections between the real environment and our environment.

Interesting AI is difficult. It is too bad that researchers mix the field with informatics and machine learning and all things that seem to belong to the more generic field of Computer Science. Genetic life is the only interesting field that seems like true AI, maybe neural network research.

So, it is my opinion, that to create the AI that we dream of. Not the CS AI you read about in research journals, we need create the AI lifeform and have it exist within a AI virtual environment. Then we interact.

Wednesday, July 29, 2009

JVM Notebook: Basic Clojure, Java and JVM Language performance

"Measure, don't guess. The primary goal of all performance tuning exercises should be maximize the end user experience given the resource constraints." [1]

"It isn't so much a "farewell to the J" as an expansion of the platform opportunities Java provides. Sun's investment to power ongoing development of JRuby and Jython broadens the range and reach of Java, as a whole." -- Rick Ross

Overview and JVM Languages

One of the exciting trends to recently emerge from the Java community is the concept of the JVM language. These technologies are all that you would expect them to be. They are implementations of languages that run on the Java Virtual Machine. Some are newly created and some are based on existing, more mature languages. JRuby, Jython are two JVM languages based on CRuby and CPython. Groovy, Scala, Clojure are three completely new JVM languages that were created to add new language features that weren't supported by the core Java language. Some must or can be compiled. Some run without compilation. You can easily compile Scala code to Java bytecode. Clojure also allows this feature (ahead of time compilation). Clojure and JRuby code can also run without having be explicitly compiled. You can interact with the language. In most cases and with most JVM languages, you have full access to existing libraries that were written in pure Java. And normally you can access existing JVM language code from Java (known as Java interoperability). In most cases, it is easier to access Java calls from the JVM language than it is to call the language code from Java. It really depends on the language. In the snippet below, there is a call to the System static method, 'nanoTime'. Simply invoke the system like you would from pure Java.


require 'java'

def helloWorldBaseline
  # Run the application
  start1 = java.lang.System.nanoTime()
  arr = Array.new
  (1..1000000).each {
    res = 2.2 * 4.1 * 5.4
  }
  end1 = java.lang.System.nanoTime()
  diff = (end1 - start1) * 1e-6  
  puts "Elapsed Time: #{diff} ms"
end

Dynamic Compilation - "Clojure is a compiled language, so one might wonder when you have to run the compiler. You don't. Anything you enter into the REPL or load using load-file is automatically compiled to JVM bytecode on the fly. Compiling ahead-of-time is also possible, but not required" -- Clojure Docs on being Dynamic.

On Performance

"If you want to measure something, then don’t measure other shit." -- Zed Shaw [3]

   For the more popular JVM languages, like JRuby and Jython, there isn't much of a difference between running code in their respective C implementations. JRuby is especially well known for being very portable. With JRuby release 1.3.1, JRuby is compatible with CRuby 1.8.6. Jython 2.5.0 was released last month and brings the level of compatibility to CPython versions 2.5. Django and other popular CPython based frameworks are able to work with Jython. You may be wondering, if the Java Virtual Machine language is compatible with the C language, native implementation, is there a loss in performance when running on the Java Virtual Machine? Is there a major loss in performance? That is this purpose of this document, how much time does it take for a particular piece of code to run in JVM language? How long does it take to run similar routines using pure Java code? I want to make it clear, you will not find a large scientific benchmark run under clean room like conditions. I want to present a simple set of routines and how long it took to run. How long did the Clojure code run? How long did the Scala code run? Basically, I want to present the code and how long each test ran, but I don't want to claim that anyone language or piece of code is faster or slower based on these tests. You could say that most of the pure Java code ran faster. Most of the time, that is what happened after running these tests. But there is too much confounding in my tests. Like Zed Shaw said, "If you want to measure something, then don't measure other shit." [3] There is a lot of stuff in my tests to not make these an official comparison. There is a lot of confounding. But, here is the code, here is how long it took to run? It be relevant in more common tests like a Project Euler problem. Project Euler is a website that contains math problems intended to be solved with computer programs. In Project Euler problem number one, I write a program in Clojure and then in Java. They both run on the JVM and the same value is returned. What was the execution time for each program? Simple tests, simple results.

   When working with JVM languages and possible performance bottlenecks, you want to consider execution time, but you also want to look at the garbage collector and heap memory usage. Garbage collection is an expensive operation. It won't take a minute to run a garbage collect, but it will take cpu cycles away from your application. JVM code runs in a protected environment, the garbage collector provides automatic memory management and normally protects you from improper memory use. And the garbage collector attempts to free up memory that is no longer needed. You can't normally control when the JVM runs garbage collection and certainly don't want force it. But if you monitor your application, you can identify memory leaks or other problems that might cause performance bottlenecks. It will normally be evident where there is a problem. If you see too many garbage collects within a very short period of time and your total available memory is maxed out, you might eventually encounter an out of memory error. In a long running server environment, most of your performance issues might be alleviated if you look at proper heap memory use. Is your code forcing too many garbage collections within a short period of time? Are you creating too many large objects and holding on to them for too long? In performance tuning your application, there are many things to consider. It may not just be improving a particular algorithm. Consider heap memory and object allocation as well. For most of the tests, there are performance stats, memory stats and other garbage collection statistics.

On Clojure

   Clojure is a Lisp dialect created in 2007 by Rich Hickey. It recently reached a 1.0 release and has a large following of Java programmers and Common Lisp developers. Since the Dec 2008 release, the Clojure library has received 18,000 downloads (based on Google Code Stats). Clojure is a dynamically (dynamic/strong) typed language, supports lazy sequences, immutable data structures, macros, and functions as first class objects [4]. Clojure is a functional language just like Common Lisp is a functional language. It is a Lisp dialect so that includes the fully parenthesized syntax. Most syntax will include a function or macro call with arguments or no arguments enclosed by a left and right parenthesis.

Syntax

Most newcomers to a Lisp dialect may get distracted by the parentheses, the symbolic expressions. They can seem daunting if you are more familiar to a language like C++ or Java. But, this actually is one of the major benefits of Lisp. The simple syntax, functional call, args very much resemble how the compiler or parser will interpret the code. Simple is good. Simple is fast, especially to the machine. It can also benefit the developer because you aren't overburdened with a bunch of syntax to memorize.

It also really helps to have a great editor like Emacs. Emacs is built with its own Lisp dialect, Emacs Lisp. So, Clojure syntax is not too foreign to Emacs. You will need to download the Clojure Emacs Mode and you want to add Slime integration.

Here is a snippet of Clojure code. Just focus on the left parenthesis and the token adjacent to the character. The token, function or macro call and the left parenthesis.

Lisp does not normally get more complicated than the parenthesis tokens and defining function bodies. A function/macro call and arguments. The arguments are normally separated by a variable number of spaces and may include calling another routine. That is the essence of functional programming. You have functions that return some value and can call other functions. You don't have to worry about Object creation syntax, for loop syntax, anonymous inner classes or things that you might encounter with Java. Here is some sample Java code. Look at all the tokens that part of the language.

Note: With this document, I tried not to position one language as better or worse than the other. Each technology that I mention has advantages and disadvantages for writing software. Each tool may give the developer productivity gains and some developers may never get used to changing to a new syntax, never truly realizing some of the intended benefits that the language has to offer. You will have to evaluate these languages (or not) on your own and make. I merely try to point out some of the similarities and some of the differences.

Lisp certainly suffers from that old adage, "easy to learn, may take a lifetime to master". It is easy to write Lisp code, it may take time to write readable, solid, idiomatic Lisp code. I am sure I will get many comments on how to write more idiomatic Clojure code even for these rudimentary examples. Luckily, Clojure has many of the functions that you will encounter with most other Lisp dialects. In Java or C++, you may be accustomed to the 'for loop' syntax. Here is the Clojure macro for building a list of elements.


;; Build a collection of random numbers
(for [_ (range 100) ] (.nextInt random))

For Clojure API Documentation: "List comprehension. Takes a vector of one or more binding-form/collection-expr pairs, each followed by zero or more modifiers, and yields a lazy sequence of evaluations of expr. Collections are iterated in a nested fashion, rightmost fastest, and nested coll-exprs can refer to bindings created in prior binding-forms."


;; Macro definition for 'for'
(defmacro for
  [seq-exprs body-expr]
  (assert-args for
     (vector? seq-exprs) "a vector for its binding"
     (even? (count seq-exprs)) "an even number of forms in binding vector")
  (let [to-groups (fn [seq-exprs]
                      (reduce (fn [groups [k v]]
                              (if (keyword? k)
                                (conj (pop groups) (conj (peek groups) [k v]))
                                (conj groups [k v])))
                              [] (partition 2 seq-exprs)))
        err (fn [& msg] (throw (IllegalArgumentException. (apply str msg))))
        emit-bind (fn emit-bind [[[bind expr & mod-pairs]
                                  & [[_ next-expr] :as next-groups]]]
                      (let [giter (gensym "iter__")
                            gxs (gensym "s__")
                            do-mod (fn do-mod [[[k v :as pair] & etc]]
                                       (cond
                                        (= k :let) `(let ~v ~(do-mod etc))
                                        (= k :while) `(when ~v ~(do-mod etc))
                                        (= k :when) `(if ~v
                                                       ~(do-mod etc)
                                                       (recur (rest ~gxs)))
                                        (keyword? k) (err "Invalid 'for' keyword " k)
                                        next-groups
                                        `(let [iterys# ~(emit-bind next-groups)
                                               fs# (seq (iterys# ~next-expr))]
                                           (if fs#
                                             (concat fs# (~giter (rest ~gxs)))
                                             (recur (rest ~gxs))))
                                     :else `(cons ~body-expr
                                                  (~giter (rest ~gxs)))))]
                        `(fn ~giter [~gxs]
                         (lazy-seq
                          (loop [~gxs ~gxs]
                            (when-first [~bind ~gxs]
                                        ~(do-mod mod-pairs)))))))]
    `(let [iter# ~(emit-bind (to-groups seq-exprs))]
       (iter# ~(second seq-exprs)))))
;; END OF MACRO (code from core.clj, clojure source);;

OK, that was a little daunting, but it is good to know the code for all of the Clojure source is freely available. This a yet another added benefit that you may not capitalize on, but all of the Clojure source is highly readable and lightweight. There are only 24k lines of Java code in the clojure.lang package. And only 7k lines of Clojure source in the core libray. I routinely make changes to the Clojure Java code to add trace logic. It took only 10 minutes to figure out where I needed to place my logic and then compile with my own Ant script. Imagine making similar changes to the Java API?

More on the Clojure, Java source for the performance tests

I used a simple format for the performance tests that I will present later in the document. I wrote or used existing code from a third-party and then ran the code. There may be many differences between the style of code. And the Clojure code may not be idiomatic. But it should be close. Here is the source for the Quick Sort, Clojure and Java:


;; Two versions of the Clojure Quick Sort:
(defn qsort 
  "Quick sort from rosetta code:
 http://rosettacode.org/wiki/Quicksort#Clojure"
  [[pvt & rs]]
  (if pvt
    `(~@(qsort (filter #(<  % pvt) rs))
      ~pvt 
      ~@(qsort (filter #(>= % pvt) rs)))))

(defn qsort-2 [[pivot & xs]]
  (when pivot
    (let [smaller #(< % pivot)]
      (lazy-cat (qsort-2 (filter smaller xs))
                [pivot]
                (qsort-2 (remove smaller xs))))))

(dotimes [x 4]
      (println "i: " (int (Math/pow 10.0 x)))
    (time (count (qsort (for [_ (range (int (Math/pow 10.0 x))) ] 
                          (.nextInt random))))))


// JAVA VERSION OF QUICK SORT
    public static final List quicksort(final List arr) {
        if (arr.size() <= 1) {
            return arr;
        }
        Integer pivot = (Integer) arr.get(0); //This pivot can change to get faster results
     
        List less = new LinkedList();
        List pivotList = new LinkedList();
        List more = new LinkedList();
     
        // Partition
        for (Iterator it = arr.iterator(); it.hasNext();) {
            Integer i = (Integer) it.next();
            if (i.compareTo(pivot) < 0) {
                less.add(i);
            } else if (i.compareTo(pivot) > 0) {
                more.add(i);
            } else {
                pivotList.add(i);
            } // End of the if - else //
            
        } // End of the for //
     
        // Recursively sort sublists
        less = quicksort(less);
        more = quicksort(more);
     
        // Concatenate results
        less.addAll(pivotList);
        less.addAll(more);
        return less;
    }
// END OF EXAMPLE

Performance of Test 11, Sorting

How did the various languages perform?


JavaTest   PerformanceTime(ms)
#####      ########### 
java11     1190.1847739999998              
jruby11    5068.168914            
clj11      11852.013362

JRuby and Scala Source, Scala Performance for the Simple Quick Sort

On Scala

Scala was created in 2001 and is currently at version 2.7.5 as of 6/3/2009. Scala is a stark contrast to Clojure. David Pollak had this to say about Scala's type system, "Put another way, Scala is statically typed with optional duck typing... the opposite of Objective-C et. al. with their duck typing with optional static typing." Clojure on the other hand is dynamic in the sense that you don't have to explicitly define a type whenever you need to use a particular object, but you can define the type specification using type hints.


;; Notice, second version uses the String type hint
(defn str-split-refl [s regex]
      (vec (. s split regex)))
(defn str-split-fast [#^String s regex]
      (vec (. s split regex)))

The above example presents two versions of a string split method, in Clojure. The second version use the String type hint to help the compiler determine the type for future calls in that function. On Scala's type system, "Scala is equipped with an expressive type system that enforces statically that abstractions are used in a safe and coherent manner" [6]

When would you use a JVM language? (Scala, Clojure, or just Java)

Why even investigate a JVM language? What is the point? A programming language is like any other tool or library used to create and interact with other software or hardware components? If you work in a web or J2EE environment, you might write SQL code, define CSS scripts, write Javascript code, write HTML/XHTML. It isn't uncommon for web application developers to write Java, sql, css, javascript or HTML code. Sometimes all within the same day, sometimes during different phases of a project. That doesn't take into account the number libraries that you must normally learn, understand and work with. So, NOT learning a new JVM programming language just on the premise that it is something different, isn't a valid reason for not using it. People have asked me, would you use Clojure, Java, or Scala? I have used all three for small GUI projects. I have used Scala for the the backend API where I don't need to make small changes. I used Clojure because of the dynamic nature of the language. I can make many, quick incremental changes without having any major impact on the Scala backend API. Take the Java api for example. Most of the core library is set in stone. The java.lang.String class hasn't changed much in over a decade. I see Scala being used for those type of rigid API requirements. This doesn't mean that Clojure couldn't be used for this purpose, it just means that is how I have used Scala and it just seemed to fit because of how easy it is to call Scala from Java (Java interoperability), also because of the nature of Scala's types. Here is just one example on how I used Clojure. The code snippet below contains valid Clojure code used to develop a small GUI application. If you just look at it without understanding the syntax, the code below almost looks like a general purposed configuration file. Here I can easily modify the layout of my GUI window, buttons without ever really getting any complex language details. I am just looking at the data required to change my layout.

A blogger from creativekarma.com provided a list of some of Scala's features:

    * Conventional class types,
    * Value class types,
    * Nonnullable types,
    * Monad types,
    * Trait types,
    * Singleton object types (procedural modules, utility classes, etc.),
    * Compound types,
    * Functional types,
    * Case classes,
    * Path-dependent types,
    * Anonymous types,
    * Self types,
    * Type aliases,
    * Generic types,
    * Covariant generic types,
    * Contravariant generic types,
    * Bounded generic types,
    * Abstract types,
    * Existential types,
    * Implicit types,
    * Augmented types,
    * View bounded types, and
    * Structural types which allow a form of duck typing when all else fails." [8]

// Here is an Example Scala Quick Sort:
 // http://en.literateprograms.org/Quicksort_(Scala)
 // From scala home
  def sort[A <% Ordered[A]](xs: List[A]):List[A] = {
    if (xs.isEmpty || xs.tail.isEmpty) xs
    else {      
      val pivot = xs( xs.length / 2)      
      // list partition
      // initialize boxes
      var lows: List[A] = Nil
      var mids: List[A] = Nil
      var highs: List[A] = Nil      
      for( val item <- xs) {
        // classify item        
        if ( item == pivot) mids = item :: mids
        else if (item < pivot) lows = item :: lows
        else highs = item :: highs
      }
      // return sorted list appending chunks      
      sort(lows) ::: mids ::: sort(highs) 
    }
  }
  // Running the sort and time
  def run(n:Int) = {
    val start1 = java.lang.System.nanoTime()
    //val l = repeat(n)(random.nextInt)
    val l = initialize(n)
    val l2 = sort(l)
    val d = l2.length
    val z = d + 1
    val end1 = java.lang.System.nanoTime()
    val diff = (end1 - start1) * 1e-6
    println("Elapsed Time: " + diff + " ms " + z)
  }
 // End of scala example

JRuby Quick Sort


# JRuby Sort Example:
class Array
  def quick_sort
    return self if length <= 1
    pivot = self[length / 2]
    return (find_all { |i| i <  pivot }).quick_sort +
           (find_all { |i| i == pivot }) +
           (find_all { |i| i >  pivot }).quick_sort
  end
end
def runTest1a(n)
  # Run the application
  start1 = java.lang.System.nanoTime()
  arr = Array.new
  (1..n).each {
    arr << rand(100000)
  }
  res = arr.quick_sort
  end1 = java.lang.System.nanoTime()
  diff = (end1 - start1) * 1e-6  
  puts "Elapsed Time: #{diff} ms"
end
# End of Example


Test     PerformanceTime(ms)
#####    ########### 
java11   1190.1847739999998              
scala11  1989.283518            
clj11    11852.013362

The line graph is a combination of Java, Scala, and Clojure execution times with an increasing number of random elements. Here are the results for the Clojure quick sort:


iterat   exectime(ms)
#####    #########
81       8.019583
243      16.82068
729      69.767482
2187     121.894408
6561     370.054832
19683    1096.553958
59049    3245.377605
177147   11852.013362

All Tests
I ran 14 different tests, ranging from a simple hello world application to the wider finder test. The figure below depicts some of the results.


CljTest PerformanceTime(ms)
##### ########### 
hello1  145.996411    Simple hello world test, more than 10k iterations               
build2  3474.550155   Build large and small set of random strings
str3 495.028198    Various string operations           
euler4  341.872834    Euler problem number 1
euler5  116.512002    Euler problem number 2
str6 213.379223    Additional string operations
math7 1212.796289   Misc Math operations, including sqrt and other calls            
coll8  16356.185166  Collection operation, simple word frequency test
frac9  354.23239     Mandelbrot fractal
sort11  438.477163    Sorting routines, mostly quick sort
vars12  9542.282043   Large and small object creation
wide13  907.491465    Wide finder project, 10k and smaller number of lines


JavaTest  PerformanceTime(ms)
##### ########### 
hello1  27.110056         Simple hello world test, more than 10k iterations           
build2  1940.5321219      Build large and small set of random strings    
string3 1673.399982       Various string operations          
euler4  23.101229         Euler problem number 1           
euler5 1.5945639999999   Euler problem number 2           
str6 436.737084        Additional string operations           
math7 131.1656449998    Misc Math operations, including sqrt and other calls           
coll8  972.775771        Collection operation, simple word frequency test      
frac9  247.63905499998   Mandelbrot fractal          
sort11  62.24679          Sorting routines, mostly quick sort              
wide13  94.820562         Wide finder project, 10k and smaller number of lines

Source for Results and Code

All of the source is available (common document license/BSD license) through the google-code browsable SVN URL. Download the results, language source code or gnuplot source. Unfortunately, this data is spread out haphazardly throughout the repository.

JVM Notebook - Test Results, Charts and Gnuplot scripts

JVM Notebook - Java, Clojure, JRuby, Scala Source

On Garbage Collection

"If the garbage collector has become a bottleneck, you may wish to customize the generation sizes. Check the verbose garbage collector output, and then explore the sensitivity of your individual performance metric to the garbage collector parameters." -- Tuning Garbage Collection [4]

The default arrangement of generations (for all collectors with the exception of the throughput collector) looks something like this.


Here is some output from the garbage colection report:
...
[GC 1277K->390K(5056K), 0.0006050 secs]
[GC 1286K->399K(5056K), 0.0005540 secs]
[GC 1294K->407K(5056K), 0.0005580 secs]
[GC 1303K->416K(5056K), 0.0009580 secs]
[GC 1311K->424K(5056K), 0.0006540 secs]
[GC 1320K->431K(5056K), 0.0007520 secs]
[GC 1327K->431K(5056K), 0.0012980 secs]
...
...

"With the line below, indicate the combined size of live objects before and after garbage collection, respectively. After minor collections the count includes objects that aren't necessarily alive but can't be reclaimed, either because they are directly alive, or because they are within or referenced from the tenured generation. The number in parenthesis"

(776768K)(in the first line)

"is the total available space, not counting the space in the permanent generation, which is the total heap minus one of the survivor spaces. The minor collection took about a quarter of a second." [5]

The usage of the entire heap was reduced to about 51%

196016K->133633K(261184K)

The output is generated with the 'verbosegc' and other JVM options:
(Note: The duplicated 'verbosegc' options gives more information about the garbage collecting)

java -verbosegc -verbosegc -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps CrashJVM -Xms24m -Xmx32m > gc_anal.txt

There are 1400 data points in the output file and in the image for the Clojure sort example. Each GC line in the gc_anal analysis file contains GC information about a minor garbage collection.

[GC [<collector>: <starting occupancy1> -> <ending occupancy1>, <pause time1> secs] 
    <starting occupancy3> -> <ending occupancy3>, <pause time3> secs]

"If the incremental collection of the tenured generation cannot be completed before tenured generation is full, a major collection occurs and is indicated in the verbosegc output by the 'MSC' in the output."


<starting occupancy1> is the occupancy of the young generation 
   before the collection
<ending occupancy1> is the occupancy of the young generation 
   after the collection
<starting occupancy3> is the occupancy of the entire heap 
   before the collection
<ending occupancy3> is the occupancy of the entire heap 
  after the collection

see http://java.sun.com/docs/hotspot/gc1.4.2/example.html

Our GC chart will only show 'used memory after the minor GC' and the total available memory after.

Here is the gnuplot representation of the used memory and total available after a minor garbage collection. There were 1400 data points in the clojure test. This is after running the quick sort code above.


Data Points used in graph:
used after gc | total after gc
...
...
26605 32576
26744 32576
27180 32576
27190 32576
27362 32576
27551 32576
line: 1421 -- 27684 32576
line: 1422 -- 27843 32576

The results for the pure Java version are presented below:

Scala results, 250+ minor garbage collections.

Large Objects

The quick sort example is not a good test to really push the garbage collector. Here is another test, one with Clojure and one with Java. I instantiate a large number of large objects and do the same for small objects. The garbage collector is better at handling smaller objects and consequently not as good at handling large objects (using the default GC rules). So, if you are looking at performance issues, you might look at how often large objects are being created and how long you are holding onto those objects. It is better to create many small objects and retain them for a short time than creating a few number large objects and retaining them for a long time. For example, I guess it is better to write short, static, utility methods and only create objects local to that method. "Large objects might be too big for Eden and will go directly to the old generation area; they will take longer to initialize (when setting fields to their default values such as null and zero); and they might cause fragmentation" [6]. A large object is one that the size of the allocation of the large object within a memory heap exceeds a maximum contiguous free space within the Java heap.

I attempted to create a large object that had many fields with lots of String data. Using my approximate sizeof utility method, it looks like it takes 800kb to create this object.


private Object obj5b = StringUtils.concat(StringUtils.randomString(random, (14 * kb)), 
  StringUtils.randomString(random, (14 * kb)));

With this code, there were 4700 minor garbage collections. It was interesting that total available memory never goes over 14mb. Here is the final console system output after running the program. The last line shows the used memory and total available JVM memory.


53.236: [GC 53.236: [DefNew: 542K->28K(576K), 0.0002510 secs] 2155K->1642K(4448K), 0.0002997 secs]
53.243: [GC 53.244: [DefNew: 540K->25K(576K), 0.0002716 secs] 2154K->1639K(4448K), 0.0003217 secs]
53.252: [GC 53.252: [DefNew: 537K->18K(576K), 0.0002698 secs] 2151K->1631K(4448K), 0.0003206 secs]
"Elapsed time: 9691.390453 msecs"
Done
Performing simple garbage collection cooldown
(used:2M/2M [4M,31M ])
(used:2M/2M [4M,31M ])

Format for the memory usage line (clojure source to create the line):

(defn *memory-usage* []
  (str "(used:" (*used-memory-m*) "M/" (*free-memory-m*) "M [" (*total-memory-m*) "M," (*max-memory-m*) "M ])"))

Shown below is the output for the Java test:


...
87.459: [GC 87.459: [DefNew: 441K->48K(640K), 0.0002922 secs] 5147K->4905K(8968K), 0.0003323 secs]
87.468: [GC 87.468: [DefNew: 624K->64K(640K), 0.0004192 secs] 5481K->5217K(8968K), 0.0004631 secs]
87.470: [GC 87.470: [DefNew: 213K->51K(640K), 0.0002372 secs] 5367K->5269K(8968K), 0.0002763 secs]
87.473: [GC 87.473: [DefNew: 627K->11K(640K), 0.0003707 secs] 5845K->5774K(8968K), 0.0004112 secs]
Elapsed time: 47982.607108 msecs
(used:5M/3M [8M,31M ])
(used:5M/3M [8M,31M ])
(used:5M/3M [8M,31M ])
Done

So far, we have only shown information on minor garbage collections. Here are the results of the major collects. The major collection are delineated by Tenured.


3.657: [GC 3.657: [DefNew: 568K->63K(576K), 0.0003546 secs]3.658: [Tenured: 7464K->506K(7508K), 0.0117550 secs] 7934K->506K(8084K), 0.0122011 secs]

You can see, major garbage collects seem to take more time and there are fewer of them in this result set.


...
1167 4000
1235 4000
1193 4000
line: 156 -- 1260 4000

And the Java results. There were 440+ GC collects.

Additional Tools

There is no shortage of tools for monitoring JVM performance. Mature, open and proprietary monitoring applications are available. Most of them focus garbage collection, heap memory, cpu monitoring, and method trace calls. The netbeans monitor, Sun's jconsole, jrat, and the Eclipse memory analysis tool are a few that I have used recently.

Looking at the top objects with the test examples

The chart report output from Eclipse's Memory Analysis Tool is depicted below. The overview contains the number of objects, number classes and classloader. The MAT tool also looks at potential memory leaks and the objects that are causing the problem, "biggest consumers" and a heap object histogram. To generate the data, we simply generate a heap dump and open the heap dump file.

Even with the 'hello world' test, you can see that many objects were created. And no, the Clojure hello world application that I provided is not the typical hello world app. I actually do some basic multiplication calculations over 1,000,000 iterations. Hello world in this case is a misnomer.

Here is comparable Java output. According to the Eclipse mat tool, almost 9000 objects were created.

Scala and JRuby Class Histogram Results:

Running JRat, Netbeans Profilers

Running JRat or Netbeans profilers is as simple as running with the appropriate JVM args allowing your application to run and then opening the files output after the program has exited.

I downloaded the shiftone-jrat.jar jar file, placed it my current working directory. And then added these args:

The screenshot above contains the jrat output after running the Clojure mandelbrot shootout application. I didn't want to analyze the results but there are a couple of stats worth taking a quick look at. The clojure.lang.Numbers.lt method had 87 million exits and a total method time of 142 milliseconds. The Numbers.ops method was called 367 million times.

The netbeans profiler contains similar profiling statistics as the jrat tool but the Netbeans profiler contains live, realtime results and a host of other metrics.

Additional Tests from Third Party (from AndyF):

Andy, a developer provided great Clojure performance information. I ran some of his
Clojure source. Here are the runtime, garbage collection and memory results.
Source
JVM Notebook SVN source (Andy Benchmark) - Java
JVM
Notebook SVN source (Andy Benchmark) - Clojure

Original Source

Updated Benchmark Results (the rcomp test shows 1:15 performance speed for Clojure)

I ran two types of shootout like tests, the threaded, non-threaded Mandelbrot test in Java and Clojure. Performance for these were similar to the all of the tests that we have run up to this point. The mandelbrot code seemed to perform slower than the 1:10 (java:clojure) speed ratio that we encountered. For example, the Java threaded code ran at '1700.062' ms. The Clojure version ran at 50998.30 ms.


MandelSiz| exectime.java(ms) exectime.clojure(ms)
#####    #########           #####
100      11.515956           90.97791
200      22.154556           263.893038
300      42.036901           610.872097
400      70.4424729          982.452582
500      105.705857          1495.433005
600      151.296857          2091.412938
700      201.6435589         2843.495665
800      260.195961          3650.13744
1000     407.073079          5680.39932
1200     577.353532          8208.967304
1400     785.8052849         11126.994209
2000     1593.239278         22588.712673
2500     2487.326002         35210.526506
3000     3576.071694         50998.301209

The chart above contain the garbage collection results after running the Clojure mandelbrot shootout test.

The following test results are taken verbatim from the clojure-benchmark result set:


Additional Results straight from the shootout source (by AndyF):
See:
http://github.com/jafingerhut/clojure-benchmarks/
------------------------------------------------
Times are real / user / sys on iMac
------------------------------------------------
        |  sbcl  |  perl  |   ghc  |  java  |   clj
------------------------------------------------
           noT      noT            |  noT
long test on iMac:
rcomp   |   8.3  |  11.9  |        |   8.5  | no implementation yet
        |   4.8  |   7.9  |        |   3.6  |
        |   2.1  |   2.3  |        |   1.4  |

long test on iMac:
mand-   | wrong  | out of |  32.7  |  28.6  | 340.4
elbrot  | output | mem    |  59.3  |  54.4  | 350.5
        |        | (?)    |   0.8  |   0.4  |   4.7

           noT      noT            |    T
long test on iMac:
k-nuc-  | 190.9  | 306.0  |  90.5  |  52.4  | 1677.6 (27m 57.6s)
leotide | 187.9  | 302.7  | 130.8  |  89.6  | 2245.1 (37m 25.1s)
        |   2.4  |   1.9  |   4.6  |   1.8  |   24.2 (    24.2s)

Result: Replacing dna-char-to-code-val with a macro did
not speed things up, and may have slowed them down:
                                            | 1800.0 (30m  0.0s)
                                            | 2317.0 (38m 37.0s)
                                            |   30.6 (    30.6s)
------------------------------------------------

k-nucleotide long test on benchmark shootout machine:
        | 164.9  | 249.8  |  52.0  |  20.6  |
        | 164.9  | 246.7  | 112.5  |  58.8  |
        |     ?  |     ?  |     ?  |     ?  |
------------------------------------------------
k-nucleotide medium test on iMac:
        |   8.6  |  12.7  |   3.9  |   3.9  |   64.2 / 69.6 / 69.1 / 67.1
        |   7.9  |  12.5  |   5.4  |   5.7  |   98.4 / 92.9 / 93.1 / 88.1
        |   0.6  |   0.1  |   0.3  |   0.2  |    1.5 /  1.6 /  1.6 /  1.7

k-nucleotide medium test, all clj, modified-pmap with specified number
of parallel threads, on iMac:
        |   1    |   2    |   3    |   4    |   5    |   6    |
        |  74.9  |  70.7  |  77.2  |  76.8  |  82.5  |  77.8  |
        | 125.9  | 122.1  | 134.6  | 134.0  | 143.4  | 134.0  |
        |   2.5  |   2.8  |   2.8  |   2.3  |   3.1  |   2.3  |
------------------------------------------------
Hardware and software used
iMac with Intel Core 2 Duo, 2.16 GHz, 2 GB RAM, Mac OS X 10.5.7
% java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)
% javac -version
javac 1.6.0_13

user> (clojure-version)
"1.1.0-alpha-SNAPSHOT"

End of Andy's Configuration:
------------------------------------------------

The following benchmark information were provided
by Andy from his website.   Visit:

http://github.com/jafingerhut/clojure-benchmarks/

Some additional google clojure threads:
http://groups.google.com/group/clojure/browse_thread/thread/e4299911f4ae8167

Testing Environment

For this document, all of the tests were run with Java 5. (Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)). Some tests were run with both the -client and -server options. The majority of tests were run with the -client option. Headius describes the -server option, "The -server option turns on the optimizing JIT along with a few other "server-class" settings.". Updates to the this document will include tests with Java 6.

Operating System and Hardware Environment:

Java: Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
Windows: Windows XP Professional Service Pack 2
Hardware: Intel Core 2 CPU 6300 - 1.86ghZ, 3.49 GB of RAM

Virtual Machine Languages:

The focus of these tests are mostly to look at Clojure and how it compares to pure Java code. Clojure is included in all of the tests. There are only two tests for Scala.

Clojure: 1.0.0/ 2009-05-04
JRuby: 1.3.1 / 2009-06-15; 44 days ago
Scala: 2.7.5/ 2009-06-03; 50 days ago

Resources
[1] - measure quote - http://www.javaperformancetuning.com/news/newtips087.shtml - quote from the java performance tips. I have not found the original source of this quote.
[2] - Rick Ross quote - Farewell to the J in JVM
[3] - Zed Shaw on Statistics - http://zedshaw.com/essays/programmer_stats.html
[4] - Clojure Home - http://clojure.org/
[5] - Tuning Garbage Collection - http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
[6] - On large vs small - http://chaoticjava.com/posts/gc-tips-and-memory-leaks/
[7] - Scala's, static typing - http://en.wikipedia.org/wiki/Scala_(programming_language)#Static_typing
[8] - http://creativekarma.com/ee.php/weblog/comments/static_typing_and_scala/
--------
Additional Links

http://clj-me.blogspot.com/2008/06/widefinder-2-in-clojure-naive-port-from.html
http://onjava.com/pub/a/onjava/2001/05/30/optimization.html
http://onjava.com/onjava/2001/05/30/examples/Test.java
http://www.javaperformancetuning.com/news/newtips087.shtml
http://blog.headius.com/2009/01/my-favorite-hotspot-jvm-flags.html

http://jrat.sourceforge.net/
http://www.netbeans.org/features/java/profiler.html
--------
Download or Browse All Source:
JVM Notebook - Test Results, Charts and Gnuplot scripts
JVM Notebook - Java, Clojure, JRuby, Scala Source

*Download All Source as Archived File*

Partial PDF version of this document
Tex source for this document
--------
Addendum (gnuplot and regex)

I always feel that charts and graph are great visuals for displaying simple or complex concepts. I only know the surface level details of gnuplot, but here are some notes on creating the images above. I used a simple python script to analyze the GC statements generated by the JVM verbosegc output and then created a tabulated data file that can used with gnuplot.

I typically will create a shell script to set all the gnuplot commands and then launch the application:


#!/bin/sh
# Also see:
# http://www.duke.edu/~hpgavin/gnuplot.html
#
# Use 'eog' eye of gnome to view the images

OUTPUT_IMG_FILE=image_gc_analysis.png
INPUT_PLOT_FILE=image_gc_analysis.txt

# Output these commands to a temp file and then run gnu plot 
echo "set terminal png
set output '${OUTPUT_IMG_FILE}'
set title 'GC results'
set size 1,1
set key left top
set autoscale
set xlabel 'gc_iter'
set ylabel 'gc_main_paus_time'
plot '${INPUT_PLOT_FILE}' using 1:6 title 'GC Results'
" > gnuplot_tmp_cmd.tmp 

# Run the gnu plot command
gnuplot gnuplot_tmp_cmd.tmp > /dev/null

# For future reference, add a command and title to compare results 
# E.g.
# '/tmp/data2' using 10 with lines title 'Benchmark 2',

On Regex
In the gc_parse.py python application, I used a couple of lines of regex to find the GC patterns generated by the verbose gc JVM output.


def get_gc_stats(line):        
        # Example Output java 1.5:
        # 20.049: [GC 20.049: [DefNew: 1941K->137K(1576K), 0.0014153 secs] 11194K->11117K(1984K), 0.0014628 secs]
        # Example Output java 1.6:
        # 20.049: [GC 28.200: [DefNew: 1921K->137K(1984K), 0.0006890 secs] 23030K->21247K(27320K), 0.0007550 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
        
        # Group1 = First Iter Value
        # Group2 = Second 'GC' Iter value
        regex_begin_iter = '^(\S*):\s*\[GC\s*(\S*):\s*'
        # Group 3 = Young Gen Start Occupancy
        # Group 5 = Young Gen End Occupancy
        regex_def_new    = '\[DefNew:\s*(\S*)(K|M)\-\>(\S*)(K|M)\((.*)$'
        regex_full_str   = '%s%s' % (regex_begin_iter, regex_def_new)
        pattr = re.compile(regex_full_str)
        # Group 1 = Main Start Occupancy
        # Group 2 = Main End Occupancy
        pattr_sub = re.compile('^(.*) (\S*)(K|M)\-\>(\S*)(K|M)\((.*)$')
        ...
        ...

------
Header Image from:
http://upload.wikimedia.org/wikipedia/commons/5/53/BabbageDifferenceEngine.jpg
------