Berlin Brown and Software Development: July 2009

Wednesday, July 29, 2009

JVM Notebook: Basic Clojure, Java and JVM Language performance

"Measure, don't guess. The primary goal of all performance tuning exercises should be maximize the end user experience given the resource constraints." [1]

"It isn't so much a "farewell to the J" as an expansion of the platform opportunities Java provides. Sun's investment to power ongoing development of JRuby and Jython broadens the range and reach of Java, as a whole." -- Rick Ross

Overview and JVM Languages

One of the exciting trends to recently emerge from the Java community is the concept of the JVM language. These technologies are all that you would expect them to be. They are implementations of languages that run on the Java Virtual Machine. Some are newly created and some are based on existing, more mature languages. JRuby, Jython are two JVM languages based on CRuby and CPython. Groovy, Scala, Clojure are three completely new JVM languages that were created to add new language features that weren't supported by the core Java language. Some must or can be compiled. Some run without compilation. You can easily compile Scala code to Java bytecode. Clojure also allows this feature (ahead of time compilation). Clojure and JRuby code can also run without having be explicitly compiled. You can interact with the language. In most cases and with most JVM languages, you have full access to existing libraries that were written in pure Java. And normally you can access existing JVM language code from Java (known as Java interoperability). In most cases, it is easier to access Java calls from the JVM language than it is to call the language code from Java. It really depends on the language. In the snippet below, there is a call to the System static method, 'nanoTime'. Simply invoke the system like you would from pure Java.


require 'java'

def helloWorldBaseline
  # Run the application
  start1 = java.lang.System.nanoTime()
  arr = Array.new
  (1..1000000).each {
    res = 2.2 * 4.1 * 5.4
  }
  end1 = java.lang.System.nanoTime()
  diff = (end1 - start1) * 1e-6  
  puts "Elapsed Time: #{diff} ms"
end

Dynamic Compilation - "Clojure is a compiled language, so one might wonder when you have to run the compiler. You don't. Anything you enter into the REPL or load using load-file is automatically compiled to JVM bytecode on the fly. Compiling ahead-of-time is also possible, but not required" -- Clojure Docs on being Dynamic.

On Performance

"If you want to measure something, then don’t measure other shit." -- Zed Shaw [3]

   For the more popular JVM languages, like JRuby and Jython, there isn't much of a difference between running code in their respective C implementations. JRuby is especially well known for being very portable. With JRuby release 1.3.1, JRuby is compatible with CRuby 1.8.6. Jython 2.5.0 was released last month and brings the level of compatibility to CPython versions 2.5. Django and other popular CPython based frameworks are able to work with Jython. You may be wondering, if the Java Virtual Machine language is compatible with the C language, native implementation, is there a loss in performance when running on the Java Virtual Machine? Is there a major loss in performance? That is this purpose of this document, how much time does it take for a particular piece of code to run in JVM language? How long does it take to run similar routines using pure Java code? I want to make it clear, you will not find a large scientific benchmark run under clean room like conditions. I want to present a simple set of routines and how long it took to run. How long did the Clojure code run? How long did the Scala code run? Basically, I want to present the code and how long each test ran, but I don't want to claim that anyone language or piece of code is faster or slower based on these tests. You could say that most of the pure Java code ran faster. Most of the time, that is what happened after running these tests. But there is too much confounding in my tests. Like Zed Shaw said, "If you want to measure something, then don't measure other shit." [3] There is a lot of stuff in my tests to not make these an official comparison. There is a lot of confounding. But, here is the code, here is how long it took to run? It be relevant in more common tests like a Project Euler problem. Project Euler is a website that contains math problems intended to be solved with computer programs. In Project Euler problem number one, I write a program in Clojure and then in Java. They both run on the JVM and the same value is returned. What was the execution time for each program? Simple tests, simple results.

   When working with JVM languages and possible performance bottlenecks, you want to consider execution time, but you also want to look at the garbage collector and heap memory usage. Garbage collection is an expensive operation. It won't take a minute to run a garbage collect, but it will take cpu cycles away from your application. JVM code runs in a protected environment, the garbage collector provides automatic memory management and normally protects you from improper memory use. And the garbage collector attempts to free up memory that is no longer needed. You can't normally control when the JVM runs garbage collection and certainly don't want force it. But if you monitor your application, you can identify memory leaks or other problems that might cause performance bottlenecks. It will normally be evident where there is a problem. If you see too many garbage collects within a very short period of time and your total available memory is maxed out, you might eventually encounter an out of memory error. In a long running server environment, most of your performance issues might be alleviated if you look at proper heap memory use. Is your code forcing too many garbage collections within a short period of time? Are you creating too many large objects and holding on to them for too long? In performance tuning your application, there are many things to consider. It may not just be improving a particular algorithm. Consider heap memory and object allocation as well. For most of the tests, there are performance stats, memory stats and other garbage collection statistics.

On Clojure

   Clojure is a Lisp dialect created in 2007 by Rich Hickey. It recently reached a 1.0 release and has a large following of Java programmers and Common Lisp developers. Since the Dec 2008 release, the Clojure library has received 18,000 downloads (based on Google Code Stats). Clojure is a dynamically (dynamic/strong) typed language, supports lazy sequences, immutable data structures, macros, and functions as first class objects [4]. Clojure is a functional language just like Common Lisp is a functional language. It is a Lisp dialect so that includes the fully parenthesized syntax. Most syntax will include a function or macro call with arguments or no arguments enclosed by a left and right parenthesis.

Syntax

Most newcomers to a Lisp dialect may get distracted by the parentheses, the symbolic expressions. They can seem daunting if you are more familiar to a language like C++ or Java. But, this actually is one of the major benefits of Lisp. The simple syntax, functional call, args very much resemble how the compiler or parser will interpret the code. Simple is good. Simple is fast, especially to the machine. It can also benefit the developer because you aren't overburdened with a bunch of syntax to memorize.

It also really helps to have a great editor like Emacs. Emacs is built with its own Lisp dialect, Emacs Lisp. So, Clojure syntax is not too foreign to Emacs. You will need to download the Clojure Emacs Mode and you want to add Slime integration.

Here is a snippet of Clojure code. Just focus on the left parenthesis and the token adjacent to the character. The token, function or macro call and the left parenthesis.

Lisp does not normally get more complicated than the parenthesis tokens and defining function bodies. A function/macro call and arguments. The arguments are normally separated by a variable number of spaces and may include calling another routine. That is the essence of functional programming. You have functions that return some value and can call other functions. You don't have to worry about Object creation syntax, for loop syntax, anonymous inner classes or things that you might encounter with Java. Here is some sample Java code. Look at all the tokens that part of the language.

Note: With this document, I tried not to position one language as better or worse than the other. Each technology that I mention has advantages and disadvantages for writing software. Each tool may give the developer productivity gains and some developers may never get used to changing to a new syntax, never truly realizing some of the intended benefits that the language has to offer. You will have to evaluate these languages (or not) on your own and make. I merely try to point out some of the similarities and some of the differences.

Lisp certainly suffers from that old adage, "easy to learn, may take a lifetime to master". It is easy to write Lisp code, it may take time to write readable, solid, idiomatic Lisp code. I am sure I will get many comments on how to write more idiomatic Clojure code even for these rudimentary examples. Luckily, Clojure has many of the functions that you will encounter with most other Lisp dialects. In Java or C++, you may be accustomed to the 'for loop' syntax. Here is the Clojure macro for building a list of elements.


;; Build a collection of random numbers
(for [_ (range 100) ] (.nextInt random))

For Clojure API Documentation: "List comprehension. Takes a vector of one or more binding-form/collection-expr pairs, each followed by zero or more modifiers, and yields a lazy sequence of evaluations of expr. Collections are iterated in a nested fashion, rightmost fastest, and nested coll-exprs can refer to bindings created in prior binding-forms."


;; Macro definition for 'for'
(defmacro for
  [seq-exprs body-expr]
  (assert-args for
     (vector? seq-exprs) "a vector for its binding"
     (even? (count seq-exprs)) "an even number of forms in binding vector")
  (let [to-groups (fn [seq-exprs]
                      (reduce (fn [groups [k v]]
                              (if (keyword? k)
                                (conj (pop groups) (conj (peek groups) [k v]))
                                (conj groups [k v])))
                              [] (partition 2 seq-exprs)))
        err (fn [& msg] (throw (IllegalArgumentException. (apply str msg))))
        emit-bind (fn emit-bind [[[bind expr & mod-pairs]
                                  & [[_ next-expr] :as next-groups]]]
                      (let [giter (gensym "iter__")
                            gxs (gensym "s__")
                            do-mod (fn do-mod [[[k v :as pair] & etc]]
                                       (cond
                                        (= k :let) `(let ~v ~(do-mod etc))
                                        (= k :while) `(when ~v ~(do-mod etc))
                                        (= k :when) `(if ~v
                                                       ~(do-mod etc)
                                                       (recur (rest ~gxs)))
                                        (keyword? k) (err "Invalid 'for' keyword " k)
                                        next-groups
                                        `(let [iterys# ~(emit-bind next-groups)
                                               fs# (seq (iterys# ~next-expr))]
                                           (if fs#
                                             (concat fs# (~giter (rest ~gxs)))
                                             (recur (rest ~gxs))))
                                     :else `(cons ~body-expr
                                                  (~giter (rest ~gxs)))))]
                        `(fn ~giter [~gxs]
                         (lazy-seq
                          (loop [~gxs ~gxs]
                            (when-first [~bind ~gxs]
                                        ~(do-mod mod-pairs)))))))]
    `(let [iter# ~(emit-bind (to-groups seq-exprs))]
       (iter# ~(second seq-exprs)))))
;; END OF MACRO (code from core.clj, clojure source);;

OK, that was a little daunting, but it is good to know the code for all of the Clojure source is freely available. This a yet another added benefit that you may not capitalize on, but all of the Clojure source is highly readable and lightweight. There are only 24k lines of Java code in the clojure.lang package. And only 7k lines of Clojure source in the core libray. I routinely make changes to the Clojure Java code to add trace logic. It took only 10 minutes to figure out where I needed to place my logic and then compile with my own Ant script. Imagine making similar changes to the Java API?

More on the Clojure, Java source for the performance tests

I used a simple format for the performance tests that I will present later in the document. I wrote or used existing code from a third-party and then ran the code. There may be many differences between the style of code. And the Clojure code may not be idiomatic. But it should be close. Here is the source for the Quick Sort, Clojure and Java:


;; Two versions of the Clojure Quick Sort:
(defn qsort 
  "Quick sort from rosetta code:
 http://rosettacode.org/wiki/Quicksort#Clojure"
  [[pvt & rs]]
  (if pvt
    `(~@(qsort (filter #(<  % pvt) rs))
      ~pvt 
      ~@(qsort (filter #(>= % pvt) rs)))))

(defn qsort-2 [[pivot & xs]]
  (when pivot
    (let [smaller #(< % pivot)]
      (lazy-cat (qsort-2 (filter smaller xs))
                [pivot]
                (qsort-2 (remove smaller xs))))))

(dotimes [x 4]
      (println "i: " (int (Math/pow 10.0 x)))
    (time (count (qsort (for [_ (range (int (Math/pow 10.0 x))) ] 
                          (.nextInt random))))))


// JAVA VERSION OF QUICK SORT
    public static final List quicksort(final List arr) {
        if (arr.size() <= 1) {
            return arr;
        }
        Integer pivot = (Integer) arr.get(0); //This pivot can change to get faster results
     
        List less = new LinkedList();
        List pivotList = new LinkedList();
        List more = new LinkedList();
     
        // Partition
        for (Iterator it = arr.iterator(); it.hasNext();) {
            Integer i = (Integer) it.next();
            if (i.compareTo(pivot) < 0) {
                less.add(i);
            } else if (i.compareTo(pivot) > 0) {
                more.add(i);
            } else {
                pivotList.add(i);
            } // End of the if - else //
            
        } // End of the for //
     
        // Recursively sort sublists
        less = quicksort(less);
        more = quicksort(more);
     
        // Concatenate results
        less.addAll(pivotList);
        less.addAll(more);
        return less;
    }
// END OF EXAMPLE

Performance of Test 11, Sorting

How did the various languages perform?


JavaTest   PerformanceTime(ms)
#####      ########### 
java11     1190.1847739999998              
jruby11    5068.168914            
clj11      11852.013362

JRuby and Scala Source, Scala Performance for the Simple Quick Sort

On Scala

Scala was created in 2001 and is currently at version 2.7.5 as of 6/3/2009. Scala is a stark contrast to Clojure. David Pollak had this to say about Scala's type system, "Put another way, Scala is statically typed with optional duck typing... the opposite of Objective-C et. al. with their duck typing with optional static typing." Clojure on the other hand is dynamic in the sense that you don't have to explicitly define a type whenever you need to use a particular object, but you can define the type specification using type hints.


;; Notice, second version uses the String type hint
(defn str-split-refl [s regex]
      (vec (. s split regex)))
(defn str-split-fast [#^String s regex]
      (vec (. s split regex)))

The above example presents two versions of a string split method, in Clojure. The second version use the String type hint to help the compiler determine the type for future calls in that function. On Scala's type system, "Scala is equipped with an expressive type system that enforces statically that abstractions are used in a safe and coherent manner" [6]

When would you use a JVM language? (Scala, Clojure, or just Java)

Why even investigate a JVM language? What is the point? A programming language is like any other tool or library used to create and interact with other software or hardware components? If you work in a web or J2EE environment, you might write SQL code, define CSS scripts, write Javascript code, write HTML/XHTML. It isn't uncommon for web application developers to write Java, sql, css, javascript or HTML code. Sometimes all within the same day, sometimes during different phases of a project. That doesn't take into account the number libraries that you must normally learn, understand and work with. So, NOT learning a new JVM programming language just on the premise that it is something different, isn't a valid reason for not using it. People have asked me, would you use Clojure, Java, or Scala? I have used all three for small GUI projects. I have used Scala for the the backend API where I don't need to make small changes. I used Clojure because of the dynamic nature of the language. I can make many, quick incremental changes without having any major impact on the Scala backend API. Take the Java api for example. Most of the core library is set in stone. The java.lang.String class hasn't changed much in over a decade. I see Scala being used for those type of rigid API requirements. This doesn't mean that Clojure couldn't be used for this purpose, it just means that is how I have used Scala and it just seemed to fit because of how easy it is to call Scala from Java (Java interoperability), also because of the nature of Scala's types. Here is just one example on how I used Clojure. The code snippet below contains valid Clojure code used to develop a small GUI application. If you just look at it without understanding the syntax, the code below almost looks like a general purposed configuration file. Here I can easily modify the layout of my GUI window, buttons without ever really getting any complex language details. I am just looking at the data required to change my layout.

A blogger from creativekarma.com provided a list of some of Scala's features:

    * Conventional class types,
    * Value class types,
    * Nonnullable types,
    * Monad types,
    * Trait types,
    * Singleton object types (procedural modules, utility classes, etc.),
    * Compound types,
    * Functional types,
    * Case classes,
    * Path-dependent types,
    * Anonymous types,
    * Self types,
    * Type aliases,
    * Generic types,
    * Covariant generic types,
    * Contravariant generic types,
    * Bounded generic types,
    * Abstract types,
    * Existential types,
    * Implicit types,
    * Augmented types,
    * View bounded types, and
    * Structural types which allow a form of duck typing when all else fails." [8]

// Here is an Example Scala Quick Sort:
 // http://en.literateprograms.org/Quicksort_(Scala)
 // From scala home
  def sort[A <% Ordered[A]](xs: List[A]):List[A] = {
    if (xs.isEmpty || xs.tail.isEmpty) xs
    else {      
      val pivot = xs( xs.length / 2)      
      // list partition
      // initialize boxes
      var lows: List[A] = Nil
      var mids: List[A] = Nil
      var highs: List[A] = Nil      
      for( val item <- xs) {
        // classify item        
        if ( item == pivot) mids = item :: mids
        else if (item < pivot) lows = item :: lows
        else highs = item :: highs
      }
      // return sorted list appending chunks      
      sort(lows) ::: mids ::: sort(highs) 
    }
  }
  // Running the sort and time
  def run(n:Int) = {
    val start1 = java.lang.System.nanoTime()
    //val l = repeat(n)(random.nextInt)
    val l = initialize(n)
    val l2 = sort(l)
    val d = l2.length
    val z = d + 1
    val end1 = java.lang.System.nanoTime()
    val diff = (end1 - start1) * 1e-6
    println("Elapsed Time: " + diff + " ms " + z)
  }
 // End of scala example

JRuby Quick Sort


# JRuby Sort Example:
class Array
  def quick_sort
    return self if length <= 1
    pivot = self[length / 2]
    return (find_all { |i| i <  pivot }).quick_sort +
           (find_all { |i| i == pivot }) +
           (find_all { |i| i >  pivot }).quick_sort
  end
end
def runTest1a(n)
  # Run the application
  start1 = java.lang.System.nanoTime()
  arr = Array.new
  (1..n).each {
    arr << rand(100000)
  }
  res = arr.quick_sort
  end1 = java.lang.System.nanoTime()
  diff = (end1 - start1) * 1e-6  
  puts "Elapsed Time: #{diff} ms"
end
# End of Example


Test     PerformanceTime(ms)
#####    ########### 
java11   1190.1847739999998              
scala11  1989.283518            
clj11    11852.013362

The line graph is a combination of Java, Scala, and Clojure execution times with an increasing number of random elements. Here are the results for the Clojure quick sort:


iterat   exectime(ms)
#####    #########
81       8.019583
243      16.82068
729      69.767482
2187     121.894408
6561     370.054832
19683    1096.553958
59049    3245.377605
177147   11852.013362

All Tests
I ran 14 different tests, ranging from a simple hello world application to the wider finder test. The figure below depicts some of the results.


CljTest PerformanceTime(ms)
##### ########### 
hello1  145.996411    Simple hello world test, more than 10k iterations               
build2  3474.550155   Build large and small set of random strings
str3 495.028198    Various string operations           
euler4  341.872834    Euler problem number 1
euler5  116.512002    Euler problem number 2
str6 213.379223    Additional string operations
math7 1212.796289   Misc Math operations, including sqrt and other calls            
coll8  16356.185166  Collection operation, simple word frequency test
frac9  354.23239     Mandelbrot fractal
sort11  438.477163    Sorting routines, mostly quick sort
vars12  9542.282043   Large and small object creation
wide13  907.491465    Wide finder project, 10k and smaller number of lines


JavaTest  PerformanceTime(ms)
##### ########### 
hello1  27.110056         Simple hello world test, more than 10k iterations           
build2  1940.5321219      Build large and small set of random strings    
string3 1673.399982       Various string operations          
euler4  23.101229         Euler problem number 1           
euler5 1.5945639999999   Euler problem number 2           
str6 436.737084        Additional string operations           
math7 131.1656449998    Misc Math operations, including sqrt and other calls           
coll8  972.775771        Collection operation, simple word frequency test      
frac9  247.63905499998   Mandelbrot fractal          
sort11  62.24679          Sorting routines, mostly quick sort              
wide13  94.820562         Wide finder project, 10k and smaller number of lines

Source for Results and Code

All of the source is available (common document license/BSD license) through the google-code browsable SVN URL. Download the results, language source code or gnuplot source. Unfortunately, this data is spread out haphazardly throughout the repository.

JVM Notebook - Test Results, Charts and Gnuplot scripts

JVM Notebook - Java, Clojure, JRuby, Scala Source

On Garbage Collection

"If the garbage collector has become a bottleneck, you may wish to customize the generation sizes. Check the verbose garbage collector output, and then explore the sensitivity of your individual performance metric to the garbage collector parameters." -- Tuning Garbage Collection [4]

The default arrangement of generations (for all collectors with the exception of the throughput collector) looks something like this.


Here is some output from the garbage colection report:
...
[GC 1277K->390K(5056K), 0.0006050 secs]
[GC 1286K->399K(5056K), 0.0005540 secs]
[GC 1294K->407K(5056K), 0.0005580 secs]
[GC 1303K->416K(5056K), 0.0009580 secs]
[GC 1311K->424K(5056K), 0.0006540 secs]
[GC 1320K->431K(5056K), 0.0007520 secs]
[GC 1327K->431K(5056K), 0.0012980 secs]
...
...

"With the line below, indicate the combined size of live objects before and after garbage collection, respectively. After minor collections the count includes objects that aren't necessarily alive but can't be reclaimed, either because they are directly alive, or because they are within or referenced from the tenured generation. The number in parenthesis"

(776768K)(in the first line)

"is the total available space, not counting the space in the permanent generation, which is the total heap minus one of the survivor spaces. The minor collection took about a quarter of a second." [5]

The usage of the entire heap was reduced to about 51%

196016K->133633K(261184K)

The output is generated with the 'verbosegc' and other JVM options:
(Note: The duplicated 'verbosegc' options gives more information about the garbage collecting)

java -verbosegc -verbosegc -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps CrashJVM -Xms24m -Xmx32m > gc_anal.txt

There are 1400 data points in the output file and in the image for the Clojure sort example. Each GC line in the gc_anal analysis file contains GC information about a minor garbage collection.

[GC [<collector>: <starting occupancy1> -> <ending occupancy1>, <pause time1> secs] 
    <starting occupancy3> -> <ending occupancy3>, <pause time3> secs]

"If the incremental collection of the tenured generation cannot be completed before tenured generation is full, a major collection occurs and is indicated in the verbosegc output by the 'MSC' in the output."


<starting occupancy1> is the occupancy of the young generation 
   before the collection
<ending occupancy1> is the occupancy of the young generation 
   after the collection
<starting occupancy3> is the occupancy of the entire heap 
   before the collection
<ending occupancy3> is the occupancy of the entire heap 
  after the collection

see http://java.sun.com/docs/hotspot/gc1.4.2/example.html

Our GC chart will only show 'used memory after the minor GC' and the total available memory after.

Here is the gnuplot representation of the used memory and total available after a minor garbage collection. There were 1400 data points in the clojure test. This is after running the quick sort code above.


Data Points used in graph:
used after gc | total after gc
...
...
26605 32576
26744 32576
27180 32576
27190 32576
27362 32576
27551 32576
line: 1421 -- 27684 32576
line: 1422 -- 27843 32576

The results for the pure Java version are presented below:

Scala results, 250+ minor garbage collections.

Large Objects

The quick sort example is not a good test to really push the garbage collector. Here is another test, one with Clojure and one with Java. I instantiate a large number of large objects and do the same for small objects. The garbage collector is better at handling smaller objects and consequently not as good at handling large objects (using the default GC rules). So, if you are looking at performance issues, you might look at how often large objects are being created and how long you are holding onto those objects. It is better to create many small objects and retain them for a short time than creating a few number large objects and retaining them for a long time. For example, I guess it is better to write short, static, utility methods and only create objects local to that method. "Large objects might be too big for Eden and will go directly to the old generation area; they will take longer to initialize (when setting fields to their default values such as null and zero); and they might cause fragmentation" [6]. A large object is one that the size of the allocation of the large object within a memory heap exceeds a maximum contiguous free space within the Java heap.

I attempted to create a large object that had many fields with lots of String data. Using my approximate sizeof utility method, it looks like it takes 800kb to create this object.


private Object obj5b = StringUtils.concat(StringUtils.randomString(random, (14 * kb)), 
  StringUtils.randomString(random, (14 * kb)));

With this code, there were 4700 minor garbage collections. It was interesting that total available memory never goes over 14mb. Here is the final console system output after running the program. The last line shows the used memory and total available JVM memory.


53.236: [GC 53.236: [DefNew: 542K->28K(576K), 0.0002510 secs] 2155K->1642K(4448K), 0.0002997 secs]
53.243: [GC 53.244: [DefNew: 540K->25K(576K), 0.0002716 secs] 2154K->1639K(4448K), 0.0003217 secs]
53.252: [GC 53.252: [DefNew: 537K->18K(576K), 0.0002698 secs] 2151K->1631K(4448K), 0.0003206 secs]
"Elapsed time: 9691.390453 msecs"
Done
Performing simple garbage collection cooldown
(used:2M/2M [4M,31M ])
(used:2M/2M [4M,31M ])

Format for the memory usage line (clojure source to create the line):

(defn *memory-usage* []
  (str "(used:" (*used-memory-m*) "M/" (*free-memory-m*) "M [" (*total-memory-m*) "M," (*max-memory-m*) "M ])"))

Shown below is the output for the Java test:


...
87.459: [GC 87.459: [DefNew: 441K->48K(640K), 0.0002922 secs] 5147K->4905K(8968K), 0.0003323 secs]
87.468: [GC 87.468: [DefNew: 624K->64K(640K), 0.0004192 secs] 5481K->5217K(8968K), 0.0004631 secs]
87.470: [GC 87.470: [DefNew: 213K->51K(640K), 0.0002372 secs] 5367K->5269K(8968K), 0.0002763 secs]
87.473: [GC 87.473: [DefNew: 627K->11K(640K), 0.0003707 secs] 5845K->5774K(8968K), 0.0004112 secs]
Elapsed time: 47982.607108 msecs
(used:5M/3M [8M,31M ])
(used:5M/3M [8M,31M ])
(used:5M/3M [8M,31M ])
Done

So far, we have only shown information on minor garbage collections. Here are the results of the major collects. The major collection are delineated by Tenured.


3.657: [GC 3.657: [DefNew: 568K->63K(576K), 0.0003546 secs]3.658: [Tenured: 7464K->506K(7508K), 0.0117550 secs] 7934K->506K(8084K), 0.0122011 secs]

You can see, major garbage collects seem to take more time and there are fewer of them in this result set.


...
1167 4000
1235 4000
1193 4000
line: 156 -- 1260 4000

And the Java results. There were 440+ GC collects.

Additional Tools

There is no shortage of tools for monitoring JVM performance. Mature, open and proprietary monitoring applications are available. Most of them focus garbage collection, heap memory, cpu monitoring, and method trace calls. The netbeans monitor, Sun's jconsole, jrat, and the Eclipse memory analysis tool are a few that I have used recently.

Looking at the top objects with the test examples

The chart report output from Eclipse's Memory Analysis Tool is depicted below. The overview contains the number of objects, number classes and classloader. The MAT tool also looks at potential memory leaks and the objects that are causing the problem, "biggest consumers" and a heap object histogram. To generate the data, we simply generate a heap dump and open the heap dump file.

Even with the 'hello world' test, you can see that many objects were created. And no, the Clojure hello world application that I provided is not the typical hello world app. I actually do some basic multiplication calculations over 1,000,000 iterations. Hello world in this case is a misnomer.

Here is comparable Java output. According to the Eclipse mat tool, almost 9000 objects were created.

Scala and JRuby Class Histogram Results:

Running JRat, Netbeans Profilers

Running JRat or Netbeans profilers is as simple as running with the appropriate JVM args allowing your application to run and then opening the files output after the program has exited.

I downloaded the shiftone-jrat.jar jar file, placed it my current working directory. And then added these args:

The screenshot above contains the jrat output after running the Clojure mandelbrot shootout application. I didn't want to analyze the results but there are a couple of stats worth taking a quick look at. The clojure.lang.Numbers.lt method had 87 million exits and a total method time of 142 milliseconds. The Numbers.ops method was called 367 million times.

The netbeans profiler contains similar profiling statistics as the jrat tool but the Netbeans profiler contains live, realtime results and a host of other metrics.

Additional Tests from Third Party (from AndyF):

Andy, a developer provided great Clojure performance information. I ran some of his
Clojure source. Here are the runtime, garbage collection and memory results.
Source
JVM Notebook SVN source (Andy Benchmark) - Java
JVM
Notebook SVN source (Andy Benchmark) - Clojure

Original Source

Updated Benchmark Results (the rcomp test shows 1:15 performance speed for Clojure)

I ran two types of shootout like tests, the threaded, non-threaded Mandelbrot test in Java and Clojure. Performance for these were similar to the all of the tests that we have run up to this point. The mandelbrot code seemed to perform slower than the 1:10 (java:clojure) speed ratio that we encountered. For example, the Java threaded code ran at '1700.062' ms. The Clojure version ran at 50998.30 ms.


MandelSiz| exectime.java(ms) exectime.clojure(ms)
#####    #########           #####
100      11.515956           90.97791
200      22.154556           263.893038
300      42.036901           610.872097
400      70.4424729          982.452582
500      105.705857          1495.433005
600      151.296857          2091.412938
700      201.6435589         2843.495665
800      260.195961          3650.13744
1000     407.073079          5680.39932
1200     577.353532          8208.967304
1400     785.8052849         11126.994209
2000     1593.239278         22588.712673
2500     2487.326002         35210.526506
3000     3576.071694         50998.301209

The chart above contain the garbage collection results after running the Clojure mandelbrot shootout test.

The following test results are taken verbatim from the clojure-benchmark result set:


Additional Results straight from the shootout source (by AndyF):
See:
http://github.com/jafingerhut/clojure-benchmarks/
------------------------------------------------
Times are real / user / sys on iMac
------------------------------------------------
        |  sbcl  |  perl  |   ghc  |  java  |   clj
------------------------------------------------
           noT      noT            |  noT
long test on iMac:
rcomp   |   8.3  |  11.9  |        |   8.5  | no implementation yet
        |   4.8  |   7.9  |        |   3.6  |
        |   2.1  |   2.3  |        |   1.4  |

long test on iMac:
mand-   | wrong  | out of |  32.7  |  28.6  | 340.4
elbrot  | output | mem    |  59.3  |  54.4  | 350.5
        |        | (?)    |   0.8  |   0.4  |   4.7

           noT      noT            |    T
long test on iMac:
k-nuc-  | 190.9  | 306.0  |  90.5  |  52.4  | 1677.6 (27m 57.6s)
leotide | 187.9  | 302.7  | 130.8  |  89.6  | 2245.1 (37m 25.1s)
        |   2.4  |   1.9  |   4.6  |   1.8  |   24.2 (    24.2s)

Result: Replacing dna-char-to-code-val with a macro did
not speed things up, and may have slowed them down:
                                            | 1800.0 (30m  0.0s)
                                            | 2317.0 (38m 37.0s)
                                            |   30.6 (    30.6s)
------------------------------------------------

k-nucleotide long test on benchmark shootout machine:
        | 164.9  | 249.8  |  52.0  |  20.6  |
        | 164.9  | 246.7  | 112.5  |  58.8  |
        |     ?  |     ?  |     ?  |     ?  |
------------------------------------------------
k-nucleotide medium test on iMac:
        |   8.6  |  12.7  |   3.9  |   3.9  |   64.2 / 69.6 / 69.1 / 67.1
        |   7.9  |  12.5  |   5.4  |   5.7  |   98.4 / 92.9 / 93.1 / 88.1
        |   0.6  |   0.1  |   0.3  |   0.2  |    1.5 /  1.6 /  1.6 /  1.7

k-nucleotide medium test, all clj, modified-pmap with specified number
of parallel threads, on iMac:
        |   1    |   2    |   3    |   4    |   5    |   6    |
        |  74.9  |  70.7  |  77.2  |  76.8  |  82.5  |  77.8  |
        | 125.9  | 122.1  | 134.6  | 134.0  | 143.4  | 134.0  |
        |   2.5  |   2.8  |   2.8  |   2.3  |   3.1  |   2.3  |
------------------------------------------------
Hardware and software used
iMac with Intel Core 2 Duo, 2.16 GHz, 2 GB RAM, Mac OS X 10.5.7
% java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)
% javac -version
javac 1.6.0_13

user> (clojure-version)
"1.1.0-alpha-SNAPSHOT"

End of Andy's Configuration:
------------------------------------------------

The following benchmark information were provided
by Andy from his website.   Visit:

http://github.com/jafingerhut/clojure-benchmarks/

Some additional google clojure threads:
http://groups.google.com/group/clojure/browse_thread/thread/e4299911f4ae8167

Testing Environment

For this document, all of the tests were run with Java 5. (Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)). Some tests were run with both the -client and -server options. The majority of tests were run with the -client option. Headius describes the -server option, "The -server option turns on the optimizing JIT along with a few other "server-class" settings.". Updates to the this document will include tests with Java 6.

Operating System and Hardware Environment:

Java: Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
Windows: Windows XP Professional Service Pack 2
Hardware: Intel Core 2 CPU 6300 - 1.86ghZ, 3.49 GB of RAM

Virtual Machine Languages:

The focus of these tests are mostly to look at Clojure and how it compares to pure Java code. Clojure is included in all of the tests. There are only two tests for Scala.

Clojure: 1.0.0/ 2009-05-04
JRuby: 1.3.1 / 2009-06-15; 44 days ago
Scala: 2.7.5/ 2009-06-03; 50 days ago

Resources
[1] - measure quote - http://www.javaperformancetuning.com/news/newtips087.shtml - quote from the java performance tips. I have not found the original source of this quote.
[2] - Rick Ross quote - Farewell to the J in JVM
[3] - Zed Shaw on Statistics - http://zedshaw.com/essays/programmer_stats.html
[4] - Clojure Home - http://clojure.org/
[5] - Tuning Garbage Collection - http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
[6] - On large vs small - http://chaoticjava.com/posts/gc-tips-and-memory-leaks/
[7] - Scala's, static typing - http://en.wikipedia.org/wiki/Scala_(programming_language)#Static_typing
[8] - http://creativekarma.com/ee.php/weblog/comments/static_typing_and_scala/
--------
Additional Links

http://clj-me.blogspot.com/2008/06/widefinder-2-in-clojure-naive-port-from.html
http://onjava.com/pub/a/onjava/2001/05/30/optimization.html
http://onjava.com/onjava/2001/05/30/examples/Test.java
http://www.javaperformancetuning.com/news/newtips087.shtml
http://blog.headius.com/2009/01/my-favorite-hotspot-jvm-flags.html

http://jrat.sourceforge.net/
http://www.netbeans.org/features/java/profiler.html
--------
Download or Browse All Source:
JVM Notebook - Test Results, Charts and Gnuplot scripts
JVM Notebook - Java, Clojure, JRuby, Scala Source

*Download All Source as Archived File*

Partial PDF version of this document
Tex source for this document
--------
Addendum (gnuplot and regex)

I always feel that charts and graph are great visuals for displaying simple or complex concepts. I only know the surface level details of gnuplot, but here are some notes on creating the images above. I used a simple python script to analyze the GC statements generated by the JVM verbosegc output and then created a tabulated data file that can used with gnuplot.

I typically will create a shell script to set all the gnuplot commands and then launch the application:


#!/bin/sh
# Also see:
# http://www.duke.edu/~hpgavin/gnuplot.html
#
# Use 'eog' eye of gnome to view the images

OUTPUT_IMG_FILE=image_gc_analysis.png
INPUT_PLOT_FILE=image_gc_analysis.txt

# Output these commands to a temp file and then run gnu plot 
echo "set terminal png
set output '${OUTPUT_IMG_FILE}'
set title 'GC results'
set size 1,1
set key left top
set autoscale
set xlabel 'gc_iter'
set ylabel 'gc_main_paus_time'
plot '${INPUT_PLOT_FILE}' using 1:6 title 'GC Results'
" > gnuplot_tmp_cmd.tmp 

# Run the gnu plot command
gnuplot gnuplot_tmp_cmd.tmp > /dev/null

# For future reference, add a command and title to compare results 
# E.g.
# '/tmp/data2' using 10 with lines title 'Benchmark 2',

On Regex
In the gc_parse.py python application, I used a couple of lines of regex to find the GC patterns generated by the verbose gc JVM output.


def get_gc_stats(line):        
        # Example Output java 1.5:
        # 20.049: [GC 20.049: [DefNew: 1941K->137K(1576K), 0.0014153 secs] 11194K->11117K(1984K), 0.0014628 secs]
        # Example Output java 1.6:
        # 20.049: [GC 28.200: [DefNew: 1921K->137K(1984K), 0.0006890 secs] 23030K->21247K(27320K), 0.0007550 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
        
        # Group1 = First Iter Value
        # Group2 = Second 'GC' Iter value
        regex_begin_iter = '^(\S*):\s*\[GC\s*(\S*):\s*'
        # Group 3 = Young Gen Start Occupancy
        # Group 5 = Young Gen End Occupancy
        regex_def_new    = '\[DefNew:\s*(\S*)(K|M)\-\>(\S*)(K|M)\((.*)$'
        regex_full_str   = '%s%s' % (regex_begin_iter, regex_def_new)
        pattr = re.compile(regex_full_str)
        # Group 1 = Main Start Occupancy
        # Group 2 = Main End Occupancy
        pattr_sub = re.compile('^(.*) (\S*)(K|M)\-\>(\S*)(K|M)\((.*)$')
        ...
        ...

------
Header Image from:
http://upload.wikimedia.org/wikipedia/commons/5/53/BabbageDifferenceEngine.jpg
------

Tuesday, July 28, 2009

Clojure Performance and Discussion

Here is a post verbatim from AndyF on the Clojure discussion forum.

http://groups.google.com/group/clojure/browse_thread/thread/289904c1a5deb8d8

BEGIN-QUOTE:

Ive (Andy) got a start at programs for solving two of the problems solved in
many other languages on "The Computer Language Benchmarks Game" web
site:

http://shootout.alioth.debian.org

In particular, the k-nucleotide problem has as a significant part of
its computation time a similar task to what you have done -- tallying
the number of times each unique item appears in a collection, using a
map (which most programs on the web site implement with a mutable hash
table, not a persistent map like in Clojure).

I also have adapted some solutions for calculating complex numbers in
the Mandelbrot set from a discussion in this group from several months
back, and addded a bit of code so the input/output match what is
expected for the Mandelbrot benchmark. I don't have Clojure
implementations for the other problems on the web site, yet (I'd be
happy to add them to my collection if you'd like to send them to me).
Here are some comparisons of run times on my iMac with 2.2 GHz Intel
Core 2 Duo, 2 GB RAM. I make no claims that these are the best
Clojure implementations that could be made, but I have done several
versions that were slower and/or didn't finish due to running out of
memory, before getting to the Clojure versions I have now. This is
intended to be viewed in a fixed width font. You can also look at the
RESULTS file in the zip/tar.gz files linked below, where you can get
the full programs and test inputs I used.

Times are real / user / sys on my iMac. real means elapsed time from
start to finish, which can be less than (user+sys) in the cases where
the program explicitly uses both processor cores.

| sbcl | perl | ghc | java | clj
-----------------------------------------------------
mand- | wrong | out of | 32.7 | 28.6 | 340.4
elbrot | output | mem | 59.3 | 54.4 | 350.5
| | (?) | 0.8 | 0.4 | 4.7

k-nuc- | 190.9 | 306.0 | 90.5 | 52.4 | 1677.6 (27m 57.6s)
leotide | 187.9 | 302.7 | 130.8 | 89.6 | 2245.1 (37m 25.1s)
| 2.4 | 1.9 | 4.6 | 1.8 | 24.2 ( 24.2s)

For these two programs at least, the Clojure implementations have a
total CPU time (the sum of the 2nd and 3rd numbers reported above) of
6.5 times more for Clojure vs. Java on the mandelbrot program, and 25
times more for Clojure vs. Java on the k-nucleotide program.

Here are links to zip and .tar.gz files containing the programs and
input files used to produce these results. They've been tested in Mac
OS X, but I suspect they ought to work on Linux with little or no
modification. Windows users would need to do a little more to run the
programs, but it shouldn't be difficult. Sorry, they are each about
75 Mbytes, primarily because a few of the sample input and expected
output files are quite large.

http://homepage.mac.com/jafingerhut/files/language-shootout.tar.gz
http://homepage.mac.com/jafingerhut/files/language-shootout.zip

For the k-nucleotide program, I think this may be a fundamental issue
with persistent implementations of maps / hash tables, but I'd be
happy to find a better implementation of them, if that would help
similar Clojure programs to speed up. I haven't read Chris Okasaki's
book on functional data structures yet, but I suspect a good
understanding of its contents might lead to some improvements. I've
read that Okasaki's book doesn't spell out an implementation for hash
tables, but leaves it as an exercise for the reader, so don't rush off
and buy the book expecting to have a quick job of translating some
pseudo-ML into Java.

I thought it was interesting that even the Haskell entry to the k-
nucleotide benchmark uses a *mutable* hash table (at least, I think
they are from the discussion on the Wiki page linked below -- my
Haskell knowledge isn't extensive enough to understand all of their
code). I don't think that is idiomatic Haskell, but the people
writing the Haskell entry are apparently willing to forego pure
functional programming when they can get significantly better
performance from a mutable data structure.

http://www.haskell.org/haskellwiki/Shootout/Knucleotide

In terms of what instructions the processor is running every time a
mutable hash table is updated, it is roughly "calculate the hash
function, look up the corresponding entry in the hash table, compare
the keys for exactness and perhaps traverse more entries looking for
the exact match, then add 1 to the count when you find the matching
entry".

Clojure's persistent hash map is calculating the hash function, then I
believe the basic idea is looking that hash value up in a tree of
nodes, each of which is a Java array of 32 elements. When it finds
the matching entry in that tree, doing the "assoc" to "modify" the
count of an entry involves copying that array of 32 elements, with one
element in the copy different than the original, and then going back
up the tree to the root, creating new copies of arrays of 32 elements,
each with one element different than the original, pointing at the new
copied child. I'm sure there are special cases for small maps, but I
think that is the behavior for sufficiently large hash maps. If so,
then it is pretty easy to see that the processor is doing
significantly more work for each map update, i.e. assoc call, than any
implementation that uses mutable hash tables.

Of course, Clojure makes it easy to call Java code implementing
mutable hash tables, if performance is all-important for such a task,
and you realize the issues with having a mutable data structure vs.
Clojure's persistent data structures.

END-QUOTE -- From Andy F

Update-1:

The source for these examples are on github:
git://github.com/jafingerhut/clojure-benchmarks.git

-- Ron Paul 2012

Monday, July 27, 2009

A whole food diet (post Food Inc)

I haven't watched Food Inc yet, but I get more from all the people just talking about it. So far, it seems to be a major work of art. With heart disease and cancer being a major killer in America, it is time for all of us to start thinking about what goes into our bodies.

Also as part of my research, I am concerned and wanting to change my diet. Most of my research has come from the Whole Foods websites and blog. Check it out as I don't have much else to say on the subject.

http://blog.wholefoodsmarket.com/

Saturday, July 18, 2009

Light Log File Viewer - Alpha Release

Light Logs is a sub project of the light text editor. The light log viewer is a simple desktop application for viewing log files that are generated by a web server. But it may also be useful for viewing other text files. This application is especially useful for viewing logs generated by a J2EE Servlet container like the Apache Tomcat or a full J2EE Application Server like Weblogic.

Latest version for the Light Log Viewer is 0.8.20090715

http://code.google.com/p/lighttexteditor/

http://code.google.com/p/lighttexteditor/wiki/LightLogViewer

Monday, July 13, 2009

Nightly Meal Deals in Atlanta (mostly midtown area)

Google Calendar Listing of Meal Deals in the Atlanta area. I haven't tried most of these, so you may want to call before you go.

http://www.google.com/calendar/embed?src=of063g67ea7kkhbjds18rjec14@group.calendar.google.com

Atlanta Nightly Deals

Sunday, July 12, 2009

Great example on using git for small projects, simple merge

I asked the following question on stackoverflow:

"I have a source directory that includes a mix of different languages and files. Probably 100+ total files. I want to 'fork' that code and create a completely different version of the source. At this point, I will have version-1 of the code and version-2 of the code. I will not make that much development on version-1. I want to change the documentation headers and some other small changes. I will do a lot of development on version-2 and then come back a month or two later and merge version-2 with version-1 (set the latest version-2 to version-1).

But, I don't want to lose any changes that I made to version-1 (say after months of small fixes)."

Here is the response.

--- Begin Quote ---

git only creates A SINGLE .git directory. svn is the one that scatters .svn directories in EVERY directory.

Your time would be far better spent learning useful git commands (such as git merge --squash). Use git, it will do exactly what you want with no trouble.

Edit:

With git, you can keep it all in one place, switch back and forth at a whim, and make as many commits as you want.

In order to reduce conflicts, you may want to keep Phase1 merged into Phase2 (as you make infrequent changes in Phase1). But that is totally up to you.

Here is how I would do it with git:

cd /path/to/project
git init
git add .
git commit -m "Initial Commit"

At this point, you are on the master branch with a single commit called "Initial Commit". Now, let's create branches.

git branch Phase1
git branch Phase2

Now, to work on Phase2:

git checkout Phase2
... work ...
git add ...
git commit

Switch to Phase 1 and do some work.

git checkout Phase1
... work ...
git add ...
git commit

Switch to Phase 2 and do some more work.

git checkout Phase2
... work ...
git add ...
git commit

At the appropriate time, pull changes from Phase1 into Phase2 (or the other way around):

git checkout Phase2
git merge Phase1
... resolve any conflicts and commit if needed ...

Repeat... You'll be able to make as many commits, merges, and branches as you need to stay on top of the project.

Also, use git tag ... to create tags that point to a given commit. That way you can always go back without having to sort through the history.

Edit:

We run git from the command line, so I am not very familiar with what GUI tools exist. It should not be hard to find.

When you run into a conflict, git marks the files as conflicted (git status) and also merges them as best possible. Where the merge cannot complete, it leaves very clear markers in the files:

<<<<<<< yours:sample.txt
Conflict resolution is hard;
We went shopping yesterday.
=======
We go shopping today.
>>>>>>> theirs:sample.txt

So you simply delete one of the two, and edit the remainder to suit.

It rarely happens and is very easy to clean up.

--- End Quote ---

Quote is from user, 'gahooa'. Thanks for the response.

Note: this material is attributed to stackoverflow,

what-are-some-simple-good-ways-to-eventually-merge-two-directories-of-source/1112232#1112232