Saturday, December 30, 2023

Monday, December 25, 2023

Opening up this blog again

See my posts here and on Twitter.

Here is a poem for you.

The chaos lives among us

For the chaos is real

The math is real

Twitter/X:

https://twitter.com/BerlinBrownMech

Friday, April 14, 2017

On Unit Testing, Java TDD for developers to write

It has been a while; let's kick off 2017 with a blog entry.

I have read, and am still reading, about four or five posts a day about unit testing. It really has been a long-time obsession for me. I have moved past the technical and practical considerations of unit testing frameworks, and I am done with debates like "should you use JUnit or Mockito or Karma?" I am more interested in the psychology of unit testing: who does it, who likes it, who hates it? It really is one of those concepts that is easy to learn and hard to master. For example, many people learn to play chess when they are young and remain horrible chess players most of their lives; I am part of that majority. I have never sat down for hours and tried to master chess. I never see the common patterns, and I have no developed end game. I mostly just play with a knowledge of the basic rules. Following good unit testing practices within your software development shop is a lot like playing chess: it is easy to learn and difficult to master. Of course, there are big differences. Chess is a game, chess is not coding, and people take their software development very seriously. So if you don't master unit testing but are able to complete your job tasks, some might argue that is an acceptable risk in the world of software development. And why master chess or unit testing at all? If developers are fine without unit testing, then why even suggest it? Some developers just don't want to invest the energy to master the practice, and in some development shops there is no hard requirement to do so.

I am not going to convince you to write unit tests with this one post; I will leave that up to software guru Martin Fowler and the people at ThoughtWorks, who have written large tomes on the subject. But I will present my thoughts on why some developers won't write unit tests and why they should. The developers and architects who advocate unit testing have generally written just enough unit tests to find them useful; they love the practice and encourage others to follow along. I am in that camp, and I have become almost religious about it. I can't imagine my real code without unit tests, and I feel guilty when I rely only on manual functional testing.

Jeff Atwood of Coding Horror wrote a short blog post on the topic, "I Pity the Fool Who Doesn't Write Unit Tests". Here is the one blurb that stuck out for me: "Even if you only agree with a quarter of the items on that list-- and I'd say at least half of them are true in my experience-- that is a huge step forward for software developers". And this one: "It's more fun to code with them than without". That is the essence of this unit testing religion: we can't force it on developers, and we can't force developers to write unit tests only a certain way. I, and many others, don't believe in the practice of 100% coverage; depending on the project or company, you will rarely get there anyway. Some will argue that you shouldn't break the rule on non-determinism, and this is a big one. Basically, a unit test should return the same output every time you run it. You should avoid breaking this rule for unit tests, but you can still write automated integration tests and add them to your suite, combining a collection of unit tests and integration tests. A simple integration test might connect to your REST microservice and validate the HTTP status code. At that point, your test has moved into the integration testing category. Likewise, if you connect to the database, run a particular SQL statement and validate the data model returned from the SQL invocation, your test is basically an integration test. Neither scenario is a true unit test, and both are potentially non-deterministic, but I would still consider them useful. Also, for new developers getting familiar with unit testing, writing integration tests may feel more familiar than decomposing or refactoring their code for a real unit test. There is a benefit to database or HTTP integration tests: you can add them to a test suite and run them automatically after a code change and after a build. Even bad tests can be useful.
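Here is a small sketch of what keeping a unit test deterministic can look like in Java. The `InvoiceService` class and its overdue check are hypothetical; the idea is to inject the non-deterministic input, the current time, as a `java.time.Clock` instead of calling `LocalDate.now()` directly, so the test can pin it with `Clock.fixed`. The check is written as a plain `main` method to stay self-contained; in a real project it would be a JUnit test.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical service: the current date comes from an injected Clock
// instead of a hidden call to LocalDate.now(), so a test can pin it.
final class InvoiceService {
    private final Clock clock;

    InvoiceService(Clock clock) {
        this.clock = clock;
    }

    boolean isOverdue(LocalDate dueDate) {
        return LocalDate.now(clock).isAfter(dueDate);
    }
}

final class InvoiceServiceCheck {
    public static void main(String[] args) {
        // A fixed clock: every run sees 2017-04-14 UTC, so the result never changes.
        Clock fixed = Clock.fixed(Instant.parse("2017-04-14T12:00:00Z"), ZoneOffset.UTC);
        InvoiceService service = new InvoiceService(fixed);

        if (!service.isOverdue(LocalDate.of(2017, 4, 13))) throw new AssertionError("should be overdue");
        if (service.isOverdue(LocalDate.of(2017, 4, 14))) throw new AssertionError("due today is not overdue");
        System.out.println("ok");
    }
}
```

Production code would pass `Clock.systemDefaultZone()` instead; the class itself does not change.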

Misko Hevery is the creator of AngularJS, one of the most popular JavaScript frameworks to emerge in the last couple of years. It is a Google project that he started while working as an Agile Coach at Google. As he puts it, he wants to maintain the high level of automated testing culture at Google. Most of his published articles are not about AngularJS but about the benefits of automated testing. I can only imagine that he developed the MVC JavaScript framework because the old crop of frameworks was a pain for developers to work with: they were not testable.
I have given my advocacy speech on unit testing, but how do I use it, and what practices do I follow?
  • For every piece of new code, I formulate a unit test case. New code could include my model structure or the interfaces into my Java services. This is critical: unit testing encourages you to write testable code early. In practice, I try to use interfaces and abstract classes, which allow me to inject mock objects early in the development process.
  • For local development, I can build, write code, write and update my unit tests, and then run the automated suite of tests. The key part is re-running the test suite. Normally I want my unit tests to pass; if they don't pass, I can look at my code and refactor. Also, the tests I write today should give the same result when I run them a year from now.
  • Have fun as you write your unit tests. This is not production code and the unit tests don't run in production, so you can test as few or as many inputs as you like.
  • I try to avoid unit tests around code that doesn't do anything. Write unit tests around the modules that have some kind of behavior. We may still write model POJO code with setters and getters, but there is no reason to test a setter method. It is more fun to code around the real functionality.
  • Writing unit tests also encourages the developer to write testable code.
  • Write Java code that doesn't use static methods or variables. Imagine that: try writing code that doesn't make use of the static keyword. Why would you do this? Static, class-level routines are procedural and inherently hard to test. You can't override their functionality, because they are bound to the class rather than to an object.
  • Writing unit tests encourages refactoring. Some refactoring may include the use of OOP techniques such as interfaces and abstract classes.
  • Use a DI (dependency injection) framework like AngularJS (yes, I called AngularJS DI), Spring or Guice. DI frameworks let the container create new objects for you. Managing objects on your own with the 'new' operator tends to produce untestable code.
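The interface, no-static and dependency injection points above fit together roughly like this. It is a minimal sketch with hypothetical names (`TaxRateProvider`, `PriceCalculator`, `StubTaxRateProvider`): the calculator receives its collaborator through the constructor, so a test can hand it a stub instead of a real service, with no container involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical collaborator expressed as an interface, not a static utility,
// so a test can substitute its own implementation.
interface TaxRateProvider {
    int basisPointsFor(String region); // 500 basis points = 5%
}

// The dependency arrives through the constructor. In production a container
// such as Spring or Guice would supply the real provider; a test wires it by hand.
final class PriceCalculator {
    private final TaxRateProvider rates;

    PriceCalculator(TaxRateProvider rates) {
        this.rates = rates;
    }

    long totalInCents(long netCents, String region) {
        // Integer arithmetic keeps the result exact (tax is truncated to whole cents).
        return netCents + netCents * rates.basisPointsFor(region) / 10_000;
    }
}

// Hand-written stub standing in for a mock object: fixed, in-memory rates.
final class StubTaxRateProvider implements TaxRateProvider {
    private final Map<String, Integer> rates = new HashMap<>();

    StubTaxRateProvider with(String region, int basisPoints) {
        rates.put(region, basisPoints);
        return this;
    }

    @Override
    public int basisPointsFor(String region) {
        return rates.getOrDefault(region, 0);
    }
}
```

For example, `new PriceCalculator(new StubTaxRateProvider().with("NC", 500)).totalInCents(1000, "NC")` returns 1050. A mocking library such as Mockito could generate the stub instead; the point is that `PriceCalculator` never calls `new` or a static method to find its collaborator.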
In summary, see what Jeff Atwood, Martin Fowler and Misko Hevery have said about unit testing. And we pity the fool who doesn't do it.

Thursday, May 8, 2014

This is my XSS hack servlet

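import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Deliberately vulnerable test servlet: GET renders a form, POST echoes the
// two textarea values back into the page without any escaping (reflected XSS).
public class SimpleServletXSS extends HttpServlet {

    private static final long serialVersionUID = 1L;

    @Override
    public void doGet(final HttpServletRequest request, final HttpServletResponse response) throws ServletException, IOException {
        doPost(request, response);
    }

    @Override
    public void doPost(final HttpServletRequest request, final HttpServletResponse response) throws ServletException, IOException {
        final StringBuilder buf = new StringBuilder();
        final String method = request.getMethod();
        // The textareas are pre-filled with an escaped script tag, so the browser submits a raw <script> element.
        buf.append("<html><head>\n<!-- HEAD -->\n</head>\n <body>  <br />  <form method='post' action='SimpleServletXSS'>\n<textarea cols='40' rows='5' name='hack1'> &lt;script&gt; alert(\"XSS\");&lt;/script&gt; </textarea>  <br /> <textarea cols='40' rows='5' name='hack2'>&lt;script&gt; alert(\"XSS\"); &lt;/script&gt; </textarea> \n<!-- DATA -->\n");
        buf.append("<br /><input type='submit' value='SUBMIT'>");
        buf.append("</form></body></html>");
        final String html = buf.toString();
        response.setContentType("text/html");
        final PrintWriter out = response.getWriter();
        if ("GET".equalsIgnoreCase(method)) {
            out.println(html);
        } else if ("POST".equalsIgnoreCase(method)) {
            // Missing parameters become empty strings instead of causing a NullPointerException.
            final String hack1 = request.getParameter("hack1");
            final String hack2 = request.getParameter("hack2");
            final String head = hack1 == null ? "" : hack1;
            final String data = hack2 == null ? "" : hack2;
            System.out.println(head);
            System.out.println(data);
            // replace() inserts the input literally; replaceAll() would treat '$' and '\'
            // in the input as regex replacement syntax and could throw.
            // The input is NOT escaped: that is the XSS hole this servlet demonstrates.
            out.println(html.replace("<!-- HEAD -->", head).replace("<!-- DATA -->", data));
        } else {
            throw new ServletException("Unsupported HTTP method: " + method);
        }
    }
}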
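The servlet works as an XSS demo because it pastes the raw `hack1` and `hack2` parameters into the HTML. The usual countermeasure is to encode user input before writing it into the page. Below is a minimal sketch of such an encoder; `HtmlEscape` is a hypothetical helper name, and in a real project a maintained library such as the OWASP Java Encoder is the safer choice.

```java
// Hypothetical helper: escapes the five characters that are significant in
// HTML text and quoted attribute values, so "<script>" is displayed as text.
final class HtmlEscape {
    private HtmlEscape() {}

    static String escape(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Writing `HtmlEscape.escape(data)` into the page instead of `data` would make the browser show the submitted script as harmless text rather than run it.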

Thursday, February 7, 2013

Wolfram's Cellular Automata, A New Kind of Science and Example Squaring Rule (2)


Overview - Playing the Game Of Life

When most computer users upload a profile image from their desktop to Facebook, they don't stop to think about the simple binary math rules that are fundamental to most digital devices. We know that 4 gigabytes of RAM is more memory than 512 megabytes, but we don't picture the logic circuits involved in an xor $0x100, %eax instruction on a 32-bit CISC processor. Software developers have to consider memory management or how a computer's operating system loads their programs into memory. They don't normally consider VHDL logic circuit designs, data paths, arithmetic logic units or the millions of transistors that make up a modern CPU. Those low-level details have been intentionally hidden from the application developer. The modern CPU may have changed dramatically over the last decade, but at the heart of early digital computing were simple Boolean operations. These simple rules were combined and replicated into logic that loads programs into memory and executes them. The rules that control most digital devices are based on elementary Boolean rules. Cellular automata take a similar bottom-up approach: rules consist of simple programs (as Stephen Wolfram calls them) that apply to a set of cells on a grid.


Conway's Game of Life is one of the most prominent examples of cellular automata theory. The two-dimensional program consists of a grid of cells, typically with several dozen or more rows and a similar number of columns. Each cell on the grid has an on or off Boolean state. From one generation to the next, every cell lives or dies according to the Game of Life rules, which look at the cell's eight neighbors. A live cell with more than three live neighbors dies from overcrowding, and a live cell with fewer than two live neighbors dies from under-population. A live cell with two or three live neighbors survives, and a dead cell with exactly three live neighbors becomes alive. The activity of a single cell is not very interesting, but when you run the entire system for many generations, patterns begin to form.
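The rules above fit in a few lines of code. This is a minimal sketch of one Game of Life generation, not the code behind the applet in this post; the class name is hypothetical, and treating cells outside the grid as dead (rather than wrapping the edges around into a torus) is an assumption.

```java
// One Game of Life generation on a fixed-size grid.
// Cells outside the grid are treated as dead (no wrap-around).
final class GameOfLife {
    private GameOfLife() {}

    static boolean[][] step(boolean[][] grid) {
        int rows = grid.length;
        int cols = grid[0].length;
        boolean[][] next = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int n = liveNeighbors(grid, r, c);
                // Survival with 2 or 3 neighbors, birth with exactly 3;
                // every other cell dies (under-population or overcrowding).
                next[r][c] = grid[r][c] ? (n == 2 || n == 3) : (n == 3);
            }
        }
        return next;
    }

    // Counts live cells among the eight neighbors of (r, c).
    private static int liveNeighbors(boolean[][] grid, int r, int c) {
        int count = 0;
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                if (dr == 0 && dc == 0) continue;
                int rr = r + dr, cc = c + dc;
                if (rr >= 0 && rr < grid.length && cc >= 0 && cc < grid[0].length && grid[rr][cc]) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

A horizontal line of three live cells (the "blinker") turns into a vertical line after one step and back again after the next, one of the oscillating patterns shown in the figures below.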

More on Conway's Game of Life




Figure: Game of Life Output

You may notice some common patterns in the figure. After many generations of the Game of Life rules, only a few cells tend to stay alive. We started with a large, random set of live cells, and over time most of those cells died off. In a controlled environment, you may begin with carefully placed live cells and monitor the patterns that emerge to model some other natural phenomenon.



Figure: Common Game of Life Surviving and Oscillating Patterns


A New Kind of Science

The name Stephen Wolfram has been mentioned several times in this post. He is the founder of Wolfram Research, the company known for the popular Mathematica software suite and the Wolfram|Alpha knowledge engine. He did not discover cellular automata, but he has become one of their most prominent advocates. He spent 10 years working on his book, A New Kind of Science. In the 1300-page tome, he discusses how cellular automata can be applied to every field of science, from biology to physics. NKS is a detailed study of cellular automata programs.

Basic Cellular Automata




Figure: Wolfram's Elementary CA Rule 30. Note the 3-bit input and 1-bit output.


The diagram above depicts the rule 30 program (the rule 30 elementary cellular automaton). There are 8 possible input states (2^3), and each one produces an output state of one or zero. Reading the diagram from left to right, the first group of blocks depicts an input state of { 1 1 1 } with an output of 0: given input cells { 1 1 1 }, the output is set to 0. The next group of blocks consists of an input state of { 1 1 0 }, also with an output of 0.
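The name "rule 30" comes from Wolfram's numbering of elementary rules: list the eight outputs in the order 111, 110, 101, 100, 011, 010, 001, 000 and read them as one binary number. For rule 30 that is 00011110, which is 30 in decimal. The sketch below (class and method names are my own) uses this to look up the output of any elementary rule by number, and treats cells beyond the edges of a row as 0, which is an assumption about the boundary.

```java
// Elementary cellular automaton lookup. Bit k of the rule number (counting
// from the least significant bit) is the output for the neighborhood whose
// cells, read as a 3-bit binary number (left, center, right), equal k.
final class ElementaryRule {
    private ElementaryRule() {}

    static int apply(int rule, int left, int center, int right) {
        int index = (left << 2) | (center << 1) | right;
        return (rule >> index) & 1;
    }

    // Computes the next row; cells beyond either edge are treated as 0.
    static int[] nextRow(int rule, int[] row) {
        int[] next = new int[row.length];
        for (int i = 0; i < row.length; i++) {
            int left = i > 0 ? row[i - 1] : 0;
            int right = i < row.length - 1 ? row[i + 1] : 0;
            next[i] = apply(rule, left, row[i], right);
        }
        return next;
    }
}
```

Starting from a single live cell, `nextRow(30, new int[] {0, 0, 1, 0, 0})` gives `{0, 1, 1, 1, 0}`, and the row after that is `{1, 1, 0, 0, 1}`, the familiar beginning of the rule 30 triangle. Passing 90 or 110 instead of 30 runs those elementary rules with no other changes.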

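Here is Python code for processing rule 30 input:
# Rule 30 lookup table: (left, center, right) -> next state of the center cell.
RULE30 = {
    (1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

def rule30(left, center, right):
    return RULE30[(left, center, right)]

ROWS, COLS = 100, 100
grid = [[0] * COLS for _ in range(ROWS)]
grid[0][50] = 1  # Enable a single cell in the middle of row zero
for j in range(1, ROWS):
    last_row = grid[j - 1]
    for i in range(COLS):
        # Cells beyond the left and right edges are treated as 0.
        left = last_row[i - 1] if i > 0 else 0
        right = last_row[i + 1] if i < COLS - 1 else 0
        grid[j][i] = rule30(left, last_row[i], right)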


The first three cases in Boolean notation:
{ 1 1 1 } -> 0
{ 1 1 0 } -> 0
{ 1 0 1 } -> 0
...
The first few cases in the Scala programming language:

class CellularAutomataRule extends Rule {
  def rule(inputState: (Int, Int, Int)): Rules.Output =
    inputState match {
      case (1, 1, 1) => 0
      case (1, 1, 0) => 0
      case (1, 0, 1) => 0
      case (1, 0, 0) => 1
      ...
    }
} // End of Rule



Figure: Scala Example with pattern matching





Figure: Elementary Automata Grid after several iterations, look at image from top to bottom


Cellular Automata and Squaring Application

How do you square a number?

In most popular programming languages, you would use infix notation, with one operand on the left and one operand on the right side of an arithmetic operator. In Java, you might write the following code:
int x = 4 * 4;
Output : 16

The above snippet is valid code that multiplies four by four with a result of sixteen, but it does not say much about how the multiplication operator is actually implemented. There are many layers involved with that operation, and they aren't visible to the developer. Is the operation evaluated by the compiler or by the runtime environment? (For a constant expression such as 4 * 4, javac actually folds the result into the constant 16 at compile time.) Ultimately, most basic integer multiplication and addition is performed at the hardware level. So how does the hardware do it?

The figure below depicts an AND gate and its truth table: the gate takes two Boolean input values and returns the result of the AND operation. The output C is one only when both input A and input B are one; if one is entered into input A and zero into input B, the output C returned by the AND gate is zero. An arithmetic logic unit (ALU) performs basic Boolean operations and basic arithmetic. An ALU may combine AND, XOR and other similar simple gates to perform addition, increment, decrement or comparison operations. (Most of my comments focus on older-generation basic circuits; modern circuit design may not use such techniques or basic components.)




Figure: Boolean AND Gate, InputA, B and Output
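To make the jump from gates to arithmetic a little more concrete, here is a software sketch (not a model of any particular CPU) that builds addition out of single-bit AND, XOR and OR operations, the classic ripple-carry adder, and then multiplies with shift-and-add, one textbook way to multiply using only an adder. The class name and the 8-bit width are my own choices for illustration.

```java
// Gate-level arithmetic on 8-bit unsigned values. Each result bit is
// computed only with AND (&), XOR (^) and OR (|) on single 0/1 bits.
final class GateArithmetic {
    private GateArithmetic() {}

    // Ripple-carry adder: a chain of eight full adders, one per bit.
    static int add(int a, int b) {
        int sum = 0;
        int carry = 0;
        for (int bit = 0; bit < 8; bit++) {
            int x = (a >> bit) & 1;
            int y = (b >> bit) & 1;
            int s = x ^ y ^ carry;                 // full-adder sum bit
            carry = (x & y) | (carry & (x ^ y));   // full-adder carry out
            sum |= s << bit;
        }
        return sum; // the carry out of bit 7 is dropped, so results wrap modulo 256
    }

    // Shift-and-add multiplication: for each 1 bit in b, add a shifted copy of a.
    static int multiply(int a, int b) {
        int product = 0;
        for (int bit = 0; bit < 8; bit++) {
            if (((b >> bit) & 1) == 1) {
                product = add(product, (a << bit) & 0xFF);
            }
        }
        return product;
    }
}
```

`GateArithmetic.multiply(4, 4)` returns 16, the same answer as the Java expression `4 * 4`. Modern hardware typically uses faster multiplier circuits, so treat this only as an illustration of how simple gate rules compose into arithmetic.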


Starting from that basic piece of Java code, 4 * 4, many layers of software and hardware are involved in implementing the operation and returning a result.

I wanted to present basic Boolean arithmetic so that you can see how basic rules can lead to more complex patterns and behavior. One two-input AND gate generates a single Boolean result, yet several million logic circuits may be combined to build a complete CPU. You may already be familiar with Conway's Game of Life, where an initial grid is created with a random number of live cells. In the same spirit, we can use a simple cellular automaton program to square an integer, using the rules described in Wolfram's A New Kind of Science. After enough iterations, a pattern emerges, and that pattern holds the result of N * N. In our squaring example, we started with the input number of enabled cells (N = 4), and after a number of iterations a pattern emerged that contained the square of the input. Many of Wolfram's elementary rules use binary values for input and output. With the general CA squaring rule, each cell instead holds a value from 0 to 7.

Squaring Rule



Figure: Applet Visual Output Grid for Squaring Cellular Automata

CellularAutomaton[{
  {0, _, 3} -> 0,
  {_, 2, 3} -> 3,
  {1, 1, 3} -> 4,
  {_, 1, 4} -> 4,
  {1 | 2, 3, _} -> 5,
  {p : 0 | 1, 4, _} -> 7 - p,
  {7, 2, 6} -> 3,
  {7, _, _} -> 7,
  {_, 7, p : 1 | 2} -> p,
  {_, p : 5 | 6, _} -> 7 - p,
  {5 | 6, p : 1 | 2, _} -> 7 - p,
  {5 | 6, 0, 0} -> 1,
  {_, p : 1 | 2, _} -> p,
  {_, _, _} -> 0}, {
  ...
  Append[Table[1, {n}], 3], 0},
  (* Table[1, {n}] builds a list of n ones (the input N); Append adds a 3 to the end *)



Figure: Notebook Source File For Mathematica, General CA Rule for Squaring Automaton


The general rules for the squaring automaton are similar to the rules that were mentioned for the elementary rule 30 program, except that integer values (range 0 - 7) are used instead of binary inputs and outputs. The number of initial cells in the first row is given by the input parameter (N = 4 in our example): N cells with value 1 followed by a single 3.
Example Row: 0 0 0 0 0 1 1 1 1 3 0 0 0 0

Apart from the first row, the initial grid contains all zeros. In the next step, the squaring rule is applied to each cell of the first row, together with its left and right neighbors, to produce the second row. The rule is then applied to the second row to produce the third row, and so on until the last row in the grid has been reached. With a 100 x 100 grid, the output pattern will emerge before row 100 is reached.
class SquaringRule extends Rules.GeneralRule {
  def ruleId() = 132
  def rule(inputState: Rules.RuleInput): Rules.Output =
    inputState match {
      case (0, _, 3)         => 0
      case (_, 2, 3)         => 3
      case (1, 1, 3)         => 4
      case (_, 1, 4)         => 4
      case (1 | 2, 3, _)     => 5
      case (0 | 1, 4, _)     => 7 - inputState._1
      case (7, 2, 6)         => 3
      case (7, _, _)         => 7
      case (_, 7, 1 | 2)     => inputState._3
      case (_, 5 | 6, _)     => 7 - inputState._2
      case (5 | 6, 1 | 2, _) => 7 - inputState._2
      case (5 | 6, 0, 0)     => 1
      case (_, 1 | 2, _)     => inputState._2
      case _                 => 0
    }
} // End of Rule



Figure: Scala Source for Squaring Rule uses Pattern Matching
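For readers without Scala or Mathematica at hand, here is a rough Java port of the same rule table. It is my own translation, not part of the original applet: the cases are tried top to bottom and the first match wins, as in the Scala `match`, and `step` computes the next row while treating cells beyond the edges as 0 (an assumption about the boundary). The class name is hypothetical.

```java
// Java port of the squaring rule: the first matching case wins,
// in the same order as the Scala pattern match.
final class SquaringRuleJava {
    private SquaringRuleJava() {}

    private static boolean in(int v, int a, int b) {
        return v == a || v == b;
    }

    static int rule(int l, int c, int r) {
        if (l == 0 && r == 3) return 0;                 // (0, _, 3)
        if (c == 2 && r == 3) return 3;                 // (_, 2, 3)
        if (l == 1 && c == 1 && r == 3) return 4;       // (1, 1, 3)
        if (c == 1 && r == 4) return 4;                 // (_, 1, 4)
        if (in(l, 1, 2) && c == 3) return 5;            // (1 | 2, 3, _)
        if (in(l, 0, 1) && c == 4) return 7 - l;        // (0 | 1, 4, _)
        if (l == 7 && c == 2 && r == 6) return 3;       // (7, 2, 6)
        if (l == 7) return 7;                           // (7, _, _)
        if (c == 7 && in(r, 1, 2)) return r;            // (_, 7, 1 | 2)
        if (in(c, 5, 6)) return 7 - c;                  // (_, 5 | 6, _)
        if (in(l, 5, 6) && in(c, 1, 2)) return 7 - c;   // (5 | 6, 1 | 2, _)
        if (in(l, 5, 6) && c == 0 && r == 0) return 1;  // (5 | 6, 0, 0)
        if (in(c, 1, 2)) return c;                      // (_, 1 | 2, _)
        return 0;                                       // _
    }

    // Applies the rule to every cell of a row to produce the next row.
    static int[] step(int[] row) {
        int[] next = new int[row.length];
        for (int i = 0; i < row.length; i++) {
            int l = i > 0 ? row[i - 1] : 0;
            int r = i < row.length - 1 ? row[i + 1] : 0;
            next[i] = rule(l, row[i], r);
        }
        return next;
    }
}
```

Starting from four 1s followed by a 3, as the notebook's `Append[Table[1, ...], 3]` builds it, the first two steps turn `0 1 1 1 1 3 0 0` into `0 1 1 1 4 5 0 0` and then `0 1 1 4 6 2 1 0`.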


Applied Cellular Automata

Cellular automata have been applied to data compression, cryptography, artificial intelligence, urban planning, financial market modeling, music generation and 3D terrain generation. If you are a software engineer, you may have to step back and consider how cellular automata patterns emerge and understand the nature of the dynamic system before looking for a typical software library. CA is not normally seen in everyday applications. When you look at some seemingly random pattern, don't think of it as a random sequence of events that cannot be replicated; think of it in terms of a cellular automaton, and try to imagine the rules that could model that natural behavior. Modeling seemingly random patterns is one area where cellular automata are widely used. For example, urban planning departments are integrating geographic information systems (GIS) with cellular automata in an attempt to predict growth in areas of a city.

Summary

The simple squaring example mentioned in this post merely gives you an overview of a basic cellular automata system. Scientists, biologists, computer scientists and software engineers want to find better ways to observe the relationships and patterns that occur in our world. Review Stephen Wolfram's A New Kind of Science to get an idea of what is possible with seemingly simple rules.

Source Code and Applet

1. doingitwrongnotebook/wiki/CelluarAutomataSquaringApplet
2. SVN source repository directory
3. CelluarAutomataApplet - Test of Elementary Rules
4. Game of Life Applet
5. Full Download Applet Examples (keywords: Scala, Rule30, Rule190, GameOfLife, Wolfram Squaring Rule)



Figure: Squaring Cellular Automaton Output, Input = 4 (top of grid), Output = 16 (pattern towards the bottom)

Resources
1. http://www.wolframscience.com/
2. http://www.scala-lang.org/ - Scala Programming Language

--- Berlin Brown (2012)

Tuesday, January 29, 2013

Internals of the OpenJDK - HashMap

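Here is the implementation of HashMap from OpenJDK 6. It is interesting how simple it truly is. Essentially, HashMap consists of an array of entries called 'table'. On a 'put' call, it takes the key's hashCode(), passes it through a second, supplemental hash function, and converts the result into an index into the array (hash & (table.length - 1)). A new entry holding the key and the 'value' object is placed at that index; if the bucket already holds entries, the new entry is chained in front of them as a linked list, and if the key is already present, its value is simply replaced.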
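To see how little machinery is involved, here is a stripped-down Java version of the same design. It is my own sketch, not the OpenJDK source, and `SimpleHashMap` is a hypothetical name. The supplemental hash follows the bit-spreading idea of OpenJDK 6's hash() method; there is no resizing, no special handling of null keys and no thread safety.

```java
import java.util.Objects;

// Array of buckets + supplemental hash + chained entries.
final class SimpleHashMap<K, V> {
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Capacity must be a power of two so that indexFor can use a bit mask.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // Spreads the bits of hashCode() so keys that differ only in their
    // high bits do not all land in the same bucket.
    private static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    private static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    V put(K key, V value) {
        int hash = hash(Objects.requireNonNull(key).hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                V old = e.value;
                e.value = value; // existing key: replace the value
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // new head of the chain
        return null;
    }

    V get(K key) {
        int hash = hash(Objects.requireNonNull(key).hashCode());
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same hashCode(), so they always share a bucket; both can still be stored and retrieved because `get` walks the chain and compares keys with `equals`.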

OpenJDK HashMap Implementation

Friday, December 21, 2012

Basic word frequency analysis

Here are some interesting terms in the Democratic presidential debate from 2008:

I believe we're at a defining moment in our history. Our nation is at war; our planet is in peril....


-------------------------------

Total Count of most terms : 9125
Interesting Word Freq Count: 1952
-------------------------------
id=1 ct=112(39.16%) term=think
id=2 ct=101(35.31%) term=applause
id=3 ct=97(33.92%) term=clinton
id=4 ct=97(33.92%) term=people
id=5 ct=85(29.72%) term=senator
id=6 ct=66(23.08%) term=health
id=7 ct=62(21.68%) term=obama
id=8 ct=56(19.58%) term=care
id=9 ct=56(19.58%) term=blitzer
id=10 ct=47(16.43%) term=right
id=11 ct=44(15.38%) term=president
id=12 ct=40(13.99%) term=country
id=13 ct=35(12.24%) term=make
id=14 ct=34(11.89%) term=plan
id=15 ct=32(11.19%) term=question
id=16 ct=30(10.49%) term=believe
id=17 ct=30(10.49%) term=important
id=18 ct=28(9.79%) term=issue
id=19 ct=28(9.79%) term=take
id=20 ct=27(9.44%) term=time
id=21 ct=26(9.09%) term=years
id=22 ct=26(9.09%) term=american
id=23 ct=25(8.74%) term=first
id=24 ct=24(8.39%) term=insurance
id=25 ct=23(8.04%) term=bush
id=26 ct=23(8.04%) term=part
id=27 ct=21(7.34%) term=iraq
id=28 ct=20(6.99%) term=year
id=29 ct=20(6.99%) term=million
id=30 ct=19(6.64%) term=need
id=31 ct=19(6.64%) term=united
id=32 ct=19(6.64%) term=states
id=33 ct=18(6.29%) term=over
id=34 ct=18(6.29%) term=able
id=35 ct=17(5.94%) term=change
id=36 ct=17(5.94%) term=immigration
id=37 ct=17(5.94%) term=trying
id=38 ct=17(5.94%) term=work
id=39 ct=17(5.94%) term=clear
id=40 ct=17(5.94%) term=loo

Contrast this word frequency data with an Obama and Romney debate in 2012:

-------------------------------
Total Count of most terms : 10361
Interesting Word Freq Count: 1853
-------------------------------
id=1 ct=148(40.33%) term=romney
id=2 ct=109(29.70%) term=people
id=3 ct=106(28.88%) term=governor
id=4 ct=102(27.79%) term=president
id=5 ct=101(27.52%) term=make
id=6 ct=89(24.25%) term=obama
id=7 ct=86(23.43%) term=crowley
id=8 ct=72(19.62%) term=jobs
id=9 ct=71(19.35%) term=question
id=10 ct=66(17.98%) term=years
id=11 ct=44(11.99%) term=four
id=12 ct=43(11.72%) term=think
id=13 ct=41(11.17%) term=percent
id=14 ct=40(10.90%) term=country
id=15 ct=40(10.90%) term=energy
id=16 ct=40(10.90%) term=last
id=17 ct=35(9.54%) term=economy
id=18 ct=34(9.26%) term=down
id=19 ct=31(8.45%) term=right
id=20 ct=31(8.45%) term=america
id=21 ct=30(8.17%) term=back
id=22 ct=28(7.63%) term=women
id=23 ct=27(7.36%) term=time
id=24 ct=26(7.08%) term=need
id=25 ct=26(7.08%) term=believe
id=26 ct=26(7.08%) term=able
id=27 ct=26(7.08%) term=good
id=28 ct=26(7.08%) term=million
id=29 ct=25(6.81%) term=folks
id=30 ct=25(6.81%) term=plan
id=31 ct=24(6.54%) term=year
id=32 ct=24(6.54%) term=number
id=33 ct=24(6.54%) term=work
id=34 ct=23(6.27%) term=cant
id=35 ct=23(6.27%) term=american
id=36 ct=23(6.27%) term=done
id=37 ct=23(6.27%) term=small
id=38 ct=23(6.27%) term=place
id=39 ct=23(6.27%) term=part
id=40 ct=22(5.99%) term=over

And here is the GOP debate:



-------------------------------
Total Count of most terms : 12322
Interesting Word Freq Count: 2436
-------------------------------
id=1 ct=156(35.78%) term=king
id=2 ct=103(23.62%) term=people
id=3 ct=97(22.25%) term=right
id=4 ct=93(21.33%) term=president
id=5 ct=80(18.35%) term=question
id=6 ct=74(16.97%) term=states
id=7 ct=73(16.74%) term=government
id=8 ct=57(13.07%) term=think
id=9 ct=52(11.93%) term=need
id=10 ct=51(11.70%) term=john
id=11 ct=51(11.70%) term=governor
id=12 ct=46(10.55%) term=country
id=13 ct=46(10.55%) term=back
id=14 ct=44(10.09%) term=united
id=15 ct=43(9.86%) term=cain
id=16 ct=43(9.86%) term=take
id=17 ct=43(9.86%) term=romney
id=18 ct=42(9.63%) term=hampshire
id=19 ct=42(9.63%) term=paul
id=20 ct=42(9.63%) term=first
id=21 ct=41(9.40%) term=candidates
id=22 ct=40(9.17%) term=jobs
id=23 ct=39(8.94%) term=state
id=24 ct=39(8.94%) term=time
id=25 ct=39(8.94%) term=federal
id=26 ct=38(8.72%) term=pawlenty
id=27 ct=36(8.26%) term=down
id=28 ct=35(8.03%) term=american
id=29 ct=34(7.80%) term=believe
id=30 ct=34(7.80%) term=america
id=31 ct=34(7.80%) term=economy
id=32 ct=33(7.57%) term=years
id=33 ct=32(7.34%) term=obama
id=34 ct=32(7.34%) term=bachmann
id=35 ct=32(7.34%) term=applause
id=36 ct=32(7.34%) term=money
id=37 ct=31(7.11%) term=issue
id=38 ct=31(7.11%) term=thank
id=39 ct=30(6.88%) term=over
id=40 ct=30(6.88%) term=santorum
id=41 ct=30(6.88%) term=look
id=42 ct=29(6.65%) term=program
id=43 ct=28(6.42%) term=work
id=44 ct=26(5.96%) term=things
id=45 ct=26(5.96%) term=care
id=46 ct=25(5.73%) term=make
id=47 ct=25(5.73%) term=percent
id=48 ct=25(5.73%) term=doing
id=49 ct=24(5.50%) term=obamacare
id=50 ct=24(5.50%) term=where
id=51 ct=24(5.50%) term=administration
id=52 ct=24(5.50%) term=national
id=53 ct=24(5.50%) term=private
id=54 ct=24(5.50%) term=other
id=55 ct=23(5.28%) term=republican
id=56 ct=23(5.28%) term=break
id=57 ct=23(5.28%) term=congressman
id=58 ct=23(5.28%) term=tonight
id=59 ct=23(5.28%) term=senator
id=60 ct=23(5.28%) term=questions
id=61 ct=22(5.05%) term=gingrich
id=62 ct=22(5.05%) term=issues
id=63 ct=21(4.82%) term=medicare
id=64 ct=20(4.59%) term=problem
id=65 ct=20(4.59%) term=life
id=66 ct=20(4.59%) term=cant
id=67 ct=20(4.59%) term=wrong
id=68 ct=20(4.59%) term=continue
id=69 ct=20(4.59%) term=party
id=70 ct=20(4.59%) term=tell
id=71 ct=20(4.59%) term=done
id=72 ct=20(4.59%) term=give
id=73 ct=19(4.36%) term=answer
id=74 ct=19(4.36%) term=start
id=75 ct=19(4.36%) term=policy
id=76 ct=19(4.36%) term=congress
id=77 ct=19(4.36%) term=last
id=78 ct=19(4.36%) term=speaker
id=79 ct=18(4.13%) term=thing
id=80 ct=18(4.13%) term=plan
id=81 ct=18(4.13%) term=debate
id=82 ct=18(4.13%) term=point
id=83 ct=17(3.90%) term=shouldnt
id=84 ct=17(3.90%) term=world
id=85 ct=17(3.90%) term=could
id=86 ct=17(3.90%) term=bill
id=87 ct=17(3.90%) term=home
id=88 ct=17(3.90%) term=little
id=89 ct=16(3.67%) term=conversation
id=90 ct=16(3.67%) term=support
id=91 ct=16(3.67%) term=republicans
id=92 ct=16(3.67%) term=didnt
id=93 ct=16(3.67%) term=better
id=94 ct=16(3.67%) term=maybe
id=95 ct=16(3.67%) term=keep
id=96 ct=15(3.44%) term=made
id=97 ct=15(3.44%) term=year
id=98 ct=15(3.44%) term=again

Here is the data for several job resumes:



-------------------------------
Total Count of most terms : 1967
Interesting Word Freq Count: 974
-------------------------------
id=1 ct=38(50.67%) term=software
id=2 ct=21(28.00%) term=linux
id=3 ct=20(26.67%) term=developed
id=4 ct=20(26.67%) term=using
id=5 ct=19(25.33%) term=data
id=6 ct=16(21.33%) term=code
id=7 ct=14(18.67%) term=experience
id=8 ct=13(17.33%) term=engineer
id=9 ct=12(16.00%) term=image
id=10 ct=12(16.00%) term=computer
id=11 ct=11(14.67%) term=java
id=12 ct=10(13.33%) term=programming
id=13 ct=10(13.33%) term=design
id=14 ct=10(13.33%) term=windows
id=15 ct=10(13.33%) term=metrics
id=16 ct=10(13.33%) term=graphics
id=17 ct=9(12.00%) term=languages
id=18 ct=9(12.00%) term=realtime
id=19 ct=9(12.00%) term=over
id=20 ct=9(12.00%) term=maintained
id=21 ct=9(12.00%) term=development
id=22 ct=8(10.67%) term=developer
id=23 ct=8(10.67%) term=used
id=24 ct=8(10.67%) term=algorithms
id=25 ct=8(10.67%) term=machine
id=26 ct=7(9.33%) term=processing
id=27 ct=7(9.33%) term=python
id=28 ct=7(9.33%) term=team
id=29 ct=7(9.33%) term=worked
id=30 ct=7(9.33%) term=helped
id=31 ct=7(9.33%) term=years
id=32 ct=7(9.33%) term=university
id=33 ct=7(9.33%) term=game
id=34 ct=7(9.33%) term=perl
id=35 ct=7(9.33%) term=google
id=36 ct=6(8.00%) term=video
id=37 ct=6(8.00%) term=project
id=38 ct=6(8.00%) term=rendering
id=39 ct=6(8.00%) term=monica
id=40 ct=6(8.00%) term=learning
id=41 ct=6(8.00%) term=senior
id=42 ct=6(8.00%) term=product
id=43 ct=6(8.00%) term=technology
id=44 ct=6(8.00%) term=santa
id=45 ct=6(8.00%) term=application
id=46 ct=6(8.00%) term=engineering
id=47 ct=6(8.00%) term=server
id=48 ct=6(8.00%) term=skills
id=49 ct=6(8.00%) term=shiraz
id=50 ct=6(8.00%) term=research
id=51 ct=5(6.67%) term=advanced
id=52 ct=5(6.67%) term=animation
id=53 ct=5(6.67%) term=applications
id=54 ct=5(6.67%) term=designed
id=55 ct=5(6.67%) term=pipeline
id=56 ct=5(6.67%) term=towards
id=57 ct=5(6.67%) term=port
id=58 ct=5(6.67%) term=optimized
id=59 ct=5(6.67%) term=networking
id=60 ct=5(6.67%) term=audacity
id=61 ct=5(6.67%) term=microsoft
id=62 ct=5(6.67%) term=parallel
id=63 ct=5(6.67%) term=audio
id=64 ct=5(6.67%) term=network
id=65 ct=5(6.67%) term=javascript
id=66 ct=5(6.67%) term=aphrodite
id=67 ct=5(6.67%) term=wrote
id=68 ct=5(6.67%) term=implemented
id=69 ct=5(6.67%) term=technical
id=70 ct=5(6.67%) term=responsible
id=71 ct=5(6.67%) term=custom
id=72 ct=5(6.67%) term=systems
id=73 ct=5(6.67%) term=other
id=74 ct=5(6.67%) term=researched

Here is some data on job descriptions:



-------------------------------
Total Count of most terms : 918
Interesting Word Freq Count: 479
-------------------------------
id=1 ct=23(92.00%) term=experience
id=2 ct=13(52.00%) term=development
id=3 ct=12(48.00%) term=software
id=4 ct=12(48.00%) term=systems
id=5 ct=10(40.00%) term=design
id=6 ct=9(36.00%) term=security
id=7 ct=8(32.00%) term=java
id=8 ct=8(32.00%) term=skills
id=9 ct=8(32.00%) term=plus
id=10 ct=7(28.00%) term=required
id=11 ct=7(28.00%) term=must
id=12 ct=6(24.00%) term=projects
id=13 ct=6(24.00%) term=computer
id=14 ct=6(24.00%) term=strong
id=15 ct=6(24.00%) term=network
id=16 ct=6(24.00%) term=work
id=17 ct=5(20.00%) term=netwitness
id=18 ct=5(20.00%) term=applications
id=19 ct=5(20.00%) term=team
id=20 ct=5(20.00%) term=requirements
id=21 ct=5(20.00%) term=spring
id=22 ct=5(20.00%) term=science
id=23 ct=5(20.00%) term=information
id=24 ct=5(20.00%) term=solutions
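The code behind these listings isn't shown in the post, so the following is only a hypothetical sketch of basic word frequency analysis, not the original program. Its stop list, minimum word length and alphabetical tie-breaking are my assumptions, and it does not try to reproduce the percentage column above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Lowercase the text, drop punctuation (so "can't" becomes "cant"), skip
// short words and stop words, count the rest and sort by descending count.
final class WordFrequency {
    private WordFrequency() {}

    // Illustrative stop list; a real analysis would use a much longer one.
    private static final Set<String> STOP_WORDS =
            Set.of("that", "this", "with", "have", "what", "they", "will", "about");

    static List<Map.Entry<String, Integer>> topTerms(String text, int minLength, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z\\s]", "");
        for (String word : cleaned.split("\\s+")) {
            if (!word.isEmpty() && word.length() >= minLength && !STOP_WORDS.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so the output is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(limit, entries.size()));
    }
}
```

With a minimum length of 4, words such as "and" or "war" are skipped, which is worth keeping in mind when comparing lists like the ones above.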

Tuesday, December 18, 2012