Posts

Showing posts from February, 2008

More test coverage (rspec, scalacheck, and hunit)

I added more test coverage to botlist, basically adding tests by language. I don't have much to add to this entry except to say: here are some examples of how I used these test frameworks in botlist. Links to the actual libraries are at the bottom of the post.

RSpec
http://openbotlist.googlecode.com/svn/trunk/openbotlist/tests/integration/ruby/rspec

# create mock tests
include_class 'org.spirit.bean.impl.BotListCoreUsers' unless defined? BotListCoreUsers
include_class 'org.spirit.util.BotListUniqueId' unless defined? BotListUniqueId
include_class 'org.acegisecurity.providers.encoding.Md5PasswordEncoder' unless defined? Md5PasswordEncoder
include_class 'org.spirit.bean.impl.BotListProfileSettings' unless defined? BotListProfileSettings
include_class 'org.spirit.contract.BotListContractManager' unless defined? BotListContractManager
include_class 'java.text.SimpleDateFormat' unless defined? SimpleDateFormat
include_c

Adding rspecs with jruby for the spring framework

I will have to discuss this in future blog entries, but here is an rspec helper script for setting up rspec and spring (with jruby):

###
### Author: Berlin Brown
### spec_helper.rb
### Date: 2/22/2008
### Description: RSpec, JRuby helper for setting up
###   spring rspec tests for botlist
###
lib_path = File.expand_path("#{File.dirname(__FILE__)}/../../lib")
$LOAD_PATH.unshift lib_path unless $LOAD_PATH.include?(lib_path)

require 'spec'
require 'java'

include_class('java.lang.String') { 'JString' }
include_class('java.lang.System') { 'JSystem' }

# Have to manually find the spring config files
spring_config = File.expand_path("#{File.dirname(__FILE__)}/../../../../WEB-INF/botlistings-servlet.xml")
spring_util_config = File.expand_path("#{File.dirname(__FILE__)}/../../../../WEB-INF/spring-botlist-util.xml")
puts spring_config

# The logs will get output here
JSystem.getProperties().put

Botlist Hackathon - Adding test/build server and Lisp web frontend

It is going to be a fun weekend. I am building a simple build server/jobs that will create daily builds. Also, I plan on creating more test coverage for the botlist system. Fun, fun, fun. Botlist is built with many languages, so it will be interesting to build a complete test suite. In case you are interested, here are the languages actively being used, in order of use (leaving out html, xml, bash and other misc items):
1. Ruby/JRuby - Main web front-end (see botspiritcompany.com).
2. Python - Misc scripting tasks, web scraping, etc.
2b. Python/Django - New web front-end.
3. Java/J2EE/SpringMVC - Part of the main web front-end.
4. Haskell - Text processing back-end.
5. Erlang - Web scraping, IRC bot.
6. Lisp - New web front-end.
Other notables:
A. Perl - for misc scripting tasks
B. Factor - used for the web test framework; I started it but didn't get to work on it further. Still a powerful language for such tasks, probably better than some above.
Updated: Actually, if you really want to

TDD in one sentence, "Only ever write code to fix a failing test"

"Only ever write code to fix a failing test" EOM

Great application

Great application for persisting key/value data; it seems to have just been released: Memcachedb. http://code.google.com/p/memcachedb/wiki/Performance

More fun this week, Analysis for wikipedia data

Analysis for WEX: http://blog.freebase.com/?p=108 "Growing at approximately 1,700 articles a day, Wikipedia is a significant repository of human knowledge. With its focus and depth, Wikipedia has emerged as a public good of information, fueling a small industry of computer science research. And though Wikipedia contains a wealth of collective knowledge, due to its idiosyncratic markup and semi-structured design, developers wishing to utilize this resource each incur significant start-up costs simply handling, parsing and decoding the raw corpus."

Semantic web indexing

Here is a good article on semantic web indexing. "The most surprising figure here is probably the abundance of FOAF namespaced arcs, this appears to be largely due to the FOAF data automatically generated by services such as Live Journal (http://livejournal.com/) which, of the documents indexed so far account for 89% of the documents using the FOAF namespace." FOAF Semantic indexing from w3

Makings of a simple web scraper in Erlang

This code parses a web page and tokenizes the content. The code uses Joe's www_tools library, and I was trying to get the rfc4627 code to parse unicode documents. That particular code is a work in progress. Ultimately, I would like to be able to use this code to crawl FOAF documents. Simple driver code (uses url.erl and disk_cache):

%%
%% Simple Statistic Analysis of social networking sites
%% Author: Berlin Brown
%% Date: 2/12/2008
%%
-module(socialstats).
-export([start_social/0]).
-import(url, [test/0, raw_get_url/2, start_cache/1, stop_cache/0]).
-import(rfc4627, [unicode_decode/1]).
-import(html_analyze, [disk_cache_analyze/1]).

-define(SocialURL, "http://botnode.com/").

start_social() ->
    io:format("*** Running social statistics~n"),
    %% First, setup the URL disk cache
    url:start_cache("db_cache/socialstats.dc"),
    case url:raw_get_url(?SocialURL, 60000) of
        {ok, Data} ->
            io:format("Dat

Botnode wiki re-released (language wiki site)

I have been wanting to create a multi-language programming wiki for a while now. This is it. Basically the botnode wiki will contain a collection of code snippets in various languages (haskell, erlang, scala, etc). http://www.botnode.com/botwiki/index.php?title=Main_Page

Apparently the junglerl www_tools has issues

I am guessing that the www_tools erlang library doesn't send a valid HTTP request, because I can't even get a valid response from a simple lighttpd based page. Sigh, I guess I have to fix it. In any case, here is the code I am testing.

-module(socialstats).
-export([start_social/0]).
-import(url, [test/0, raw_get_url/2]).

start_social() ->
    io:format("*** Running social statistics~n"),
    case url:raw_get_url("http://botnode.com", 80) of
        {ok, Data} ->
            io:format("Data: ~p ~n", [Data]),
            {ok, Data};
        {error, What} ->
            io:format("ERR:~p ~n", [What]),
            {error, What}
    end,
    io:format("*** Done [!]~n").
%% End of File

Wikipedia Definition: Erlang programming language

http://en.wikipedia.org/wiki/Erlang_(programming_language) "Erlang is a general-purpose concurrent programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing. For concurrency it follows the Actor model. It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It supports hot swapping so code can be changed without stopping a system. [1] Erlang was originally a proprietary language within Ericsson, but was released as open source in 1998. The Ericsson implementation primarily runs interpreted virtual machine code, but it also includes a native code compiler (not supported on all platforms), developed by the High-Performance Erlang Project (HiPE) at Uppsala University. It also now supports interpretation via escript as of r11b-4."

Paul Graham and Design

"Here it is: I like to find (a) simple solutions (b) to overlooked problems (c) that actually need to be solved, and (d) deliver them as informally as possible, (e) starting with a very crude version 1, then (f) iterating rapidly." So true, so true.

Blogspam - SQLite Article on Atomic transactions

http://www.sqlite.org/atomiccommit.html "An important feature of transactional databases like SQLite is "atomic commit". Atomic commit means that either all database changes within a single transaction occur or none of them occur. With atomic commit, it is as if many different writes to different sections of the database file occur instantaneously and simultaneously. Real hardware serializes writes to mass storage, and writing a single sector takes a finite amount of time. So it is impossible to truly write many different sectors of a database file simultaneously and/or instantaneously. But the atomic commit logic within SQLite makes it appear as if the changes for a transaction are all written instantaneously and simultaneously."
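Not from the article, but here is a minimal sketch of what that guarantee looks like from application code, using Python's built-in sqlite3 module (the table and values are made up for illustration):

# Both updates belong to one transaction: they become visible together on
# commit, or not at all if the transaction is rolled back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'a'")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'b'")
    conn.commit()      # atomic commit: both changes appear at once
except sqlite3.Error:
    conn.rollback()    # ... or neither change appears

print(conn.execute("SELECT name, balance FROM accounts").fetchall())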

Find the truth

Because there is no other path.

Broken Saints Series Review

I posted this to Amazon; a review of the Broken Saints series. I don't even know what to write. I just finished watching the entire thing and am going: awesome, awesome, awesome, awesome. Amazing. If you can dream up the perfect story that combines young, old, technology, religion, good, bad and put it together, Broken Saints will be 1000 times better than anything you could come up with. It is part Cyberpunk, part religious tale, part storytelling. Truly, truly amazed. In terms of Anime or other things that are considered different or strange, Broken Saints is better than Akira, probably better than some of the Ghost in the Shell series. It doesn't really compare to any hollywood stories, but it beats the story of Lord of the Rings. Good job. I was lucky that I was able to experience this. Anybody who gives this a bad review probably didn't watch most of it, has a really low IQ, or is flat out crazy. You can easily ignore the bad ratings, I almost listened to them and missed o

Joy compared with other functional programming languages

http://www.latrobe.edu.au/philosophy/phimvt/joy/j08cnt.html "Joy is a functional programming language which is not based on the application of functions to arguments but on the composition of functions. This paper compares and contrasts Joy with the theoretical basis of other functional formalisms and the programming languages based on them. One group comprises the lambda calculus and the programming languages Lisp, ML and Miranda. Another comprises combinatory logic and the language FP by Backus. A third comprises Cartesian closed categories. The paper concludes that Joy is significantly different from any of these formalisms and programming languages."
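The application-versus-composition distinction is easy to sketch outside of Joy; here is a rough Python illustration (my own example, not from the paper):

# Applicative style: apply functions to an argument, innermost first.
def square(x): return x * x
def succ(x):   return x + 1

applied = succ(square(3))             # 10

# Compositional style (closer in spirit to Joy): build a new function
# out of existing functions first, then apply the result.
def compose(f, g):
    return lambda x: f(g(x))

succ_of_square = compose(succ, square)
composed = succ_of_square(3)          # also 10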

ANN: Major Release: Botlist 0.5 Valentine Release, would you like some cake?

This is a big release; it won't be visible on the web frontend, but botlist is morphing into the creation that I envisioned. Here is where we are and where we are going:
(1) Find information from RSS feeds (almost complete, but functional)
(2) Find interesting articles from raw online content (getting there, part of the Valentine release)
(3) Extract content from the web and convert the raw information into machine readable format, semantic web (the future of botlist)
And don't forget to visit botlist to see the new updates. The spirits of the bots are alive. http://www.botspiritcompany.com

Look out for: Reuters, semantic web and Calais

http://opencalais.com/ "What is Calais? We want to make all the world's content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the semantic web or the Giant Global Graph - we call our piece of it Calais. The core of Calais is our web service. We're working to make this service more accessible by developing sample applications, supporting developers and offering bounties for specific capabilities. For more information - please visit our FAQ." I just heard about this link through reddit; this is the kind of system that botlist could be.

Python script; check running process

If you launch a long-running process, sometimes you don't want to relaunch the script while the other process is still running. There are bash-oriented ways of checking for this, but I wanted to make this complicated and use a more robust language. Here is a script that checks a PID file, greps the 'ps aux' output for a particular name, and returns a 0 exit code if the process is not running.

"""
Berlin Brown
Date: 2/2/2008
Copyright: Public Domain

Utility for checking if a process is running.
Versions: Should work with python 2.4+

Use case includes:
 * If PID file found, read the contents
 * If PID file found or not found, also check the 'ps aux' status of the
   script to make sure that the script is not running.

Additional FAQ:
 * What if the PID file gets created but does not get removed?
   + In this scenario, we need to issue a 'force' command. But also, check
     the running process with the 'ps aux' command.

Script/
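The excerpt above is cut off, so here is a rough, self-contained sketch of the idea described (read the PID file, also grep 'ps aux', and exit 0 only when the process is not running). The file path and process name below are hypothetical, not the actual botlist values.

#!/usr/bin/env python
# Rough sketch only; PID_FILE and PROCESS_NAME are made-up examples.
import os
import subprocess
import sys

PID_FILE = "/tmp/myscript.pid"       # hypothetical PID file path
PROCESS_NAME = "myscript.py"         # hypothetical process name to grep for

def pid_file_running(path):
    """Return True if the PID file exists and that PID is still alive."""
    if not os.path.exists(path):
        return False
    try:
        pid = int(open(path).read().strip())
        os.kill(pid, 0)              # signal 0 only checks for existence
        return True
    except (ValueError, OSError):
        return False

def ps_aux_running(name):
    """Return True if the 'ps aux' output mentions the process name."""
    out = subprocess.Popen(["ps", "aux"],
                           stdout=subprocess.PIPE).communicate()[0]
    return name in out.decode("utf-8", "ignore")

if __name__ == "__main__":
    running = pid_file_running(PID_FILE) or ps_aux_running(PROCESS_NAME)
    sys.exit(1 if running else 0)    # exit 0 means it is safe to launch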

My FOAF profile at livejournal

Here is my FOAF/RDF profile from livejournal. I would also like to scan foaf repositories with botlist. That is a future enhancement. http://berlinbrown.livejournal.com/data/foaf

Why do I only post code snippets without much explanation?

The first reason is that I eat, drink, and breathe code; code of many different paradigms and idioms. When I try to explain a topic, I just can't help throwing the code out there. You may not even know it, but I am selective about the code that I throw at you. For those expecting a detailed analysis of the examples, you won't find that here. I hope the code snippets are useful; I post them because I can't find similar examples out on the web. Most of them are practical, procedural examples as opposed to looks at particular aspects of the language. For example, I posted an entry on XML processing in Scala. It introduced a couple of concepts that you may not have seen elsewhere: working with existing java code, simple code from a model class to XML, simple liftweb responses. Enjoy.

Scala and Lift snippet: taste of XML with Scala and Lift for simple XML over HTTP RPC protocol

The botlist application is a distributed system. Bots/agents run in the background on some remote machine and send payloads to a web front end server; in this case, a J2EE server (botlist). Here is some of the code that makes that happen. On the receiving end, a liftweb based application running on Tomcat: the remote_agent method is associated with the remote_agent URI. It returns an XML response when a GET request is encountered. The remote_agent_send function is used to process POST requests from the stand-alone client.

import java.util.Random

import org.springframework.context.{ApplicationContext => AC}
import org.spirit.dao.impl.{BotListUserVisitLogDAOImpl => LogDAO}
import org.spirit.dao.impl.{BotListSessionRequestLogDAOImpl => SessDAO}
import org.spirit.bean.impl.{BotListUserVisitLog => Log}
import org.spirit.bean.impl.{BotListSessionRequestLog => Sess}

import net.liftweb.http._
import net.liftweb.http.S._
import net.liftweb.http.S

import scala.xml.{NodeSeq, Te

Web page DNA

I am developing a type of web page DNA. If you look at any webpage, what are some of its characteristics? Things that might stick out. A human can easily tell if a page is interesting or not, but how would a bot do it?
1. For example, botlist may extract the following information from a page (see the sketch below):
   linktype: ()
   views: 23
   links: 4
   images: 6
   para: 7
   chars: 8
   proctime: 10
   objid: 123sdfsdf
2. Some other interesting things might include the last-modified date or host name, for example.
3. Keywords and description are always important.
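As a rough illustration (my own sketch, not the actual botlist extraction code), a few of those counts can be pulled straight out of the raw HTML with the Python standard library:

# Sketch: compute a handful of "page DNA" features from an HTML string.
# The feature names mirror the list above; the counting is deliberately naive.
import re

def page_dna(html):
    text = re.sub(r"<[^>]+>", "", html)   # strip tags for the character count
    return {
        "links":    len(re.findall(r"<a[\s>]", html, re.I)),
        "images":   len(re.findall(r"<img[\s>/]", html, re.I)),
        "para":     len(re.findall(r"<p[\s>]", html, re.I)),
        "chars":    len(text),
        "keywords": re.findall(r'<meta\s+name="keywords"\s+content="([^"]*)"',
                               html, re.I),
    }

sample = "<html><p>hello</p><a href='#'>x</a><img src='y.png'/></html>"
print(page_dna(sample))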

Botlist, the only medium-sized web technology where one programming language was not enough

Here are the programming language technologies that are used with botlist. If you are interested in the source, it is all freely available.
http://code.google.com/p/openbotlist/
http://www.botspiritcompany.com/botlist/
Web Front End:
  Java - bean classes/some view logic (pojos used with hibernate)
  JRuby - business logic, database connectivity
  Spring Framework - J2EE framework
  Hibernate - ORM framework
  Scala/Lift - business logic, XML-HTTP api
Future additions:
  Python Django - additional web front end
  Lisp web server - additional web front end
Back End:
  Python - web crawling
  Haskell - Text mining analysis
  Scala - Remote APIs

Said what I have been thinking; session state is evil

I am surprised I missed this on the blogosphere; David tells the truth: session state is evil. If you have worked with low-level HTTP applications, really getting at HTTP, then you know this is true. If you write basic ASP pages and save variables, you might not have to deal with this issue. http://davidvancouvering.blogspot.com/2007/09/session-state-is-evil.html Saving variables on the server side is an easy thing to do. But once you start to get millions of users and rely on the application server to maintain state for each user, it gets tricky. And trusting the application server may not be the best idea. I will let you read the article and let you decide.

Accessing the spring framework from LiftWeb

One of the benefits (if you can figure it out) of working with the JVM languages is the ability to integrate technologies. One of the problems is how to do so. The botlist web application is built on JRuby and Spring, and I am now building future functionality with Scala/Lift and Spring. If you have worked with spring, the spring ApplicationContext provides the link between the servlet world and the spring world. In this lift example, I extract the application context through the http servlet request and the session instance.

def getAC(request: HttpServletRequest) = {
  val sess = request.getSession
  val sc = sess.getServletContext
  // Cast to the application context
  val acobj = sc.getAttribute("org.springframework.web.servlet.FrameworkServlet.CONTEXT.botlistings")
  acobj.asInstanceOf[AC]
}

After getting the application context, it is fairly straightforward to access the spring bean objects.

// Cast to the user visit log bean (defined in the spring configuration)
val log_obj

Person month calculations on an opensource project

I was browsing the web and came upon the botlist project on koders.com. Koders.com archives the source of various projects and adds the source to their search engine. It also collects interesting statistics on a particular project. Here are the botlist numbers:
  Development Cost: $135,685
  Lines of code: 27,137
  Person months (PM): 27.14
  Labor Cost/Month: $5000
Here is a larger project (jboss):
  Development Cost: $8,479,025
  Lines of code: 1,695,805
  Person months (PM): 1695.81
  Labor Cost/Month: $5000
Here is a question: what does it take to develop a useful opensource (or possibly commercial) project? I am going to use some arbitrary numbers for the sake of argument. And yes, the number of lines of code is a bad metric to use, but there is a big difference between 10 lines of code, 100,000 lines of code, and a million lines of code. I want to create a project which will end up with 200,000 lines of code. It is a generic widget server. Developed in java,
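The quoted figures appear to follow a simple model, roughly 1,000 lines of code per person-month at $5,000 per month. A quick back-of-the-envelope check in Python (my assumed formula, not Koders' published method):

# Sanity check of the Koders-style estimates quoted above.
# Assumed model: person_months = LOC / 1000, cost = person_months * 5000.
def estimate(loc, loc_per_pm=1000, cost_per_pm=5000):
    pm = loc / float(loc_per_pm)
    return pm, pm * cost_per_pm

for name, loc in [("botlist", 27137),
                  ("jboss", 1695805),
                  ("hypothetical widget server", 200000)]:
    pm, cost = estimate(loc)
    print("%s: %.2f person-months, $%.0f" % (name, pm, cost))

# botlist: 27.14 person-months, $135685  -- matches the quoted numbers
# jboss: 1695.81 person-months, $8479025
# hypothetical widget server: 200.00 person-months, $1000000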

ANN: Setup guide for botlist web application front-end

http://code.google.com/p/openbotlist/wiki/QuickStart
The botlist J2EE web frontend might be considered a medium sized web application. Make sure that you have a J2EE servlet container; Tomcat 5.5+ is recommended but not required. A MySQL database server is required (expect a Postgres configuration in the future). The java build tool Ant is also required for building the project.
Test Environment and Recommended Configuration:
 * Mysql Ver 14.12 Distrib 5.0.51a, for Win32 (ia32) (db server)
 * Ant 1.7.0 (java build tool)
 * Tomcat 5.5.26 (application server)
 * Java SDK java version "1.5.0_11" (java compiler), 1.6 recommended
 * Operating systems: WinXP and Ubuntu Linux 7.10
 * All other libraries are provided in the subversion source or download
Check out source from subversion: as of 2/2/2008, checking out the botlist source is the recommended way to build and run the application. In the future, regular releases and snapshots will be available; for now you s

Ping the semantic web, datasets

Interesting, look at all of the RDF datasets that are out there: http://pingthesemanticweb.com/stats/namespaces.php
  http://xmlns.com/foaf/0.1/ - 900,799
  http://blogs.yandex.ru/schema/foaf/ - 581,133
  http://www.w3.org/2003/01/geo/wgs84_pos# - 145,758
  http://rdfs.org/sioc/ns# - 80,097
  http://rdfs.org/sioc/types#

Business Model: The web is junk, why you could do hosted semantic web services

Web is junk. One thing that has always amazed me: how does Google make all of that money through web advertising? Let me ask that question another way: why do people pay Google so much for web advertising? Recently, I heard a Google talk from a head Google advertising consultant on the billions of dollars in advertising revenue. It was a good presentation on the relationships between search terms and text based advertising links. But there was one question that I didn't find an answer to. Does web advertising work? Do clicks turn into a real return on investment? I have been on the web since 1997, and I haven't ever intentionally clicked on an advertising link. Maybe once or twice; but after spending hours a day online for 10-something years, I can honestly say I don't have any interest in clicking on Google's or any other search engine's advertising. From Google's perspective, it really doesn't matter. If they receive $100 a month from a customer, Google has al

Haskell-snippet: Split with regex

The first listing shows perl code for splitting a string with the delimiter "::|".
Listing 2.3.2008.1:
#!/usr/bin/perl
# Simple example, show regex split usage
print "Running\n";
$string = "file:///home/baby ::| test1::| test2";
$string2 = "file:///home/baby , test1, test2";
my @data = split /\s*::\|\s*/, $string;
print "----\n";
print join(" ", @data);
print "\n----\n";
print "Done\n";

The second listing below shows a haskell regex approach for performing the same operation:
Listing 2.3.2008.2:
import Text.Regex (splitRegex, mkRegex)

csv = "abc ::| 123 ::|"
csv_lst = splitRegex (mkRegex "\\s*(::\\|)+\\s*") csv
linkUrlField = csv_lst !! 0
... End of code snippet.
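For comparison (my addition, not part of the original post), the same split is also a one-liner with Python's re module:

# Split on the "::|" delimiter, trimming surrounding whitespace.
import re

string = "file:///home/baby ::| test1::| test2"
print(re.split(r"\s*::\|\s*", string))
# ['file:///home/baby', 'test1', 'test2']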

Haskell snippet; CRUD operations with haskell hsql and hsql-sqlite3

The source listing below is not complicated, showing a basic create/read unit test (minus the update/delete) against a simple sqlite3 database. You may have some trouble setting up hsql, especially because it seems that the module is not being maintained. The code is still useful and viable, but for now you are going to have issues building the module. The build failure will probably be resolved pretty soon, as I see updates to that particular code. Ensure that you are running the latest ghc; tested with ghc 6.8.2.
Download hsql-1.7 (or greater):
http://hackage.haskell.org/packages/archive/hsql/1.7/hsql-1.7.tar.gz
Change the hsql.cabal to what is shown in the listing; the Rank2Types and DeriveDataTypeable extensions were added. This will not work with previous versions of ghc (at least I had to build it on 6.8.2).
name: hsql
version: 1.7
license: BSD3
author: Krasimir Angelov
category: Database
description: Simple library for database access from Haskell.
exposed-modules: Database.HSQL

Haskell HSQL/SQLite with ghc 6.8 setup is ...a little...messed up?

I am doing my research on the intertubes, and it looks like database access with ghc/haskell is not that high on the list of priorities. I am sure that somewhere in that haskell source there is working code; it is just a matter of getting it to work with the most recent stuff, like Cabal 1.2+, GHC 6.8+, and (gasp) Sqlite. I didn't even think about messing with postgres/sqlite. This is the difference between opensource and commercial software. It is not that there are sometimes issues with the code; it is an issue of doing the right thing and/or making people happy. Most opensource projects do it the right way? Huh? Ideally, you don't want to mix the base haskell system with the database drivers, and GHC has done just that. E.g. Java may include a whole mess of garbage that you don't normally need. In theory, it doesn't make sense. But if you are a lazy developer, sometimes being able to just run your database code is a lot easier, even though, theoretically, the base compiler s