Web page DNA

February 08, 2008

I am developing a kind of web page DNA. If you look at any web page, what are some of its characteristics, the things that stick out? A human can easily tell whether a page is interesting or not. But how would a bot do it?

1. For example, botlist may extract the following information from a page:

linktype: () views: 23 links: 4 images: 6 para: 7 chars: 8 proctime: 10 objid:123sdfsdf

2. Other interesting attributes might include the last-modified date or the host name.

3. Keywords and description are always important.
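
To make the idea a bit more concrete, here is a rough sketch of how a bot might collect a few of these values using only the Python standard library. This is not botlist's actual code. The names `PageDNAParser` and `extract_page_dna` are made up for illustration, the field names only loosely follow the sample record above, and I am assuming that "keywords" and "description" mean the usual HTML meta tags and that "chars" means visible text characters.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageDNAParser(HTMLParser):
    """Counts a few structural features of a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.paragraphs = 0
        self.chars = 0
        self.keywords = ""
        self.description = ""
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag == "p":
            self.paragraphs += 1
        elif tag == "meta":
            # Assumption: keywords/description come from <meta name=...> tags.
            name = (attrs.get("name") or "").lower()
            if name == "keywords":
                self.keywords = attrs.get("content") or ""
            elif name == "description":
                self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count only text outside script/style, ignoring surrounding whitespace.
        if self._skip_depth == 0:
            self.chars += len(data.strip())


def extract_page_dna(url, html, headers=None):
    """Return a dict of simple page features (a sketch, not a real API)."""
    start = time.perf_counter()
    parser = PageDNAParser()
    parser.feed(html)
    parser.close()
    headers = headers or {}
    return {
        "host": urlparse(url).hostname,
        "last_modified": headers.get("Last-Modified"),
        "links": parser.links,
        "images": parser.images,
        "para": parser.paragraphs,
        "chars": parser.chars,
        "keywords": parser.keywords,
        "description": parser.description,
        "proctime_ms": (time.perf_counter() - start) * 1000.0,
    }
```

Calling `extract_page_dna(url, html, headers)` with the page source and the HTTP response headers gives back one record per page. A bot could then compare or score those records instead of the raw HTML.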

Berlin Brown and Software Development