Web page DNA

I am developing a kind of web page DNA. If you look at any web page, what are some of its characteristics, the things that stick out? A human can easily tell whether a page is interesting or not. But how would a bot do it?

1. For example, botlist may extract the following information from a page:

linktype: () views: 23 links: 4 images: 6 para: 7 chars: 8 proctime: 10 objid:123sdfsdf

2. Other interesting things might include the last-modified date or the host name.

3. Keywords and description are always important.
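
To make the list above concrete, here is a minimal sketch of how a bot might collect a few of these fields. It is not botlist's actual code: the names `PageDNAParser` and `page_dna` are made up for illustration, it assumes keywords and description come from the page's meta tags, and it uses only the Python standard library. The last-modified date is left out because it would come from the HTTP response rather than from the markup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageDNAParser(HTMLParser):
    """Hypothetical helper: collects simple counts and meta tags from HTML."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.para = 0
        self.chars = 0
        self.meta = {}
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag == "p":
            self.para += 1
        elif tag == "meta":
            # Assumption: keywords and description live in <meta name="..."> tags.
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count visible text only, ignoring script and style bodies.
        if self._skip_depth == 0:
            self.chars += len(data.strip())


def page_dna(html, url):
    """Return a small dictionary of page characteristics (a sketch)."""
    parser = PageDNAParser()
    parser.feed(html)
    parser.close()
    return {
        "host": urlparse(url).hostname,
        "links": parser.links,
        "images": parser.images,
        "para": parser.para,
        "chars": parser.chars,
        "keywords": parser.meta.get("keywords", ""),
        "description": parser.meta.get("description", ""),
    }
```

Calling `page_dna(html_text, "http://example.com/page")` returns a dictionary with the same kind of fields as the record shown in item 1, which could then be stored or compared across pages.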
