Friday, February 8, 2008

Web page DNA

I am developing a type of web page DNA. If you look at any web page, what are some of its characteristics? What things stick out? A human can easily tell whether a page is interesting or not. But how would a bot do it?

1. For example, botlist may extract the following information from a page:

linktype: () views: 23 links: 4 images: 6 para: 7 chars: 8 proctime: 10 objid:123sdfsdf

2. Other interesting attributes might include the last-modified date or the host name.

3. The page's keywords and description are always important.
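
Here is a rough sketch of how a bot might collect a few of the fields listed above. This is not botlist's actual code. The names PageDNA and page_dna are made up for illustration, and I am assuming the keywords and description live in meta tags and that the last-modified date comes from the HTTP response headers. It only uses the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageDNA(HTMLParser):
    """Hypothetical collector: counts a few tags and grabs two meta tags."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.para = 0
        self.chars = 0
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag == "p":
            self.para += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_data(self, data):
        # Counts every text node, including script and style bodies.
        self.chars += len(data.strip())


def page_dna(url, html, headers=None):
    """Return a dict of page characteristics (field names are assumptions)."""
    parser = PageDNA()
    parser.feed(html)
    parser.close()
    headers = headers or {}
    return {
        "host": urlparse(url).hostname,
        "last_modified": headers.get("Last-Modified"),
        "links": parser.links,
        "images": parser.images,
        "para": parser.para,
        "chars": parser.chars,
        "keywords": parser.meta.get("keywords"),
        "description": parser.meta.get("description"),
    }
```

Fields such as views, proctime and objid would have to come from the crawler itself rather than from the page, so they are left out of the sketch.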

