I am developing a kind of web page DNA. If you look at any web page, what are some of its characteristics, the things that stick out? A human can easily tell whether a page is interesting or not, but how would a bot do it?
1. For example, botlist may extract the following information from a page:
linktype: () views: 23 links: 4 images: 6 para: 7 chars: 8 proctime: 10 objid:123sdfsdf
2. Other interesting features might include the last-modified date or the host name.
3. The page's keywords and description are always important.
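
To make the list above concrete, here is a minimal sketch of how a bot might collect such a fingerprint using only the Python standard library. It is not botlist's actual implementation. The names `PageFeatureParser` and `extract_features` are hypothetical. Reading keywords and description from `<meta>` tags, counting only `<a>` tags that carry an `href`, measuring `proctime` in milliseconds, and passing the last-modified date in from whatever code fetched the page are all assumptions.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageFeatureParser(HTMLParser):
    """Counts a few structural features while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.para = 0
        self.chars = 0
        self.keywords = ""
        self.description = ""
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag == "p":
            self.para += 1
        elif tag == "meta":
            # Assumption: keywords/description come from <meta name=...> tags.
            name = (attrs.get("name") or "").lower()
            if name == "keywords":
                self.keywords = attrs.get("content") or ""
            elif name == "description":
                self.description = attrs.get("content") or ""
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Count visible text only; script and style bodies are ignored.
        if not self._skip:
            self.chars += len(data.strip())


def extract_features(url, html, last_modified=None):
    """Return a dict of page features; last_modified is supplied by the caller."""
    start = time.perf_counter()
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    return {
        "host": urlparse(url).hostname,
        "last_modified": last_modified,
        "links": parser.links,
        "images": parser.images,
        "para": parser.para,
        "chars": parser.chars,
        "keywords": parser.keywords,
        "description": parser.description,
        # Assumption: processing time is reported in whole milliseconds.
        "proctime": round((time.perf_counter() - start) * 1000),
    }
```

Calling `extract_features("http://example.com/page", html_text)` returns a plain dictionary, which could then be stored or compared against the fingerprints of other pages. Fields such as `views`, `linktype` and `objid` are left out because they cannot be derived from the page markup alone.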