Business Model: The web is junk, why you could do hosted semantic web services
Web is junk
One thing that has always amazed me: how does Google make all of that money through web advertising? Let me ask the question another way. Why do people pay Google so much for web advertising? Recently, I heard a talk from a head Google advertising consultant on the billions of dollars in advertising revenue. It was a good presentation on the relationship between search terms and text-based advertising links. But there was one question I didn't find an answer to. Does web advertising work? Do clicks turn into a real return on investment? I have been on the web since 1997, and I have almost never intentionally clicked on an advertising link. Maybe once or twice; but after spending hours a day online for ten-something years, I can honestly say I don't have any interest in clicking on Google's or any other search engine's advertising.
From Google's perspective, it really doesn't matter. If they receive $100 a month from a customer, Google has already completed its transaction. Google may place that particular ad at the right place, at the right time. Who knows? In print media, advertising fits. There is a captioned picture of a product, possibly with price and contact information. Related products are positioned next to each other. That works for print, not so much for web media. In any case, Google is one of the hottest technology companies in the history of the world. One of its revenue sources is web advertising. They make a lot of money; I don't.
The web is junk, or noise. By and large, there isn't a whole lot of relevant, organized information on the web. Wikipedia, Reddit, and Digg are useful sites with dense amounts of relevant information, and there are dozens, maybe hundreds, of other sites with relevant bits of information. The other millions and millions of sites are mostly filled with junk. The text-mining term is noise, and beyond noise there is spam. There is a lot of noise out there.
Take Wikipedia, which is a valuable source of information. Wikipedia is great, but it could have gone further and used RDF metadata to organize the information. Some research projects use manual and automated approaches to convert Wikipedia data into RDF dumps and OWL ontologies (see semantic web). It is unfortunate that the major players haven't pushed for these WWW extensions.
Imagine that you are interested in parsing a web document to extract valuable information. It is doable but not straightforward. How would you extract the creation date? The key proper nouns? Topic information? HTML tags such as TABLE, SPAN, and DIV provide the layout structure for the browser to render, but they don't describe what the document is about or whether relevant information is available. If you are familiar with RSS, the early version, RDF Site Summary, provided a format for describing when a page or post is added to a blog, with title, date, and description information:
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <description>
      XML.com features a rich mix of information and services
      for the XML community.
    </description>
    <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" />
        <rdf:li rdf:resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" />
      </rdf:Seq>
    </items>
    <textinput rdf:resource="http://search.xml.com" />
  </channel>
  <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
  </image>
  <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
    <description>
      Processing document inclusions with general XML tools can be
      problematic. This article proposes a way of preserving inclusion
      information through SAX-based processing.
    </description>
  </item>
  <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <description>
      Tool and API support for the Resource Description Framework
      is slowly coming of age. Edd Dumbill takes a look at RDFDB,
      one of the most exciting new RDF toolkits.
    </description>
  </item>
  <textinput rdf:about="http://search.xml.com">
    <description>Search XML.com's XML collection</description>
    <name>s</name>
  </textinput>
</rdf:RDF>
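To show why this kind of markup is easier to mine than layout-only HTML, here is a minimal sketch that pulls each item's URL and description out of a feed like the one above. It uses only Python's standard library. The `FEED` string is a shortened copy of the sample, and the `extract_items` helper is a name made up for this illustration, not part of any RSS toolkit.

```python
import xml.etree.ElementTree as ET

# Namespaces declared in the sample feed.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Shortened copy of the sample feed (assumption: only the items matter here).
FEED = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
    <description>
      Processing document inclusions with general XML tools can be
      problematic.
    </description>
  </item>
  <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <description>
      Tool and API support for the Resource Description Framework
      is slowly coming of age.
    </description>
  </item>
</rdf:RDF>
"""


def extract_items(feed_xml):
    """Return a list of {"url", "description"} dicts, one per <item>."""
    root = ET.fromstring(feed_xml)
    items = []
    # Items are direct children of <rdf:RDF> in RSS 1.0.
    for item in root.findall(f"{{{RSS_NS}}}item"):
        url = item.get(f"{{{RDF_NS}}}about")
        description = item.findtext(f"{{{RSS_NS}}}description", default="")
        # Collapse the line breaks and indentation inside the description.
        items.append({"url": url, "description": " ".join(description.split())})
    return items


if __name__ == "__main__":
    for entry in extract_items(FEED):
        print(entry["url"])
        print("   ", entry["description"])
```

The point is that the URL and the description come straight from named elements and attributes. No guessing at which DIV or SPAN holds the interesting text is needed.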
Host the data
Imagine setting up a data hosting service. Host various data formats like RDF. Give users at least 10 GB. Charge a light fee, such as $10-20 a month. This is where it gets a little complicated. Instead of throwing HTML at users, you could host their RDF and then generate visual data tools from it: graphs, charts, simple web interfaces. It is kind of like GeoCities, which used to be a free, shared web hosting service. Only now, you are running an RDF hosting service.
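As a rough sketch of what "host the RDF, output a chart" could look like, the snippet below counts values for one property in a set of triples and renders a plain-text bar chart. Everything here is an assumption for illustration: the `TRIPLES` data, the `dc:creator` property name, and the `count_objects` and `text_bar_chart` helpers are hypothetical, and a real service would read triples from an RDF store rather than a Python list.

```python
from collections import Counter

# Hypothetical hosted data: (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/post/1", "dc:creator", "alice"),
    ("http://example.org/post/2", "dc:creator", "bob"),
    ("http://example.org/post/3", "dc:creator", "alice"),
    ("http://example.org/post/1", "dc:date", "2008-01-05"),
]


def count_objects(triples, predicate):
    """Count how often each object value appears for the given predicate."""
    return Counter(obj for _, pred, obj in triples if pred == predicate)


def text_bar_chart(counts):
    """Render counts as text bars, largest first, ties broken by label."""
    lines = []
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{label:<8} {'#' * n} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(count_objects(TRIPLES, "dc:creator")))
```

Because the data is already structured, the "visual tool" is just a query plus a renderer. Swapping the text bars for a real charting front end would not change the counting step.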