Saturday, February 28, 2009

Wikipedia corpus

While there has been a lot of speculation around Web 2.0 and Semantic Web manifestations of a Wikipedia-like nature, Wikipedia itself is a great corpus for information retrieval and for running semantic experiments (just stating the obvious).

Nevertheless, this is well understood by startups such as

http://www.powerset.com/ a tool with a fancy UI design for doing natural-language (NL) "search"

and

http://www.freebase.com/ one of the more promising social networks for "I like" information. Not really another Facebook or Digg.

Both companies retrieve a great deal of Wikipedia content and recompile it into something new, much as DJs like to play with Beethoven's pieces.

Saturday, February 21, 2009

Real entities

Today I wasted several hours on the concept outline for my new open source startup. The idea behind it is simple and built entirely from well-known components, but as a joint vision I think it is a new, simple way of designing quick, lightweight database applications. The most challenging task is, of course, not the idea of atomic hashes itself, which is trivial, but the implementation of the query language for the project. If you already keep your data under svn-style version control for maintenance, the realentities library is a solution worth considering.
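To make the "atomic hashes plus a query language" idea a bit more concrete, here is a minimal sketch in Python. It is purely illustrative: the post does not describe how realentities actually stores or queries data, so the `Entity` type, the `query` function, and the sample records are all hypothetical. The sketch assumes that an atomic hash is a flat mapping of field names to simple values, and that the simplest useful query is an exact match on one or more fields.

```python
# Hypothetical sketch only: this is NOT the realentities API.
from typing import Any, Dict, Iterable, List

# Assumption: an "atomic hash" is a flat mapping of field name -> simple value.
Entity = Dict[str, Any]


def query(entities: Iterable[Entity], **conditions: Any) -> List[Entity]:
    """Return the entities whose fields equal every given condition."""
    return [
        entity
        for entity in entities
        if all(entity.get(field) == value for field, value in conditions.items())
    ]


# Made-up sample data for illustration.
entities = [
    {"id": 1, "kind": "person", "name": "Ada"},
    {"id": 2, "kind": "person", "name": "Bob"},
    {"id": 3, "kind": "city", "name": "Riga"},
]

people = query(entities, kind="person")        # both person records
riga = query(entities, kind="city", name="Riga")  # the single city record
```

A real query language would need much more than equality matching (ranges, joins between hashes, ordering), which is presumably where the hard part mentioned above lies.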