Saturday, November 26, 2005

"Semantic Web on its way into practice"

Germany’s most visited IT site has an article about the Semantic Web on its way into practice. The article itself isn’t that interesting for people working in this area, but there is one interesting comment: in it a reader complains that Semantic Web people always claim the Gene Ontology as an application of Semantic Web technology, even though it doesn’t even use OWL Lite. The writer seems quite knowledgeable about the Gene Ontology, which makes for an interesting read. His argument that the Gene Ontology has nothing to do with semantic technologies, however, has little merit: the Gene Ontology is an explicit, computer-understandable conceptualization of a domain – that’s what we’re talking about all the time. It doesn’t matter whether it’s in OWL or something simpler.

If you're interested in the application of the Semantic Web to the life sciences – take a look at the current HCLSIG homepage – it has really improved since the last time I wrote about it and contains useful links & resources.

P.S.: I'm very busy until Thursday (with both a software delivery deadline and a conference I'll be attending). So don't expect too many posts until then ... Unless, of course, the conference is totally boring, the software has fewer bugs than expected and the weather in Vienna is too bad for sightseeing ... (two of these conditions actually have a chance of happening ...).


Wednesday, November 23, 2005

Some AI Related Links ..

An article about "Robotic Lawyers" (alright, expert systems for law).

A humor ontology? (thanks Mark)

And Max Voelkel from AIFB now has his own Blog. His work is mostly in the area of Semantic Desktop and Semantic Wiki.

Tuesday, November 22, 2005

Semantic (Web) Base

Assuming that Google gets around to offering decent APIs to build applications on top of Google Base, the "Google Base Web" looks like this:

At the bottom are sites that add data to Google Base (using bulk upload - not displayed are people who enter data by hand). At the top are applications that access all this data through the Google Base service - much like an internet-scale data warehouse. The content model of this system is a simple entity-attribute model.

It should be obvious by now that you could build a similarly designed system that uses a different content model - like relations or RDF - indeed, people who believe in the superiority of RDF should be starting to build such a system right about now.

But make no mistake about it - building such an "RDF base" / "Semantic Base" / "Semantic Web Base" is no simple matter, especially if it should work for internet-scale data sizes.
And the difficulties in building such a system aren't only of a technical nature - some conceptual questions need to be answered first: Do I really want to treat all the data from all sources as one giant graph / knowledge base? (This may work for RDF, but I don't think you would want to do this for RDF(S).) Do I want to treat each source as a separate graph? How do I discover the correct graphs? ...
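To make the "one giant graph or one graph per source" question a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two source URLs, the records and the attribute names are hypothetical, and the triples are plain tuples rather than real RDF. It only shows how entity-attribute records could be turned into triples and then either merged or kept apart.

```python
from collections import defaultdict

# Hypothetical entity-attribute records, as two different sites might upload them.
SOURCES = {
    "http://site-a.example/feed": [
        {"id": "item1", "type": "recipe", "title": "Apple strudel"},
    ],
    "http://site-b.example/feed": [
        {"id": "item1", "type": "event", "title": "ISWC 2005"},
    ],
}


def to_triples(source, record):
    """Turn one entity-attribute record into (subject, predicate, object) triples."""
    # Prefixing the id with the source keeps "item1" from site A and site B apart.
    subject = f"{source}#{record['id']}"
    return [(subject, attr, value) for attr, value in record.items() if attr != "id"]


def one_giant_graph(sources):
    """Option 1: merge the triples of all sources into a single set."""
    graph = set()
    for source, records in sources.items():
        for record in records:
            graph.update(to_triples(source, record))
    return graph


def graph_per_source(sources):
    """Option 2: keep one separate set of triples per source."""
    graphs = defaultdict(set)
    for source, records in sources.items():
        for record in records:
            graphs[source].update(to_triples(source, record))
    return dict(graphs)


merged = one_giant_graph(SOURCES)
separate = graph_per_source(SOURCES)
```

In the merged variant you can no longer tell who said what unless the subject happens to encode it; in the per-source variant you keep provenance, but every query has to decide which graphs to look at, which is exactly the discovery question above.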

Links: Google Base Announcement

Monday, November 21, 2005

More Pictures ...

More Google Stuff ...

Cringely's Google Mart about the future of Google is a must read.

And Google now has a free web statistics tool. I subscribed on the first day, but they have been swamped with registrations and have been unable to keep up with demand - so I'm still waiting for my first report. I liked the advanced features of the Analytics tool (like the possibility to exclude visitors from my office's domain) but I'm not yet sure if I'll leave sitemeter for good. The fact that visitors to this site show up in my sitemeter stats immediately is one feature that I like and that seems to be missing in Google's service. I also got the impression that Google Analytics can only detect visitors that have JavaScript enabled (unlike sitemeter: they have more information for people that have JS enabled, but at least include the others).

More Great Tools ..

If you happen to work with Eclipse, you too will love these tools:

QuickREx: Never before has such a small tool, one that needed so little time to learn, saved me so much time ;-). But seriously - a little tool that really speeds up my work with regular expressions in Java.

XMLBuddy: a free XML plugin for Eclipse that works with Eclipse 3.0.

Friday, November 18, 2005

The World in 2006

At a time of the year when most newspapers are busy writing their retrospective accounts of the past year, the Economist publishes an outlook of what it expects to happen in the next year - always fun to read (available online but you have to pay for most articles).
From the editorial:

In 2006 records will be broken and landmarks reached in diverse domains. For the first time, Homo Sapiens will be more urban than rural. For the first time, too, the Royal Shakespeare Company will perform the Bard's complete works in a single season. Singapore Airlines will fly the new A380 superjumbo, the world's biggest passenger jet; in Japan, the world's biggest bank will be born; the highest railway will open in China. The largest-ever global television audience will watch football's World Cup final in Berlin [...]

They have a few technology-related articles; on the business side of things, guest author Niklas Zennström (Skype) expects "connectivity" to be the trend of 2006. Another article talks about computer gaming and predicts a greater diversity of controllers, games tailored to more casual gamers, a lot of games based on film franchises and more marketing to lure more people into gaming. There's also an article on blogs that predicts the rise of brands in the blogosphere - a small number of blogs with a large impact. These brands will give at least a semblance of order and accountability to the blogosphere. The author expects traditional media to be challenged by blogs and sees the need for them to find "creative ways of drawing on the proliferating sources of content and channels of distribution". On the science pages they predict a rise of interest in "Mirror Neurons" and "Spintronics" (the use of the electron's spin in addition to its charge in computing). There is also an article on Negroponte's "One Laptop per Child" association.

Overall the mood of the predictions is less positive than in past years - the Economist sees quite a few challenges for the next year.

Thursday, November 17, 2005

What Google Base lacks (and what you can do about it)

So Google Base offers a way to upload structured data (even in large numbers as bulk upload) and a way to search it from your browser. What's really missing is a way for other people (i.e. everyone but Google) to use the structured content. As I have argued before, I believe that offering such a possibility is not in the interest of Google and hence I don't think they'll offer one - or, if they do, that they will restrict it (for example by limiting the number of elements that can be accessed per day).

So, what can we do about it? Luckily the solution is very simple: post the bulk files that you upload to Google Base on your website and let others know about it; and while we're at it: do the same for the files that you upload to Google's sitemap (alright, they are already on your website, but only Google knows where) and Froogle.
How can other people know where to find these files? That's easy: if you use RSS and Atom there are already established methods: the nice buttons and, more importantly, the "alternate" link in the header:

<link rel="alternate" type="application/rss+xml" 
title="RDF-File" href="http://website/rss.rdf">
For all the other files we need an agreement on which types to write there - but from a technical point of view that's trivial.

What do we gain by doing this? We level the playing field, we make it easier for people to use the structured data, to build applications on top of it and to challenge Google. Not that I don't like Google, but should Google ever get so much proprietary data that it becomes difficult to challenge them, even they will rest and slow down the pace of innovation. I fear that proprietary data could be the next Windows.
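To show how little is needed on the consuming side, here is a small sketch in Python that collects the "alternate" links from a page header using only the standard library. The sample page is made up, and the second MIME type in it (`application/x-googlebase+xml`) is purely a hypothetical placeholder for whatever type people would agree on; it is not an existing standard.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may contain several space-separated values; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("href"):
            self.links.append((attributes.get("type"), attributes["href"]))


def find_alternates(html):
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.links


# Made-up page; the second type is a hypothetical placeholder, not a registered one.
PAGE = """
<html><head>
<link rel="alternate" type="application/rss+xml"
title="RDF-File" href="http://website/rss.rdf">
<link rel="alternate" type="application/x-googlebase+xml"
title="Bulk upload" href="http://website/base.xml">
<link rel="stylesheet" href="http://website/style.css">
</head><body></body></html>
"""

alternates = find_alternates(PAGE)
```

A crawler would simply filter `alternates` by the types it understands and fetch the corresponding files.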

Update: Danny Ayers has the same idea, although he puts it less negatively.


Wednesday, November 16, 2005

Rich Web Clients ..

Since AJAX came along there has been renewed interest in building rich clients for web applications. Even the W3C recently launched the Rich Web Client Activity.
From a technical point of view I could never quite share the prevalent enthusiasm about AJAX (come on, we have been able to do all this with Java Applets/WebStart for a decade now) but recently I came across a couple of great tools/developments for Rich Web Clients (no, not AJAX):

One is Macromedia's Flex (Wikipedia), a platform and application server that abstracts away a lot of the details of building rich web clients (and the "back" button works :-) ). There's a free trial and a free developer version but it is expensive to deploy. A similar approach is taken by the open source platform Laszlo. Microsoft is apparently also working on a similar system called Sparkle. Here again the focus is on making it easier for designers and application developers to work together.


Google Base Coverage ..

Monday, November 14, 2005

Two Semantic Search Engines (well ... maybe)

Introduced with the great claim to be "as easy as Google but much more efficient", Con Weaver is a Semantic Search Engine by Fraunhofer. But well, at this link we have a website that doesn't even work with Opera, lots of the usual great talk and no demo to be seen.

MKSearch is

a research project to develop a metadata search engine. The system is composed of two linked systems; an indexing Web crawler and a public query interface. The indexing component extracts Dublin Core metadata from Web documents and stores them in RDF format. The query interface matches documents in the index using an RDF query language and can return the results in a variety of formats including standard HTML and as a standing RSS feed.
It's (very) open source and you can download and install it on your system (Tomcat); so far I have lacked the time to do so. Sadly they do not offer an installed version that you could try before spending the time to install and configure the system yourself.
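Since I couldn't try the real thing, here is a rough sketch in Python of what the indexing step described above boils down to: pull Dublin Core `meta` elements out of a page and store them as triples. This is not MKSearch's code; the class name, the sample page and its URL are my own assumptions, and the triples are plain tuples. Only the Dublin Core element namespace is the real one.

```python
from html.parser import HTMLParser

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


class DublinCoreExtractor(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> elements as triples."""

    def __init__(self, document_url):
        super().__init__()
        self.document_url = document_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # "DC.title" becomes the predicate http://purl.org/dc/elements/1.1/title
        if name.lower().startswith("dc.") and content:
            predicate = DC_NS + name[3:].lower()
            self.triples.append((self.document_url, predicate, content))


def extract_dublin_core(document_url, html):
    extractor = DublinCoreExtractor(document_url)
    extractor.feed(html)
    return extractor.triples


# Made-up document for illustration.
SAMPLE = """
<html><head>
<meta name="DC.title" content="World Brain">
<meta name="DC.creator" content="H.G. Wells">
<meta name="keywords" content="not dublin core">
</head><body></body></html>
"""

triples = extract_dublin_core("http://example.org/world-brain.html", SAMPLE)
```

A query interface would then match against these triples; a real system would of course use an RDF store and a query language instead of a Python list.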


Sunday, November 13, 2005

World Brain by H.G. Wells

H.G. Wells wrote in his 1937 (!) piece World Brain: The Idea of a Permanent World Encyclopaedia [complete text is at the link]:

There is no practical obstacle whatever now to the creation of an efficient index to all human knowledge, ideas and achievements, to the creation, that is, of a complete planetary memory for all mankind.
The whole human memory can be, and probably in a short time will be, made accessible to every individual.
It need not be concentrated in any one single place. It need not be vulnerable as a human head or a human heart is vulnerable. It can be reproduced exactly and fully, in Peru, China, Iceland, Central Africa, or wherever else seems to afford an insurance against danger and interruption.
Unbelievable how prescient some people are ... But still, he forgot about the "copyright holders" :-)...


Semantic Web and AI Information Sources

Today I took some time to clean my bloglines account and add a few new sources - these may be of interest to some readers, so here they are:

Nova Spivack's Minding the Planet. Nova Spivack works on a very secret Semantic Web startup that gets funded by Vulcan (yes, that counts for something) and writes interesting things about AI, the Semantic Web and the like. Update: Just saw that I should link to this article to let him know that I'm reading his blog.

Mark Watson: "opinions on Java, Ruby, AI, semantic web, and politics". I'm not a great fan of people who think that, if I'm interested in their opinions on CS topics, I should be interested in their political take on things - but some of his links (especially the Java-related stuff) look interesting.

BuzzHit Blog, from the makers of healthline.com. Since I liked the system I thought it may be interesting to keep track of what they are doing next. Very low intensity blog.

Semantic Bits, a blog by Ina O'Murchu, a Semantic Web researcher at DERI Galway. She writes about Semantic Web research and, well, everything - especially iPods.

Jonathan Schwartz's Blog. He's chief operating officer at Sun - and since Sun has been and still is one of the most innovative CS companies this is a must read; although it seems you have to take his statements with a grain of salt - especially those about IBM - he's a bit too positive about Sun (I guess you have to be as a COO writing public articles).

KurzweilAI: Accelerating Intelligence news .. Do I need to say more?

ISTResults: I couldn't possibly say it half as nicely as they do:

The IST Results service gives you online news and analysis on the emerging results from Information Society Technologies research. The service reports on prototype products and services ready for commercialisation as well as work in progress and interim results with significant potential for exploitation.

Alright, that's enough for now, I need to do some other things as well. More some other day.

Friday, November 11, 2005

Great Tools and an ISWC Paper

Sun is giving away the Java Studio Creator 2004Q2 and the Sun Java Studio Enterprise 8 for free! (seems to be a limited time offer)

There is a new version of Evernote, a nice note-taking tool (with a free version). I have been using it for a few months and can only recommend it.

A few days ago I found XStream, a tool that serializes Java objects to XML and recreates objects from XML. Really a great tool, very easy to integrate; an extremely fast way to create programs that read/write XML configuration files. The XML that gets created is very readable. It comes with a BSD-style license and hence can be used almost everywhere. I was surprised to see that some of its concepts are very similar to the serialization framework that I wrote for F-logic. Obviously it's a lot more mature, but it also doesn't face the same challenges.

And we here are all very proud that our colleague Boris Motik won one of two best paper awards at this year's ISWC. Sorry, can't find the paper online yet.

Innovation @ Microsoft

Recently I came across some links about really cool things going on at Microsoft. One is about Singularity, an innovative new operating system currently being developed at Microsoft Research (that has nothing in common with Windows!). Btw: The project description sounds great, but I still think that the name is a bit of a stretch.

Even more interesting is this Wired article that is mostly about the model-checking verification of third-party drivers performed by Microsoft - very cool stuff (with a very nerdish interpretation of "cool").

IMHO not nearly as cool, but a lot more publicized, is the "Microsoft Live" initiative, described in this German article and commented on by Cringely here.


Thursday, November 10, 2005

Technology trends ...

Readers of this blog will already know that I like speculations about the directions the development of technology may take - well, here are three more links with interesting speculations.

The German "Feldafinger Kreis" published a new report [German] about what it sees as important technology trends. This year's study talks about self-managed systems, software agents, web services, networked smart labels, grid computing and peer-to-peer. Well yes, the Semantic Web is absent - but at least it was in the last report (published in 2002).

Juergen Luebeck has a scan of Gartner's current "IT hype cycle" on his blog. It shows "Corporate Semantic Web" still in the initial ascent - so in Gartner's opinion the Semantic Web will get hyped a lot more before the (inevitable) disillusionment and then the realistic assessment of the technology's potential.

And not related to IT, but nevertheless interesting: The New York Times writes about "Beam Powered" space travel [free registration required].

Picture of the Week: A Pelican

Just a pelican preparing to fly ...

Actually the title of this post is misleading - I do not plan on posting a picture a week. I will try, however, to always have a picture on the front page to stop the blog from looking too boring.

Wednesday, November 09, 2005

More about Google Base ...

Today I stumbled upon two particularly interesting new articles about Google Base: The first article lends further credibility to the idea that Google Base is part of a "Google plot" to enter the market of classified advertising. The authors of this article also share my view that this is part of a strategy to create proprietary content. The second article sounds a more optimistic note by claiming that Google Base is nothing but one node in a Semantic Web-like internet, sharing its data in simple XML formats. I don't share that optimism.

My earlier entry about the relation between Google Base and the Semantic Web can be found here.


Tuesday, November 08, 2005

Semantic Web Reference Card

Via Semantic Web Mailing List:

To commemorate the Fourth International Semantic Web Conference, we've issued a special ISWC edition (v2.0) of the UMBC Semantic Web Reference Card. [...] Intended as a handy "cheat sheet" for researchers and developers [...]
It includes the RDF/RDFS/OWL vocabulary, SPARQL language reference and more. It can be downloaded here.
Very nice, I already printed mine :-)

Some Links about Semantic Web and Computer Science

The ISWC Portal, even for those who can't go there.
The W3C Launches the Rule Interchange Format Working Group.

An interesting Wired article about the Worst Software Bugs ever. And another one with speculations about technical inventions that are often floated but (hopefully) will not become reality because they would be too annoying.

Knowledge Zone - "One stop shop for Ontologies"

Stanford Medical Informatics created a public ontology registry that allows users to upload, search and review ontologies.

Knowledge Zone is a web-based portal that allows users to submit their ontologies, to search for existing ontologies, to find out their rankings based on user reviews, to post their own reviews, and to rate reviews. We would like to invite you to submit your Ontology in the Knowledge Zone application (http://smi-protege.stanford.edu:8080/KnowledgeZone/). Knowledge Zone is one of the efforts to provide metadata related to ontologies to help users find, access, and assess ontologies for their applications as well as promote dissemination of ontologies and their re-use. If you are familiar with or using any of the ontologies that are already in the repository, we invite you to provide reviews and ratings for them. If you know anyone who uses your ontology or would be a good person to provide a review for it, we invite you to send them the link to the KnowledgeZone and encourage them to review and rate your ontology.
There are quite a few things that aren't that great about this site (for example it's slow and you can't search by formalism), but still: better such a repository than none.


Saturday, November 05, 2005

Books & Internet (A compilation of recent news)

There has been a flurry of news related to initiatives that try to make books accessible via the internet - time for a compilation.

For some time now there have been initiatives to digitize books and to make them accessible on the internet; initiatives like the Internet Archive, the Internet Library at the University of California, San Francisco or the digitization initiative of the Library of Congress. For some reason these projects never really became very popular and probably also lack the resources to scan a significant percentage of books.
Then during the dot-com boom a lot of companies believed that traditional books would disappear soon and get replaced by e-books - something that still hasn't happened. There is still some market for e-books, but e-books are nowhere close to replacing the traditional book. More successful were services like O'Reilly's Safari, which was introduced in 2001 and offers online access to a large number of technical books. The next big step was Amazon's "Search inside the book" functionality that was finally released in October 2003.

The recent flurry of activity started when Google first announced its initiative to offer search inside books (August 2004) and then to scan books from a couple of large university libraries without consent from the copyright holders (December 2004). This initiative ran into a lot of opposition, mostly from publishers / authors but also from some European (mostly French) politicians who feared that Google Print would accelerate the Americanisation of global culture. At one time Google stopped scanning books to give publishers some time to opt out of the scanning process - an offer that wasn't used by many publishers because they hold the opinion that Google must not scan any books unless it has the explicit consent of the publishers to do so (opt-in). Many people, however, say that opt-in cannot be the solution [very interesting article] because a large number of books (75%) are neither in the public domain nor commercially exploited, and for a large number of these the publishers probably don't even have the rights for digital distribution. It is unlikely that the publishers would be willing to spend money to get these rights and so the majority of books would never be opted in for scanning - not because someone does not want them to be scanned, only because it's not worth spending the time and money needed in order to opt in.
Since Google did not back down, the American Authors Guild sued Google over alleged copyright infringement on a massive scale. Google maintains that it respects copyright and that allowing people to search books without giving them access to more than a few sentences is "fair use".

Not to be outdone by Google, Microsoft announced that it will offer MSN Book Search in cooperation with the Open Content Alliance. The Open Content Alliance had been founded just weeks earlier and strives to make content freely accessible. Besides Microsoft the OCA includes Yahoo!, O'Reilly and others. Unlike Google Print, the OCA takes a more careful approach when it comes to copyright - for now only scanning books that are not under copyright anyway.
In another development a group of German publishers announced that they will build their own system to make book content accessible - a system that works on a peer-to-peer basis and allows the publishers to keep full control of their books.
Meanwhile Google said that they are expanding Google Print to some European countries.
Finally Amazon, probably shocked by all the big companies moving into its terrain, is developing a feature to allow consumers to buy online access to (parts of) books. A feature that is also under development at Google.


Friday, November 04, 2005

Amazon's Artificial Artificial Intelligence

Via semanticwave: This sounds like a joke at first, but apparently isn't: Amazon has created an API to query humans!

People can register with Amazon to perform so-called HITs (human intelligence tasks) for a small fee and programmers can use the API supplied by Amazon to query this group of people - for example to ask if there is a tiger in a picture.

This is really a cool and innovative idea, although I'm afraid it will end up causing even more SPAM ... A while ago I did some consulting for a firm building anti-spam software and one of the topics was "Turing tests" (CAPTCHA) that for example stop spammers from creating free email accounts (you all know these pictures showing distorted numbers that you need to type into a form in order to prove that you are a human). Back then I learned about some porn / spammer joint ventures that were defeating these tests by enlisting the help of people looking for free porn: "write down the distorted number shown in the picture to get another page of free porn pictures". The pictures were taken just in time from the site of a free email service and the answers would be used to create a new account. I'm afraid Amazon's "mechanical turk" makes it easier for spammers to defeat this kind of test.


Thursday, November 03, 2005

KAON 2

Readers of this blog have probably already read this somewhere else, but it is a really great piece of software and deserves all the publicity it can get:

The AIFB and the FZI Institute at the University of Karlsruhe, in cooperation with ontoprise GmbH, are pleased to announce KAON2 -- a new tool for management of OWL ontologies and reasoning.
You can read more at the KAON2 page. There you can also download a free version for non-commercial use (free as in your supermarket, not the Richard Stallman kind of free).

KAON 2 is mostly the work of Boris Motik, (among other things) an outstanding software engineer and Java programmer. You can meet him (and KAON 2) at the ISWC.


Wednesday, November 02, 2005

Can the HCLSIG treat the no-computer virus?

Earlier I wrote about the Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG) and an article in the Economist that identified "interoperability" between the IT systems of health care providers as the main cure for the "no-computer virus" that plagues the health care sector, costing us billions of dollars and tens of thousands of lives each year.

We all know that the Semantic Web is about interoperability - so is the HCLSIG the cure for the no-computer virus?
I don't think so. It really isn't technology that is the problem; the technology to make the IT systems of health care providers interoperable has been around for a while. Running pilots of such systems (such as the city-wide network in Santa Barbara) are testament to that. The challenges are more in the area of economics (who pays for these solutions?), privacy and coordination (in order to be interoperable everybody has to agree to some standards, or somebody has to impose them - no wonder the centralised British NHS takes a lead role in building such a network).

This doesn't mean that I don't like the idea of applying Semantic Web ideas to the health care sector! I do! There is a tremendous amount of medical literature and Semantic Web ideas could help to use it better. Drugs, the human anatomy or diseases are prime candidates for well-structured domains that are just waiting to be formalized in ontologies, greatly improving retrieval of research literature (especially multi-language) and lowering the cost of building (world-wide interoperable) medical software. Semantic Web technologies may even one day aid the effort of building diagnostic expert systems. Expert systems that would need to be built by thousands of different people (distributed, not under complete centralised control) and that would reflect the ever-changing and sometimes conflicting state of medical research.


Tuesday, November 01, 2005

Semantic Wikipedia

The (partly) Karlsruhe-based project "Semantic Wikipedia" deserves a lot more publicity than it's getting. It is a well thought out plan to extend Wikipedia in the direction of the Semantic Web.

From Denny's blog:

[...] we propose the introduction of typed links as an extremely simple and unintrusive way for rendering large parts of Wikipedia machine readable. We provide a detailed plan on how to achieve this goal in a way that hardly impacts usability and performance, propose an implementation plan, and discuss possible difficulties on Wikipedia's way to the semantic future of the World Wide Web. The possible gains of this endeavor are huge; we sketch them by considering some immediate applications that semantic technologies can provide to enhance browsing, searching, and editing Wikipedia.

The idea was presented at Wikimania 2005; the paper is online. You can read more about the state of the development of a Semantic MediaWiki here or in Denny's blog.
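To give an impression of how "extremely simple" typed links are, here is a small Python sketch that turns them into statements. The syntax `[[relation::Target]]` (an ordinary wiki link with a relation name in front of a double colon) is my assumption about how such links are written; the function name and the sample text are made up, and this is of course not the project's actual implementation.

```python
import re

# Assumed typed-link syntax: [[relation::Target]] or [[relation::Target|label]].
# Plain links such as [[Spree]] carry no relation and are ignored.
TYPED_LINK = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_typed_links(page_title, wikitext):
    """Return (page, relation, target) statements for every typed link in the text."""
    return [
        (page_title, relation.strip(), target.strip())
        for relation, target in TYPED_LINK.findall(wikitext)
    ]


# Made-up article text for illustration.
TEXT = (
    "Berlin is the [[capital of::Germany]] and lies on the [[Spree]]. "
    "It is [[located in::Europe|the old continent]]."
)

statements = extract_typed_links("Berlin", TEXT)
```

Each statement reads as "page, relation, target", which is already very close to an RDF triple; the untyped link to the Spree simply stays an ordinary link.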


More Color!

Berliner Dom

This blog looks more than a little bland, so I'll include one of my pictures. It shows a part of the "Berliner Dom" (Berlin Cathedral) that you can find - you guessed it - in Berlin. You can read about the Berliner Dom here [German] or here [English].

The picture is a side effect of bicycling through Berlin at night. Which is fun, because the city looks great at night, but also risky because Berlin really lacks a decent infrastructure of bicycle lanes.