Theseus – Technologies to Navigate the Maze

Recognising the innovative force and economic power of information technologies, the Federal Ministry of Economics and Technology is supporting a consortium of 21 companies and research institutions that have signed up to the Theseus project. With total funding of 180 million euros, the project aims to develop new semantic Web technologies.
Anyone who has ever searched the Web is familiar with the problem: you input your search term and, within milliseconds, search engines like Google, Lycos or Yahoo return hit lists running to six figures. The user then faces the laborious and time-consuming task of navigating the maze and picking out the few pages which appear to meet his or her needs. That's when the real search for useful data starts. By the time they have clicked on 20 sites, even Web-hardened surfers are likely to give up. If what they need is on site no. 21, that's just too bad.
For years, developers have dreamed of creating semantic search engines which allow the mass of data held on the Internet to be searched by content. This leap forward would usher in a new era – the age of Web 3.0, the semantic Web. But the community still has a long way to go before that happens. At present, the Internet – at least for the computers and programs which search it and make it visible to users – is simply a collection of symbols, letters and pixels. As a result, search engines merely sift through the mass of data using formal criteria, matching character strings without regard to meaning. The results are predictable: endless hit lists in which most entries are irrelevant. If you're a sports fan and input the word "golf", for example, the search engine will also list websites which refer to the "Golf" car. And if you're searching in German, where "Golf" also means "gulf", it will give you plenty of information about the Gulf as well.
According to the vision, semantic search technologies, on the other hand, understand what they can and should find and, like a human being, can make an instant judgment as to which data actually fit the user's needs. "We are making the shift from search engines to response engines", says Wolfgang Wahlster, head of the German Research Centre for Artificial Intelligence, which is one of the Theseus project partners. These are visionary words. The fact is that before a response engine can respond at all, the data which it is evaluating must be readable and comprehensible to it. As a result, the lion's share of the research is aimed at developing databases which can be evaluated semantically. And it doesn't take a search engine expert to see that this type of project would be almost impossible to realise for the existing World Wide Web, at least in the foreseeable future.
Semantic technologies for specialist areas
So the Theseus project is not intended to develop a new Internet search engine which competes with Google and can be used in every situation that crops up in the digital data world. The data contained on the Internet are simply far too heterogeneous and chaotic for that. What's more, at the end of the project there won't even be an Internet platform, and probably not even a physical product, says Thomas Huber, press spokesperson for the Theseus project. Instead, Theseus aims to create standards for semantic searches within specific areas. With partners from the business community, notably Siemens and SAP, and research associations such as the Fraunhofer Gesellschaft and various universities, Theseus therefore consists of subprojects which focus on specific application scenarios. As a spokeswoman from the Federal Ministry of Economics and Technology explains, these scenarios were selected in advance by the companies and the Ministry itself on the grounds that they appear to be particularly promising.
The "Medico" project, for example, focuses on developing semantic search technology for medical images, enabling the computer to evaluate X-ray images or CAT scans. The software, according to the vision, will identify medical anomalies, catalogue the data, provide images for comparison, and access treatment reports from all over the world. On this basis, it will then supply the doctor with relevant data and treatment proposals.
Besides "Medico", there are other scenarios for which semantic technologies are being researched. "Contentus" aims, among other things, to create a system for the processing of audio files so that broadcasters, museums and other arts organisations can make their stocks accessible to the public via the Internet. The "Ordo" project will allow companies to systematise their data and access them swiftly and precisely. "Alexandria", on the other hand, develops Web 3.0-compatible tools for the ordinary Internet user.
Web 3.0: an innovation and economic factor
Each of these "Theseus" subprojects involves close cooperation between the business and the research communities, although the lead organisations in each case are the partner companies, which bear 50 percent of the total costs. It is hoped that Theseus will provide the impetus for market-oriented product development on the basis of these promising technologies.
There is still profound frustration that in the past, German inventions in the field of information technology were brought to market by foreign companies. The fact that key elements of MP3 technology – notably the compression algorithm which is now used to upload almost every audio file to the Internet – were originally developed in Germany still awakens painful memories. At the time, no one was willing to invest in this technology and turn it into a marketable product, so MP3 went abroad, where other companies are now making a killing. This mustn't happen again.
The fact is that Germany has massive scientific potential in the field of information technologies, says Thomas Huber. It is simply a matter of bringing these people together to form a critical mass which will then generate innovation. Theseus is intended to provide the impetus for this process. And indeed, all the project partners are networked so that one research group can benefit from the others' findings. A consortium agreement regulates the rights issues, assuming that the project, which has a five-year life span, has produced viable outcomes by the time it ends. It's no secret that innovation cannot happen by decree. And time is of the essence, because researchers in other countries are also pressing ahead with the development of technologies for Web 3.0.
V8 Verlag GmbH, Cologne
Translation: Hillary Crowe
Copyright: Goethe Institute, Online Editorial Team
November 2007









