Libraries in Germany – Model Projects

When Computers Learn to Understand: Semantic Search Engines

The need for search engines that actually help in finding what is sought has grown with the rapid development of the Internet. Semantic search engines are meant to do precisely that.

A question, a keyword, a bewildering number of links: that, as a rule, is how the result looks when you use a classical search engine to find an answer to a concrete question on the Internet. Take, for example, the case of Gutenberg. If you want to know where Johannes Gutenberg was born and google the keyword “Gutenberg”, you will be referred to over 10.7 million links. Supplement the keyword with “birthplace” and you will still find a proud 14,000 links. Behind them are texts from which you must glean the answer yourself, assuming you find it at all.

One question, one answer (here “Mainz”): this is the dream of every Internet user. Semantic search engines could make this dream come true.

A term with many meanings

The term “semantic search engine” is understood in different senses. “Some search engines are already being called ‘semantic’ when they provide a matching term or thematic clusters”, says Andreas Heß of the German National Library. For a computer scientist, however, these are not “real” semantic search engines. In the Contentus Project, Heß is working on the next generation of the digital library. “Search engines that deserve this name don’t deliver results in the form of links to other texts, but as already processed information.”

Heß mentions evri.com as an example. “There you enter the name ‘Gutenberg’ and the search engine displays a network that presents persons and places connected with Gutenberg. It displays birth and death dates and also pictures. Such search engines integrate information from multiple sources – ranging from the home page of a museum to that of a blog.”

Promoting reading skills for computers

Yet no computer can do what is a matter of course for every experienced reader. In contrast to human beings, computers can’t understand texts. Semantic search engines therefore find information only through what are called “formal knowledge representations”. In these, information is presented in such a way that a computer can also understand it.

For this, specific facts must be explicitly stored and broken down into very simple relations of the form subject, predicate, object. In addition, the computer needs information on how to interpret the predicates.
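How such a representation might look can be sketched in a few lines of Python. This is only an illustration, not the format used by Contentus or any particular search engine: the predicate names `born_in` and `located_in`, the `PREDICATES` table and the `ask` function are assumptions made up for this example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, stored explicitly as subject-predicate-object triples.
FACTS = [
    Triple("Gutenberg", "born_in", "Mainz"),
    Triple("Mainz", "located_in", "Germany"),
]

# Hypothetical notes telling the program how to read each predicate:
# what kind of thing may appear as subject and as object.
PREDICATES = {
    "born_in": {"subject_type": "Person", "object_type": "Place"},
    "located_in": {"subject_type": "Place", "object_type": "Place"},
}


def ask(subject: str, predicate: str) -> Optional[str]:
    """Return the object of the first matching fact, or None if none is stored."""
    for fact in FACTS:
        if fact.subject == subject and fact.predicate == predicate:
            return fact.obj
    return None


birthplace = ask("Gutenberg", "born_in")
print(birthplace)  # Mainz
```

Because the fact is stored explicitly, the question yields one answer rather than a list of links. A question about something that was never stored, such as `ask("Gutenberg", "died_in")`, simply returns nothing.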

An important goal in the development of semantic search engines is to enable them to draw inferences: for example, to join together the information “Gutenberg born in Mainz” and “Mainz is in Germany” into the inference that Gutenberg was born in Germany. “To make this possible, however, I must first have explicitly told the computer what relation exists between Mainz and Germany”, explains Heß.
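The Gutenberg inference can be played through in a small Python sketch. It assumes a single hand-written rule and invented predicate names (`born_in`, `located_in`); real systems use far more general reasoning machinery, so this is meant only to show the principle.

```python
# Hypothetical facts as (subject, predicate, object) tuples.
facts = {
    ("Gutenberg", "born_in", "Mainz"),
    ("Mainz", "located_in", "Germany"),
}


def infer_birth_regions(facts):
    """Apply one rule until no new facts appear:
    X born_in Y  and  Y located_in Z   =>   X born_in Z
    """
    known = set(facts)
    while True:
        new = {
            (x, "born_in", z)
            for (x, p1, y) in known if p1 == "born_in"
            for (y2, p2, z) in known if p2 == "located_in" and y2 == y
        }
        if new <= known:  # nothing new was derived, so we are done
            return known
        known |= new


inferred = infer_birth_regions(facts)
print(("Gutenberg", "born_in", "Germany") in inferred)  # True
```

If the fact linking Mainz and Germany is left out, the rule has nothing to work with and the conclusion is not drawn. That is exactly the point Heß makes: the relation has to be stated explicitly beforehand.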

Long way to the World Wide Web

Semantic search engines are currently being developed in very different areas. “It’s difficult to get an overview of the scene”, emphasizes Heß. “There’s a constant stream of new pages from universities, but also from small start-ups and individuals.”

But since these search engines require a particularly formal representation of contents, they have up to now been feasible only for closed databases, and not for the entire Internet. “That will probably still take quite a long time”, says Heß. “In the end, in order to apply semantic search engines to the entire Net, webmasters would have to add semantic processing to their pages.” Current projects are working on processes for this that are as simple and as automated as possible.

Developing foundations in the library sector

In the future, semantic search engines will play an ever greater role, especially in libraries, archives and museums, because they clearly add significant value to what these institutions offer their users. In the Contentus Project, part of the extensive Theseus Research Program of the Federal Ministry of Economics and Technology, the German National Library is currently working on the foundations for semantic search.

“The standard database existing in the German National Library furnishes a very good starting point”, reports Heß. “The person and keyword vocabulary contains precisely the sort of information that computers can understand.” The Library, he says, is transferring its entire existing database into a so-called “ontology”, a kind of extended thesaurus. “While a thesaurus says, for example, only which generic and specific terms or synonyms there are for the word ‘book’, an ontology can more precisely model and describe that a book has an author, a date of publication and so on.”
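The difference between the two can be made concrete with a short Python sketch. The entries below are invented for illustration and do not reproduce the German National Library’s actual data or data model; the names `thesaurus`, `OntologyClass` and `property_names` are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# A thesaurus entry only relates terms to other terms.
thesaurus = {
    "book": {
        "broader": ["publication"],               # generic term
        "narrower": ["paperback", "textbook"],    # specific terms
        "synonyms": ["volume"],
    }
}


@dataclass
class OntologyClass:
    """An ontology entry also says which properties a thing of this kind has."""
    name: str
    parent: Optional[str] = None
    # Maps a property name to the kind of value it expects.
    properties: Dict[str, str] = field(default_factory=dict)


book = OntologyClass(
    name="Book",
    parent="Publication",
    properties={"author": "Person", "date_of_publication": "Date"},
)


def property_names(cls: OntologyClass) -> List[str]:
    """List the properties the ontology declares for a class, alphabetically."""
    return sorted(cls.properties)


print(thesaurus["book"]["synonyms"])  # ['volume']
print(property_names(book))           # ['author', 'date_of_publication']
```

The thesaurus can tell a program that “volume” means roughly the same as “book”. Only the ontology entry lets it know that a book has an author who is a person, which is the kind of statement a semantic search engine needs.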

But the project is still in its infancy. The way to a computer that can understand texts is still a long one.

 

Semantic search engines:

Hakia: www.hakia.com
WeFind: www.wefind.com
Evri: www.evri.com
Semager: www.semager.de
Dandelon: www.dandelon.com

Dagmar Giersberg
is a freelance journalist living in Bonn.

Translation: Jonathan Uhlaner.
Copyright: Goethe-Institut Online-Redaktion
February 2010
