Semantic Catalogue Search
“We want to beat Google”
The Saxon Regional Library-State and University Library in Dresden (SLUB) has developed a novel and convenient catalogue search using semantic technology. Dr. Achim Bonte, Deputy Director General of the library, explains the advantages of SLUBsemantics.
Dr Achim Bonte | © SLUB
Mr Bonte, what is meant by a “semantic web”?
All our electronic data processing (EDP) rests on the binary system. Thus we can translate the character string G, o, e, t, h, e into a series of zeros and ones. The semantic web has now, so to speak, taught EDP to talk. This means that data are no longer only processed mechanically, but also linked by their content. We build a semantic network around a given term and assign further terms to it. All this is done in a specific description language, the Resource Description Framework (RDF). As a result, the computer recognizes what meaning we connect with the character string “Goethe” – an eighteenth-century writer who, among other works, wrote Faust.
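As an illustration of the idea behind RDF, statements can be written as subject–predicate–object triples. The following is a minimal sketch in plain Python, not SLUB's data model: the `ex:` identifiers and both helper functions are hypothetical, and no RDF library is used.

```python
# Hypothetical triples in the spirit of RDF: (subject, predicate, object).
# The "ex:" identifiers are illustrative assumptions, not SLUB's actual data.
TRIPLES = [
    ("ex:Goethe", "rdf:type", "ex:Writer"),
    ("ex:Goethe", "ex:activeInCentury", "18"),
    ("ex:Goethe", "ex:wrote", "ex:Faust"),
    ("ex:Faust", "rdf:type", "ex:Work"),
]


def describe(subject, triples=TRIPLES):
    """Return every (predicate, object) pair stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def subjects_with(predicate, obj, triples=TRIPLES):
    """Return all subjects that have the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# What does the data say about the string "Goethe"?
print(describe("ex:Goethe"))
# Who wrote Faust?
print(subjects_with("ex:wrote", "ex:Faust"))
```

Because each fact is a separate statement, further terms can be attached to “Goethe” simply by adding more triples.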
Multilingual and disambiguating search
Together with the young company Avantgarde Labs, your library has developed a semantic catalogue search. How does SLUBsemantics differ from a conventional catalogue?
Our search is multilingual. If, for example, you enter “automatic transmission”, you get hits in English and Polish. In addition, SLUBsemantics translates everyday language into technical language and vice versa. You search for “rotten meat” and get everything on the subject of food safety. Or you enter “adiposis” and get hits for “overweight” and “obesity”. This doesn’t work in the one-dimensional string searches of normal catalogues.
And SLUBsemantics eliminates the problem of ambiguity. If you enter “Python”, you might mean the snake, the programming language or the group of comedians “Monty Python”. Normal catalogues deliver the hits all jumbled together; SLUBsemantics sorts them – for instance, into the areas of biology, computer science and film.
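The sorting of ambiguous hits can be pictured as grouping results by subject area. This is only a sketch under stated assumptions: the hit list, its titles and the `area` field are invented for illustration, and how SLUBsemantics actually assigns areas is not described here.

```python
from collections import defaultdict

# Hypothetical hits for the query "Python"; titles and areas are illustrative.
HITS = [
    {"title": "Pythons of the World", "area": "Biology"},
    {"title": "Learning Python", "area": "Computer science"},
    {"title": "Monty Python's Flying Circus", "area": "Film"},
    {"title": "Python Data Analysis", "area": "Computer science"},
]


def group_by_area(hits):
    """Group hit titles by subject area, keeping the original hit order."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["area"]].append(hit["title"])
    return dict(groups)


for area, titles in group_by_area(HITS).items():
    print(area, titles)
```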
SLUBsemantics builds on existing concepts …
Yes, we make use of already existing experience. Wikipedia, for example, enables the translation services. We’ve developed a method that matches our searches with lexical entries from Wikipedia. Wikipedia describes the meanings and we’re the beneficiaries of the multilingualism. And the whole thing is then matched with our holdings.
The ideal lexical entry
What further developments are you planning?
So far we’ve relied only on Wikipedia as a data reservoir. Although this is gigantic, it has its limits. In a current project, therefore, we’re trying to integrate data sources of various provenances. We harvest information, combine it and end up with an integrated data set stripped of redundancy – the ideal lexical entry. This would be the next step: to open up data even more deeply, without being overwhelmed by redundant information.
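Combining several sources into one entry without redundancy could look roughly like the following. This is a sketch only: `merge_entries` and the two sample records are hypothetical, and the real project presumably has to handle conflicting and differently spelled values, which this example does not.

```python
def merge_entries(records):
    """Merge records field by field, keeping each distinct value only once."""
    merged = {}
    for record in records:
        for field, value in record.items():
            values = merged.setdefault(field, [])
            if value not in values:  # skip values another source already gave
                values.append(value)
    return merged


# Hypothetical records for the same person from two different sources.
SOURCES = [
    {"name": "Johann Wolfgang von Goethe", "born": "1749", "work": "Faust"},
    {"name": "Johann Wolfgang von Goethe", "born": "1749", "occupation": "writer"},
]

entry = merge_entries(SOURCES)
print(entry)
```

Fields that both sources agree on appear once, while fields found in only one source are carried over into the integrated entry.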
So you want to be better than Google …
Yes, even if that may at first sound somewhat presumptuous. Our advantage is that we’re developing depth rather than breadth. Google has to find solutions that fit everybody and can be used everywhere. Our services are intended in the first place for the local clientele. Our primary user group consists not of a billion but of about 80,000 people. And we know the make-up of our clientele. We can therefore work with finer tools than Google can, and can focus on specific kinds of data and specific technical languages. But yes, we want to serve our users better than Google does.
Supply that is in demand
Have other libraries already expressed interest in your developments?
Yes, the British Library is now testing a prototype based on our technology. And we’re in talks with other major libraries in Germany and other parts of Europe. Among others, we’re in conversation with the German National Library in Leipzig and Frankfurt am Main, which is designing the technical development for the German Digital Library – and from there it’s not very far to Europeana.
Where are the semantic web and libraries headed in the future?
I think libraries will have to go further along this path. We should take Google, Flickr and other products of the Internet industry as a benchmark and look carefully at what we can do better. We’re not in economic competition with these companies, but we are nevertheless operating in a market of supply and demand. Every library must have an answer to the question of why it’s there – now, and in ten or twenty years. And these answers will of course be different for a small-town library than for a big tanker like SLUB.