Archiving Internet Content
For a new logic of collecting

German National Library | Photo (detail): © Deutsche Nationalbibliothek, Stephan Jockel

Any item published in Germany must be sent to the German National Library. Since 2006 this has also applied to publications on the internet. Yet this is still problematic, says jurist and librarian Eric Steinhauer.


Prof. Steinhauer, anyone who publishes a book in Germany is legally obligated to send two copies free of charge, so-called “deposit copies”, to the German National Library (Deutsche Nationalbibliothek / DNB). Since 2006, the National Library has also been tasked with archiving media published on the internet in the same way. Why is this still problematic?

In the 2006 debate in the Bundestag, when the National Library Act was passed, the controversy was mainly about the name. But the really revolutionary thing about this bill – that the National Library should also be responsible for web publications – wasn’t a matter of dispute. Yet from a legal point of view, web publications are to be seen very differently from printed material, records or CDs. With these media there is always a concrete object that is “delivered” to the library. This makes the DNB the owner of the copy, and the right of distribution, in the copyright sense, is exhausted. That is to say, the author can no longer determine what happens with this “delivered” medium. With web publications, on the other hand, there is no physical data carrier, only a data stream. When I send the DNB a file, I generate a copy of this work on the DNB’s system. But this doesn’t imply any rights of use. And that’s the problem for the DNB – especially when the library is supposed to store large parts of the internet.

What is the DNB’s mission and how can it be fulfilled?

According to the law, in addition to deposit copies, the National Library should archive the German-language internet at periodic intervals. But the National Library can’t do this, because it has no copyright powers to make the copies that this necessarily generates. In strictly legal terms, the library would first have to ask permission from all website operators, which of course is impossible. Moreover, it’s often impossible to distinguish on the web which publications are relevant for archiving. For those interested in contemporary history, it’s perhaps important to know what’s being discussed on certain topics on social media such as Facebook or Twitter. Theoretically, all this would also have to be archived.

Online gaps

The National Library is obliged to archive the entire German-language internet? How can the DNB even recognize what belongs to the German-language web?

Printed material always has a clear place of publication. On the internet it’s different. Even if you took all websites ending in “.de” as the criterion, this wouldn’t work, because domains are freely selectable. What if a German website operator chooses a “.com” address? Then he no longer falls under the category. Finding out what counts as belonging to the German-language internet would involve an administrative effort that is hardly feasible. The logic of collecting web publications doesn’t seem to me to have been fully thought through yet.

What do you propose?

I could imagine a layered approach. For example, web publications that correspond functionally to conventional printed material could be collected as deposit copies. This is already happening and works well, because most publishers support it. But there are also purely online publications – blogs, for instance. Anyone can start a blog from one day to the next. Should every blog be saved? I think the National Library or the regional libraries, which are also supposed to store relevant content, could perhaps archive a few hundred representative blogs. Their operators would have to be notified and would then deliver their content. But then there will be gaps in the library collections. And a website may no longer be accessible after two or three years, which means it will be lost forever to the collective memory if no one else has stored it. For this reason, it would make sense to introduce a third level of collecting, which would at first save everything without regard to content. Drawing on this fund, memory institutions such as libraries and archives could integrate publications they lack into their collections.

Reform of the copyright law

In February 2017, the Ministry of Justice introduced a draft law for copyright reform. Can this improve the situation for the German National Library?

A change in the bill concerns a supplement to the National Library Act. The National Library would now at last have the necessary copyright powers to archive web publications systematically and by automated means. After more than ten years, the Ministry of Justice has seen the need to act here and has proposed a corresponding change. The draft law essentially solves the current problems. There are a few details that still need to be discussed – for example, the question of how to make stored web pages accessible. Following the logic of the law, this would be possible only for non-commercial purposes. I think this is unfortunate. But for me it’s important above all that we can begin collecting. We need to be permitted to archive the publications; the question of usage is secondary. The material would then be secured in any event.

Prof. Dr. Eric Steinhauer is Department Head of Media Processing at the University of Hagen and honorary professor at the Institute of Library and Information Science at the Humboldt University in Berlin. He is a champion of Open Access and the reform of German library legislation.