Corpora: Teaching and learning with text corpora in GFL

Corpora are digitised collections of texts | © connel_design/

The potential uses of corpora in GFL (German as a foreign language) lessons are diverse, yet teachers often view corpora with scepticism. This article explains why it is worth introducing them and how teachers can easily integrate corpora into their language lessons.

Great potential is attributed to the use of corpora in GFL. However, teachers are somewhat sceptical of them, because corpus linguistics is rarely part of GFL teacher training, and many teachers do not have the time to acquire this knowledge on their own. To offer GFL teachers an introduction to working with corpora, this article explains the basic concepts and highlights a few ways in which corpora can be used.

What are corpora?

Corpora are extensive digitised collections of written texts or transcribed spoken material. They allow large quantities of authentic language data to be analysed systematically. This opens up a variety of applications: describing language, researching foreign and second language acquisition, developing reference works as well as teaching and lesson-planning materials, and practical use in lessons.
A distinction is made between different types of corpus. Text corpora consist solely of written data, while multimodal corpora also contain audio recordings and sometimes video (cf. Ahrenholz/Wallner 2013). The language data is usually complemented by additional information such as the author, place of publication or text type. This makes it possible to restrict a search, for instance to texts of a certain type or from a specified period. The language data can also be enriched with supplementary linguistic information (annotations): the part of speech of each word (POS tagging), its uninflected base form (lemmatisation) and its syntactic function (parsing). In a tagged corpus, for example, it is possible to search for the adjective “arm” (=poor) without also finding instances of the noun “Arm” (=arm). In DWDS, a suitable search query for this would be “arm with $p=ADJD”.
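DWDS performs this kind of filtering on its own servers, so teachers never need to program anything. For readers curious about what the annotation layers actually do, here is a minimal Python sketch. It assumes a tiny, hand-annotated sample of three invented sentences in which every token is stored as (word, lemma, POS tag), with tags from the STTS tagset; the `Token` type and the `search` function are hypothetical illustrations, not part of any DWDS interface. Note that in STTS, ADJD marks adjectives used predicatively or adverbially (“Er ist arm”), while ADJA marks attributive ones (“das arme Kind”), so a filter on ADJD alone does not find every adjectival use.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str   # surface form as it appears in the text
    lemma: str  # base form (lemmatisation layer)
    pos: str    # part-of-speech tag, STTS tagset (POS tagging layer)


# Invented, hand-annotated sample sentences (illustrative only, not DWDS data).
CORPUS = [
    [Token("Er", "er", "PPER"), Token("ist", "sein", "VAFIN"),
     Token("arm", "arm", "ADJD"), Token(".", ".", "$.")],
    [Token("Sie", "sie", "PPER"), Token("hob", "heben", "VVFIN"),
     Token("den", "der", "ART"), Token("Arm", "Arm", "NN"), Token(".", ".", "$.")],
    [Token("Das", "der", "ART"), Token("arme", "arm", "ADJA"),
     Token("Kind", "Kind", "NN"), Token("weint", "weinen", "VVFIN"), Token(".", ".", "$.")],
]


def search(corpus, lemma, pos_tags=None):
    """Return the sentences containing a token with the given lemma.

    The lemma comparison ignores case, so "arm" also matches the noun "Arm";
    pos_tags (a set of STTS tags) is what tells the adjective and the noun apart.
    If pos_tags is None, no POS filter is applied.
    """
    hits = []
    for sentence in corpus:
        if any(t.lemma.lower() == lemma.lower()
               and (pos_tags is None or t.pos in pos_tags)
               for t in sentence):
            hits.append(" ".join(t.word for t in sentence))
    return hits


if __name__ == "__main__":
    print(search(CORPUS, "arm"))                    # lemma only: all three sentences
    print(search(CORPUS, "arm", {"ADJD"}))          # ['Er ist arm .']
    print(search(CORPUS, "arm", {"ADJD", "ADJA"}))  # both adjectival uses
```

Because the noun and the adjective share the same lowercased lemma here, a lemma-only search returns all three sentences; it is the POS layer that separates them.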

What corpora are available for German?

There is a large number of corpora and corpus collections for German. Among the largest publicly accessible written-language corpora are the Deutsches Referenzkorpus (German Reference Corpus; DeReKo), the Digitales Wörterbuch der deutschen Sprache (Digital Dictionary of the German Language; DWDS), the Projekt deutscher Wortschatz (German Vocabulary Project) and the Korpus Südtirol (Corpus of South Tyrol). Spoken-language corpora are provided by the Datenbank gesprochenes Deutsch (Database of Spoken German; DGD2) and by GeWiss, a comparative corpus of spoken academic language.
The corpora currently accessible to the public were not primarily designed for educational use. They do not offer material aimed at foreign-language teachers and learners, such as didactic suggestions, multilingual user tutorials or virtual workstations. The individual portals also have different user interfaces and search functions, so users have to familiarise themselves with each one separately.

When is working with corpora useful?

Teachers can use corpora as a source of authentic language samples when preparing lessons. Particularly in vocational or subject-specific language training, suitable corpora can serve as a reliable reference source. When following up on lessons, corpora are ideal as an additional correction tool for checking whether particular constructions and structures are acceptable (cf. Lüdeling/Walter 2010, 6; Ahrenholz/Wallner 2013, 263 ff.).
Students can also carry out independent research using corpora. An introduction to working with corpora should cover not only the necessary search steps but also evaluation techniques that help students interpret their findings.
Corpora contain authentic language data. The texts come from the press, fiction, specialist literature and everyday conversation, and are primarily aimed at adult native speakers. For this reason, corpora are usually recommended for lessons with advanced adult learners. However, some activities do not require an in-depth understanding of the text, so even less advanced learners can carry out productive research.

How can you work with corpora in lessons?

The potential uses of corpora in GFL lessons are diverse; the following examples illustrate them using DWDS. DWDS has a highly intuitive user interface, so users can get started quickly.
Working with concordances is particularly well suited to language teaching. A concordance shows a search term in its immediate linguistic context. This approach is based on the didactic principle of data-driven learning: students work out their own usage rules from an intensive analysis of linguistic material. Concordances are ideal for illustrating and working out various language structures and phenomena, such as accusative/dative prepositions (cf. Wallner 2013), word formation patterns (cf. ibid. and Fig. 1) or verb-final word order in clauses introduced by weil (=because). They can also be used as tools to support text production and to expand vocabulary. Furthermore, concordances can illustrate the multiple meanings of words (cf. Fig. 2).
Fig. 1: Concordance list for search query “*sprechen” (=speak) | © DWDS
Fig. 2: Concordance list for search query “Schein” (=appearance, ticket…) | © DWDS
The full text can be displayed for each concordance line. This is recommended if the aim is to work out the meaning and usage of a phenomenon from its context, which requires a certain level of text comprehension.
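The core idea of a concordance (often called KWIC, keyword in context) is simple enough to sketch in a few lines. The following Python example is a hypothetical illustration, not how DWDS is implemented. It assumes a naive tokeniser that keeps only word characters, shell-style wildcards (so “*sprechen” matches any word ending in “sprechen”, mirroring the query in Fig. 1) and a handful of invented example sentences; the function names are made up for this sketch.

```python
import fnmatch
import re

# Invented example sentences (not taken from a real corpus).
TEXTS = [
    "Wir müssen morgen über das Projekt sprechen.",
    "Er will das Problem offen ansprechen.",
    "Ich kann dir nichts versprechen.",
    "Die Ergebnisse entsprechen den Erwartungen.",
]


def concordance(texts, pattern, width=3):
    """Return (left context, keyword, right context) triples.

    pattern may use shell-style wildcards ("*sprechen"); matching ignores case.
    width is the number of context words shown on each side of the keyword.
    """
    lines = []
    for text in texts:
        # Naive tokenisation: keep runs of word characters, drop punctuation.
        words = re.findall(r"\w+", text)
        for i, word in enumerate(words):
            if fnmatch.fnmatchcase(word.lower(), pattern.lower()):
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append((left, word, right))
    return lines


def print_kwic(lines):
    """Print the concordance with left contexts right-aligned so keywords line up."""
    left_width = max((len(left) for left, _, _ in lines), default=0)
    for left, keyword, right in lines:
        print(f"{left:>{left_width}}  [{keyword}]  {right}")


if __name__ == "__main__":
    print_kwic(concordance(TEXTS, "*sprechen"))
```

Aligning the keywords in one column is what makes a concordance useful for data-driven learning: students can scan the left and right contexts for recurring patterns at a glance.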
By contrast, working with the DWDS analysis tool Wortprofil 3.0 requires less text comprehension. It lets users research a search term via its typical combination partners. For instance, Profile deutsch lists the word “Durst” (=thirst) as an A1 item for both productive and receptive use, but gives no combination partners. These can be looked up in the DWDS Wortprofil: there, students can find out, for example, which adjectives are used with Durst and which verbs describe what you can do with it or against it (cf. Fig. 3).
Fig. 3: DWDS Wortprofil 3.0 for search query “Durst” (=thirst) | © DWDS
The DWDS Wortprofil function is also suitable for working with words that have almost identical meanings. Partially synonymous adjectives such as “niedlich” (=cute) and “süß” (=sweet, cute) can be distinguished more precisely by comparing their combination partners (cf. Fig. 4).
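A word profile rests on counting which words a search term typically combines with. The Python sketch below illustrates the niedlich/süß comparison under strong simplifying assumptions: the sentences are invented and already lemmatised and tagged by hand, only an attributive adjective directly followed by a noun (STTS tags ADJA + NN) counts as a combination, and partners are compared by simple presence. Real word-profile tools such as DWDS Wortprofil work on large parsed corpora and rank partners with statistical association measures; the function names here are hypothetical.

```python
from collections import Counter

# Invented sentences, already lemmatised and tagged by hand: (lemma, STTS tag).
SENTENCES = [
    [("ein", "ART"), ("niedlich", "ADJA"), ("Hund", "NN")],
    [("der", "ART"), ("niedlich", "ADJA"), ("Hund", "NN")],
    [("ein", "ART"), ("niedlich", "ADJA"), ("Baby", "NN")],
    [("ein", "ART"), ("süß", "ADJA"), ("Baby", "NN")],
    [("der", "ART"), ("süß", "ADJA"), ("Kuchen", "NN")],
    [("der", "ART"), ("süß", "ADJA"), ("Wein", "NN")],
]


def noun_partners(sentences, adjective):
    """Count nouns that directly follow the given attributive adjective (ADJA + NN)."""
    counts = Counter()
    for sentence in sentences:
        # Look at each pair of neighbouring tokens.
        for (lemma, tag), (next_lemma, next_tag) in zip(sentence, sentence[1:]):
            if lemma == adjective and tag == "ADJA" and next_tag == "NN":
                counts[next_lemma] += 1
    return counts


def compare_partners(sentences, adj_a, adj_b):
    """Split the noun partners of two adjectives into shared and exclusive ones."""
    a = noun_partners(sentences, adj_a)
    b = noun_partners(sentences, adj_b)
    return {
        "shared": sorted(a.keys() & b.keys()),
        "only " + adj_a: sorted(a.keys() - b.keys()),
        "only " + adj_b: sorted(b.keys() - a.keys()),
    }


if __name__ == "__main__":
    print(noun_partners(SENTENCES, "niedlich"))
    print(compare_partners(SENTENCES, "niedlich", "süß"))
```

With six invented sentences the result is trivially small; the point is only that the exclusive partners, not the shared ones, are what make the difference between two near-synonyms visible.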
Fig. 4: DWDS Wortprofil 3.0 for search query “niedlich” with “süß” as a comparative term | © DWDS
The activities illustrated here are very well suited to independent research by students. They encourage students to adopt a research-oriented perspective on language and to question the information in reference works critically. At the same time, students practise independent analysis and expand their vocabulary according to their own needs.


Ahrenholz, Bernt; Wallner, Franziska: Digitale Korpora und Deutsch als Fremdsprache. In: Bernt Ahrenholz, Ingelore Oomen-Welke (eds.): Deutsch als Fremdsprache (Deutschunterricht in Theorie und Praxis, vol. 10), pp. 261-272, Schneider Verlag Hohengehren, 2013.

Barkowski, Hans; Grommes, Patrick; Lex, Beate; Vicente, Sara; Wallner, Franziska; Winzer-Kiontke, Britta: Deutsch als fremde Sprache. Deutsch Lehren Lernen 3. Langenscheidt, 2014.

Glaboniat, Manuela: Profile deutsch: Lernzielbestimmungen, Kannbeschreibungen und kommunikative Mittel für die Niveaustufen A1, A2, B1, B2, C1 und C2 des Gemeinsamen europäischen Referenzrahmens für Sprachen, Buch mit CD-ROM, Langenscheidt, 2005.

Lüdeling, Anke; Walter, Maik: Korpuslinguistik. In: Hans-Jürgen Krumm et al. (eds.): Handbuch Deutsch als Fremd- und Zweitsprache, HSK 35, pp. 315-322, De Gruyter, 2010.

Wallner, Franziska: Korpora im DaF-Unterricht – Potentiale und Perspektiven am Beispiel des DWDS. Revista Nebrija de Lingüística Aplicada 13, special issue (Actas de Congreso), 2013.