Artificial intelligence in foreign language learning
AI Can’t Cut It: Correcting Language Learners’ Writing Still Has to Be Done by Hand

Hand with a red pencil correcting a text
Meaningful correction requires expertise and tact | © Getty Images

Many providers of AI writing assistants promise “high-quality writing at the push of a button”. Proofreading and editing software is now capable of instantly correcting reams of text and providing suggested improvements. But how good are these AI editors, and do they help foreign language learners learn to write well themselves?

By Dr. Moritz Dittmeyer

What’s the point of learning a foreign language, one might ask, given the myriad new language technologies and online support tools? Besides well-known applications that translate texts into many different languages in a split second, smart apps can now help us write flawless and stylistically solid texts. These range from the autocorrect function in Microsoft Office or in the browser of our choice to more versatile tools like Duden Mentor, Grammarly or the LanguageTool.

Many providers of so-called “writing assistants” promise to improve texts of all kinds, for any purpose, and make us more successful as a result. Their repertoire includes checking spelling, grammar and punctuation, optimizing style, and even rephrasing whole sentences.

So doesn’t it make perfect sense to use these new tools in language learning as well? And do we still need trained teachers to correct their pupils’ texts and suggest improvements if AIs can do the job much faster, more accurately and more objectively, as is widely claimed?

The LanguageTool put to the test

But it’s not that simple, as a study conducted by the Goethe-Institut together with researchers at Humboldt University in Berlin has shown. Present-day proofing tools may well help native speakers and advanced language learners produce better texts, but they’re not useful in learning to write well in a foreign language, especially for beginners. Proofing tools need to be extensively overhauled before they can be recommended for didactic purposes or for improving the quality of learners’ texts.
The LanguageTool
The subject of the study was the LanguageTool, an open-source app for checking grammar, style and spelling in several different languages. As it is free and open-source, it forms the basis of many online correction apps. Purportedly used by millions of people around the world, the LanguageTool has evolved into a fully fledged writing assistant, though it only delivers its full potential in a paid version. The app is continuously improved by an active community. The current version for the German language applies over 4,500 rules covering grammar, punctuation, spelling, typography, idioms, gender-inclusive language and more.

The experiment

When it comes to proofreading, the LanguageTool turns out to be less accurate than Goethe-Institut teachers. In the study, we compared the tool’s corrections of texts written by learners of German in online Goethe-Institut A2-level courses with the corresponding edits made by teachers. The LanguageTool flagged significantly more mistakes than the teachers did, but quite a lot of those edits were wrong or irrelevant. Examples include edits concerning sentence fragments, edits to the spelling or capitalization of proper nouns that were actually correct, and suggested improvements that were beyond the learners’ current level of proficiency. The opposite was observed for more complex grammatical mistakes: syntax errors in particular were hardly noticed. All of this made for a very poor markup.

The LanguageTool’s inaccurate and inadequate corrections stem from its technological approach: it identifies mistakes by applying rules, and its rule-based markup does not take semantic content into account. A solid proofreading app would also consider semantic or contextual information, for example by recognizing proper nouns as such instead of trying to correct their spelling.
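
The difference between context-blind checking and a check that knows about proper nouns can be sketched in a few lines of Python. This is a toy illustration, not how the LanguageTool works internally: the tiny lexicon, the list of proper nouns and the function name are all assumptions made up for the example.

```python
import re

# Toy lexicon standing in for a real dictionary (assumption for illustration).
LEXICON = {"hallo", "ich", "wohne", "in", "und", "arbeite", "bei"}

# Hypothetical proper nouns known from context, e.g. names used in a course.
PROPER_NOUNS = {"Kadira", "Berlin", "Siemens"}


def flag_unknown_words(text, proper_nouns=frozenset()):
    """Return words missing from the lexicon, skipping known proper nouns."""
    flagged = []
    for word in re.findall(r"[^\W\d_]+", text):
        if word in proper_nouns:
            continue  # contextual knowledge: a name, not a misspelling
        if word.lower() not in LEXICON:
            flagged.append(word)
    return flagged


sentence = "Hallo Kadira, ich wohne in Berlin und arbete bei Siemens."

# Context-blind check: names are flagged alongside the real typo.
naive = flag_unknown_words(sentence)

# Context-aware check: only the actual misspelling remains.
context_aware = flag_unknown_words(sentence, PROPER_NOUNS)
```

Here the context-blind pass flags three correctly spelled names in addition to the typo "arbete", while the pass that knows the proper nouns flags only the typo. For a beginner, those three extra flags are exactly the kind of misleading markup described above.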

Example of a language learner’s text corrected by the LanguageTool | © Goethe-Institut

Corrective feedback in language learning

The task turns out to be even more complex. Since the whole point of doing written assignments in foreign language courses is, of course, to help learners learn the language, there has to be a didactic component to the corrections. Corrective feedback should ideally be calibrated to each learner’s specific level and learning style. It can be focused and implicit or comprehensive and explicit. For example, by focusing on certain types of errors when correcting, a teacher can draw a pupil’s attention to a particular problem. Implicit corrections can motivate learners to review the underlying rules on their own. The teachers we asked in the study pointed in particular to these didactic shortcomings in the corrections and improvements suggested by the LanguageTool. They also stressed that erroneously flagged mistakes and faulty edits are especially misleading for language learners, who are unable to recognize incorrect corrections and “disimprovements”. The psychological effects should not be discounted either: excessive and especially erroneous edits can demoralize learners and undermine their motivation to learn the language.

Conclusion & outlook

Without extensive and specific adjustments for use in foreign language learning, the technology behind the LanguageTool is hardly suitable for proofreading German learners’ writings.

In addition to rule-based approaches, so-called “language models” have been used for some years now to correct and improve texts. “Language model” here refers to complex pre-trained neural network architectures capable of modeling a language in its entirety. What’s special about language models is that they’re “attentive”: they weigh each processed word or word fragment against its surrounding context. By assigning meaning and context in this way, they achieve a certain level of comprehension.

Recently developed language models have been breaking one record after another, and these network architectures are indeed astoundingly good at “understanding”, manipulating, and imitating natural language. One of the most famous language models is Generative Pre-trained Transformer 3 (GPT-3) created by the US research lab OpenAI.

However, these models would need to be adapted extensively for use in correcting and improving texts written by language learners. So far, usable comprehensive language models exist primarily for English. So one common practice for correcting German texts is to translate them into English first, improve them in English and then translate them back into German. The problem is that the text is sometimes noticeably distorted after going through the various steps involved, and often contains words, expressions, and grammatical structures that are not yet familiar to language learners.
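
The round trip described above can be sketched as follows. The three stages are hard-coded stand-ins for real translation and rewriting services, and the learner sentence, the outputs and the A2 word list are all invented for illustration; only the shape of the pipeline is the point.

```python
import re


# Hard-coded stand-ins for real services (assumptions, not actual system output).
def to_english(text):
    return {"Ich habe ein groß Haus.": "I have a big house."}[text]


def improve_english(text):
    return {"I have a big house.": "I own a spacious house."}[text]


def to_german(text):
    return {"I own a spacious house.": "Ich besitze ein geräumiges Haus."}[text]


def round_trip_correct(text):
    """Translate to English, improve the English, translate back to German."""
    return to_german(improve_english(to_english(text)))


def words_beyond_level(text, known_vocabulary):
    """List words in the corrected text that the learner has not met yet."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word in words if word not in known_vocabulary]


# Hypothetical vocabulary a learner might know at this stage.
A2_VOCABULARY = {"ich", "habe", "ein", "großes", "haus"}

corrected = round_trip_correct("Ich habe ein groß Haus.")
unfamiliar = words_beyond_level(corrected, A2_VOCABULARY)
```

In this constructed case the learner only needed "groß" changed to "großes", but the round trip returns a grammatical sentence with two words ("besitze", "geräumiges") outside the assumed word list. The edit is correct German and still of little use to the learner.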

What’s more, language models are trained to produce ideal or standard language and are unable to allow for different levels of language proficiency for didactic purposes. As a result, they propose corrections and improvements that often make no sense for lower-level learners, for whom other factors besides ideal usage need to be taken into account.

A German learner’s text marked up by a language model (GPT-NeoX 20B via https://nlpcloud.com) | © Goethe-Institut

Another problem with language models is that learners might not understand the reason for a particular correction or improvement. Unlike rule-based approaches, language models don’t have any specifiable rules from which the reasoning behind a proposed edit can be deduced. Put simply, the suggested corrections are just a result of what is statistically most probable in the given context. So a punctuation mark or word marked incorrect is replaced by the punctuation or word that occurs most frequently in the implicit context of the sentence in question and the text as a whole.
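
To make "statistically most probable" concrete, here is a deliberately crude sketch. It uses simple word-pair counts over a four-sentence toy corpus, which is a drastic simplification of what a neural language model does; the corpus and the function names are assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for the model's training data.
CORPUS = [
    "ich habe einen hund",
    "ich habe einen bruder",
    "ich habe eine katze",
    "wir haben einen garten",
]


def train_bigrams(sentences):
    """Count which word follows which across the corpus."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for previous, current in zip(tokens, tokens[1:]):
            following[previous][current] += 1
    return following


def most_probable_next(following, previous_word):
    """Return the word seen most often after previous_word, or None."""
    candidates = following.get(previous_word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


model = train_bigrams(CORPUS)

# A learner wrote "ich habe ein Katze"; what would replace "ein"?
suggestion = most_probable_next(model, "habe")
```

The sketch suggests "einen" simply because that word follows "habe" most often in the corpus. It has no rule about grammatical gender to show the learner, and in this case the suggestion is even wrong for "Katze". Real language models condition on far more context, but their suggestions are likewise the outcome of learned statistics rather than of rules that can be cited.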

From a sustainability standpoint, it should also be borne in mind that the use of language models is highly resource-intensive. The development, training and operation of big pre-trained language models like GPT-3 consumes thousands of megawatt-hours of electricity, costs millions and requires billions of data points.

So at least for the time being, correcting language learners’ texts in a suitable, sensible and didactically effective way remains a job for trained teachers.

Any doubts?

Well then, try it yourself. Here are two anonymized texts written by language learners that you can have proofread by the language assistant of your choice:

Hallo Kadira. Mein Traum haus hat eine großküche, ein groß und hell bad mit ein groß Spiegel. Mein Slafszimmer hat einen groß bett, viel Schranks, Meinen Arbeitzimmer hat einen Bibliothek, einen Tisch für den Computer un ein bequemer Stuhl. Sie hat auch ein großer und farbenfroher Garten. Mit Freundlichen Grüßen, Tho

Liebe Taisha. Das ist meine Tag. Jeden Morgen ich aufstehe um 5:30 Uhr und ich zaubere den Frühstück für mich und meine Tochter Khaleesi. Dann ich arbeite bis 16 Uhr. ; Von 16:15 bis 18:45 spiele ich mit Khaleesi, um 19 Uhr haben wir das Abendessen und ich vorbereite Khaleesi für das Bett um 8 Uhr. Sie habe ein Bad und sie schlaft von 21 Uhr. Dann ich arbeite, zaubere die Essen für den Morgen oder ich lerne Deutsch. Es kann helfen um eine Kultur besser zu verstanden, und auch um mehrere Leute zu können lernen. Viele Grüße, Sichin


Rüdian, S., Dittmeyer, M., & Pinkwart, N. (2022): Challenges of using auto-correction tools for language learning. In: LAK22: 12th International Learning Analytics and Knowledge Conference (LAK22). New York: Association for Computing Machinery, 426–431. https://doi.org/10.1145/3506860.3506867

Nassaji, H., & Kartchava, E. (eds.) (2021): The Cambridge Handbook of Corrective Feedback in Second Language Learning and Teaching (Cambridge Handbooks in Language and Linguistics). Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108589789