All kinds of different exams and tests – short tests to verify student progress, standard assessment testing, vocabulary tests and final exams – are an important part of everyday life for teachers. On the very first day of a course, students ask about the exam relevance of the teaching and learning material, while language students require a specific certificate that will allow them to study or work. Exams and tests are therefore relevant to the way teachers organize their daily work. But what form should testing best take? And what do teachers need to take into account?

Not all tests are the same, and several distinctions can be made between them. The first of these concerns the function of a test. Low-stakes tests, for example, are particularly important in language lessons. These are tests which are of interest and relevance to teachers and learners but have no social, school or professional consequences. They include vocabulary tests and short tests designed by teachers themselves to check that specific learning targets have been reached.

By contrast, high-stakes tests are those whose results have real consequences for those who take them: these include the driving test (mobility), the Abitur (Germany’s higher education entrance qualification) and the Goethe-Zertifikat A1: Start Deutsch 1 (which allows spouses to be reunited with their partners in Germany). In other words, these are tests which learners need to pass and which are designed by an external body.

Learning progress, learning targets, learning success

Apart from the function of a test, the time at which testing takes place is important (cf. Common European Framework of Reference, Chapter 9): should learning progress be tested while a course is ongoing or should the success of learning be checked at the end?

In the former case we talk of formative assessment. This gives teachers feedback about the success of their own teaching, so that they can adjust future lessons accordingly, repeat certain content or ensure that further progress is made. Progress can thus be measured without using standardized targets. One example is a short test to verify student progress.

Summative assessment plays a much more common role in everyday teaching. This takes the form of a final test that aims to measure the learning progress actually achieved against a defined or notional benchmark, and is therefore course-related. Examples include vocabulary tests and final exams in which the target to be achieved is specified by the textbook or curriculum.

Key questions for teachers

Teachers should therefore take the following questions into account when preparing tests:
  • What is the purpose of the test? Is it designed for example to measure learning progress with respect to the most recently taught lessons?
  • Is the test intended to prepare the students for something? And if so, what? For office communication, for instance, or for dealing with simple/complex everyday situations?
  • How useful or realistic is the test in terms of such preparation?

Ensuring the quality of tests

But what makes a final school exam or language course test different from an internationally recognized language test? One factor is the collection and analysis of data, which is one of 17 minimum standards set by the Association of Language Testers in Europe (ALTE). Items to be tested are first verified multiple times internally and then reviewed by external experts. Next, a group of at least 200 participants, whose composition roughly corresponds to the future exam candidates, trials all items. The results of these trials are statistically analysed to identify any inadequacies, inaccuracies or indeed errors. Following this analysis, the individual items are revised once more so that ultimately a fair, error-free and precisely measuring test is constructed.
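
The statistical analysis of trial results can be illustrated with a minimal sketch. The source does not say which statistics are computed; the two used here, item facility (the proportion of candidates who answer an item correctly) and an upper-lower discrimination index, are common choices in classical item analysis and are an assumption. The function names, the group size and the six-candidate data set are hypothetical and purely illustrative; a real trial would involve at least 200 participants.

```python
# Hypothetical trial data: one row per candidate, one column per item,
# 1 = correct answer, 0 = incorrect answer.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]


def item_facility(rows, item):
    """Proportion of candidates who answered the given item correctly."""
    return sum(row[item] for row in rows) / len(rows)


def item_discrimination(rows, item, group_size):
    """Upper-lower discrimination index for the given item.

    Candidates are ranked by total score; the index is the difference
    between the facility in the top group and in the bottom group.
    """
    ranked = sorted(rows, key=sum, reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(row[item] for row in upper)
    lower_correct = sum(row[item] for row in lower)
    return (upper_correct - lower_correct) / group_size


for item in range(3):
    facility = item_facility(responses, item)
    discrimination = item_discrimination(responses, item, group_size=2)
    print(f"item {item}: facility={facility:.2f}, discrimination={discrimination:.2f}")
```

An item that almost everyone or almost no one answers correctly, or one whose discrimination index is close to zero or negative, would be a candidate for revision, since it tells the tester little about the difference between stronger and weaker candidates.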

Level and marking

In the same way that the Common European Framework of Reference (CEFR) classifies foreign language proficiency according to different levels, the tasks set in tests and exams must likewise match the desired level. If a task is set that exceeds the language level of the learners, they may be unable to answer certain questions despite actually knowing the answer. If a task is set that corresponds to the language level but is too easy or too difficult, the test will no longer fulfil its original purpose and will be neither valid nor reliable.

As with designing items, it is common practice to mark exams on a level-related basis: errors typical of the level in question should be ignored as the focus is on overall proficiency.
This is illustrated by an example:

Example of a task set at A2 level, from: Materials for examiners | © Goethe-Institut

Learners are familiar with all the level A2 words and grammar used in the task, and the reasons for writing – to congratulate someone, to respond to an invitation and to ask questions – are realistic.

This is an example of a student’s answer:

Example of a student’s answer at A2 level, from: Materials for examiners | © Goethe-Institut

The student has fully completed the sections Glückwunsch and Gäste and has therefore received full marks. At this level, students have not yet learnt that the verb gratulieren takes the preposition zu. The fact that the preposition has been left out can therefore be ignored, especially since this does not impair understanding. The sentence about the section Auto does contain errors that would impair understanding, however, which should result in marks being deducted.
Half a mark should be deducted for communicative design because the salutations have been left out. The type of text is recognizable, however, and its communicative nature is clear. The length of the text is appropriate.
Assuming that three marks are awarded per section, plus one mark for the communicative design of the text as a whole, marks would be awarded in this case as follows: 3 (Glückwunsch) + 3 (Gäste) + 1.5 (Auto) + 0.5 (communicative design) = 8 out of 10.
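
The arithmetic of this marking scheme can be written out as a short sketch. The maximum marks (three per section, one for communicative design) and the marks awarded are taken from the example above; the function and variable names are hypothetical and are not part of any official marking tool.

```python
# Marking scheme assumed in the example: three marks per section,
# plus one mark for the communicative design of the text as a whole.
MAX_PER_SECTION = 3.0
MAX_COMMUNICATIVE = 1.0


def total_marks(section_marks, communicative_mark):
    """Add up the section marks and the communicative design mark."""
    for name, mark in section_marks.items():
        if not 0 <= mark <= MAX_PER_SECTION:
            raise ValueError(f"mark for {name} is out of range: {mark}")
    if not 0 <= communicative_mark <= MAX_COMMUNICATIVE:
        raise ValueError(f"communicative mark is out of range: {communicative_mark}")
    return sum(section_marks.values()) + communicative_mark


# Marks awarded to the student's answer discussed above.
awarded = {"Glückwunsch": 3.0, "Gäste": 3.0, "Auto": 1.5}
score = total_marks(awarded, communicative_mark=0.5)
maximum = len(awarded) * MAX_PER_SECTION + MAX_COMMUNICATIVE

print(f"{score:g} out of {maximum:g}")  # prints "8 out of 10"
```

Recording the mark for each section separately, rather than only the total, makes it easier for a second marker to see where marks were deducted and why.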

Small-scale quality control

How can the quality of tests be ensured, however, if the time and financial or staff resources are not available to conduct multiple internal revisions, engage external experts or collect and analyse data? After designing a test, teachers can work through the set tasks themselves, focusing in detail on the items in question and then comparing their answers against the standard answers. This allows them to determine whether the items test the learners on the content that is supposed to be tested or accidentally test them instead on general knowledge, logic or their ability to concentrate.

It is also helpful to have a colleague work through the test and give their feedback: in which cases were items phrased unclearly? Was it possible to work through the items in the time given? Are the individual items distinguished clearly enough from one another, or do they overlap? Does one incorrect answer generate additional incorrect answers? And most importantly, does the test correspond to the level of the target group? The objective is to mark tests on a level-appropriate basis, focusing on what the exam candidates know rather than on any gaps in their knowledge. The CEFR can-do descriptors should always be taken into account.

This allows teachers to conduct small-scale quality control themselves and ensure that the test tests what it is supposed to test.

ALTE and Q-Mark

The Association of Language Testers in Europe (ALTE) is an organization of foreign language testers. In addition to the Goethe-Institut, it currently has 33 other full members, which agree on certain standards and review one another for compliance with these standards. Only those exams which meet ALTE’s 17 minimum standards are entitled to the internationally recognized Q-Mark. This quality indicator is awarded specifically to each exam rather than generally to an institution. The Goethe-Institut is the only language tester in the German-speaking world to offer Q-Mark-approved exams at all six levels.



