Volume 16, No. 1 
January 2012

  Ebrahim Golavar



  Translation Journal

Translator Education

Translators' Performance on Translation Production Tests & Translation Multiple-Choice Tests

by Ebrahim Golavar

Lecturer in Translation Studies
Islamic Azad University, Anzali Branch, Young Researchers' Club
Guilan, Iran


Translation testing is a controversial issue in translation studies. Translation testing, especially in its multiple-choice format, seems to lack a comprehensive theory. This article investigates the performance of translation trainees on two kinds of translation tests, namely production and multiple-choice tests. The researcher hypothesizes that translators' performance on production tests differs from that on multiple-choice tests. To this end, 45 senior students (both men and women) of the translation training program of the Islamic Azad University of Rasht were selected to participate in the experiment. The subjects were tested on both forms of translation test, with the same content. The data analysis showed that the subjects’ performances on the two kinds of tests were different; the hypothesis of the study was therefore supported. The results and implications of this study, as well as suggestions for future research, are discussed.

Key Words: Translation, Evaluation, Translation Testing, Production test, Multiple-choice test

1. Introduction

In translation studies, various training and evaluation methods are currently being used and developed. According to Hubscher-Davidson (2007), the aim of those methods is to improve translation pedagogy by focusing on the students' needs. However, one of the main issues in translation studies is translation testing, which is related to translation evaluation and assessment. Although issues related to translation evaluation have received considerable attention over the years, it seems that translation testing has not received much attention. Mousavi (1999:394) defines a test as “any procedure used to measure a factor or assess some ability.” He (ibid) also defines testing as “the use of tests or the study of the theory and practice of their use, development, evaluation, etc.”

Translation evaluation can be defined as the act of making judgments on a translation. Such judgments might be affected by subjective factors; therefore, evaluation in translation seems to be a problematic area. It is also possible that such evaluation is difficult even for experienced translation instructors. Regarding this idea, Goff-Kfouri (2004) emphasizes that translation instructors need to become competent in test writing, but they must know that there is no perfect test and no foolproof grading or marking system. Rahimi (2005) calls translation and testing two “controversial” and “challenging” fields in language studies.

Surprisingly enough, despite the importance of evaluation in translation, few studies have been carried out on translation testing, and there seems to be a need for much more research in this field. Rahimi (2005: 62) claims that “testing is underlined because there can be no science without measurement.” It can perhaps be said that the importance of testing, particularly language testing, is even greater since it is rooted in many complicated scientific disciplines such as linguistics, psychology and sociology. Ghonsooly (1993) mentions that translation testing methodology has been criticized for its subjective character, so he carried out a study on objectivity and scorability in translation testing methodology. Rahimi (2005) investigates the relationship between test form and trainees’ translation performance and concludes that the difference in translation test forms does not affect the subjects’ performance. On the other hand, Schmidt and McCutcheon (1994:118, as cited in Goff-Kfouri, 2004) state that “it seems that the instructor’s testing methods do have a lasting effect on the learning experience, the students’ attitude, as well as the teacher’s enthusiasm.”

The present study is an interdisciplinary piece of research spanning translation teaching and translation testing. The results of the study have pedagogical implications for teaching and testing translation. This study aims to examine the possible relationship between two kinds of translation test forms and the performance of translation trainees on them. First, some general concepts related to translation, translation evaluation and testing will be mentioned. Then, the experiment of the study will be described.

2. What is Translation?

As Munger (1999:5) holds, the word “translate” comes from the Latin word “transferre”, which means “to bear/carry/bring across; to transfer.” The term translation can have several meanings: it can refer to the subject field, the product, and the process. Newmark (1988:5) defines translation as “[…] rendering the meaning of a text into another language in the way that the author intended the text.” Nida (1975:95) claims that translation is “reproducing in the receptor language (target language) the closest natural equivalent of the message of the source language; first in terms of meaning and second in terms of style.” According to Guthknecht (2003), translation is a communicative device that is a necessity on economic and on general human grounds. Bassnett (1991:12) says that “what is generally understood as translation involves the rendering of a source language (SL) text into the target language.” Similarly, Catford (1965: 20) states that translation is “a process of substituting a text in one language for a text in another.”

3. Types of Translation

According to Beekman and Callow (1989), there are four main types of translation: 1) highly literal, 2) modified literal, 3) idiomatic, and 4) unduly free. Beekman and Callow (1989:21) state: “if its form [linguistic form of translation] corresponds more to the form of the original text, it is classified as literal; if its form corresponds more to the form of the receptor language (RL), then it is classified as idiomatic.” They (ibid) call literal and idiomatic translations “two basic approaches to translation.” Manafi Anari (2004:85) quotes Beekman and Callow (1989): “the highly literal translation is that in which the obligatory grammatical rules of the RL are set aside and translation follows the order of the original word for word and with high consistency.” He continues (ibid), “the modified literal translation occurs when the translator makes some lexical or grammatical adjustment to correct the errors arising from literalism, and produce something which is equivalent to the original.” Larson (1984: 10) also defines idiomatic translation as “[…] one which has the same meaning as the source language but is expressed in the natural form of the receptor language.” Beekman and Callow (1989:23) point out that in the unduly free translation “the purpose is to make the message as relevant and clear as possible.”

Newmark (1981) classifies translation into two types: communicative and semantic. He (ibid: 22) states that “in communicative translation, the translator takes into consideration all parameters of two languages involved in translation process. In this kind of translation, readers get the same impression from the translated text as the readers of the author’s work experience.” According to him (ibid: 12), semantic translation is “[…] the precise contextual meaning of the author.”

Larson (1984) also divides translation into two categories: meaning-based translation and form-based translation. According to Larson (ibid: 114), a meaning-based translation focuses on communicating the meaning and contextual elements of the source text, whereas a form-based translation focuses on the lexical and grammatical factors of translation.

4. Translation Evaluation

According to Carry and Jumpelt (1963), the question of defining the quality of translation was first discussed at the third congress of the International Federation of Translators, on Quality in Translation, in 1959. Since then, translation evaluation has received much attention within the field of translation studies, and there have been continued efforts to investigate the issue in both theory and practice.

As House (2001: 255) states:

Translation quality is a problematic concept if it is taken to involve individual and externally motivated value judgment alone. Obviously, passing any “final judgment” on the quality of a translation that fulfils the demands of scientific objectivity is very difficult indeed.

Lauscher (2000) mentions that “translation scholars have tried to improve practical translation quality assessment by developing models which allow for reproducible, intersubjective judgment (e.g. Reiss 1972: 12-13; Wilss 1977: 251; Ammann 1993: 433-34; Gerzymisch-Arbogast 1997)”. Lauscher (ibid) claims that “they [the translation scholars] hoped to achieve this goal [improving a practical translation quality assessment] by building their models on scientific theories of translation, which can provide a yardstick, and by introducing a systematic procedure for evaluation.” House (2001) presents a similar viewpoint, claiming that translation quality assessment requires a theory of translation. She (ibid) claims that different views of translation lead to different concepts of translation quality and different ways of assessing it.

Similarly, in the context of translation teaching some scholars have also introduced some proposals for translation evaluation (e.g. Delisle 1993; Hurtado 1995; Nord 1988 and 1996; Kussmaul 1995; Pym 1996; Gouadec 1981 and 1989; Presas 1996).

5. Translation Testing

5.1. Translation production test

In this form of translation test, translation trainees are given a number of separate sentences or a full paragraph and asked to translate them into the target language. The essay-type test is a kind of production test. Scoring in this form of test is generally subjective, since there is no objective, generally agreed-on answer key and there are few or no standard production tests of translation available. A teacher-made translation test is a test of this kind (Rahimi, 2005:67).

5.2. Multiple-choice translation test

These are tests that consist of a sentence from the source language as the stem and two, three or four translations into the target language as alternative choices. One of the choices is the correct answer. Here, the testees do not produce the translation but recognize the answer. The responses are scored objectively because there is a fixed answer key; thus, any scorer will arrive at the same score (ibid).

Farahzad (1992) argues that these kinds of translation tests limit the examinees’ performance creativity and it is not useful for the students to conclude that none of these options is adequate. Then, she suggests two kinds of translation test: limited-response item and controlled free-response item test.

The limited-response item test is an integrated test which examines several components of a translation at once, such as comprehension of the source text, accuracy in terms of content, appropriateness of grammatical forms, and choice of words. In this type of translation test, translators are free to select certain equivalents from among a series of synonyms, to adopt certain grammatical arrangements, to ignore lexical or grammatical adjustments in order to secure fidelity to the source text, etc. She believes that limited-response item tests prevent translation innovation in translators. She (ibid: 274) also states “that there is doubt about appropriateness of selected items in multiple-choice translation test, but the exact problem with the translation knowledge of examinees can be determined through this test at once.”

In the controlled free-response item test, her emphasis is on the appropriateness of the selected texts. She believes that in this kind of translation test, the selected text must be authentic, self-contained and at the level of the testees’ competence. She also states (ibid: 272) that “the examiner should give some general information about the text, the author and the name of the book, and the text selected, to the testees in this case.”

6. Four models for Translation Assessment

Waddington (2001: 311-325) describes four methods of translation assessment as follows:

6.1. Method A

This method is taken from Hurtado (1995); it is based on error analysis, and possible mistakes are grouped under the following headings:

(1) Inappropriate renderings which affect the understanding of the source text; these are divided into eight categories: countersense, faux sense, nonsense, addition, omission, unresolved extralinguistic references, loss of meaning, and inappropriate linguistic variation (register, style, dialect, etc.).

(2) Inappropriate renderings which affect expression in the target language; these are divided into five categories: spelling, grammar, lexical items, text, and style.

(3) Inadequate renderings which affect the transmission of either the main function or secondary functions of the source text.

In each of these categories a distinction is made between serious errors (–2 points) and minor errors (–1 point). There is a fourth category which describes the plus points to be awarded for good (+1 point) or exceptionally good solutions (+2 points) to translation problems. In the case of the translation exam where this method was used, the sum of the negative points was subtracted from a total of 110 and then divided by 11 to reach a mark from 0 to 10. For example, if a student gets a total of –66 points, his result would be calculated as follows: (110 – 66) / 11 = 44 / 11 = 4 (which fails to pass; the lowest pass mark is 5).
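The Method A calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic reported above: the function names, the default arguments and the clamping of the result to the 0-10 range are assumptions introduced here, and the handling of plus points is left out because the text does not say how they enter the total.

```python
def method_a_mark(negative_points: float,
                  total: float = 110,
                  divisor: float = 11) -> float:
    """Convert the sum of Method A penalty points into a 0-10 mark.

    negative_points is given as a positive number (e.g. 66 for -66 points).
    The totals 110 and 11 are the ones reported for the exam described above.
    """
    raw = (total - negative_points) / divisor
    # Clamping to the 0-10 scale is an assumption, not stated in the source.
    return max(0.0, min(10.0, raw))


def passes(mark: float, pass_mark: float = 5) -> bool:
    """The lowest pass mark reported in the text is 5."""
    return mark >= pass_mark


if __name__ == "__main__":
    mark = method_a_mark(66)
    print(mark, passes(mark))
```

With 66 penalty points the function reproduces the worked example in the text: a mark of 4, which is below the pass mark.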

6.2. Method B

Method B is also based on error analysis and was designed to take into account the negative effect of errors on the overall quality of the translations (cf. Kussmaul 1995: 129, and Waddington 1999: chapter 7). The corrector first has to determine whether each mistake is a translation mistake or just a language mistake; this is done by deciding whether or not the mistake affects the transfer of meaning from the source to the target text: if it does not, it is a language error (and is penalized with –1 point); if it does, it is a translation error (and is penalized with –2 points). However, in the case of translation errors, the corrector has to judge the importance of the negative effect that each one of these errors has on the translation, taking into consideration the objective and the target reader specified in the instructions to the candidates in the exam paper. In order to judge this importance, the corrector is given the following table:

Table 1. Typology of errors in Method B.

Negative effect on words in ST      Penalty for negative effect
1-5 words
6-20 words
21-40 words
41-60 words
61-80 words
81-100 words
100+ words
The whole text

The final mark for each translation is calculated in the same way as for Method A: that is to say, the examiner fixes a total number of positive points (in the case of Method B, this was 85), then subtracts the total number of negative points from this figure, and finally divides the result by 8.5. For example, if a student is given 30 minus points, his total mark would be approximately 6.5 (pass): (85 – 30) / 8.5 = 55 / 8.5 ≈ 6.5.
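A minimal Python sketch of the Method B mark follows. The function name and its parameters are assumptions made for illustration. Because the penalty values of Table 1 are not reproduced in this text, the penalties for translation errors are passed in by the caller, and the figures used in the example call are hypothetical.

```python
from typing import Sequence


def method_b_mark(language_errors: int,
                  translation_penalties: Sequence[float],
                  total: float = 85,
                  divisor: float = 8.5) -> float:
    """Compute a Method B mark.

    language_errors: number of language errors, penalized 1 point each.
    translation_penalties: one penalty per translation error (2 points or
        more, depending on how much of the source text is affected).
    """
    negative_points = language_errors * 1 + sum(translation_penalties)
    return (total - negative_points) / divisor


if __name__ == "__main__":
    # Hypothetical paper: 6 language errors plus four translation errors
    # whose penalties add up to 24, i.e. 30 minus points in all.
    mark = method_b_mark(6, [2, 2, 4, 16])
    print(round(mark, 1))
```

With 30 minus points in total, the result rounds to the mark of 6.5 given in the example above.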

6.3. Method C

Method C is a holistic method of assessment. The scale is unitary and treats the translation competence as a whole, but requires the corrector to consider three different aspects of the student’s performance, as shown in the table below. For each of the five levels there are two possible marks, so as to comply with the Spanish marking system of 0 – 10; this allows the corrector freedom to award the higher mark to the candidate who fully meets the requirements of a particular level and the lower mark to the candidate who falls between two levels but is closer to the upper one.

Table 2. Scale for holistic Method C

Level 5 (mark: 9, 10)
Accuracy of transfer of ST content: Complete transfer of ST information; only minor revision needed to reach professional standard.
Quality of expression in TL: Almost all the translation reads like a piece originally written in target language. There may be minor lexical, grammatical or spelling errors.

Level 4 (mark: 7, 8)
Accuracy of transfer of ST content: Almost complete transfer; there may be one or two insignificant inaccuracies; requires certain amount of revision to reach professional standard.
Quality of expression in TL: Large sections read like a piece originally written in target language. There are a number of lexical, grammatical or spelling errors.
Degree of task completion: Almost completely successful

Level 3 (mark: 5, 6)
Accuracy of transfer of ST content: Transfer of the general idea(s) but with a number of lapses in accuracy; needs considerable revision to reach professional standard.
Quality of expression in TL: Certain parts read like a piece originally written in target language, but others read like a translation. There are a considerable number of lexical, grammatical or spelling errors.

Level 2 (mark: 3, 4)
Accuracy of transfer of ST content: Transfer undermined by serious inaccuracies; thorough revision required to reach professional standard.
Quality of expression in TL: Almost the entire text reads like a translation; there are continual lexical, grammatical or spelling errors.

Level 1 (mark: 1, 2)
Accuracy of transfer of ST content: Totally inadequate transfer of ST content; the translation is not worth revising.
Quality of expression in TL: The candidate reveals a total lack of ability to express himself adequately in English.
Degree of task completion: Totally inadequate

6.4. Method D

Method D consists of combining error analysis Method B and holistic Method C in a proportion of 70/30; that is to say, Method B accounts for 70% of the total result and Method C for the remaining 30%.
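The 70/30 weighting can be expressed as a simple weighted average. The sketch below assumes that both component marks are already on the same 0-10 scale; the function name and the example marks are hypothetical.

```python
def method_d_mark(method_b: float, method_c: float,
                  weight_b: float = 0.7) -> float:
    """Combine a Method B mark and a Method C mark in a 70/30 proportion.

    Both marks are assumed to be on the same 0-10 scale.
    """
    return weight_b * method_b + (1 - weight_b) * method_c


if __name__ == "__main__":
    # Hypothetical candidate: 6.5 under Method B, 7 under Method C.
    print(round(method_d_mark(6.5, 7), 2))
```

Because Method B carries most of the weight, a candidate's combined mark stays close to the error-analysis result, with the holistic judgment acting as a moderate correction.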

7. The study

7.1. Research question

The purpose of this study is to find out the answer to the following question:

Does the translation performance of translation trainees differ on translation production and multiple-choice tests?

7.2. Research hypothesis

In order to investigate the above mentioned question, the following null hypothesis was developed:

There is no difference between the performances of translation trainees on translation production and multiple-choice tests.

7.3. Subjects

The subjects of the study were 45 senior students of the translation training program at the Islamic Azad University. They were randomly selected from among 100 students who took an Oxford Placement Test. The purpose of this test was to ensure the homogeneity of the subjects’ general proficiency. They were also tested on principles of translation to ensure the relative homogeneity of their translation competence.

7.4. Instruments

On the whole, three measures were used in this study. The first was an Oxford Placement Test to determine the general proficiency of the subjects. All the subjects were asked to complete the test in a limited time. The reliability of this test was calculated by estimating Cronbach’s alpha (internal consistency) and turned out to be .90, which is a highly satisfactory reliability coefficient.
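For readers unfamiliar with the statistic, Cronbach's alpha for a test of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below uses only the Python standard library; the function name and the item scores are hypothetical, since the raw placement-test data are not reported in this article.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a score matrix.

    scores: one row per test taker, one column per test item.
    """
    k = len(scores[0])  # number of items
    # Variance of each item across test takers.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the test takers' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical right/wrong (1/0) answers of five test takers on four items.
    demo = [
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]
    print(round(cronbach_alpha(demo), 3))
```

Values close to 1 indicate that the items rank the test takers consistently, which is the sense in which the reported figure of .90 is described as satisfactory.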

The second measure was a translation production test consisting of three paragraphs and 186 words. In this test, the students had to produce the target-language text themselves.

The third measure was a multiple-choice translation test which was based on exactly the same sentences as the production test; only its form was different. This test consisted of 20 isolated sentences, each with four equivalents in the target language, of which only one was correct (the sentences were contextually rich enough).

7.5. Procedure

This study was carried out in four experiments. In the first experiment, 120 translation students, both male and female, who had passed "translation principles and methods", were randomly selected to take part in an Oxford Placement Test. In the second experiment, based on the results of the Placement Test, 90 of them were identified to be tested on two kinds of translation test forms. They were assigned to two equal groups. In the third experiment, the first group was given a production test and the second group was given a multiple-choice test. In the fourth experiment, each group was given the test that it had not been given in the third experiment. For the evaluation of the papers, two translation scholars who held MAs in translation studies were asked for help. They both had teaching experience in translation and were asked to evaluate the production test papers based on Method D, explained in section 6.4. As soon as all the tests were done and all the data were gathered, SPSS software (Version 16) was used for analyzing the data.

7.6. Data Analysis

The outcome of the statistical analysis of this study is presented in Tables 3 and 4.

Table 3. Group statistics for comparison of the performances on production and multiple-choice tests

                         Mean        Std. Deviation    Std. Error Mean
Production test          15.1667
Multiple-choice test     12.9778

Table 4. Independent samples test for comparison of the performances on production and multiple-choice tests

Independent Samples Test
Rows: Equal variances assumed; Equal variances not assumed
Columns: Levene's Test for Equality of Variances; t-test for Equality of Means (Sig. (2-tailed), Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference)

Table 4 indicates that the significance of Levene's test is .033, which is lower than the significance level of 0.05; therefore, H0 (equality of variances) is rejected. So we consider the second row (equal variances not assumed) in drawing conclusions about the means.

The significance of the test of equality of means, assuming unequal variances, is lower than 0.05; therefore, we reject the null hypothesis and accept the claim that the means of the two groups are unequal.

As can be seen in Table 3, the means in the production and multiple-choice tests are 15.1667 and 12.9778 respectively. The difference between them is statistically significant.
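The "equal variances not assumed" row corresponds to what is commonly called Welch's t-test. The following sketch shows how the t statistic and its approximate degrees of freedom are obtained, using only the Python standard library. It is an illustration and not the SPSS procedure used in the study: the function name is an assumption and the two score lists are hypothetical, since the individual scores are not reported here. Turning t and the degrees of freedom into a p-value additionally requires the t distribution, which is left out.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(sample_a: Sequence[float],
            sample_b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical scores out of 20 for two small groups.
    production = [16, 15, 17, 14, 15, 16]
    multiple_choice = [12, 14, 10, 15, 13, 11]
    t, df = welch_t(production, multiple_choice)
    print(round(t, 3), round(df, 2))
```

The larger the absolute value of t relative to the critical value for the computed degrees of freedom, the stronger the evidence that the two group means differ.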

8. Conclusion and Pedagogical Implications

This study focused on the performances of translation trainees on two kinds of translation tests. The researcher aimed to investigate whether or not different forms of translation tests can affect translators' performances. The two kinds of translation tests had the same content. The results of the study showed that there was a significant difference between the mean scores of the two groups. This provides evidence to reject the null hypothesis of the study. Therefore, it can be said that translation trainees’ performances differ on translation production tests and multiple-choice tests, which suggests a relationship between translation test forms and the translation trainees' performances on them.

This study can have pedagogical implications for translation teachers, students, evaluators, and test makers. Translation teachers will be able to design suitable kind(s) of translation tests for the students. The organizations which are responsible for designing translation exams or interviews will be able to choose suitable kind(s) of translation tests.

Translation students themselves, in turn, will be able to understand their ability in doing different kinds of translation tests and to improve that ability, which can consequently improve the quality of their translations.

9. Suggestions for further research

The present study focused only on two kinds of translation tests, production and multiple-choice tests. Therefore, further research can be done on other kinds of translation tests or even similar research can be done on different kinds of interpretation tests.


References

Ammann, M. (1993). Kriterien für eine allgemeine Kritik der Praxis des translatorischen Handelns. In J. Holz-Mänttäri & C. Nord (Eds.), Traducere Navem. Festschrift für Katharina Reiß zum 70. Geburtstag (pp. 433-466). Tampere: University of Tampere.

Carry, E. and R. W. Jumpelt (eds) (1963) Quality in Translation, Proceedings of the 3rd Congress of the International Federation of Translators (Bad Godesberg, 1959), New York: Macmillan/Pergamon Press.

Delisle, J. (1993). La traduction raisonnée: Manuel d’initiation a la traduction professionnelle de l’anglais vers le français (Collection Pédagogie de la traduction). Ottawa: Presses de l’Université d‘Ottawa.

Farahzad, F. (1992). Testing achievement in translation classes. In C. Dollerup and A. Loddegaard (eds.), Teaching translation and interpreting: Training, talent and experience (pp. 271-278). Amsterdam: John Benjamins.

Farahzad, Farzaneh (2004) ‘Meaning in Translation’. Translation Studies Quarterly 2 (7 & 8): 81.

Gerzymisch-Arbogast, H. (1997). Wissenschaftliche Grundlagen für die Evaluierung von Übersetzungsleistungen. In E. Fleischmann (Ed.), Translationsdidaktik: Grundfragen der Übersetzungswissenschaft (pp. 573-579). Tübingen: Narr.

Ghonsooly, B. (1993), "Development and Validation of a Translation Test." Edinburgh Working Papers in Applied Linguistics, v 4, p 54-62.

Goff-Kfouri, C.A. (2004). 'Language Learning in Translation Classrooms' [Online] Translation Journal, Volume 9, No. 2. Available from: http://accurapid.com/journal/32edu1.htm

Golavar, E. (2009). The Role of Perceptual Learning Styles of Translation Students in Their Performance on two kinds of Translation Tests (Unpublished Master's thesis). University of Chabahar, Iran.

Gouadec, D. (1981). Paramètres de l’évaluation des traductions. Meta, 26(2), 99—116.

Gouadec, D. (1989). Comprendre, évaluer, prévenir. Special issue: L’erreur en traduction. TTR, 2(2), 35—54.

House, Juliane (2001) ‘Translation Quality Assessment: Linguistic Description versus Social Evaluation’. Meta, 46(2): 243-257.

Hubscher-Davidson, S. (2007). "Meeting Students' Expectations in Undergraduate Translation Programs." [Online] Translation Journal, Volume 11, No. 1. Available from: http://acurapid.com/journal/39edu.htm

Hurtado, A.A. (1995). La didáctica de la traducción. Evolución y estado actual. In P. Hernandez & J.M. Bravo (Eds.), Perspectivas de la traducción (pp. 49—74). Universidad de Valladolid.

Kussmaul, P. (1995). Training the translator. Amsterdam and Philadelphia: John Benjamins.

Mousavi, S.A. (1999). A Dictionary of Language Testing. (2nd Ed.). Tehran: Rahnama Publications. 394-404.

Newmark, P. (1988). A Textbook of Translation. Prentice Hall.

Nida, E. (1975). Language Structure and Translation: Essays by Nida. Stanford University Press.

Nord, C. (1991): Text Analysis in Translation. Amsterdam: Rodopi.

Presas, M. (1996). Problemes de traducció i competència traductora. Master’s thesis, Universitat Autònoma de Barcelona.

Rahimi, R. (2005). Test Forms and Trainees’ Translation Performance. Translation Studies, 9(3), 61-74.

Reiss, K. (2000). Translation criticism — the potentials and limitations: Categories and criteria for translation quality assessment. (E.F. Rhodes, Trans). Manchester: St. Jerome. (Original work published 1971).

Tajvidi, G.R. (2003). Fields of research in translation studies. In F. Farahzad (Ed.), (2004). Proceedings of Translation Studies Conferences (pp. 101—121). Tehran: Setarhe Sabz.

Tajvidi, Gh. R., (2006). Translating Texts in Politics. (4th Ed.). Tehran: Payame Noor University Press.

Waddington, Christopher (2001). “Different Methods of Evaluating Student Translations: The Question of Validity.” Meta, XLVI, pp. 311-325.

Wilss, W. (1977). Übersetzungswissenschaft. Probleme und Methoden (p. 251). Stuttgart: Ernst Klett. (Trans. 1982 as The science of translation: problems and methods. Tübingen: Gunter Narr).