<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1042"> <Title>Is That Your Final Answer?</Title> <Section position="4" start_page="0" end_page="2" type="metho"> <SectionTitle> 2. SIMPLE LANGUAGE LEARNING EXPERIMENT </SectionTitle> <Paragraph position="0"> The basic task in scoring learner language (particularly in second language acquisition and English as a second language) consists of identifying likely errors and understanding their causes.</Paragraph> <Paragraph position="1"> From these, diagnostic models of language learning can be built and used to remediate learner errors effectively; [3] provides an excellent example of this. Furthermore, language learner testing seeks to measure the student's ability to produce language which is fluent (intelligible) and correct (adequate or informative).</Paragraph> <Paragraph position="2"> These are the same criteria typically used to measure MT system capability. In looking at different second language acquisition (SLA) testing paradigms, one experiment stands out as a useful starting point for our purposes and serves as the model for this investigation. In their test of language teachers, Meara and Babi [3] looked at assessors making a native speaker (L1) / language learner (L2) distinction in written essays. They showed the assessors essays one word at a time and counted the number of words it took to make the distinction.</Paragraph> <Paragraph position="3"> They found that assessors could accurately attribute L1 texts 83.9% of the time and L2 texts 87.2% of the time, over 180 texts and 18 assessors. Additionally, they found that assessors could make the L1/L2 distinction in fewer than 100 words. They also learned that it took longer to confirm that an essay was a native speaker's than a language learner's: on average, 53.9 words to recognize an L1 text but only 36.7 words to accurately distinguish an L2 text. 
While their purpose was to rate the language assessment process, the results are intriguing from an MT perspective.</Paragraph> <Paragraph position="4"> They attribute the fact that L2 texts took fewer words to identify to the fact that L1 writing &quot;can only be identified negatively by the absence of errors, or the absence of awkward writing.&quot; While they could not readily identify features, lexical or syntactic, on which evaluators consistently based their judgments, they hypothesize that there is a &quot;tolerance threshold&quot; for low-quality writing. In essence, once this pain threshold had been crossed through errors, missteps, or inconsistencies, the assessor could confidently make the assessment. It is this finding that leads us to disagree with Jones and Rusk's [2] basic premise: instead of looking for what the MT system got right, it is more fruitful to analyze what the MT system failed to capture, from an intelligibility standpoint. This kind of diagnostic is more difficult, as we will discuss later.</Paragraph> <Paragraph position="5"> We take this as the starting point for assessing the intelligibility of MT output. The question to be answered is whether this finding applies to distinguishing between expert human translation and MT output. This paper reports on an experiment designed to answer that question. We believe that human assessors key off of specific error types, and that an analysis of the experiment's results will enable us to build a program which detects these error types automatically.</Paragraph> <Paragraph position="6"> The discussion of whether MT output should be compared to human translation output is grist for other papers and other forums.</Paragraph> <Paragraph position="7"> In Meara and Babi's experiment, the subjects being assessed were students learning Spanish as a second language.</Paragraph> </Section> <Section position="5" start_page="2" end_page="3" type="metho"> <SectionTitle> 3. 
SHORT READING TEST </SectionTitle> <Paragraph position="0"> We started with publicly available data developed during the 1994 DARPA Machine Translation Evaluations [8], focusing first on the Spanish language evaluation. The data may be obtained at http://ursula.georgetown.edu.</Paragraph> <Paragraph position="1"> We selected the first 50 translations from each system and from the reference translation. We extracted the first portion of each translation (from 98 to 140 words, as determined by sentence boundaries). In addition, we removed headlines, as we felt these served as distractors. Participants were recruited through the author's workplace and neighborhood and through a nearby daycare center. Most were computer professionals, and some were familiar with MT development or use. Each subject was given a set of six extracts - a mix of different machine and human translations. The participants were told to read line by line until they were able to make a distinction between the possible authors of the text - a human translator or a machine translation system. The first twenty-five test subjects were given no information about the expertise of the human translator. The second twenty-five test subjects were told that the human translator was an expert. Subjects were given up to three minutes per text, although they frequently required much less time. Finally, they were asked to circle the word at which they made their distinction. 
Figure 1 shows a sample text.</Paragraph> <Paragraph position="2"> 3001GP The general secretary of the UN, Butros Butros-Ghali, was pronounced on Wednesday in favor of a solution &quot;more properly Haitian&quot; resulting of a &quot;commitment&quot; negotiated between the parts, if the international sanctions against Haiti continue being ineffectual to restore the democracy in that country.</Paragraph> <Paragraph position="3"> While United States multiplied the last days the threats of an intervention to fight to compel to the golpistas to abandon the power, Butros Ghali estimated in a directed report on Wednesday to the general Assembly of the UN that a solution of the Haitian crisis only it will be able be obtained &quot;with a commitment, based on constructive and consented grants&quot; by the parts.</Paragraph> </Section> </Paper>