<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1042"> <Title>Is That Your Final Answer?</Title> <Section position="6" start_page="3" end_page="4" type="evalu"> <SectionTitle> 4. RESULTS </SectionTitle> <Paragraph position="0"> Our first question is whether this kind of test applies to distinguishing between expert translation and MT output. The answer is yes.</Paragraph> <Paragraph position="1"> Overall, subjects were able to distinguish MT output from human translations 88.4% of the time. This determination is more straightforward for readers than the native/non-native speaker distinction.</Paragraph> <Paragraph position="2"> There was a degree of variation on a per-system basis, as captured in Table 1. Additionally, as presented in Table 2, the number of words needed to determine that a text was human-produced was nearly twice that of the closest system. (For those texts where a participant failed to mark a specific spot, the full length of the text was included in the average.)</Paragraph> <Paragraph position="3"> The second question is whether this ability correlates with the intelligibility scores assigned by human raters. Intuitively, the more intelligible a system's output, the harder it is to distinguish from human output; systems with lower scores for human judgment should therefore have higher intelligibility scores. Table 3 presents these scores alongside the fluency scores as judged by human assessors.</Paragraph> <Paragraph position="4"> Indeed, the systems with the lowest fluency scores were the most easily attributed. The system with the best fluency score was also the one most often confused with human translation. Individual articles in the test sample will need to be evaluated statistically before a definite correlation can be established, but the results are encouraging.</Paragraph> <Paragraph position="5"> The final question is whether there are characteristics of the MT output that enable the decision to be made quickly.
The initial results suggest that there are. Untranslated words (other than proper nouns) were generally immediate clues that a system had produced the text. Other factors included incorrect pronoun translation, incorrect preposition translation, and incorrect punctuation. A more detailed breakdown of the selection criteria and of the errors occurring before the selected word is currently in progress.</Paragraph> </Section> </Paper>