<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1057">
<Title>Tiejun Zhao +</Title>
<Section position="1" start_page="0" end_page="0" type="abstr">
<SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> To support the development of a localization-oriented EBMT system, an automatic machine translation evaluation method is implemented which adopts edit distance, cosine correlation and the Dice coefficient as criteria. Experiments show that the evaluation method distinguishes well between &quot;good&quot; translations and &quot;bad&quot; ones. To show that the method is consistent with human evaluation, six MT systems are scored and compared. A theoretical analysis is made to validate the experimental results.</Paragraph>
<Paragraph position="1"> Correlation coefficients and significance tests at the 0.01 level are computed to ensure the reliability of the results. Linear regression equations are calculated to map the automatic scores onto the human scores.</Paragraph>
<Paragraph position="2"> Introduction
Machine translation evaluation has always been a key and open problem. Various evaluation methods exist to answer either of two questions (Bohan 2000): (1) How can you tell whether a machine translation system is &quot;good&quot;? (2) How can you tell which of two machine translation systems is &quot;better&quot;? Since manual evaluation is time-consuming and inconsistent, automatic methods have been widely studied and implemented using different heuristics. Jones (2000) utilises linguistic information such as the balance of parse trees, N-grams and semantic co-occurrence as indicators of translation quality. Brew (1994) compares human rankings with automatic measures of translation quality, whose criteria involve word frequency, POS tag distribution and other text features. Another type of evaluation method compares the translation output with human translations.</Paragraph>
<Paragraph position="3"> Yokoyama (2001) proposed a two-way MT-based evaluation method, which compares output Japanese sentences with the original Japanese sentence with respect to word identification, correctness of modification, syntactic dependency and parataxis. Yasuda (2001) evaluates translation output by measuring its similarity to translation answer candidates from a parallel corpus. Akiba (2001) uses multiple edit distances to automatically rank machine translation output against translation examples.</Paragraph>
<Paragraph position="4"> Another line of machine translation evaluation is based on test suites. Yu (1993) designs a test suite consisting of sentences with various test points. Guessoum (2001) proposes a semi-automatic method for evaluating the grammatical coverage of machine translation systems via a database of unfolded grammatical structures. Koh (2001) describes a test suite constructed on the basis of a fine-grained classification of linguistic phenomena.</Paragraph>
<Paragraph position="5"> There are many other valuable reports on automatic evaluation. All these methods reflect their authors' resourceful use of available tools and resources for automatic evaluation tasks. For our localization-oriented lexicalised EBMT system, an automatic evaluation module is implemented; a sketch of the similarity criteria it relies on is given below.</Paragraph>
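As a concrete illustration of the three criteria named in the abstract (edit distance, cosine correlation and the Dice coefficient), here is a minimal Python sketch, assuming whitespace tokenisation and word-level comparison; the paper does not specify these details, so the tokenisation choice and all function names are illustrative, not the authors' implementation.

    # Hedged sketch of the three string-similarity criteria; word-level
    # comparison over whitespace-tokenised sentences is an assumption.
    from collections import Counter
    from math import sqrt

    def edit_distance(cand, ref):
        # Word-level Levenshtein distance via dynamic programming.
        m, n = len(cand), len(ref)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if cand[i - 1] == ref[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution
        return d[m][n]

    def cosine(cand, ref):
        # Cosine correlation of word-frequency vectors.
        a, b = Counter(cand), Counter(ref)
        dot = sum(a[w] * b[w] for w in a)
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def dice(cand, ref):
        # Dice coefficient over the two word sets.
        a, b = set(cand), set(ref)
        return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

    candidate = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print(edit_distance(candidate, reference), cosine(candidate, reference), dice(candidate, reference))

A lower edit distance and higher cosine or Dice scores all indicate a candidate closer to the reference, which is how such criteria can separate &quot;good&quot; translations from &quot;bad&quot; ones.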
<Paragraph position="6"> Some string similarity criteria are taken as heuristics. Experimental results show that this method provides useful quality feedback during development of the EBMT system. Six machine translation systems are used to test the consistency between the automatic method and human evaluation. To avoid stochastic errors, significance tests and linear correlations are calculated. Compared with previous work, ours is distinctive in the following ways: 1) It is developed for localisation-oriented EBMT, which demands higher translation quality. 2) Statistical measures are introduced to verify the significance of the experiments. Linear regression provides a bridge between human and automatic scoring of systems.</Paragraph>
<Paragraph position="7"> The paper is organised as follows: First, the localization-oriented lexicalised EBMT system is introduced as the background of the evaluation task. Second, the automatic evaluation method is described in detail; both the theory and the implementation of the evaluation method are fully discussed. Then six systems are evaluated both manually and with our automatic method.</Paragraph>
<Paragraph position="8"> Consistency between the two methods is analysed. Finally, before the conclusion, linear correlation and significance tests validate the results and exclude the possibility of random consistency, as sketched below.</Paragraph>
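The validation step just described can be illustrated with a short Python sketch: a Pearson correlation with a significance test at the 0.01 level, and a least-squares regression mapping automatic scores onto human scores. The score values below are hypothetical placeholders rather than the paper's data, and scipy.stats is an assumed dependency.

    # Hedged sketch of correlation, significance testing and regression;
    # all score values are hypothetical placeholders.
    from scipy.stats import pearsonr, linregress

    auto_scores  = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]  # one per MT system (hypothetical)
    human_scores = [3.1, 2.8, 3.6, 2.4, 3.3, 2.9]        # e.g. mean human ratings (hypothetical)

    r, p = pearsonr(auto_scores, human_scores)
    print(f"Pearson r = {r:.3f}, significant at the 0.01 level: {p < 0.01}")

    # Least-squares regression: human score as a linear function of the automatic score.
    fit = linregress(auto_scores, human_scores)
    print(f"human = {fit.slope:.2f} * auto + {fit.intercept:.2f}")

A significant correlation rules out random agreement, and the fitted regression equation is what allows an automatic score to be read on the human scale.
</Section>
</Paper>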