<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1041"> <Title>OTHER LANGUAGES SEMANTIC FRAMES (COMMON COALITION LANGUAGE) UNDERSTANDING GENERATION</Title> <Section position="5" start_page="6" end_page="7" type="evalu"> <SectionTitle> 3. EVALUATION AND RESEARCH ISSUES </SectionTitle> <Paragraph position="0"> We have trained the system on about 1,600 Korean newspaper articles on &quot;missiles&quot; and &quot;chemical biological warfare&quot;. For quality evaluation, we adopted a 5-point scale, defined as follows.
Score 4: The translation is both accurate and natural.
Score 3: The translation is accurate, with minor grammatical errors that do not affect the intended meaning of the input, e.g. morphological errors such as &quot;swam vs. swimmed.&quot;
Score 2: The translation is partially accurate and sufficient for content understanding. Most errors are due to inaccurate word choice, inaccurate word order, or partial translation.
Score 1: The translation is word-for-word, and partial content understanding is possible.
Score 0: There is no translation output, or no content understanding is possible.</Paragraph> <Paragraph position="1"> We performed the quality evaluation on 410 clauses from the training data and 80 clauses from the test data, in 3 phases.
Eval 1: Baseline evaluation after grammar and lexicon acquisition.
Eval 2: Evaluation after augmenting the word sense disambiguation rules.
Eval 3: Evaluation after augmenting the word sense disambiguation rules and the accurate word order generation rules.
The purpose of the 3-phase evaluation was to examine the contributions of parsing, word sense disambiguation, and accurate word order generation to the overall translation quality. 
Once a score had been assigned to each clause, the translation score was computed as (sum of the clause scores * 25) / (number of clauses evaluated), yielding a value between 0 and 100.</Paragraph> <Paragraph position="2"> Evaluation results are shown in Table 2 and Table 3 in terms of parsing coverage (P) and the translation score (T).</Paragraph> <Paragraph position="3"> For both the training and test data, the baseline translation score is over 50, sufficient for content understanding of the documents. Word sense disambiguation (Eval 1 vs. Eval 2) increases the translation score by about 10%, indicating that effective word sense disambiguation has great potential for improving translation quality.</Paragraph> <Paragraph position="4"> We would like to point out that the evaluations reported in this paper are performed on clauses rather than sentences (which often consist of more than one clause). In a very recent evaluation, we found that evaluating on sentences decreases the overall translation score by about 15 points. Nevertheless, the translation quality is still good enough for content understanding with some effort. The lower translation scores when the evaluation unit is a sentence rather than a clause are due to either incorrect clause boundary identification or information (e.g. missing arguments in embedded clauses) that cannot easily be recovered after a sentence is fragmented into clauses. This has led us to focus on the ability to handle complex sentences. We would also like to note that the evaluation reported here was a self-evaluation by a system developer, performed primarily to identify the key research issues in system development. In the future, we will report evaluation results from evaluators who are not system developers and have no knowledge of Korean. 
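The scoring scheme described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not code from the paper: the names QUALITY_SCALE and translation_score are our own, and the rubric labels paraphrase the Score 0-4 definitions given earlier.

```python
# Sketch (assumed, not from the paper) of the clause-level evaluation
# scheme: each clause receives a 0-4 quality score, and the overall
# translation score is (sum of clause scores * 25) / number of clauses,
# i.e. the mean clause score rescaled to a 0-100 range.
QUALITY_SCALE = {
    4: "accurate and natural",
    3: "accurate, minor grammatical errors only",
    2: "partially accurate, sufficient for content understanding",
    1: "word-for-word, partial content understanding",
    0: "no output or no content understanding",
}

def translation_score(clause_scores):
    """Return the 0-100 translation score for a list of 0-4 clause scores."""
    if not clause_scores:
        raise ValueError("at least one evaluated clause is required")
    for s in clause_scores:
        if s not in QUALITY_SCALE:
            raise ValueError("each clause score must be an integer from 0 to 4")
    return sum(clause_scores) * 25 / len(clause_scores)

# Example: clauses scored 4, 3, 2, and 3 give (12 * 25) / 4 = 75.0.
```

Note that the formula simply rescales the mean clause score from the 0-4 rubric to a 0-100 range, so a baseline score "over 50" corresponds to an average clause quality above Score 2, the threshold for content understanding.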
A system evaluation by an evaluator who does not know the source language avoids the problem of the evaluator implicitly drawing on knowledge of the source language during the evaluation.</Paragraph> <Paragraph position="5"> The ability to handle complex sentences is the primary research issue, and we are working on a solution that utilizes a syntactically annotated corpus for both grammar and probability acquisition, as discussed in Section 2.3.</Paragraph> </Section></Paper>