<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1002">
  <Title>Translating Hong Kong News</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4. RESULTS
</SectionTitle>
    <Paragraph position="0"> We discovered that there is a certain amount of synergy between some of the improvements, particularly the term finder and statistical dictionary extraction. Applying the term finder modifies the parallel corpus in such a way that it becomes more difficult for the EBMT engine to find matches which it can align, while adding dictionary entries derived from the modified corpus eliminates that effect. As a result, we will not present the performance results for Test Condition 3 (improved segmenter plus term finder); further, the data for Test Conditions 2 (improved segmenter only) and 4 (improved segmenter plus statistical dictionary) may not accurately reflect the contribution of those two components to the full system. Figure 2 shows the portion of the test text for which the EBMT engine was able to produce a translation, while Figure 3 shows the average number of source-language words per translated fragment. These curves do not increase monotonically because, for performance reasons, the EBMT engine does not attempt to align every occurrence of a phrase, only the N (currently 12) most recently added ones; as a result, adding more text to the corpus can cause EBMT to ignore matches that align successfully in favor of newer occurrences which it is unable to align.</Paragraph>
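The recency cap described above can be sketched as a small phrase index that retains only the N most recently added occurrences of each phrase. This is a hypothetical illustration in Python, not the authors' EBMT engine; the class and method names are invented:

```python
from collections import defaultdict, deque

N = 12  # the paper's current cap on occurrences per phrase

class PhraseIndex:
    """Toy index keeping only the n most recently added occurrences."""

    def __init__(self, n=N):
        self.n = n
        # deque(maxlen=n) silently evicts the oldest entry when full,
        # which is how an alignable older match can be displaced by a
        # newer occurrence that fails to align.
        self.occurrences = defaultdict(lambda: deque(maxlen=self.n))

    def add(self, phrase, location):
        self.occurrences[phrase].append(location)

    def candidates(self, phrase):
        return list(self.occurrences[phrase])

index = PhraseIndex(n=3)  # small n for illustration
for loc in range(5):
    index.add("hong kong", loc)
print(index.candidates("hong kong"))  # only the 3 newest: [2, 3, 4]
```

Because eviction depends only on recency, not on whether an occurrence aligned, growing the corpus can reduce coverage, matching the non-monotone curves reported above.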
    <Paragraph position="1"> Examining Figure 3, it is clear that the fifth slice (from 40 to 50%) is much more like the test data than other slices, resulting in longer matches. In general, the closer training and test text are to each other, the longer the phrases they have in common.</Paragraph>
    <Paragraph position="2"> Figure 1 summarizes the results of human quality assessments.</Paragraph>
    <Paragraph position="3"> The &quot;Good&quot; and &quot;OK&quot; judgements were combined into &quot;Acceptable&quot;, and the percentage of &quot;Acceptable&quot; judgements was averaged across sentences and graders. As hoped and expected, the improvements do in fact result not only in better coverage by EBMT, but also in better quality assessments from the human graders. Further, the results on Hong Kong news text show that the choice of language model does have a definite effect on quality. These results also confirm the adage that there is no such thing as too much training text for language modeling: the model generated from the EBMT corpus was unable to match the performance of the pre-existing model generated from two orders of magnitude more text.</Paragraph>
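The pooling and averaging just described can be sketched as follows; the sentence data and grader counts are illustrative, not the paper's actual judgements:

```python
# Hypothetical judgements: sentence -> one label per grader.
judgements = {
    "s1": ["Good", "OK", "Bad"],
    "s2": ["OK", "OK", "Good"],
}

# "Good" and "OK" are pooled into "Acceptable".
ACCEPTABLE = {"Good", "OK"}

def acceptable_rate(judgements):
    # Per-sentence fraction of graders judging it Acceptable,
    # then averaged across sentences.
    per_sentence = [
        sum(label in ACCEPTABLE for label in labels) / len(labels)
        for labels in judgements.values()
    ]
    return sum(per_sentence) / len(per_sentence)

print(f"{acceptable_rate(judgements):.1%}")  # (2/3 + 3/3) / 2 = 83.3%
```

Averaging per sentence first, then across sentences, gives each sentence equal weight even if grader counts differ.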
  </Section>
</Paper>