
<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1002">
  <Title>Translating Hong Kong News</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5. CONCLUSIONS AND FUTURE WORK
</SectionTitle>
    <Paragraph position="0"> As seen in Figure 2, the enhancements described here cumulatively provide a 12% absolute improvement in coverage for EBMT translations without requiring any additional knowledge resources.</Paragraph>
    <Paragraph position="1"> Further, the enhanced coverage does, in fact, result in improved translations, as verified by human judgements. We can also conclude that when we combine words into larger chunks on both sides of the corpus, the possibility of finding larger matches between the source language and the target language increases, which leads to the improvement of the translation quality for EBMT.</Paragraph>
    <Paragraph position="2"> We will do further research on the interaction between the improved segmenter, term finder and statistical dictionary builder, utilizing the information provided by the statistical dictionary as feed-back for the segmenter and term finder to modify their results. We are also investigating the effects of splitting the EBMT training into multiple sets of topic-specific sentences, automatically separated using clustering techniques.</Paragraph>
    <Paragraph position="3"> The relatively low slope of the coverage curve also indicates that the training corpus is sufficiently large. Our prior experience with Spanish (using the UN Multilingual Corpus [5]) and French (using  the Hansard corpus [7]) was that the curve flattens out at between two and three million words of training text, which appears also to be the case for Chinese (each training slice contains approximately one million words of total text).</Paragraph>
    <Paragraph position="4"> We have not yet taken full advantage of the features of the EBMT software. In particular, it supports equivalence classes that permit generalization of the training text into templates for improved coverage. We intend to test automatic creation of equivalence classes from the training corpus [4] in conjunction with the other improvements reported herein.</Paragraph>
  </Section>
</Paper>