<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1607">
  <Title>Building a training corpus for word sense disambiguation in English-to-Vietnamese Machine Translation</Title>
  <Section position="5" start_page="40" end_page="40" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have presented the construction of a semantically annotated bilingual corpus (based on the semantic classes of LLOCE). So far, we have built an English-Vietnamese bilingual corpus of 5,000,000 words from selected sources (in science-technology and conventional fields). We have also exploited the correspondences within the bilingual corpus to semantically annotate English (and Vietnamese) words via class-based word alignment. This class-based approach has been tested on our English-Vietnamese bilingual corpus and has given encouraging results (nearly 70% of ambiguous words are assigned the correct semantic labels).</Paragraph>
    <Paragraph position="1"> In the next stages, we will use this annotated corpus as a training corpus for WSD in our EVT, using Eric Brill's transformation-based learning (TBL) method.</Paragraph>
  </Section>
</Paper>
