<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1607">
  <Title>Building a training corpus for word sense disambiguation in English-to-Vietnamese Machine Translation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Interest in word sense disambiguation (WSD) continues to grow. Bilingual corpora have been exploited to train WSD systems and to discover rules that can be applied in Machine Translation (Zinovjeva, 2000). Statistical methods based on bilingual corpora are used to find and link words in bitexts for English-French, English-Chinese, English-Japanese, etc. (Isahara, Melamed, 2000).</Paragraph>
    <Paragraph position="1"> For the English-Vietnamese language pair, however, we have not yet seen any such work. In this paper, we describe the construction of an English-Vietnamese bilingual corpus with semantic tags. This semantically annotated corpus will be used to train the WSD module for our EVT in the future. We do not concentrate on word alignment or WSD themselves, but on assigning semantic tags to English and Vietnamese words via their class-based word alignments (Dien Dinh, 2002).</Paragraph>
    <Paragraph position="2"> Using aligned word pairs together with their corresponding semantic classes in LLOCE, we can find the correct sense of a word and assign it an appropriate semantic tag. That is, we take advantage of manually corrected translations of English and Vietnamese words to disambiguate word senses in semantic tagging. The rest of this paper consists of the following four sections:  - Section 2: Collecting English-Vietnamese bilingual texts.</Paragraph>
    <Paragraph position="3"> - Section 3: Normalizing English-Vietnamese bilingual corpus.</Paragraph>
    <Paragraph position="4"> - Section 4: Annotating the bilingual corpus: assigning semantic tags to word pairs in the corpus and applying this semantically annotated corpus to train the WSD module. - Section 5: Conclusion and future improvements.</Paragraph>
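    <Paragraph position="5"> The idea of disambiguating a word through its aligned translation can be illustrated with a minimal sketch. It assumes that each word is listed with a set of semantic classes and that the tag for an aligned pair is the single class shared by both words; this is one plausible reading of the class-based approach, not necessarily the procedure used in our system. The words, the class labels and the names EN_CLASSES, VI_CLASSES, shared_classes and tag_pair are invented for illustration and are not taken from LLOCE.

```python
# Toy sense inventories. Every class label here is an invented
# placeholder, not an actual LLOCE entry.
EN_CLASSES = {
    "bank": {"FINANCE", "GEOGRAPHY"},
}
VI_CLASSES = {
    "ngân hàng": {"FINANCE"},
    "bờ": {"GEOGRAPHY"},
}


def shared_classes(en_word, vi_word):
    """Return the sorted semantic classes common to both aligned words."""
    en = EN_CLASSES.get(en_word, set())
    vi = VI_CLASSES.get(vi_word, set())
    return sorted(en.intersection(vi))


def tag_pair(en_word, vi_word):
    """Return a tag when exactly one class is shared, otherwise None."""
    common = shared_classes(en_word, vi_word)
    return common[0] if len(common) == 1 else None
```

In this sketch the ambiguous English word receives a different tag depending on which Vietnamese word it is aligned with, and a pair with no shared class, or with several, is left untagged for later inspection.</Paragraph>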
  </Section>
</Paper>