<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1607"> <Title>Building a training corpus for word sense disambiguation in English-to-Vietnamese Machine Translation</Title> <Section position="5" start_page="40" end_page="40" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we have presented the construction of a semantically annotated bilingual corpus (based on the semantic classes of LLOCE). So far, we have built an English-Vietnamese bilingual corpus of 5,000,000 words from selected sources (in scientific-technical and conventional fields). We have also taken advantage of the correspondences in the bilingual corpus to semantically annotate English (and Vietnamese) words via class-based word alignment. This class-based approach has been tested on our English-Vietnamese bilingual corpus and has given encouraging results (nearly 70% of ambiguous words are assigned the correct semantic labels).</Paragraph> <Paragraph position="1"> In the next stages, we will use this annotated corpus as a training corpus for WSD in our EVT, using the transformation-based learning (TBL) method of Eric Brill.</Paragraph> </Section> </Paper>