<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1058"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation</Title> <Section position="4" start_page="0" end_page="457" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> For supervised WSD methods, the knowledge acquisition bottleneck is the preparation of a manually tagged corpus. Unsupervised methods are an alternative; they often involve automatic generation of a tagged corpus, bilingual corpus alignment, etc. The value of unsupervised methods lies in the knowledge acquisition solutions they adopt.</Paragraph> <Section position="1" start_page="457" end_page="457" type="sub_section"> <SectionTitle> 2.1 Automatic Generation of Training Corpus </SectionTitle> <Paragraph position="0"> Automatic corpus tagging is one solution for WSD: it generates a large-scale corpus from a small seed corpus. This is a weakly supervised, or semi-supervised, learning method. This reinforcement algorithm dates back to Gale et al. (1992a).</Paragraph> <Paragraph position="1"> Their investigation was based on a 6-word test set with 2 senses for each word.</Paragraph> <Paragraph position="2"> Yarowsky (1994 and 1995), Mihalcea and Moldovan (2000), and Mihalcea (2002) carried out further research on obtaining a large corpus of higher quality from an initial seed corpus.</Paragraph> <Paragraph position="3"> A semi-supervised method proposed by Niu et al. (2005) clustered untagged instances with tagged ones, starting from a small seed corpus, on the assumption that similar instances should have similar tags. 
Clustering was used instead of bootstrapping and was shown to be more efficient.</Paragraph> </Section> <Section position="2" start_page="457" end_page="457" type="sub_section"> <SectionTitle> 2.2 Method Based on Parallel Corpus </SectionTitle> <Paragraph position="0"> Parallel corpora are another solution to the knowledge acquisition bottleneck. Ide et al. (2001 and 2002), Ng et al. (2003), and Diab (2003, 2004a, and 2004b) studied the use of alignment for WSD.</Paragraph> <Paragraph position="1"> Diab and Resnik (2002) investigated the feasibility of automatically annotating large amounts of data in parallel corpora using an unsupervised algorithm that makes use of two languages simultaneously, only one of which has an available sense inventory. The results showed that word-level translation correspondences are a valuable source of information for sense disambiguation.</Paragraph> <Paragraph position="2"> The method by Li and Li (2002) does not require a parallel corpus. It avoids the alignment work while still taking advantage of bilingual corpora.</Paragraph> <Paragraph position="3"> In short, automatic corpus tagging technology is based on a manually labeled corpus.</Paragraph> <Paragraph position="4"> That is to say, it still needs human intervention and is not a completely unsupervised method.</Paragraph> <Paragraph position="5"> Large-scale parallel corpora, especially word-aligned corpora, are very hard to obtain, which has limited WSD methods based on parallel corpora.</Paragraph> </Section> </Section> </Paper>