<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0101"> <Title>Improving Context Vector Models by Feature Clustering for Automatic Thesaurus Construction</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A thesaurus is one of the most useful linguistic resources. It provides more information than just synonyms. For example, WordNet (Fellbaum, 1998) also builds relations between synonym sets, such as hyponymy and hypernymy. There are two Chinese thesauruses, Cilin (1983) and Hownet. Cilin provides synonym sets with a simple hierarchical structure. Hownet uses a set of primitive senses to describe word meanings. Shared primitive senses implicitly provide additional relations between words. However, many words that occur in contemporary news corpora are not covered by Chinese thesauruses.</Paragraph> <Paragraph position="2"> Therefore, we intend to create a thesaurus based on contemporary news corpora. The common steps for automatically constructing a thesaurus are a) extracting contextual information, b) finding synonymous words, and c) organizing the synonymous words into a thesaurus. The approach is based on the idea that a word's meaning lies in its contextual behavior: if words behave similarly in context, they may share the same meaning.</Paragraph> <Paragraph position="3"> However, the method can handle only frequent words, not infrequent ones. In fact, most of the vocabulary occurs infrequently, so one has to discover extended information to overcome the data sparseness problem. We introduce the conventional approaches to automatic thesaurus construction in section 2. Section 3 discusses the problems of context vector models and their solutions. In section 4, we use two performance evaluation metrics, discrimination and nonlinear interpolated precision, to evaluate our proposed method.</Paragraph> </Section> </Paper>