<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1004">
  <Title>Learning New Compositions from Given Ones</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word compositions have long been a concern in lexicography (Benson et al. 1986; Miller et al. 1995), and as a specific kind of lexical knowledge they have been shown to play an important role in many areas of natural language processing, e.g., parsing, generation, lexicon building, word sense disambiguation, and information retrieval (e.g., Abney 1989, 1990; Benson et al. 1986; Yarowsky 1995; Church and Hanks 1989; Church, Gale, Hanks, and Hindle 1989). But because of the huge number of words, it is impossible to list all compositions between words by hand in dictionaries, which raises an urgent problem: how can word compositions be acquired automatically? In general, word compositions fall into two categories: free compositions and bound compositions, i.e., collocations. Free compositions are those in which a word can be replaced by other similar words, while in bound compositions the words cannot be replaced freely (Benson 1990). Free compositions are predictable, i.e., their reasonableness can be determined from the syntactic and semantic properties of the words in them, whereas bound compositions are not predictable, i.e., their reasonableness cannot be derived from those properties (Smadja 1993). With the availability of large-scale corpora, the automatic acquisition of word compositions, especially word collocations, has been extensively studied (e.g., Choueka et al. 1988; Church and Hanks 1989; Smadja 1993). The key to these methods is to use statistical measures, e.g., frequencies or mutual information, to quantify the compositional strength between words. Such methods are well suited to retrieving bound compositions, but less appropriate for retrieving free ones. 
This is because in free compositions words are related to each other more loosely, which may invalidate mutual information and other statistical measures as means of distinguishing reasonable compositions from unreasonable ones. In this paper, we start from a different point to explore the problem of automatically acquiring free compositions. Although we cannot list all free compositions, we can select some typical ones, such as those specified in dictionaries (e.g., Benson 1986; Zhang et al.</Paragraph>
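As an illustration of the statistical measures mentioned above, the following is a minimal sketch of scoring word pairs by pointwise mutual information. The function name `pmi_scores`, the `min_count` threshold, and the toy data are all hypothetical; this is not the actual scoring used in the works cited, only the general idea that high scores flag bound compositions.

```python
import math
from collections import Counter

def pmi_scores(pairs, min_count=2):
    """Score word pairs by pointwise mutual information:
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
    A high PMI suggests a bound composition (collocation)."""
    pair_counts = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    n = len(pairs)
    scores = {}
    for (x, y), c in pair_counts.items():
        if c < min_count:          # ignore rare, unreliable pairs
            continue
        p_xy = c / n
        p_x = left[x] / n
        p_y = right[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy data: "strong tea" recurs, suggesting a collocation.
pairs = [("strong", "tea"), ("strong", "tea"), ("hot", "tea"),
         ("strong", "coffee"), ("hot", "soup"), ("strong", "tea")]
scores = pmi_scores(pairs)
```

Note that such a score rewards pairs that co-occur more often than their individual frequencies predict, which is exactly why it works better for bound compositions than for free ones, where each word co-occurs with many interchangeable partners.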
    <Paragraph position="1"> 1994). Given the properties of free compositions, we can reasonably suppose that the selected compositions provide strong clues for the others.</Paragraph>
    <Paragraph position="2"> Furthermore, we suppose that words can be classified into clusters, with the members of each cluster similar in their compositional ability, which can be characterized as the set of words that can combine with them to form meaningful phrases. Thus any given composition, although literally specifying the relation between two words, suggests a relation between two clusters. So for each word (or cluster), there exist some word clusters such that the word (or the words in the cluster) can combine with the words in those clusters, and only with them, to form meaningful phrases. We call the set of these clusters the compositional frame of the word (or the cluster). A seemingly plausible method for determining compositional frames is to make use of pre-defined semantic classes in thesauri (e.g., Miller et al. 1993; Mei et al. 1996). The rationale behind this method is the assumption that if one word can be combined with another to form a meaningful phrase, then words similar to them in meaning can also be combined with each other. But it has been shown that similarity between words in meaning does not correspond to similarity in compositional ability (Zhu 1982), so adopting semantic classes to construct compositional frames will result in considerable redundancy. An alternative to semantic classes is word clusters based on distributional environment (Brown et al., 1992), which in general refers to the words distributed around a certain word (e.g., Hatzivassiloglou et al., 1993; Pereira et al., 1993), the classes of those words (Bensch et al., 1995), or more complex statistical representations (Dagan et al., 1993).</Paragraph>
    [Page footer: Ji Donghong, He Jun and Huang Changning (1997) Learning New Compositions from Given Ones. In T.M. Ellison (ed.) CoNLL97: Computational Natural Language Learning, ACL, pp. 25-32. (c) 1997 Association for Computational Linguistics]
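The notion of compositional ability as "the set of words able to combine with a word" can be sketched directly. Under the simplifying assumption that a noun's environment is just the set of adjectives it combines with in the given compositions, similarity of compositional ability reduces to set overlap; the names `environment` and `jaccard` and the toy compositions are illustrative only.

```python
def environment(compositions):
    """Map each noun to the set of adjectives it combines with,
    i.e., its distributional environment in the given compositions."""
    env = {}
    for adj, noun in compositions:
        env.setdefault(noun, set()).add(adj)
    return env

def jaccard(a, b):
    """Overlap of two environments: 1.0 means identical
    compositional ability, 0.0 means none shared."""
    return len(a & b) / len(a | b)

comps = [("red", "apple"), ("ripe", "apple"),
         ("red", "cherry"), ("ripe", "cherry"),
         ("long", "road"), ("wide", "road")]
env = environment(comps)
# "apple" and "cherry" share their whole environment, so they would
# fall into one cluster; "road" shares nothing with either.
```

Clustering words with highly overlapping environments then yields exactly the clusters whose compositional frames the paper seeks.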
    <Paragraph position="3"> According to the properties of the clusters in compositional frames, the clusters should be based on the environment, which, however, is here narrowed to the given compositions. Because the given compositions are listed by hand, statistical means cannot be used to form the environment; the remaining choices are the surrounding words or classes of them.</Paragraph>
    <Paragraph position="4"> Pereira et al. (1993) put forward a method to cluster nouns in V-N compositions, taking the verbs which can combine with a noun as its environment.</Paragraph>
    <Paragraph position="5"> Although its goal is to deal with the problem of data sparseness, it suffers from the problem itself.</Paragraph>
    <Paragraph position="6"> A strategy to alleviate the effects of the problem is to cluster nouns and verbs simultaneously. But as a result, the word clustering problem becomes a bootstrapping, or non-linear, one: the environment itself must also be determined. Bensch et al. (1995) proposed a method to deal with the generalized version of this non-linear problem, but it suffers from the problem of local optima.</Paragraph>
    <Paragraph position="7"> In this paper, we focus on A-N (adjective-noun) compositions in Chinese and explore the problem of learning new compositions from given ones. To cope with the problem of data sparseness, we take adjective clusters as nouns' environment, and noun clusters as adjectives' environment. To avoid locally optimal solutions, we propose a cooperative evolutionary strategy. The method uses no knowledge specific to the A-N structure and can be applied to other structures as well. The remainder of the paper is organized as follows: in section 2, we give a formal description of the problem; in section 3, we present a cooperative evolutionary strategy for solving it; in section 4, we explore the problem of parameter estimation; in section 5, we present our experiments, their results, and their evaluation; in section 6, we draw some conclusions and discuss future work.</Paragraph>
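The paper's cooperative evolutionary strategy itself is developed in section 3. Purely to illustrate the two-sided, non-linear nature of the problem, here is a far simpler alternating scheme: each side is re-clustered given the other side's current clusters, starting from singleton clusters, until both partitions stabilize. All names and data are hypothetical, and this greedy alternation can itself get stuck in exactly the kind of local optimum the paper's cooperative strategy is designed to avoid.

```python
def cluster_by_signature(partner_cluster, pairs):
    """Group left-hand words whose partners fall into the same
    set of clusters on the other side."""
    sig = {}
    for a, b in pairs:
        sig.setdefault(a, set()).add(partner_cluster[b])
    groups = {}
    for word, s in sig.items():
        groups.setdefault(frozenset(s), []).append(word)
    labels = {}
    for i, key in enumerate(sorted(groups, key=str)):
        for word in groups[key]:
            labels[word] = i
    return labels

def co_cluster(an_pairs, rounds=10):
    """Alternately re-cluster adjectives given the current noun
    clusters, and nouns given the current adjective clusters,
    starting from singleton clusters on both sides."""
    na_pairs = [(n, a) for a, n in an_pairs]
    adj_c = {a: i for i, a in enumerate(dict.fromkeys(a for a, _ in an_pairs))}
    noun_c = {n: i for i, n in enumerate(dict.fromkeys(n for _, n in an_pairs))}
    for _ in range(rounds):
        new_adj = cluster_by_signature(noun_c, an_pairs)
        new_noun = cluster_by_signature(adj_c, na_pairs)
        if new_adj == adj_c and new_noun == noun_c:
            break                      # both sides stable
        adj_c, noun_c = new_adj, new_noun
    return adj_c, noun_c

comps = [("red", "apple"), ("ripe", "apple"),
         ("red", "cherry"), ("ripe", "cherry"),
         ("long", "road"), ("wide", "road")]
adj_c, noun_c = co_cluster(comps)
```

On this toy data the alternation converges with {red, ripe} and {long, wide} as adjective clusters and {apple, cherry} and {road} as noun clusters; each noun cluster's compositional frame is then the set of adjective clusters it pairs with.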
  </Section>
</Paper>