<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0904">
  <Title>Building a hyponymy lexicon with hierarchical structure</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Previous work
</SectionTitle>
    <Paragraph position="0"> One of the first studies on the acquisition of hyponymy relations was made by Hearst (1992). She found that certain lexico-syntactic constructions can be used as indicators of the hyponymy relation between words in text. Example 1 shows a construction of this kind and an instance of it. The noun phrase $NP_0$ is a hypernym and '$NP_1, NP_2, \ldots$ (and|or) $NP_n$' is one or more (conjoined) noun phrases that are the hyponyms:</Paragraph>
    <Paragraph position="2"> 'such cars as Volvo, Seat and Ford' Hearst furthermore proposed a procedure by which new syntactic patterns can be found. (July 2002, pp. 26-33. Association for Computational Linguistics.) Caraballo (1999) uses a hierarchical clustering technique to build a hyponymy hierarchy of nouns. The internal nodes are labeled using the syntactic constructions from Hearst (1992). Each internal node in the hierarchy can be represented by up to three nouns.</Paragraph>
    <Paragraph position="3"> Work by Riloff &amp; Shepherd (1997) and Charniak &amp; Roark (1998) aims to build semantic lexicons in which the words included in each category or entry are related to, or are members of, the category.</Paragraph>
    <Paragraph position="4"> Sanderson &amp; Croft (1999) build hierarchical structures of concepts on the basis of generality and specificity. They use material divided by different text categories and base the decision of subsumption on term co-occurrence in the different categories.</Paragraph>
    <Paragraph position="5"> A term x is said to subsume y if the documents in which y occurs are a subset of the documents in which x occurs. The relations between concepts in their subsumption hierarchy are of different kinds (among others, the hyponymy relation), and are unlabeled. The work most similar to ours is that of Morin &amp; Jacquemin (1999). They produce partial hyponymy hierarchies guided by transitivity in the relation. But while they work on a domain-specific corpus, we acquire hyponymy data from a corpus that is not restricted to one domain.</Paragraph>
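The subsumption test attributed to Sanderson and Croft above can be illustrated with a minimal sketch. It follows only the subset definition given in the text; the function name, the document ids and the handling of unseen terms are assumptions made for illustration.

```python
def subsumes(docs_x, docs_y):
    """Return True if term x subsumes term y.

    Following the definition in the text: x subsumes y when every
    document containing y also contains x. An empty docs_y is rejected
    here (an assumption), since an unseen term tells us nothing.
    """
    return bool(docs_y) and docs_y.issubset(docs_x)


# Hypothetical document ids per term, for illustration only.
occurrences = {
    "fruit": {1, 2, 3, 4},
    "mango": {2, 3},
    "car": {3, 5},
}

print(subsumes(occurrences["fruit"], occurrences["mango"]))  # True
print(subsumes(occurrences["mango"], occurrences["fruit"]))  # False
print(subsumes(occurrences["fruit"], occurrences["car"]))    # False
```

Note that the test says nothing about which relation holds between the two terms, which is why the resulting hierarchy is unlabeled.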
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Principles for building a hierarchical lexicon
</SectionTitle>
    <Paragraph position="0"> This section describes the principles behind our method for building the hierarchical structures in a lexicon.</Paragraph>
    <Paragraph position="1"> As the objective is to build a nominal hyponymy lexicon with partial hierarchical structures, there are conditions that the hierarchical structures should meet. The structures can each be seen as separate hyponymy hierarchies, and for each hierarchy the following criteria should be fulfilled:  1. A hierarchy has to be strict, so that every child node in it can have one parent node only.</Paragraph>
    <Paragraph position="2"> 2. The words or phrases forming the nodes in a hierarchy should be disambiguated.</Paragraph>
    <Paragraph position="3"> 3. The organization in a hierarchy should be such that every child node is a hyponym (i.e. a type/kind) of its parent.</Paragraph>
    <Paragraph position="4">  Generally, principles 1 and 2 above are meant to prevent the hierarchies from containing ambiguity. The built-in ambiguity in the hyponymy hierarchy presented in (Caraballo, 1999) is primarily an effect of the fact that all information is composed into one tree. Part of the ambiguity could have been resolved if the requirement of building one tree had been relaxed. Principle 2, regarding keeping the hierarchy ambiguity-free, is especially important, as we are working with acquisition from a corpus that is not domain restricted. We will have to constrain the way in which the hierarchy grows in order to keep it unambiguous. Had we worked with domain-specific data (see e.g. Morin &amp; Jacquemin (1999)), it would have been possible to assume only one sense per word or phrase.</Paragraph>
    <Paragraph position="5"> The problem of building a hyponymy lexicon can be seen as a type of classification problem. In this specific classification task, the hypernym is the class, the hyponyms are the class-members, and classifying a word means connecting it to its correct hypernym. The algorithm for classification and for building hierarchies will be further described in section 6.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Corpus and relevant terms
</SectionTitle>
    <Paragraph position="0"> This work has been implemented for Swedish, a Germanic language. Swedish has frequent and productive compounding, and its morphology is richer than that of, for example, English. Compounding affects the building of any lexical resource in that the number of different word types in the language is larger, and thus the problems of data sparseness become more noticeable. In order to overcome the data sparseness problem at least partly, lemmatization has been performed. However, no attempt has been made to analyze compounds more deeply.</Paragraph>
    <Paragraph position="1"> The corpus used for this research consists of 293,692 articles from the Swedish daily newspaper 'Dagens Nyheter'. The corpus was tokenized, tagged and lemmatized. The tagger we used, implemented by Megyesi (2001) for Swedish, is the TnT-tagger (Brants, 2000), trained on the SUC Corpus (Ejerhed et al., 1992). After preprocessing, the corpus was labeled for base noun phrases (baseNP).</Paragraph>
    <Paragraph position="2"> A baseNP includes optional determiners and/or premodifiers, followed by nominal heads.</Paragraph>
    <Paragraph position="3"> Naturally, conceptually relevant terms, rather than noun phrases, should be placed in the lexicon and the hierarchies. For reasons of simplification, though, we chose to treat nominal heads with premodifying nouns in the genitive (within the limits of the baseNP described above) as the relevant terms to include in the hierarchies. However, premodifiers describing amounts, such as 'kilo', are never included in the relevant terms.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Lexico-syntactic constructions
</SectionTitle>
    <Paragraph position="0"> Lexico-syntactic constructions are extracted from the corpus, in the fashion suggested by Hearst (1992). Five different Swedish constructions have been chosen (constructions 2-6 below) as a basis for building the lexicon; an example with an English translation is given for each construction. 'exotiska frukter som papaya, pepino och mango' /lit. exotic fruits such as papaya, pepino and mango/</Paragraph>
    <Paragraph position="2"> 'trafikinformation och annan information' /lit. information on traffic and other information/</Paragraph>
    <Paragraph position="4"> (num.expr.) greater than one.</Paragraph>
    <Paragraph position="5"> 'riksdagen, stadsfullmäktige och liknande församlingar' /lit. the Swedish Parliament, the town councilor and similar assemblies/</Paragraph>
    <Paragraph position="7"> torvägsprojekt' /lit. the East way and the West way, the two highway projects/ The basic assumption is that these constructions (henceforth called hh-constructions) yield pairs of terms between which the hyponymy relation holds. After a manual inspection of 20% of the total number of hh-constructions, it was estimated that 92% of the hh-constructions give us correct hyponymy relations. Erroneous hh-constructions are mainly due to problems such as incorrect tagging, but also to changes in meaning due to PP-attachment.</Paragraph>
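To make the idea of an hh-construction concrete, the following toy sketch matches one construction on raw text. The pattern is inferred solely from the example 'exotiska frukter som papaya, pepino och mango' and is an assumption, not the construction definition used in this work: it handles single-word terms only, skips tagging, baseNP labeling and lemmatization, and will over-match because 'som' has other uses in Swedish.

```python
import re

# Toy pattern (an assumption): HYPERNYM som HYPO(, HYPO)* ((och|eller) HYPO)?
# Only single-word terms are handled; the real system works on tagged baseNPs.
SOM_PATTERN = re.compile(r"(\w+) som (\w+(?:, \w+)*(?: (?:och|eller) \w+)?)")


def extract_pairs(text):
    """Return (hypernym, hyponym) pairs found by the toy 'som' pattern."""
    pairs = []
    for match in SOM_PATTERN.finditer(text):
        hypernym = match.group(1)
        # Split the conjoined list on commas and on the conjunctions.
        hyponyms = re.split(r", | och | eller ", match.group(2))
        pairs.extend((hypernym, hyponym) for hyponym in hyponyms)
    return pairs


print(extract_pairs("exotiska frukter som papaya, pepino och mango"))
# [('frukter', 'papaya'), ('frukter', 'pepino'), ('frukter', 'mango')]
```

The hypernym comes out as the inflected form 'frukter' because no lemmatization is applied in this sketch.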
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Building the hierarchical lexicon
</SectionTitle>
      <Paragraph position="0"> To give an accurate description of the algorithm for building the lexicon, the description here is divided into several parts. The first part describes how hypernyms/hyponyms are grouped into classes, building an unambiguous lexicon base. The second part describes how arrangement into hierarchical structures is performed from this unambiguous data. Finally, we describe how the lexicon is extended.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.1 Classification
</SectionTitle>
      <Paragraph position="0"> There are two straightforward methods that can be used to classify the data from the hh-constructions.</Paragraph>
      <Paragraph position="1"> The first would be to group all hypernyms of the same lemma into one class. The second would be to let each hypernym token (independently of its lemma) initially build its own class, and then try to group tokens according to their sense. The first method is suitable for building classes from the hh-constructions for a domain-specific corpus. However, when working with a newspaper corpus, as in our case, this method would lead to possible ambiguity in the classes, as hypernyms of the same lemma can have more than one sense.</Paragraph>
      <Paragraph position="2"> Thus, we choose to take the second, more cumbersome approach in order to avoid all possible ambiguity in the lexicon. Avoiding ambiguity is important, as the result of classification will be used as a base for building a lexicon with hierarchical structures.</Paragraph>
      <Paragraph position="3"> Initially, the hypernym and hyponyms of the hh-constructions from the text are used to build a base for a class system. An example of how an initial class is created from a simple hh-construction is given in Table 1. Each class X_N has a class feature X, which is the hypernym's lemma; N is a unique number designating the class, and the class members are the hyponym lemmas. Table 1 (an initial class created from a simple hh-construction): hh-construction: '...fever, pain, and other symptoms...'; Hypernym: symptom; Hyponyms: fever, pain.</Paragraph>
      <Paragraph position="4"> After this initial step, the unique classes are grouped into larger classes. Constraints are put on the grouping process in order to keep the classes unambiguous (see footnote 2). Two classes A and B can be collapsed only if they fulfill the following two prerequisites:  1. The class features of the classes have to be the same.</Paragraph>
      <Paragraph position="5"> 2. There has to be a non-empty intersection in  class members between the classes.</Paragraph>
      <Paragraph position="6"> An example of a collapsing operation of this kind is given in Table 2. As can be seen in the table, the method captures correct sense distinctions as well as incorrect ones (i.e. two classes are created when there should be only one). The effect of this will be further discussed in section 8. Note, however, that some incorrect sense distinctions introduced here are corrected through the introduction of hierarchical structure, which will be discussed in the next section.</Paragraph>
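The two prerequisites for collapsing classes can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes that collapsing is repeated until no two classes with the same class feature share a member, and the function name and the example classes (apart from the 'symptom' class of Table 1) are hypothetical.

```python
def collapse_classes(initial_classes):
    """Collapse classes that share a class feature and at least one member.

    initial_classes: list of (feature, members) pairs, one per hh-construction.
    Returns a list of (feature, member_set) pairs in which no two classes
    with the same feature have a member in common.
    """
    collapsed = []
    for feature, members in initial_classes:
        members = set(members)
        kept = []
        for other_feature, other_members in collapsed:
            # Prerequisite 1: same class feature.
            # Prerequisite 2: non-empty intersection of class members.
            if other_feature == feature and members.intersection(other_members):
                members |= other_members
            else:
                kept.append((other_feature, other_members))
        kept.append((feature, members))
        collapsed = kept
    return collapsed


classes = [
    ("symptom", ["fever", "pain"]),    # from Table 1
    ("symptom", ["pain", "nausea"]),   # hypothetical
    ("symptom", ["delay"]),            # hypothetical, no shared member
]
print([(feature, sorted(members)) for feature, members in collapse_classes(classes)])
# [('symptom', ['fever', 'nausea', 'pain']), ('symptom', ['delay'])]
```

The third class stays separate even if it has the same sense as the others, which is how the incorrect sense distinctions mentioned above can arise.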
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.2 Building hierarchical structure
</SectionTitle>
      <Paragraph position="0"> Hierarchical structure is introduced in the lexicon through a number of rules, which are directed by the overall principle of transitivity in the hyponymy relation. That is, if X is a kind of Y, and Y is a kind of Z, then the two classes containing these pairs can only be composed if the hyponymy relation also holds between X and Z. In practice, the three hypernym-hyponym pairs X-Y, Y-Z and X-Z all have to be found in our corpus.3</Paragraph>
      <Paragraph position="1"> Footnote 2: Also, system-internally, all words, hypernyms and hyponyms have a unique index number attached to them. Table 2: [...] classes. After the collapse, a correct sense distinction is kept between the class denoted 1 and the classes 2 and 4. An incorrect sense distinction is created between the classes denoted 2 and 4.</Paragraph>
      <Paragraph position="2"> Next, we turn to the outline of the implementation for building the hierarchical structures from the classes created through the method described in the previous section:  1. For each class k among all classes: a. find all sets of classes that can be used in building hierarchies with class k.</Paragraph>
      <Paragraph position="3"> b. choose one set of classes that should be used. 2. Compose all chosen sets of classes.</Paragraph>
      <Paragraph position="4"> 3. Build trees that reflect all the implemented  compositions.</Paragraph>
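The transitivity requirement that guides step 1a can be sketched on plain lemma pairs. This is a simplified illustration under stated assumptions: it works on (hypernym, hyponym) lemma pairs rather than on the indexed classes of section 6.1, and the function name and example data are hypothetical.

```python
def transitive_triples(pairs):
    """Find chains licensed by the transitivity principle.

    pairs: set of attested (hypernym, hyponym) pairs.
    Returns (z, y, x) triples such that the pairs z-y, y-x and z-x
    are all attested, i.e. x is a kind of y, y is a kind of z, and
    x is also directly attested as a kind of z.
    """
    hyponyms_of = {}
    for hypernym, hyponym in pairs:
        hyponyms_of.setdefault(hypernym, set()).add(hyponym)

    triples = []
    for z, y in sorted(pairs):
        for x in sorted(hyponyms_of.get(y, ())):
            if (z, x) in pairs:
                triples.append((z, y, x))
    return triples


attested = {
    ("animal", "dog"),
    ("dog", "poodle"),
    ("animal", "poodle"),
    ("animal", "cat"),
    ("cat", "siamese"),   # 'animal'-'siamese' is not attested
}
print(transitive_triples(attested))
# [('animal', 'dog', 'poodle')]
```

Requiring the third pair makes the composition conservative: the cat chain is rejected although it is semantically correct, simply because the corpus evidence is incomplete.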
      <Paragraph position="5"> A typical hierarchical structure of the kind that is built here can be seen in Figure 1. The algorithm for building this hierarchical structure will now be described in more detail. Searching for a set of classes for composition is performed according to the transitivity principle described above. For each hypernym-hyponym pair (see example below), search for</Paragraph>
      <Paragraph position="7"> For each class, all sets of classes that can be used for compositions into hierarchies ('classes 2 + 3' constitutes one set for class 1) are stored. From these sets of classes, one is randomly chosen for implementation. It is also possible to compose sets of classes where class 1 = class 2, but in that case step 3 is left out. The result is, in any case, two modified classes, where the classes are linked through a shared term. In cases where class 1 ≠ class 2, class 2 is erased and its class members are placed as class members in</Paragraph>
      <Paragraph position="9"> In the final trees, all different compositions are reflected. In this way, several compositions might work together to build trees, and the more compositions that are used, the deeper the tree will be.</Paragraph>
      <Paragraph position="10"> It is worth noting that, when any tree is built, end nodes (i.e. non-internal hyponyms) with the same lemma as other end nodes in the collapsed tree are moved downwards in the tree. The goal is to keep only one instance, i.e. the one that is placed lowest in the tree. Also, measures are taken all along in the building process in order to keep the tree acyclic.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.3 Extending the lexicon
</SectionTitle>
      <Paragraph position="0"> Obviously, apart from the hypernym-hyponym data that we get from the hh-constructions listed in section 5, more data can be found in text. In order to capture some of this data, we propose an algorithm similar to, but simpler than, that of Hearst (1992). The algorithm is simpler in that it does not search for new syntactic environments that reveal the hypernym-hyponym relation. Instead, it relies on the general syntactic pattern in step 2 (below) for finding new lexical hypernym-hyponym data:  1. Look through the previously extracted hh-constructions and extract the pairs of hypernyms-hyponyms where the frequency of the pair is higher than 2.</Paragraph>
      <Paragraph position="1"> 2. Search in new data for patterns of the following kind:</Paragraph>
      <Paragraph position="3"> where $NP_0$ is a baseNP, where (funcword)+ is one or more function words (see footnote 5), and where the sequence '($NP_i$, (and|or))+ $NP_n$' is a conjoined noun phrase.</Paragraph>
      <Paragraph position="4"> 3. Extract related hypernyms and hyponyms where: a. the hypernym from step 1 is the head of $NP_0$; b. the hyponym from step 1 is the head of one of the noun phrases in the conjoined noun phrase. Footnote 5: A function word is here negatively defined as anything but verbs (including auxiliary verbs), adjectives, nouns or full stops.</Paragraph>
    </Section>
  </Section>
</Paper>