<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1016">
  <Title>Automatic construction of a hypernym-labeled noun hierarchy from text</Title>
  <Section position="4" start_page="120" end_page="121" type="metho">
    <SectionTitle>
3 Assigning hypernyms
</SectionTitle>
    <Paragraph position="0"> Following WordNet, a word A is said to be a hypernym of a word B if native speakers of English accept the sentence &amp;quot;B is a (kind of) A.&amp;quot; To determine possible hypernyms for a particular noun, we use the same parsed text described in the previous section. As suggested in Hearst (1992), we can find some hypernym data in the text by looking for conjunctions involving the word &amp;quot;other&amp;quot;, as in &amp;quot;X, Y, and other Zs&amp;quot; (patterns 3 and 4 in Hearst). From this phrase we can extract that Z is likely a hypernym for both X and Y.</Paragraph>
    <Paragraph position="1"> This data is extracted from the parsed text, and for each noun we construct a vector of hypernyms, with a value of 1 if a word has been seen as a hypernym for this noun and 0 otherwise. These vectors are associated with the leaves of the binary tree constructed in the previous section.</Paragraph>
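    <Paragraph> The extraction step and the per-noun vectors can be illustrated with a minimal sketch. It assumes plain, unparsed sentences and single-word list items, whereas the method described here works on parsed noun phrases; the regular expression and the names extract_hypernym_pairs and build_leaf_vectors are hypothetical, and no lemmatization is attempted, so a plural hypernym stays plural.

```python
import re
from collections import defaultdict

# Approximates the "X, Y, and other Zs" pattern on raw text.
# Assumption: every list item is a single word; the paper uses parsed text instead.
OTHER_PATTERN = re.compile(r"((?:\w+, )*\w+),? (?:and|or) other (\w+)")


def extract_hypernym_pairs(sentence):
    """Return (noun, hypernym) pairs found by the pattern in one sentence."""
    pairs = []
    for match in OTHER_PATTERN.finditer(sentence):
        nouns = [w.strip() for w in match.group(1).split(",") if w.strip()]
        hypernym = match.group(2)
        pairs.extend((noun, hypernym) for noun in nouns)
    return pairs


def build_leaf_vectors(sentences):
    """Map each noun to a binary vector: hypernym -> 1 if ever seen with it."""
    vectors = defaultdict(dict)
    for sentence in sentences:
        for noun, hypernym in extract_hypernym_pairs(sentence):
            vectors[noun][hypernym] = 1  # repeated sightings still count as 1
    return dict(vectors)
```

Hypernyms absent from a noun's dictionary play the role of the 0 entries in its vector.</Paragraph>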
    <Paragraph position="2"> For each internal node of the tree, we construct a vector of hypernyms by adding together the vectors of its children. We then assign a hypernym to this node by simply choosing the hypernym with the largest value in this vector; that is, the hypernym which appeared with the largest number of the node's descendant nouns. (In case of ties, the hypernyms are ordered arbitrarily.) We also list the second- and third-best hypernyms, to account for cases where a single word does not describe the cluster adequately, or cases where there are a few good hypernyms which tend to alternate, such as &amp;quot;country&amp;quot; and &amp;quot;nation&amp;quot;. (There may or may not be any kind of semantic relationship among the hypernyms listed. Because of the method of selecting hypernyms, the hypernyms may be synonyms of each other, have hypernym-hyponym relationships of their own, or be completely unrelated.) If a hypernym has occurred with only one of the descendant nouns, it is not listed as one of the best hypernyms, since we have insufficient evidence that the word could describe this class of nouns. Not every node has sufficient data to be assigned a hypernym.</Paragraph>
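    <Paragraph> The labeling step can be sketched as follows, assuming each hypernym vector is stored as a Counter. The function names are hypothetical, and the alphabetical tie-break exists only to make the sketch deterministic; the text above says ties are ordered arbitrarily.

```python
from collections import Counter


def leaf_vector(hypernyms_seen):
    """Binary vector for one noun: 1 for each distinct hypernym seen with it."""
    return Counter(set(hypernyms_seen))


def node_vector(left, right):
    """Vector of an internal node: the sum of its two children's vectors."""
    return left + right


def best_hypernyms(vector, k=3, min_count=2):
    """Up to k hypernyms with the largest counts.

    Hypernyms seen with only one descendant noun are dropped, so the
    result may be empty (an unlabeled node).
    """
    eligible = [(h, c) for h, c in vector.items() if c >= min_count]
    # Assumption: ties broken alphabetically, purely for reproducibility.
    eligible.sort(key=lambda hc: (-hc[1], hc[0]))
    return [h for h, _ in eligible[:k]]
```

Because leaf vectors are binary, each count in an internal node's vector equals the number of descendant nouns seen with that hypernym.</Paragraph>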
  </Section>
  <Section position="5" start_page="121" end_page="121" type="metho">
    <SectionTitle>
4 Compressing the tree
</SectionTitle>
    <Paragraph position="0"> The labeled tree constructed in the previous section tends to be extremely redundant.</Paragraph>
    <Paragraph position="1"> Recall that the tree is binary. In many cases, a group of nouns really do not have an inherent tree structure, for example, a cluster of countries. Although it is possible that a reasonable tree structure could be created with subtrees of, say, European countries, Asian countries, etc., recall that we are using single-word hypernyms. A large binary tree of countries would ideally have &amp;quot;country&amp;quot; (or &amp;quot;nation&amp;quot;) as the best hypernym at every level. We would like to combine these subtrees into a single parent labeled &amp;quot;country&amp;quot; or &amp;quot;nation&amp;quot;, with each country appearing as a leaf directly beneath this parent.</Paragraph>
    <Paragraph position="2"> (Obviously, the tree will no longer be binary). Another type of redundancy can occur when an internal node is unlabeled, meaning a hypernym could not be found to describe its descendant nouns. Since the tree's root is labeled, somewhere above this node there is necessarily a node labeled with a hypernym which applies to its descendant nouns, including those which are a descendant of this node. We want to move this node's children directly under the nearest labeled ancestor.</Paragraph>
    <Paragraph position="3"> We compress the tree using the following very simple algorithm: in depth-first order, examine the children of each internal node.</Paragraph>
    <Paragraph position="4"> If the child is itself an internal node, and it either has no best hypernym or the same three best hypernyms as its parent, delete this child and make its children into children of the parent instead.</Paragraph>
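    <Paragraph> A minimal sketch of this compression pass is given below. The Node class and the compress function are hypothetical names. The sketch also makes two assumptions the text does not settle: promoted grandchildren are re-examined against their new parent, and two label lists count as the same only when they match in order.

```python
class Node:
    def __init__(self, name, labels=None, children=None):
        self.name = name
        self.labels = labels or []      # up to three best hypernyms; empty if unlabeled
        self.children = children or []  # empty for a leaf (a noun)


def compress(node):
    """Splice out internal children that are unlabeled or labeled like their parent."""
    kept = []
    pending = list(node.children)
    while pending:
        child = pending.pop(0)
        is_internal = len(child.children) > 0
        redundant = is_internal and (not child.labels or child.labels == node.labels)
        if redundant:
            # Delete the child: its children take its place and are
            # themselves examined against this parent.
            pending = child.children + pending
        else:
            compress(child)  # depth-first descent into the surviving child
            kept.append(child)
    node.children = kept
    return node
```

The recursion is kept for brevity; an explicit stack would be safer on a very deep binary tree.</Paragraph>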
  </Section>
  <Section position="6" start_page="121" end_page="122" type="metho">
    <SectionTitle>
5 Results and evaluation
</SectionTitle>
    <Paragraph position="0"> There are 20,014 leaves (nouns) and 654 internal nodes in the final tree (reduced from 20,013 internal nodes in the uncompressed tree). The top-level node in our learned tree is labeled &amp;quot;product/analyst/official&amp;quot;. (Recall from the previous discussion that we do not assume any kind of semantic relationship among the hypernyms listed for a particular cluster.) Since these hypernyms are learned from the Wall Street Journal, they are domain-specific labels rather than the more general &amp;quot;thing/person&amp;quot;. However, if the hierarchy were to be used for text from the financial domain, these labels may be preferred.</Paragraph>
    <Paragraph position="1"> The next level of the hierarchy, the children of the root, is as shown in Table 1.</Paragraph>
    <Paragraph position="2"> (&amp;quot;Conductor&amp;quot; seems out-of-place on this list; see the next section for discussion.) These  numbers do not add up to 20,014 because 1,288 nouns are attached directly to the root, meaning that they couldn't be clustered to any greater level of detail. These tend to be nouns for which little data was available, generally proper nouns (e.g., Reindel, Yaghoubi, Igoe).</Paragraph>
    <Paragraph position="3"> To evaluate the hierarchy, 10 internal nodes dominating at least 20 nouns were selected at random. For each of these nodes, we randomly selected 20 of the nouns from the cluster under that node. Three human judges were asked to evaluate for each noun and each of the (up to) three hypernyms listed as &amp;quot;best&amp;quot; for that cluster, whether they were actually in a hyponym-hypernym relation. The judges were students working in natural language processing or computational linguistics at our institution who were not directly involved in the research for this project. 5 &amp;quot;noise&amp;quot; nouns randomly selected from elsewhere in the tree were also added to each cluster without the judges' knowledge to verify that the judges were not overly generous.</Paragraph>
    <Paragraph position="4"> Some nouns, especially proper nouns, were not recognized by the judges. For any noun that was not evaluated by at least two judges, we evaluated the noun/hypernym pair by examining the appearances of that noun in the source text and verifying that the hypernym was correct for the predominant sense of the noun.</Paragraph>
    <Paragraph position="5"> Table 2 presents the results of this evaluation. The table lists only results for the actual candidate hyponym nouns, not the noise words. The &amp;quot;Hypernym 1&amp;quot; column indicates whether the &amp;quot;best&amp;quot; hypernym was considered correct, while the &amp;quot;Any hypernym&amp;quot; column indicates whether any of the listed hypernyms were accepted. Within those columns, &amp;quot;majority&amp;quot; lists the opinion of the majority of judges, and &amp;quot;any&amp;quot; indicates the hypernyms that were accepted by even one of the judges.</Paragraph>
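    <Paragraph> To make the four column definitions concrete, the sketch below scores one cluster from judge votes. The data layout and the function names are hypothetical, and a strict majority of the judges who rated a pair is assumed; the text does not say how a tie between two judges was resolved.

```python
def majority_accepts(votes):
    """True if more than half of the judges who rated the pair accepted it."""
    return sum(votes) * 2 > len(votes)


def any_accepts(votes):
    """True if at least one judge accepted the pair."""
    return any(votes)


def score_cluster(judgments):
    """Fraction of nouns counted as correct under each of the four measures.

    judgments holds one entry per noun; each entry is a list with one
    list of boolean judge votes per listed hypernym, best hypernym first.
    """
    counts = {"hyp1_majority": 0, "hyp1_any": 0, "any_majority": 0, "any_any": 0}
    for per_hypernym in judgments:
        counts["hyp1_majority"] += majority_accepts(per_hypernym[0])
        counts["hyp1_any"] += any_accepts(per_hypernym[0])
        counts["any_majority"] += any(majority_accepts(v) for v in per_hypernym)
        counts["any_any"] += any(any_accepts(v) for v in per_hypernym)
    return {key: value / len(judgments) for key, value in counts.items()}
```

Noise nouns would be scored separately, since the table reports only the candidate hyponyms.</Paragraph>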
    <Paragraph position="6"> The &amp;quot;Hypernym 1/any&amp;quot; column can be used to compare results to Riloff and Shepherd (1997). For five hand-selected categories, each with a single hypernym, and the 20 nouns their algorithm scored as the best members of each category, at least one judge marked on average about 31% of the nouns as correct. Using randomly-selected categories and randomly-selected category members we achieved 39%.</Paragraph>
    <Paragraph position="7"> By the strictest criteria, our algorithm produces correct hyponyms for a randomly-selected hypernym 33% of the time. Roark and Charniak (1998) report that for a hand-selected category, their algorithm generally produces 20% to 40% correct entries.</Paragraph>
    <Paragraph position="8"> Furthermore, if we loosen our criteria to consider also the second- and third-best hypernyms, 60% of the nouns evaluated were assigned to at least one correct hypernym according to at least one judge.</Paragraph>
    <Paragraph position="9"> The &amp;quot;bank/firm/station&amp;quot; cluster consists largely of investment firms, which were marked as incorrect for &amp;quot;bank&amp;quot;, resulting in the poor performance on the Hypernym 1 measures for this cluster. The last cluster in the list, labeled &amp;quot;company&amp;quot;, is actually a very good cluster of cities that because of sparse data was assigned a poor hypernym.</Paragraph>
    <Paragraph position="10"> Some of the suggestions in the following section might correct this problem.</Paragraph>
    <Paragraph position="11"> Of the 50 noise words, a few of them were actually rated as correct as well, as shown in  This is largely because the noise words were selected truly at random, so that a noise word for the &amp;quot;company&amp;quot; cluster may not have been in that particular cluster but may still have appeared under a &amp;quot;company&amp;quot; hypernym elsewhere in the hierarchy.</Paragraph>
  </Section>
  <Section position="7" start_page="122" end_page="124" type="metho">
    <SectionTitle>
6 Discussion and future directions
</SectionTitle>
    <Paragraph position="0"> Future work should benefit greatly from using data on the hypernyms of hypernyms. In our current tree, the best hypernym for the entire tree is &amp;quot;product&amp;quot;; however, nodes deeper in the tree are often given this label as well. For example, we have a cluster including many forms of currency, but because there is little data for these particular words, the only hypernym found was &amp;quot;product&amp;quot;. However, the parent of this node has the best hypernym of &amp;quot;currency&amp;quot;. If we knew that &amp;quot;product&amp;quot; was a hypernym of &amp;quot;currency&amp;quot;, we could detect that the parent node's label is more specific and simply absorb the child node into the parent. Furthermore, we may be able to use data on the hypernyms of hypernyms to give better labels to some nodes that are currently labeled simply with the best hypernyms of their subtrees, such as a node labeled &amp;quot;product/analyst&amp;quot; which has two subtrees, one labeled &amp;quot;product&amp;quot; and containing words for things, the other labeled &amp;quot;analyst&amp;quot; and containing names of people. We would like to instead label this node something like &amp;quot;entity&amp;quot;. It is not yet clear whether corpus data will provide sufficient data for hypernyms at such a high level of the tree, but depending on the intended application for the hierarchy, this level of generality might not be required.</Paragraph>
    <Paragraph position="1"> As noted in the previous section, one major spurious result is a cluster of 51 nouns, mainly people, which is given the hypernym &amp;quot;conductor&amp;quot;. The reason for this is that few of the nouns appear with hypernyms, and two of them (Giulini and Ozawa) appear in the same phrase listing conductors, thus giving &amp;quot;conductor&amp;quot; a count of two, sufficient to be listed as the only hypernym for the cluster. It might be useful to have some stricter criterion for hypernyms, say, that they occur with a certain percentage of the nouns below them in the tree. Additional hypernym data would also be helpful in this case, and should be easily obtainable by looking for other patterns in the text as suggested by Hearst (1992).</Paragraph>
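    <Paragraph> One way the stricter criterion suggested above might look is sketched below. The 10% threshold and the function name filter_by_coverage are purely hypothetical; the text proposes a percentage requirement without fixing a value.

```python
from collections import Counter


def filter_by_coverage(vector, num_nouns, min_fraction=0.1, min_count=2):
    """Keep hypernyms seen with enough of the nouns below a node.

    vector maps hypernym -> number of descendant nouns it was seen with;
    num_nouns is the number of nouns below the node. min_fraction is an
    illustrative value, not one taken from the paper.
    """
    return Counter({
        h: c for h, c in vector.items()
        if c >= min_count and c >= min_fraction * num_nouns
    })
```

With these illustrative settings, a hypernym seen with 2 of 51 nouns (about 4%) would be rejected, leaving such a cluster unlabeled.</Paragraph>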
    <Paragraph position="2"> Because the tree is built in a binary fashion, when, e.g., three clusters should all be distinct children of a common parent, two of them must merge first, giving an artificial intermediate level in the tree.</Paragraph>
    <Paragraph position="3"> For example, in the current tree a cluster with best hypernym &amp;quot;agency&amp;quot; and one with best hypernym &amp;quot;exchange&amp;quot; (as in &amp;quot;stock exchange&amp;quot;) have a parent with two best hypernyms &amp;quot;agency/exchange&amp;quot;, rather than both of these nodes simply being attached to the next level up with best hypernym &amp;quot;group&amp;quot;. It might be possible to correct for this situation by comparing the hypernyms for the two clusters and if there is little overlap, deleting their parent node and attaching them to their grandparent instead.</Paragraph>
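    <Paragraph> The proposed correction could be sketched as an overlap test on the two clusters' hypernym vectors. Jaccard overlap and the 0.2 cutoff are assumptions chosen for illustration, as are the function names; the text does not specify how overlap would be measured.

```python
def hypernym_overlap(vec_a, vec_b):
    """Jaccard overlap between the sets of hypernyms seen for two clusters."""
    a, b = set(vec_a), set(vec_b)
    union = a.union(b)
    if not union:
        return 0.0
    return len(a.intersection(b)) / len(union)


def should_splice_parent(vec_a, vec_b, max_overlap=0.2):
    """True if the siblings share so few hypernyms that their parent looks artificial.

    When this holds, the parent would be deleted and both clusters
    attached to the grandparent instead. max_overlap is hypothetical.
    """
    return max_overlap >= hypernym_overlap(vec_a, vec_b)
```

A count-weighted measure would be an equally plausible choice and might be less sensitive to rare hypernyms.</Paragraph>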
    <Paragraph position="4"> It would be useful to try to identify terms made up of multiple words, rather than just using the head nouns of the noun phrases.</Paragraph>
    <Paragraph position="5"> Not only would this provide a more useful hierarchy, or at least perhaps one that is more useful for certain applications, but it would also help to prevent some errors. Hearst (1992) gives an example of a potential hyponym-hypernym pair &amp;quot;broken bone/injury&amp;quot;. Using our algorithm, we would learn that &amp;quot;injury&amp;quot; is a hypernym of &amp;quot;bone&amp;quot;. Ideally, this would not appear in our hierarchy since a more common hypernym would be chosen instead, but it is possible that in some cases a bad hypernym would be found based on multiple-word phrases. A discussion of the difficulties in deciding how much of a noun phrase to use can be found in Hearst.</Paragraph>
    <Paragraph position="6"> Ideally, a useful hierarchy should allow for multiple senses of a word, and this is an area which can be explored in future work. However, domain-specific text tends to greatly constrain which senses of a word will appear, and if the learned hierarchy is intended for use with the same type of text from which it was learned, it is possible that this would be of limited benefit.</Paragraph>
    <Paragraph position="7"> We used parsed text for these experiments because we believed we would get better results and the parsed data was readily available. However, it would be interesting to see if parsing is necessary or if we can get equivalent or nearly-equivalent results doing some simpler text processing, as suggested in Ahlswede and Evens (1988). Both Hearst (1992) and Riloff and Shepherd (1997) use unparsed text.</Paragraph>
  </Section>
  <Section position="8" start_page="124" end_page="124" type="metho">
    <SectionTitle>
7 Related work
</SectionTitle>
    <Paragraph position="0"> Pereira et al. (1993) used clustering to build an unlabeled hierarchy of nouns. Their hierarchy is constructed top-down, rather than bottom-up, with nouns being allowed membership in multiple clusters. Their clustering is based on verb-object relations rather than on the noun-noun relations that we use.</Paragraph>
    <Paragraph position="1"> Future work on our project will include an attempt to incorporate verb-object data as well in the clustering process. The tree they construct is also binary with some internal nodes which seem to be &amp;quot;artificial&amp;quot;, but for evaluation purposes they disregard the tree structure and consider only the leaf nodes.</Paragraph>
    <Paragraph position="2"> Unfortunately it is difficult to compare their results to ours since their evaluation is based on the verb-object relations.</Paragraph>
    <Paragraph position="3"> Riloff and Shepherd (1997) suggested using conjunction and appositive data to cluster nouns; however, they approximated this data by just looking at the nearest NP on each side of a particular NP. Roark and Charniak (1998) built on that work by actually using conjunction and appositive data for noun clustering, as we do here. (They also use noun compound data, but in a separate stage of processing.) Both of these projects have the goal of building a single cluster of, e.g., vehicles, and both use seed words to initialize a cluster with nouns belonging to it.</Paragraph>
    <Paragraph position="4"> Hearst (1992) introduced the idea of learning hypernym-hyponym relationships from text and gives several examples of patterns that can be used to detect these relationships including those used here, along with an algorithm for identifying new patterns.</Paragraph>
    <Paragraph position="5"> This work shares with ours the feature that it does not need large amounts of data to learn a hypernym; unlike in much statistical work, a single occurrence is sufficient.</Paragraph>
    <Paragraph position="6"> The hyponym-hypernym pairs found by Hearst's algorithm include some that Hearst describes as &amp;quot;context and point-of-view dependent,&amp;quot; such as &amp;quot;Washington/nationalist&amp;quot; and &amp;quot;aircraft/target&amp;quot;. Our work is somewhat less sensitive to this kind of problem since only the most common hypernym of an entire cluster of nouns is reported, so much of the noise is filtered.</Paragraph>
  </Section>
</Paper>