<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0511">
  <Title>Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy</Title>
  <Section position="4" start_page="1" end_page="2" type="metho">
    <SectionTitle>
3 Noun Compound Relations
</SectionTitle>
    <Paragraph position="0"> In this work we aim for a representation that is intermediate in generality between standard case roles (such as Agent, Patient, Topic, Instrument) and the specificity required for information extraction. We have created a set of relations that are sufficiently general to cover a significant number of noun compounds, but that can be domain-specific enough to be useful in analysis. We want to support relationships between entities that are shown to be important in cognitive linguistics; in particular, we intend to support the kinds of inferences that arise from Talmy's force dynamics (Talmy, 1985). It has been shown that relations of this kind can be combined in order to determine the &quot;directionality&quot; of a sentence (e.g., whether or not a politician is in favor of, or opposed to, a proposal) (Hearst, 1990). In the medical domain this translates to, for example, mapping a sentence into a representation showing that a chemical removes an entity that is blocking the passage of a fluid through a channel.</Paragraph>
    <Paragraph position="1"> The problem remains of determining what the appropriate kinds of relations are. In theoretical linguistics, there are contradictory views regarding the semantic properties of noun compounds (NCs). Levi (1978) argues that there exists a small set of semantic relationships that NCs may imply. Downing (1977) argues that the semantics of NCs cannot be exhausted by any finite listing of relationships. Between these two extremes lies Warren's (1978) taxonomy of six major semantic relations organized into a hierarchical structure.</Paragraph>
    <Paragraph position="2"> We have identified the 38 relations shown in Table 1. We tried to produce relations that correspond to linguistic theories such as those of Levi and Warren, but in many cases these are inappropriate.</Paragraph>
    <Paragraph position="3"> Levi's classes are too general for our purposes; for example, she collapses the &quot;location&quot; and &quot;time&quot; relationships into a single class &quot;In&quot;, and therefore field mouse and autumnal rain belong to the same class. Warren's classification schema is much more detailed, and there is some overlap between the top levels of Warren's hierarchy and our set of relations. For example, our &quot;Cause (2-1)&quot; for flu virus corresponds to her &quot;Causer-Result&quot; of hay fever, and our &quot;Person Afflicted&quot; (migraine patient) can be thought of as Warren's &quot;Belonging-Possessor&quot; of gunman. Warren also differentiates some classes on the basis of the semantics of the constituents, so that, for example, the &quot;Time&quot; relationship is divided up into &quot;Time-Animate Entity&quot; of weekend guests and &quot;Time-Inanimate Entity&quot; of Sunday paper. Our classification is based on the kind of relationships that hold between the constituent nouns rather than on the semantics of the head nouns.</Paragraph>
    <Paragraph position="4"> For the automatic classification task, we used only the 18 relations (indicated in bold in Table 1) for which an adequate number of examples were found in the current collection. Many NCs were ambiguous, in that they could be described by more than one semantic relationship. In these cases, we simply multi-labeled them: for example, cell growth is both &quot;Activity&quot; and &quot;Change&quot;, tumor regression is &quot;Ending/reduction&quot; and &quot;Change&quot;, and bladder dysfunction is &quot;Location&quot; and &quot;Defect&quot;. Our approach handles this kind of multi-labeled classification.</Paragraph>
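One possible way to hold the multi-labeled examples above is sketched here in Python. The three compounds and their labels come from the text; the dictionary layout, the function name to_indicator_rows, and the 0/1 indicator encoding are assumptions made for illustration, not a description of how the labels were actually stored or passed to the classifier.

```python
# Labels are the examples given in the text; the data layout is an assumption.
nc_labels = {
    ("cell", "growth"): {"Activity", "Change"},
    ("tumor", "regression"): {"Ending/reduction", "Change"},
    ("bladder", "dysfunction"): {"Location", "Defect"},
}


def to_indicator_rows(labels_by_nc):
    """Turn label sets into 0/1 rows, one column per relation (hypothetical helper)."""
    # Fix a column order by sorting every relation name that occurs.
    relations = sorted({rel for rels in labels_by_nc.values() for rel in rels})
    rows = {
        nc: [1 if rel in rels else 0 for rel in relations]
        for nc, rels in labels_by_nc.items()
    }
    return relations, rows


relations, rows = to_indicator_rows(nc_labels)
print(relations)
print(rows[("cell", "growth")])
```

With this encoding a compound that carries two relations simply has two columns set, so no example has to be duplicated or forced into a single class.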
    <Paragraph position="5"> Two relation types are especially problematic.</Paragraph>
    <Paragraph position="6"> Some compounds are non-compositional or lexicalized, such as vitamin k and e2 protein; others defy classification because the nouns are subtypes of one another. This group includes migraine headache, guinea pig, and hbv carrier. We placed all these NCs in a catch-all category. We also included a &quot;wrong&quot; category containing word pairs that were incorrectly labeled as NCs.</Paragraph>
    <Paragraph position="7"> The relations were found by iterative refinement based on looking at 2245 extracted compounds (described in the next section) and finding commonalities among them. Labeling was done by the authors of this paper and a biology student; the NCs were classified out of context. We expect to continue development and refinement of these relationship types, based on what ends up clearly being useful &quot;downstream&quot; in the analysis.</Paragraph>
    <Paragraph position="8"> The percentage of the word pairs extracted that were not true NCs was about 6%; some examples are: treat migraine, ten patient, headache more. We do not know, however, how many NCs we missed. The errors occurred when the wrong label was assigned by the tagger (see Section 4).</Paragraph>
    <Paragraph position="9"> The end goal is to combine these relationships in NCs with more than two constituent nouns, as in the example intranasal migraine treatment of Section 1.</Paragraph>
  </Section>
  <Section position="5" start_page="2" end_page="3" type="metho">
    <SectionTitle>
4 Collection and Lexical Resources
</SectionTitle>
    <Paragraph position="0"> To create a collection of noun compounds, we performed searches on MedLine, which contains references and abstracts from 4300 biomedical journals. We used several query terms, intended to span different subfields. We retained only the titles and the abstracts of the retrieved documents. On these titles and abstracts we ran a part-of-speech tagger (Cutting et al., 1991) and a program that extracts only sequences of units tagged as nouns. We extracted NCs with up to 6 constituents, but for this paper we consider only NCs with 2 constituents.</Paragraph>
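The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the program used for the paper: the function name noun_sequences, the Penn-style "NN" tag prefix, the choice to discard runs longer than six nouns, and the sample sentence are all assumptions.

```python
def noun_sequences(tagged_tokens, min_len=2, max_len=6):
    """Collect maximal runs of consecutive noun-tagged words (hypothetical helper)."""
    sequences = []
    current = []
    # A sentinel with a non-noun tag flushes a run that ends the input.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag.startswith("NN"):  # assumption: Penn-style noun tags
            current.append(word)
            continue
        # Keep the finished run only if its length is within the allowed range.
        if len(current) in range(min_len, max_len + 1):
            sequences.append(tuple(current))
        current = []
    return sequences


# Made-up tagged sentence, for illustration only.
tagged = [("the", "DT"), ("migraine", "NN"), ("patient", "NN"),
          ("reported", "VBD"), ("headache", "NN"), ("relief", "NN")]

# Restrict to two-constituent compounds, as in the paper.
pairs = [seq for seq in noun_sequences(tagged) if len(seq) == 2]
print(pairs)
```

Because the filter relies only on the tags, any tagging error passes straight through, which is consistent with the non-NC word pairs mentioned in Section 3.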
    <Paragraph position="1"> The Unified Medical Language System (UMLS) is a biomedical lexical resource produced and maintained by the National Library of Medicine (Humphreys et al., 1998). We use the MetaThesaurus component to map lexical items into unique concept IDs (CUIs).</Paragraph>
    <Paragraph position="2"> The UMLS also has a mapping from these CUIs into the MeSH lexical hierarchy (Lowe and Barnett, 1994); we mapped the CUIs into MeSH terms. There are about 19,000 unique main terms in MeSH, as well as additional modifiers. There are 15 main subhierarchies (trees) in MeSH, each corresponding to a major branch of medical ontology. For example, tree A corresponds to Anatomy, tree B to Organisms, and so on. The longer the name of the MeSH term, the longer the path from the root and the more precise the description. For example, migraine is C10.228.140.546.800.525, that is, C (a disease), C10 (Nervous System Diseases), C10.228 (Central Nervous System Diseases), and so on.</Paragraph>
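To make the prefix structure of a MeSH descriptor concrete, the short Python sketch below expands a dotted code into its chain of ancestors, from the tree letter down to the full term. The function name mesh_ancestors is hypothetical; the only data used is the migraine code quoted above.

```python
def mesh_ancestors(tree_number):
    """Return every level of a dotted MeSH code, most general first."""
    parts = tree_number.split(".")
    # The first component packs the tree letter and the top-level
    # category together (e.g. "C10"), so the bare letter is listed first.
    levels = [parts[0][0]]
    for depth in range(1, len(parts) + 1):
        levels.append(".".join(parts[:depth]))
    return levels


levels = mesh_ancestors("C10.228.140.546.800.525")
print(levels[:3])  # ['C', 'C10', 'C10.228']
```

Truncating this list at a chosen depth is one simple way to generalize a noun to a coarser class in the hierarchy.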
    <Paragraph position="3"> We use the MeSH hierarchy for generalization across classes of nouns; we use it instead of the other resources in the UMLS primarily because of MeSH's hierarchical structure. For these experiments, we considered only those noun compounds for which both nouns can be mapped into MeSH terms, resulting in a total of 2245 NCs.</Paragraph>
  </Section>
</Paper>