<?xml version="1.0" standalone="yes"?>
<Paper uid="J92-1001">
  <Title>Using Multiple Knowledge Sources for Word Sense Discrimination</Title>
  <Section position="3" start_page="4" end_page="18" type="metho">
    <SectionTitle>
3. Sources of Knowledge
</SectionTitle>
    <Paragraph position="0"> To identify preference cues such as morphology, word frequency, collocations, semantic contexts, syntactic expectations, and conceptual relations in unrestricted texts, a system needs a large amount of knowledge in each category. In most cases, this just means that the understander's lexicon and conceptual hierarchy must include preference information, although processing concerns suggest moving some information out of these structures and into data modules specific to a particular process, such as identifying collocations. TRUMP obtains the necessary knowledge from a moderately sized lexicon (8,775 unique roots), specifically designed for use in language understanding, and a hierarchy of nearly 1,000 higher-level concepts, overlaid with approximately 40 concept-cluster definitions. It also uses a library of over 1,400 collocational patterns.</Paragraph>
    <Paragraph position="1"> We will consider each in turn.</Paragraph>
    <Section position="1" start_page="4" end_page="7" type="sub_section">
      <SectionTitle>
3.1 The Lexicon
</SectionTitle>
      <Paragraph position="0"> Development of TRUMP's current lexicon followed an experiment with a moderately sized, commercially available lexicon (10,000 unique roots), which demonstrated many substantive problems in applying lexical resources to text processing. Although the lexicon had good morphological and grammatical coverage, as well as a thesaurus-based semantic representation of word meanings, it lacked reasonable information for discriminating senses. The current lexicon, although roughly the same size as the earlier one, has been designed to better meet the needs of producing semantic representations of text. The lexicon features a hierarchy of 1,000 parent concepts for encoding semantic preferences and restrictions, sense-based morphology and subcategorization, a distinction among primary senses, secondary senses, and senses that require particular &quot;triggers&quot; or appear only in specific contexts, and a broad range of collocational information. (An alternative would have been to give up discriminating senses that the lexicon does not distinguish; cf. Janssen [1990].) At this time, the lexicon contains about 13,000 senses and 10,000 explicit derivations.</Paragraph>
      <Paragraph position="1"> Each lexical entry provides information about the morphological preferences, sense preferences, and syntactic cues associated with a root, its senses, and their possible derivations. An entry also links words to the conceptual hierarchy by naming the conceptual parent of each sense. If necessary, an entry can also specify the composition of common phrases, such as collocations, that have the root as their head.</Paragraph>
      <Paragraph position="2"> TRUMP's lexicon combines a core lexicon with dynamic lexicons linked to specialized conceptual domains, collocations, and concretions. The core lexicon contains the generic, or context-independent, senses of each word. The system considers these senses whenever a word appears in the input. The dynamic lexicons contain word senses that normally appear only within a particular context; these senses are considered only when that context is active. This distinction is a product of experience; it is conceivable that a formerly dynamic sense may become static, as when military terms creep into everyday language. The partitioning of the lexicon into static and dynamic components reduces the number of senses the system must consider in situations where the context does not trigger some dynamic sense. Although the idea of using dynamic lexicons is not new (see Schank and Abelson [1977], for example), our approach is much more flexible than previous ones because TRUMP's lexicon does not link all senses to a domain. As a result, the lexical retrieval mechanism never forces the system to use a sense just because the domain has preselected it. [Running head: Susan W. McRoy, Using Multiple Knowledge Sources]</Paragraph>
      <Paragraph position="3"> [...] between word senses. This means that, for a task such as generating databases from text, task-specific processing or inference must augment the core lexical knowledge, but problems of considering many nuances of meaning or low-frequency senses are avoided. For example, the financial sense of issue (e.g., a new security) falls under the same core sense as the latest issue of a magazine. The 'progeny' and 'exit' senses of issue are omitted from the lexicon. The idea is to preserve in the core lexicon only the common, coarse distinctions among senses (cf. Frazier and Rayner 1990).</Paragraph>
      <Paragraph position="4"> Figure 1 shows the lexical entries for the word issue. Each entry has a part of speech, :POS, and a set of core senses, :SENSES. Each sense has a :TYPE field that indicates *primary* for a preferred (primary) sense and *secondary* for a deprecated (secondary) sense. The general rule for determining the :TYPE of a sense is that secondary senses are those that the semantic interpreter should not select without specific contextual information, such as the failure of some selectional restriction pertaining to the primary sense. For example, the word yard can mean an enclosed area, a workplace, or a unit of measure, but in the empty context, the enclosed-area sense is assumed. This classification makes the relative frequency of the senses clear, in contrast to simply listing them in historical order, the approach of many lexicons (such as the Longman Dictionary of Contemporary English [Procter 1978]) that have been used in computational applications.</Paragraph>
      <Paragraph position="5"> The :PAR field links each word sense to its immediate parent in the semantic hierarchy. (See Section 3.2.) The parents and siblings of the two noun senses of issue, which are listed in Figure 2, give an idea of the coverage of the lexicon. In the figure, word senses are given as a root followed by a sense number; conceptual categories are designated by atoms beginning with c-. Explicit derivations, such as &quot;period-ic-al-x,&quot; are indicated by roots followed by endings and additional type specifiers. These derivative lexical entries do &quot;double duty&quot; in the lexicon: an application program can use the derivation as well as the semantics of the derivative form.</Paragraph>
      <Paragraph position="6"> The :ASSOC field, not currently used in processing, includes the lexicographer's choice of synonyms or closely related words for each sense.</Paragraph>
      <Paragraph position="7"> The :SYNTAX field encodes syntactic constraints and subcategorizations for each sense. When senses share constraints (not the case in this example), they can be encoded at the level of the word entry. When the syntactic constraints (such as io-rec, one-obj, and no-obj) influence semantic preferences, they are attached to the sense entry. For example, in this case, issue used as an intransitive verb (no-obj) would favor 'passive moving' even though it is a secondary sense. The io-rec subcategorization in the first two senses means indirect object as recipient: the ditransitive form will fill the RECIPIENT role. The grammatical knowledge base of the system relates these subcategories to semantic roles.</Paragraph>
      <Paragraph position="8"> The :G-DERIV and :S-DERIV fields mark morphological derivations. The former, which is NIL in the case of issue to indicate no derivations, encodes the derivations at the word root level, while the latter encodes them at the sense preference level.</Paragraph>
      <Paragraph position="9"> For example, the :S-DERIV constraint allows issuance to derive from either of the first two senses of the verb, with issuer and issuable deriving only from the 'giving' sense.</Paragraph>
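      <Paragraph> As a rough illustration of the fields just described, the following Python sketch models an entry with :POS, :SENSES, :TYPE, :PAR, :SYNTAX, and :S-DERIV. It is not TRUMP's representation and does not reproduce Figure 1: the class names, the sense labels (issue1 and so on), the parent concepts, the :TYPE of the first two senses, and every affix or transformation other than (-er noun tr_actor) are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    name: str            # hypothetical sense label, e.g. "issue1"
    sense_type: str      # "primary" or "secondary" (the :TYPE field)
    parent: str          # conceptual parent (the :PAR field); names are invented
    syntax: tuple = ()   # subcategorizations such as "io-rec", "one-obj", "no-obj"
    s_deriv: tuple = ()  # derivation triples: (affix, category, transformation)

@dataclass
class Entry:
    root: str
    pos: str
    senses: list = field(default_factory=list)

    def senses_deriving(self, affix):
        """Return the names of senses whose :S-DERIV allows the given affix."""
        return [s.name for s in self.senses
                if any(triple[0] == affix for triple in s.s_deriv)]

# Assumed layout: the first sense is the 'giving' sense, the third is the
# secondary 'passive moving' sense selected by the no-obj subcategorization.
issue_verb = Entry("issue", "verb", [
    Sense("issue1", "primary", "c-giving", ("one-obj", "io-rec"),
          (("-ance", "noun", "tr_act"), ("-er", "noun", "tr_actor"),
           ("-able", "adj", "tr_ability"))),
    Sense("issue2", "secondary", "c-publishing", ("one-obj", "io-rec"),
          (("-ance", "noun", "tr_act"),)),
    Sense("issue3", "secondary", "c-passive-moving", ("no-obj",)),
])

print(issue_verb.senses_deriving("-ance"))  # ['issue1', 'issue2']
print(issue_verb.senses_deriving("-er"))    # ['issue1']
```

Keeping the derivation triples on the sense rather than on the root is what lets issuance attach to two senses while issuer attaches to only one.</Paragraph>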
      <Paragraph position="10"> Figure 1: The lexical entries for issue.</Paragraph>
      <Paragraph position="11"> The derivation triples encode the form of each affix, the resulting syntactic category (usually redundant), and the &quot;semantic transformation&quot; that applies between the core sense and the resulting sense. For example, the triple (-er noun tr_actor) in the entry for issue says that an issuer plays the ACTOR role of the first sense of the verb issue. Because derivations often apply to multiple senses and often result in different semantic transformations (for example, the ending -ion can indicate the act of perform- [...] situations, the dynamic lexicons contain senses that are active only in a particular context. Although these senses require triggers, a sense and its trigger may occur just as frequently as a core sense. Thus, the dynamic-static distinction is orthogonal to the distinction between primary and secondary senses made in the core lexicon.</Paragraph>
      <Paragraph position="12"> Currently, TRUMP has lexicons linked to domains, collocations, and concretions.</Paragraph>
      <Paragraph position="13"> For example, TRUMP's military lexicon contains a sense of engage that means 'attack.' However, the system does not consider this sense unless the military domain is active.</Paragraph>
      <Paragraph position="14"> Similarly, the collocational lexicon contains senses triggered by well-known patterns of words; for example, the sequence take effect activates a sense of take meaning 'transpire.' (Section 3.3 discusses collocations and their representation in more detail.) Concretions activate specializations of the abstract sense of a word when it occurs with an object of a specific type. For example, in the core lexicon, the verb project has the abstract sense 'transfer'; however, if its object is a sound, the system activates a sense corresponding to a 'communication event,' as in She projected her voice. Encoding these specializations in the core lexicon would be problematic, because then a system would be forced to resolve such nuances of meaning even when there was not enough information to do so. Dynamic lexicons can provide much finer distinctions among senses than the core lexicon, because they do not increase the amount of ambiguity when their triggering context is inactive. [Running head: Computational Linguistics, Volume 18, Number 1]</Paragraph>
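      <Paragraph> The division of labor between the core and dynamic lexicons can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the context names, and the sense labels (including the core sense given for engage) are hypothetical, and the trigger test is reduced to a list of active contexts.

```python
# Core senses are always candidates; the labels are invented for the sketch.
CORE = {
    "engage": ["engage-occupy"],
    "project": ["project-transfer"],
}

# Dynamic senses are grouped by the context that must be active.
DYNAMIC = {
    "military-domain": {"engage": ["engage-attack"]},
    "object-is-sound": {"project": ["project-communication-event"]},
}

def candidate_senses(root, active_contexts=()):
    """Core senses are always considered; dynamic ones need an active context."""
    senses = list(CORE.get(root, []))
    for context in active_contexts:
        senses.extend(DYNAMIC.get(context, {}).get(root, []))
    return senses

print(candidate_senses("engage"))                       # ['engage-occupy']
print(candidate_senses("engage", ["military-domain"]))  # ['engage-occupy', 'engage-attack']
```

Because a dynamic sense is only added and never substituted, an active domain widens the candidate set without forcing its own sense on the interpreter.</Paragraph>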
      <Paragraph position="15"> Together, the core and dynamic lexicons provide the information necessary to recognize morphological preferences, sense preferences, and syntactic cues. They also provide some of the information required to verify and interpret collocations. Sections 3.2, 3.3, and 3.4, below, describe sources of information that enable a system to recognize role-based preferences, collocations, and the semantic context.</Paragraph>
    </Section>
    <Section position="2" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
3.2 The Concept Hierarchy
</SectionTitle>
      <Paragraph position="0"> The concept hierarchy serves several purposes. First, it associates word senses that are siblings or otherwise closely related in the hierarchy, thus providing a thesaurus for information retrieval and other tasks (cf. Fox et al. 1988). In a sense tagging system, these associations can help determine the semantic context. Second, it supplies the basic ontology to which domain knowledge can be associated, so that each new domain requires only incremental knowledge engineering. Third, it allows role-based preferences, wherever possible, to apply to groups of word senses rather than just individual lexical entries.</Paragraph>
      <Paragraph position="1"> To see how the hierarchy's concept definitions establish the basic ontology, consider Figure 3, the definition of the concept c-recording. c-recording is the parent concept for activities involving the storage of information, namely, the following verb senses: book2, catalogue1, clock1, compile1, date3, document1, enter3, index1, input1, key1, log1, record1. In a concept definition, the :PAR fields link the concept to its immediate parents in the hierarchy. The :ASSOC field links the derived instances of the given concept to their places in the hierarchy. For example, according to Figure 3, the object form derived</Paragraph>
      <Paragraph position="3"> The conceptual definition of c-made-of-rel.</Paragraph>
      <Paragraph position="4"> from enter3 (i.e., entry) has the parent c-information. The :ROLE-PLAY fields mark specializations of a parent's roles (or introduce new roles). Each :ROLE-PLAY indicates the parent's name for a role along with the concept's specialization of it. For example, c-recording specializes its inherited OBJECT role as PATIENT.</Paragraph>
      <Paragraph position="5"> The :RELS and :PREF fields identify which combinations of concept, role, and filler an understander should expect (and hence prefer). For example, the definition in Figure 4 expresses that fabric materials are common modifiers of clothing (e.g., wool suit) and fill the clothing's MADE-OF role. TRUMP's hierarchy also allows the specification of such preferences from the perspective of the filler, where they can be made more general. For example, although colors are also common modifiers of clothing (e.g., blue suit), it is better to associate this preference with the filler (c-color-qual) because colors prefer to fill the COLOR role of any physical object. (Figure 5 shows an encoding of this preference.) The hierarchy also permits the specification of such preferences from the perspective of the relation underlying a role. For example, the relation c-made-of in Figure 6 indicates (in its :RELS) that physical objects normally have a MADE-OF role and (in its :PREF) that the role is normally filled by some physical object. Figure 7 gives a complete account of the use of the :RELS and :PREF fields and how they permit the expression of role-related preferences from any perspective.</Paragraph>
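      <Paragraph> The following Python sketch shows one way such role preferences could be looked up through a hierarchy. It is a simplification, not the :RELS and :PREF encoding of Figures 4 through 7: preferences stated from different perspectives are flattened into (head, role, filler) triples, and all concept names other than c-color-qual are invented for the example.

```python
# Hypothetical fragment of a concept hierarchy: child -> parent.
PARENT = {
    "c-wool": "c-fabric-material",
    "c-fabric-material": "c-physical-object",
    "c-suit": "c-clothing",
    "c-clothing": "c-physical-object",
    "c-blue": "c-color-qual",
}

def isa(concept, ancestor):
    """True if ancestor is the concept itself or lies on its parent chain."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False

# (head concept, role, filler concept)
PREFS = [
    ("c-clothing", "MADE-OF", "c-fabric-material"),  # stated on the head concept
    ("c-physical-object", "COLOR", "c-color-qual"),  # stated on the filler
]

def preferred_roles(head, filler):
    """Roles an understander would expect the filler to play for the head."""
    return [role for h, role, f in PREFS if isa(head, h) and isa(filler, f)]

print(preferred_roles("c-suit", "c-wool"))  # ['MADE-OF']
print(preferred_roles("c-suit", "c-blue"))  # ['COLOR']
```

Stating the COLOR preference against c-physical-object means it applies to suits without being repeated for every kind of clothing.</Paragraph>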
    </Section>
    <Section position="3" start_page="7" end_page="12" type="sub_section">
      <SectionTitle>
3.3 Collocational Patterns
</SectionTitle>
      <Paragraph position="0"> Collocation is the relationship among any group of words that tend to co-occur in a predictable configuration. Although collocations seem to have a semantic basis, many collocations are best recognized by their syntactic form. Thus, for current purposes, we limit the use of the term &quot;collocation&quot; to sense preferences that result from these well-defined syntactic constructions [1]. For example, the particle combination pick up and the verb-complement combination make the team are both collocation-inducing expressions. [Footnote 1: Traditionally many of these expressions have been categorized as idioms (see Cowie and Mackin 1975; Cowie, Mackin, and McCaig 1983), but as most are at least partly compositional and can be processed by normal parsing methods, we prefer to use the more general term &quot;collocation.&quot; This categorization thus happily encompasses both the obvious idioms and the compositional expressions whose status as idioms is highly debatable. Our use of the term is thus similar to that of Smadja and McKeown, who partition collocations into open compounds, predicative relations, and idiomatic expressions (Smadja and McKeown 1990).]</Paragraph>
      <Paragraph position="1"> Figure 7: The use of :PREF and :RELS.</Paragraph>
      <Paragraph position="2"> Figure 8: The top ten co-occurrences with take.
 Rank  Count  Pair
  1.   249    profit take
  2.   205    take place
  3.   157    take act
  4.   113    say take
  5.   113    act take
  6.    99    take advantage
  7.    94    take effect
  8.    88    take profit
  9.    77    take step
 10.    76    take account</Paragraph>
      <Paragraph position="3"> Excluded from this classification are unstructured associations among senses that establish the general semantic context, for example, courtroom/defendant. (We will discuss this type of association in the next section.) Collocations often introduce dynamic word senses, i.e., ones that behave compositionally, but occur only in the context of the expression, making it inappropriate for the system to consider them outside that context. For example, the collocation hang from triggers a sense of from that marks an INSTRUMENT. In other cases, a collocation simply creates preferences for selected core senses, as in the pairing of the 'opportunity' sense of break with the 'cause-to-have' sense of give in give her a break. There is also a class of collocations that introduce a noncompositional sense for the entire expression; for example, the collocation take place invokes a sense 'transpire.' To recognize collocations during preprocessing, TRUMP uses a set of patterns, each of which lists the root words or syntactic categories that make up the collocation. For example, the pattern (TAKE (A) (ADJ) BATH) matches the clauses take a hot bath and takes hot baths. In a pattern, parentheses indicate optionality; the system encodes the repeatability of a category, such as adjectives, procedurally. Currently, there are patterns for verb-particle, verb-preposition, and verb-object collocations, as well as compound nouns.</Paragraph>
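      <Paragraph> A pattern such as (TAKE (A) (ADJ) BATH) can be matched with a short recursive procedure. The Python sketch below is one possible reading, not TRUMP's matcher: it assumes that morphological analysis has already reduced each word to a (root, category) pair, and the tuple layout of pattern elements and the category labels are invented for the example.

```python
# Pattern elements: (kind, value, optional, repeatable).
# This layout is an assumption; it encodes (TAKE (A) (ADJ) BATH).
TAKE_BATH = [
    ("root", "take", False, False),
    ("root", "a", True, False),
    ("cat", "ADJ", True, True),   # repeatability is handled by the matcher
    ("root", "bath", False, False),
]

def element_matches(element, token):
    """Compare a pattern element with a (root, category) token."""
    kind, value = element[0], element[1]
    root, category = token
    return root == value if kind == "root" else category == value

def match(pattern, tokens):
    """Return True if the whole token sequence matches the pattern."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    optional, repeatable = head[2], head[3]
    if tokens and element_matches(head, tokens[0]):
        if match(rest, tokens[1:]):
            return True
        if repeatable and match(pattern, tokens[1:]):
            return True
    # An optional element may also be skipped without consuming a token.
    return optional and match(rest, tokens)

take_a_hot_bath = [("take", "V"), ("a", "DET"), ("hot", "ADJ"), ("bath", "N")]
takes_hot_baths = [("take", "V"), ("hot", "ADJ"), ("bath", "N")]
print(match(TAKE_BATH, take_a_hot_bath))  # True
print(match(TAKE_BATH, takes_hot_baths))  # True
```

Matching on roots rather than surface forms is what lets a single pattern cover both take a hot bath and takes hot baths.</Paragraph>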
      <Paragraph position="4"> Initially, we acquired patterns for verb-object collocations by analyzing lists of root word pairs that were weighted for relative co-occurrence in a corpus of articles from the Dow Jones News Service (cf. Church and Hanks 1990; Smadja and McKeown 1990). As an example of the kind of data that we derived, Figure 8 shows the ten most frequent co-occurrences involving the root &quot;take.&quot; Note that the collocation &quot;take action&quot; appears both in its active form (third in the list) and in its passive form, actions were taken (fifth in the list).</Paragraph>
      <Paragraph position="5"> From an examination of these lists and the contexts in which the pairs appeared in the corpus, we constructed the patterns used by TRUMP to identify collocations. Then, using the patterns as a guide, we added lexical entries for each collocation. (Figure 9 lists some of the entries for the compositional collocations associated with the verb take; the entries pair a dynamic sense of take with a sense occurring as its complement.) These entries link the collocations to the semantic hierarchy, and, where appropriate, provide syntactic constraints that the parser can use to verify the presence of a collocation. For example, Figure 10 shows the entry for the noncompositional collocation take place, which requires that the object (t-*tail*) be singular and determinerless. These entries differ from similar representations of collocations or idioms in Smadja and McKeown (1990) and Stock (1989), in that they are sense-based rather than word-based. That is, instead of expressing collocations as word-templates, the lexicon groups together collocations that combine the same sense of the head verb with particular senses or higher-level concepts (cf. Dyer and Zernik 1986). This approach better addresses the fact that collocations do have a semantic basis, capturing general forms such as give him or her &lt;some temporal object&gt;, which underlies the collocations give month, give minute, and give time. Currently, the system has entries for over 1700 such collocations.</Paragraph>
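      <Paragraph> Lists like the one in Figure 8 can be approximated by counting ordered pairs of roots that occur close together. The Python sketch below uses raw counts within a small window; the weighting for relative co-occurrence mentioned above is not reproduced, and the function name, the window size, and the three-sentence corpus are assumptions made purely for illustration.

```python
from collections import Counter

def cooccurrence_counts(sentences, target, window=3):
    """Count ordered root pairs involving target within a small window."""
    counts = Counter()
    for roots in sentences:
        for i, left in enumerate(roots):
            for right in roots[i + 1:i + 1 + window]:
                if target in (left, right):
                    counts[(left, right)] += 1
    return counts

# Each sentence is assumed to be reduced to roots already.
corpus = [
    ["investor", "take", "profit"],
    ["profit", "be", "take"],       # a passive: profits were taken
    ["merger", "take", "place"],
]
counts = cooccurrence_counts(corpus, "take")
print(counts[("take", "profit")])  # 1
print(counts[("profit", "take")])  # 1
```

Keeping the pairs ordered is what makes an active form and its passive show up as separate rows, as with the two orderings of take and act in Figure 8.</Paragraph>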
    </Section>
    <Section position="4" start_page="12" end_page="15" type="sub_section">
      <SectionTitle>
3.4 Cluster Definitions
</SectionTitle>
      <Paragraph position="0"> The last source of sense preferences we need to consider is the semantic context.</Paragraph>
      <Paragraph position="1"> Work on lexical cohesion suggests that people use words that repeat a conceptual category or that have a semantic association to each other to create unity in text (Morris 1988; Morris and Hirst 1991; Halliday and Hasan 1976). These associations can be thought of as a class of collocations that lack the predictable syntactic structure of, say, collocations arising from verb-particle or compound noun constructions. Since language producers select senses that group together semantically, a language analyzer should prefer senses that share a semantic association. However, it is unclear whether the benefit of knowing the exact nature of an association would justify the cost of determining it. Thus, our system provides a cluster mechanism for representing and identifying groups of senses that are associated in some unspecified way.</Paragraph>
      <Paragraph position="2"> A cluster is a set of the senses associated with some central concept. The definition of a cluster includes a name suggesting the central concept and a list of the cluster's members, as in Figure 11. A cluster may contain concepts or other clusters.</Paragraph>
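      <Paragraph> Since a cluster may contain senses, concepts, or other clusters, testing membership requires expanding the definition. The Python sketch below shows one way to do this. Only cl-courtroom, verb_testify1, verb_assert1, and c-law-action come from the text; the nested cluster, the other sense names, and the members listed under c-law-action are hypothetical.

```python
# Cluster name -> members, which may be senses, concepts, or other clusters.
CLUSTERS = {
    "cl-courtroom": ["verb_testify1", "c-law-action", "cl-trial-people"],
    "cl-trial-people": ["noun_defendant1", "noun_judge1"],
}

# Senses assumed to fall under a concept in the hierarchy.
CONCEPT_MEMBERS = {"c-law-action": ["verb_sue1", "verb_prosecute1"]}

def expand(name, seen=None):
    """Return the set of senses reachable from a cluster, concept, or sense."""
    seen = set() if seen is None else seen
    if name in seen:
        return set()              # guards against clusters that mention each other
    seen.add(name)
    if name in CLUSTERS:
        members = CLUSTERS[name]
    elif name in CONCEPT_MEMBERS:
        members = CONCEPT_MEMBERS[name]
    else:
        return {name}             # a plain sense
    senses = set()
    for member in members:
        senses |= expand(member, seen)
    return senses

courtroom = expand("cl-courtroom")
print("verb_testify1" in courtroom)  # True
print("verb_assert1" in courtroom)   # False
```

The expansion records only that senses belong together; it says nothing about how they are related, which matches the unspecified associations that clusters are meant to capture.</Paragraph>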
      <Paragraph position="3"> TRUMP's knowledge base contains three types of clusters: categorial, functional, and situational. The simplest type of cluster is the categorial cluster. These clusters consist of the sets of all senses sharing a particular conceptual parent. Since the conceptual hierarchy already encodes these clusters implicitly, we need not write formal cluster definitions for them. Obviously, a sense will belong to a number of categorial clusters, one for each element of its parent chain.</Paragraph>
      <Paragraph position="4"> The second type of cluster is the functional cluster. These consist of the sets of all senses sharing a specified functional relationship. For example, our system has a small number of part-whole clusters that list the parts associated with the object named by the cluster. Figure 12 shows the part-whole cluster cl-egg for parts of an egg.</Paragraph>
      <Paragraph position="5"> Figure 13: The definition of the cluster cl-courtroom.</Paragraph>
      <Paragraph position="6"> The third type of cluster, the situational cluster, encodes general relationships among senses on the basis of their being associated with a common setting, event, or purpose. Since a cluster's usefulness is inversely proportional to its size, these clusters normally include only senses that do not occur outside the clustered context or that strongly suggest the clustered context when they occur with some other member of the cluster. Thus, situational clusters are centered upon fairly specific ideas and may correspondingly be very specific with respect to their elements. It is not unusual for a word to be contained in a cluster while its synonyms are not. For example, the cluster cl-courtroom shown in Figure 13 contains sense verb_testify1, but not verb_assert1. Situational clusters capture the associations found in generic descriptions (cf. Dahlgren, McDowell, and Stabler 1989) or dictionary examples (cf. Janssen 1990), but are more compact because clusters may include whole categories of objects (such as c-law-action) as members and need not specify relationships between the members. (As mentioned above, the conceptual hierarchy is the best place for encoding known role-related expectations.) The use of clusters for sense discrimination is also comparable to approaches that favor senses linked by marked paths in a semantic network (Hirst 1987). In fact, clusters capture most of the useful associations found in scripts or semantic networks, but lack many of the disadvantages of using networks. For example, because clusters do not specify what the exact nature of any association is, learning new clusters from previously processed sentences would be fairly straightforward, in contrast to learning new fragments of network. Using clusters also avoids the major problem associated with marker-passing approaches, namely how to prevent the production of stupid paths (or remove them from consideration after they have been produced) (Charniak 1983). The relevant difference is that a cluster is cautious because it must explicitly specify all its elements.
A marker passer takes the opposite stance, however, considering all paths up, down, and across the network unless it is explicitly constrained. Thus a marker passer might find the following dubious path from the 'written object' sense of book to the 'part-of-a-plant' sense of leaf: [book made-of paper] [paper made-from wood] [tree made-of wood] [tree has-part leaf], whereas no cluster would link these entities, unless there had been some prior evidence of a connection. (The recommended solution to the production of such paths by a marker passer is to prevent the passing of marks through certain kinds of nodes [Hirst 1987; Hendler 1987].) From the lexical entries, the underlying concept hierarchy, and the specialized entries for collocation and clusters just described, a language analyzer can extract the information that establishes preferences among senses. In the next section, we will describe how a semantic interpreter can apply knowledge from such a wide variety of sources.</Paragraph>
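      <Paragraph> The contrast between an unconstrained marker passer and a cluster lookup can be made concrete with a small Python sketch. The four links are the ones in the dubious path above; the breadth-first search stands in for marker passing in general rather than for any particular published algorithm, and the cluster cl-tree and its members are hypothetical.

```python
from collections import deque

EDGES = [
    ("book", "made-of", "paper"),
    ("paper", "made-from", "wood"),
    ("tree", "made-of", "wood"),
    ("tree", "has-part", "leaf"),
]

def find_path(start, goal):
    """Breadth-first search that follows links in either direction."""
    neighbors = {}
    for a, _, b in EDGES:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# A cluster links only the members it lists explicitly.
CLUSTERS = {"cl-tree": {"tree", "leaf", "branch"}}

def share_cluster(a, b):
    return any(a in members and b in members for members in CLUSTERS.values())

print(find_path("book", "leaf"))      # ['book', 'paper', 'wood', 'tree', 'leaf']
print(share_cluster("book", "leaf"))  # False
```

The search succeeds because nothing stops it from crossing the wood node in both directions, while the cluster lookup reports no association because no cluster lists both senses.</Paragraph>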
      <Paragraph position="7"> 4. Using Knowledge to Identify Sense Preferences
 There is a wide variety of information about which sense is the correct one, and the challenge is to decide when and how to use this information. The danger of a combinatorial explosion of possibilities makes it advantageous to try to resolve ambiguities as early as possible. Indeed, efficient preprocessing of texts can elicit a number of cues for word senses, set up preferences, and help control the parse. Then, the parse and semantic interpretation of the text will provide the cues necessary to complete the task of resolution.</Paragraph>
      <Paragraph position="8"> Without actually parsing a text, a preprocessor can identify for each word its morphology [2], its syntactic tag or tags [3], and whether it is part of a collocation; for each sense, it can identify whether the sense is preferred or deprecated and whether it is supported by a cluster. These properties are all either retrievable directly from a knowledge base or computable from short sequences of words. To identify whether the input satisfies the expectations created by syntactic cues or whether it satisfies role-related expectations, the system must first perform some syntactic analysis of the input. Identifying these properties must come after parsing, because recognizing them requires both the structural cues provided by parsing and a semantic analysis of the text.</Paragraph>
      <Paragraph position="9"> In our system, processing occurs in three phases: morphology, preprocessing, and parsing and semantic interpretation. (See Figure 14.) Analysis of a text begins with the identification of the morphological features of each word and the retrieval of the (core) senses of each word. Then, the input passes through a special preprocessor that identifies parse-independent semantic preferences (i.e., syntactic tags, collocations, and clusters) and makes a preliminary selection of word senses. This selection process eliminates those core senses that are obviously inappropriate and triggers certain specialized senses. [Footnote 2: This is at least true for English, although whether it is possible for morphologically complex or agglutinative languages such as Finnish remains to be seen.]</Paragraph>
      <Paragraph position="10"> Figure 14: The system architecture.</Paragraph>
      <Paragraph position="11"> In the third phase, TRUMP attempts to parse the input and at the same time produce a &quot;preferred&quot; semantic interpretation for it. Since the preferred interpretation also fixes the preferred sense of each word, it is at this point that the text can be given semantic tags, thus allowing sense-based information retrieval.</Paragraph>
      <Paragraph position="12"> In the next few subsections we will describe in greater detail the processes that enable the system to identify semantic preferences: morphological analysis, tagging, collocation identification, cluster matching, and semantic interpretation. Afterward we will discuss how the system combines the preferences it identifies.</Paragraph>
    </Section>
    <Section position="5" start_page="15" end_page="17" type="sub_section">
      <SectionTitle>
4.1 Morphological Analysis and Lexical Retrieval
</SectionTitle>
      <Paragraph position="0"> The first step in processing an input text is to determine the root, syntactic features, and affixes of each word. This information is necessary both for retrieving the word's lexical entries and for the syntactic tagging of the text during preprocessing. Morphological analysis not only reduces the number of words and senses that must be in the lexicon, but it also enables a system to make reasonable guesses about the syntactic and semantic identity of unknown words so that they do not prevent parsing (see Rau, Jacobs, and Zernik 1989). Once morphological analysis of a word is complete, the system retrieves (or derives) the corresponding senses and establishes initial semantic preferences for the primary senses. For example, by default, the sense of agree meaning 'to concur' (agree1) is preferred over its other senses. The lexical entry for agree marks this preference by giving it :TYPE *primary* (see Figure 15). The entry also says that derivations (listed in the :S-DERIV field) agree1+ment and agree2+able are preferred, derivations agree1+able and agree3+ment are deprecated, and all other sense-affix combinations (excepting inflections) have been disallowed.</Paragraph>
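      <Paragraph> The preferred, deprecated, and disallowed derivations of agree can be read as a small lookup table. The Python sketch below is an illustration only: the four sense-affix pairs are the ones named above, while the table layout and the function names are assumptions, and inflections are left out.

```python
# Status of sense-affix combinations for agree, as described in the text.
S_DERIV = {
    ("agree1", "ment"): "preferred",
    ("agree2", "able"): "preferred",
    ("agree1", "able"): "deprecated",
    ("agree3", "ment"): "deprecated",
}

def derivation_status(sense, affix):
    """Unlisted sense-affix combinations are disallowed (inflections aside)."""
    return S_DERIV.get((sense, affix), "disallowed")

def senses_for(affix, senses=("agree1", "agree2", "agree3")):
    """Rank the senses a derived form may carry, preferred ones first."""
    order = {"preferred": 0, "deprecated": 1}
    allowed = [s for s in senses if derivation_status(s, affix) in order]
    return sorted(allowed, key=lambda s: order[derivation_status(s, affix)])

print(senses_for("ment"))  # ['agree1', 'agree3']
print(senses_for("able"))  # ['agree2', 'agree1']
```

For agreeable the ranking puts agree2 ahead of agree1, even though agree1 is the primary sense of the root, which is the kind of sense-level morphological preference the :S-DERIV field is meant to record.</Paragraph>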
      <Paragraph position="1"> During morphological analysis, the system retrieves only the most general senses.</Paragraph>
      <Paragraph position="2"> It waits until the preprocessor or the parser identifies supporting evidence before it retrieves word senses specific to a context, such as a domain, a situation, or a collocation. In most cases this approach helps reduce the amount of ambiguity. The approach is compatible with evidence discussed by Simpson and Burgess (1988) that &quot;multiple meanings are activated in frequency-coded order&quot; and that low-frequency senses are handled by a second retrieval process that accumulates evidence for those senses and activates them as necessary.</Paragraph>
      <Paragraph position="3"> Figure 15: The lexical entry for the verb agree.</Paragraph>
    </Section>
    <Section position="6" start_page="17" end_page="18" type="sub_section">
      <SectionTitle>
4.2 Tagging
</SectionTitle>
      <Paragraph position="0"> Once the system determines the morphological analysis of each word, the next step in preprocessing is to try to determine the correct part of speech for the word. Our system uses a tagging program, written by Uri Zernik (1990), that takes information about the root, affix, and possible syntactic category for each word and applies stochastic techniques to select a syntactic tag for each word. Stochastic taggers look at small groups of words and pick the most likely assignment of tags, determined by the frequency of alternative syntactic patterns in similar texts. Although it may not be possible to completely disambiguate all words prior to parsing, approaches based on stochastic information have been quite successful (Church 1988; Garside, Leech, and Sampson 1987; de Marcken 1990) [4]. To allow for the fact that the tagger may err, as part of the tagging process the system makes a second pass through the text to remove some systematic errors that result from biases common to statistical approaches. For example, such taggers tend to prefer modifiers over nouns and nouns over verbs; for instance, in Example 5, the tagger erroneously marks the word need as a noun.</Paragraph>
      <Paragraph position="1"> Example 5: You really need the Campbell Soups of the world to be interested in your magazine.
 In this second pass, the system applies a few rules derived from our grammar and resets the tags where necessary. For example, to correct for the noun versus verb overgeneralization, whenever a word that can be either a noun or a verb gets tagged as just a noun, the corrector lets it remain ambiguous unless it is immediately preceded by a determiner (a good clue for nouns), or it is immediately preceded by a plural noun or a preposition, or is immediately followed by a determiner (three clues that suggest a word may be a verb). The system is able to correct for all the systematic errors we have identified thus far using just nine rules of this sort.</Paragraph>
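      <Paragraph> The noun versus verb correction can be sketched as a single function. The Python below encodes one reading of the rule, namely that a preceding determiner keeps the noun tag, that any of the three verb clues resets the tag to verb, and that the word otherwise stays ambiguous; that reading, the tag names, and the function signature are assumptions, and the other eight rules are not shown.

```python
def correct_noun_tag(prev, nxt):
    """Revise the tag of a noun-or-verb word that the tagger marked as a noun.

    prev and nxt are (word, tag) pairs for the neighboring words, or None at
    a sentence boundary. Returns the set of tags left for the word.
    """
    prev_tag = prev[1] if prev else None
    next_tag = nxt[1] if nxt else None
    if prev_tag == "DET":
        return {"noun"}                   # determiner before: keep the noun tag
    if prev_tag in ("NOUN-PL", "PREP") or next_tag == "DET":
        return {"verb"}                   # clues that the word may be a verb
    return {"noun", "verb"}               # otherwise leave it ambiguous

# Example 5: You really need the Campbell Soups ...
print(correct_noun_tag(("really", "ADV"), ("the", "DET")))  # {'verb'}
```

In Example 5 the determiner that follows need is enough to override the erroneous noun tag.</Paragraph>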
      <Paragraph position="2"> After tagging, the preprocessor eliminates all senses corresponding to unselected parts of speech.</Paragraph>
    </Section>
    <Section position="7" start_page="18" end_page="18" type="sub_section">
      <SectionTitle>
4.3 Identification of Collocations
</SectionTitle>
      <Paragraph position="0"> Following the syntactic filtering of senses, TRUMP's preprocessor identifies collocations and establishes semantic preferences for the senses associated with them. In this stage of preprocessing, the system recognizes the following types of collocations:
* verb+particle pairs such as take on;
* verb+preposition pairs such as invest in;
* verb+particle+preposition combinations such as break in on;
* verb+complement clauses such as take a bath, their passives, as in actions were taken, and hyphenated nominals, such as profit-taking;
* compound noun phrases such as investment bank.</Paragraph>
      <Paragraph position="1"> To recognize a collocation, the preprocessor relies on a set of simple patterns, which match the general syntactic context in which the collocation occurs. For example, the system recognizes the collocation &amp;quot;take profit&amp;quot; found in Example 6 with the pattern</Paragraph>
      <Paragraph position="3"> Example 6: A number of stocks that have spearheaded the market's recent rally bore the brunt of isolated profit-taking Tuesday.</Paragraph>
      <Paragraph position="4"> The preprocessor's strategy for locating a collocation is to first scan the text for trigger words, and if it finds the necessary triggers, then to try to match the complete pattern. (Triggers typically correspond to the phrasal head of a collocation, but for</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="18" end_page="20" type="metho">
    <SectionTitle>
Footnote 4: Magerman and Marcus (1990) do complete stochastic N-gram parsing.
</SectionTitle>
    <Paragraph position="0"> Computational Linguistics Volume 18, Number 1 more complex patterns, such as verb-complement clauses, both parts of the collocation must be present.) The system's matching procedures allow for punctuation and verb-complement inversion.</Paragraph>
    <Paragraph position="1"> If the triggers are found and the match is successful, the preprocessor has a choice of subsequent actions, depending on how cautious it is supposed to be. In its aggressive mode, it updates the representations of the matched words, adding any triggered senses and preferences for the collocated senses. It also deletes any unsupported, deprecated senses. In its cautious mode, it just adds the word senses associated with the pattern to a dynamic store. Once stored, these senses are then available for the parser to use after it verifies the syntactic constraints of the collocation; if it is successful, it will add preferences for the appropriate senses. Early identification of triggered senses enables the system to use them for cluster matching in the next stage.</Paragraph>
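    <Paragraph> The trigger scan and the two modes of operation can be sketched as follows. This is a rough illustration under stated assumptions, not TRUMP's matcher: the Pattern class, the sense name, the max_gap window that stands in for the full syntactic pattern match, and the preference increment of 1 are all hypothetical, and the deletion of unsupported senses in aggressive mode is left out.
<![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    triggers: tuple     # roots that must all occur, e.g. ("take", "profit")
    sense: str          # collocated sense for the first trigger word
    max_gap: int = 3    # assumed token window standing in for the full match


def find_matches(roots, patterns):
    """Return the patterns whose triggers all occur close together, in any order."""
    positions = {}
    for i, root in enumerate(roots):
        positions.setdefault(root, []).append(i)
    matches = []
    for p in patterns:
        if not all(t in positions for t in p.triggers):
            continue                      # trigger scan failed, skip the match
        head, rest = p.triggers[0], p.triggers[1:]
        if any(all(any(abs(i - j) <= p.max_gap for j in positions[t]) for t in rest)
               for i in positions[head]):
            matches.append(p)
    return matches


def apply_match(pattern, preferences, store, aggressive):
    """Aggressive: add the sense with a preference. Cautious: store it for the parser."""
    head = pattern.triggers[0]
    if aggressive:
        senses = preferences.setdefault(head, {})
        senses[pattern.sense] = senses.get(pattern.sense, 0) + 1  # +1 is illustrative
    else:
        store.append(pattern.sense)
```
]]>
Because the window test ignores word order, the sketch accepts the inverted order of profit-taking as well as take profit, which loosely mirrors the verb-complement inversion the matching procedures allow.</Paragraph>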
    <Section position="1" start_page="19" end_page="19" type="sub_section">
      <SectionTitle>
4.4 Identification of Clusters
</SectionTitle>
      <Paragraph position="0"> After the syntactic filtering of senses and the activation of senses triggered by collocations, the next step of preprocessing identifies preferences for senses that invoke currently active clusters (see Section 3.4). A cluster is active if it contains any of the senses under consideration for other words in the current paragraph. The system may also activate certain clusters to represent the general topic of the text.</Paragraph>
      <Paragraph position="1"> The preprocessor's strategy for assessing cluster-based preferences is to take the set of cluster names invoked by each sense of each content word in the sentence and locate all intersections between it and the names of other active clusters. (For purposes of cluster matching, the sense list for each word will include all the special and noncompositional senses activated during the previous stage of preprocessing, as well as any domain-specific senses that are not yet active.) For each intersection the preprocessor finds, it adds preferences for the senses that are supported by the cluster match. Then, the preprocessor activates any previously inactive senses it found to be supported by a cluster match. This triggering of senses on the basis of conceptual context forms the final step of the preprocessing phase.</Paragraph>
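      <Paragraph> The intersection step can be sketched as follows. The sense and cluster names are hypothetical, and scoring a match by the specificity of the most specific matching cluster follows the rule given in Section 5 rather than anything stated in this subsection.
<![CDATA[
```python
def cluster_preferences(sense_clusters, active_clusters, specificity):
    """Map each supported sense to the specificity of its best matching cluster."""
    prefs = {}
    for sense, clusters in sense_clusters.items():
        matched = set(clusters) & set(active_clusters)
        if matched:
            # Only the most specific matching cluster counts for a sense.
            prefs[sense] = max(specificity[c] for c in matched)
    return prefs


def activate_supported(active_senses, prefs):
    """Activate any previously inactive sense that a cluster match supports."""
    return set(active_senses) | set(prefs)
```
]]>
Senses whose clusters do not intersect the active set receive no preference and stay in whatever state they were in.</Paragraph>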
    </Section>
    <Section position="2" start_page="19" end_page="20" type="sub_section">
      <SectionTitle>
4.5 Semantic Interpretation
</SectionTitle>
      <Paragraph position="0"> Once preprocessing is complete, the parsing phase begins. In this phase, TRUMP attempts to build syntactic structures, while calling on the semantic interpreter to build and rate alternative interpretations for each structure proposed. These semantic evaluations then guide the parser's evaluation of syntactic structures. They may also influence the actual progression of the parse. For example, if a structure is found to have incoherent semantics, the parser immediately eliminates it (and all structures that might contain it) from further consideration. Also, whenever the semantics of a parse becomes sufficiently better than that of its competitors, the system prunes the semantically inferior parses, reducing the number of ambiguities even further (see footnote 5). As suggested above, the system builds semantic interpretations incrementally. For each proposed combination of syntactic structures, there is a corresponding combination of semantic structures. It is the job of the semantic interpreter to identify the possible relations that link the structures being combined, identify the preferences associated with each possible combination of head, role (relation), and filler (the argument or modifier), and then rank competing semantic interpretations.</Paragraph>
      <Paragraph position="1"> Footnote 5: A similar approach has been taken by Gibson (1990) and is supported by the psychological experiments of Kurtzman (1984).
For each proposed combination, knowledge sources may contribute the following preferences:
* preferences directly associated with the head or the filler, determined recursively from their components, beginning with preferences identified during preprocessing.</Paragraph>
      <Paragraph position="2"> * preferences associated with syntactic cues, such as the satisfaction of restrictions listed in the lexicon. For example, a word may allow only modifiers of a particular syntactic form, or a modifier may modify only a certain syntactic form. (For example, the sense of tend meaning 'to care for,' as in She tends plants or She tends to plants, occurs with an NP or PP object, whereas the sense of tend meaning 'to have a tendency,' as in She tends to lose things, requires a clausal object.)
* preferences associated with the semantic &amp;quot;fit&amp;quot; between any two of the head, the role, and the filler, for example:
  - filler and role: e.g., foods make good fillers for the PATIENT role of eating activities;
  - filler and head: e.g., colors make good modifiers of physical objects;
  - head and role: e.g., monetary objects expect to be qualified by some QUANTITY.</Paragraph>
      <Paragraph position="3"> The conceptual hierarchy and the lexicon contain the information that encodes these preferences.</Paragraph>
      <Paragraph position="4"> * preferences triggered by reference resolution. (Currently, our system does not make use of these preferences, but see Crain and Steedman \[1985\]; Altmann and Steedman \[1988\]; Hirst \[1987\].) How the semantic interpreter combines these preferences is the subject of the next section.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="20" end_page="41" type="metho">
    <SectionTitle>
5. Combining Preferences to Select Senses
</SectionTitle>
    <Paragraph position="0"> Given the number of preference cues available for discriminating word senses, an understander must face the question of what to do if they conflict. For example, in the sentence Mary took a picture to Bob, the fact that photography does not normally have a destination (negative role-related information) should override the support for the 'photograph' interpretation of took a picture given by collocation analysis. A particular source of information may also support more than one possible interpretation, but to different degrees. For example, cigarette filter may correspond either to something that filters out cigarettes or to something that is part of a cigarette, but the latter relation is more likely. Our strategy for combining the preferences described in the preceding sections is to rate most highly the sense with the strongest combination of supporting cues. The system assigns each preference cue a strength, an integer value between +10 and -10, and then sums these strengths to find the sense with the highest rating.</Paragraph>
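    <Paragraph> The summation strategy can be sketched as follows. The cue names and the strengths used for Mary took a picture to Bob are illustrative assumptions only; the text gives no values for that sentence.
<![CDATA[
```python
def rate_senses(cues_by_sense):
    """Sum integer cue strengths per sense and return (best_sense, totals)."""
    totals = {}
    for sense, cues in cues_by_sense.items():
        for cue, strength in cues.items():
            if not isinstance(strength, int) or not -10 <= strength <= 10:
                raise ValueError(f"{cue}: strength must be an integer in [-10, 10]")
        totals[sense] = sum(cues.values())
    best = max(totals, key=totals.get)
    return best, totals


# Illustrative numbers only: collocation analysis supports the photograph
# sense, but the failed destination role outweighs that support.
took_a_picture = {
    "photograph": {"collocation": 5, "role": -9},
    "transport": {"collocation": 0, "role": 3},
}
best, totals = rate_senses(took_a_picture)
```
]]>
With these assumed strengths the negative role information overrides the collocation cue, which is the behavior the example in the text calls for.</Paragraph>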
    <Paragraph position="1"> The strength of a particular cue depends on its type and on the degree to which the expectations underlying it are satisfied. For cues that are polar (for example, a sense is either low or high frequency), a value must be chosen experimentally, depending on the strength of the cue compared with others. For example, the system assigns frequency information (the primary-secondary distinction) a score close to zero because this information tends to be significant only when other preferences are inconclusive. For cues that have an inherent extent (for example, the conceptual category specified by a role preference subsumes a set of elements that can be counted), the cue strength is a function of the magnitude of the extent, that is, its specificity. TRUMP's specificity function maps the number of elements subsumed by the concept onto the range 0 to +10. The function assigns concepts with few members a high value and concepts with many members a low value. For example, the concept c-object, which subsumes roughly half the knowledge base, has a low specificity value (1). In contrast, the concept noun&amp;after1, which subsumes only a single entity, has a high specificity value (10). Concept strength is inversely related to concept size because a preference for a very general (large) concept often indicates that either there is no strong expectation at all or there is a gap in the system's knowledge. In either case, a concept that subsumes only a few senses is stronger information than a concept that subsumes more. The preference score for a complex concept, formed by combining simpler concepts with the connectives AND, OR, and NOT, is a function of the number of senses subsumed by both, either, or neither concept, respectively. Similarly, the score for a cluster is the specificity of that cluster (as defined in Section 3.4). (If a sense belongs to more than one active cluster, then only the most specific one is considered.) The exact details of the function (i.e., the range of magnitudes corresponding to each specificity class) necessarily depend on the size and organization of one's concept hierarchy. For example, one would assign specificity value 1 to any concept with more members than any immediate specialization of the most abstract concept.</Paragraph>
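    <Paragraph> One way such a specificity function might look is sketched below. The text does not give the actual function, so the logarithmic mapping, the function names, and the use of a knowledge base of 13,000 senses in the usage note are assumptions chosen only to reproduce the two data points mentioned above (a single-member concept scores 10, a concept covering half the knowledge base scores 1).
<![CDATA[
```python
import math

MAX_SPECIFICITY = 10


def specificity(n_members, kb_size):
    """Map a concept's member count onto 0..10, high for small concepts."""
    if not 1 <= n_members <= kb_size:
        raise ValueError("n_members must be between 1 and kb_size")
    if kb_size == 1:
        return MAX_SPECIFICITY
    # Assumed log scale: 0 for a single member, 1 for the whole knowledge base.
    fraction = math.log(n_members) / math.log(kb_size)
    return round(MAX_SPECIFICITY * (1 - fraction))


def combined_size(op, a, b=frozenset(), universe=frozenset()):
    """Member count for AND (both), OR (either), or NOT (outside a)."""
    if op == "AND":
        return len(a & b)
    if op == "OR":
        return len(a | b)
    if op == "NOT":
        return len(universe - a)
    raise ValueError(f"unknown connective: {op}")
```
]]>
Under these assumptions, specificity(1, 13000) gives 10 and specificity(6500, 13000) gives 1; a real system would tune the mapping to the size and organization of its own hierarchy, as the text notes.</Paragraph>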
    <Paragraph position="2"> When a preference cue matches the input, the cue strength is its specificity value; when a concept fails to match the input, the strength is a negative value whose magnitude is usually the specificity of the concept, but it is not always this straightforward. Rating the evidence associated with a preference failure is a subtle problem, because there are different types of preference failure to take into account. Failure to meet a general preference is always significant, whereas failure to meet a very specific preference is only strong information when a slight relaxation of the preference does not eliminate the failure. This presents a bit of a paradox: the greater the specificity of a concept, the more information there is about it, but the less information there may be about a corresponding preference. The paradox arises because the failure of a very specific preference introduces significant uncertainty as to why the preference failed.</Paragraph>
    <Paragraph position="3"> Failing to meet a very general preference is always strong information because, in practice, the purpose of such preferences is to eliminate the grossly inappropriate, such as trying to use a relation with a physical object when it should only be applied to events. The specificity function in this case returns a value whose magnitude is the same as the specificity of the complement of the concept (i.e., the positive specificity less the maximum specificity, 10). The result is a negative number whose absolute value is greater than it would be by default. For example, if a preference is for the concept c-object, which has a positive specificity of 1, and this concept fails to match the input, then the preference value for the cue will be 1 - 10 = -9.</Paragraph>
    <Paragraph position="4"> On the other hand, a very specific preference usually pinpoints the expected entity, i.e., the dead giveaway pairings of role and filler. Thus, it is quite common for these preferences to overspecify the underlying constraint; for example, cut may expect a tool as an INSTRUMENT, but almost any physical object will suffice. When a slight relaxation of the preference is satisfiable, a system should take the cautious route and assume that it has a case of overspecification, which is at worst a weak failure. Again, the specificity function returns a negative value with magnitude equivalent to the specificity of the complement of the concept, but this time the result will be a negative number whose absolute value is less than it would be by default. When this approach fails, a system can safely assume that the entity under consideration is &amp;quot;obviously inappropriate&amp;quot; for a relatively strong expectation, and return the default value. The default value for a concept that is neither especially general nor specific and that fails to match the input is just -1 times the positive specificity of the concept.</Paragraph>
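    <Paragraph> The rules for rating a matched or failed preference can be summarized in a short sketch. The arithmetic follows the text; the two cutoffs that decide what counts as very general or very specific, and the relaxed_match flag that reports whether a slight relaxation of the preference is satisfiable, are assumptions introduced for illustration.
<![CDATA[
```python
MAX_SPECIFICITY = 10
GENERAL_MAX = 2     # assumed cutoff: specificity <= 2 counts as very general
SPECIFIC_MIN = 8    # assumed cutoff: specificity >= 8 counts as very specific


def cue_strength(specificity, matched, relaxed_match=False):
    """Return the signed strength of a concept-based preference cue."""
    if matched:
        return specificity
    if specificity <= GENERAL_MAX:
        # Failing a very general preference is strong evidence.
        return specificity - MAX_SPECIFICITY
    if specificity >= SPECIFIC_MIN and relaxed_match:
        # Probable overspecification: at worst a weak failure.
        return specificity - MAX_SPECIFICITY
    # Default failure: -1 times the positive specificity.
    return -specificity
```
]]>
For the c-object case in the text, cue_strength(1, matched=False) yields -9, whereas a failed preference of specificity 9 yields -1 when a relaxed version of the preference is satisfiable and -9 when it is not.</Paragraph>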
    <Paragraph position="5"> The strategy of favoring the most specific information has several advantages.</Paragraph>
    <Paragraph position="6"> This approach best addresses the concerns of an expanding knowledge base where one must be concerned not only with competition between preferences but also with the inevitable gaps in knowledge. Generally, the more specific information there is, the more complete, and hence more trustworthy, the information is. Thus, when there is a clear semantic distinction between the senses and the system has the information necessary to identify it, a clear distinction usually emerges in the ratings. When there is no strong semantic distinction, or there is very little information, preference scores are usually very close, so that the parser must fall back on syntactic preferences, such as Right Association. This result provides a simple, sensible means of balancing syntactic and semantic preferences.</Paragraph>
    <Paragraph position="7"> To see how the cue strengths of frequency information, morphological preferences, collocations, clusters, syntactic preferences, and role-related preferences interact with one another to produce the final ranking of senses, consider the problem of deciding the correct sense of reached in Example 1 (repeated below):
Example 1: The agreement reached by the state and the EPA provides for the safe storage of the waste.</Paragraph>
    <Paragraph position="8"> According to the system's lexicon, reached has four possible verb senses:
* reach1, as in reach a destination, which has conceptual parents c-dest-occur (&amp;quot;destination occurrence&amp;quot;) and c-arriving;
* reach2, as in reach for a cookie, which has conceptual parent c-bodypart-action;
* reach3, as in reach her by telephone, which has conceptual parent c-comm-event (&amp;quot;communication event&amp;quot;); and
* reach4, as in reach a conclusion, which has conceptual parent c-cause-to-event-change.</Paragraph>
    <Paragraph position="9"> Figure 16 shows a tabulation of cue strengths for each of these interpretations of reach in Example 1, when just information in the VP reached by the state and the EPA is considered. The sense reach3 has the highest total score. From the table, we see that, at this point in the parse, the only strong source of preferences is the role information (line 6 of Figure 16). The derivation of these numbers is shown in Figures 17, 18, and 19, which list the role preferences associated with the possible interpretations of the preposition by for reach3, and its two nearest competitors, reach1 and reach4.</Paragraph>
    <Paragraph position="10"> Together, the data in the tables reveal the following sources of preference strength: The 'arrival' sense (reach1) gains support from the fact that there is a sense of by meaning AGENT, which is a role that arrivals expect (line 3 of column 3 of Figure 17), and the state and the EPA make reasonably good agents (line 5 of column 3 of Figure 17).</Paragraph>
  </Section>
</Paper>