<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1133">
  <Title>Automated Induction of Sense in Context</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 CPA Methodology
</SectionTitle>
    <Paragraph position="0"> The Corpus Pattern Analysis (CPA) technique uses a semi-automatic bootstrapping process to produce a dictionary of selection contexts for predicates in a language. Word senses for verbs are distinguished through corpus-derived syntagmatic patterns mapped to Generative Lexicon Theory (Pustejovsky (1995)) as a linguistic model of interpretation, which guides and constrains the induction of senses from word distributional information. Each pattern is speci ed in terms of lexical sets for each argument, shallow semantic typing of these sets, and other syntagmatically relevant criteria (e.g., adverbials of manner, phrasal particles, genitives, negatives). null The procedure consists of three subtasks: (1) the manual discovery of selection context patterns for speci c verbs; (2) the automatic recognition of instances of the identi ed patterns; and (3) automatic acquisition of patterns for unanalyzed cases. Initially, a number of patterns are manually formulated by a lexicographer through corpus pattern analysis of about 500 occurrences of each verb lemma. Next, for higher frequency verbs, the remaining corpus occurrences are scrutinized to see if any low-frequency patterns have been missed. The patterns are then translated into a feature matrix used for identifying the sense of unseen instances for a particular verb.</Paragraph>
    <Paragraph position="1"> In the remainder of this section, we describe these subtasks in more detail. The following sections explain the current status of the implementation of these tasks.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Lexical Discovery
</SectionTitle>
      <Paragraph position="0"> Norms of usage are captured in what we call selection context patterns. For each lemma, contexts of usage are sorted into groups, and a stereotypical CPA pattern that captures the relevant semantic and syntactic features of the group is recorded.</Paragraph>
      <Paragraph position="1"> Many patterns have alternations, recorded in satellite CPA patterns. Alternations are linked to the main CPA pattern through the same sensemodifying mechanisms as those that allow for exploitations (coercions) of the norms of usage to be understood. For example, here is the set of patterns for the verb treat. Note that these patterns do not capture all possible uses, and other patterns may be added, e.g. if additional evidence is found in domain-speci c corpora.</Paragraph>
      <Paragraph position="2">  (1) CPA Pattern set for treat: I. [[Person 1]] treat [[Person 2]] ({at  |in} [[Location]]) (for [[Event = Injury  |Ailment]]); NO [Adv[Manner]] II. [[Person 1]] treat [[Person 2]] [Adv[Manner]] IIIa. [[Person]] treat [[TopType 1]] {{as  |like} [[TopType 2]]} IIIb. [[Person]] treat [[TopType]] {{as if  |as though  |like} [CLAUSE]} IV. [[Person 1]] treat [[Person 2]] {to [[Event]]} V. [[Person]] treat [[PhysObj  |Stuff 1]] (with [[Stuff 2]]) There may be several patterns realizing a single sense of a verb, as in (IIIa/IIIb) above. Also, there may be several equivalent alternations or there may be a stereotype. Note that alternations are di erent realizations of the same norm, not exploitations (i.e., not coercions).</Paragraph>
      <Paragraph position="3"> (2) Alternations for treat Pattern 1 :  A CPA pattern extends the traditional notion of selectional context to include a number of other contextual features, such as minor category parsing and subphrasal cues. Accurate identi cation of the semantically relevant aspects of a pattern is not an obvious and straightforward procedure, as has sometimes been assumed in the literature. For example, the presence or absence of an adverbial of manner in the third valency slot around a verb can dramatically alter the verb's meaning. Simple syntactic encoding of argument structure, for instance, is insu cient to discriminate between the two major senses of the verb treat, as illustrated below.</Paragraph>
      <Paragraph position="4">  (3) a. They say their bosses treat them with respect. b. Such patients are treated with antibiotics.</Paragraph>
      <Paragraph position="5">  The ability to recognize the shallow semantic type of a phrase in the context of a predicate is of course crucial |for example, in (3a) recognizing the PP as (a) an adverbial, and (b) an adverbial of manner, rather than an instrumental co-agent (as in (3b)), is crucial for assigning the correct sense to the verb treat above.</Paragraph>
      <Paragraph position="6"> In the CPA model, automatic identi cation of selection contexts not only captures the argument structure of a predicate, but also more delicate features, which may have a profound e ect on the semantic interpretation of a predicate in context. There are four constraint sets that contribute to the patterns for encoding selection contexts. These are:  (4) a. Shallow Syntactic Parsing: Phrase-level recognition of major categories.</Paragraph>
      <Paragraph position="7"> b. Shallow Semantic Typing: 50-100 primitive shallow types, such as Person, Institution, Event, Abstract, Artifact, Location, and so forth. These are the top types selected from the Brandeis Shallow Ontology (BSO), and are similar to entities (and some relations) employed in Named Entity Recognition tasks, such as TREC and ACE.</Paragraph>
      <Paragraph position="8"> c. Minor Syntactic Category Parsing: e.g., locatives, purpose clauses, rationale clauses, temporal adjuncts. d. Subphrasal Syntactic Cue Recognition: e.g.,  genitives, partitives, bare plural/determiner distinctions, in nitivals, negatives.</Paragraph>
      <Paragraph position="9"> The notion of a selection context pattern, as produced by a human annotator, is expressed as a BNF speci cation in Table 1.1 This speci cation relies on word order to specify argument position, and is easily translated to a template with slots allocated for each argument. Within this grammar, a semantic roles can be speci ed for each argument, but this information currently is not used in the automated processing.</Paragraph>
      <Paragraph position="10"> English contains only about 8,000 verbs, of which we estimate that about 30% have only one basic pattern. The rest are evenly split between verbs hav- null more patterns. About 20 light verbs have between 100 and 200 patterns each. This is less alarming than it sounds, because the majority of light verb patterns involve selection of just one speci c nominal head, e.g., take account, take plunge, take photograph, with few if any alternations. The pattern sets for verbs of di erent frequency groups di er in terms of the number and type of features each pattern requires, the number of patterns in a set for a given verbs, the number of alternations for each pattern, and the type of selectional preferences a ecting the verb's arguments.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Brandeis Shallow Ontology
</SectionTitle>
      <Paragraph position="0"> The Brandeis Shallow Ontology (BSO) is a shallow hierarchy of types selected for their prevalence in manually identi ed selection context patterns. At the time of writing, there are just 65 types, in terms of which patterns for the rst one hundred verbs have been analyzed. New types are added occasionally, but only when all possibilities of using existing types prove inadequate. Once the set of manually extracted patterns is su cient, the type system will be re-populated and become pattern-driven.</Paragraph>
      <Paragraph position="1"> The BSO type system allows multiple inheritance (e.g. Document v PhysObj and Document v Information. The types currently comprising the ontology are listed above. The BSO contains type assignments for 20,000 noun entries and 10,000 nominal collocation entries.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Corpus-driven Type System
</SectionTitle>
      <Paragraph position="0"> The acquisition strategy for selectional preferences for predicates proceeds as follows: (5) a. Partition the corpus occurrences of a predicate according to the selection contexts pattern grammar, distinguished by the four levels of constraints mentioned in (4). These are uninterpreted patterns for the predicate.</Paragraph>
      <Paragraph position="1"> b. Within a given pattern, promote the statistically signi cant literal types from the corpus for each argument to the predicate. This induces an interpretation of the pattern, treating the promoted literal type as the speci c binding of a shallow type from step (a) above.</Paragraph>
      <Paragraph position="2"> c. Within a given pattern, coerce all lexical heads in the same shallow type for an argument, into the promoted literal type, assigned in (b) above. This is a coercion of a lexical head to the interpretation of the promoted literal type induced from step (b) above.</Paragraph>
      <Paragraph position="3"> In a sense, (5a) can be seen as a broad multi-level partitioning of the selectional behavior for a predicate according to a richer set of syntactic and semantic discriminants. Step (5b) can be seen as capturing the norms of usage in the corpus, while step (5c) is a way of modeling the exploitation of these norms in the language (through coercion, metonymy, and other generative operations). To illustrate the way in which CPA discriminates uninterpreted patterns from the corpus, we return to the verb treat as it is used in the BNC. Although there are three basic senses for this verb, the two major senses, as illustrated in (1) above, emerge as correlated with two distinct context patterns, using the discriminant constraints mentioned in (4) above. For the full speci cation for this verb, see www.cs.brandeis.edu/~arum/cpa/treat.html.</Paragraph>
      <Paragraph position="4"> (6) a. [[Person 1]] treat [[Person 2]]; NO [Adv[Manner]] b. [[Person 1]] treat [[Person 2]] [Adv[Manner]] Given a distinct (contextual) basis on which to analyze the actual statistical distribution of the words in each argument position, we can promote statistically relevant and signi cant literal types for these positions. For example, for pattern (a) above, this induces Doctor as Person 1, and Patient as bound to Person 2. This produces the interpreted context pattern for this sense as shown below.</Paragraph>
      <Paragraph position="5"> (7) [[doctor]] treat [[patient]] Promoted literal types are corpus-derived and predicate-dependent, and are syntactic heads of phrases that occur with the greatest frequency in argument positions for a given sense pattern; they are subsequently assumed to be subtypes of the particular shallow type in the pattern. Step (5c) above then enables us to bind the other lexical heads in these positions as coerced forms of the promoted literal type. This can be seen below in the concordance sample, where therapies is interpreted as Doctor, and people and girl are interpreted as Patient.</Paragraph>
      <Paragraph position="6"> (8) a. returned with a doctor who treated the girl till an ambulance arrived.</Paragraph>
      <Paragraph position="7"> b. more than 90,000 people have been treated for cholera since the epidemic began c. nonsurgical therapies to treat the breast cancer, which may involve Model Bias The assumption within GL is that semantic types in the grammar map systematically to default syntactic templates (cf. Pustejovsky (1995)). These are termed canonical syntactic forms (CSFs). For example, the CSF for the type proposition is a tensed S. There are, however, many possible realizations (such as in nitival S and NP) for this type due to the di erent possibilities available from generative devices in a grammar, such as coercion and co-composition. The resulting set of syntactic forms associated with a particular semantic type is called a phrasal paradigm for that type. The model bias provided by GL acts to guide the interpretation of purely statistically based measures (cf. Pustejovsky (2000)).</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Automatic Recognition of Pattern Use
</SectionTitle>
      <Paragraph position="0"> Essentially, this subtask is similar to the traditional supervised WSD problem. Its purpose is (1) to test the discriminatory power of CPA-derived featureset, (2) to extend and re ne the inventory of features captured by the CPA patterns, and (3) to allow for predicate-based argument groupings by classifying unseen instances. Extension and re nement of the inventory of features should involve feature induction, but at the moment this part has not been implemented. During the lexical discovery stage, lexical sets that ll some of the argument slots in the patterns are instantiated from the training examples. As more predicate-based lexical sets within shallow types are explored, the data will permit identi cation of the types of features that unite elements in lexical sets.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Automatic Pattern Acquisition
</SectionTitle>
      <Paragraph position="0"> The algorithm for automatic pattern acquisition involves the following steps:  (9) a. Collect all constituents in a particular argument position; b. Identify syntactic alternations; c. Perform clustering on all nouns that occur in  a particular argument position of a given predicate; null d. For each cluster, measure its relatedness to the known lexical sets, obtained previously during the lexical discovery stage and extended through WSD of unseen instances. If none of the existing lexical sets pass the distance threshold, establish the cluster as a new lexical set, to be used in future pattern speci cation.</Paragraph>
      <Paragraph position="1"> Step (9d) must include extensive ltering procedures to check for shared semantic features, looking for commonality between the members. That is, there must be some threshold overlap between subgroups of the candidate lexical set and and the existing semantic classes. For instance, checking if, for a certain percentage of pairs in the candidate set, there already exists a set of which both elements are members.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Current Implementation
</SectionTitle>
    <Paragraph position="0"> The CPA patterns are developed using the British National Corpus (BNC). The sorted instances are used as a training set for the supervised disambiguation. For the disambiguation task, each pattern is translated into into a set of preprocessing-speci c features.</Paragraph>
    <Paragraph position="1"> The BNC is preprocessed using the Robust Accurate Statistical Parsing system (RASP) and semantically tagged with BSO types. The RASP system (Briscoe and Carroll (2002)) tokenizes, POS-tags, and lemmatizes text, generating a forest of full parse trees for each sentence and associating a probability with each parse. For each parse, RASP produces a set of grammatical relations, specifying the relation type, the headword, and the dependent element. All our computations are performed over the single top-ranked tree for the sentences where a full parse was successfully obtained. Some of the grammatical relations identi ed by RASP are shown in (10).</Paragraph>
    <Paragraph position="2"> (10) subjects: ncsubj, clausal (csubj, xsubj) objects: dobj, iobj, clausal complement modi ers: adverbs, modi ers of event nominals We use endocentric semantic typing, i.e., the head-word of each constituent is used to establish its semantic type. The semantic tagging strategy is similar to the one described in Pustejovsky et al. (2002). Currently, a subset of 24 BSO types is used for semantic tagging.</Paragraph>
    <Paragraph position="3"> A CPA pattern is translated into a feature set, which in the current implementation uses binary features. It is further complemented with other discriminant context features which, rather than distinguishing a particular pattern, are merely likely to occur with a given subset of patterns; that is, the features that only partially determine or co-determine a sense. In the future, these should be learned from the training set through feature induction from the training sample, but at the moment, they are added manually. The resulting feature matrix for each pattern contains features such as those in (11) below.</Paragraph>
    <Paragraph position="4"> Each pattern is translated into a template of 15-25 features.</Paragraph>
    <Paragraph position="5"> (11) Selected context features: a. obj institution: object belongs to the BSO type 'Institution' null b. subj human group: subject belongs to the BSO type 'HumanGroup' null c. mod adv ly: target verb has an adverbial modi er, with a -ly adverb d. clausal like: target verb has a clausal argument introduced by 'like' e. iobj with: target verb has an indirect object by 'with' f. obj PRP: direct object is a personal pronoun g. stem VVG: the target verb stem is an -ing form Each feature may be realized by a number of RASP relations. For instance, a feature dealing with objects would take into account RASP relations 'dobj', 'obj2', and 'ncsubj' (for passives). The features such as (11a)-(11e) are typically taken directly from the pattern speci cation, while features such as in (11f) and (11g) would typically be added as co-determining the pattern.</Paragraph>
  </Section>
class="xml-element"></Paper>