<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1908">
  <Title>Automated Induction of Sense in Context</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 CPA Methodology
</SectionTitle>
    <Paragraph position="0"> The Corpus Pattern Analysis (CPA) technique uses a semi-automatic bootstrapping process to produce a dictionary of selection contexts for predicates in a language. Word senses for verbs are distinguished through corpus-derived syntagmatic patterns mapped to Generative Lexicon Theory (Pustejovsky (1995)) as a linguistic model of interpretation, which guides and constrains the induction of senses from word distributional information. Each pattern is specified in terms of lexical sets for each argument, shallow semantic typing of these sets, and other syntagmatically relevant criteria (e.g., adverbials of manner, phrasal particles, genitives, negatives). The procedure consists of three subtasks: (1) the manual discovery of selection context patterns for specific verbs; (2) the automatic recognition of instances of the identified patterns; and (3) automatic acquisition of patterns for unanalyzed cases. Initially, a number of patterns are manually formulated by a lexicographer through corpus pattern analysis of about 500 occurrences of each verb lemma. Next, for higher-frequency verbs, the remaining corpus occurrences are scrutinized to see if any low-frequency patterns have been missed. The patterns are then translated into a feature matrix used for identifying the sense of unseen instances for a particular verb.</Paragraph>
    <Paragraph position="1"> In the remainder of this section, we describe these subtasks in more detail. The following sections explain the current status of the implementation of these tasks.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Lexical Discovery
</SectionTitle>
      <Paragraph position="0"> Norms of usage are captured in what we call selection context patterns. For each lemma, contexts of usage are sorted into groups, and a stereotypical CPA pattern that captures the relevant semantic and syntactic features of the group is recorded. For example, here is the set of common patterns for the verb treat.</Paragraph>
      <Paragraph position="1"> (1) CPA pattern set for treat:
I. [[Person 1]] treat [[Person 2]] ({at | in} [[Location]]) (for [[Event = Injury | Ailment]]); NO [Adv[Manner]]
II. [[Person 1]] treat [[Person 2]] [Adv[Manner]]
IIIa. [[Person]] treat [[TopType 1]] {{as | like} [[TopType 2]]}
IIIb. [[Person]] treat [[TopType]] {{as if | as though | like} [CLAUSE]}
IV. [[Person 1]] treat [[Person 2]] {to [[Event]]}
V. [[Person]] treat [[PhysObj | Stuff 1]] (with [[Stuff 2]])
There may be several patterns realizing a single sense of a verb, as in (IIIa/IIIb) above. Additionally, many patterns have alternations, recorded in satellite CPA patterns. Alternations are linked to the main CPA pattern through the same sense-modifying mechanisms as those that allow for coercions to be understood. However, alternations are different realizations of the same norm. For example, the following are alternations for treat, pattern (I):
[[Person 1 &lt;--&gt; Medicament | Med-Procedure | Institution]]
[[Person 2 &lt;--&gt; Injury | Ailment | Bodypart]]</Paragraph>
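      <Paragraph> A minimal sketch (not from the paper) of how such a pattern set might be encoded as data for later matching; the class name, slot fields, and constraint flag below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class CPAPattern:
    # Hypothetical encoding of one CPA pattern: shallow types per slot,
    # optional adjunct constraints, and the "NO [Adv[Manner]]" restriction.
    pattern_id: str                # e.g. "I", "IIIa"
    subj: str                      # shallow type of the subject slot
    obj: str                       # shallow type of the object slot
    adjuncts: dict = field(default_factory=dict)  # e.g. {"for": "Injury|Ailment"}
    forbid_adv_manner: bool = False

# Pattern I: [[Person 1]] treat [[Person 2]] (for [[Injury|Ailment]]); NO [Adv[Manner]]
pattern_I = CPAPattern("I", "Person", "Person",
                       adjuncts={"for": "Injury|Ailment"},
                       forbid_adv_manner=True)

# Pattern II differs from I chiefly in licensing the manner adverbial.
pattern_II = CPAPattern("II", "Person", "Person")
```

The alternation sets given for pattern (I) could be attached to such a record as satellite patterns in the same way.</Paragraph>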
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CPA Patterns
</SectionTitle>
    <Paragraph position="0"> A CPA pattern extends the traditional notion of selectional context to include a number of other contextual features, such as minor category parsing and subphrasal cues. Accurate identification of the semantically relevant aspects of a pattern is not an obvious and straightforward procedure, as has sometimes been assumed in the literature. For example, the presence or absence of an adverbial of manner in the third valency slot around a verb can dramatically alter the verb's meaning. Simple syntactic encoding of argument structure, for instance, is insufficient to discriminate between the two major senses of the verb treat, as illustrated below.</Paragraph>
    <Paragraph position="1">  (3) a. They say their bosses treat them with respect. b. Such patients are treated with antibiotics.</Paragraph>
    <Paragraph position="2">  The ability to recognize the shallow semantic type of a phrase in the context of a predicate is of course crucial: for example, in (3a), recognizing the PP as (a) an adverbial, and (b) an adverbial of manner, rather than an instrumental co-agent (as in (3b)), is essential for assigning the correct sense to the verb treat.</Paragraph>
    <Paragraph position="3"> There are four constraint sets that contribute to the patterns for encoding selection contexts. These  are: (4) a. Shallow Syntactic Parsing: Phrase-level recognition of major categories.</Paragraph>
    <Paragraph position="4"> b. Shallow Semantic Typing: 50-100 primitive shallow types, such as Person, Institution, Event, Abstract, Artifact, Location, and so forth. These are the top types selected from the Brandeis Shallow Ontology (BSO), and are similar to entities (and some relations) employed in Named Entity Recognition tasks, such as TREC and ACE.</Paragraph>
    <Paragraph position="5"> c. Minor Syntactic Category Parsing: e.g., locatives, purpose clauses, rationale clauses, temporal adjuncts. d. Subphrasal Syntactic Cue Recognition: e.g.,  genitives, partitives, bare plural/determiner distinctions, in nitivals, negatives.</Paragraph>
    <Paragraph position="6"> The notion of a selection context pattern, as produced by a human annotator, is expressed as a BNF specification in Table 1. This specification relies on word order to specify argument position, and is easily translated to a template with slots allocated for each argument. Within this grammar, semantic roles can be specified for each argument, but this information is not currently used in the automated processing.</Paragraph>
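    <Paragraph> The word-order-based translation from a pattern string to a slot template described above can be sketched as follows (illustrative only; the actual BNF grammar in Table 1 is richer):

```python
import re

def pattern_to_slots(pattern):
    """Split a CPA pattern string into an ordered slot template.

    Relies on word order, as the BNF specification does: anything in
    [[...]] is an argument slot; a bare token is the target verb.
    Hypothetical sketch, not the paper's actual grammar.
    """
    slots = []
    for bracketed, bare in re.findall(r"\[\[([^\]]+)\]\]|(\w+)", pattern):
        if bracketed:
            slots.append(("ARG", bracketed.strip()))
        else:
            slots.append(("VERB", bare))
    return slots

template = pattern_to_slots("[[Person 1]] treat [[Person 2]]")
# template == [("ARG", "Person 1"), ("VERB", "treat"), ("ARG", "Person 2")]
```
</Paragraph>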
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Brandeis Shallow Ontology
</SectionTitle>
      <Paragraph position="0"> The Brandeis Shallow Ontology (BSO) is a shallow hierarchy of types selected for their prevalence in manually identified selection context patterns. At the time of writing, there are just 65 types, in terms of which patterns for the first one hundred verbs have been analyzed. New types are added occasionally, but only when all possibilities of using existing types prove inadequate. Once the set of manually extracted patterns is sufficient, the type system will be re-populated and become pattern-driven.</Paragraph>
      <Paragraph position="1"> The BSO type system allows multiple inheritance (e.g., Document is a subtype of both PhysObj and Information). The types currently comprising the ontology are listed below. The BSO contains type assignments for 20,000 noun entries and 10,000 nominal collocation entries.</Paragraph>
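      <Paragraph> Multiple inheritance of the kind described (Document under both PhysObj and Information) can be sketched with a transitive subtype check; the parent table below is a hypothetical fragment, not the full BSO:

```python
# Hypothetical fragment of a shallow type hierarchy with multiple
# inheritance: each type maps to the set of its immediate supertypes.
PARENTS = {
    "Document": {"PhysObj", "Information"},
    "PhysObj": {"Top"},
    "Information": {"Abstract"},
    "Abstract": {"Top"},
}

def is_subtype(t, ancestor):
    """True if `t` is (transitively) a subtype of `ancestor`."""
    if t == ancestor:
        return True
    return any(is_subtype(p, ancestor) for p in PARENTS.get(t, ()))
```

With this table, Document is a subtype of PhysObj directly and of Abstract via Information, while PhysObj is not a subtype of Information.</Paragraph>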
      <Paragraph position="2">  The acquisition strategy for selectional preferences for predicates proceeds as follows: (5) a. Partition the corpus occurrences of a predicate according to the selection context pattern grammar, distinguished by the four levels of constraints mentioned in (4). These are uninterpreted patterns for the predicate.</Paragraph>
      <Paragraph position="3"> b. Within a given pattern, promote the statistically significant literal types from the corpus for each argument to the predicate. This induces an interpretation of the pattern, treating the promoted literal type as the specific binding of a shallow type from step (a) above.</Paragraph>
      <Paragraph position="4"> c. Within a given pattern, coerce all lexical heads within the same shallow type for an argument into the promoted literal type assigned in (b) above. This is a coercion of a lexical head to the interpretation of the promoted literal type induced from step (b) above.</Paragraph>
      <Paragraph position="5"> In a sense, (5a) can be seen as a broad multi-level partitioning of the selectional behavior for a predicate according to a richer set of syntactic and semantic discriminants. Step (5b) can be seen as capturing the norms of usage in the corpus, while step (5c) is a way of modeling the exploitation of these norms in the language (through coercion, metonymy, and other generative operations). To illustrate the way in which CPA discriminates uninterpreted patterns from the corpus, we return to the verb treat as it is used in the BNC. Two of its major senses, as listed in (1), emerge as correlated with two distinct context patterns, using the discriminant constraints mentioned in (4) above.</Paragraph>
      <Paragraph position="6">  (6) a. [[Person 1]] treat [[Person 2]]; NO [Adv[Manner]] b. [[Person 1]] treat [[Person 2]] [Adv[Manner]]  Given a distinct (contextual) basis on which to analyze the actual statistical distribution of the words in each argument position, we can promote statistically relevant and signi cant literal types for these positions. For example, for pattern (a) above, this induces Doctor as Person 1, and Patient as bound to Person 2. This produces the interpreted context pattern for this sense as shown below.</Paragraph>
      <Paragraph position="7"> (7) [[doctor]] treat [[patient]] Promoted literal types are corpus-derived and predicate-dependent, and are syntactic heads of phrases that occur with the greatest frequency in argument positions for a given sense pattern; they are subsequently assumed to be subtypes of the particular shallow type in the pattern. Step (5c) above then enables us to bind the other lexical heads in these positions as coerced forms of the promoted literal type. This can be seen below in the concordance sample, where therapies is interpreted as Doctor, and people and girl are interpreted as Patient.</Paragraph>
      <Paragraph position="8"> (8) a. a doctor who treated the girl till an ambulance arrived.
b. over 90,000 people have been treated for cholera
c. nonsurgical therapies to treat the breast cancer, which
Model Bias. The assumption within GL is that semantic types in the grammar map systematically to default syntactic templates (cf. Pustejovsky (1995)). These are termed canonical syntactic forms (CSFs). For example, the CSF for the type proposition is a tensed S. There are, however, many possible realizations (such as infinitival S and NP) for this type, due to the different possibilities available from generative devices in a grammar, such as coercion and co-composition. The resulting set of syntactic forms associated with a particular semantic type is called a phrasal paradigm for that type. The model bias provided by GL acts to guide the interpretation of purely statistically based measures.</Paragraph>
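      <Paragraph> Step (5b), promoting the most frequent lexical head in each argument slot to a literal type, can be sketched as follows (the function name and toy instance counts are hypothetical):

```python
from collections import Counter

def promote_literal_types(instances):
    """Promote the most frequent head lemma per argument slot.

    instances: list of dicts mapping slot name -> observed head lemma,
    all drawn from occurrences of one uninterpreted pattern.
    Illustrative sketch of step (5b); not the paper's implementation.
    """
    by_slot = {}
    for inst in instances:
        for slot, head in inst.items():
            by_slot.setdefault(slot, Counter())[head] += 1
    return {slot: counts.most_common(1)[0][0]
            for slot, counts in by_slot.items()}

# Toy data echoing the treat examples: doctor/patient dominate, so they
# are promoted; remaining heads (therapy, girl) would later be coerced
# to the promoted types in step (5c).
sample = [
    {"subj": "doctor", "obj": "patient"},
    {"subj": "doctor", "obj": "girl"},
    {"subj": "therapy", "obj": "patient"},
]
promoted = promote_literal_types(sample)
# promoted == {"subj": "doctor", "obj": "patient"}
```
</Paragraph>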
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Automatic Recognition of Pattern Use
</SectionTitle>
      <Paragraph position="0"> Essentially, this subtask is similar to the traditional supervised WSD problem. Its purpose is (1) to test the discriminatory power of the CPA-derived feature set, (2) to extend and refine the inventory of features captured by the CPA patterns, and (3) to allow for predicate-based argument groupings by classifying unseen instances. Extension and refinement of the inventory of features should involve feature induction, but at the moment this part has not been implemented. During the lexical discovery stage, lexical sets that fill some of the argument slots in the patterns are instantiated from the training examples. As more predicate-based lexical sets within shallow types are explored, the data will permit identification of the types of features that unite elements in lexical sets.</Paragraph>
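      <Paragraph> The classification of unseen instances can be sketched as scoring an instance's binary feature vector against each pattern's feature template; the paper does not commit to a particular learning method, so the scoring rule and feature names below are assumptions:

```python
def classify(instance, pattern_templates):
    """Pick the pattern whose feature template best matches the instance.

    instance: set of active binary features for an unseen verb occurrence.
    pattern_templates: dict of pattern id -> set of expected features.
    Score rewards matched features and penalizes expected-but-absent ones.
    Hypothetical sketch, not the paper's actual classifier.
    """
    def score(template):
        return len(instance & template) - len(template - instance)
    return max(pattern_templates,
               key=lambda pid: score(pattern_templates[pid]))

# Toy templates for the two major senses of treat (cf. patterns I/II).
templates = {
    "I":  {"obj_person", "iobj_for"},
    "II": {"obj_person", "mod_adv_ly"},
}
sense = classify({"obj_person", "mod_adv_ly"}, templates)
# sense == "II"
```
</Paragraph>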
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Automatic Pattern Acquisition
</SectionTitle>
      <Paragraph position="0"> The algorithm for automatic pattern acquisition involves the following steps:
(9) a. Collect all constituents in a particular argument position;
b. Identify syntactic alternations;
c. Perform clustering on all nouns that occur in a particular argument position of a given predicate;
d. For each cluster, measure its relatedness to the known lexical sets, obtained previously during the lexical discovery stage and extended through WSD of unseen instances. If none of the existing lexical sets pass the distance threshold, establish the cluster as a new lexical set, to be used in future pattern specification.</Paragraph>
      <Paragraph position="1"> Step (9d) must include extensive filtering procedures to check for shared semantic features, looking for commonality between the members. That is, there must be some threshold overlap between subgroups of the candidate lexical set and the existing semantic classes. For instance, one such check is whether, for a certain percentage of pairs in the candidate set, there already exists a set of which both elements are members.</Paragraph>
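      <Paragraph> The relatedness check in step (9d) can be sketched with a Jaccard-style overlap measure; the threshold value, function name, and set contents below are illustrative assumptions:

```python
def match_or_create(cluster, lexical_sets, threshold=0.5):
    """Relate a noun cluster to known lexical sets by member overlap.

    Returns the name of the best-matching known set if its Jaccard
    overlap with the cluster passes the threshold; otherwise signals
    that the cluster should become a new lexical set.
    Hypothetical sketch of step (9d), not the paper's full filtering.
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    best = max(lexical_sets,
               key=lambda name: jaccard(cluster, lexical_sets[name]),
               default=None)
    if best is not None and jaccard(cluster, lexical_sets[best]) >= threshold:
        return best
    return "NEW_SET"

# A toy known lexical set; cluster overlap 3/4 passes the threshold.
known = {"Doctor": {"doctor", "nurse", "surgeon", "medic"}}
```

A cluster such as {doctor, nurse, surgeon} would be absorbed into the existing set, while an unrelated cluster such as {court, tribunal} would found a new one.</Paragraph>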
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Current Implementation
</SectionTitle>
    <Paragraph position="0"> The CPA patterns are developed using the British National Corpus (BNC). The sorted instances are used as a training set for the supervised disambiguation. For the disambiguation task, each pattern is translated into a set of preprocessing-specific features.</Paragraph>
    <Paragraph position="1"> The BNC is preprocessed with the RASP parser and semantically tagged with BSO types. The RASP system (Briscoe and Carroll (2002)) generates full parse trees for each sentence, assigning a probability to each parse. It also produces a set of grammatical relations for each parse, specifying the relation type, the headword, and the dependent element. All our computations are performed over the single top-ranked tree for the sentences where a full parse was successfully obtained. Some of the RASP grammatical relations are shown in (10).</Paragraph>
    <Paragraph position="2"> (10) subjects: ncsubj, clausal (csubj, xsubj)
objects: dobj, iobj, clausal complement
modifiers: adverbs, modifiers of event nominals
We use endocentric semantic typing, i.e., the headword of each constituent is used to establish its semantic type. The semantic tagging strategy is similar to the one described in Pustejovsky et al. (2002), and currently uses a subset of 24 BSO types.</Paragraph>
    <Paragraph position="3"> A CPA pattern is translated into a feature set, currently using binary features. It is further complemented with other discriminant context features which, rather than distinguishing a particular pattern, are merely likely to occur with a given subset of patterns; that is, features that only partially determine or co-determine a sense. In the future, these should be learned from the training set through feature induction, but at the moment they are added manually. The resulting feature matrix for each pattern contains features such as those in (11) below. Each pattern is translated into a template of 15-25 features.</Paragraph>
    <Paragraph position="4"> (11) Selected context features:
a. obj institution: object belongs to the BSO type 'Institution'
b. subj human group: subject belongs to the BSO type 'HumanGroup'
c. mod adv ly: target verb has an adverbial modifier with a -ly adverb
d. clausal like: target verb has a clausal argument introduced by 'like'
e. iobj with: target verb has an indirect object introduced by 'with'
f. obj PRP: direct object is a personal pronoun
g. stem VVG: the target verb stem is an -ing form
Each feature may be realized by a number of RASP relations. For instance, a feature dealing with objects would take into account the RASP relations 'dobj', 'obj2', and 'ncsubj' (for passives).</Paragraph>
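    <Paragraph> The collapsing of several RASP relations into a single object feature, as described above, can be sketched as follows; the tuple representation of grammatical relations is a simplifying assumption:

```python
# 'ncsubj' is included because in passives the surface subject is the
# logical object of the verb; this mirrors the mapping described in the
# text. Relations are simplified to (relation, head, dependent) tuples.
OBJECT_RELATIONS = {"dobj", "obj2", "ncsubj"}

def extract_object_heads(relations, verb):
    """Return dependents standing in an object-like relation to `verb`."""
    return [dep for (rel, head, dep) in relations
            if head == verb and rel in OBJECT_RELATIONS]

# Toy relation output for "patients are treated ... to treat cancer".
rels = [("ncsubj", "treat", "patients"),  # passive subject = logical object
        ("iobj", "treat", "with"),
        ("dobj", "treat", "cancer")]
heads = extract_object_heads(rels, "treat")
# heads == ["patients", "cancer"]
```
</Paragraph>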
  </Section>
</Paper>