<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-4004">
  <Title>Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences</Title>
  <Section position="4" start_page="640" end_page="644" type="intro">
    <SectionTitle>
3. Methodology
</SectionTitle>
    <Paragraph position="0"> We acquire selectional preferences from automatically preprocessed and parsed text during a training phase. The parser is applied to the test data as well in the run-time phase to identify grammatical relations among nouns, verbs, and adjectives. The acquired selectional preferences are then applied to the noun-verb and noun-adjective pairs in these grammatical constructions for disambiguation.</Paragraph>
    <Paragraph position="1">  at run time.</Paragraph>
    <Paragraph position="2"> The overall structure of the system is illustrated in Figure 1. We describe the individual components in sections 3.1-3.3 and 4.</Paragraph>
    <Section position="1" start_page="641" end_page="641" type="sub_section">
      <SectionTitle>
3.1 Preprocessing
</SectionTitle>
      <Paragraph position="0"> The preprocessor consists of three modules applied in sequence: a tokenizer, a part-of-speech (POS) tagger, and a lemmatizer.</Paragraph>
      <Paragraph position="1"> The tokenizer comprises a small set of manually developed finite-state rules for identifying word and sentence boundaries. The tagger (Elworthy 1994) uses a bigram hidden Markov model augmented with a statistical unknown word guesser. When applied to the training data for selectional preference acquisition, it produces the single highest-ranked POS tag for each word. In the run-time phase, it returns multiple tag hypotheses, each with an associated forward-backward probability to reduce the impact of tagging errors. The lemmatizer (Minnen, Carroll, and Pearce 2001) reduces inflected verbs and nouns to their base forms. It uses a set of finite-state rules expressing morphological regularities and subregularities, together with a list of exceptions for specific (irregular) word forms.</Paragraph>
    </Section>
    <Section position="2" start_page="641" end_page="642" type="sub_section">
      <SectionTitle>
3.2 Parsing
</SectionTitle>
      <Paragraph position="0"> The parser uses a wide-coverage unification-based shallow grammar of English POS tags and punctuation (Briscoe and Carroll 1995) and performs disambiguation using a context-sensitive probabilistic model (Briscoe and Carroll 1993), recovering from extra-grammaticality by returning partial parses. The output of the parser is a set of grammatical relations (Carroll, Briscoe, and Sanfilippo 1998) specifying the syntactic dependency between each head and its dependent(s), taken from the phrase structure tree that is returned from the disambiguation phase.</Paragraph>
      <Paragraph position="1"> For selectional preference acquisition we applied the analysis system to the 90 million words of the written portion of the British National Corpus (BNC); the parser produced complete analyses for around 60% of the sentences and partial analyses for over 95% of the remainder. Both in the acquisition phase and at run time, we extract from the analyser output subject-verb, verb-direct object, and noun-adjective  Computational Linguistics Volume 29, Number 4 modifier dependencies.</Paragraph>
      <Paragraph position="2">  We did not use the SENSEVAL-2 Penn Treebank-style bracketings supplied for the test data.</Paragraph>
    </Section>
    <Section position="3" start_page="642" end_page="644" type="sub_section">
      <SectionTitle>
3.3 Selectional Preference Acquisition
</SectionTitle>
      <Paragraph position="0"> The preferences are acquired for grammatical relations (subject, direct object, and adjective-noun) involving nouns and grammatically related adjectives or verbs. We use WordNet synsets to define our sense inventory. Our method exploits the hyponym links given for nouns (e.g., cheese is a hyponym of food), the troponym links for verbs  (e.g., limp is a troponym of walk), and the &amp;quot;similar-to&amp;quot; relationship given for adjectives (e.g., one sense of cheap is similar to flimsy).</Paragraph>
      <Paragraph position="1"> The preference models are modifications of the tree cut models (TCMs) originally proposed by Li and Abe (1995, 1998). The main differences between that work and ours are that we acquire adjective as well as verb models, and also that our models are with respect to verb and adjective classes, rather than forms. We acquire models for classes because we are using the models for WSD, whereas Li and Abe used them for structural disambiguation.</Paragraph>
      <Paragraph position="2"> We define a TCM as follows. Let NC be the set of noun synsets (noun classes) in WordNet: NC = {nc [?] WordNet}, and NS be the set of noun senses</Paragraph>
      <Paragraph position="4"> We use G to refer to such a set of classes in a TCM. A TCM is defined by G and a probability distribution:</Paragraph>
      <Paragraph position="6"> The probability distribution is conditioned by the grammatical context. In this work, the probability distribution associated with a TCM is conditioned on a verb class (vc) and either the subject or direct-object relation, or an adjective class (ac) and the adjective-noun relation. Let VC be the set of verb synsets (verb classes) in WordNet: VC = {vc [?] WordNet}. Let AC be the set of adjective classes (which subsume WordNet synsets; we elaborate further on this subsequently). Thus, the TCMs define a probability distribution over NS that is conditioned on a verb class (vc) or adjective class (ac) and a particular grammatical relation (gr): summationdisplay nc[?]G p(nc|vc, gr)=1 (9) Acquisition of a TCM for a given vc and gr proceeds as follows. The data for acquiring the preference are obtained from a subset of the tuples involving verbs in the synset or troponym (subordinate) synsets. Not all verbs that are troponyms or direct members of the synset are used in training. We take the noun argument heads occurring with verbs that have no more than 10 senses in WordNet and a frequency of 20 or more occurrences in the BNC data in the specified grammatical relationship. The threshold of 10 senses removes some highly polysemous verbs having many sense distinctions that are rather subtle. Verbs that have more than 10 senses  Verbs not in WordNet by BNC frequency.</Paragraph>
      <Paragraph position="7"> arguments. The frequency threshold of 20 is intended to remove noisy data. We set the threshold by examining a plot of BNC frequency and the percentage of verbs at particular frequencies that are not listed in WordNet (Figure 2). Using 20 as a threshold for the subject slot results in only 5% verbs that are not found in WordNet, whereas 73% of verbs with fewer than 20 BNC occurrences are not present in WordNet.</Paragraph>
      <Paragraph position="8">  The selectional-preference models for adjective-noun relations are conditioned on an ac. Each ac comprises a group of adjective WordNet synsets linked by the &amp;quot;similar-to&amp;quot; relation. These groups are formed such that they partition all adjective synsets. Thus AC = {ac [?] WordNet adjective synsets linked by similar-to}. For example, Figure 3 shows the adjective classes that include the adjective fundamental and that are formed in this way.</Paragraph>
      <Paragraph position="9">  For selectional-preference models conditioned on adjective classes, we use only those adjectives that have 10 synsets or less in WordNet and have 20 or more occurrences in the BNC.</Paragraph>
      <Paragraph position="10"> The set of ncsinG are selected from all the possibilities in the hyponym hierarchy according to the minimum description length (MDL) principle (Rissanen 1978) as used by Li and Abe (1995, 1998). MDL finds the best TCM by considering the cost (in bits) of describing both the model and the argument head data encoded in the model. The cost (or description length) for a TCM is calculated according to equation (10). The number of parameters of the model is given by k, which is the number of ncsinG minus one. N is the sample of the argument head data. The cost of describing each noun argument head (n) is calculated by the log of the probability estimate for that noun: description length = model description length + data description length</Paragraph>
      <Paragraph position="12"> Adjective classes that include fundamental.</Paragraph>
      <Paragraph position="13"> The probability estimate for each n is obtained using the estimates for all the nss that n has. Let C n be the set of ncs that include n as a direct member: C n = {nc [?] NC|n [?] nc}. Let nc prime be a hypernym of nc on G (i.e. nc</Paragraph>
      <Paragraph position="15"> (i.e., the set of nouns senses at and beneath nc prime in the hyponym hierarchy). Then the estimate p(n) is obtained using the estimates for the hypernym classes on G for all the  prime to give the estimate for each p(ns) under that nc prime .</Paragraph>
      <Paragraph position="16"> The probability estimates for the {nc [?] G} ( p(nc|vc, gr) or p(nc|ac, gr)) are obtained from the tuples from the data of nouns co-occurring with verbs (or adjectives) belonging to the conditioning vc (or ac) in the specified grammatical relationship (&lt; n, v, gr &gt;). The frequency credit for a tuple is divided by |C  freq(nc|vc, gr)(13) This ensures that the total frequency credit at any G across the hyponym hierarchy equals the credit for the conditioning vc. This will be the sum of the frequency credit for all verbs that are direct members or troponyms of the vc, divided by the number of other senses of each of these verbs:  TCMs for the direct-object slot of two verb classes that include the verb seize. To ensure that the TCM covers all NS in WordNet, we modify Li and Abe's original scheme by creating hyponym leaf classes below all WordNet's internal classes in the hyponym hierarchy. Each leaf holds the ns previously held at the internal class. Figure 4 shows portions of two TCMs. The TCMs are similar, as they both contain the verb seize, but the TCM for the class that includes clutch has a higher probability for the entity noun class compared to the class that also includes assume and usurp. This example includes only top-level WordNet classes, although the TCM may use more specific noun classes.</Paragraph>
    </Section>
  </Section>
</Paper>