<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0632">
  <Title>Using Subcategorization to Resolve Verb Class Ambiguity</Title>
  <Section position="4" start_page="266" end_page="267" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> INSTRUMENT OF COMMUNICATION verb and in both cases can take the frame NP-V-NP-NP. In sentence (5c) the preferred reading is that of &quot;get&quot; instead of &quot;instrument of communication&quot; (cf. sentence (5d)).</Paragraph>
    <Paragraph position="1">  (5) a. A solicitor wrote him a letter at the airport. null b. I want you to write me a screenplay called &amp;quot;The Trip&amp;quot;.</Paragraph>
    <Paragraph position="2"> 1 Unless stated otherwise the example sentences were taken from the BNC and simplified for clarification purposes.  c. I'll phone you a taxi.</Paragraph>
    <Paragraph position="3"> d. As I entered the room I wished I'd thought of phoning a desperate SOS to James.</Paragraph>
    <Paragraph position="4"> The objective of this paper is to address the verb class disambiguation problem by developing a probabilistic framework which combines linguistic knowledge (i.e., Levin's classification) and frame frequencies acquired from the BNC. Our initial experiments focus on the syntactic frames characteristic for the dative and benefactive alternations (cf. examples (2) and (3)). These frames are licensed by a fairly large number of classes: 19 classes license the double object frame, 22 license the NP-V-NP-PPto frame and 14 classes license the NP-V-NP PPfi~r frame. The semantic and syntactic properties of these alternations have been extensively studied and are well understood (see Levin (1993) and the references therein). Furthermore, they are fairly productive and one would expect them to be well represented in a large corpus.</Paragraph>
    <Paragraph position="5"> In section 3 we describe the statistical model and the estimation of the various model parameters, section 4 presents some preliminary results and section 5 contains some discussion and concluding remarks. null</Paragraph>
  </Section>
  <Section position="5" start_page="267" end_page="269" type="metho">
    <SectionTitle>
3 The Model
</SectionTitle>
    <Paragraph position="0"> We view the choice of a class for a polysemous verb in a given frame as the joint probability P(verb,frame, class) which we rewrite using the chain rule in (6).</Paragraph>
    <Paragraph position="2"> e (frame lverb) P (class I verb, frame) We also make the following independence assumption: null</Paragraph>
    <Paragraph position="4"> The independence assumption reflects Levin's hypothesis that the argument structure of a given verb is a direct reflection of its meaning. Accordingly we assume that the semantic class determines the argument structure of its members without making reference to the individual verbs. By applying Bayes Law we write P(classlframe) as:  (8) P(class\[frame)= P (frame lclass) P (class) P (frame ) By substituting (7) and (8) into (6), P(verb, class,frame) can be written as: (9) P(verb,frame, class)</Paragraph>
    <Paragraph position="6"> We estimate the probabilities P(verb),</Paragraph>
    <Paragraph position="8"> It is easy to obtain f(verb) from the lemmatized BNC. For the experiments reported here, syntactic frames for the dative and benefactive alternations were automatically extracted from the BNC using Gsearch (Keller et al., 1999), a tool which facilitates search of arbitrary POS-tagged corpora for shallow syntactic patterns based on a user-specified context-free grammar and a syntactic query. The acquisition and filtering process is detailed in Lapata (1999).</Paragraph>
    <Paragraph position="9"> We rely on Gsearch to provide moderately accurate information about verb frames in the same way that Hindle and Rooth (1993) relied on Fidditch to provide moderately accurate information about syntactic structure, and Ratnaparkhi (1998) relied on simple heuristics defined over part-of-speech tags to deliver information nearly as useful as that provided by Fidditch. We estimated f(verb,frame) as the number of times a verb co-occurred with a particular frame in the corpus.</Paragraph>
    <Paragraph position="10"> We cannot read off P(frame\[class) from the corpus, because it is not annotated with verb classes.</Paragraph>
    <Paragraph position="11"> Nevertheless we can use the information listed in Levin with respect to the syntactic frames exhibited by the verbs of a given class. For each class  frames we recorded the syntactic frames it licenses (cf. table 2). Levin's description of the argument structure of various verbs goes beyond the simple listing of their subcategofization. Useful information is provided about the thematic roles of verbal arguments and their interpretation. Consider the examples in (15): in (15a) the verb present is a member of the FULFILLING class and its theme is expressed by the prepositional phrase with an award, in (15b) the PP headed by with receives a locative interpretation and the verb load inhabits the SPRAY/LOAD class, whereas in (15c) the prepositional phrase is instrumental and hit inhabits the HIT class. None of the information concerning thematic roles was retained.</Paragraph>
    <Paragraph position="12"> All three classes (FULFILLING, SPRAY/LOAD and HIT) were assigned the frame NP-V-NP-PPwith'.</Paragraph>
    <Paragraph position="13">  (15) a.</Paragraph>
    <Paragraph position="14"> b.</Paragraph>
    <Paragraph position="15"> C.</Paragraph>
    <Paragraph position="16">  John presented the student with an award.</Paragraph>
    <Paragraph position="17"> John loaded the truck with bricks.</Paragraph>
    <Paragraph position="18"> John hit the wall with a hammer.</Paragraph>
    <Paragraph position="19"> Because we didn't have corpus counts for the quantity f(class,frame) we simply assumed that all frames for a given class are equally likely. This means, for instance, that the estimate for P(NP-V-NP-NPtolGIvE) is 1/2 and similarly the estimate for P(NP-VIPERFORMANCE ) is ~ (cf. table 2). This is clearly a simplification, since one would expect f(class,frame) to be different for different corpora, and to vary with respect to class size and the frequency of class members.</Paragraph>
    <Paragraph position="20"> In order to estimate P(class) we first estimate f(class) which we rewrite as follows:</Paragraph>
    <Paragraph position="22"> pass The estimate of f(verb, class) for monosemous verbs reduces to the count of the verb in the corpus. Once again we cannot estimate f(verb, class) for polysemous verbs directly. All we have is the overall frequency of a given verb in the BNC and the number of classes it is a member of according to Levin. We rewrite f(verb, class) as: (17) f (verb, class) = f (verb)p(classlverb) We approximate p(classlverb) by collapsing across all verbs that have the appropriate pattern of ambiguity: null (18) f (verb, class) ~ f (verb)p(classlamb_class) Here amb_class, the ambiguity class of a verb, is the set of classes that it might inhabit. 2 We collapse verbs into ambiguity classes in order to reduce the number of parameters which must be estimated: we certainly lose information, but the approximation makes it easier to get reliable estimates from limited data. In future work we plan to use the EM algorithm (Dempster et al., 1977) to uncover the hidden class, but for the present study, we simply approximate p(classlamb_class) using a heuristic based on class size:</Paragraph>
    <Paragraph position="24"> For each class we recorded the number of its members after discarding verbs whose frequency was less than 1 per 1M in the BNC. This gave us a first approximation of the size of each class. We then computed, for each polysemous verb, the total size of the classes of which it was a member. We calculated p(classlamb_class) by dividing the former by the latter (cf. equation (19)). We obtained an estimate for the class frequency f(class) by multiplying p(classlamb_class) by the observed frequency of the verb in the BNC (cf. equation (18)).</Paragraph>
    <Paragraph position="25"> 2Our use of ambiguity classes is inspired by a similar use in HMM based part-of-speech tagging (Kupiec, 1992).</Paragraph>
    <Paragraph position="27"> Ten most frequent frames in Levin As an example consider the verb pass which has the classes THROW, SEND, GIVE and MARRY. The respective p(classlamb_class) for these classes are 27 20 15 and l0 By multiplying these by the fre- 72' 72' 72 ~&amp;quot; quency of pass in the BNC (19,559) we obtain the estimates for f(verb, class) given in table 3.</Paragraph>
    <Paragraph position="28"> Note that simply relying on class size, without regard to frequency, would give quite different results. For example the class of MANNER OF SPEAKING verbs has 76 members, of which 30 have frequencies which are less than 1 per 1M, and is the seventh largest class in Levin's classification. According to our estimation scheme MANNER OF SPEAKING verbs are the 116th largest class. The estimates for the ten most frequent classes are shown in figure 2.</Paragraph>
    <Paragraph position="29"> The estimation process described above involves at least one gross simplification, since p(classlamb_class) is calculated without reference to the identity of the verb in question. For any two verbs which fall into the same set of classes p(classlamb_class) will be the same, even though one or both may be atypical in its distribution across the classes. Furthermore, the estimation tends to favour large classes, again irrespectively of the identity of the verb in question. For example the verb carry has three classes, CARRY, FIT and COST. Intuitively speaking, the CARRY class is the most frequent (e.g., Smoking can impair the blood which carries oxygen to the brain, I carry sugar lumps around with me). However, since the FIT class (e.g., Thameslink presently carr/es 20,000 passengers daily) is larger than the CARRY class, it will be given a higher probability (0.45 versus 0.4). This is clearly wrong, but it is an empirical question how much it matters.</Paragraph>
    <Paragraph position="30"> Finally, we wanted to estimate the probability of a given frame, P(frame). We could have done this by acquiring Levin compatible subcategorization frames from the BNC. Techniques for the automatic acquisition of subcategofization dictionaries have been developed by Manning (1993), Bfiscoe and Carroll (1997) and Carroll and Rooth (1998).</Paragraph>
    <Paragraph position="31"> But the present study was less ambitious, and narrowly focused on the frames representing the dative and the benefactive alternation. In default of the more ambitious study, which we plan for the future, the estimation of P(frame) was carried out on types and not on tokens. The mapping of Levin's linguistic specifications into surface syntactic information resulted in 79 different frame types. By counting the number of times a given frame is licensed by several semantic classes we get a distribution of frames, a sample of which is shown in figure 3.</Paragraph>
    <Paragraph position="32"> The probabilities P(frmnelclass) and P(framelverb) will be unreliable when the frequency estimates for f(verb,frame) and f(class,frame) are small, and ill-defined when the frequency estimates are zero. Following Hindle and Rooth (1993) we smooth the observed frequencies in the following way, where</Paragraph>
    <Paragraph position="34"> When f(verb,frame) is zero, the estimate used is proportional to the average f(V.frame) f(v) across all verbs. Similarly, when f(class,frame) is zero, our estimate is proportional to the average f(c.l'~ame) f(C) across all classes. We don't claim that this scheme is perfect, but any deficiencies it may have are almost certainly masked by the effects of approximations and simplifications elsewhere in the system.</Paragraph>
  </Section>
class="xml-element"></Paper>