<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0632"> <Title>Using Subcategorization to Resolve Verb Class Ambiguity</Title> <Section position="6" start_page="269" end_page="271" type="evalu"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> We evaluated the performance of the model on all verbs listed in Levin which are polysemous and take frames characteristic of the dative and benefactive alternations. This resulted in 154 verbs which take the NP-V-NP-NP frame, 135 verbs which take the NP-V-NP-PP_to frame and 84 verbs which take the NP-V-NP-PP_for frame. The verbs were all polysemous and had an average of 3.8 classes. Each class had an average of 3.4 frames. Furthermore, we divided these verbs into two categories: verbs which can be disambiguated solely on the basis of their frame (e.g., serve; category A) and verbs which are genuinely ambiguous, i.e., they inhabit a single frame and yet can be members of more than one semantic class (e.g., write; category B).</Paragraph> <Paragraph position="1"> The task was the following: given that we know the frame of a given verb, can we predict its semantic class? In other words, by varying the class in the term P(verb, frame, class) we are trying to see whether the class which maximizes it is the one predicted by the lexical semantics and the argument structure of the verb in question.</Paragraph> <Paragraph position="2"> For the verbs belonging to category A (306 in total) we used Levin's own classification in the evaluation. The model's performance was considered correct if it agreed with Levin in assigning a verb the appropriate class given a particular frame. For class-ambiguous verbs (category B) we compared the model's predictions against manually annotated data. Given the restriction that these verbs are semantically ambiguous in a specific syntactic frame, we could not simply sample from the entire BNC, since this would decrease the chances of finding the verb in the frame we are interested in. 
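The prediction task amounts to an argmax over candidate classes. A minimal Python sketch of this decision rule follows; the probability table, class names, and numbers are hypothetical illustrations, not estimates from the paper.

```python
# Sketch of the disambiguation rule: choose the semantic class that
# maximizes the joint probability P(verb, frame, class).
# The table below is a hypothetical stand-in for the estimates
# derived in the paper's previous section; class names are illustrative.
P_JOINT = {
    ("serve", "NP-V-NP-NP", "GIVE"): 0.012,
    ("serve", "NP-V-NP-NP", "FULFILLING"): 0.003,
}

def predict_class(verb, frame, classes):
    """Return the class c maximizing P(verb, frame, c); unseen triples get 0."""
    return max(classes, key=lambda c: P_JOINT.get((verb, frame, c), 0.0))

print(predict_class("serve", "NP-V-NP-NP", ["GIVE", "FULFILLING"]))
```

With the toy table above, the frame NP-V-NP-NP is enough to pick one class for serve, mirroring how category A verbs are disambiguated by frame alone.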
Instead, for 31 class-ambiguous verbs we randomly selected approximately 100 tokens from the data used for the acquisition of frame frequencies for the dative and benefactive alternations. Verbs with frame frequency less than 100 were not used in the evaluation.</Paragraph> <Paragraph position="3"> The selected tokens were annotated with class information by two judges. The judges were given annotation guidelines but no prior training. We measured the judges' agreement on the annotation task using the Kappa coefficient (Siegel and Castellan, 1988), which is the ratio of the proportion of times, P(A), that k raters agree to the proportion of times, P(E), that we would expect the raters to agree by chance (cf. (22)). If there is complete agreement among the raters, then K = 1, whereas if there is no agreement among the raters (other than the agreement which would be expected to occur by chance), then K = 0.</Paragraph> <Paragraph position="4"> (22) K = (P(A) - P(E)) / (1 - P(E)) </Paragraph> <Paragraph position="5"> We counted the performance of our model as correct if it agreed with the &quot;most preferred&quot;, i.e., most frequent verb class as determined in the manually annotated corpus sample by taking the average of the responses of both judges.</Paragraph> <Paragraph position="6"> We also compared the results for both categories to a naive baseline which relies only on class information and does not take subcategorization into account. For a given polysemous verb, the baseline was computed by defaulting to its most frequent class, where class frequency was determined by the estimation procedure described in the previous section.</Paragraph> As shown in table 4, in all cases our model outperforms the baseline. It achieves a combined precision of 91.8% for category A verbs. 
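The agreement measure above can be computed directly from two judges' label sequences. A minimal sketch, assuming two raters and per-rater marginal proportions for chance agreement (Siegel and Castellan's variant pools the marginals across raters); the label values are hypothetical.

```python
from collections import Counter

def kappa(labels_a, labels_b):
    """Two-rater Kappa: K = (P(A) - P(E)) / (1 - P(E))."""
    n = len(labels_a)
    # Observed agreement P(A): fraction of tokens both judges label identically.
    p_a = sum(1 for x, y in zip(labels_a, labels_b) if x == y) / n
    # Chance agreement P(E): sum over categories of the product of each
    # judge's marginal proportion for that category.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in count_a)
    return (p_a - p_e) / (1 - p_e)

# Illustrative annotations for four corpus tokens (labels are made up).
a = ["GIVE", "GIVE", "OTHER", "GIVE"]
b = ["GIVE", "OTHER", "OTHER", "GIVE"]
print(round(kappa(a, b), 2))
```

Here P(A) = 0.75 and P(E) = 0.5, giving K = 0.5; identical label sequences yield K = 1, matching the definition in the text.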
One might expect a precision of 100% since these verbs can be disambiguated solely on the basis of the frame.</Paragraph> <Paragraph position="7"> However, the performance of our model is lower, mainly because of the way we estimated the terms P(class) and P(frame|class): we overemphasize the importance of frequent classes without taking into account how individual verbs distribute across classes.</Paragraph> <Paragraph position="8"> The model achieves a combined precision of 83.9% for category B verbs (cf. table 4). Furthermore, our model makes interesting predictions with respect to the semantic preferences of a given verb. In table 5 we show the class preferences the model came up with for eight randomly selected verbs (class preferences are ranked from left to right, with the leftmost class being the most preferred one). Table 6 summarizes the average class frequencies for the same eight verbs as assigned to corpus tokens by the two judges, together with inter-judge agreement (K). The category OTHER is reserved for corpus tokens which either have the wrong frame or for which the classes in question are not applicable. In general, agreement on the class annotation task was good, with Kappa values ranging from 0.68 to 1. As shown in table 6, with the exceptions of call and produce the model's predictions are borne out in the corpus data.</Paragraph> </Section> </Paper>