<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1007">
  <Title>Automatic Verb Classification Using Distributions of Grammatical Features</Title>
  <Section position="5" start_page="45" end_page="47" type="metho">
    <SectionTitle>
2 Determining the Features
</SectionTitle>
    <Paragraph position="0"> In this section, we present motivation for the features that we investigate in terms of their role in learning the verb classes. We first present the linguistically derived features, then turn to evidence from experimental psycholinguistics to extend the set of potentially relevant features.</Paragraph>
    <Section position="1" start_page="45" end_page="46" type="sub_section">
      <SectionTitle>
2.1 Features of the Verb Classes
</SectionTitle>
      <Paragraph position="0"> The three verb classes under investigation unergatives, unaccusatives, and object-drop - differ in the properties of their transitive/intransitive alternations, which are exemplified below.</Paragraph>
      <Paragraph position="1">  The sentences in (1) use an unergative verb, raced. Unergatives are intransitive action verbs whose transitive form is the causative counterpart of the  intransitive form. Thus, the subject of the intransitive (la) becomes the object of the transitive (lb) (Brousseau and Ritter, 1991; Hale and Keyser, 1993; Levin and Rappaport Hovav, 1995). The sentences in (2) use an unaccusative verb, melted. Unaccusatives are intransitive change of state verbs (2a); like unergatives, the transitive counterpart for these verbs is also causative (2b). The sentences in (3) use an object-drop verb, washed; these verbs have a non-causative transitive/intransitive alternation, in which the object is simply optional.</Paragraph>
      <Paragraph position="2"> Both unergatives and unaccusatives have a causative transitive form, but differ in the semantic roles that they assign to the participants in the event described. In an intransitive unergative, the subject is an Agent (the doer of the event), and in an intransitive unaccusative, the subject is a Theme (something affected by the event). The role assignments to the corresponding semantic arguments of the transitive forms--i.e., the direct objects--are the same, with the addition of a Causal Agent (the causer of the event) as subject in both cases. Object-drop verbs simply assign Agent to the subject and Theme to the optional object.</Paragraph>
      <Paragraph position="3"> We expect the differing semantic role assignments of the verb classes to be reflected in their syntactic behavior, and consequently in the distributional data we collect from a corpus. The three classes can be characterized by their occurrence in two alternations: the transitive/intransitive alternation and the causative alternation. Unergatives are distinguished from the other classes in being rare in the transitive form (see (Stevenson and Merlo, 1997) for an explanation of this fact). Both unergatives and unaccusatives are distinguished from object-drop in being causative in their transitive form, and similarly we expect this to be reflected in amount of detectable causative use. Furthermore, since the causative is a transitive use, and the transitive use of unergatives is expected to be rare, causativity should primarily distinguish unaccusatives from object-drops. In conclusion, we expect the defining features of the verb classes--the intransitive/transitive and causative alternations--to lead to distributional differences in the observed usages of the verbs in these alternations.</Paragraph>
    </Section>
    <Section position="2" start_page="46" end_page="47" type="sub_section">
      <SectionTitle>
2.2 Features of the MV/RR Alternatives
</SectionTitle>
      <Paragraph position="0"> Not only do the verbs under study differ in their thematic properties, they also differ in their processing properties. Because these verbs can occur both in a transitive and an intransitive form, they have been particularly studied in the context of the main verb/reduced relative (MV/RR) ambiguity illustrated below (Bever, 1970): The horse raced past the barn fell.</Paragraph>
      <Paragraph position="1"> The verb raced can be interpreted as either a past tense main verb, or as a past participle within a reduced relative clause (i.e., the horse \[that was\] raced past the barn). Because fell is the main verb, the reduced relative interpretation of raced is required for a coherent analysis of the complete sentence. But the main verb interpretation of raced is so strongly preferred that people experience great difficulty at the verb fell, unable to integrate it with the interpretation that has been developed to that point. However, the reduced relative interpretation is not difficult for all verbs, as in the following example: The boy washed in the tub was angry.</Paragraph>
      <Paragraph position="2"> The difference in ease of interpreting the resolutions of this ambiguity has been shown to be sensitive to both frequency differentials (MacDonald, 1994; Trueswell, 1996) and to verb class distinctions (?).</Paragraph>
      <Paragraph position="3"> Consider the features that distinguish the two resolutions of the MV/RR ambiguity: Main Verb: The horse raced past the barn quickly.</Paragraph>
      <Paragraph position="4"> Reduced Relative: The horse raced past the barn fell.</Paragraph>
      <Paragraph position="5"> In the main verb resolution, the ambiguous verb raced is used in its intransitive form, while in the reduaed relative, it is used in its transitive, causative form. These features correspond directly to the defining alternations of the three verb classes under study (intransitive/transitive, causative). Additionally, we see that other related features to these usages serve to distinguish the two resolutions of the ambiguity. The main verb form is active and a main verb part-of-speech (labeled as VBD by automatic POS taggers); by contrast, the reduced relative form is passive and a past participle (tagged as VBN). Although these properties are redundant with the intransitive/transitive distinction, recent work in machine learning (Ratnaparkhi, 1997; Ratnaparkhi, 1998) has shown that using overlapping features can be beneficial for learning in a maximum entropy framework, and we want to explore it in this setting to test H3 above. 2 In the next section, 2These properties are redundant with the intransitive/transitive distinction, as passive implies transitive use, and necessarily entails the use of a past participle. We performed a correlation analysis that  we describe how we compile the corpus counts for each of the four properties, in order to approximate the distributional information of these alternations. null</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="47" end_page="50" type="metho">
    <SectionTitle>
3 Frequency Distributions of the
Features
</SectionTitle>
    <Paragraph position="0"> We assume that currently available large corpora are a reasonable approximation to language (Pullum, 1996). Using a combined corpus of 65-million words, we measured the relative frequency distributions of the linguistic features (VBD/VBN, active/passive, intransitive/transitive, causative/non-causative) over a sample of verbs from the three lexical semantic classes.</Paragraph>
    <Section position="1" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
3.1 Materials
</SectionTitle>
      <Paragraph position="0"> We chose a set of 20 verbs from each class - divided into two groups each, as will be explained below - based primarily on the classification of verbs in (Levin, 1993).</Paragraph>
      <Paragraph position="1"> The unergatives are manner of motion verbs: jumped, rushed, marched, leaped, floated, raced, hurried, wandered, vaulted, paraded (group 1); galloped, glided, hiked, hopped, jogged, scooted, scurried, skipped, tiptoed, trotted (group 2).</Paragraph>
      <Paragraph position="2"> The unaccusatives are verbs of change of state: opened, exploded, flooded, dissolved, cracked, hardened, boiled, melted, fractured, solidified (group 1); collapsed, cooled, folded, widened, changed, cleared, divided, simmered, stabilized (group 2).</Paragraph>
      <Paragraph position="3"> The object-drop verbs are unspecified object alternation verbs: played, painted, kicked, carved, reaped, washed, danced, yelled, typed, knitted (group 1); borrowed, inherited, organised, rented, sketched, cleaned, packed, studied, swallowed, called (group 2).</Paragraph>
      <Paragraph position="4"> The verbs were selected from Levin's classes on the basis of our intuitive judgment that they are likely to be used with sufficient frequency to be found in the corpus we had available. Furthermore, they do not generally show massive departures from the intended verb sense in the corpus. (Though note that there are only 19 unaccusatives because ripped, which was initially counted in group 2 of unaccusatives, was then excluded from the analysis as it occurred mostly in a different usage in the corpus; ie, as a verb plus particle.) yielded highly significant R=.44 between intransitive and active use, and R=.36 between intransitive and main verb (VBD) use. We discuss the effects of feature overlap in the experimental section.</Paragraph>
      <Paragraph position="5"> Most of the verbs can occur in the transitive and in the passive. Each verb presents the same form in the simple past and in the past participle, entailing that we can extract both active and passive occurrences by searching on a single token. In order to simplify the counting procedure, we made the assumption that counts on this single verb form would approximate the distribution of the features across all forms of the verb.</Paragraph>
      <Paragraph position="6"> Most counts were performed on the tagged version of the Brown Corpus and on the portion of the Wall Street Journal distributed by the ACL/DCI (years 1987, 1988, 1989), a combined corpus in excess of 65 million words, with the exception of causativity which was counted only for the 1988 year of the WSJ, a corpus of 29 million words.</Paragraph>
    </Section>
    <Section position="2" start_page="47" end_page="48" type="sub_section">
      <SectionTitle>
3.2 Method
</SectionTitle>
      <Paragraph position="0"> We counted the occurrences of each verb token in a transitive or intransitive use (INTR), in an active or passive use (ACT), in a past participle or simple past use (VBD), and in a causative or non-causative use (CAUS). 3 More precisely, the following occurrences were counted in the corpus.</Paragraph>
      <Paragraph position="1"> INTR: the closest nominal group following the verb token was considered to be a potential object of the verb. A verb occurrence immmediately followed by a potential object was counted as transitive. If no object followed, the occurrence was counted as intransitive.</Paragraph>
      <Paragraph position="2"> ACT: main verb (ie, those tagged VBD) were counted as active. Tokens with tag VBN were also counted as active if the closest preceding auxiliary was have, while they were counted as passive if the closest preceding auxiliary was be.</Paragraph>
      <Paragraph position="3"> VBD: A part-of-speech tagged corpus was used, hence the counts for VBD/VBN were simply done based on the POS label according to the tagged corpus.</Paragraph>
      <Paragraph position="4"> C/AUS: The causative feature was approximated by the following steps. First, for each verb occurrence subjects and objects were extracted from a parsed corpus (Collins 1997). Then the propor3In performing this kind of corpus analysis, one has to take into account the fact that current corpus annotations do not distinguish verb senses. However, in these counts, we did not distinguish a core sense of the verb from an extended use of the verb. So, for instance, the sentence Consumer spending jumped 1.7 ~o in February after a sharp drop the month before (WSJ 1987) is counted as an occurrence of the manner-of-motion verb jump in its intransitive form. This kind of extension of meaning does not modify subcategorization distributions (Roland and Jurafsky, 1998), although it might modify the rate of causativity, but this is an unavoidable limitation at the current state of annotation of corpora.</Paragraph>
      <Paragraph position="5">  tion of overlap between the two multisets of nouns was calculated, meant to capture the property of the causative construction that the subject of the intransitive can occur as the object of the transitive. We define overlap as the largest multiset of elements belonging to both the subjects and the object multisets, e.g. {a, a, a, b} A {a} = {a, a, a}. The proportion is the ratio between the overlap and the sum of the subject and object multisets.</Paragraph>
      <Paragraph position="6"> The verbs in group 1 had been used in an earlier study, in which it was important to minimize noisy data, so they generally underwent greater manual intervention in the counts. In adding group 2 for the classification experiment, we chose to minimize the intervention, in order to demonstrate that the classification process is robust enough to withstand the resulting noise in the data.</Paragraph>
      <Paragraph position="7"> For transitivity and voice, the method of count depended on the group. For group 1, the counts were done automatically by regular expression patterns, and then corrected, partly by hand and partly automatically. For group 2, the counts were done automatically without any manual intervention. For causativity, the same counting scripts were used for both groups of verbs, but the input to the counting programs was determined by manual inspection of the corpus for verbs belonging to group 1, while it was extracted automatically from a parsed corpus for group 2 (WSJ 1988, parsed with the parser from (Collins, 1997).</Paragraph>
      <Paragraph position="8"> Each count was normalized over all occurrences of the verb, yielding a total of four relative frequency features: VBD (%VBD tag), ACT (%active use), INTR (%intransitive use), CAUS (%causative  Our goal was to determine whether statistical indicators can be automatically combined to determine the class of a verb from its distributional properties. We experimented both with self-aggregating and supervised methods. The frequency distributions of the verb alternation features yield a vector for each verb that represents the relative frequency values for the verb on each dimension; the set of 59 vectors constitute the data for our machine learning experiments.</Paragraph>
      <Paragraph position="9"> Vector template: \[verb, VBD, ACT, INTK, CAUS\] Example: \[opened, .793, .910, .308, .158\]</Paragraph>
    </Section>
    <Section position="3" start_page="48" end_page="48" type="sub_section">
      <SectionTitle>
Table 1: Accuracy of clustering with different feature subsets
</SectionTitle>
      <Paragraph position="0"> 1. VBD ACT INTI~ CAUS 52% &amp;quot;2. VBD ACT CAUS 54% 3. VBD ACT INTR 45% '4. ACT INTR. CAUS 47% 5. VBD INTB. CAUS 66%  We must now determine which of the distributions actually contribute to learning the verb classifications. First we describe computational experiments in unsupervised learning, using hierarchical clustering, then we turn to supervised classification.</Paragraph>
    </Section>
    <Section position="4" start_page="48" end_page="49" type="sub_section">
      <SectionTitle>
4.1 Unsupervised Learning
</SectionTitle>
      <Paragraph position="0"> Other work in automatic lexical semantic classification has taken an approach in which clustering over statistical features is used in the automatic formation of classes (Pereira et al., 1993; Pereira et al., 1997; Resnik, 1992). We used the hierarchical clustering algorithm available in SPlus5.0, imposing a cut point that produced three clusters, to correspond to the three verb classes. Table 1 shows the accuracy achieved using the four features described above (row 1), and all three-feature subsets of those four features (rows 25). Note that chance performance in this task (a three-way classification) is 33% correct.</Paragraph>
      <Paragraph position="1"> The highest accuracy in clustering, of 66%-or half the error rate compared to chance--is obtained only by the triple of features in row 5 in the table: VBD, INTR., and CANS. All other sub-sets of features yield a much lower accuracy, of 4554%. We can conclude that some of the features contribute useful information to guide clustering, but the inclusion of ACT actually degrades perfor~ mance. Clearly, having fewer but more relevant features is important to accuracy in verb classification. We will return to the issue in detail of which features contribute most to learning in our discussion of supervised learning below.</Paragraph>
      <Paragraph position="2"> A problem with analyzing the clustering performance is that it is not always clear what counts as a misclassification. We cannot actually know what the identity of the verb class is for each cluster.</Paragraph>
      <Paragraph position="3"> In the above results, we imposed a classification based on the class of the majority of verbs in a cluster, but often there was a tie between classes within a cluster, and/or the same class was the majority class in more than one cluster. To evaluate better the effects of the features in learning, we therefore turned to a supervised learning method,  i Decision Trees Rule Sets Features Accuracy Standard Error Accuracy Standard Error 1. VBD ACT INTR. CAUS 64.2% 1.7% 64.9% 1.6% 2. VBD ACT CADS 55.4% 1.5% 55.7% 1.4% -3. VBD ACT INTR '4. ACT INTR CADS 5. VBD INTR. CADS</Paragraph>
      <Paragraph position="5"> 60.9% 1.2% 62.3% 1.2% where the classification of each verb in a test set is unambiguous.</Paragraph>
    </Section>
    <Section position="5" start_page="49" end_page="50" type="sub_section">
      <SectionTitle>
4.2 Supervised learning
</SectionTitle>
      <Paragraph position="0"> For our supervised learning experiments, we used the publicly available version of the C5.0 machine learning algorithm, 5 a newer version of C4.5 (Quinlan, 1992), which generates decision trees from a set of known classifications. We also had the system extract rule sets automatically from the decision trees. For all reported experiments, we ran a 10-fold cross-validation repeated ten times, and the numbers reported are averages over all the runs. 6 Table 2 shows the results of our experiments on the four features we counted in the corpora (VBD, ACT, INTR., CADS), as well as all three-feature sub-sets of those four. As seen in the table, classification based on the four features performs at 6465%, or 31% over chance. (Recall that this is a 3-way decision, hence baseline is 33%).</Paragraph>
      <Paragraph position="1"> Given the resources needed to extract the features from the corpus and to annotate the corpus itself, we need to understand the relative contribution of each feature to the results - one or more of the features may make little or no contribution to the successful classification behavior. Observe that when either the INTR or CADS feature is removed (rows 2 and 3, respectively, of Table 2), performance degrades considerably, with a decrease in accuracy of 8-10% from the maximum achieved with the four features (row 1). However, when the VBD feature is removed (row 4), there is a smaller decrease in accuracy, of 4-6%. When the ACT feature is removed (row 5), there is an  randomly divides the data into ten parts, and runs ten times on a different 90%-training-data/t0%-test-data split, yielding an average accuracy and standard error. This procedure is then repeated for 10 different random divisions of the data, and accuracy and standard error are again averaged across the ten runs.</Paragraph>
      <Paragraph position="2"> even smaller decrease, of 2-4%. In fact, the accuracy here is very close to the accuracy of the fourfeature results when the standard error is taken into account. We conclude then that INTR and CADS contribute the most to the accuracy of the classification, while ACT seems to contribute little. (Compare the clustering results, in which the best performance was achieved with the subset of features excluding ACT.) This shows that not all the linguistically relevant features are equally useful in learning.</Paragraph>
      <Paragraph position="3"> We think that this pattern of results is related to the combination of the feature distributions: some distributions are highly correlated, while others are not. According to our calculations, CADS is not significantly correlated with any other feature; of the features that are significantly correlated, VBD is more highly correlated with ACT than with INTI~ (R=.67 and g=.36 respectively), while INTR is more highly correlated with ACT than with VBD (R=.44 and R=.36 respectively).</Paragraph>
      <Paragraph position="4"> We expect combinations of features that are not correlated to yield better classification accuracy.</Paragraph>
      <Paragraph position="5"> If we compare the accuracy of the 3-feature combinations in Table 2 (rows 2-5), this hypothesis is confirmed. The three combinations that contain the feature CADS (rows 2, 4 and 5)--the uncorrelated feature--have better performance than the combination that does not (row 3), as expected.</Paragraph>
      <Paragraph position="6"> Now consider the subsets of three features that include CADS with a pair of the other correlated features. The combination containing VBD and INTR (row 5)--the least correlated pair of the features VBD, INTR, and ACT--has the best accuracy, while the combination containing the highly correlated VBD and ACT (row 2) has the worst accuracy. The accuracy of the subset {vso, INTR, CADS} (row 5) is also better than the accuracy of the subset {ACT, INTa, CADS} (row 4), because INTR overlaps with VBD less than with ACT. 7 7We suspect that another factor comes into play, namely how noisy the feature is. The similarity in performance using INTR or CADS in combination with</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>