<?xml version="1.0" standalone="yes"?> <Paper uid="J01-3003"> <Title>Automatic Verb Classification Based on Statistical Distributions of Argument Structure</Title> <Section position="2" start_page="0" end_page="375" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Automatic acquisition of lexical knowledge is critical to a wide range of natural language processing (NLP) tasks (Boguraev and Pustejovsky 1996). Especially important is knowledge about verbs, which are the primary source of relational information in a sentence: the predicate-argument structure that relates an action or state to its participants (i.e., who did what to whom). In facing the task of automatically acquiring knowledge about verbs, two basic questions must be addressed: What information about verbs and their relational properties needs to be learned? What information can, in practice, be learned by automatic means? In answering these questions, some approaches to lexical acquisition have focused on learning syntactic information about verbs by automatically extracting subcategorization frames from a corpus or machine-readable dictionary (Brent 1993; Briscoe and [...]).</Paragraph> <Paragraph position="1"> Table 1. Examples of verbs from the three optionally intransitive classes. Unergative (intransitive): The horse raced past the barn.</Paragraph> <Paragraph position="2"> Unergative (transitive): The jockey raced the horse past the barn.</Paragraph> <Paragraph position="3"> Unaccusative (intransitive): The butter melted in the pan.</Paragraph> <Paragraph position="4"> Unaccusative (transitive): The cook melted the butter in the pan.</Paragraph> <Paragraph position="5"> Object-Drop (intransitive): The boy played.</Paragraph> <Paragraph position="6"> Object-Drop (transitive): The boy played soccer.</Paragraph> <Paragraph position="7"> Other work has attempted to learn deeper semantic properties, such as selectional restrictions (Resnik 1996; Riloff and Schmelzenbach 1998), verbal aspect (Klavans and Chodorow 1992; Siegel 1999), or lexical-semantic verb classes such as those proposed by Levin 
(1993) (Aone and McKee 1996; McCarthy 2000; Lapata and Brew 1999; Schulte im Walde 2000). In this paper, we focus on argument structure (the thematic roles assigned by a verb to its arguments) as the way in which the relational semantics of the verb is represented at the syntactic level.</Paragraph> <Paragraph position="8"> Specifically, we propose to classify verbs automatically according to their argument structure properties, using statistical corpus-based methods. We address the problem of classification because it provides a means of lexical organization that can effectively capture generalizations over verbs (Palmer 2000). Within the context of classification, argument structure discriminates among verbs more finely than subcategorization frames do (as our example classes show below: they allow the same subcategorizations but differ in thematic assignment), but more coarsely than Levin's classification does (in which classes such as ours are further subdivided according to more detailed semantic properties). This level of granularity appears to be appropriate for numerous language engineering tasks. Because knowledge of argument structure captures fundamental participant/event relations, it is crucial in parsing and generation (e.g., Srinivas and Joshi [1999]; Stede [1998]), in machine translation (Dorr 1997), and in information retrieval (Klavans and Kan 1998) and extraction (Riloff and Schmelzenbach 1998). 
Our use of statistical corpus-based methods to achieve this level of classification is motivated by our hypothesis that class-based differences in argument structure are reflected in statistics over the usages of the component verbs, and that these statistics can be extracted automatically from a large annotated corpus.</Paragraph> <Paragraph position="9"> The particular classification problem in which we investigate this hypothesis is the task of learning the three major classes of optionally intransitive verbs in English: unergative, unaccusative, and object-drop verbs. (For the unergative/unaccusative distinction, see Perlmutter [1978]; Burzio [1986]; Levin and Rappaport Hovav [1995].) Table 1 shows an example of a verb from each class in its transitive and intransitive usages. These three classes are motivated by theoretical linguistic properties (see the discussion and references below, and in Stevenson and Merlo [1997b]; Merlo and Stevenson [2000b]). Furthermore, the classes appear to capture typological distinctions that are useful for machine translation (for example, causative unergatives are ungrammatical in many languages), as well as processing distinctions that are useful for generating naturally occurring language (for example, reduced relatives with unergative verbs are awkward, whereas with unaccusative and object-drop verbs they are acceptable and, in fact, often preferred to full relatives) (Stevenson and Merlo 1997b; Merlo and Stevenson 1998).</Paragraph> <Paragraph position="10"> The question, then, is what underlies these distinctions. We identify argument structure, i.e., the thematic roles assigned by the verbs, as the property that precisely distinguishes among these three classes. The thematic roles for each class, and their mapping to subject and object positions, are summarized in Table 2. 
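As a minimal sketch of why argument structure separates these classes when subcategorization does not, the Python below encodes one reading of the Table 1 examples under the standard analysis of these classes, using the role labels Agent, Causal Agent, and Theme. The role assignments and the names `ROLE_PATTERNS`, `subcat_frames`, and `role_signature` are illustrative assumptions, not a reproduction of Table 2.

```python
# Illustrative (assumed) thematic-role patterns for the three optionally
# intransitive classes, read off the Table 1 examples. This is NOT a copy of
# Table 2. Each frame maps to the roles of (subject, object); None = no object.
ROLE_PATTERNS = {
    "unergative": {
        "transitive": ("Causal Agent", "Agent"),  # The jockey raced the horse.
        "intransitive": ("Agent", None),          # The horse raced.
    },
    "unaccusative": {
        "transitive": ("Causal Agent", "Theme"),  # The cook melted the butter.
        "intransitive": ("Theme", None),          # The butter melted.
    },
    "object-drop": {
        "transitive": ("Agent", "Theme"),         # The boy played soccer.
        "intransitive": ("Agent", None),          # The boy played.
    },
}


def subcat_frames(verb_class: str) -> frozenset:
    """Subcategorization view: only which syntactic frames the class allows."""
    return frozenset(ROLE_PATTERNS[verb_class])


def role_signature(verb_class: str) -> tuple:
    """Argument-structure view: roles in both alternants, in a fixed order."""
    frames = ROLE_PATTERNS[verb_class]
    return tuple((frame, frames[frame]) for frame in sorted(frames))


classes = list(ROLE_PATTERNS)

# Every class allows the same two frames, so frames alone cannot separate them.
same_frames = len({subcat_frames(c) for c in classes}) == 1
# Each class has its own pattern of thematic assignments.
distinct_roles = len({role_signature(c) for c in classes}) == len(classes)

print(same_frames, distinct_roles)  # True True
```

In this encoding the intransitive alternants of the unergative and object-drop classes both assign Agent to the subject; the two classes come apart only when the transitive alternant is taken into account.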
Note that verbs in all three classes allow the same subcategorization frames (taking an NP object or occurring intransitively); thus, classification based on subcategorization alone would not distinguish them. On the other hand, each of the three classes comprises multiple Levin classes, because the latter reflect more detailed semantic distinctions among the verbs (Levin 1993); thus, classification based on Levin's labeling would miss generalizations across the three broader classes. By contrast, as shown in Table 2, each class has a unique pattern of thematic assignments, which categorizes the verbs precisely into the three classes of interest.</Paragraph> <Paragraph position="11"> Although the granularity of our classification differs from Levin's, we draw on her hypothesis that the semantic properties of verbs are reflected in their syntactic behavior. The behavior that Levin focuses on is diathesis alternation: an alternation in the expression of a verb's arguments, such as the different transitive and intransitive mappings that our verbs undergo. Whether or not a verb participates in a particular diathesis alternation is a key factor in Levin's approach to classification. Like others working in a computational framework, we have extended this idea by showing that statistics over the alternants of a verb effectively capture information about its class (Lapata 1999; McCarthy 2000; Lapata and Brew 1999).</Paragraph> <Paragraph position="12"> In our specific task, we analyze the pattern of thematic assignments given in Table 2 to develop statistical indicators that can determine the class of an optionally intransitive verb by capturing information across its transitive and intransitive alternants. These indicators serve as input to a supervised machine learning algorithm, which produces an automatic classification system for our three verb classes. 
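The overall shape of this supervised, per-verb setup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two indicators (proportion of transitive uses and of passive uses), the invented counts in the hypothetical helper `toy_occurrences`, and the nearest-centroid learner are all stand-ins, not the five indicators or the learning algorithm used in this work. What the sketch does show is the structure: every occurrence of a verb is pooled into one vector of relative frequencies, a classifier is trained on labeled verbs, and a verb that is absent from the training labels is then assigned a single class.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical per-occurrence indicators (NOT the paper's five features):
# each corpus occurrence is recorded as (verb, is_transitive, is_passive).
N_FEATURES = 2


def toy_occurrences(verb, total, transitive, passive):
    """Invented toy data: `total` uses; the first `transitive` of them are
    transitive, and the first `passive` of those are also passive."""
    return [(verb, transitive > i, passive > i) for i in range(total)]


def verb_profiles(occurrences):
    """Pool all occurrences of each verb into one vector of relative
    frequencies, so that classification is per verb, not per token."""
    hits = defaultdict(lambda: [0] * N_FEATURES)
    uses = defaultdict(int)
    for verb, *flags in occurrences:
        uses[verb] += 1
        for i, flag in enumerate(flags):
            hits[verb][i] += int(flag)
    return {v: tuple(h / uses[v] for h in hits[v]) for v in uses}


def train_centroids(profiles, labels):
    """Supervised step: average the profiles of the labeled training verbs."""
    grouped = defaultdict(list)
    for verb, cls in labels.items():
        grouped[cls].append(profiles[verb])
    return {cls: tuple(sum(col) / len(col) for col in zip(*vecs))
            for cls, vecs in grouped.items()}


def classify(profile, centroids):
    """Nearest-centroid rule (a stand-in learner): the closest class wins."""
    return min(centroids, key=lambda cls: dist(profile, centroids[cls]))


corpus = (toy_occurrences("race", 20, 2, 1) + toy_occurrences("jump", 20, 3, 1)
          + toy_occurrences("melt", 20, 8, 4) + toy_occurrences("open", 20, 10, 5)
          + toy_occurrences("play", 20, 14, 2) + toy_occurrences("wash", 20, 15, 3)
          + toy_occurrences("paint", 20, 13, 2))  # "paint" is held out
labels = {"race": "unergative", "jump": "unergative",
          "melt": "unaccusative", "open": "unaccusative",
          "play": "object-drop", "wash": "object-drop"}

profiles = verb_profiles(corpus)
centroids = train_centroids(profiles, labels)
print(classify(profiles["paint"], centroids))  # object-drop
```

A nearest-centroid rule is used here only because it fits in a few lines; any supervised learner that accepts fixed-length numeric vectors could take its place.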
Since we rely on patterns of behavior across multiple occurrences of a verb, we begin with the problem of assigning a single class to the entire set of usages of a verb within the corpus. For example, we measure properties across all occurrences of a word, such as raced, in order to assign a single classification to the lexical entry for the verb race. This contrasts with work that classifies individual occurrences of a verb in their local context, which has typically relied on training data that include instances of the verbs to be classified, essentially developing a bias that is used in conjunction with the local context to determine the best classification for new instances of previously seen verbs. Our method, in contrast, assigns a classification to verbs that have not been seen in the training data. Thus, although we do not yet assign different classes to different instances of a verb, we can assign a single predominant class to new verbs that have never been encountered.</Paragraph> <!-- Computational Linguistics, Volume 27, Number 3 --> <Paragraph position="13"> To preview our results, we demonstrate that combining just five numerical indicators, automatically extracted from large text corpora, is sufficient to reduce the error rate in this classification task by more than 50% relative to chance. Specifically, we achieve almost 70% accuracy in a task whose baseline (chance) performance is 34% and whose expert-based upper bound is calculated at 86.5%.</Paragraph> <Paragraph position="14"> Beyond its interest for the particular classification task at hand, this work addresses more general issues concerning verb class distinctions based on argument structure. 
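The error-reduction figure in the preview of results follows from simple arithmetic. Chance accuracy of 34% leaves an error rate of 66%; cutting that error by more than half requires an error rate under 33%, i.e., accuracy above 67%. The sketch below checks this. It takes 70% as a round stand-in for the reported "almost 70%" (an assumption: with the exact, slightly lower figure the reduction is a little under the 54.5% printed, but still above 50%). The function name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(accuracy: float, baseline: float) -> float:
    """Share of the baseline's error rate that a system eliminates."""
    baseline_error = 1.0 - baseline
    system_error = 1.0 - accuracy
    return (baseline_error - system_error) / baseline_error


chance = 0.34  # baseline (chance) accuracy reported in the text
# Accuracy at which the error rate is exactly half of the chance error rate.
break_even = 1.0 - 0.5 * (1.0 - chance)
print(round(break_even, 2))                              # 0.67
# 0.70 is an assumed round stand-in for the reported "almost 70%".
print(round(relative_error_reduction(0.70, chance), 3))  # 0.545
```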
We evaluate our hypothesis that such distinctions are reflected in statistics over corpora through a computational experimental methodology in which we investigate each of the following subhypotheses in the context of the three verb classes under study: * Lexical features capture argument structure differences between verb classes. * The linguistically distinctive features exhibit distributional differences across the verb classes that are apparent within linguistic experience (i.e., they can be collected from text).</Paragraph> <Paragraph position="15"> * The statistical distributions of (some of) the features contribute to learning the classifications of the verbs.</Paragraph> <Paragraph position="16"> In the following sections, we show that all three hypotheses are borne out. In Section 2, we describe the argument structure distinctions of our three verb classes in more detail. In support of the first hypothesis, we discuss lexical correlates of the underlying differences in thematic assignments that distinguish the three verb classes under investigation. In Section 3, we show how to approximate these features by simple syntactic counts and how to perform these counts on available corpora. We confirm the second hypothesis by showing that the differences in distribution predicted by the underlying argument structures are largely found in the data. In Section 4, through a series of machine learning experiments and a detailed analysis of errors, we confirm the third hypothesis by showing that the differences in the distribution of the extracted features are successfully used for verb classification. Section 5 evaluates the significance of these results by comparing the program's accuracy to an expert-based upper bound. We conclude the paper with a discussion of its contributions, a comparison to related work, and suggestions for future extensions.</Paragraph> </Section> </Paper>