<?xml version="1.0" standalone="yes"?> <Paper uid="E99-1007"> <Title>Automatic Verb Classification Using Distributions of Grammatical Features</Title> <Section position="4" start_page="0" end_page="45" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recent years have witnessed a shift in grammar development methodology, from crafting large grammars to annotating corpora. Correspondingly, there has been a change from developing rule-based parsers to developing statistical methods for inducing grammatical knowledge from annotated corpus data. The shift has occurred largely because building wide-coverage grammars is time-consuming, error-prone, and difficult. The same can be said for crafting the rich lexical representations that are a central component of linguistic knowledge, and research in automatic lexical acquisition has sought to address this ((Dorr and Jones, 1996; Dorr, 1997), among others).</Paragraph> <Paragraph position="1"> Yet there have been few attempts to learn fine-grained lexical classifications from the statistical analysis of distributional data, analogously to the induction of syntactic knowledge (though see, e.g., (Brent, 1993; Klavans and Chodorow, 1992; Resnik, 1992)). In this paper, we propose such an approach for the automatic classification of verbs into lexical semantic classes.1 We can express the issues raised by this approach as follows.</Paragraph> <Paragraph position="2"> 1. Which linguistic distinctions among lexical classes can we expect to find in a corpus? 2. How easily can we extract the frequency distributions that approximate the relevant linguistic properties? 3. Which frequency distributions work best to distinguish the verb classes? In exploring these questions, we focus on verb classification for several reasons. 
Verbs are very important sources of knowledge in many language engineering tasks, and the relationships among verbs appear to play a major role in the organization and use of this knowledge: knowledge about verb classes is crucial for lexical acquisition in support of language generation and machine translation (Dorr, 1997), and for document classification (Klavans and Kan, 1998). Manual classification of large numbers of verbs is a difficult and resource-intensive task (Levin, 1993; Miller et al., 1990; Dang et al., 1998).</Paragraph> <Paragraph position="3"> To address these issues, we suggest that verbs can be classified automatically by using statistical approximations to verb diatheses to train an automatic classifier. We use verb diatheses, following Levin and Dorr, for two reasons. First, verb diatheses are syntactic cues to semantic classes,1 hence they can be more easily captured by corpus-based techniques. 1 We are aware that a distributional approach rests on a strong assumption about the nature of the representations under study: that semantic notions and syntactic notions are correlated, at least in part. This assumption is not uncontroversial (Briscoe and Copestake, 1995; Levin, 1993; Dorr and Jones, 1996; Dorr, 1997). We adopt it here as a working hypothesis without further discussion.</Paragraph> <Paragraph position="4"> Second, using verb diatheses reduces noise. There is a certain consensus (Briscoe and Copestake, 1995; Pustejovsky, 1995; Palmer, 1999) that verb diatheses are regular sense extensions. Hence, focusing on this type of classification allows one to abstract away from the problem of word sense disambiguation and to treat residual differences in word senses as noise in the classification task.</Paragraph> <Paragraph position="5"> We present an in-depth case study, in which we apply machine learning techniques to automatically classify a set of verbs based on distributions of grammatical indicators of diatheses, extracted from a very large corpus. 
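As a concrete illustration of the kind of frequency distribution involved, the sketch below estimates one diathesis-related cue, the relative frequency of simple-past (VBD) versus past-participle (VBN) tags for a verb form, from a toy POS-tagged text. The function name, the heuristic, and the toy data are illustrative assumptions, not the paper's actual extraction procedure.

```python
from collections import Counter

def tag_distribution(tagged_tokens, verb, tags=("VBD", "VBN")):
    """Relative frequency of each tag in `tags` for one verb form --
    a rough corpus approximation to a single diathesis-related cue."""
    counts = Counter(tag for tok, tag in tagged_tokens
                     if tok == verb and tag in tags)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in tags}

# Toy POS-tagged text (Penn Treebank tags); real counts would come
# from a very large tagged corpus.
tagged = [
    ("the", "DT"), ("horse", "NN"), ("raced", "VBD"),
    ("past", "IN"), ("the", "DT"), ("barn", "NN"),
    ("the", "DT"), ("horse", "NN"), ("raced", "VBN"),
    ("past", "IN"), ("the", "DT"), ("barn", "NN"), ("fell", "VBD"),
]

dist = tag_distribution(tagged, "raced")
```

Ambiguous forms such as "raced" receive different tags in transitive, intransitive, and reduced-relative contexts, which is precisely why such tag counts can carry information about a verb's class.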
We look at three very interesting classes of verbs: unergatives, unaccusatives, and object-drop verbs (Levin, 1993).</Paragraph> <Paragraph position="6"> These are interesting classes because they all participate in the transitivity alternation, and they are minimal pairs - that is, a small number of well-defined distinctions differentiate their transitive/intransitive behavior. Thus, we expect the differences in their distributions to be small, entailing a fine-grained discrimination task that provides a challenging testbed for automatic classification. The specific theoretical question we investigate is whether the factors underlying the verb class distinctions are reflected in the statistical distributions of lexical features related to diatheses presented by the individual verbs in the corpus. In doing this, we address the questions above by determining which lexical features could distinguish the behavior of the classes of verbs with respect to the relevant diatheses, which of those features can be gleaned from the corpus, and which of those, once the statistical distributions are available, can be used successfully by an automatic classifier.</Paragraph> <Paragraph position="7"> We follow a computational experimental methodology, investigating each of the hypotheses below: H1: Linguistically and psychologically motivated features for distinguishing the verb classes are apparent within linguistic experience.</Paragraph> <Paragraph position="8"> We analyze the three classes based on properties of the verbs that have been shown to be relevant for linguistic classification (Levin, 1993) or for disambiguation in syntactic processing (MacDonald, 1994; Trueswell, 1996) to determine potentially relevant distinctive features. 
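To make the subsequent classification step concrete, here is a minimal stand-in for the learning component: a nearest-centroid rule over invented three-dimensional feature vectors, one value per diathesis cue. The class labels come from the paper, but the feature values and the centroid rule are illustrative assumptions; the actual study trains a full machine learning classifier on features extracted from corpus counts.

```python
import math

# Hypothetical per-verb feature vectors: relative frequencies of
# diathesis cues, e.g. (transitive use, passive use, VBN tag).
# All numeric values are invented for illustration.
TRAIN = {
    "unergative":   [(0.20, 0.10, 0.20), (0.25, 0.15, 0.25)],
    "unaccusative": [(0.40, 0.30, 0.60), (0.35, 0.35, 0.55)],
    "object-drop":  [(0.70, 0.40, 0.40), (0.75, 0.45, 0.35)],
}

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n
                 for i in range(len(vectors[0])))

CENTROIDS = {label: centroid(vs) for label, vs in TRAIN.items()}

def classify(vec):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(CENTROIDS, key=lambda label: math.dist(vec, CENTROIDS[label]))
```

Because the three classes are minimal pairs, the corresponding real feature vectors lie close together, which is what makes the discrimination task fine-grained.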
We then count those features (or approximations to them) in a very large corpus.</Paragraph> <Paragraph position="9"> H2: The distributional patterns of (some of) those features contribute to learning the classifications of the verbs.</Paragraph> <Paragraph position="10"> We apply machine learning techniques to determine whether the features support the learning of the classifications.</Paragraph> <Paragraph position="11"> H3: Non-overlapping features are the most effective in learning the classifications of the verbs. We analyze the contribution of different features to the classification process.</Paragraph> <Paragraph position="12"> To preview, we find that, in relation to (H1), linguistically motivated features (related to diatheses) that distinguish the verb classes can be extracted from an annotated, and in one case parsed, corpus. In relation to (H2), a subset of these features is sufficient to halve the error rate, compared to chance, in automatic verb classification, suggesting that distributional data provide useful knowledge for the classification of verbs. Furthermore, in relation to (H3), we find that features that are distributionally predictable, because they are highly correlated with other features, contribute little to classification performance. We conclude that the usefulness of distributional features to the learner is determined by their informativeness.</Paragraph> </Section> </Paper>