<?xml version="1.0" standalone="yes"?>
<Paper uid="J01-3003">
  <Title>Automatic Verb Classification Based on Statistical Distributions of Argument Structure</Title>
  <Section position="10" start_page="399" end_page="402" type="relat">
    <SectionTitle>
7. Related Work
</SectionTitle>
    <Paragraph position="0"> We conclude from the discussion above that our own work and work of others support our hypotheses concerning the importance of the relation between classes of verbs and the syntactic expression of argument structure in corpora. In light of this, it is instructive to evaluate our results in the context of other work that shares this view. Some related work requires either exact exemplars for acquisition, or external pre-compiled resources. For example, Dorr (1997) summarizes a number of automatic classification experiments based on encoding Levin's alternations directly, as symbolic properties of a verb (Dorr, Garman, and Weinberg 1995; Dorr and Jones 1996). Each verb is represented as the binary settings of a vector of possible alternations, acquired through a large corpus analysis yielding exemplars of the alternation. To cope with sparse data, the corpus information is supplemented by syntactic information obtained from the LDOCE and semantic information obtained from WordNet. This procedure classifies 95 unknown verbs with 61% accuracy. Dorr also remarks that this result could be improved to 83% if missing LDOCE codes were added. While Dorr's work requires finding exact exemplars of the alternation, Oishi and Matsumoto (1997) present a method that, like ours, uses surface indicators to approximate underlying properties. From a dictionary of dependency relations, they extract case-marking particles as indicators of the grammatical function properties of the verbs (which they call thematic properties), such as subject and object. Adverbials indicate aspectual properties.</Paragraph>
    <Paragraph position="1"> The combination of these two orthogonal dimensions gives rise to a classification of Japanese verbs.</Paragraph>
    <Paragraph position="2"> Other work has sought to combine corpus-based extraction of verbal properties with statistical methods for classifying verbs. Siegel's work on automatic aspectual classification (1998, 1999) also reveals a close relationship between verb-related syntactic and semantic information. In this work, experiments to learn aspectual classification from linguistically-based numerical indicators are reported. Using combinations of seven statistical indicators (some morphological and some reflecting syntactic cooccurrences), it is possible to learn the distinction between events and states for 739 verb tokens with an improvement of 10% over the baseline (error rate reduction of 74%), and to learn the distinction between culminated and non-culminated events for 308 verb tokens with an improvement of 11% (error rate reduction of 29%) (Siegel 1999).</Paragraph>
    <Paragraph position="3"> In work on lexical semantic verb classification, Lapata and Brew (1999) further support the thesis of a predictive correlation between syntax and semantics in a statistical framework, showing that the frequency distributions of subcategorization frames within and across classes can disambiguate the usages of a verb with more than one known lexical semantic class. On 306 verbs that are disambiguated by subcategorization frame, they achieve 91.8% accuracy on a task with a 65.7% baseline, for a 76% reduction in error rate. On 31 verbs that can take the same subcategorization(s) in different classes--more similar to our situation in that subcategorization alone cannot distinguish the classes--they achieve 83.9% accuracy compared to a 61.3% baseline, for a 58% reduction in error. Aone and McKee (1996), working with a much coarser-grained classification of verbs, present a technique for predicate-argument extraction from multi-lingual texts. Like ours, their work goes beyond statistics over subcategorizations to include counts over the more directly semantic feature of animacy. No numerical evaluation of their results is provided.</Paragraph>
    <Paragraph position="4">  Merlo and Stevenson Statistical Verb Classification Schulte im Walde (2000) applies two clustering methods to two types of frequency data for 153 verbs from 30 Levin (1993) classes. One set of experiments uses verb subcategorization frequencies, and the other uses subcategorization frequencies plus selectional preferences (a numerical measure based on an adaptation of the relative entropy method of Resnik \[1996\]). The best results achieved are a correct classification of 58 verbs out of 153, with a precision of 61% and recall of 36%, obtained using only subcategorization frequencies. We calculate that this corresponds to an F-score of 45% with balanced precision and recall, n The use of selectional preference information decreases classification performance under either clustering algorithm. The results are somewhat difficult to evaluate further, as there is no description of the classes included. Also, the method of counting correctness entails that some &amp;quot;correct&amp;quot; classes may be split across distant clusters (this level of detail is not reported), so it is unclear how coherent the class behaviour actually is.</Paragraph>
    <Paragraph position="5"> McCarthy (2000) proposes a method to identify diathesis alternations. After learning subcategorization frames, based on a parsed corpus, selectional preferences are acquired for slots of the subcategorization frames, using probability distributions over Wordnet classes. Alternations are detected by testing the hypothesis that, given any verb, the selectional preferences for arguments occurring in alternating slots will be more similar to each other than those for slots that do not alternate. For instance, given a verb participating in the causative alternation, its selectional preferences for the sub-ject in an intransitive use, and for the object in a transitive use, will be more similar to each other than the selectional preferences for these two slots of a verb that does not participate in the causative alternation. This method achieves the best accuracy for the causative and the conative alternations (73% and 83%, respectively), despite sparseness of data. McCarthy reports that a simpler measure of selectional preferences based simply on head words yields a lower 63% accuracy. Since this latter measure is very similar to our CAUS feature, we think that our results would also improve by adopting a similar method of abstracting from head words to classes.</Paragraph>
    <Paragraph position="6"> Our work extends each of these approaches in some dimension, thereby providing additional support for the hypothesis that syntax and semantics are correlated in a systematic and predictive way. We extend Dorr's alternation-based automatic classification to a statistical setting. By using distributional approximations of indicators of alternations, we solve the sparse data problem without recourse to external sources of knowledge, such as the LDOCE, and in addition, we are able to learn argument structure alternations using exclusively positive examples. We improve on the approach of Oishi and Matsumoto (1997) by learning argument structure properties, which, unlike grammatical functions, are not marked morphologically, and by not relying on external sources of knowledge. Furthermore, in contrast to Siegel (1998) and Lapata and Brew (1999) our method applies successfully to previously unseen words--i.e., test cases that were not represented in the training set. 13 This is a very important property of lexical acquisition algorithms to be used for lexicon organization, as their main interest lies in being applied to unknown words.</Paragraph>
    <Paragraph position="7"> On the other hand, our approach is similar to the approaches of Siegel, and Lapata and Brew (1999), in attempting to learn semantic notions from distributions of  Computational Linguistics Volume 27, Number 3 indicators that can be gleaned from a text. In our case, we are trying to learn argument structure, a finer-grained classification than the dichotomic distinctions studied by Siegel. Like Lapata and Brew, three of our indicators--TRANS, VBN, PASS--are based on the assumption that distributional differences in subcategorization frames are related to underlying verb class distinctions. However, we also show that other syntactic indicators--cAUS and ANIM--can be devised that tap directly into the argument structure of a verb. Unlike Schulte im Walde (2000), we find the use of these semantic features helpful in classification--using only TRANS and its related features, VBN and PASS, we achieve only 55% accuracy, in comparison to 69.8% using the full set of features. This can perhaps be seen as support for our hypothesis that argument structure is the right level of representation for verb class distinctions, since it appears that our features that capture thematic differences are useful in classification, while Schulte im Walde's selectional restriction features were not.</Paragraph>
    <Paragraph position="8"> Aone and McKee (1996) also use features that are intended to tap into both sub-categorization and thematic role distinctions--frequencies of the transitive use and animate subject use. In our task, we show that subject animacy can be profitably approximated solely with pronoun counts, avoiding the need for reference to external sources of semantic information used by Aone and McKee. In addition, our work extends theirs in investigating much finer-grained verb classes, and in classifying verbs that have multiple argument structures. While Aone and McKee define each of their classes according to a single argument structure, we demonstrate the usefulness of syntactic features that capture relations across different argument structures of a single verb. Furthermore, while Aone and McKee, and others, look at relative frequency of subcategorization frames (as with our TRANS feature), or relative frequency of a property of NPs within a particular grammatical function (as with our ANIM feature), we also look at the paradigmatic relations across a text between thematic arguments in different alternations (with our CAUS feature).</Paragraph>
    <Paragraph position="9"> McCarthy (2000) shows that a method very similar to ours can be used for identifying alternations. Her qualitative results confirm, however, what was argued in Section 2 above: counts that tap directly into the thematic assignments are necessary to fully identify a diathesis alternation. In fact, on close inspection, McCarthy's method does not distinguish between the induced-action alternation (which the unergatives exhibit) and the causative/inchoative alternation (which the unaccusatives exhibit); thus, her method does not discriminate two of our classes. It is likely that a combination of our method, which makes the necessary thematic distinctions, and her more sophisticated method of detecting alternations would give very good results.</Paragraph>
    <Paragraph position="10"> 8. Limitations and Future Work The classification results show that our method is powerful, and suited to the classification of unknown verbs. However, we have not yet addressed the problem of verbs that can have multiple classifications. We think that many cases of ambiguous classification of the lexical entry for a verb can be addressed with the notion of intersective sets introduced by Dang et al. (1998). This is an important concept, which proposes that &amp;quot;regular&amp;quot; ambiguity in classification--i.e., sets of verbs that have the same multi-way classifications according to Levin (1993)--can be captured with a finer-grained notion of lexical semantic classes. Thus, subsets of verbs that occur in the intersection of two or more Levin classes form in themselves a coherent semantic (sub)class. Extending our work to exploit this idea requires only defining the classes appropriately; the basic approach will remain the same. Given the current demonstration of our method on fine-grained classes that share subcategoriza- null Merlo and Stevenson Statistical Verb Classification tion alternations, we are optimistic regarding its future performance on intersective sets.</Paragraph>
    <Paragraph position="11"> Because we assume that thematic properties are reflected in alternations of argument structure, our features require searching for relations across occurrences of each verb. This motivated our initial experimental focus on verb types. However, when we turn to consider ambiguity, we must also address the problem that individual instances of verbs may come from different classes, and we may (like Lapata and Brew \[1999\]) want to classify the individual tokens of a verb. In future research we plan to extend our method to the case of ambiguous tokens, by experimenting with the combination of several sources of information: the classification of each instance will be a function of a bias for the verb type (using the cross-corpus statistics we collect), but also of features of the usage of the instance being classified (cf., Lapata and Brew \[1999\]; Siegel \[1998\]).</Paragraph>
    <Paragraph position="12"> Finally, corpus-based learning techniques collect statistical information related to language use, and are a good starting point for studying human linguistic performance. This opens the way to investigating the relation of linguistic data in text to people's linguistic behaviour and use. For example, Merlo and Stevenson (1998) show that, contrary to the naive assumption, speakers' preferences in syntactic disambiguation are not simply directly related to frequency (i.e., a speaker's preference for one construction over another is not simply modelled by the frequency of the construction, or of the words in the construction). Thus, the kind of corpus investigation we are advocating--founded on in-depth linguistic analysis--holds promise for building more natural NLP systems which go beyond the simplest assumptions, and tie together statistical computational linguistic results with experimental psycholinguistic data.</Paragraph>
  </Section>
class="xml-element"></Paper>