File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/01/j01-3003_concl.xml
Size: 2,755 bytes
Last Modified: 2025-10-06 13:53:00
<?xml version="1.0" standalone="yes"?> <Paper uid="J01-3003"> <Title>Automatic Verb Classification Based on Statistical Distributions of Argument Structure</Title> <Section position="11" start_page="402" end_page="403" type="concl"> <SectionTitle> 9. Conclusions </SectionTitle> <Paragraph position="0"> In this paper, we have presented an in-depth case study, in which we investigate machine learning techniques for automatically classifying a set of verbs into classes determined by their argument structures. We focus on the three major classes of optionally intransitive verbs in English, which cannot be discriminated by their subcategorizations, and therefore require distinctive features that are sensitive to the thematic properties of the verbs. We develop such features and automatically extract them from very large, syntactically annotated corpora. Results show that a small number of linguistically motivated lexical features are sufficient to achieve a 69.8% accuracy rate in a three-way classification task with a baseline (chance) performance of 33.9%, for which the best performance achieved by a human expert is 86.5%.</Paragraph> <Paragraph position="1"> Returning to our original questions of what can and need be learned about the relational properties of verbs, we conclude that argument structure is both a highly useful and learnable aspect of verb knowledge. We observe that relevant semantic properties of verb classes (such as causativity, or animacy of subject) may be successfully approximated through countable syntactic features. In spite of noisy data (arising from diverse sources such as tagging errors, or limitations of our extraction patterns), the lexical properties of interest are reflected in the corpora robustly enough to positively contribute to classification.</Paragraph> <Paragraph position="2"> We remark, however, that deep linguistic analysis cannot be eliminated: in our approach it is embedded in the selection of the features to count.
Specifically, our features are derived through a detailed analysis of the differences in thematic role assignments across the verb classes under investigation. Thus, an important contribution of the work is the proposed mapping between the thematic assignment properties of the verb classes and the statistical distributions of their surface syntactic properties. We think that using such linguistically motivated features makes the approach very effective and easily scalable: we report a 54% reduction in error rate (a 68% reduction when the human expert-based upper bound is considered), using only five features that are readily extractable from automatically annotated corpora.</Paragraph> <!-- Computational Linguistics Volume 27, Number 3 --> </Section></Paper>