<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1040">
  <Title>Learning Verb Argument Structure from Minimally Annotated Corpora</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The classification of verbs based on their underlying thematic structure involves distinguishing verbs that take the same number and category of arguments but assign different thematic roles to these arguments. This is often termed the classification of verb diathesis roles or the lexical semantics of predicates in natural language (see (Levin, 1993; McCarthy and Korhonen, 1998; Stevenson and Merlo, 1999; Stevenson et al., 1999; Lapata, 1999; Lapata and Brew, 1999; Schulte im Walde, 2000)). Following the method described in (Merlo and Stevenson, 2001; Stevenson and Merlo, 1999; Stevenson et al., 1999), we exploit the distributions of some selected features from the local context of a verb, but we differ from these previous studies in the use of minimally annotated data to construct our classifier.</Paragraph>
    <Paragraph position="1"> Acknowledgments: This research was supported in part by NSF grant SBR-8920230. Thanks to Paola Merlo, Dan Gildea, David Chiang, Aravind Joshi and the anonymous reviewers for their comments. Also thanks to Virginie Nanta for an earlier collaboration with the first author on an unsupervised version of this work.</Paragraph>
    <Paragraph position="2"> The data we use is passed only through a part-of-speech tagger and a chunker. The chunker identifies base phrasal categories such as noun-phrase and verb-phrase chunks, which we use to identify the potential arguments of each verb.</Paragraph>
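    <Paragraph> As a minimal sketch of this preprocessing step, and not the authors' implementation, the following Python fragment assumes a hypothetical chunker output format, a list of (chunk label, chunk text) pairs, and collects the noun-phrase chunks that follow each verb-phrase chunk as potential arguments. The names Chunk and candidate_arguments and the labels "NP" and "VP" are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical chunk representation: (chunk label, chunk text).
Chunk = Tuple[str, str]


def candidate_arguments(chunks: List[Chunk]) -> List[Tuple[str, List[str]]]:
    """For each VP chunk, collect the NP chunks that follow it,
    up to the next VP chunk, as potential arguments of that verb."""
    results: List[Tuple[str, List[str]]] = []
    current = None
    for label, text in chunks:
        if label == "VP":
            # Start a new verb entry with an empty argument list.
            current = (text, [])
            results.append(current)
        elif label == "NP" and current is not None:
            # Any NP after the verb is only a *potential* argument;
            # this sketch does not separate arguments from adjuncts.
            current[1].append(text)
    return results


if __name__ == "__main__":
    sentence = [("NP", "The boy"), ("VP", "washed"), ("NP", "the hall")]
    print(candidate_arguments(sentence))
```

Under these assumptions, "The boy washed" produces a verb with no following noun phrase and "The boy washed the hall" produces one with a single following noun phrase. Noun phrases inside prepositional phrases are also collected, so a real system would still need a learning step to decide which candidates are true arguments.</Paragraph>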
    <Paragraph position="3"> Lexical knowledge acquisition plays an important role in corpus-based NLP. Knowledge of verb selectional preferences and verb subcategorization frames (SFs) can be extracted from corpora for use in various NLP tasks. However, knowledge of SFs is often not fine-grained enough to distinguish various verbs and the kinds of arguments that they can select. We consider a difficult task in lexical knowledge acquisition: that of finding the underlying argument structure which can be used to relate the observed list of SFs of a particular verb. The task involves identifying the roles assigned by the verb to its arguments. Consider the following verbs, each occurring with intransitive and transitive SFs (see footnote 1).</Paragraph>
    <Paragraph position="4"> Unergative  (1) a. The horse raced past the barn.</Paragraph>
    <Paragraph position="5"> b. The jockey raced the horse past the barn.</Paragraph>
    <Paragraph position="6"> Unaccusative (2) a. The butter melted in the pan.</Paragraph>
    <Paragraph position="7"> b. The cook melted the butter in the pan.</Paragraph>
    <Paragraph position="8"> Object-Drop (3) a. The boy washed.</Paragraph>
    <Paragraph position="9"> b. The boy washed the hall. [Footnote 1: The examples are taken from (Merlo and Stevenson, 2001). See (Levin, 1993) for more information. The particular categorization that we use here is motivated in (Stevenson and Merlo, 1997).]</Paragraph>
    <Paragraph position="10"> Each of the verbs above occurs with both the intransitive and transitive SFs. However, the verbs differ in their underlying argument structure. Each verb assigns a different role to its arguments in the two subcategorization possibilities. For each verb above, the following lists the roles assigned to each of the noun phrase arguments in the SFs permitted for the verb. This information can be used for extracting appropriate information about the relationships between the verb and its arguments.</Paragraph>
    <Paragraph position="11"> TRAN: NP_agent washed NP_theme
Our task is to identify the transitive and intransitive usages of a particular verb as being related via this notion of argument structure. This is called the argument structure classification of the verb. In the remainder of this paper, we look at the problem of placing verbs into such classes automatically.</Paragraph>
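    <Paragraph> The relation between the two frames of each class can be written down as a small lookup table. The following Python sketch is illustrative only: the Object-Drop transitive entry (agent subject, theme object) is the one given in the text, while the remaining entries are assumptions that follow the analysis of these three classes in (Merlo and Stevenson, 2001), with "causer" standing in for the agent of causation. The names ROLES and intransitive_subject_maps_to are hypothetical.

```python
from typing import Dict, Tuple

# Thematic roles per verb class and frame, listed as (subject,) for the
# intransitive frame and (subject, object) for the transitive frame.
# Only the object-drop TRAN entry is taken directly from the text;
# the other entries are assumptions based on the cited literature.
ROLES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "unergative": {"INTRAN": ("agent",), "TRAN": ("causer", "agent")},
    "unaccusative": {"INTRAN": ("theme",), "TRAN": ("causer", "theme")},
    "object-drop": {"INTRAN": ("agent",), "TRAN": ("agent", "theme")},
}


def intransitive_subject_maps_to(verb_class: str) -> str:
    """Return the transitive position ("subject" or "object") that carries
    the same role as the subject of the intransitive frame."""
    intran_role = ROLES[verb_class]["INTRAN"][0]
    subj_role, _obj_role = ROLES[verb_class]["TRAN"]
    return "subject" if subj_role == intran_role else "object"


if __name__ == "__main__":
    for name in ROLES:
        print(name, intransitive_subject_maps_to(name))
```

In this table the intransitive subject of "raced" and "melted" corresponds to the transitive object, whereas the intransitive subject of "washed" corresponds to the transitive subject. Unergative and unaccusative verbs are then separated by the role itself (agent versus theme), which is why surface frames alone do not determine the class.</Paragraph>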
    <Paragraph position="12"> Our results in this paper serve as a replication and extension of the results in (Merlo and Stevenson, 2001). Our main contribution is to show that a subcategorization frame (SF) learning algorithm previously applied to Czech (Sarkar and Zeman, 2000) can be applied to English and evaluated by classifying verbs into verb alternation classes. We perform this task using only tagged and chunked data as input to our subcategorization frame learning stage. Our result can be compared to previous work (Merlo and Stevenson, 2001), which did not use SF learning but used a 65M-word WSJ corpus that was tagged as well as automatically parsed with a Treebank-trained statistical parser.</Paragraph>
    <Paragraph position="13"> It is important to note that (Merlo and Stevenson, 2001) extract some features from the tagged text (in fact, the same features that we extract using SF learning) and other features from parse trees.</Paragraph>
  </Section>
</Paper>