<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1601"> <Title>Unsupervised Discovery of a Statistical Verb Lexicon</Title> <Section position="3" start_page="0" end_page="1" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> An important source of ambiguity that must be resolved by any natural language understanding system is the mapping between syntactic dependents of a predicate and the semantic roles that they each express. The ambiguity stems from the fact that each predicate can allow several alternate mappings, or linkings, between its semantic roles and their syntactic realization. For example, the verb increase can be used in two ways: (1) The Fed increased interest rates.</Paragraph> <Paragraph position="1"> (2) Interest rates increased yesterday.</Paragraph> <Paragraph position="2"> The instances have apparently similar surface syntax: they both have a subject and a noun phrase directly following the verb. However, while the subject of increase expresses the agent role in the first, it instead expresses the patient role in the second. Such pairs of linkings allowed by a single predicate are often called diathesis alternations (Levin, 1993).</Paragraph> <Paragraph position="3"> The current state-of-the-art approach to resolving this ambiguity is to use discriminative classifiers, trained on hand-tagged data, to classify the semantic role of each dependent (Gildea and Jurafsky, 2002; Pradhan et al., 2005; Punyakanok et al., 2005). A drawback of this approach is that even a relatively large training corpus exhibits considerable sparsity of evidence. The two main hand-tagged corpora are PropBank (Palmer et al., 2003) and FrameNet (Baker et al., 1998), the former of which currently has broader coverage. However, even PropBank, which is based on the 1M word WSJ section of the Penn Treebank, is insufficient in quantity and genre to exhibit many common verb behaviors. 
A perfectly common verb like flap occurs only twice, across all morphological forms. The first example is an adjectival use (flapping wings), and the second is a rare intransitive use with an agent argument and a path (ducks flapping over Washington). From this data, one cannot learn the basic alternation pattern for flap: the bird flapped its wings vs. the wings flapped.</Paragraph> <Paragraph position="4"> We propose to address the challenge of data sparsity by learning models of verb behavior directly from raw unannotated text, of which there is plenty. This approach has the added advantages of extending easily to novel text genres and languages and of possibly shedding light on the question of human language acquisition. The models learned by our unsupervised approach provide a new broad-coverage lexical resource that gives statistics about verb behavior, information that may prove useful in other language processing tasks, such as parsing. Moreover, they may be used discriminatively to label novel verb instances with semantic roles. Thus we evaluate them both in terms of the verb alternations that they learn and their accuracy as semantic role labelers.</Paragraph> <Paragraph position="5"> This work bears some similarity to the substantial literature on automatic subcategorization frame acquisition (see, e.g., Manning (1993), Briscoe and Carroll (1997), and Korhonen (2002)). However, that research focuses on acquiring verbs' syntactic behavior, whereas we focus on acquiring verbs' linking behavior. 
</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> Relation Description </SectionTitle> <Paragraph position="0">
subj     NP preceding verb
np#n     NP in the nth position following verb
np       NP that is not the subject and not immediately following verb
cl#n     Complement clause in the nth position following verb
cl       Complement clause not immediately following verb
xcl#n    Complement clause without subject in the nth position following verb
xcl      Complement clause without subject not immediately following verb
{1,2,3} and x is a preposition.</Paragraph> <Paragraph position="1"> More relevant is the work of McCarthy and Korhonen (1998), which used a statistical model to identify verb alternations, relying on an existing taxonomy of possible alternations, as well as Lapata (1999), which searched a large corpus to find evidence of two particular verb alternations. There has also been some work on both clustering and supervised classification of verbs based on their alternation behavior (Stevenson and Merlo, 1999; Schulte im Walde, 2000; Merlo and Stevenson, 2001). Finally, Swier and Stevenson (2004) perform unsupervised semantic role labeling by using hand-crafted verb lexicons to replace supervised semantic role training data. However, we believe this is the first system to simultaneously discover verb roles and verb linking patterns from unsupervised data using a unified probabilistic model.</Paragraph> </Section> </Section> </Paper>