<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1029">
  <Title>Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A long-standing linguistic hypothesis asserts a tight connection between the meaning components of a verb and its syntactic behaviour: To a certain extent, the lexical meaning of a verb determines its behaviour, particularly with respect to the choice of its arguments. The theoretical foundation has been established in extensive work on semantic verb classes such as (Levin, 1993) for English and (Vazquez et al., 2000) for Spanish: each verb class contains verbs which are similar in their meaning and in their syntactic properties.</Paragraph>
    <Paragraph position="1"> From a practical point of view, a verb classification supports Natural Language Processing tasks, since it provides a principled basis for filling gaps in available lexical knowledge. For example, the English verb classification has been used for applications such as machine translation (Dorr, 1997), word sense disambiguation (Dorr and Jones, 1996), and document classification (Klavans and Kan, 1998).</Paragraph>
    <Paragraph position="2"> Various attempts have been made to infer conveniently observable morpho-syntactic and semantic properties for English verb classes (Dorr and Jones, 1996; Lapata, 1999; Stevenson and Merlo, 1999; Schulte im Walde, 2000; McCarthy, 2001).</Paragraph>
    <Paragraph position="3"> To our knowledge this is the first work to obtain German verb classes automatically. We used a robust statistical parser (Schmid, 2000) to acquire purely syntactic subcategorisation information for verbs. The information was provided in the form of probability distributions over verb frames for each verb. There were two conditions: the first with relatively coarse syntactic verb subcategorisation frames, the second a more delicate classification subdividing the verb frames of the first condition using prepositional phrase information (case plus preposition). In both conditions verbs were clustered using k-Means, an iterative, unsupervised, hard clustering method with well-known properties, cf. (Kaufman and Rousseeuw, 1990). The goal of a series of cluster analyses was (i) to find good values for the parameters of the clustering process, and (ii) to explore the role of the syntactic frame descriptions in verb classification, to demonstrate the implicit induction of lexical meaning components from syntactic properties, and to suggest ways in which the syntactic information might further be refined.</Paragraph>
    <Paragraph position="4"> Our long term goal is to support the development of high-quality and large-scale lexical resources. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 223-230.</Paragraph>
  </Section>
</Paper>