<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1012">
  <Title>References</Title>
  <Section position="3" start_page="0" end_page="72" type="intro">
    <SectionTitle>
1 Background
</SectionTitle>
    <Paragraph position="0"> Previous work in finite-state parsing at sentence level falls into two categories: the constructive approach or the reductionist approach.</Paragraph>
    <Paragraph position="1"> The origins of the constructive approach go back to the parser developed by Joshi (Joshi, 1996). It is based on a lexical description of large collections of syntactic patterns (up to several hundred thousand rules) using subcategorisation frames (verbs + essential arguments) and local grammars (Roche, 1993).</Paragraph>
    <Paragraph position="2"> It is, however, still unclear whether this heavily lexicalized method can account for all sentence structures actually found in corpora, especially due to the proliferation of non-argumental complements in corpus analysis.</Paragraph>
    <Paragraph position="3"> Another constructive line of research concentrates on identifying basic phrases such as in the FASTUS information extraction system (Appelt et al., 1993) or in the chunking approach proposed in (Abney,  1991; Federici et al., 1996). Attempts were made to mark the segments with additional syntactic information (e.g. subject or object) (Grefenstette, 1996) using simple heuristics, for the purpose of information retrieval, but not for robust parsing.</Paragraph>
    <Paragraph position="4"> The reductionist approach starts from a large number of alternative analyses that get reduced through the application of constraints. The constraints may be expressed by a set of elimination rules applied in a sequence (Voutilainen, Tapanainen, 1993) or by a set of restrictions applied in parallel (Koskenniemi et al., 1992). In a finite-state constraint grammar (Chanod, Tapanainen, 1996), the initial sentence network represents all the combinations of the lexical readings associated with each token. The acceptable readings result from the intersection of the initial sentence network with the constraint networks. This approach led to very broad coverage analyzers, with good linguistic granularity (the information is richer than in typical chunking systems). However, the size of the intermediate networks resulting from the intersection of the initial sentence network with the sets of constraints raises serious efficiency issues.</Paragraph>
    <Paragraph position="5"> The new approach proposed in this paper aims at merging the constructive and the reductionist approaches, so as to maintain the coverage and granularity of the constraint-based approach at a much lower computational cost. In particular, segments (chunks) are defined by constraints rather than patterns, in order to ensure broader coverage. At the same time, segments are defined in a cautious way, to ensure that clause boundaries and syntactic functions (e.g. subject, object, PP-Obj) can be defined with a high degree of accuracy.</Paragraph>
  </Section>
class="xml-element"></Paper>