<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2006">
  <Title>Finding non-local dependencies: beyond pattern matching</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Non-local dependencies (also called long-distance, long-range or unbounded) appear in many frequent linguistic phenomena, such as passivization, WH-movement, control and raising. Although much current research in natural language parsing focuses on extracting local syntactic relations from text, non-local dependencies have recently started to attract more attention. In (Clark et al., 2002), long-range dependencies are included in the parser's probabilistic model, while Johnson (2002) presents a method for recovering non-local dependencies after parsing has been performed.</Paragraph>
    <Paragraph position="1"> More specifically, Johnson (2002) describes a pattern-matching algorithm for inserting empty nodes and identifying their antecedents in phrase structure trees or, to put it differently, for recovering non-local dependencies. From a training corpus with annotated empty nodes, Johnson's algorithm first extracts those local fragments of phrase trees which connect empty nodes with their antecedents, thus &quot;licensing&quot; corresponding non-local dependencies. Next, the extracted tree fragments are used as patterns to match against previously unseen phrase structure trees: when a pattern is matched, the algorithm introduces a corresponding non-local dependency, inserting an empty node and (possibly) co-indexing it with a suitable antecedent.</Paragraph>
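The extract-then-match cycle described above can be illustrated with a minimal sketch. All representations here are simplifications for exposition (trees as nested tuples, empty nodes marked "*T*", patterns reduced to local label skeletons); this is not Johnson's actual implementation.

```python
# Illustrative sketch of pattern extraction and matching for empty-node
# recovery. Trees are nested tuples (label, child1, ...); leaves are strings;
# "*T*" marks an empty node. Assumptions, not the paper's real data format.

def extract_patterns(tree, patterns=None):
    """Collect the local fragments (as label skeletons) that contain an empty node."""
    if patterns is None:
        patterns = set()
    if isinstance(tree, str):
        return patterns
    label, *children = tree
    if any(child == "*T*" for child in children):
        # Keep only the skeleton: this node's label plus its children's labels.
        skeleton = (label,) + tuple(
            c if isinstance(c, str) else c[0] for c in children
        )
        patterns.add(skeleton)
    for child in children:
        extract_patterns(child, patterns)
    return patterns

def match_pattern(tree, skeleton):
    """True if some subtree matches the skeleton once its empty node is
    removed, i.e. the pattern would license inserting an empty node there."""
    if isinstance(tree, str):
        return False
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    target = tuple(x for x in skeleton[1:] if x != "*T*")
    if label == skeleton[0] and child_labels == target:
        return True
    return any(match_pattern(c, skeleton) for c in children)
```

For example, training on a relative clause such as ("SBAR", ("WHNP", "which"), ("S", ("NP", "I"), ("VP", "saw", "*T*"))) yields the pattern ("VP", "saw", "*T*"), which then matches a trace-less VP "saw" in an unseen tree and signals where an empty node should be inserted.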
    <Paragraph position="2"> In (Johnson, 2002) the author notes that the biggest weakness of the algorithm seems to be that it fails to robustly distinguish co-indexed and free empty nodes, and that lexicalization may be needed to solve this problem. Moreover, the author suggests that the algorithm may suffer from overlearning, and that using more abstract &quot;skeletal&quot; patterns may help to avoid this.</Paragraph>
    <Paragraph position="3"> In an attempt to overcome these problems we developed a similar approach using dependency structures rather than phrase structure trees, which, moreover, extends bare pattern matching with machine learning techniques. A different definition of pattern allows us to significantly reduce the number of patterns extracted from the same corpus. Moreover, the patterns we obtain are quite general and in most cases directly correspond to specific linguistic phenomena. This helps us to understand what information about syntactic structure is important for the recovery of non-local dependencies and in which cases lexicalization (or even semantic analysis) is required. On the other hand, using these simplified patterns, we may lose some structural information important for the recovery of non-local dependencies. To avoid this, we associate patterns with certain structural features and use statistical classification methods on top of pattern matching.</Paragraph>
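The idea of layering a statistical classifier over pattern matches could be sketched as follows. The pattern names, the structural features (depth of the match site, presence of a dominating WH-word), and the frequency-based decision rule are all hypothetical stand-ins chosen for illustration, not the classifier actually used in the paper.

```python
from collections import Counter, defaultdict

# Illustrative sketch: each pattern match is paired with structural features,
# and a per-(pattern, features) frequency table decides which non-local
# dependency (if any) to introduce. Not the authors' actual model.

class PatternClassifier:
    def __init__(self):
        # (pattern, features) -> Counter over observed decisions
        self.counts = defaultdict(Counter)

    def train(self, pattern, features, decision):
        self.counts[(pattern, features)][decision] += 1

    def predict(self, pattern, features):
        # Unseen (pattern, features) pairs default to "no dependency".
        dist = self.counts.get((pattern, features))
        if not dist:
            return "none"
        return dist.most_common(1)[0][0]

clf = PatternClassifier()
# Hypothetical features: (depth of match, WH-word dominating the site?)
clf.train("VP->saw *T*", (2, True), "co-indexed trace")
clf.train("VP->saw *T*", (2, True), "co-indexed trace")
clf.train("VP->saw *T*", (1, False), "free empty node")
```

Under this toy scheme, a match of the same pattern is resolved differently depending on its structural context, which is precisely the distinction (co-indexed versus free empty nodes) that bare pattern matching fails to make.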
    <Paragraph position="4"> The evaluation of our algorithm on data automatically derived from the Penn Treebank shows an increase of approximately 10% in both precision and recall for the recovery of non-local dependencies over the results reported in (Johnson, 2002). However, additional work remains to be done for our algorithm to perform well on the output of a parser.</Paragraph>
  </Section>
</Paper>