<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1036">
  <Title>Dynamic programming for parsing and estimation of stochastic uni cation-based grammars</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Stochastic Uni cation-Based Grammars (SUBGs) use log-linear models (also known as exponential or MaxEnt models and Markov Random Fields) to dene probability distributions over the parses of a unication grammar. These grammars can incorporate virtually all kinds of linguistically important constraints (including non-local and non-context-free constraints), and are equipped with a statistically sound framework for estimation and learning.</Paragraph>
    <Paragraph position="1"> Abney (1997) pointed out that the non-context-free dependencies of a uni cation grammar require stochastic models more general than Probabilistic Context-Free Grammars (PCFGs) and Markov Branching Processes, and proposed the use of log-linear models for de ning probability distributions over the parses of a uni cation grammar. Unfortunately, the maximum likelihood estimator Abney proposed for SUBGs seems computationally intractable since it requires statistics that depend on the set of all parses of all strings generated by the grammar. This set is in nite (so exhaustive enumeration is impossible) and presumably has a very complex structure (so sampling estimates might take an extremely long time to converge).</Paragraph>
    <Paragraph position="2"> Johnson et al. (1999) observed that parsing and related tasks only require conditional distributions over parses given strings, and that such conditional distributions are considerably easier to estimate than joint distributions of strings and their parses. The conditional maximum likelihood estimator proposed by Johnson et al. requires statistics that depend on the set of all parses of the strings in the training cor-Computational Linguistics (ACL), Philadelphia, July 2002, pp. 279-286. Proceedings of the 40th Annual Meeting of the Association for pus. For most linguistically realistic grammars this set is nite, and for moderate sized grammars and training corpora this estimation procedure is quite feasible.</Paragraph>
    <Paragraph position="3"> However, our recent experiments involve training from the Wall Street Journal Penn Tree-bank, and repeatedly enumerating the parses of its 50,000 sentences is quite time-consuming. Matters are only made worse because we have moved some of the constraints in the grammar from the uni cation component to the stochastic component. This broadens the coverage of the grammar, but at the expense of massively expanding the number of possible parses of each sentence.</Paragraph>
    <Paragraph position="4"> In the mid-1990s uni cation-based parsers were developed that do not enumerate all parses of a string but instead manipulate and return a packed representation of the set of parses. This paper describes how to nd the most probable parse and the statistics required for estimating a SUBG from the packed parse set representations proposed by Maxwell III and Kaplan (1995). This makes it possible to avoid explicitly enumerating the parses of the strings in the training corpus.</Paragraph>
    <Paragraph position="5"> The methods proposed here are analogues of the well-known dynamic programming algorithms for Probabilistic Context-Free Grammars (PCFGs); speci cally the Viterbi algorithm for nding the most probable parse of a string, and the Inside-Outside algorithm for estimating a PCFG from unparsed training data.1 In fact, because Maxwell and Kaplan packed representations are just Truth Maintenance System (TMS) representations (Forbus and de Kleer, 1993), the statistical techniques described here should extend to non-linguistic applications of TMSs as well.</Paragraph>
    <Paragraph position="6"> Dynamic programming techniques have been applied to log-linear models before.</Paragraph>
    <Paragraph position="7"> Lafferty et al. (2001) mention that dynamic programming can be used to compute the statistics required for conditional estimation of log-linear models based on context-free grammars where the properties can include arbitrary functions of the input string. Miyao and Tsujii (2002) (which 1However, because we use conditional estimation, also known as discriminative training, we require at least some discriminating information about the correct parse of a string in order to estimate a stochastic uni cation grammar.</Paragraph>
    <Paragraph position="8"> appeared after this paper was accepted) is the closest related work we know of. They describe a technique for calculating the statistics required to estimate a log-linear parsing model with non-local properties from packed feature forests.</Paragraph>
    <Paragraph position="9"> The rest of this paper is structured as follows.</Paragraph>
    <Paragraph position="10"> The next section describes uni cation grammars and Maxwell and Kaplan packed representation.</Paragraph>
    <Paragraph position="11"> The following section reviews stochastic uni cation grammars (Abney, 1997) and the statistical quantities required for ef ciently estimating such grammars from parsed training data (Johnson et al., 1999). The nal substantive section of this paper shows how these quantities can be de ned directly in terms of the Maxwell and Kaplan packed representations. null The notation used in this paper is as follows. Variables are written in upper case italic, e.g., X;Y , etc., the sets they range over are written in script, e.g., X;Y, etc., while speci c values are written in lower case italic, e.g., x;y, etc. In the case of vector-valued entities, subscripts indicate particular components.</Paragraph>
  </Section>
</Paper>