<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-2002">
  <Title>An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities</Title>
  <Section position="2" start_page="0" end_page="166" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Context-free grammars are widely used as models of natural language syntax. In their probabilistic version, which defines a language as a probability distribution over strings, they have been used in a variety of applications: for the selection of parses for ambiguous inputs (Fujisaki et al. 1991); to guide the rule choice efficiently during parsing (Jones and Eisner 1992); to compute island probabilities for non-linear parsing (Corazza et al. 1991). In speech recognition, probabilistic context-free grammars play a central role in integrating low-level word models with higher-level language models (Ney 1992), as well as in non-finite-state acoustic and phonotactic modeling (Lari and Young 1991). In some work, context-free grammars are combined with scoring functions that are not strictly probabilistic (Nakagawa 1987), or they are used with context-sensitive and/or semantic probabilities (Magerman and Marcus 1991; Magerman and Weir 1992; Jones and Eisner 1992; Briscoe and Carroll 1993).</Paragraph>
    <Paragraph position="1"> Although clearly not a perfect model of natural language, stochastic context-free grammars (SCFGs) are superior to nonprobabilistic CFGs, with probability theory providing a sound theoretical basis for ranking and pruning of parses, as well as for integration with models for nonsyntactic aspects of language. All of the applications listed above involve (or could potentially make use of) one or more of the following  * Speech Technology and Research Laboratory, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025. E-mail: stolcke@speech.sri.com. (c) 1995 Association for Computational Linguistics Computational Linguistics Volume 21, Number 2 standard tasks, compiled by Jelinek and Lafferty (1991). 1 .</Paragraph>
    <Paragraph position="2"> .</Paragraph>
    <Paragraph position="3"> 3.</Paragraph>
    <Paragraph position="4"> .</Paragraph>
    <Paragraph position="5">  What is the probability that a given string x is generated by a grammar G? What is the single most likely parse (or derivation) for x? What is the probability that x occurs as a prefix of some string generated by G (the prefix probability of x)? How should the parameters (e.g., rule probabilities) in G be chosen to maximize the probability over a training set of strings? The algorithm described in this article can compute solutions to all four of these problems in a single framework, with a number of additional advantages over previously presented isolated solutions.</Paragraph>
    <Paragraph position="6"> Most probabilistic parsers are based on a generalization of bottom-up chart parsing, such as the CYK algorithm. Partial parses are assembled just as in nonprobabilistic parsing (modulo possible pruning based on probabilities), while substring probabilities (also known as &amp;quot;inside&amp;quot; probabilities) can be computed in a straightforward way. Thus, the CYK chart parser underlies the standard solutions to problems (1) and (4) (Baker 1979), as well as (2) (Jelinek 1985). While the Jelinek and Lafferty (1991) solution to problem (3) is not a direct extension of CYK parsing, the authors nevertheless present their algorithm in terms of its similarities to the computation of inside probabilities. null In our algorithm, computations for tasks (1) and (3) proceed incrementally, as the parser scans its input from left to right; in particular, prefix probabilities are available as soon as the prefix has been seen, and are updated incrementally as it is extended. Tasks (2) and (4) require one more (reverse) pass over the chart constructed from the input.</Paragraph>
    <Paragraph position="7"> Incremental, left-to-right computation of prefix probabilities is particularly important since that is a necessary condition for using SCFGs as a replacement for finite-state language models in many applications, such a speech decoding. As pointed out by Jelinek and Lafferty (1991), knowing probabilities P (Xo... xi) for arbitrary prefixes Xo... xi enables probabilistic prediction of possible follow-words Xi+l, as P(xi+l I xo...xi) = P(Xo...xixi+I)/P(xo...xi). These conditional probabilities can then be used as word transition probabilities in a Viterbi-style decoder or to incrementally compute the cost function for a stack decoder (Bahl, Jelinek, and Mercer 1983).</Paragraph>
    <Paragraph position="8"> Another application in which prefix probabilities play a central role is the extraction of n-gram probabilities from SCFGs (Stolcke and Segal 1994). Here, too, efficient incremental computation saves time, since the work for common prefix strings can be shared.</Paragraph>
    <Paragraph position="9"> The key to most of the features of our algorithm is that it is based on the top-down parsing method for nonprobabilistic CFGs developed by Earley (1970). Earley's algorithm is appealing because it runs with best-known complexity on a number of special classes of grammars. In particular, Earley parsing is more efficient than the bottom-up methods in cases where top-down prediction can rule out potential parses of substrings. The worst-case computational expense of the algorithm (either for the</Paragraph>
    <Section position="1" start_page="166" end_page="166" type="sub_section">
      <SectionTitle>
Andreas Stolcke Efficient Probabilistic Context-Free Parsing
</SectionTitle>
      <Paragraph position="0"> known specialized algorithms, but can be substantially better on well-known grammar classes.</Paragraph>
      <Paragraph position="1"> Earley's parser (and hence ours) also deals with any context-free rule format in a seamless way, without requiring conversions to Chomsky Normal Form (CNF), as is often assumed. Another advantage is that our probabilistic Earley parser has been extended to take advantage of partially bracketed input, and to return partial parses on ungrammatical input. The latter extension removes one of the common objections against top-down, predictive (as opposed to bottom-up) parsing approaches (Magerman and Weir 1992).</Paragraph>
    </Section>
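To make the predict/scan/complete control structure concrete, here is a compact nonprobabilistic Earley recognizer (a sketch in our own notation; the article's algorithm augments exactly these three steps with probability computations, and the grammar format and names below are our assumptions):

```python
def earley_recognize(words, grammar, start="S"):
    """Plain Earley recognizer (after Earley 1970); assumes no epsilon rules.

    grammar: dict mapping nonterminal -> list of right-hand sides,
             each RHS a tuple of symbols (representation assumed).
    A state is (lhs, rhs, dot, origin): a dotted rule plus the input
    position where its derivation started.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Prediction: expand the nonterminal after the dot.
                    for new_rhs in grammar[sym]:
                        state = (sym, new_rhs, 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < n and words[i] == sym:
                    # Scanning: the next input word matches a terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completion: lhs is finished; advance states waiting on it.
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, o2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)

    # Accept if some start rule is complete over the whole input.
    return any(l == start and d == len(r) and o == 0
               for (l, r, d, o) in chart[n])

# Toy grammar: S -> S S | 'a'
grammar = {"S": [("S", "S"), ("a",)]}
print(earley_recognize(["a", "a", "a"], grammar))  # True
print(earley_recognize(["a", "b"], grammar))       # False
```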
  </Section>
class="xml-element"></Paper>