<?xml version="1.0" standalone="yes"?> <Paper uid="J99-4004"> <Title>Semiring Parsing</Title> <Section position="2" start_page="0" end_page="578" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> For a given grammar and string, there are many interesting quantities we can compute.</Paragraph> <Paragraph position="1"> We can determine whether the string is generated by the grammar; we can enumerate all of the derivations of the string; and, if the grammar is probabilistic, we can compute the inside and outside probabilities of components of the string. Traditionally, a different parser description has been needed to compute each of these values. For some parsers, such as CKY parsers, all of these algorithms (except for the outside parser) strongly resemble each other. For other parsers, such as Earley parsers, the algorithms for computing each value are somewhat different, and constructing each one can require a fair amount of work. We present a formalism for describing parsers such that a single simple description can be used to generate parsers that compute all of these quantities and others. This will be especially useful for finding parsers for outside values, and for parsers that can handle general grammars, such as Earley-style parsers.</Paragraph> <Paragraph position="2"> Although our description format is not limited to context-free grammars (CFGs), we will begin by considering parsers for this common formalism. The input string will be denoted $w_1 w_2 \cdots w_n$. We will refer to the complete string as the sentence. A CFG $G$ is a 4-tuple $(N, \Sigma, R, S)$, where $N$ is the set of nonterminals, including the start symbol $S$; $\Sigma$ is the set of terminal symbols; and $R$ is the set of rules, each of the form $A \rightarrow \alpha$ for $A \in N$ and $\alpha \in (N \cup \Sigma)^*$. 
We will use the symbol $\Rightarrow$ for immediate derivation and $\stackrel{*}{\Rightarrow}$ for its reflexive, transitive closure.</Paragraph> <Paragraph position="3"> We will illustrate the similarity of parsers for computing different values using the CKY algorithm as an example. We can write this algorithm in its iterative form as shown in Figure 1. Here, we explicitly construct a Boolean chart, chart[1..n, 1..|N|, 1..n+1]. Element chart[i, A, j] contains TRUE if and only if $A \stackrel{*}{\Rightarrow} w_i \cdots w_{j-1}$. The algorithm consists of a first set of loops to handle the singleton productions, a second set of loops to handle the binary productions, and a return of the start symbol's chart entry.</Paragraph> <Paragraph position="4"> Next, we consider probabilistic grammars, in which we associate a probability with every rule, $P(A \rightarrow \alpha)$.
* One Microsoft Way, Redmond, WA 98052. E-mail: joshuago@microsoft.com. © 1999 Association for Computational Linguistics. Computational Linguistics, Volume 25, Number 4.
boolean chart[1..n, 1..|N|, 1..n+1] := FALSE;
for s := 1 to n /* start position */
    for each rule A → w_s ∈ R
        chart[s, A, s+1] := TRUE;
for l := 2 to n /* length, shortest to longest */
    for s := 1 to n-l+1 /* start position */
        for t := 1 to l-1 /* split length */
            for each rule A → BC ∈ R
                /* extra TRUE for expository purposes */
                chart[s, A, s+l] := chart[s, A, s+l] ∨
                    (chart[s, B, s+t] ∧ chart[s+t, C, s+l] ∧ TRUE);
return chart[1, S, n+1];
Figure 1 CKY recognition algorithm.</Paragraph> <Paragraph position="5"> float chart[1..n, 1..|N|, 1..n+1] := 0;
for s := 1 to n /* start position */
    for each rule A → w_s ∈ R
        chart[s, A, s+1] := P(A → w_s);
for l := 2 to n /* length, shortest to longest */
    for s := 1 to n-l+1 /* start position */
        for t := 1 to l-1 /* split length */
            for each rule A → BC ∈ R
                chart[s, A, s+l] := chart[s, A, s+l] +
                    (chart[s, B, s+t] × chart[s+t, C, s+l] × P(A → BC));
return chart[1, S, n+1];
Figure 2 CKY inside algorithm.</Paragraph> <Paragraph position="6"> These probabilities can be used to associate a probability with a particular derivation, equal to the product of the rule probabilities used in the derivation, or to associate a probability with a set of derivations, $A \stackrel{*}{\Rightarrow} w_i \cdots w_{j-1}$, equal to the sum of the probabilities of the individual derivations. We call this latter probability the inside probability of $i, A, j$. We can rewrite the CKY algorithm to compute the inside probabilities, as shown in Figure 2 (Baker 1979; Lari and Young 1990). Notice how similar the inside algorithm is to the recognition algorithm: essentially, all that has been done is to substitute $+$ for $\vee$, $\times$ for $\wedge$, and $P(A \rightarrow w_s)$ and $P(A \rightarrow BC)$ for TRUE. For many parsing algorithms, this, or a similarly simple modification, is all that is needed to create a probabilistic version of the algorithm. On the other hand, a simple substitution is not always sufficient. To give a trivial example, if in the CKY recognition algorithm we had written</Paragraph> <Paragraph position="8"> instead of the less natural chart[s, A, s+l] := chart[s, A, s+l] ∨ chart[s, B, s+t] ∧ chart[s+t, C, s+l] ∧ TRUE; larger changes would be necessary to create the inside algorithm.</Paragraph> <Paragraph position="9"> Besides recognition, four other quantities are commonly computed by parsing algorithms: derivation forests, Viterbi scores, number of parses, and outside probabilities. The first quantity, a derivation forest, is a data structure that allows one to</Paragraph> <Section position="1" start_page="574" end_page="576" type="sub_section"> <SectionTitle> Goodman Semiring Parsing </SectionTitle> <Paragraph position="0"> efficiently compute the set of legal derivations of the input string. The derivation forest is typically found by modifying the recognition algorithm to keep track, for each cell, of &quot;back pointers&quot; recording how it was produced. 
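A minimal Python sketch of the substitution described above may make it concrete: it writes the loops of Figures 1 and 2 once and takes the additive operation, the multiplicative operation, the zero, and the value standing in for TRUE from a semiring object, so that swapping the object turns the recognizer into the inside algorithm. The names `Semiring` and `cky`, the tuple encoding of rules, and the example grammar S → XX (1.0), X → XX (0.2), X → x (0.8) are assumptions made for illustration; the grammar is reconstructed from the arithmetic of the worked example later in this section, not quoted from the paper. The Viterbi (max, ×) and counting semirings anticipate the quantities discussed next. The sketch assumes, as Figure 1 does, a grammar with only terminal unary rules and binary rules, and it makes no attempt at the infinite sums that more general grammars require.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    zero: Any                           # additive identity: value of an empty chart cell
    plus: Callable[[Any, Any], Any]     # plays the role of OR in Figure 1
    times: Callable[[Any, Any], Any]    # plays the role of AND in Figure 1
    rule_value: Callable[[float], Any]  # what stands in for TRUE, given P(rule)


BOOLEAN = Semiring(False, lambda a, b: a or b, lambda a, b: a and b, lambda p: True)
INSIDE = Semiring(0.0, lambda a, b: a + b, lambda a, b: a * b, lambda p: p)
VITERBI = Semiring(0.0, max, lambda a, b: a * b, lambda p: p)
COUNTING = Semiring(0, lambda a, b: a + b, lambda a, b: a * b, lambda p: 1)  # ignores P(rule)


def cky(words: List[str],
        unary: List[Tuple[str, str, float]],        # (A, terminal, P(A -> terminal))
        binary: List[Tuple[str, str, str, float]],  # (A, B, C, P(A -> B C))
        start: str, sr: Semiring) -> Any:
    """Loops of Figures 1 and 2 with the semiring operations swapped in.

    Chart keys (s, A, e) are 1-based, with e one past the last word, as in the figures.
    """
    n = len(words)
    chart: Dict[Tuple[int, str, int], Any] = {}

    def get(s: int, a: str, e: int) -> Any:
        return chart.get((s, a, e), sr.zero)

    for s in range(1, n + 1):                       # start position
        for a, w, p in unary:
            if w == words[s - 1]:
                chart[(s, a, s + 1)] = sr.rule_value(p)
    for length in range(2, n + 1):                  # length, shortest to longest
        for s in range(1, n - length + 2):          # start position
            for t in range(1, length):              # split length
                for a, b, c, p in binary:
                    val = sr.times(sr.times(get(s, b, s + t), get(s + t, c, s + length)),
                                   sr.rule_value(p))
                    chart[(s, a, s + length)] = sr.plus(get(s, a, s + length), val)
    return get(1, start, n + 1)


words = ["x", "x", "x"]
unary = [("X", "x", 0.8)]
binary = [("S", "X", "X", 1.0), ("X", "X", "X", 0.2)]
for name, sr in [("boolean", BOOLEAN), ("inside", INSIDE),
                 ("viterbi", VITERBI), ("counting", COUNTING)]:
    print(name, cky(words, unary, binary, "S", sr))
```

On the input xxx, this should print True for recognition, approximately 0.2048 for the inside value (matching the worked example, up to floating-point rounding), approximately 0.1024 for the Viterbi score, and 2 for the number of parses.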
The second quantity often computed is the Viterbi score, the probability of the most probable derivation of the sentence.</Paragraph> <Paragraph position="1"> This can typically be computed by substituting $\times$ for $\wedge$ and max for $\vee$. Less commonly computed is the total number of parses of the sentence, which, like the inside values, can be computed using multiplication and addition; unlike for the inside values, however, the probabilities of the rules are not multiplied into the scores. There is one last commonly computed quantity, the outside probabilities, which we will describe later, in Section 4.</Paragraph> <Paragraph position="2"> One of the key points of this paper is that all five of these commonly computed quantities can be described as elements of complete semirings (Kuich 1997).</Paragraph> <Paragraph position="3"> The relationship between grammars and semirings was discovered by Chomsky and Schützenberger (1963), and, for parsing with the CKY algorithm, dates back to Teitelbaum (1973). A complete semiring is a set of values over which a multiplicative operator and a commutative additive operator have been defined, and for which infinite summations are defined. For parsing algorithms satisfying certain conditions, the multiplicative and additive operations of any complete semiring can be used in place of $\wedge$ and $\vee$, and correct values will be returned. We will give a simple normal form for describing parsers, then precisely define complete semirings and the conditions for correctness.</Paragraph> <Paragraph position="4"> We now describe our normal form for parsers, which is very similar to that used by Shieber, Schabes, and Pereira (1995) and by Sikkel (1993). This work can be thought of as a generalization from their work in the Boolean semiring to semirings in general.</Paragraph> <Paragraph position="5"> In most parsers, there is at least one chart of some form. In our normal form, we will use a corresponding, equivalent concept, items. 
Rather than, for instance, a chart element chart[i, A, j], we will use an item [i, A, j]. Furthermore, rather than using explicit procedural descriptions, such as</Paragraph> <Paragraph position="7"> we will use inference rules such as $$\frac{R(A \rightarrow BC) \quad [i, B, k] \quad [k, C, j]}{[i, A, j]}$$</Paragraph> <Paragraph position="9"> The meaning of an inference rule is that if the top line is all true, then we can conclude the bottom line. For instance, this example inference rule can be read as saying that if $A \rightarrow BC$ and $B \stackrel{*}{\Rightarrow} w_i \cdots w_{k-1}$ and $C \stackrel{*}{\Rightarrow} w_k \cdots w_{j-1}$, then $A \stackrel{*}{\Rightarrow} w_i \cdots w_{j-1}$.</Paragraph> <Paragraph position="10"> The general form for an inference rule will be $$\frac{A_1 \quad \cdots \quad A_k}{B}$$</Paragraph> <Paragraph position="12"> where, if the conditions $A_1, \ldots, A_k$ are all true, then we infer that $B$ is also true. The $A_i$ can be either items or (in an extension of the usual convention for inference rules) rules, such as $R(A \rightarrow BC)$. We write $R(A \rightarrow BC)$ rather than $A \rightarrow BC$ to indicate that we could be interested in a value associated with the rule, such as the probability of the rule if we were computing inside probabilities. If an $A_i$ is of the form $R(\ldots)$, we call it a rule. All of the $A_i$ must be rules or items; when we wish to refer to both rules and items, we use the word terms.</Paragraph> <Paragraph position="13"> We now give an example of an item-based description and its semantics. Figure 3 gives a description of a CKY-style parser. For this example, we will use the inside</Paragraph> <Paragraph position="15"> Our first step is to use the unary rule, $$\frac{R(A \rightarrow w_i)}{[i, A, i+1]}$$</Paragraph> <Paragraph position="17"> The effect of the unary rule will exactly parallel the first set of loops in the CKY inside algorithm. We will instantiate the free variables of the unary rule in every possible way. For instance, we instantiate the free variable $i$ with the value 1, and the free variable $A$ with the nonterminal $X$. Since $w_1 = x$, the instantiated rule is then $$\frac{R(X \rightarrow x)}{[1, X, 2]}$$
Since wl = x, the instantiated rule is then</Paragraph> <Paragraph position="19"> Because the value of the top line of the instantiated unary rule, R(X ---, x), has value 0.8, we deduce that the bottom line, \[1,X, 2\], has value 0.8. We instantiate the rule in two other ways, and compute the following chart values:</Paragraph> <Paragraph position="21"> Next, we will use the binary rule, R(A --* BC) \[i, B, k\] \[k, C,j\] \[i,A,j\] The effect of the binary rule will parallel the second set of loops for the CKY inside algorithm. Consider the instantiation i = 1, k -- 2, j = 3, A -- X, B = X, C -- X,</Paragraph> <Paragraph position="23"/> </Section> <Section position="2" start_page="576" end_page="576" type="sub_section"> <SectionTitle> Goodman Semiring Parsing </SectionTitle> <Paragraph position="0"> We use the multiplicative operator of the semiring of interest to multiply together the values of the top line, deducing that \[1, X, 3\] = 0.2 x 0.8 x 0.8 = 0.128. Similarly,</Paragraph> <Paragraph position="2"> There are two more ways to instantiate the conditions of the binary rule:</Paragraph> <Paragraph position="4"> The first has the value 1 x 0.8 x 0.128 = 0.1024, and the second also has the value 0.1024. When there is more than one way to derive a value for an item, we use the additive operator of the semiring to sum them up. Thus, \[1, S, 4\] -- 0.2048. Since \[1, S, 4\] is the goal item for the CKY parser, we know that the inside value for xxx is 0.2048.</Paragraph> <Paragraph position="5"> The goal item exactly parallels the return statement of the CKY inside algorithm.</Paragraph> </Section> <Section position="3" start_page="576" end_page="577" type="sub_section"> <SectionTitle> 1.1 Earley Parsing </SectionTitle> <Paragraph position="0"> Many parsers are much more complicated than the CKY parser, and we will need to expand our notation a bit to describe them. Earley's algorithm (Earley 1970) exhibits most of the complexities we wish to discuss. 
Earley's algorithm is often described as a bottom-up parser with top-down filtering. In a probabilistic framework, the bottom-up sections compute probabilities, while the top-down filtering nonprobabilistically removes items that cannot be derived. To capture these differences, we expand our notation for deduction rules to the following:</Paragraph> <Paragraph position="2"> conditions with values in whichever semiring we are using. While the values of all main conditions are multiplied together to yield the value for the item under the line, the side conditions are interpreted in a Boolean manner: if all of them are nonzero, the rule can be used, but if any of them is zero, it cannot be. Other than for checking whether they are zero or nonzero, their values are ignored.</Paragraph> <Paragraph position="3"> Figure 4 gives an item-based description of Earley's parser. We assume the addition of a distinguished nonterminal $S'$ with a single rule $S' \rightarrow S$. An item of the form $[i, A \rightarrow \alpha \bullet \beta, j]$ asserts that $A \Rightarrow \alpha\beta \stackrel{*}{\Rightarrow} w_i \cdots w_{j-1}\beta$.</Paragraph> <Paragraph position="4"> variables in the A, B, C). The side conditions usually cannot depend on any contextual information, such as the grandfather of $A_1$, which would not be well defined, since there might be many derivations of $A_1$. Of course, one could encode the grandfather of $A_1$ as a variable in the item $A_1$, and then have a dependency on that variable. This would guarantee that the context was unique and well defined. The prediction rule includes a side condition, making it a good example. The rule is: $$\frac{R(B \rightarrow \gamma)}{[j, B \rightarrow \bullet \gamma, j]} \quad [i, A \rightarrow \alpha \bullet B\beta, j]$$ Through the prediction rule, Earley's algorithm guarantees that an item of the form $[j, B \rightarrow \bullet \gamma, j]$ can only be produced if $S \stackrel{*}{\Rightarrow} w_1 \cdots w_{j-1} B \delta$ for some $\delta$; for some grammars, this top-down filtering makes parsing significantly more efficient than with the CKY algorithm. The prediction rule combines side and main conditions. 
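How a single rule application treats its two kinds of conditions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's interpreter: the function name `apply_rule` and its parameters are hypothetical, and the semiring is passed in as its multiplicative operation together with its identity elements. Main-condition values are multiplied; side conditions only gate whether the rule fires, so a zero side condition yields the semiring zero (the rule contributes nothing), while a nonzero one leaves the result unchanged.

```python
import operator
from functools import reduce
from typing import Any, Callable, Sequence


def apply_rule(main_values: Sequence[Any], side_values: Sequence[Any],
               times: Callable[[Any, Any], Any], one: Any, zero: Any) -> Any:
    """Value contributed by one instantiation of an inference rule.

    Side conditions are read in a Boolean manner: if any of them equals the
    semiring zero, the rule cannot be used, so it contributes `zero`.
    Otherwise their values are ignored and only the main conditions are
    multiplied together, starting from the multiplicative identity `one`.
    """
    if any(v == zero for v in side_values):
        return zero
    return reduce(times, main_values, one)


# Prediction in the inside semiring, with hypothetical numbers: the main
# condition R(B -> gamma) has probability 0.3, and the side item
# [i, A -> alpha . B beta, j] has inside value 0.05, which only needs to be nonzero.
print(apply_rule([0.3], [0.05], operator.mul, 1.0, 0.0))  # 0.3
print(apply_rule([0.3], [0.0], operator.mul, 1.0, 0.0))   # 0.0
```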
The side condition, $[i, A \rightarrow \alpha \bullet B\beta, j]$, provides the top-down filtering, ensuring that only items that might be used later by the completion rule can be predicted, while the main condition, $R(B \rightarrow \gamma)$, provides the probability of the relevant rule. The side condition is interpreted in a Boolean fashion, while the main condition's actual probability is used.</Paragraph> <Paragraph position="5"> Unlike the CKY algorithm, Earley's algorithm can handle grammars with epsilon ($\epsilon$), unary, and n-ary branching rules. In some cases, this can significantly complicate parsing. For instance, given unary rules $A \rightarrow B$ and $B \rightarrow A$, a cycle exists. This kind of cycle may allow an infinite number of different derivations, requiring an infinite summation to compute the inside probabilities. The ability of item-based parsers to handle these infinite loops with relative ease is a major attraction.</Paragraph> </Section> <Section position="4" start_page="577" end_page="578" type="sub_section"> <SectionTitle> 1.2 Overview </SectionTitle> <Paragraph position="0"> This paper will simplify the development of new parsers in three important ways.</Paragraph> <Paragraph position="1"> First, it will simplify the specification of parsers: the item-based description is simpler than a procedural description. Second, it will make it easier to generalize parsers across tasks: a single item-based description can be used to compute values for a variety of applications, simply by changing semirings. This will be especially advantageous for parsers that can handle loops resulting from rules like $A \rightarrow A$ and computations resulting from $\epsilon$ productions, both of which typically lead to infinite sums. 
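To see concretely why such sums arise and why evaluating them depends on the semiring, consider a hypothetical nonterminal $A$ with a loop rule $A \rightarrow A$ of value $p$, and suppose the derivations of $A$ that never use the loop contribute a value $b$. A derivation may apply the loop any number $k \geq 0$ of times, so the value of $A$ is the infinite sum over $k$ of $p^k \otimes b$. The Python sketch below evaluates this sum by fixed-point iteration of $v := b \oplus (p \otimes v)$ using the semiring's own operations; the function `loop_closure` and all numbers are illustrative assumptions, and the method is not a general one. In the inside semiring the iterates approach the geometric-series value $b/(1 - p)$ when $p$ is below 1; in the Viterbi semiring (with $p$ at most 1) the maximum is already reached at $k = 0$, so the iteration stops at once; and in the counting semiring the sum is infinite, so no iteration of this kind can return a finite answer.

```python
import operator
from typing import Any, Callable


def loop_closure(base: Any, loop: Any,
                 plus: Callable[[Any, Any], Any],
                 times: Callable[[Any, Any], Any],
                 max_iters: int = 1000) -> Any:
    """Approximate the sum over k >= 0 of loop^k (x) base for a single loop A -> A.

    Iterates v := base (+) (loop (x) v), starting from v = base, until the value
    stops changing or max_iters is reached. This terminates for the Boolean
    semiring and for the Viterbi semiring with loop value at most 1, and it
    converges numerically for the inside semiring when the loop probability is
    below 1; it is only an illustration, not a general method.
    """
    v = base
    for _ in range(max_iters):
        nxt = plus(base, times(loop, v))
        if nxt == v:
            break
        v = nxt
    return v


b, p = 0.5, 0.2  # hypothetical values
print(loop_closure(b, p, operator.add, operator.mul), b / (1 - p))  # both about 0.625
print(loop_closure(b, p, max, operator.mul))                        # 0.5
print(loop_closure(True, True, operator.or_, operator.and_))        # True
```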
In these cases, the procedure for computing an infinite sum differs from semiring to semiring, and the fact that we can specify that a parser computes an infinite sum separately from its method of computing that sum will be very helpful.</Paragraph> </Section> <Section position="5" start_page="578" end_page="578" type="sub_section"> <Paragraph position="0"> The third use of these techniques is for computing outside probabilities, values related to the inside probabilities that we will define later. Unlike the other quantities we wish to compute, outside probabilities cannot be computed by simply substituting a different semiring into either an iterative or an item-based description. Instead, we will show how to compute the outside probabilities using a modified interpreter of the same item-based description used for computing the other values.</Paragraph> <Paragraph position="1"> In the next section, we describe the basics of semiring parsing. In Section 3, we derive formulas for computing most of the values in semiring parsers, except outside values, and then, in Section 4, show how to compute outside values as well. In Section 5, we give an algorithm for interpreting an item-based description, followed in Section 6 by examples of using semiring parsers to solve a variety of problems.</Paragraph> <Paragraph position="2"> Section 7 discusses previous work, and Section 8 concludes the paper.</Paragraph> </Section> </Section> </Paper>