<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-2002">
  <Title>An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities</Title>
  <Section position="10" start_page="196" end_page="198" type="evalu">
    <SectionTitle>
B.2 Completion
</SectionTitle>
    <Paragraph position="0"> Unlike prediction, the completion step still involves iteration. Each complete state derived by completion can potentially feed other completions. An important detail here is to ensure that all contributions to a state's α and γ are summed before that state is used as input to further completion steps.</Paragraph>
    <Paragraph position="1"> One approach to this problem is to insert complete states into a prioritized queue.</Paragraph>
    <Paragraph position="2"> The queue orders states by their start indices, highest first. This is because states corresponding to later expansions always have to be completed before they can lead to the completion of earlier expansions in the derivation. For each start index, the entries are managed as a first-in, first-out queue, ensuring that the dependency graph formed by the states is traversed in breadth-first order.</Paragraph>
    <Paragraph position="3"> The completion pass can now be implemented as follows. Initially, all complete states from the previous scanning step are inserted in the queue. States are then removed from the front of the queue and used to complete other states. Among the new states thus produced, complete ones are again added to the queue. The process iterates until no more states remain in the queue. Because the computation of probabilities already includes chains of unit productions, states derived from such productions need not be queued, which also ensures that the iteration terminates.</Paragraph>
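The queuing discipline described above can be sketched as follows. This is a minimal illustration only; the `State` fields and class names are hypothetical and not taken from the paper's implementation.

```python
import heapq
from collections import namedtuple

# Hypothetical minimal Earley state: only the fields the queuing scheme needs.
State = namedtuple("State", "start lhs")

class CompletionQueue:
    """Serves complete states by start index, highest first;
    ties (same start index) are served first-in, first-out."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # monotone counter: FIFO tie-breaker per start index

    def push(self, state):
        # Negate the start index so the highest index is popped first.
        heapq.heappush(self._heap, (-state.start, self._seq, state))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __bool__(self):
        return bool(self._heap)

q = CompletionQueue()
for s in [State(2, "A"), State(5, "B"), State(5, "C"), State(3, "D")]:
    q.push(s)
order = [q.pop().lhs for _ in range(4)]  # ["B", "C", "D", "A"]
```

The later-started states (start index 5) come out first, in insertion order, before the earlier expansions they may help complete.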
    <Paragraph position="4"> A similar queuing scheme, with the start index order reversed, can be used for the reverse completion step needed in the computation of outer probabilities (Section 5.2).</Paragraph>
    <Paragraph position="5"> B.3 Efficient Parsing with Large Sparse Grammars During work with a moderate-sized, application-specific natural language grammar taken from the BeRP speech system (Jurafsky, Wooters, Tajchman, Segal, Stolcke, Foster, and Morgan 1994), we had an opportunity to optimize our implementation of the algorithm. Below we relate some of the lessons learned in the process.</Paragraph>
    <Paragraph position="6"> B.3.1 Speeding up matrix inversions. Both prediction and completion steps make use of a matrix R, defined as a geometric series derived from a matrix P: R = I + P + P^2 + ... = (I - P)^-1.</Paragraph>
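As a concrete check, the closed form (I - P)^-1 agrees with the series I + P + P^2 + ...; the small P below is a hypothetical stand-in, not a matrix from the BeRP grammar.

```python
import numpy as np

# Hypothetical left-corner-style matrix P for a 3-nonterminal grammar;
# it is strictly upper triangular, so the series terminates exactly.
P = np.array([[0.0, 0.4, 0.2],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])

# Closed form: R = (I - P)^-1.
R = np.linalg.inv(np.eye(3) - P)

# The series I + P + P^2 + ... agrees (P^3 = 0 here, so it is finite).
series = np.eye(3) + P + P @ P + P @ P @ P
assert np.allclose(R, series)
```

For a general P with spectral radius below 1 the series is infinite but still converges to the same inverse.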
    <Paragraph position="7"> [Computational Linguistics, Volume 21, Number 2] Both P and R are indexed by the nonterminals in the grammar. The matrix P is derived from the SCFG rules and probabilities (either the left-corner relation or the unit production relation).</Paragraph>
    <Paragraph position="8"> For an application using a fixed grammar the time taken by the precomputation of left-corner and unit production matrices may not be crucial, since it occurs offline. There are cases, however, when that cost should be minimized, e.g., when rule probabilities are iteratively reestimated.</Paragraph>
    <Paragraph position="9"> Even if the matrix P is sparse, the matrix inversion can be prohibitive for large numbers of nonterminals n. Empirically, matrices of rank n with a bounded number p of nonzero entries in each row (i.e., p is independent of n) can be inverted in time O(n^2), whereas a full n x n matrix would require time O(n^3).</Paragraph>
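The asymptotic claim itself is not reproduced in a few lines, but the standard way to exploit such structure is to solve the linear system (I - P) R = I rather than form the inverse explicitly. A sketch with a hypothetical random P of bounded row occupancy (p nonzeros per row):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3  # n nonterminals, at most p nonzero entries per row

# Hypothetical sparse P: each row has p nonzeros, with row sums <= 0.5,
# so the infinity norm is below 1 and I - P is invertible.
P = np.zeros((n, n))
for i in range(n):
    cols = rng.choice(n, size=p, replace=False)
    P[i, cols] = rng.random(p) / (2 * p)

# Solve (I - P) R = I instead of calling inv(); a sparse LU factorization
# (dense here only for brevity) is what exploits the bounded occupancy.
R = np.linalg.solve(np.eye(n) - P, np.eye(n))
assert np.allclose((np.eye(n) - P) @ R, np.eye(n))
```

In practice one would store P in a sparse format (e.g., compressed rows) and use a sparse factorization, which is where the O(n^2) behavior comes from.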
    <Paragraph position="10"> In many cases the grammar has a relatively small number of nonterminals that have productions involving other nonterminals in a left-corner (or the RHS of a unit production). Only those nonterminals can have nonzero contributions to the higher powers of the matrix P. This fact can be used to substantially reduce the cost of the matrix inversion needed to compute R.</Paragraph>
    <Paragraph position="11"> Let P' be a subset of the entries of P, namely, only those elements indexed by nonterminals that have a nonempty row in P. For example, for the left-corner computation, P' is obtained from P by deleting all rows and columns indexed by nonterminals that do not have productions starting with nonterminals. Let I' be the identity matrix over the same set of nonterminals as P'. Then R can be computed as</Paragraph>
    <Paragraph position="13"> R = I + R' * P.</Paragraph>
    <Paragraph position="14"> Here R' is the inverse of I' - P', and * denotes a matrix multiplication in which the left operand is first augmented with zero elements to match the dimensions of the right operand, P.</Paragraph>
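The reduced inversion can be checked numerically. The sketch below assumes a small hypothetical P in which only the first two nonterminals have nonempty rows; it is an illustration of the identity, not the paper's implementation.

```python
import numpy as np

# Hypothetical P: only nonterminals 0 and 1 have nonempty rows
# (i.e., have productions with a nonterminal in left-corner position).
P = np.array([[0.0, 0.3, 0.2, 0.0],
              [0.4, 0.0, 0.0, 0.1],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])

active = [0, 1]                    # nonterminals with nonempty rows in P
Pp = P[np.ix_(active, active)]     # P': rows and columns restricted to active
Rp = np.linalg.inv(np.eye(len(active)) - Pp)   # R' = (I' - P')^-1

# The '*' operation: pad R' with zeros to full dimensions, then multiply by P.
Rp_full = np.zeros_like(P)
Rp_full[np.ix_(active, active)] = Rp
R_reduced = np.eye(4) + Rp_full @ P

# Agrees with the direct full inversion R = (I - P)^-1.
R_full = np.linalg.inv(np.eye(4) - P)
assert np.allclose(R_reduced, R_full)
```

Only a 2 x 2 matrix is inverted instead of a 4 x 4 one; with 132 active nonterminals out of 789, as reported below, the saving is substantial.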
    <Paragraph position="15"> The speedups obtained with this technique can be substantial. For a grammar with 789 nonterminals, of which only 132 have nonterminal productions, the left-corner matrix was computed in 12 seconds (including the final multiplication by P and the addition of I). Inversion of the full matrix I - P took 4 minutes, 28 seconds. (Footnote 21: These figures are not very meaningful in absolute terms; all measurements were obtained on a Sun SPARCstation 2 with a CommonLisp/CLOS implementation of generic sparse matrices that was not particularly optimized for this task.) B.3.2 Linking and bottom-up filtering. As discussed in Section 4.8, the worst-case run-time on fully parameterized CNF grammars is dominated by the completion step. However, this is not necessarily true of sparse grammars. Our experiments showed that the computation is dominated by the generation of Earley states during the prediction steps.</Paragraph>
    <Paragraph position="16"> It is therefore worthwhile to minimize the total number of predicted states generated by the parser. Since predicted states affect the derivation only if they lead to subsequent scanning, we can use the next input symbol to constrain the relevant predictions. To this end, we compute the extended left-corner relation R_LT, indicating which terminals can appear as left corners of which nonterminals. R_LT is a Boolean matrix with rows indexed by nonterminals and columns indexed by terminals. It can be computed as the product</Paragraph>
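A toy sketch of such an extended left-corner relation, assuming it is formed as the Boolean product R_LT = R_L P_LT, where P_LT(X, a) is true iff some production for X starts with terminal a (this construction and the toy grammar are assumptions for illustration, not quoted from the text):

```python
import numpy as np

# Toy grammar (hypothetical): S -> NP VP; NP -> Det N | 'she'; Det -> 'the'.
nonterms = ["S", "NP", "VP", "Det", "N"]
terms = ["the", "she"]
nt = {x: i for i, x in enumerate(nonterms)}
t = {x: i for i, x in enumerate(terms)}

# P_L(X, Y) = 1 iff nonterminal Y is the first symbol of some rule for X.
P_L = np.zeros((5, 5), dtype=int)
P_L[nt["S"], nt["NP"]] = 1
P_L[nt["NP"], nt["Det"]] = 1

# P_LT(X, a) = 1 iff terminal a is the first symbol of some rule for X.
P_LT = np.zeros((5, 2), dtype=int)
P_LT[nt["NP"], t["she"]] = 1
P_LT[nt["Det"], t["the"]] = 1

# Reflexive-transitive left-corner closure R_L = I + P_L + P_L^2 + ...
# computed as a Boolean fixed point.
R_L = np.eye(5, dtype=int)
for _ in range(len(nonterms)):
    R_L = ((R_L + R_L @ P_L) > 0).astype(int)

# Extended relation: which terminals can start a derivation of each nonterminal.
R_LT = (R_L @ P_LT) > 0
```

During prediction, a nonterminal X is then worth expanding at input position i only if R_LT(X, a) holds for the next input symbol a.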
  </Section>
</Paper>