<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1045"> <Title>Calculating the Probability of a Partial Parse of a Sentence</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Stochastic context-free grammars have been suggested for a role in speech-recognition algorithms, e.g. \[1, 4, 9\]. In order to he fully effective as an adjunct to speech recognition, the power of the probability apparatus needs to be applied to the problem of controlling the branched search for parses of ambiguous input.</Paragraph> <Paragraph position="1"> The method we suggest for doing this employs shift-reduce (LR) parsing of context-free grammars together with a probability based score for ranking competing parse hypotheses. Shift-reduce parsers can be made very efficient for unambiguous grammars (and unambiguous inputs) and Tomita \[7\] shows how much of this efficiency can be maintained in the face of ambiguity. This makes this class of parsers a good candidate for many speech problems. The structural simplicity of shift-reduce parsers makes the analysis of the interaction of the parser with the stochastic properties of the language particularly clean.</Paragraph> <Paragraph position="2"> The score we calculate is the likelihood that the collection of subtrees constructed by the parser so far can he completed into a full parse tree by means of the steps that the parser is constrained to follow, taking into account all possibilities for the unscanned part of the input. This score is the same as that suggested by Wright \[9\], who also studied shift-reduce parsers. We provide an exact method for calculating the desired quantity, while Wright's calculation requires several approximations.</Paragraph> <Paragraph position="3"> Why do we care about this particular quantity? As a first rough answer, note that when this quantity is zero, then the hypothesis should be abandoned; there is no possibility that the parse tree can he completed. Furthermore, the bigger this quantity is, the larger the mass of the probability space that can be explored by pursuing that particular hypothesis.</Paragraph> <Paragraph position="4"> For a more detailed answer, consider a breadth first search of candidate hypotheses. For each one we would like to know which is the correct one, given the grammar and the text segment we have observed: a,,...,at. We would like to calculate P(Hla,,... , a~).</Paragraph> <Paragraph position="5"> This quantity is equal to P(H&al ..... a~)/P(al,..., a~).</Paragraph> <Paragraph position="6"> The denominator in the above expression P(a,,..., at) is the grand probability of seeing the observations al,..., at given the grammar. This is some fixed quantity. We might not know what it is, but as long as we are only comparing hypotheses that all explain the same string ah.--, at, this quantity is a scaling factor that can safely be ignored. The numerator is the quantity we intend to calculate.</Paragraph> <Paragraph position="7"> For a depth-first or best-first search, as employed by \[1\], the quantity P(al,..., at) cannot be ignored. This makes the depth-first approach significantly more complicated.</Paragraph> <Paragraph position="8"> In the rest of this paper we will restrict our attention to grammars in Chomsky-normal form. A similar probability analysis can be made for arbitrary context-free grammars, but the notation becomes cumbersome and the formulae more complicated. 
<Paragraph position="9"> We note that all the topics in this paper are treated in considerably more detail, including proofs, in [3].</Paragraph>
</Section>
</Paper>