<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1074">
  <Title>Robust, Finite-State Parsing for Spoken Language Understanding</Title>
  <Section position="4" start_page="574" end_page="574" type="metho">
    <SectionTitle>
3 The Power of Regular Grammars
</SectionTitle>
    <Paragraph position="0"> Tomita (1986) has argued that context-free grammars (CFGs) are over-powered for natural language. Chart parsers are designed to deal with the worst case of very-deep or infinite self-embedding allowed by CFGs. However, in natural language this worst case does not occur. Thus, broad coverage Generalized Left-Right (GLR) parsers based on Tomita's algorithm, which ignore the worst case scenario, case-flame style regular expressions.</Paragraph>
    <Paragraph position="1"> are in practice more efficient and faster than comparable chart-parsers (Briscoe and Carroll, 1993).</Paragraph>
    <Paragraph position="2"> PROFER explicitly disallows the worst case of center-self-embedding that Tomita's GLR design allows -- but ignores. Aside from infinite center-self-embedding, a regular grammar formalism like PROFER's can be used to define every pattern in natural language definable by a GLR parser.</Paragraph>
  </Section>
  <Section position="5" start_page="574" end_page="575" type="metho">
    <SectionTitle>
4 The Compilation Process
</SectionTitle>
    <Paragraph position="0"> The following small grammar will serve as the basis for a high-level description of the compilation process.</Paragraph>
    <Paragraph position="2"> In Kaiser et al. (1999) the relationship between PROFER's compilation process and that of both Pereira and Wright's (1997) FSAs and CMU's Phoenix system has been described.</Paragraph>
    <Paragraph position="3"> Here we wish to describe what happens during PROFER's compilation stage in terms of the Left-Right parsing notions of item-set formation and reduction.</Paragraph>
    <Paragraph position="4"> As compilation begins the FSM always starts at state 0:0 (i.e., net 0, start state 0) and traverses an arc labeled by the top-level net name to the 0:1 state (i.e., net 0, final state 1), as illustrated in Figure 5. This initial arc is then re-written by each of its rewrite patterns (Figure 5).</Paragraph>
    <Paragraph position="5"> As each new net within the grammar description is encountered it receives a unique net-ID number, the compilation descends recursively into that new sub-net (Figure 5), reads in its  grammar description file, and compiles it. Since rewrite names are unique only within the net in which they appear, they can be processed iteratively during compilation, whereas net names must be processed recursively within the scope of the entire grammar's definition to allow for re-use.</Paragraph>
    <Paragraph position="6"> As each element within a rewrite pattern is encountered a structure describing its exact context is filled in. All terminals that appear in the same context are grouped together as a &amp;quot;context-group&amp;quot; or simply &amp;quot;context.&amp;quot; So arcs in the final FSM are traversed by &amp;quot;contexts&amp;quot; not terminals.</Paragraph>
    <Paragraph position="7"> When a net name itself traverses an arc it is glued into place contextually with e arcs (i.e., NULL arcs) (Figure 6). Since net names, like any other pattern element, are wrapped inside of a context structure before being situated in the FSM, the same net name can be re-used inside of many different contexts, as in Figure 6.  As the end of each net definition file is reached, all of its NULL arcs are removed. Each initial state of a sub-net is assumed into its parent state -- which is equivalent to item-set formation in that parent state (Figure 7 left-side). Each final state of a sub-net is erased, and its incoming arcs are rerouted to its terminal parent's state, thus performing a reduction (Fig-</Paragraph>
  </Section>
  <Section position="6" start_page="575" end_page="576" type="metho">
    <SectionTitle>
5 The Parsing Process
</SectionTitle>
    <Paragraph position="0"> At run-time, the parse proceeds in a strictly breadth-first manner (Figure 8,(Kaiser et al., 1999)). Each destination state within a parse is named by a hash-table key string composed of a sequence of &amp;quot;net:state&amp;quot; combinations that uniquely identify the location of that state within the FSM (see Figure 8). These &amp;quot;net:state&amp;quot; names effectively represent a snapshot of the stack-configuration that would be seen in a parallel GLR parser.</Paragraph>
    <Paragraph position="1"> PROFER deals with ambiguity by &amp;quot;splitting&amp;quot; the branches of its graph-structured stack (as is done in a Generalized Left-Right parser (Tomita, 1986)). Each node within the graph-structured stack holds a &amp;quot;token&amp;quot; that records the information needed to build a bracketed parse-tree for any given branch.</Paragraph>
    <Paragraph position="2"> When partial-paths converge on the same state within the FSM they are scored heuristically, and all but the set of highest scoring partial paths are pruned away. Currently the heuristics favor interpretations that cover the most input with the fewest slots. Command line parameters can be used to refine the heuristics, so that certain kinds of structures be either minimized or maximized over the parse.</Paragraph>
    <Paragraph position="3"> Robustness within this scheme is achieved by allowing multiple paths to be propagated in parallel across the input space. And as each such</Paragraph>
  </Section>
  <Section position="7" start_page="576" end_page="577" type="metho">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> Many researchers have looked at ways to improve corpus-based language modeling techniques. One way is to parse the training set with a structural parser, build statistical models of the occurrence of structural elements, and then use these statistics to build or augment an n-gram language model.</Paragraph>
    <Paragraph position="1"> Gillet and Ward (1998) have reported reductions in perplexity using a stochastic context-free grammar (SCFG) defining both simple semantic &amp;quot;classes&amp;quot; like dates and times, and degenerate classes for each individual vocabulary word. Thus, in building up class statistics over a corpus parsed with their grammar they are able to capture both the traditional n-gram word sequences plus statistics about semantic class sequences. null Briscoe has pointed out that using stochastic context-free grammars (SCFGs) as the basis for language modeling, &amp;quot;...means that information about the probability of a rule applying at a particular point in a parse derivation is lost&amp;quot; (1993). For this reason Briscoe developed a GLR parser as a more &amp;quot;natural way to obtain a finite-state representation ...&amp;quot; on which the statistics of individual &amp;quot;reduce&amp;quot; actions could be determined. Since PROFER's state names effectively represent the stack-configurations of a parallel GLR parser it also offers the ability to perform the full-context statistical parsing that Briscoe has called for.</Paragraph>
    <Paragraph position="2"> Chelba and Jelinek (1999) use a structural language model (SLM) to incorporate the longer-range structural knowledge represented in statistics about sequences of phrase-headword/non-terminal-tag elements exposed by a tree-adjoining grammar. Unlike SCFGs their statistics are specific to the structural context in which head-words occur. They have shown both reduced perplexity and improved word error rate (WER) over a conventional tri-gram system.</Paragraph>
    <Paragraph position="3"> One can also reduce complexity and improve word-error rates by widening the speech recognition problem to include modeling not only the word sequence, but the word/part-of-speech (POS) sequence. Heeman and Allen (1997) has shown that doing so also aids in identifying speech repairs and intonational boundaries in spontaneous speech.</Paragraph>
    <Paragraph position="4"> However, all of these approaches rely on corpus-based language modeling, which is a large and expensive task. In many practical uses of spoken language technology, like using simple structured dialogues for class room instruction (as can be done with the CSLU toolkit (Sutton et al., 1998)), corpus-based language modeling may not be a practical possibility.</Paragraph>
    <Paragraph position="5"> In structured dialogues one approach can be to completely constrain recognition by the known expectations at a given state. Indeed, the CSLU toolkit provides a generic recognizer, which accepts a set of vocabulary and word sequences defined by a regular grammar on a perstate basis. Within this framework the task of a recognizer is to choose the best phonetic path through the finite-state machine defined by the regular grammar. Out-of-vocabulary words are accounted for by a general purpose &amp;quot;garbage&amp;quot; phoneme model (Schalkwyk et al., 1996).</Paragraph>
    <Paragraph position="6"> We experimented with using PROFER in the same way; however, our initial attempts to do so did not work well. The amount of information carried in PROFER's token's (to allow for bracketing and heuristic scoring of the semantic hypotheses) requires structures that are an order of magnitude larger than the tokens in a typical acoustic recognizer. When these large tokens are applied at the phonetic-level so many  are needed that a memory space explosion occurs. This suggests to us that there must be two levels of tokens: small, quickly manipulated tokens at the acoustic level (i.e., lexical level), and larger, less-frequently used tokens at the structural level (i.e., syntactic, semantic, pragmatic level).</Paragraph>
  </Section>
  <Section position="8" start_page="577" end_page="577" type="metho">
    <SectionTitle>
7 Future Work
</SectionTitle>
    <Paragraph position="0"> In the MINDS system Young et al. (1989) reported reduced word error rates and large reductions in perplexity by using a dialogue structure that could track the active goals, topics and user knowledge possible in a given dialogue state, and use that knowledge to dynamically create a semantic case-frame network, whose transitions could in turn be used to constrain the word sequences allowed by the recognizer.</Paragraph>
    <Paragraph position="1"> Our research aim is to maximize the effectiveness of this approach. Therefore, we hope to: * expand the scope of PROFER's structural definitions to include not only word patterns, but intonation and stress patterns as well, and * consider how build to general language models that complement the use of the categorial constraints PROFER can impose (i.e., syllable-level modeling, intonational boundary modeling, or speech repair modeling). null Our immediate efforts are focused on considering how to modify PROFER to accept a word-graph as input -- at first as part of a loosely-coupled system, and then later as part of an integrated system in which the elements of the word-graph are evaluated against the structural constraints as they are created.</Paragraph>
  </Section>
class="xml-element"></Paper>