<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0303"> <Title>An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process</Title> <Section position="3" start_page="26" end_page="26" type="metho"> <SectionTitle> 2 Alternative Avenues Towards Robustness </SectionTitle> <Paragraph position="0"> There is a wide range of approaches to handling the problem of extragrammaticality, but which is best? Three basic avenues exist whereby the coverage of a natural language understanding system can be expanded: further development of the parsing grammar, addition of flexibility to the parsing algorithm, or addition of a post-processing repair stage after the parsing stage.</Paragraph> <Paragraph position="1"> It is always possible to add rules to a parsing grammar in order to expand its coverage, but this approach is both time-intensive in terms of development and ultimately computationally expensive at run time, since large, cumbersome grammars generate excessive amounts of ambiguity. Adding flexibility to the parsing algorithm is preferable in some respects, particularly in that it reduces the grammar development burden. However, it suffers from the same weakness in terms of computational expense. In the extreme case, in a Minimum Distance Parser (MDP) (Lehman, 1989; Hipp, 1992), any ungrammatical sentence can be mapped onto a sentence inside the coverage of the grammar through a series of insertions, deletions, and in some cases substitutions or transpositions.</Paragraph> <Paragraph position="2"> The more flexibility, the better the coverage in theory, but in realistic large scale systems this approach becomes computationally intractable. Current efforts towards robust interpretation have focused on less powerful partial parsers (Abney, 1996; Nord, 1996; Srinivas et al., 1996; Federici, Montemagni, and Pirrelli, 1996) and repair approaches where the labor is distributed between two or more stages (Ehrlich and Hanrieder, 1996; Danieli and Gerbino, 1995). The purpose of the second stage is to assemble the pieces of the partial parse produced in the first stage. In this paper we present a two stage approach composed of a partial parser followed by a completely automatic repair module.</Paragraph> <Paragraph position="3"> Though two stage approaches have grown in popularity in recent years because of their efficiency, they have done so at the cost of requiring hand-coded repair heuristics (Ehrlich and Hanrieder, 1996; Danieli and Gerbino, 1995). In contrast, the ROSE approach does not require any hand-coded knowledge sources dedicated to repair, thus making it possible to achieve the benefits of repair without losing the quality of domain independence.</Paragraph> <Paragraph position="4"> In this paper, we compare the performance of the two stage ROSE approach with MDP. A parameterized version of Lavie's GLR* parser (Lavie, 1995) is used which has been extended to perform a limited version of MDP in which insertions and deletions are possible, but not transpositions or substitutions. We refer to this parameterized MDP parser as LR MDP. We run LR MDP over the same test corpus in different settings, demonstrating the flexibility/quality/parse-time trade-off. 
With this we demonstrate that the two stage ROSE approach, coupling the restricted version of the GLR* parser with a post-processing repair stage, achieves better translation quality far more efficiently than any flexibility setting of LR MDP over the same corpus.</Paragraph> </Section> <Section position="4" start_page="26" end_page="27" type="metho"> <SectionTitle> 3 MDP versus Two Stage Interpretation </SectionTitle> <Paragraph position="0"> Efforts towards solving the problem of extragrammaticality have primarily been in the direction of building flexible parsers. In principle, Minimum Distance Parsers (Lehman, 1989; Hipp, 1992) have the greatest flexibility. They fit an extragrammatical sentence to the parsing grammar through a series of insertions, deletions, and transpositions. Since any string can be mapped onto any other string through a series of insertions, deletions, and transpositions, this approach makes it possible to repair any sentence. The underlying assumption behind the MDP approach is that the analysis of the string that deviates the least from the input string is most likely to be the best analysis. Thus, Minimum Distance Parsing appears to be a reasonable approach.</Paragraph> <Paragraph position="1"> In practice, however, Minimum Distance Parsing has only been used successfully in very small and limited domains. Lehman's core grammar, described in (Lehman, 1989), has on the order of 300 rules, and all of the inputs to her system can be assumed to be commands to a calendar program. Hipp's Circuit Fix-It Shop system, described in (Hipp, 1992), has a vocabulary of only 125 words and a grammar of only 500 rules. Flexible parsing algorithms introduce a great deal of extra ambiguity. This in turn may render certain approaches impractical for systems of realistic scale. Therefore, an important question one must ask is whether the MDP approach can scale up to a larger system and/or domain.</Paragraph> <Paragraph position="2"> An example of a more restrictive parsing algorithm is Lavie's GLR* skipping parser described in (Lavie, 1995). GLR* is a parsing system, based on Tomita's Generalized LR parsing algorithm, designed to be robust to two particular types of extragrammaticality: noise in the input, and limited grammar coverage. GLR* attempts to overcome these forms of extragrammaticality by ignoring the unparsable words and fragments and conducting a search for the maximal subset of the original input that is covered by the grammar.</Paragraph> <Paragraph position="3"> The GLR* parser is capable of skipping over any portion of an input utterance that cannot be incorporated into a grammatical analysis and recovering the analysis of the largest grammatical subset of the utterance. Partial analyses for skipped portions of the utterance can also be returned by the parser. Thus, whereas MDP considers insertions and transpositions in addition to deletions, GLR* considers only deletions. GLR* can be viewed as a restricted form of MDP applied to an efficient non-robust general parsing method. GLR* can, in most cases, achieve most of the robustness of the more general MDP approach while maintaining feasibility, due to the efficiency properties of the GLR approach and an effective, well-guided search. 
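To make this restricted search concrete, the following toy sketch illustrates deletion-only minimum distance parsing in the spirit of GLR*: it looks for the largest subsequence of the input accepted by a recognizer, preferring fewer skips. The recognizer and example sentences are invented stand-ins; the real GLR* conducts this search inside a table-driven GLR parser rather than by enumeration.

```python
# Toy sketch of deletion-only minimum distance parsing (GLR*-style
# skipping).  NOT Lavie's actual algorithm: the grammaticality check is
# a stand-in, and the subset enumeration here is exponential.
from itertools import combinations

def recognizes(words):
    """Stand-in for a parser's grammaticality check."""
    covered = {("mornings", "are", "out"),
               ("that", "is", "out"),
               ("tuesdays", "are", "out")}
    return tuple(words) in covered

def skip_parse(sentence):
    """Find the largest recognizable subsequence, fewest skips first."""
    words = sentence.lower().split()
    for n_skipped in range(len(words)):
        for kept in combinations(range(len(words)), len(words) - n_skipped):
            subsequence = [words[i] for i in kept]
            if recognizes(subsequence):
                return subsequence, n_skipped
    return None, len(words)

print(skip_parse("well mornings are certainly out"))
# -> (['mornings', 'are', 'out'], 2): two words skipped
```

The brute-force enumeration above is exactly the cost that GLR*'s guided search inside the GLR machinery, and the two stage distribution of labor more generally, is designed to avoid paying on every sentence.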
In the evaluation presented in this paper, GLR* has been restricted to skip only initial segments so that the partial analyses returned are always for contiguous portions of the sentence.</Paragraph> <Paragraph position="4"> Because GLR* was designed as an enhancement to the widely used standard GLR context-free parsing algorithm, grammars, lexicons, and other tools developed for the standard GLR parser can be used without modification. GLR* uses the standard SLR(0) parsing tables, which are compiled in advance from the grammar. It inherits the benefits of GLR in terms of ease of grammar development and, to a large extent, the efficiency properties of the parser itself. If an input sentence is completely grammatical, GLR* will normally return the same parse as the GLR parser.</Paragraph> <Paragraph position="5"> The weakness of this and other partial parsing approaches (Abney, 1996; Nord, 1996; Srinivas et al., 1996; Federici, Montemagni, and Pirrelli, 1996) is that part of the original meaning of the utterance may be discarded along with the skipped portion(s) if only the analysis for the largest subset is returned, or part of the analysis will be missing if the parser attempts to build only a partial parse. These less powerful algorithms trade coverage for speed. The idea is to introduce enough flexibility to gain an acceptable level of coverage at an acceptable computational expense.</Paragraph> <Paragraph position="6"> The goal behind the two stage approach (Ehrlich and Hanrieder, 1996; Danieli and Gerbino, 1995) is to increase the coverage possible at a reasonable computational cost by introducing a post-processing repair stage, which constructs a complete meaning representation out of the fragments of a partial parse. Since the input to the second stage is a collection of partial parses, the additional flexibility introduced at this second stage can be channeled just to the part of the analysis that the parser does not have enough knowledge to handle straightforwardly. This is unlike the MDP approach, where the full amount of flexibility is unnecessarily applied to every part of the analysis, even in completely grammatical sentences. Therefore, this two stage process is a more efficient distribution of labor, since the first stage is highly constrained by the grammar and the results of this first stage are then used to constrain the search in the second stage. Additionally, in cases where the limited flexibility parser is sufficient, the second stage can be bypassed entirely, yielding an even greater savings in time.</Paragraph> </Section> <Section position="5" start_page="27" end_page="29" type="metho"> <SectionTitle> 4 The Two Stage Interpretation </SectionTitle> <Paragraph position="0"> The main goal of the two stage ROSE approach is to interpret spontaneous natural language robustly and efficiently in a system at least as large and complex as the JANUS multi-lingual machine translation system, which provides the context for this work. 
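Before detailing the two stages, the overall control flow can be summarized in a minimal sketch. The function names here are hypothetical, not the actual ROSE interfaces; in ROSE the first stage is the restricted GLR* parser and the second is the combination mechanism described below.

```python
def interpret(sentence, partial_parse, combine_chunks):
    """Two stage control flow: parse first, repair only if needed."""
    chunks, complete = partial_parse(sentence)
    if complete:
        # Grammatical input: the limited-flexibility parser suffices,
        # so the second (repair) stage is bypassed entirely.
        return chunks[0]
    # Extragrammatical input: extra flexibility is channeled only into
    # assembling the fragments the parser could not connect itself.
    return combine_chunks(chunks)
```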
In this section we describe the division of labor between the Partial Parsing stage and the Combination stage in the ROSE approach.</Paragraph> <Section position="1" start_page="27" end_page="28" type="sub_section"> <SectionTitle> 4.1 The Partial Parsing Stage </SectionTitle> <Paragraph position="0"> The first stage in our approach is the Partial Parsing stage, where the goal is to obtain an analysis for islands of the speaker's utterance if it is not possible to obtain an analysis for the whole utterance. This is accomplished with a restricted version of Lavie's GLR* parser (Lavie, 1995; Lavie and Tomita, 1993) that produces an analysis for contiguous portions of the input sentence. See Figure 1 for an example parse. Here the GLR* parser attempts to handle the sentence &quot;That wipes out my mornings.&quot; The expression &quot;wipes out&quot; does not match anything in the parsing grammar. The grammar also does not allow time expressions to be modified by possessive pronouns, so &quot;my mornings&quot; does not parse either. Although the grammar recognizes &quot;out&quot; as a way of expressing a rejection, as in &quot;Tuesdays are out,&quot; it does not allow the time being rejected to follow the &quot;out&quot;. However, although the parser was not able to obtain a complete parse for this sentence, it was able to extract four chunks.</Paragraph> <Paragraph position="1"> The chunks are feature structures in which the parser encodes the meaning of portions of the user's sentence. This frame-based meaning representation is called an interlingua because it is language independent. It is defined by an interlingua specification, which serves as the primary symbolic knowledge source used during the Combination stage. Each frame encodes a concept in the domain. The set of frames in the meaning representation is arranged into subsets, each assigned a particular type.</Paragraph> <Paragraph position="2"> Each frame is associated with a set of slots. The slots represent relationships between feature structures. Each slot is associated with a type, which determines the set of frames that can fill that slot. Though this meaning representation specification is knowledge that must be encoded by hand, it is knowledge that can be used by all aspects of the system, not only by the repair module, as would be the case with hand-coded repair rules. Arguably, any well designed system would have such a specification to describe its meaning representation.</Paragraph> <Paragraph position="3"> The four chunks extracted by the parser each encode a different part of the meaning of the sentence &quot;That wipes out my mornings.&quot; The first chunk represents the meaning of &quot;that&quot;. The second one represents the meaning of &quot;out&quot;. Since &quot;out&quot; is generally a way of rejecting a meeting time in this domain, the associated feature structure represents the concept of a response that is a rejection. Since &quot;wipes&quot; does not match anything in the grammar, this token is left without any representation among the fragments returned by the parser. The last two chunks represent the meaning of &quot;my&quot; and &quot;mornings&quot; respectively. The disadvantage of this skipping parser relative to the MDP approach is that it cannot perform some necessary repairs that the more complicated approach can make. In this case, for example, it is unable to determine how these pieces fit together into one coherent parse. 
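The chunk representation just described can be pictured roughly as follows. The frame names, types, and slot constraints here are illustrative guesses rather than the actual JANUS interlingua specification; only the *RESPOND frame and its WHEN slot are taken from the running example.

```python
# A sketch of frames with typed slots, constrained by an
# interlingua-style specification (names are illustrative).
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str                        # concept, e.g. "*respond"
    ftype: str                       # type assigned by the specification
    slots: dict = field(default_factory=dict)

# Specification fragment: which filler types each (frame, slot) admits.
SPEC = {("*respond", "when"): {"temporal"}}

# The four chunks extracted from "That wipes out my mornings."
# ("wipes" matched nothing, so it has no representation at all.)
chunks = [Frame("*that", "anaphor"),
          Frame("*respond", "response"),   # "out" as a rejection
          Frame("*my", "possessive"),
          Frame("*morning", "temporal")]   # "mornings"
```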
The goal of the Combination stage is to overcome this limitation efficiently. Thus, the second stage of the interpretation process is responsible for making the remaining types of repairs. More flexibility can be introduced in the second stage efficiently, since the search space has already been reduced with the addition of the knowledge obtained from the partial parse.</Paragraph> </Section> <Section position="2" start_page="28" end_page="29" type="sub_section"> <SectionTitle> 4.2 The Combination Stage </SectionTitle> <Paragraph position="0"> The purpose of the Combination stage is to make the remaining types of repairs that could in principle be done with a minimum distance parser using insertions, deletions, and transpositions, but that cannot be performed with the skipping parser.</Paragraph> <Paragraph position="1"> The Combination stage takes as input the partial analyses returned by the skipping parser. These chunks are combined into a set of best repair hypotheses. The hypotheses built during this combination process specify how to assemble the partial analyses produced by the parser into meaning representations that are meant to represent the meaning of the speaker's whole sentence, rather than just parts of it.</Paragraph> <Paragraph position="2"> Since the meaning representation is compositional, a single, more complete meaning representation can be built by assembling the meaning representations for the parts of the sentence.</Paragraph> <Paragraph position="3"> In this Combination stage, a genetic programming (Koza, 1992; Koza, 1994) approach is used to evolve a population of programs that specify how to build complete meaning representations from the chunks returned from the parser. The repair module must determine not only which subset of the chunks returned by the parser to include in the final result, but also how to put them together. For example, the ideal repair hypothesis for the example in Figure 2 is one that specifies that the temporal expression should be inserted into the WHEN slot in the *RESPOND frame.</Paragraph> <Paragraph position="4"> The repair process is analogous in some ways to fitting pieces of a puzzle into a mold that contains receptacles for particular shapes. In this analogy, the meaning representation specification acts as the mold with receptacles of different shapes, making it possible to compute all of the ways partial analyses can fit together in order to create a structure that is legal in this frame-based meaning representation.</Paragraph> <Paragraph position="5"> Both the skipping parsing algorithm and the genetic programming combination algorithm are completely domain independent. Therefore, the ROSE approach maintains the positive quality of domain independence that the minimum distance parsing approach has, while avoiding some of the computational expense.</Paragraph> </Section> </Section> <Section position="6" start_page="29" end_page="31" type="metho"> <SectionTitle> 5 The Genetic Programming </SectionTitle> <Paragraph position="0"> Recovery from parser failure is a natural application for genetic programming (Koza, 1992; Koza, 1994). One can easily conceptualize the process of constructing a meaning representation hypothesis as the execution of a computer program that assembles the set of chunks returned from the parser. This program would specify the operations required for building larger chunks out of smaller chunks, and then even larger ones from those. 
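A sketch of this conceptualization, with invented names: a repair hypothesis is a small program tree whose leaves are parser chunks and whose internal nodes are combination operations, evaluated bottom-up.

```python
# A repair hypothesis as a program tree over chunk terminals (names are
# illustrative).  The ideal hypothesis for the running example is a
# single operation inserting the temporal chunk into the rejection
# chunk; larger hypotheses nest, e.g. ("comb", ("comb", a, b), c),
# building larger chunks out of smaller ones.
program = ("comb", "chunk-out", "chunk-mornings")

def run(node, chunks, comb):
    """Evaluate a repair program bottom-up over the parser's chunks."""
    if isinstance(node, str):          # terminal: look up a chunk
        return chunks[node]
    _op, left, right = node            # internal node: combine subtrees
    return comb(run(left, chunks, comb), run(right, chunks, comb))
```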
Because the programs generated by the genetic search are hierarchical, they naturally represent the compositional nature of the repair process.</Paragraph> <Section position="1" start_page="29" end_page="30" type="sub_section"> <SectionTitle> 5.1 Constructing Alternative Hypotheses </SectionTitle> <Paragraph position="0"> See Figure 2 for an example repair hypothesis.</Paragraph> <Paragraph position="1"> MY-COMB is a simple function that attempts to insert the second feature structure into some slot in the first feature structure. It selects a slot, if a suitable one can be found, and then instantiates the third parameter to this slot. In this case, the WHEN slot is selected. So the feature structure corresponding to &quot;mornings&quot; is inserted into the WHEN slot in the feature structure corresponding to &quot;out&quot;. The result is a feature structure indicating that &quot;Mornings are out.&quot; Though this is not an exact representation of the speaker's meaning, it is the best that can be done with the available feature structures.² Notice that since the expression &quot;wipes out&quot; is foreign to the parsing grammar, and no similar expression is associated with the same meaning in it, the MDP approach would also not be able to do better than this, since it can only insert and delete in order to fit the current sentence to the rules in its parsing grammar.</Paragraph> <Paragraph position="2"> Additionally, since the time expression follows &quot;out&quot; rather than preceding it as the grammar expects, only MDP with transpositions in addition to insertions and deletions would be able to arrive at the same result. Note that the feature structures corresponding to &quot;my&quot; and &quot;that&quot; are not included in this hypothesis. The job of the Combination Mechanism is both to determine which fragments to include and to determine how to combine the selected ones.</Paragraph> <Paragraph position="3"> ² Note that part of the expression &quot;wipes out&quot; matches a rule in the grammar that happens to have a similar meaning, since &quot;out&quot; can be used as a rejection, as in &quot;Tuesday is out.&quot; If the expression had been &quot;out of sight&quot;, which is positive, both the ROSE approach and MDP would construct the opposite of the intended meaning. Problems like this can only be dealt with through interaction with the user to confirm that repaired meanings reflect the speaker's true intention.</Paragraph> <Paragraph position="4"> In the genetic programming approach, a population of programs is evolved that specifies how to build complete meaning representations from the chunks returned from the parser. A complete meaning representation is one that is meant to represent the meaning of the speaker's whole utterance, rather than just part of it. Partial solutions, specifying how to build parts of the full meaning representation, are also evolved through the genetic search. Because the same population can contain programs that specify how to build different parts of the meaning representation, different parts of the full solution are evolved in parallel, making it possible to evolve a complete solution quickly.</Paragraph> <Paragraph position="5"> Since a set of alternative meaning representation hypotheses is constructed during the Combination stage, the result is similar to an ambiguous parse.</Paragraph> <Paragraph position="6"> See Figure 3 and Figure 4 for two alternative repair hypotheses produced during the Combination stage for the example in Figure 1. 
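Before walking through those figures, a minimal sketch of what an operator in the spirit of MY-COMB might look like. The merge fallback is omitted, frames are plain dicts, and the specification fragment is invented; only the behavior (insert into a licensed slot, otherwise return the largest chunk) follows the description above.

```python
# Simplified MY-COMB-style operator: try to insert `second` into a slot
# of `first` that the specification licenses; otherwise fall back to
# returning the largest chunk.  (The real operator also tries a merge.)
SPEC = {"*respond": {"when": {"temporal"}}}   # illustrative fragment

def size(frame):
    """Count frames in a structure, recursing through filled slots."""
    return 1 + sum(size(v) for v in frame.get("slots", {}).values())

def my_comb(first, second):
    for slot, allowed in SPEC.get(first["name"], {}).items():
        if second["type"] in allowed and slot not in first.get("slots", {}):
            combined = dict(first)
            combined["slots"] = {**first.get("slots", {}), slot: second}
            return combined
    return max(first, second, key=size)       # no suitable slot found

out = {"name": "*respond", "type": "response"}
mornings = {"name": "*morning", "type": "temporal"}
print(my_comb(out, mornings))   # mornings lands in the WHEN slot
print(my_comb(mornings, out))   # reversed order: no slot fits, so a
                                # largest chunk is returned instead
```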
The result of each of the hypotheses is an alternative representation for the sentence. The first hypothesis, displayed in Figure 3, corresponds to the interpretation &quot;Mornings and that are out.&quot; The problem with this hypothesis is that it includes the chunk &quot;that&quot;, which in this case should be left out.</Paragraph> <Paragraph position="7"> In the second hypothesis, displayed in Figure 4, the repair module attempts to insert the rejection chunk into the time expression chunk, the opposite of the ideal order. No slot could be found in the time expression chunk in which to insert the rejection chunk. In this case, the slot parameter remains uninstantiated and the largest chunk, here the time expression chunk, is returned. This hypothesis produces a feature structure that is indeed a portion of the correct structure, though not the complete structure.</Paragraph> </Section> <Section position="2" start_page="30" end_page="30" type="sub_section"> <SectionTitle> 5.2 Applying the Genetic Programming Paradigm to Repair </SectionTitle> <Paragraph position="0"> There are five steps involved in applying the genetic programming paradigm to a particular problem: determining a set of terminals, determining a set of functions, determining a fitness measure, determining the parameters and variables to control the run, and determining the method for deciding when to stop the evolution process. The first two constrain the range of repairs that the repair process is capable of making. The fitness measure determines how alternative repair hypotheses are ranked, and thus whether it is possible for the search to converge on the correct hypothesis rather than on a sub-optimal competing hypothesis. The last two factors determine how quickly it will converge and how long it is given to converge.</Paragraph> <Paragraph position="1"> The set of terminals for this problem is most naturally the set of chunks returned by the parser. Each operation involved in the repair process takes chunks as input and returns an augmented chunk as output. The single operator, called MY-COMB, takes two chunks as input. It inserts the second chunk into a slot in the first chunk. If it is not possible to insert the second chunk into the first one, it attempts to merge them. If this too is not possible, the largest chunk is returned.</Paragraph> <Paragraph position="2"> The fitness measure is trained on repair examples from a separate corpus and is discussed in more detail below. The parameters for the run, such as the size of the population of programs in each generation, are determined experimentally from the training corpus.</Paragraph> </Section> <Section position="3" start_page="30" end_page="31" type="sub_section"> <SectionTitle> 5.3 Training a Fitness Function </SectionTitle> <Paragraph position="0"> The purpose of the trained fitness function is to rank the repair hypotheses that are produced in each generation. Since survival of the fittest is the key to the evolutionary process, the determination of which hypotheses are more fit is absolutely crucial. Since the purpose of the repair module is to evolve a hypothesis that generates the ideal meaning representation structure, hypotheses that produce meaning representation structures closer to the ideal representation should be ranked higher than those that produce structures that diverge more. Of course, the repair module does not have access to that ideal structure while it is searching for the best combination of chunks. 
So a fitness function is trained that must estimate how close the result of a particular repair hypothesis is to the ideal structure by considering secondary evidence.</Paragraph> <Paragraph position="1"> The first step in training a fitness function is to decide which pieces of information to make available to the fitness function for it to use in making its decision. The fitness function, once it is trained, combines these pieces of information into a single score that can be used for ranking the hypotheses. In the current version of the ranking function, three pieces of information are given: the number of operations in the repair hypothesis, the number of frames and atomic slot fillers in the resulting meaning representation structure, and the average of the statistical scores for the set of repairs that were made. The statistical score of a repair corresponds to the mutual information between a slot and the type of filler that was inserted into it. This statistical information is trained on a corpus of meaning representation structures.</Paragraph> <Paragraph position="2"> Each piece of information provided to the fitness function is represented as a numerical score. The number of operations in the repair hypothesis is a measure of how complex the hypothesis is. The purpose of this score is to allow the fitness function to prefer simpler solutions. The number of frames and atomic slot fillers is a measure of how complete a repair hypothesis is. It allows the fitness function to prefer more complete solutions over less complete ones. The statistical scores are a rough measure of the quality of the decisions that were made in formulating the hypothesis, such as the decision of which slot in one structure to insert another structure into.</Paragraph> <Paragraph position="3"> The fitness function that combines these three pieces of information is trained over a corpus of sentences that need repair, coupled with their ideal meaning representation structures. The purpose of the training process is to learn a function that can make wise decisions about the trade-offs between these three factors. Sometimes these three factors make conflicting predictions about which hypotheses are better. For example, a structure with a large number of frames that was constructed by making many statistically unlikely decisions may be worse than a smaller structure made with decisions that were more likely to be correct. The factor that considers only the completeness of the solution would predict that the hypothesis producing the larger structure is better. On the other hand, the factor considering only the statistical predictions would choose the other hypothesis. Neither factor will be correct in all circumstances. Simple repair hypotheses tend to be better in general, but this goal can conflict with the goal of having a large resulting structure. The goal of the training process is to learn a function that can make these trade-offs successfully.</Paragraph> <Paragraph position="4"> The trained fitness function combines the three given numerical scores using addition, subtraction, multiplication, and division. It is trained using a genetic programming technique. A successful fitness function ranks hypotheses the same way as an ideal fitness function that can compare the resulting structures with the ideal one. Before a fitness function can be trained, there must first be training data. 
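The shape of such a function can be sketched as follows. The particular arithmetic combination and the example numbers below are invented for illustration; in ROSE the combination itself is learned by genetic programming over addition, subtraction, multiplication, and division.

```python
# Sketch of a fitness function over the three scores described above
# (hypothesis complexity, structure size, average mutual-information
# score of the repairs).  The formula is illustrative, not the learned one.
def fitness(n_operations, n_frames_and_fillers, avg_repair_score):
    # Prefer complete structures built from statistically plausible
    # repairs, discounted by hypothesis complexity.
    return (n_frames_and_fillers * avg_repair_score) / (1 + n_operations)

hypotheses = [
    (3, 9, 0.2),   # large structure, statistically unlikely decisions
    (1, 6, 0.9),   # smaller structure, more plausible decisions
]
ranked = sorted(hypotheses, key=lambda h: fitness(*h), reverse=True)
print(ranked)  # the smaller but more plausible hypothesis ranks first
```

This toy trade-off mirrors the conflict described above: completeness alone would favor the first hypothesis, while the statistical scores favor the second.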
Appropriate training data for the fitness function is a set of ranked lists of scores, e.g., the three scores mentioned above. Each set of three scores corresponds to the repair hypothesis it was extracted from. These sets of scores in the training examples are ranked the way the ideal fitness function would rank the associated hypotheses. The purpose of the training process is to find a function that combines the three scores into a single score such that when the set of single scores is sorted, the ordering is the same as in the training example. Correctly sorting the sets of scores is equivalent to ranking the hypotheses themselves. Therefore, a function that can successfully sort the scores in the training examples will be correspondingly good at ranking repair hypotheses.</Paragraph> </Section> </Section> <Section position="7" start_page="31" end_page="32" type="metho"> <SectionTitle> 6 A Comparative Analysis in a Large Scale Practical Setting </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="31" end_page="32" type="sub_section"> <Paragraph position="0"> In order to compare the two stage repair approach with the single stage MDP approach in a practical, large-scale scenario, we conducted a comparative evaluation. As mentioned above, we make use of a version of Lavie's GLR* parser (Lavie, 1995) extended to perform both skipping and inserting, which we refer to as LR MDP. This makes it possible to compare the two stage ROSE approach to MDP while keeping all other factors constant.</Paragraph> <Paragraph position="1"> The parser uses a semantic grammar with approximately 1000 rules, which maps the input sentence onto an interlingua representation (ILT) that represents the meaning of the sentence in a language-independent manner. This ILT is then passed to a generation component, which generates a sentence in the target language that is then graded by a human judge as Bad, Partial, Okay, or Perfect in terms of translation quality. Partial indicates that the result communicated part of the content of the original sentence while not containing any incorrect information. Okay indicates that the generated sentence communicated all of the relevant information in the original sentence, but not in the ideal way. Perfect indicates both that the result communicated the relevant information and that it did so in a smooth, high quality manner. The test set used in this evaluation contains 500 sentences from a corpus of spontaneous scheduling dialogues collected in English.</Paragraph> <Paragraph position="2"> In a previous experiment we determined that the two stage approach runs about two orders of magnitude faster than LR MDP. For the purpose of the evaluation presented in this paper, we tested the effect of imposing a maximum deviation penalty on the minimum distance parser in order to determine how much flexibility could be allowed before the computational cost becomes unreasonable.</Paragraph> <Paragraph position="3"> A full, unconstrained implementation of MDP can find an analysis for any sentence using a combination of insertions, deletions, and transpositions. However, in order to make it viable to test the MDP approach in a system as large as the one which provides the context for this work, we make use of a more restricted version of MDP. While the full MDP algorithm allows insertions, deletions, and transpositions, our more constrained version of MDP allows only insertions and deletions. 
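A sketch of how such a deviation penalty might be computed and capped, anticipating the definition given in the next paragraphs (words skipped plus, for each inserted non-terminal, the minimum number of words that non-terminal can generate). The minimum-yield table here is an invented fragment, not the actual grammar's.

```python
# Sketch of the LR MDP deviation penalty with a maximum-penalty cap.
MIN_YIELD = {"TEMPORAL": 1, "RESPONSE": 2}   # invented per-non-terminal minima

def deviation_penalty(n_words_skipped, inserted_nonterminals,
                      max_penalty=None):
    """Penalty = words skipped + minimum yield of each inserted
    non-terminal; None if the cap for this flexibility setting is exceeded."""
    penalty = n_words_skipped + sum(MIN_YIELD[nt]
                                    for nt in inserted_nonterminals)
    if max_penalty is not None and penalty > max_penalty:
        return None          # parse rejected under this flexibility setting
    return penalty

print(deviation_penalty(2, ["TEMPORAL"]))                 # -> 3
print(deviation_penalty(2, ["TEMPORAL"], max_penalty=2))  # -> None
```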
Although this still allows the MDP parser to repair any sentence, in some cases the result will not be as complete as it would have been with the unconstrained version of MDP or with the two stage repair process. Additionally, with a lexicon on the order of 1000 lexical items, it is not practical to do insertions at the level of the lexical items themselves. Instead, we allow only non-terminals to be inserted. An insertion penalty, equivalent to the minimum number of words it would take to generate a given non-terminal, is assigned to a parse for each inserted non-terminal.</Paragraph> <Paragraph position="4"> In order to test the effect of imposing a maximum deviation penalty, we used a parameterized version of LR MDP, where the deviation penalty of a parse is the total number of words skipped plus the parse's associated insertion penalty, as described above.</Paragraph> <Paragraph position="5"> The avenues of exploration pursued here are far from exhaustive. Substitutions and transpositions are not allowed in this version of the parser, nor is it possible to set separate maximum penalties for skipping and for inserting. Additionally, insertions and deletions are weighted equally, whereas some researchers have weighted them differently (Hipp, 1992). These and other possibilities are left for future inquiry.</Paragraph> </Section> </Section> </Paper>