<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1052"> <Title>Charting the Depths of Robust Speech Parsing</Title> <Section position="5" start_page="406" end_page="408" type="metho"> <SectionTitle> 3 Computing Best Partial Analyses </SectionTitle> <Paragraph position="0"> In contrast to a traditional parser, which never comes up with an analysis for input not covered by the grammar, our approach focuses on partial analyses without giving up the correctness of the overall deep grammar. These partial analyses are combined in a later stage (see Section 4) to form total analyses. But what is a partial analysis? Obviously a derivation (sub)tree licensed by the grammar which covers a continuous part of the input (i.e., a passive parser edge). But not every passive edge is a good candidate, since otherwise we would end up with perhaps thousands of them. Our approach lies in between these two extremes: computing a connected sequence of best partial analyses which covers the whole input. The idea is to view the set of passive edges of a parser as a directed graph which is evaluated according to a user-defined (and therefore grammar- and language-specific) metric. Using this graph, we then compute the shortest paths w.r.t. the evaluation function, i.e., paths through this graph with minimum cost.</Paragraph> <Paragraph position="1"> Since this graph is acyclic and topologically sorted (vertices are integers and edges always connect a vertex to a larger vertex), we have chosen the DAG-shortest-path algorithm (Cormen et al., 1990), which runs in O(V + E). This fast algorithm is a solution to the single-source shortest-paths problem. We modified and extended it to cope with the needs we encountered in speech parsing: (i) several start and end vertices can be used (e.g., in the case of n-best chains or word hypothesis graphs, WHGs); (ii) all best shortest paths are returned (i.e., we obtain a shortest-path subgraph); and (iii) evaluation and selection of the best edges is done incrementally when parsing n-best chains (i.e., only new passive edges entered into the chart are evaluated and may be selected by the shortest-path algorithm).</Paragraph> <Paragraph position="2"> We now sketch the basic algorithm. Let G = (V, E) denote the graph of passive edges, S the set of start vertices, T the set of end vertices, and let n be the vertex with the highest number (remember, vertices are integers): n = max(V). The algorithm makes use of two global vectors of length n which store information associated with each vertex: dist keeps track of the distance of a vertex to one of the start vertices (the so-called shortest-path estimate), whereas pred records the predecessors of a given vertex. weight defines the cost of an edge and is assigned its value during the evaluation stage of the algorithm, according to the user-defined function Estimate. Finally, Adj consists of all vertices adjacent to a given vertex (we use an adjacency-list representation).</Paragraph> <Paragraph position="3"> Clearly, before the shortest paths are computed, the distance of a vertex to one of the start vertices is infinity, except for the start vertices themselves, and there is of course no shortest-path subgraph yet. After this initialization, we perform evaluation and relaxation on every passive edge, taken in topologically sorted order. Relaxing an edge (u, v) means checking whether we can improve the shortest path(s) to v via u. There are two cases to consider: either we overwrite the shortest-path estimate for v, since the new one is better (and so obtain a new predecessor for v, viz., u), or the shortest-path estimate is as good as the old one, in which case we add u to the predecessors of v. In case the shortest-path estimate is worse, there is clearly nothing to do.</Paragraph> <Paragraph position="4"> The shortest paths are thus determined by estimating and relaxing edges, beginning with the start vertices S. The shortest-path subgraph is stored in pred and can be extracted by walking from the end vertices T 'back' to the start vertices.</Paragraph>
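The paper's pseudocode for initialization, relaxation, and the main loop did not survive extraction (only fragments such as "od." and "return pred." remain in the source). The following Python sketch is our reconstruction of the behaviour described above, not the authors' code; the Edge class and the estimate callback are illustrative stand-ins.

```python
import math
from collections import defaultdict

class Edge:
    """A passive chart edge spanning vertices start..end (start < end)."""
    def __init__(self, start, end, analysis=None):
        self.start, self.end, self.analysis = start, end, analysis
        self.weight = math.inf          # assigned during the evaluation stage

def shortest_path_subgraph(edges, start_vertices, end_vertices, n, estimate):
    """DAG-shortest-path over passive edges, keeping *all* best predecessors.

    Vertices are the integers 0..n; every edge leads to a larger vertex,
    so iterating vertices in increasing order is a topological order.
    """
    adj = defaultdict(list)             # Adj: adjacency-list representation
    for e in edges:
        adj[e.start].append(e)

    dist = [math.inf] * (n + 1)         # shortest-path estimates
    pred = [[] for _ in range(n + 1)]   # all best predecessor edges per vertex
    for s in start_vertices:            # several start vertices are allowed
        dist[s] = 0.0

    for u in range(n + 1):              # evaluation and relaxation
        for e in adj[u]:
            e.weight = estimate(e)      # user-defined, grammar-specific metric
            candidate = dist[u] + e.weight
            if candidate == math.inf:
                continue                # u unreachable, or edge forbidden
            if candidate < dist[e.end]:     # strictly better: overwrite estimate
                dist[e.end] = candidate
                pred[e.end] = [e]
            elif candidate == dist[e.end]:  # equally good: record another predecessor
                pred[e.end].append(e)

    return pred   # walk from the end vertices 'back' via pred to extract the subgraph
```

An estimate in the spirit of the example given below (1 for phrasal edges with utterance status, 2 for lexical edges, infinity otherwise) then makes the returned subgraph the set of best partial analyses.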
<Paragraph position="5"> After we have determined the shortest-path subgraph, the feature structures associated with these edges are selected and transformed into the corresponding VITs, which are then sent to the robust semantic processing component. This approach has an important property: even if certain parts of the input have not undergone at least one rule application, there are still lexical edges which help to form a best path through the passive edges. Hence, this approach shows anytime behavior, which is a necessary requirement in time-critical (speech) applications: even if the parser is interrupted at a certain point, we can always return a shortest path through the chart up to that moment.</Paragraph> <Paragraph position="6"> Let us now give an example of what the evaluation function on edges (i.e., derivation trees) might look like (this is a slightly simplified form of the evaluation actually used for the German grammar):
* n-ary trees (n > 1) with utterance status (e.g., NPs, PPs): value 1
* lexical items: value 2
* otherwise: value ∞
If available, other properties, such as prosodic information or probabilistic scores, can also be utilized in the evaluation function to determine the best edges.</Paragraph> <Paragraph position="7"> Note that in Figure 1 the paths PR and QR are chosen, but not ST, although S is the longest edge. Using uniform costs, all three paths would be selected. Depending on the evaluation, our method thus does not necessarily favor paths with the longest edges: the above strategy instead prefers paths containing no lexical edges (where this is possible), and there might be several such paths having the same cost. Longest (sub)paths, however, can be obtained by employing an exponential function during the evaluation of an edge e ∈ E. [The formula itself is missing from the extracted text.]</Paragraph>
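The exponential evaluation function did not survive extraction (the paragraph that held it is empty). A weight of the following shape, offered purely as an illustration of the idea and not as the authors' actual formula, has the stated effect:

    weight(e) = 2^-(end(e) - start(e))

Since 2^-(a+b) < 2^-a + 2^-b for all spans a, b >= 1, a single edge covering an interval is always cheaper than any sequence of shorter edges covering the same interval, so the minimum-cost path now prefers the longest (sub)paths.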
selecting a result.</Paragraph> <Paragraph position="2"> For storing partial results, whether delivered by the parser or constructed later, we make use of a chart-like data structure we call the VIT hypothesis graph (VHG), since it bears a resemblance to the WHG which is input to the parser. It is organized according to WHG vertices. We give an example in Figure 2, which will be explained in Section 4.1.</Paragraph> <Paragraph position="3"> Combination of partial results takes place using a set of rules which describe how fragmentary analyses can be combined. There are language-independent rules, e.g., describing the combination of a semantic functor with a possible argument, and language-specific ones, such as those for dealing with self-corrections in German. Each operation carried out delivers a confidence value which influences the score assigned to an edge.</Paragraph> <Paragraph position="4"> The overall mechanism behind robust semantic processing resembles that of a chart parser. It runs in parallel with the HPSG parser; each time the parser delivers partial results, they are handed over and processed, while the parser may continue to look for a better path in the WHG. The processing strategy is agenda-based, giving priority to new parser results.</Paragraph> [Figure 2: a VIT hypothesis graph for utterance (1), with edges for paßt, Ihnen, and den halben Tag and their scored combinations; only a few edges lie on the best path, although the number of passive edges is 217.] <Paragraph position="5"> Selection of a result means that the best edge covering the whole input or, if that has not been achieved, an optimal sequence of edges has to be selected. We use a simple graph search algorithm which finds the path with the highest sum of individual scores.</Paragraph> <Paragraph position="6"> Note that robust semantic processing has the anytime property as well: as soon as the first partial result has been entered into the chart, a result can be delivered on demand.</Paragraph> <Section position="1" start_page="409" end_page="409" type="sub_section"> <SectionTitle> 4.1 An Example </SectionTitle> <Paragraph position="0"> Consider utterance (1), where the case of the NP den halben Tag ('half the day') is accusative and thus does not match the subcategorization requirements of the verb passen ('suit'), which would require nominative.</Paragraph> <Paragraph position="1"> (1) Paßt Ihnen den halben Tag? 'Does half the day suit you?' According to the grammar, this string is ill-formed, thus no complete analysis can be achieved. However, the parser delivers fragments for paßt, Ihnen, and den halben Tag.</Paragraph> <Paragraph position="2"> (2) verb_arg_r :: [[type(V1, verbal), missing_arg(V1)], [type(V2, term), possible_arg(V2, V1)]] ---> [apply_fun(V1, V2, V3), assign_mood(V3, V4)] & V4.</Paragraph> <Paragraph position="3"> When these results are stored, the rule in (2) will combine the verb with its first argument, Ihnen. Each rule consists of three parts: a mnemonic rule name, tests on a sequence of input VITs, and the operations performed to construct the output VIT. The first separator is ::, the second --->. A further application of the same rule accounts for the second argument, den halben Tag. However, the confidence value for the second combination will reflect the violation of the case requirement. The resulting edge spans the whole input and is selected as output. The corresponding VHG is shown in Figure 2.</Paragraph>
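To illustrate the combination step, here is a minimal sketch of how rules of this shape might be applied over the VHG. The VitEdge and Rule classes, the score combination, and the single-pass agenda are our simplifications for exposition, not the VERBMOBIL implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class VitEdge:
    start: int                        # WHG vertices spanned by this analysis
    end: int
    vit: dict                         # stand-in for a VIT semantic representation
    score: float
    components: List["VitEdge"] = field(default_factory=list)

@dataclass
class Rule:
    name: str                                  # mnemonic rule name
    tests: Callable[[dict, dict], bool]        # tests on the input VITs
    build: Callable[[dict, dict], tuple]       # returns (output VIT, confidence)

def combine(chart: List[VitEdge], rules: List[Rule]) -> List[VitEdge]:
    """Agenda-driven combination of adjacent VHG edges (simplified).

    A full implementation would also re-try edges already processed as
    left neighbours of newly derived edges, and prioritize parser results.
    """
    agenda = list(chart)
    while agenda:
        left = agenda.pop(0)
        for right in [e for e in chart if e.start == left.end]:   # adjacent edges
            for rule in rules:
                if rule.tests(left.vit, right.vit):
                    vit, confidence = rule.build(left.vit, right.vit)
                    new = VitEdge(left.start, right.end, vit,
                                  score=(left.score + right.score) * confidence,
                                  components=[left, right])
                    chart.append(new)          # derived edges become agenda items too
                    agenda.append(new)
    return chart
```

With tests encoding type(V1, verbal), missing_arg(V1), and so on, two applications of a verb_arg_r-style rule reproduce the example above, the case violation entering through a lowered confidence value.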
</Section> <Section position="2" start_page="409" end_page="410" type="sub_section"> <SectionTitle> 4.2 Bridging </SectionTitle> <Paragraph position="0"> Not all cases can be handled this simply. Often there are partial results in the input which cannot be integrated into a spanning result. In these cases, a mechanism called bridging is applied. Consider the self-correction in (3).</Paragraph> <Paragraph position="1"> (3) Ich treffe ... habe einen Termin am ... 'I meet ... have an appointment on ...' Again, the parser will only find partial results. Combinations of ich with treffe lead nowhere; the combination of the second verb with the NP does not lead to a complete analysis either (cf. Figure 3). Note that if a nominal argument can bind several argument roles, there is a passive edge in the VHG for each such reading. Its score reflects to what degree the selectional requirements of the verb, in terms of the required case and sortal restrictions, have been met.</Paragraph> <Paragraph position="2"> If no spanning result exists when all rules have been applied, the bridging mechanism produces new active edges which extend edges already present. Here, it extends the active edge aiming to combine ich with a verbal functor to end after treffe, thus allowing for a combination with the VP already built, habe einen Termin am .... Extending edges from left to right corresponds to the linear nature of self-corrections, in which material to the right replaces some to the left.</Paragraph> </Section> <Section position="3" start_page="410" end_page="410" type="sub_section"> <SectionTitle> 4.3 Scoring and Result Selection </SectionTitle> <Paragraph position="0"> The scoring function for edges takes into account their length, the coverage of the edge, the number of component edges it consists of, and the confidence value for the operation which created it. It has to satisfy the following property, illustrated in Figure 4: if two edges together span an interval (edges a and b) and another edge has been built from them (edge c), the latter should get a better score than the sequence of the original two edges. If there is yet another edge, delivered by the parser, which spans the complete interval by itself (edge d), it should get a better score than the edge built from the two components.</Paragraph> <Paragraph position="1"> The selection is done in two different ways. If there is more than one spanning result, the scores of the spanning results are weighted according to a statistical model describing sequence probabilities based on semantic predicates (Ruland et al., 1998), and the best one is selected. Otherwise, the best sequence, i.e., the one with the highest score, is chosen in quadratic time, using a standard graph search algorithm.</Paragraph>
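The selection of the best sequence can be sketched as a simple dynamic program over the VHG. The code below assumes VitEdge-style objects with start, end, and score attributes, as in the earlier sketch, and that a sequence's score is the sum of its edge scores; scanning all edges per vertex gives the quadratic behaviour mentioned above.

```python
def select_best_sequence(edges, start, end):
    """Highest-scoring connected sequence of edges from vertex start to vertex end."""
    vertices = sorted({v for e in edges for v in (e.start, e.end)} | {start, end})
    best = {start: (0.0, [])}        # vertex -> (best score so far, edge sequence)
    for v in vertices:               # increasing vertex order = topological order
        if v not in best:
            continue                 # not reachable from the start vertex
        score, seq = best[v]
        for e in (e for e in edges if e.start == v):
            candidate = score + e.score
            if e.end not in best or candidate > best[e.end][0]:
                best[e.end] = (candidate, seq + [e])
    return best.get(end)             # None if no connected sequence exists
```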
</Section> </Section> <Section position="7" start_page="410" end_page="411" type="metho"> <SectionTitle> 5 Empirical Results </SectionTitle> <Paragraph position="0"> For an intermediate evaluation of the robust semantic processing phase, we ran our system, consisting of the HPSG parser and robust semantic processing, on a dialogue from the VERBMOBIL corpus of spontaneous appointment negotiation dialogues, producing WHGs from the original recorded audio data. The dialogue consists of 90 turns, which were split into 130 segments according to pauses by the speech recognizer. The segments received 213 segment analyses, i.e., 1.6 analyses per segment on average. 172 (80.8%) of these were generated by the parser and 41 (19.2%) were assembled from parser results by robust semantic processing. Of these 41 results, 34 (83%) were sensibly improved, while 7 (17%) did not represent a real improvement.</Paragraph> <Paragraph position="1"> This evaluation is local in the sense that we only consider the input-output behaviour of robust semantic processing. We do this in order to exclude the effects of insufficiencies introduced by other modules in the system, since they would distort the picture. For the same reason, the criterion we apply is whether the result delivered is a sensible combination of the fragments received, without reference to the original utterance or the translation produced. In the long run, however, we plan to compare the complete system's behaviour with and without the robust processing strategy.</Paragraph> </Section> </Paper>