<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1021"> <Title>Spoken Dialogue Interpretation with the DOP Model</Title> <Section position="3" start_page="141" end_page="142" type="metho"> <SectionTitle> 5. Interfacing DOP with speech </SectionTitle> <Paragraph position="0"> So far, we have dealt with the estimation of the probability P(M, W | C) of a meaning M and a word string W given a dialogue context C. In spoken dialogue processing, however, the word string W is not given. The input for DOP in the OVIS system consists of word-graphs produced by the speech recognizer (these word-graphs are generated by our project partners from the University of Nijmegen).</Paragraph> <Paragraph position="1"> A word-graph is a compact representation of all sequences of words that the speech recognizer hypothesizes for an acoustic utterance A (see e.g. figure 10). The nodes of the graph represent points in time, and a transition between two nodes i and j represents a word w that may have been uttered between the corresponding points in time. For convenience we refer to transitions in the word-graph using the notation <i, j, w>. The word-graphs are optimized to eliminate epsilon transitions. Such transitions represent periods of time during which the speech recognizer hypothesizes that no words are uttered.</Paragraph> <Paragraph position="3"> Each transition is associated with an acoustic score.</Paragraph> <Paragraph position="4"> This is the negative logarithm (base 10) of the acoustic probability P(a | w) for a hypothesized word w, normalized by the length of w. By reconverting these acoustic scores into their corresponding probabilities, the acoustic probability P(A | W) for a hypothesized word string W can be computed as the product of the probabilities associated with each transition in the corresponding word-graph path. 
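The score-to-probability reconversion described above can be sketched as follows (a minimal illustration, not from the paper: the transition format, function names, and example scores are hypothetical, and the per-word length normalization is omitted for simplicity):

```python
# Sketch of the acoustic-score interface described above.
# A word-graph transition (i, j, w) carries a score that is the
# negative base-10 logarithm of the acoustic probability p(a | w);
# the length normalization mentioned in the text is omitted here.

def acoustic_prob(score):
    """Reconvert a negative log10 acoustic score into a probability."""
    return 10.0 ** (-score)

def path_probability(path):
    """P(A | W): product of the reconverted transition probabilities
    along one word-graph path; `path` is a list of
    (i, j, word, score) tuples."""
    p = 1.0
    for _i, _j, _word, score in path:
        p *= acoustic_prob(score)
    return p

# Hypothetical path for "ik wil graag vanmorgen naar leiden",
# with made-up acoustic scores:
path = [(0, 1, "ik", 0.30), (1, 2, "wil", 0.22),
        (2, 3, "graag", 0.51), (3, 4, "vanmorgen", 0.47),
        (4, 5, "naar", 0.12), (5, 6, "leiden", 0.64)]
print(path_probability(path))
```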
Figure 10 shows an example of a simplified word-graph for the uttered sentence Ik wil graag vanmorgen naar Leiden (&quot;I'd like to go this morning to Leiden&quot;).</Paragraph> <Paragraph position="6"> The probabilistic interface between DOP and speech word-graphs thus consists of the interface between the DOP probabilities P(M, W | C) and the word-graph probabilities P(A | W), via the decomposition P(M, W, A | C) = P(M, W | C) P(A | M, W, C).</Paragraph> <Paragraph position="8"> The probability P(M, W | C) is computed by the dialogue-sensitive DOP model as explained in the previous section. To estimate the probability P(A | M, W, C) on the basis of the information available in the word-graphs, we must make the following independence assumption: the acoustic utterance A depends only on the word string W, and not on its context C and meaning M (cf. Bod & Scha 1994). Under this assumption: P(M, W, A | C) = P(M, W | C) P(A | W).</Paragraph> <Paragraph position="10"> To make fast computation feasible, we furthermore assume that most of the probability mass for each meaning and acoustic utterance is focused on a single word string W (this allows for efficient Viterbi best-first search): P(M, A | C) = max over W of P(M, W | C) P(A | W).</Paragraph> <Paragraph position="12"> Thus, the probability of a meaning M for an acoustic utterance A given a context C is computed as the product of the DOP probability P(M, W | C) and the word-graph probability P(A | W).</Paragraph> <Paragraph position="13"> As to the parsing of word-graphs, it is well known that parsing algorithms for word strings can easily be generalized to word-graphs (e.g. van Noord 1995). For word strings, the initialization of the chart usually consists of entering each word w_i into chart entry <i, i+1>. For word-graphs, a transition <i, j, w> corresponds to a word w between positions i and j, where j is not necessarily equal to i+1 as it is for word strings (see figure 10). It is thus easy to see that for word-graphs the initialization of the chart consists of entering each word w from transition <i, j, w> into chart entry <i, j>. 
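The chart initialization just described can be sketched as follows (a minimal illustration, not from the paper: the data structures and the example transitions are hypothetical):

```python
from collections import defaultdict

def init_chart(transitions):
    """Seed a parse chart from word-graph transitions: each
    transition (i, j, w) enters word w into chart entry (i, j).
    A plain word string is the special case where every
    transition has j == i + 1."""
    chart = defaultdict(set)
    for i, j, word in transitions:
        chart[(i, j)].add(word)
    return chart

# A linear word-graph, i.e. an ordinary word string:
string_graph = [(0, 1, "ik"), (1, 2, "wil"), (2, 3, "graag")]

# A word-graph may also hypothesize one word over a longer time
# span (j greater than i + 1):
word_graph = string_graph + [(0, 2, "ikwil")]

chart = init_chart(word_graph)
print(sorted(chart.keys()))
```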
Next, parsing proceeds with the subtrees that are triggered by the dialogue context C (provided that all subtrees are converted into equivalent rewrite rules -- see Bod 1992, Sima'an 1995). The most likely derivation is computed by a bottom-up best-first CKY parser adapted to DOP (Sima'an 1995, 1997). This parser has a time complexity that is cubic in the number of word-graph nodes and linear in the grammar size. The top-node meaning of the tree resulting from the most likely derivation is taken as the best meaning M for an utterance A given context C.</Paragraph> </Section> </Paper>