<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1021">
  <Title>Spoken Dialogue Interpretation with the DOP Model</Title>
  <Section position="3" start_page="141" end_page="142" type="metho">
    <SectionTitle>
5. Interfacing DOP with speech
</SectionTitle>
    <Paragraph position="0"> So far, we have dealt with the estimation of the probability P(M, W\[ C) of a meaning M and a word string W given a dialogue context C. However, in spoken dialogue processing, the word string W is not given. The input for DOP in the OVIS system are word-graphs produced by the speech recognizer (these word-graphs are generated by our project partners from the University of Nijmegen).</Paragraph>
    <Paragraph position="1"> A word-graph is a compact representation for all sequences of words that the speech recognizer hypothesizes for an acoustic utterance A (see e.g.</Paragraph>
    <Paragraph position="2"> figure 10). The nodes of the graph represent points in time, and a transition between two nodes i and j, represents a word w that may have been uttered between the corresponding points in time. For convenience we refer to transitions in the word-graph using the notation &lt;i, j, w&gt;. The word-graphs are optimized to eliminate epsilon transitions. Such transitions represent periods of time when the speech recognizer hypothesizes that no words are uttered.</Paragraph>
    <Paragraph position="3"> Each transition is associated with an acoustic score.</Paragraph>
    <Paragraph position="4"> This is the negative logarithm (of base 10) of the acoustic probability P(a I w) for a hypothesized word w normalized by the length of w. Reconverting these acoustic scores into their corresponding probabilities, the acoustic probability P(A I W) for a hypothesized word string W can be computed by the product of the probabilities associated to each transition in the corresponding word-graph path. Figure (10) shows an example of a simplified word-graph for the uttered sentence lk wil graag vanmorgen naar Leiden (&amp;quot;I'd like to go this morning to Leiden&amp;quot;):</Paragraph>
    <Paragraph position="6"> The probabilistic interface between DOP and speech word-graphs thus consists of the interface between the DOP probabilities P(M, W IC) and the word-graph</Paragraph>
    <Paragraph position="8"> The probability P(M, W IC) is computed by the dialogue-sensitive DOP model as explained in the previous section. To estimate the probability P(A IM, W, C) on the basis of the information available in the word-graphs, we must make the following independence assumption: the acoustic utterance A depends only on the word string W, and  not on its context C and meaning M (cf. Bod &amp; Scha 1994). Under this assumption:</Paragraph>
    <Paragraph position="10"> To make fast computation feasible, we furthermore assume that most of the probability mass for each meaning and acoustic utterance is focused on a single word string W (this will allow for efficient Viterbi best first search):</Paragraph>
    <Paragraph position="12"> Thus, the probability of a meaning M for an acoustic utterance A given a context C is computed by the product of the DOP probability P(M, W I C) and the word-graph probability P(A I W).</Paragraph>
    <Paragraph position="13"> As to the parsing of word-graphs, it is well-known that parsing algorithms for word strings can easily be generalized to word-graphs (e.g. van Noord 1995). For word strings, the initialization of the chart usually consists of entering each word w i into chart entry &lt;i, i+1&gt;. For word-graphs, a transition &lt;i,j, w&gt; corresponds to a word w between positions i and j where j is not necessarily equal to i+1 as is the case for word strings (see figure I0). It is thus easy to see that for word-graphs the initialization of the chart consists of entering each word w from transition &lt;i,j, w&gt; into chart entry &lt;i,j&gt;. Next, parsing proceeds with the subtrees that are triggered by the dialogue context C (provided that all subtrees are converted into equivalent rewrite rules -- see Bod 1992, Sima'an 1995). The most likely derivation is computed by a bottom-up best-first CKY parser adapted to DOP (Sima'an 1995, 1997). This parser has a time complexity which is cubic in the number of word-graph nodes and linear in the grammar size. The top-node meaning of the tree resulting from the most likely derivation is taken as the best meaning M for an utterance A given context C.</Paragraph>
  </Section>
class="xml-element"></Paper>