<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1021">
  <Title>Spoken Dialogue Interpretation with the DOP Model</Title>
  <Section position="2" start_page="0" end_page="141" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The Data-Oriented Parsing (DOP) model (cf. Bod 1992, 1995; Bod &amp; Kaplan 1998; Scha 1992; Sima'an 1995, 1997; Rajman 1995) is a probabilistic parsing model which does not single out a narrowly predefined set of structures as the statistically significant ones. It accomplishes this by maintaining a large corpus of analyses of previously occurring utterances. New utterances are analyzed by combining subtrees from the corpus. The occurrence-frequencies of the subtrees are used to estimate the most probable analysis of an utterance.</Paragraph>
    <Paragraph position="1"> To date, DOP has mainly been applied to corpora of trees labeled with syntactic annotations.</Paragraph>
    <Paragraph position="2"> Let us illustrate this with a very simple example.</Paragraph>
    <Paragraph position="3"> Suppose that a corpus consists of only two trees:  To combine subtrees, a node-substitution operation indicated as o is used. Node-substitution identifies the leftmost nonterminai frontier node of one tree with the root node of a second tree (i.e., the second tree is substituted on the leftmost nonterminal frontier node of the first tree). A new input sentence such as Mary likes Susan can thus be parsed by combining subtrees from this corpus, as in (2):  DOP computes the probability of substituting a subtree t on a specific node as the probability of selecting t among all subtrees in the corpus that could be substituted on that node. This probability is equal to the number of occurrences of t, divided by the total number of occurrences of subtrees t' with the same root label as t. Let rl(t) return the root label of t then: P(t) = #(t) / ~,t,:rl(t,)=rl(t)#(t'). The probability of a derivation is computed by the product of the probabilities of the subtrees is consists of. The probability of a parse tree is computed by the sum of the probabilities of all derivations that produce that parse tree.</Paragraph>
    <Paragraph position="4"> Bod (1992) demonstrated that DOP can be implemented using conventional context-free parsing techniques. However, the computation of the most probable parse of a sentence is NP-hard (Sima'an 1996). The most probable parse can be estimated by iterative Monte Carlo sampling (Bod 1995), but efficient algorithms exist only for sub-optimal solutions such as the most likely derivation of a sentence (Bod 1995, Sima'an 1995) or the &amp;quot;labelled recall parse&amp;quot; of a sentence (Goodman 1996). So far, the syntactic DOP model has been tested on the ATIS corpus and the Wall Street Journal corpus, obtaining significantly better test results than other stochastic parsers (Charniak 1996). For example, Goodman (1998) compares the results of his DOP parser to a replication of Pereira &amp; Schabes (1992) on the same training and test data. While the Pereira &amp; Schabes method achieves 79.2% zero-crossing brackets accuracy, DOP obtains 86.1% on the same data (Goodman 1998: p. 179, table 4.4). Thus the DOP method outperforms the Pereira &amp; Schabes method with an accuracy-increase of 6.9%, or an errorreduction of 33%. Goodman also performs a statistical analysis using t-test, showing that the differences are statistically significant beyond the 98th percentile. In Bod et al. (1996), it was shown how DOP can be generalized to semantic interpretation by using corpora annotated with compositional semantics. In the current paper, we extend the DOP model to spoken dialogue understanding, and we show how it can be used as an efficient and robust NLP component in a practical spoken dialogue system called OVIS.</Paragraph>
    <Paragraph position="5"> OVIS, Openbaar Vervoer Informatie Systeem (&amp;quot;Public Transport Information System&amp;quot;), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme &amp;quot;Language and Speech Technology&amp;quot;.</Paragraph>
    <Paragraph position="6"> The backbone of any DOP model is an annotated language corpus. In the following section, we therefore start with a description of the corpus that was developed for the OVIS system, the &amp;quot;OVIS corpus&amp;quot;. We then show how this corpus can be used by DOP to compute the most likely meaning M of a word string W: argmax g P(M, W). Next we demonstrate how the dialogue context C can be integrated so as to compute argmaxg P(M, W I C). Finally, we interface DOP with speech and show how the most likely meaning M of an acoustic utterance A given dialogue context C is computed: argmax g P(M, A I C). The last section of this paper deals with the experimental evaluation of the model.</Paragraph>
    <Paragraph position="7"> 2. The OVIS corpus: trees enriched with compositional frame semantics The OVIS corpus currently consists of 10,000 syntactically and semantically annotated user utterances that were collected on the basis of a pilot version of the OVIS system 2. The user utterances are answers to system questions such as From where to where do you want to travel?, At what time do you want to travel from Utrecht to Leiden?, Could you please repeat your destination ?.</Paragraph>
    <Paragraph position="8"> For the syntactic annotation of the OVIS user utterances, a tag set of 40 lexical/syntactic categories  was developed. This tag set was deliberately kept small so as to improve the robustness of the DOP parser. A correlate of this robustness is that the parser will overgenerate, but as long as the probability model can accurately select the correct utterance-analysis from all possible analyses, this overgeneration is not problematic. Robustness is further achieved by a special category, called ERROR. This category is used for stutters, false starts, and repairs. No grammar is used to determine the correct syntactic annotation; there is a small set of guidelines, that has the degree of detail necessary to avoid an &amp;quot;anything goes&amp;quot; attitude in the annotator, but leaves room for the annotator's perception of the structure of the utterance (see Bonnema et al. 1997).</Paragraph>
    <Paragraph position="9"> The semantic annotations are based on the update language defined for the OVIS dialogue manager by Veldhuijzen van Zanten (1996). This language consists of a hierarchical frame structure with slots and values for the origin and destination of a train connection, for the time at which the user wants to arrive or depart, etc. The distinction between slots and values can be regarded as a special case of ground and focus distinction (Vallduvi 1990). Updates specify the ground and focus of the user utterances.</Paragraph>
    <Paragraph position="10"> For example, the utterance Ik wil niet vandaag maar morgen naar Almere (literally: &amp;quot;I want not today but tomorrow to Almere&amp;quot;) yields the following update: (4) user.wants. ( ( \[# today\] ; \[ ! tomorrow\] ) ; destination .place. town. almere) An important property of this update language is that it allows encoding of speech-act information (v. Noord et al. 1997). The &amp;quot;#&amp;quot; in the update means that the information between the square brackets (representing the focus of the user-utterance) must be retracted, while the &amp;quot;!&amp;quot; denotes the corrected information. This update language is used to semantically enrich the syntactic nodes of the OVIS trees by means of the following annotation convention:  * Every meaningful lexical node is annotated with a slot and/or value from the update language which represents the meaning of the lexical item.</Paragraph>
    <Paragraph position="11"> * Every meaningful non-lexical node is annotated  with a formula schema which indicates how its meaning representation can be put together out of the meaning representations assigned to its daughter nodes.</Paragraph>
    <Paragraph position="12"> In the examples below, these schemata use the variable dl to indicate the meaning of the leftmost daughter constituent, d2 to indicate the meaning of the second daughter node constituent, etc. For instance, the full (syntactic and semantic) annotation for the above sentence Ik wil niet vandaag maar morgen naar Almere is given in figure (5).</Paragraph>
    <Paragraph position="13"> Note that the top-node meaning of (5) is compositionally built up out of the meanings of its sub-constituents. Substituting the meaning representations into the corresponding variables yields the update expression (4). The OVIS annotations are in contrast with other corpora and systems (e.g. Miller et al. 1996), in that our annotation convention exploits  Note that the ERROR category has no semantic annotation; in the top-node semantics of Van Voorburg 3 To maintain our annotation convention in the face of phenomena such as non-standard quantifier scope or discontinuous constituents may create complications in the syntactic or semantic analyses assigned to certain sentences and their constituents. It is therefore not clear yet whether our current treatment ought to be viewed as completely general, or whether a more sophisticated treatment in the vein of van den Berg et al. (1994) should be worked out.</Paragraph>
    <Paragraph position="14">  naar van Venlo naar Voorburg, the meaning of the false start Van Voorburg naar is thus absent:</Paragraph>
    <Paragraph position="16"> des tination, place, town. voorburg ) The manual annotation of 10,000 OVIS utterances may seem a laborious and error-prone process. In order to expedite this task, a flexible and powerful annotation workbench (SEMTAGS) was developed by Bonnema (1996). SEMTAGS is a graphical interface, written in C using the XVIEW toolkit. It offers all functionality needed for examining, evaluating, and editing syntactic and semantic analyses. SEMTAGS is mainly used for correcting the output of the DOP parser. After the first 100 OVIS utterances were annotated and checked by hand, the parser used the subtrees of these annotations to produce analyses for the next 100 OVIS utterances. These new analyses were checked and corrected by the annotator using SEMTAGS, and were added to the total set of annotations. This new set of 200 analyses was then used by the DOP parser to predict the analyses for a next subset of OVIS utterances. In this incremental, bootstrapping way, 10,000 OVIS utterances were annotated in approximately 600 hours (supervision included). For further information on OVIS and how to obtain the corpus, see http://earth.let.uva.nl/-rens.</Paragraph>
    <Paragraph position="17"> 3. Using the OVIS corpus for data-oriented semantic analysis An important advantage of a corpus annotated according to the Principle of Compositionality of Meaning is that the subtrees can directly be used by DOP for computing syntactic/semantic representations for new utterances. The only difference is that we now have composite labels which do not only contain syntactic but also semantic information. By way of illustration, we show how a representation for the input utterance lk wil van Venlo naar Almere (&amp;quot;I want from Venlo to Almere&amp;quot;) can be constructed out of subtrees from the trees in figures (5) and (6):</Paragraph>
    <Paragraph position="19"> origin.place town.venlo destination.place town.almere I I I I van venlo near almere which yields the following top-node update semantics: (9) user.wants.</Paragraph>
    <Paragraph position="20"> ( origin, place, town. venlo ; destination, place, town. almere) The probability calculations for the semantic DOP model are similar to the original DOP model. That is, the probability of a subtree t is equal to the number of occurrences of t in the corpus divided by the number of occurrences of all subtrees t' that can be substituted on the same node as t. The probability of a derivation D = t 1 o ... o t n is the product of the probabilities of its subtrees t i. The probability of a parse tree T is the sum of the probabilities of all derivations D that produce T. And the probability of a meaning M and a word string W is the sum of the probabilities of all parse trees T of W whose top-node meaning is logically equivalent to M (see Bod et al. 1996).</Paragraph>
    <Paragraph position="21"> As with the most probable parse, the most probable meaning M of a word string W cannot be computed in deterministic polynomial time. Although the most probable meaning can be estimated by iterative Monte Carlo sampling (see Bod 1995), the computation of a sufficiently large number of random derivations is currently not efficient enough for a practical application. To date, only the most likely derivation can be computed in near to real-time (by a best-first Viterbi optimization algorithm). We therefore assume that most of the probability mass for each top-node meaning is focussed on a single derivation.</Paragraph>
    <Paragraph position="22"> Under this assumption, the most likely meaning of a string is the top-node meaning generated by the most likely derivation of that string (see also section 5). 4. Extending DOP to dialogue context: context-dependent subcorpora We now extend the semantic DOP model to compute the most likely meaning of a sentence given the previous dialogue. In general, the probability of a top-node meaning M and a particular word string W i given a dialogue-context Ci = Wi-l, Wi-2 ... WI is given by P(M, W i I Wi-l, Wi-2 ... WI).</Paragraph>
    <Paragraph position="23"> Since the OVIS user utterances are typically answers to previous system questions, we assume that the meaning of a word string W i does not depend on the full dialogue context but only on the previous (system) question Wi.l. Under this assumption, P(M, W i l Ci) = P(M,W i I Wi_l) For DOP, this formula means that the update semantics of a user utterance W i is computed on the basis of the subcorpus which contains all OVIS utterances (with their annotations) that are answers to the system question Wi_ 1. This gives rise to the following interesting model for dialogue processing: each system question triggers a context-dependent domain (a subcorpus) by which the user answer is analyzed and interpreted. Since the number of different system questions is a small closed set (see Veldhuijzen van Zanten 1996), we can create off-line for each subcorpus the corresponding DOP parser.</Paragraph>
    <Paragraph position="24"> In OVIS, the following context-dependent subcorpora can be distinguished:  (1) place subcorous: utterances following questions like From where to where do you want to travel? What is ),our destination ?, etc.</Paragraph>
    <Paragraph position="25"> (2) date subcorpus: utterances following questions like When do you want to travel?, When do you want to leave from X?, When do you want to arrive in Y?, etc.</Paragraph>
    <Paragraph position="26"> (3) time subcorpus: utterances following questions like At what time do you want to travel? At what time do you want to leave from X?, At what time do you want to arrive in Y?, etc.</Paragraph>
    <Paragraph position="27"> (4) yes/no subcorpus: utterances following y/nquestions like Did you say that ... ? Thus you want to arrive at... ?  Note that a subcorpus can contain utterances whose topic goes beyond the previous system question. For example, if the system asks From where to where do you want to travel?, and the user answers with: From Amsterdam to Groningen tomorrow morning, then the date-expression tomorrow morning ends up in the place-subcorpus.</Paragraph>
    <Paragraph position="28"> It is interesting to note that this context-sensitive DOP model can easily be generalized to domain-dependent interpretation: a corpus is clustered into subcorpora, where each subcorpus corresponds to a topic-dependent domain. A new utterance is interpreted by the domain in which it gets highest probability. Since small subcorpora tend to assign higher probabilities to utterances than large subcorpora (because relative frequencies of subtrees in small corpora tend to be higher), it follows that a language user strives for the smallest, most specific domain in which the perceived utterance can be analyzed, thus establishing a most specific common ground.</Paragraph>
  </Section>
class="xml-element"></Paper>