<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1304"> <Title>Interactive Question Answering and Constraint Relaxation in Spoken Dialogue Systems</Title> <Section position="4" start_page="28" end_page="28" type="metho"> <SectionTitle> 2 System architecture </SectionTitle> <Paragraph position="0"> Our dialogue system employs the following architecture: the output of a speech recognizer (Nuance, using a statistical language model) is analyzed by both a general-purpose statistical dependency parser and a (domain-specific) topic classifier. Parse trees and topic labels are matched by the 'dialogue move scripts' of the dialogue manager (Mirkovic and Cavedon, 2005; Weng et al., 2005). The scripts serve to license the instantiation of dialogue moves and their integration into the 'dialogue move tree.' The use of dialogue move scripts is motivated by the need to quickly tailor the system to new domains: only the scripts need to be adapted, not the underlying machinery implemented in Java. The scripts define short sequences of dialogue moves; for example, a command move (&quot;play song X&quot;) may be followed either by a disambiguation question or by a confirmation that the command will be executed. A dialogue proceeds by integrating such scripted sequences into the dialogue move tree, yielding a relatively 'flat' dialogue structure.</Paragraph> <Paragraph position="1"> Query constraints are built by dialogue move scripts if the parse tree matches input patterns specified in the scripts. These query constraints are the starting point for the processing strategies described in this paper. The dialogue system is fully implemented and has been used in restaurant selection and MP3 player tasks. There are 41 task-independent, generic dialogue move scripts, 52 restaurant selection scripts and 89 MP3 player scripts.
The examples in this paper are mostly taken from the restaurant selection task.</Paragraph> </Section> <Section position="5" start_page="28" end_page="30" type="metho"> <SectionTitle> 3 Knowledge and Content management </SectionTitle> <Paragraph position="0"> The Knowledge Manager (KM) controls access to domain knowledge that is structured according to domain-dependent ontologies. The KM makes use of OWL, a W3C standard, to represent the ontological relationships between domain entities. The knowledge base can be dynamically updated with new instances at any point. In a typical interaction, the Dialogue Manager converts a user's query into a semantic frame (i.e., a set of semantic constraints) and sends it to the KM via the content optimizer. For example, in the Restaurant domain, a request such as &quot;I want to find an inexpensive Japanese restaurant that takes reservations&quot; results in semantic frame (1), where Category is a system property and the other constraints are inherited properties of the Restaurant class.</Paragraph> <Paragraph position="2"> In addition to the KM module, we employ a Content Optimization (CO) module that acts as an intermediary between dialogue and knowledge management during the query process. It receives semantic frames from the Dialogue Manager, revises the semantic frames if necessary (see below), and queries the Knowledge Manager.</Paragraph> <Paragraph position="3"> The content optimizer also resolves remaining ambiguities in the interpretation of constraints.</Paragraph> <Paragraph position="4"> For example, if the user requests an unknown cuisine type, the otherwise often accurate classifier will not be able to provide a label since it operates under a closed-world assumption. In contrast, the general-purpose parser may be able to provide an accurate syntactic analysis.
However, the parse still needs to be interpreted by the content optimizer, which has the domain-specific knowledge to determine, for example, that &quot;Montenegrin restaurant&quot; expresses a cuisine constraint rather than a service-level constraint (see also section 7).</Paragraph> <Paragraph position="5"> Depending on the items in the query result set, configurable properties, and (potentially) a user model, the CO module selects and performs an appropriate optimization strategy. To increase portability, the module contains a library of domain-independent strategies and makes use of external configuration files to tailor it to specific domains.</Paragraph> <Paragraph position="6"> The CO module can modify constraints depending on the number of items in the result set, the system ontology, and information from a user model. Constraints can be relaxed, tightened, added or removed. The manner in which a constraint is modified depends on what kind of values it takes. For example, for the Cuisine constraint, values are related hierarchically (e.g., Chinese, Vietnamese, and Japanese are all sub-types of Asian), whereas PriceLevel values are linear (e.g., cheap, moderate, expensive), and acceptsCreditCards values are binary (i.e., accepted or not accepted).</Paragraph> <Paragraph position="7"> If the original query returns no results, the content optimizer selects a constraint to modify and then attempts to relax the constraint value. If relaxation is impossible, it removes the constraint instead. Constraint relaxation makes use of the ontological relationships in the knowledge base.</Paragraph> <Paragraph position="8"> For example, relaxing a Cuisine constraint entails replacing its value with its parent concept in the domain ontology (e.g., Japanese with Asian).
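The relax-or-remove step for a hierarchically valued constraint such as Cuisine can be sketched in Python as follows. This is a minimal illustration, not the CO module's implementation: the ontology fragment, the function names, the toy knowledge base and the query callable are hypothetical stand-ins for the KM and its OWL ontology, and the constraint to modify is passed in explicitly because the selection policy is not described here.

```python
# Hypothetical fragment of a cuisine ontology: concept -> parent concept.
# "Asian" is the top of this fragment, so it cannot be relaxed any further.
CUISINE_PARENT = {
    "Chinese": "Asian",
    "Vietnamese": "Asian",
    "Japanese": "Asian",
    "Asian": None,
}


def is_a(value, concept, parent_of):
    """Return True if value equals concept or lies below it in the hierarchy."""
    while value is not None:
        if value == concept:
            return True
        value = parent_of.get(value)
    return False


def relax_or_remove(constraints, name, parent_of):
    """Replace constraint `name` by its parent concept, or drop it if there is none.

    Returns a new dict; the caller's constraints are left unchanged.
    """
    revised = dict(constraints)
    parent = parent_of.get(revised[name])
    if parent is None:
        del revised[name]       # relaxation impossible: remove the constraint
    else:
        revised[name] = parent  # relaxation: move one level up the ontology
    return revised


def optimize(constraints, query, name, parent_of):
    """Relax, and finally remove, one constraint until the query returns results."""
    current = dict(constraints)
    results = query(current)
    while not results and name in current:
        current = relax_or_remove(current, name, parent_of)
        results = query(current)
    return current, results


if __name__ == "__main__":
    # Toy knowledge base with one hypothetical record.
    kb = [{"id": "r1", "Cuisine": "Vietnamese"}]

    def toy_query(cs):
        # Every constraint is checked with is_a; for values outside the
        # hierarchy this reduces to an equality test.
        return [r for r in kb
                if all(is_a(r.get(k), v, CUISINE_PARENT) for k, v in cs.items())]

    print(optimize({"Cuisine": "Japanese"}, toy_query, "Cuisine", CUISINE_PARENT))
    # prints ({'Cuisine': 'Asian'}, [{'id': 'r1', 'Cuisine': 'Vietnamese'}])
```

Relaxing Japanese to Asian lets the toy query match a Vietnamese record; if Asian still returned nothing, the next step would find no parent concept and remove the Cuisine constraint altogether.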
Relaxing a linear constraint entails replacing the current value with an adjacent value.</Paragraph> <Paragraph position="9"> Relaxing a binary constraint entails replacing the current value with its opposite value.</Paragraph> <Paragraph position="10"> Based on the ontological structures, the content optimizer also calculates statistics for every set of items returned by the knowledge manager in response to a user's query. If the result set is large, these figures can be used by the dialogue manager to give meaningful responses (e.g., in the MP3 domain, &quot;There are 85 songs. Do you want to list them by a genre such as Rock, Pop, or Soul?&quot;).</Paragraph> <Paragraph position="11"> The content optimizer also produces constraints that represent meta-knowledge about the ontology, for example, in response to a user input &quot;What cuisines are there?&quot;:
(2) rdfs:subClassOf = restaurant:Cuisine
The processing modules described in the next sections can use meta-level constraints in similar ways to object-level constraints (see (1)).</Paragraph> <Paragraph position="12">
4 Dialogue strategies for dealing with query results
In the following two sections, we describe how our dialogue and generation strategies tie in with the choices made by the content optimizer. Consider the following discourse-initial interaction for which the semantic frame (1) is constructed:
(3) U: i want to find an inexpensive Japanese restaurant that takes reservations
S: I found 9 inexpensive Japanese restaurants that take reservations. Here are the first few:</Paragraph> <Paragraph position="13"> The example query has a relatively small result set which can be listed directly. This is not always the case, and thus we need dialogue strategies that deal with different result set sizes. For example, it does not seem sensible to produce &quot;I found 2500 restaurants. Here are the first few: ...&quot;. At what point does it become unhelpful to list items?
We do not have a definitive answer to this question; however, it is instructive that the (human) wizard in our data collection experiments did not start listing when the result set was larger than about 10 items. In the implemented system, we define dialogue strategies that are activated at adjustable thresholds.</Paragraph> <Paragraph position="14"> Even if the result set is large and the system does not list any result items, the user may still want to see some example items returned for the query. This observation is based on comments by subjects in experimental dry runs indicating that in some cases it was difficult to obtain any query result at all. For example, speech recognition errors may make it difficult to build up a sufficiently complex query. In response to this, we always give some example items even if the result set is large. (An alternative would be to start listing items after a certain number of dialogue turns.) Furthermore, the system should encourage the user to refine the query by suggesting constraints that have not been used yet. This is done by maintaining a list of constraints in the generator that is used up as the dialogue progresses. This list is roughly ordered by how likely each constraint is to be useful. For example, cuisine type is suggested before reservations or credit cards.</Paragraph> <Paragraph position="15"> In our architecture, information flows from the CO module to the generator (see section 5) via the dialogue move scripts of the dialogue manager.</Paragraph> <Paragraph position="16"> These scripts are conditioned on the size of the final result set and on whether any modifications were performed. Table 1 summarizes the main dialogue strategies. These dialogue strategies represent implicit confirmations and are used if the NLU component has high confidence in its analysis of the user utterance (see (Varges and Purver, 2006) for more details on our handling of robustness issues).
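As an illustration of conditioning on result-set size, the following Python sketch selects a strategy from the number of results and keeps the ordered, consumable list of constraint suggestions described above. The thresholds t1 and t2 are those introduced in the next sentences; the strategy labels, the separate zero-results case, the example threshold values and the position of PriceLevel in the suggestion order are assumptions made for this sketch.

```python
def choose_strategy(n_results, t1, t2):
    """Map the size of the final result set to a hypothetical strategy label.

    t1 and t2 are the adjustable thresholds (t1 smaller than t2).
    """
    if n_results == 0:
        return "no_results"            # assumed separate case
    if t1 >= n_results:                # at most t1 items
        return "list_in_one_sentence"
    if t2 >= n_results:                # more than t1, at most t2 items
        return "start_listing"
    return "examples_and_suggestion"   # large result set


class ConstraintSuggester:
    """Generator-side list of constraints to suggest, used up over the dialogue."""

    def __init__(self, ordered_names):
        # Ordered roughly by how likely each constraint is to be useful.
        self.remaining = list(ordered_names)

    def next_suggestion(self, used):
        """Pop and return the first constraint the user has not used yet."""
        while self.remaining:
            name = self.remaining.pop(0)
            if name not in used:
                return name
        return None


if __name__ == "__main__":
    suggester = ConstraintSuggester(["Cuisine", "PriceLevel", "acceptsCreditCards"])
    for n in (2, 7, 2500):
        strategy = choose_strategy(n, t1=3, t2=10)  # example thresholds only
        if strategy == "examples_and_suggestion":
            print(n, strategy, suggester.next_suggestion(used={"PriceLevel"}))
        else:
            print(n, strategy)
```

Each call to next_suggestion consumes the list, so a constraint is suggested at most once per dialogue, in line with a list that is used up as the dialogue progresses.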
Small result sets up to a threshold t1 are listed in a single sentence. For medium-sized result sets up to a threshold t2, the system starts listing immediately. For large result sets, the generator shows example items and makes suggestions as to what constraint the user may use next. If the CO module performs any constraint modification, the first, constraint-realizing sentence of the system turn reflects the modification. ('NP-original' and 'NP-optimized' in Table 1 are used for brevity and are explained in the next section.)</Paragraph> </Section> <Section position="6" start_page="30" end_page="31" type="metho"> <SectionTitle> 5 Generation </SectionTitle> <Paragraph position="0"> The generator produces turns that verbalize the constraints used in the database query. This is important since the system may miss or misinterpret constraints, leading to uncertainty for the user about what constraints were used. For this reason, a generic system response such as &quot;I found 9 items.&quot; is not sufficient.</Paragraph> <Paragraph position="1"> The input to the generator consists of the name of the dialogue move and the relevant instantiated nodes of the dialogue move tree. From the instantiated move nodes, the generator obtains the database query result, including information about query modifications. The core of the generator is a set of productions (see footnote 1) written in the Java Expert System Shell (Friedman-Hill, 2003). We follow the bottom-up generation approach for production systems described in (Varges, 2005) and perform mild overgeneration of candidate moves, followed by ranking. The highest-ranked candidate is selected for output.</Paragraph> <Paragraph position="2"> Productions map individual database constraints to phrases such as &quot;open for lunch&quot;, &quot;within 3 miles&quot; and &quot;a formal dress code&quot;, and recursively combine them into NPs.
This includes the use of coordination to produce &quot;restaurants with a 5-star rating and a formal dress code&quot;, for example. The NPs are integrated into sentence templates, several of which can be combined to form an output candidate turn. For example, a constraint-realizing template &quot;I found no [NP-original] but there are [NUM] [NP-optimized] in my database&quot; (see below for further explanation) can be combined with a follow-up sentence template such as &quot;You could try to look for [NP-constraint-suggestion]&quot;. (Footnote 1: Productions are 'if-then' rules that operate over a shared knowledge base of facts.)</Paragraph> <Paragraph position="3"> The selection of which sentence template to use is determined by the dialogue move scripts. Typically, a move-realizing production produces several alternative sentences. On the other hand, the NP generation rules realize constraints regardless of the specific dialogue move at hand. This allows us to also use them for clarification questions based on constraints constructed from classifier information if the parser and associated parse-matching patterns fail; all that is required is a new sentence template, for example &quot;Are you looking for [NP]?&quot;. We currently use 102 productions overall in the restaurant and MP3 domains, 38 of them to generate NPs that realize 19 possible input constraints (for both domains).</Paragraph> <Paragraph position="4"> The decision of the CO module to relax or remove constraints also affects the generator: there are two sets of constraints, an 'original' one directly constructed from the user utterance, and an 'optimized' one used by the KM module to obtain the query result (see section 3). When constraints have been modified, the two sets differ but often overlap. To avoid generating separate sets of NPs independently for the two constraint sets, we assign unique indices to the constraints and hand the generator two index sets as targets of NP generation.
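The index-set mechanism can be illustrated with a small Python sketch. It does not reproduce the JESS productions: the phrases, the simple premodifier/postmodifier scheme and all names are hypothetical, and exhaustive subset enumeration stands in for the recursive NP productions. NPs are generated once over the union of both index sets and then filtered per target.

```python
from itertools import combinations, permutations

# Hypothetical realizations keyed by unique constraint index:
# index -> (phrase, position relative to the head noun).
REALIZATIONS = {
    0: ("Japanese", "pre"),                 # original Cuisine value
    1: ("inexpensive", "pre"),              # PriceLevel
    2: ("that take reservations", "post"),  # reservations constraint
    3: ("Asian", "pre"),                    # relaxed Cuisine value
}


def overgenerate_nps(indices, head="restaurants"):
    """Yield (index_set, NP) for every non-empty subset and premodifier order."""
    for size in range(1, len(indices) + 1):
        for subset in combinations(sorted(indices), size):
            pre = [REALIZATIONS[i][0] for i in subset if REALIZATIONS[i][1] == "pre"]
            post = [REALIZATIONS[i][0] for i in subset if REALIZATIONS[i][1] == "post"]
            for order in permutations(pre):
                yield frozenset(subset), " ".join([*order, head, *post])


def nps_for(target, candidates):
    """Keep only NPs whose index set matches the target index set exactly."""
    return [phrase for idx, phrase in candidates if idx == target]


if __name__ == "__main__":
    original = frozenset({0, 1, 2})   # constraints from the user utterance
    optimized = frozenset({1, 2, 3})  # after relaxing Cuisine (index 0 -> 3)
    # Generate once over the union of both index sets, then filter per target.
    candidates = list(overgenerate_nps(original | optimized))
    print(nps_for(original, candidates))
    print(nps_for(optimized, candidates))
```

Both premodifier orders survive the index-set check; choosing between them is left to ranking and, for an order such as 'Japanese inexpensive restaurants', to the n-gram filter.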
We overgenerate NPs and check their index sets before integrating them into sentence templates.</Paragraph> <Paragraph position="5"> Candidate output moves are ranked using a combination of three factors. First, the ranker computes an alignment score for each candidate, based on its n-gram overlap with the user utterance. For example, this allows us to prefer &quot;restaurants that serve Chinese food&quot; over &quot;Chinese restaurants&quot; if the user used a wording more similar to the first. We note that the Gricean Maxim of Brevity, applied to NLG in (Dale and Reiter, 1995), suggests a preference for the second, shorter realization. However, if the user thought it necessary to use &quot;serves&quot;, perhaps to avoid confusion between constraints or even to correct an earlier mislabeling, then the system should make it clear that it understood the user correctly by using those same words, thus preferring the first realization. Mild overgeneration combined with alignment also allows us to map the constraint PriceLevel=0-10 in example (1) above to both &quot;cheap&quot; and &quot;inexpensive&quot;, and use alignment to 'play back' the original word choice to the user.</Paragraph> <Paragraph position="6"> As these examples show, using alignment for ranking in NLG allows one to employ overgeneration techniques even in situations where no corpus data is available (see footnote 2). Second, ranking uses a variation score to 'cycle' over sentence-level paraphrases.
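The first two ranking factors can be sketched in Python as follows. Whitespace tokenization, the use of unigrams and bigrams, and the lexicographic combination (alignment first, then fewest previous uses) are assumptions made for this sketch rather than the system's actual weighting.

```python
from collections import Counter


def ngrams(tokens, n):
    """All n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def alignment_score(candidate, user_utterance, max_n=2):
    """Count n-grams (n = 1..max_n) shared by the candidate and the user utterance."""
    cand = candidate.lower().split()
    user = user_utterance.lower().split()
    score = 0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        user_counts = Counter(ngrams(user, n))
        # Clipped overlap: each n-gram counts min(times in candidate, times in utterance).
        score += sum(min(c, user_counts[g]) for g, c in cand_counts.items())
    return score


def choose(candidates, user_utterance, usage):
    """Rank by alignment (higher first), then by prior use (less-used first).

    usage maps each output string to how often it has been chosen so far; among
    equally aligned paraphrases this cycles through them and then starts over.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (-alignment_score(c, user_utterance), usage.get(c, 0)),
    )
    best = ranked[0]
    usage[best] = usage.get(best, 0) + 1
    return best


if __name__ == "__main__":
    utterance = "show me restaurants that serve chinese food"
    nps = ["Chinese restaurants", "restaurants that serve Chinese food"]
    print(choose(nps, utterance, {}))  # prints restaurants that serve Chinese food
```

With identical inputs and equally aligned paraphrases, the use counts make choose return each paraphrase in turn and then start over, because Python's sort is stable and ties keep their original order.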
In the extreme case of repeated identical user inputs, the system simply chooses one paraphrase after the other, and starts over when all paraphrases have been used.</Paragraph> <Paragraph position="7"> Third, we use an n-gram filter based on n-grams from bad examples, removing, for example, &quot;Chinese cheap restaurants&quot; but keeping &quot;cheap Chinese restaurants&quot;. For generalization, we replace constraint realizations with semantic tags derived from the constraint names (except for the head noun), for example the trigram 'CUISINE PRICE restaurants'. An alternative is to use a more complex grammar formalism to prevent ungrammatical candidate moves.</Paragraph> <Paragraph position="8"> (Footnote 2: However, we do have Wizard-of-Oz data to inform the system design; see section 7.)</Paragraph> </Section> </Paper>