<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0602">
  <Title>Feature A B C D E F G</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Common components in practical Dialogue Management Systems
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <Paragraph position="0"> Our recent survey of a number of dialogue management systems has led us to identify those features and components which occur in many of the systems. By examining a range of successful systems, from flight information services (Fraser 1995b) and appointment scheduling in Verbmobil (Alexander and Reithinger 1995, Maier 1996, Alexandersson 1996) to theatre ticket booking (Hulstijn et al. 1996) and virtual space navigation (Nugues et al. 1996), a template for a generic dialogue management system has been drafted. A number of features are incorporated, including a pragmatics interpreter dealing with discourse phenomena such as anaphoric resolution and ellipsis, a model of the task structure and how it relates to the dialogue structure, a model of conversation incorporating an interaction strategy and a recovery strategy, and a semantic interpreter which resolves the full interpretation of an utterance in light of its context. This generic template can be used in the design of future dialogue management systems, highlighting important features and the mechanisms required to implement them. The template also provides an application-independent method for assessing systems according to the features they exhibit.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="10" type="metho">
    <SectionTitle>
4 Advantages of qualitative assessment against a standard
</SectionTitle>
    <Paragraph position="0"> Speech And Language Technology researchers are used to thinking of evaluation in terms of speed and accuracy of system outputs, for example the 'success rate' of a speech recogniser or syntactic parser in analysing a standard test corpus. However, 'Dialogue Management' is a high-level linguistic concept which cannot be measured so straightforwardly, for several reasons:
- existing DMSs are very domain-specific, and we need to compare dialogue systems across domains, so it makes no sense to look for a common standard 'test corpus';
- the boundary between 'good' and 'bad' dialogue is very ill-defined, so it makes little sense to assess against a target 'correct output', or even by subjective assessment of 'pleasantness' of output;
- the structure of dialogue (and hence of a DMS) is complex, multi-level, and non-algorithmic, making a single overall 'evaluation metric' meaningless without consideration of component behaviours;
- we need to evaluate the integrated system holistically, as opposed to measuring the speed or accuracy of individual components;
- alternative dialogue systems use a wide range of alternative component technologies; only by fitting these against a generic template can we discriminate between superficial and substantive differences in component assumptions and functionalities.</Paragraph>
    <Paragraph position="1"> There is a useful analogy with the evaluation of NL parsers; typically, rival parsers are compared by measuring speed (sentences per minute) and/or accuracy (e.g. percentage of sentences parsed), e.g. (Sutcliffe et al. 1996). However, rival parsing schemes include varying 'levels' of syntactic information, as shown in the EAGLES recommendations (Leech et al. 1995). Atwell (1996) proposes an orthogonal evaluation of parsing schemes against the generic EAGLES 'template' of syntactic levels, so that a given parser's speed/accuracy measure should be moderated by a 'genericness' weight; for example, the ENGCG parser (Voutilainen and Jarvinen 1996) is very fast and accurate, but its underlying parsing scheme instantiates only a small subset of the EAGLES 'template', which moderates its overall 'score'. In much the same way, we propose that very unlike rival DMSs can be meaningfully compared by assessing how well they match our generic template for dialogue management architecture, and using this 'genericness' score to temper any measures of speed, accuracy, naturalness, etc.</Paragraph>
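The moderated comparison described above can be sketched in a few lines of code. The feature names, raw scores, and the particular weighting (raw score times fraction of template coverage) below are illustrative assumptions, not values or a formula taken from the survey.

```python
# Illustrative sketch: temper a raw speed/accuracy score by a
# 'genericness' weight, here taken to be the fraction of
# generic-template features a system instantiates.

GENERIC_TEMPLATE = {"anaphora", "ellipsis", "recovery_strategy",
                    "interaction_strategy", "functional_perplexity",
                    "language_perplexity", "over_informativeness"}

def genericness(features):
    """Fraction of the generic template a system covers."""
    return len(GENERIC_TEMPLATE & set(features)) / len(GENERIC_TEMPLATE)

def moderated_score(raw_score, features):
    """Raw speed/accuracy score tempered by template coverage."""
    return raw_score * genericness(features)

# A fast but narrow system vs. a slower system covering the whole template
# (hypothetical scores):
fast_narrow = moderated_score(0.95, {"anaphora", "ellipsis"})
slow_broad = moderated_score(0.70, GENERIC_TEMPLATE)
```

Under this (assumed) weighting, the broad-coverage system outranks the narrow one despite its lower raw score, which is the intended effect of the genericness moderation.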
    <Paragraph position="2"> Consider (Churcher et al. 1997), which included a first attempt at an outline of a generic spoken language system. The model includes generic modules for syntactic, semantic, and speech act constraints; these constraints are integrated into spoken input interpretation to compensate for limitations in speech recognition components. The model constitutes a template tool for designing integrated systems; it specifies the standard components and how they fit together. As with any generic system, it is necessarily vague, and since it attempts to combine components found in a variety of individual models, it may not fit any particular system exactly. In our survey, we studied how this generic model mapped onto a range of existing real systems, by looking at the representation formats for the various linguistic features in the dialogue management schemes; as with grammatical analysis schemes, there is a need for a theory-neutral 'interlingua' standard dialogue representation scheme (Atwell 1996).</Paragraph>
  </Section>
  <Section position="6" start_page="10" end_page="11" type="metho">
    <SectionTitle>
5 Features of Natural Dialogue
</SectionTitle>
    <Paragraph position="0"> 'Naturalness' in dialogue is difficult to define, but by examining phenomena which occur in human-to-human dialogue we can begin to identify some features which contribute to its definition. The proposed model in (Churcher et al. 1997) reflects this to a certain extent by incorporating components for phenomena such as anaphora and ellipsis, whilst abstracting away from those components which are domain-specific, such as the model of task/dialogue structure. To begin with, seven such features are described below.</Paragraph>
    <Paragraph position="1"> A: Anaphora Anaphora occurs frequently in dialogue. This form of deixis applies to words which can only be interpreted in the given context of the dialogue. There are a number of different forms of anaphora, including personal pronouns (&quot;I&quot;, &quot;you&quot;, &quot;he/she/it&quot; etc.), spatial anaphora (&quot;there&quot;, &quot;that&quot; etc.) and temporal anaphora (&quot;then&quot;). Expressions relative to the current context often need to be interpreted into an absolute or canonical form; this form of anaphora includes expressions such as &quot;next week&quot; and &quot;the next entry&quot;, which can only be resolved in relation to a previous expression. By incorporating anaphora, a speaker can reduce redundancy and economise their speech.</Paragraph>
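The resolution of relative temporal expressions into a canonical form, as described above, can be sketched as follows. The toy lexicon of expressions and the reference date are hypothetical illustrations, not the resolution component of any surveyed system.

```python
from datetime import date, timedelta

# Minimal sketch: resolve a relative temporal expression ("next week",
# "tomorrow") to an absolute, canonical date, given the reference time
# of the utterance. The expression lexicon is a hypothetical toy set.

OFFSETS = {
    "today": timedelta(days=0),
    "tomorrow": timedelta(days=1),
    "next week": timedelta(weeks=1),
}

def resolve_temporal(expression, reference):
    """Map a relative expression to an absolute date."""
    if expression not in OFFSETS:
        raise ValueError(f"unknown expression: {expression}")
    return reference + OFFSETS[expression]

utterance_time = date(1997, 4, 7)
resolved = resolve_temporal("next week", utterance_time)
# resolved is date(1997, 4, 14)
```

A full pragmatics interpreter would, of course, also need the previous expression or dialogue context to resolve items like "the next entry"; the point here is only the relative-to-canonical mapping.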
    <Paragraph position="2"> B: Ellipsis Ellipsis commonly occurs in a sentence where for reasons of economy, style or emphasis, part of the structure is omitted. The missing structure can be recovered from the context of the dialogue and normally the previous sentences. Without modelling ellipsis, dialogue can appear far from natural.</Paragraph>
    <Paragraph position="3"> C: Recovery strategy Although misunderstandings often occur in conversations, speakers have the ability to recover from these and other deviations in communication. Taleb (1996) presents an analysis of the types of communicative deviations which can occur in conversation and categorises them into content and role deviations. The inadequacies of speech recognition technology introduce additional potential deviations. A dialogue management system must be able to recover from any deviations which occur. Seldom in human-to-human conversation does the dialogue 'break down'.</Paragraph>
    <Paragraph position="4"> D: Interaction strategy At any stage in a dialogue, one participant has the initiative of the conversation. In everyday conversation, it is possible for either participant to take the initiative at any stage. Turning to dialogue management, the interaction strategy is important when defining the naturalness of the system. System-orientated question and answer systems, where the system has the initiative throughout the dialogue, are the simplest to model since the user is explicitly constrained in their response. The greater freedom the user has to control the dialogue, the more complicated this modelling strategy becomes. Where the user has the initiative throughout the dialogue, such as in command and control applications, the user has greater expressibility and freedom of choice. The most difficult dialogues to model are those where the initiative can be taken by either the system or the user at various points in the dialogue. As noted by Eckert (1996), mixed initiative systems involve dialogues which approach the intricacies of conversational turn-taking, utilising strategies which determine when, for example, the system can take the initiative away from the user. For systems using speech recognition, the ability to confirm or clarify given information is essential, so a system-orientated or mixed-initiative strategy is required.</Paragraph>
    <Paragraph position="5"> E: Functional perplexity To a lesser extent, the range of tasks that can be performed within a particular dialogue is important. In human-to-human conversations, for example, an utterance can perform more than one illocutionary or speech act. In an analogous way, a dialogue can include more than one task, whether it is to book tickets for a performance or to enquire about flight times. Looking to individual utterances, the greater the number of acts which can be performed, the more complex the language model becomes. In everyday conversation, humans are adept at marking topic boundaries and changes. For applications where more than one task is to be performed in a single dialogue, the dialogue manager needs to be able to identify when the user switches from one task to another. Functional perplexity is a measure of the density of topic changes in a single dialogue and is accordingly difficult to calculate. A simpler measure is to count the number of semantically distinct tasks a user can perform.</Paragraph>
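The two measures just described can be made concrete with a short sketch. The per-turn task labels below are hypothetical annotations, assumed here only for illustration.

```python
# Sketch of the two measures of functional perplexity described above:
# the density of topic changes across a dialogue, and the simpler count
# of semantically distinct tasks. Each user turn is assumed to carry a
# (hypothetical) task label.

def topic_change_density(turn_tasks):
    """Topic changes per turn transition."""
    changes = sum(1 for a, b in zip(turn_tasks, turn_tasks[1:]) if a != b)
    return changes / max(len(turn_tasks) - 1, 1)

def distinct_tasks(turn_tasks):
    """The simpler measure: number of distinct tasks performed."""
    return len(set(turn_tasks))

dialogue = ["book_tickets", "book_tickets", "flight_times", "book_tickets"]
density = topic_change_density(dialogue)  # 2 changes over 3 transitions
count = distinct_tasks(dialogue)          # 2 distinct tasks
```

The density measure presupposes reliable per-turn task annotation, which is exactly why the paper notes it is difficult to calculate in practice; the distinct-task count needs only the application specification.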
    <Paragraph position="6"> F: Language perplexity The ability to express oneself as one wishes and still be understood is an important factor which contributes to naturalness in dialogue. This does not necessarily entail a very large vocabulary, since corpus studies and similar language elicitation exercises can provide a relatively small, core vocabulary. The user's freedom of expression is implicitly related to the initiative strategy employed by the dialogue manager. For example, when the system has the initiative, the user's language can be explicitly constrained. In contrast, a system which allows the user to take the initiative has less control of the user's language. Again, as with functional perplexity, the perplexity of a language in this sense is difficult to measure, but it is helpful to look at the extent to which the system attempts to constrain the user's language for performing a task. The level of constraint should not be measured when the system is recovering from deviations in the dialogue, since focussing the user may be necessary for recovering from the deviation in as few steps as possible.</Paragraph>
    <Paragraph position="7"> G: Over-informativeness There are two interpretations of over-informativeness: system-orientated and user-orientated. System-orientated over-informativeness allows the dialogue manager to present more information to the user than was explicitly requested. User-orientated over-informativeness is an important feature to support and is directly related to the degree of freedom of expression. In natural dialogue, a speaker can provide more information than is actually requested. Humans are able to take this additional information into consideration or ignore it, depending on how relevant it is to the conversation. The information may have been volunteered in anticipation of a future request for information, and as a result a dialogue manager which ignores it will not appear very natural. As an example, consider the following dialogue between the system and user, where the user responds with a reply which is over-informative: User: I'd like to make an appointment.</Paragraph>
    <Paragraph position="8"> System: Who would you like to make an appointment with?
User: John Smith at 2pm.</Paragraph>
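A dialogue manager that handles this exchange needs to fill slots it did not explicitly ask for. The sketch below illustrates the idea on the example above; the slot names, regular-expression patterns, and `fill_slots` helper are all hypothetical simplifications, not the mechanism of any surveyed system.

```python
import re

# Hedged sketch of user-orientated over-informativeness: when the
# system has asked for one slot, it nevertheless fills every slot it
# can recognise in the reply, instead of discarding the extra content.
# Slot names and patterns are illustrative toys.

SLOT_PATTERNS = {
    "person": re.compile(r"\b([A-Z][a-z]+ [A-Z][a-z]+)\b"),
    "time": re.compile(r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b"),
}

def fill_slots(reply, slots):
    """Fill every recognisable empty slot, not just the one requested."""
    for name, pattern in SLOT_PATTERNS.items():
        if slots.get(name) is None:
            match = pattern.search(reply)
            if match:
                slots[name] = match.group(1)
    return slots

slots = {"person": None, "time": None}
# System asked only for the person; the reply also volunteers a time.
fill_slots("John Smith at 2pm", slots)
```

After this call both slots are filled, so the manager need not ask a redundant follow-up question about the time, which is the naturalness gain the paragraph describes.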
  </Section>
  <Section position="7" start_page="11" end_page="12" type="metho">
    <SectionTitle>
6 A Questionnaire
</SectionTitle>
    <Paragraph position="0"> Whilst each of the above features is important, it is not obvious which are more important to 'naturalness' than others. Turning to the research community, we asked those who had designed systems incorporating dialogue management for their experiences and opinions. The questionnaire asked the community to rank the features according to how important they thought they were to their particular dialogue manager, and to comment on each one. Given the time constraints, it was not possible to ask more detailed questions about each feature, although the respondents were encouraged to give examples.</Paragraph>
    <Paragraph position="1"> Table 1 details the six systems; table 2 gives a summary of the importance of the features to each system. The rankings range from 1 (the most important) to 7 (the least important); tied rankings were allowed.</Paragraph>
    <Paragraph position="2"> Note that where '-' occurs, the feature was not ranked, and so is omitted from the mean. It is interesting to note that different respondents interpreted the ranking differently. Whilst some understood the points system to indicate the order of importance of each feature, others, such as \[6\], considered the points to be an indication of how important the feature was to their system.</Paragraph>
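The aggregation just described, a per-feature mean that omits unranked ('-') entries, can be sketched as follows. The rank values in the example are hypothetical, not the survey's data.

```python
# Sketch of the ranking aggregation: average a feature's ranks across
# the surveyed systems, skipping entries that were not ranked
# (shown as '-' in the table, represented here as None).
# The example ranks are hypothetical.

def mean_rank(ranks):
    """Mean of the given ranks, ignoring None (unranked) entries."""
    ranked = [r for r in ranks if r is not None]
    return sum(ranked) / len(ranked) if ranked else None

# Six systems; the third did not rank this feature.
anaphora_mean = mean_rank([1, 2, None, 1, 2, 1])  # mean of five ranks
```

A lower mean indicates a more important feature under this scheme; as the text notes, the respondents' differing interpretations of the scale mean such averages should be read cautiously.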
    <Paragraph position="3"> By taking the mean of the scores, the features can be ordered, most important first. The initial, tentative ranking of features indicates that anaphora and ellipsis are important, whilst functional perplexity and interaction strategy are least important. Given that the systems surveyed performed just one or two tasks, it is not surprising that functional perplexity is not ranked highly. The low ranking of the interaction strategy reflects the application of the system. For example, system \[4\], Verbmobil, regarded the interaction strategy to be of low importance since it is a minimally intrusive system which facilitates the dialogue between two humans.</Paragraph>
    <Paragraph position="4"> What is made clear is that we need to conduct further research into explicitly quantifying each feature for this approach to be worthwhile. Whilst features such as over-informativeness are either present or not, others are finer grained; the interaction strategy can be system-orientated, user-orientated or a combination of both. Language perplexity, in the sense meant here, needs to be quantified, too, before it can be considered a useful feature. In retrospect, the ranking of each feature needs to be made consistent.</Paragraph>
  </Section>
</Paper>