<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0219"> <Title>A New Taxonomy for the Quality of Telephone Services Based on Spoken Dialogue Systems</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Quality of Service Taxonomy </SectionTitle> <Paragraph position="0"> It is obvious that quality is not an entity which could be measured in an easy way, e.g. using a technical instrument. The quality of a service results from the perceptions of its user, in relation to what they expect or desire from the service. In the following, it will thus be made use of the definition of quality developed by Jekosch (2000): &quot;Quality is the result of the judgment of a perceived constitution of an entity with regard to its desired constitution. [...] The perceived constitution contains the totality of the features of an entity. For the perceiving person, it is a characteristic of the identity of the entity.&quot; The entity to be judged in our case is the service the user interacts with (through the telephone network), and which is based on a spoken dialogue system. Its quality is a compromise between what s/he expects or desires, and the characteristics s/he perceives while using the service.</Paragraph> <Paragraph position="1"> At this point, it is useful to differentiate between quality elements and quality features, as it was also proposed by Jekosch. Whereas the former are system or service characteristics which are in the hands of the designer (and thus can be optimized to reach high quality), the latter are perceptive dimensions forming the overall picture in the mind of the user.</Paragraph> <Paragraph position="2"> Generally, no stable relationship which would be valid for all types of services, users and situations can be established between the two. Evaluation frameworks such as PARADISE establish a temporary relationship, and try to reach some cross-domain validity. Due to the lack of quality elements which can really be manipulated in some way by the designer, however, the framework has to start mostly from dialogue and system measures which cannot be directly controlled. These measures will be listed in Section 4.</Paragraph> <Paragraph position="3"> The quality of a service (QoS) is often addressed only from the designer side, e.g. in the definition used by the International Telecommunication Union for telephone services (ITU-T Rec. E.800, 1994).</Paragraph> <Paragraph position="4"> It includes service support, operability, security and serveability. Whereas these issues are necessary for a successful set-up of the service, they are not directly perceived by the user. In the following taxonomy, the focus is therefore put on the user side. The overall picture is presented in Figure 1. It illustrates the categories (white boxes) which can be sub-divided into aspects (gray boxes), and their relationships (arrows). As the user is the decision point for each quality aspect, user factors have to be seen in a distributed way over the whole picture. This fact has tentatively been illustrated by the gray cans on the upper side of the taxonomy, but will not be further addressed in this paper. The remaining categories are discussed in the following.</Paragraph> <Paragraph position="5"> Walker et al. 
(1997) identified three factors which influence the performance of SDSs, and which are therefore thought to contribute to the quality perceived by the user: agent factors (mainly related to the dialogue and the system itself), task factors (related to how the SDS captures the task it has been developed for) and environmental factors (e.g. factors related to the acoustic environment and the transmission channel).</Paragraph>
<Paragraph position="6"> Because the taxonomy refers to the service as a whole, a fourth type of factor is added here, namely contextual factors such as costs, type of access, or availability. All four types of factors subsume quality elements which can be expected to influence the quality perceived by the user. The corresponding quality features are summarized into aspects and categories in the lower part of the picture.</Paragraph>
<Paragraph position="7"> The agent factors influence three quality categories. On the speech level, input and output quality will have a major influence. Quality features for speech output have been investigated extensively in the literature and include e.g. intelligibility, naturalness, or listening-effort. They will depend on the whole system set-up, and on the situation and task the user is confronted with. Quality features related to the speech input from the user (and thus to the system's recognition and understanding capabilities) are far less obvious. They are, in addition, much more difficult to investigate, because the user only receives indirect feedback on the system's capabilities, namely from the system reactions, which are influenced by the dialogue as a whole. Both speech input and output are highly influenced by the environmental factors.</Paragraph>
<Paragraph position="8"> On the language and dialogue level, dialogue cooperativity has been identified as a key requirement for high-quality services (Bernsen et al., 1998). The classification of cooperativity into aspects proposed by Bernsen et al., which is related to Grice's maxims (Grice, 1975) of cooperative behavior in HHI, is mainly adopted here, with one exception: we regard the partner asymmetry aspect under a separate category called dialogue symmetry, together with the aspects initiative and interaction control. Dialogue cooperativity will thus cover the aspects informativeness, truth and evidence, relevance, manner, the user's background knowledge, and meta-communication handling strategies.</Paragraph>
<Paragraph position="9"> Adopting the notion of efficiency used by ETSI and ISO (ETSI Technical Report ETR 095, 1993), efficiency designates the effort and resources expended in relation to the accuracy and completeness with which users can reach specified goals. It is proposed to differentiate three categories of efficiency. Communication efficiency relates to the efficiency of the dialogic interaction and includes, besides the aspects speed and conciseness, also the smoothness of the dialogue (which is sometimes called "dialogue quality"). Note that this is a significant difference from many other notions of efficiency, which only address the effort and resources, but not the accuracy and completeness of the goals to be reached. Task efficiency is related to the success of the system in accomplishing the task; it covers task success as well as task ease. Service efficiency is the adequacy of the service as a whole for the purpose defined by the user.
It also includes the "added value" which the service contributes, e.g. in comparison to other means of information (comparable interfaces or human operators).</Paragraph>
<Paragraph position="10"> In addition to the efficiency aspects, other aspects exist which relate to the agent itself, as well as to its perception by the user in the dialogic interaction.</Paragraph>
<Paragraph position="11"> We subsume these aspects under the category "comfort", although other terms might exist which better describe the corresponding perceptions of the user.</Paragraph>
<Paragraph position="12"> Comfort covers the agent's "social personality" (perceived friendliness, politeness, etc.), as well as the cognitive demand required from the user.</Paragraph>
<Paragraph position="13"> Depending on the area of interest, several notions of usability are common. Here, we define usability as the suitability of a system or service to fulfill the user's requirements. It mainly considers the ease of using the system and may result in user satisfaction. It does not, however, cover service efficiency or economical benefit, which influence the utility (usability in relation to the financial costs and to other contextual factors) of the service. Walker et al. (1998) also state that "user satisfaction ratings [...] have frequently been used in the literature as an external indicator of the usability of an agent." Like Kamm and Walker (1997), we assume that user satisfaction is predictive of other system designer objectives, e.g. the willingness to use or pay for a service. Acceptability, which is commonly defined on this more or less "economic" level, can therefore be seen in relation to usability and utility. It is a multidimensional property of a service, describing how readily a customer will use the service. The acceptability of a service (AoS) can be represented as the ratio of potential users to the size of the target user group; see the definitions of AoS adopted by EURESCOM (EURESCOM Project P.807 Deliverable 1, 1998).</Paragraph>
<Paragraph position="14"> From the schematic, it can be seen that a large number of aspects contribute to what can be called communication efficiency, usability or user satisfaction. Several interrelations (and a certain degree of inevitable overlap) exist between the categories and aspects; they are marked by arrows. The interrelations will become more apparent by taking a closer look at the underlying quality features which can be associated with each aspect. They will be presented in the following section.</Paragraph>
</Section>
<Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Classification of Quality Features </SectionTitle>
<Paragraph position="0"> In Tables 1 and 2, an overview is given of the quality features underlying each aspect of the QoS taxonomy. For the aspects related to dialogue cooperativity, these features partly stem from the design guideline definitions given by Bernsen et al. (1998).</Paragraph>
<Paragraph position="1"> For the rest, quality features which have been used in experimental investigations of different types of dialogue systems have been classified.
They do not refer solely to telephone-based services, but will be valid for a broader class of systems and services.</Paragraph>
<Paragraph position="2"> By definition, quality features are percepts of the users. Consequently, they can only be measured subjectively, by asking users in realistic scenarios. Several studies with this aim are reported in the literature. The author analyzed 12 such investigations and classified the questions which were asked of the users (as far as they have been reported) according to the quality features. For each aspect given in Tables 1 and 2, at least two questions could be identified which address that aspect. This classification cannot be reproduced here for space reasons. Additional features of the questionnaires directly address user satisfaction (e.g. perceived satisfaction, degree of enjoyment, user happiness, system likability, degree of frustration or irritation) and acceptability (perceived acceptability, willingness to use the system in the future).</Paragraph>
<Paragraph position="3"> From the classification, it seems that the taxonomy adequately covers what researchers intuitively would include in questionnaires investigating usability, user satisfaction and acceptability.</Paragraph>
</Section>
<Section position="6" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 Classification of Dialogue and System Measures </SectionTitle>
<Paragraph position="0"> Experiments with human subjects are still the only way to investigate quality percepts. They are, however, time-consuming and expensive to carry out.</Paragraph>
<Paragraph position="1"> For the developers of SDSs, it is therefore interesting to identify quality elements which are in their hands, and which can be used to enhance the quality for the user. Unfortunately, only a few such elements are known, and their influence on service quality is only partly understood. Word accuracy and word error rate, which are common measures to describe the performance of speech recognizers, can be taken as an example. Although they can partly be measured instrumentally (provided that an agreed-upon corpus with reference transcriptions exists), and the system designer can tune the system to increase the word accuracy, it cannot be determined beforehand how this will affect system usability or user satisfaction.</Paragraph>
<Paragraph position="2"> To fill this gap, dialogue- and system-related measures have been developed. They can be determined during the users' experimental interaction with the system or from log-files, either instrumentally (e.g. dialogue duration) or by an expert evaluator (e.g. contextual appropriateness). Although they provide useful information on the perceived quality of the service, there is no general relationship between one or several such measures and specific quality features. The PARADISE framework (Walker et al., 1997) produces such a relationship for a specific scenario, using multivariate linear regression. Some generalizability can be reached, but the exact form of the relationship and its constituent input parameters have to be established anew for each system.</Paragraph>
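To make the regression step concrete, the following minimal sketch fits a PARADISE-style performance function by ordinary least squares: mean user-satisfaction ratings are modelled as a linear combination of z-score-normalized dialogue and system measures. The measure names, the data values and the choice of two predictors are illustrative assumptions for this example, not figures taken from the framework or from this paper.

```python
# Hypothetical sketch of a PARADISE-style regression (Walker et al., 1997).
import numpy as np

def zscore(x):
    # Normalize a measure to zero mean and unit variance, as PARADISE does.
    return (x - x.mean()) / x.std()

# Hypothetical per-scenario measures and questionnaire means (not real data).
task_success = np.array([0.9, 0.4, 0.7, 1.0, 0.5])               # e.g. kappa
dialogue_duration = np.array([120.0, 300.0, 180.0, 90.0, 240.0])  # seconds
satisfaction = np.array([4.5, 2.0, 3.5, 4.8, 2.5])                # mean ratings

# Design matrix: normalized measures plus an intercept column.
X = np.column_stack([
    zscore(task_success),
    zscore(dialogue_duration),
    np.ones_like(satisfaction),
])

# Ordinary least squares yields the weight of each measure in the
# performance function; the weights hold only for this specific system.
weights, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
print("regression weights:", weights)
```

In the original framework, task success (measured by the kappa coefficient) and a set of dialogue cost measures serve as the predictors; as noted above, the fitted weights have to be re-estimated for each new system.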
<Paragraph position="3"> A generalization across systems and services might be easier if a categorization of dialogue and system measures can be reached. Tables 3 and 4 in the Appendix report the classification of 37 different measures defined in the literature into the QoS taxonomy. No measures have been found so far which directly relate to speech output quality, agent personality, service efficiency, usability, or user satisfaction. With the exception of the first aspect, it may however be assumed that they will be addressed by a combination of the measures related to the underlying aspects.</Paragraph>
</Section>
<Section position="7" start_page="0" end_page="0" type="metho">
<SectionTitle> 5 Comparison to Human-Human Services </SectionTitle>
<Paragraph position="0"> It has been stated earlier that the QoS taxonomy for telephone-based spoken dialogue services has been derived from an earlier schematic addressing human-to-human telephone services (Möller, 2000). This schematic is depicted in Figure 2, with slight modifications to the labels of single categories compared to the original version.</Paragraph>
<Paragraph position="1"> In the HHI case, the focus is placed on the category of speech communication. This category (replacing the environmental and agent factors of the HMI case) is divided into a one-way voice transmission category, a conversational category (conversation effectiveness), and a user-related category (ease of communication; comparable to the category "comfort" in the HMI case). The task and service categories of the interaction with the SDS are replaced by the service categories of the HHI schematic. The rest of the schematic is congruent in both cases, although the individual aspects covered by each category obviously differ.</Paragraph>
<Paragraph position="2"> The taxonomy of Figure 2 has fruitfully been used to classify three types of entities:
- quality elements which are used for the set-up and planning of telephone networks (some of these elements are given in the gray boxes of Figure 2)
- assessment methods commonly used for measuring quality features in telecommunications
- quality prediction models which estimate single quality features from the results of instrumental measurements
Although we seem to be far from reaching a comparable level in the assessment and prediction of HMI quality issues, it is hoped that the taxonomy of Figure 1 can be equally useful with respect to telephone services based on SDSs.</Paragraph>
</Section>
</Paper>