<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1075"> <Title>A SIMULATION-BASED RESEARCH STRATEGY FOR DESIGNING COMPLEX NL SYSTEMS*</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> A SIMULATION-BASED RESEARCH STRATEGY FOR DESIGNING COMPLEX NL SYSTEMS* </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> Basic research is critically needed to guide the development of a new generation of multimodal and multilingual NL systems. This paper summarizes the goals, capabilities, computing environment, and performance characteristics of a new semi-automatic simulation technique. This technique has been designed to support a wide spectrum of empirical studies on highly interactive speech, writing, and multimodal systems incorporating pen and voice. Initial studies using this technique have provided information on people's language, performance, and preferential use of these communication modalities, either alone or in multimodal combination. One aim of this research has been to explore how the selection of input modality and presentation format can be used to reduce difficult sources of linguistic variability in people's speech and writing, such that more robust system processing results. The development of interface techniques for channeling users' language will be important to the ability of complex NL systems to function successfully in actual field use, as well as to the overall commercialization of this technology. Future extensions of the present simulation research also are discussed.</Paragraph> </Section> <Section position="3" start_page="0" end_page="370" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Basic research is critically needed to guide the development of a new generation of complex natural language systems that are still in the planning stages, such as ones that support multimodal, multilingual, or multiparty exchanges across a variety of intended applications. In the case of planned multimodal systems, for example, the potential exists to support more robust, productive, and flexible human-computer interaction than that afforded by current unimodal ones [3]. However, since multimodal systems are relatively complex, the problem of how to design optimal configurations is unlikely to be solved through simple intuition alone. Advance empirical work with human subjects will be needed to generate a factual basis for designing multimodal systems that can actually deliver performance superior to unimodal ones.
* This research was supported in part by Grant No. IRI-9213472 from the National Science Foundation to the first authors, as well as additional funding and equipment donations from ATR International, Apple Computer, US West, and Wacom Inc. Any opinions, findings, or conclusions expressed in this paper are those of the authors, and do not necessarily reflect the views of our sponsors.
† Michelle Wang is affiliated with the Computer Science Department and Jeremy Gaston with the Symbolic Systems Program at Stanford University.</Paragraph> <Paragraph position="1"> In particular, there is a special need for both methodological tools and research results based on high-quality simulations of proposed complex NL systems.
Such simulations can reveal specific information about people's language, task performance, and preferential use of different types of systems, so that the systems can be designed to handle expected input. Likewise, simulation research provides a relatively affordable and nimble way to compare the specific advantages and disadvantages of alternative architectures, such that more strategic designs can be developed in support of particular applications. In the longer term, conclusions based on a series of related simulation studies also can provide a broader and more principled perspective on the best application prospects for emerging technologies such as speech, pen, and multimodal systems incorporating them.</Paragraph> <Paragraph position="2"> In part for these reasons, simulation studies of spoken language systems have become common in the past few years, and have begun to contribute to our understanding of human speech to computers [1, 5, 6, 7, 8, 17]. However, spoken language simulations typically have been slow and cumbersome. There is concern that delayed responding may systematically distort the data that these simulation studies were designed to collect, especially for a modality like speech from which people expect speed [6, 10, 15]. Unlike research on spoken language systems, there currently is very little literature on handwriting and pen systems. In particular, no simulation studies have been reported on: (1) interactive handwriting(1) [6], (2) comparing interactive speech versus handwriting as alternative ways to interact with a system, or (3) examining the combined use of speech and handwriting with simulated multimodal systems of different types. Potential advantages of a combined pen/voice system have been outlined previously [4, 12]. High-quality simulation research on these topics will be especially important to the successful design of mobile computing technology, much of which will emphasize communications and be keyboardless.
(1) Although we are familiar with noninteractive writing from everyday activities like personal notetaking, very little is known about interactive writing and pen use as a modality of human-computer interaction.</Paragraph> <Paragraph position="4"> The simulation technique developed for this research aims to: (1) support a very rapid exchange with simulated speech, pen, and pen/voice systems, such that response delays are less than 1 second and interactions can be subject-paced, (2) provide a tool for investigating interactive handwriting and other pen functionality, and (3) devise a technique appropriate for comparing people's use of speech and writing, such that differences between these communication modalities and their related technologies can be better understood. Toward these ends, an adaptable simulation method was designed that supports a wide range of studies investigating how people speak, write, or use both pen and voice when interacting with a system to complete qualitatively different tasks (e.g., verbal/temporal, computational/numeric, graphic/cartographic).
The method also supports examination of different issues in spoken, written, and combined pen/voice interactions (e.g., typical error patterns and resolution strategies).</Paragraph> <Paragraph position="5"> In developing this simulation, an emphasis was placed on providing automated support for streamlining the simulation to the extent needed to create facile, subject-paced interactions with clear feedback, and to have comparable specifications for the different modalities. Response speed was achieved in part by using scenarios with correct solutions, and by preloading information. This enabled the assistant to click on predefined fields in order to respond quickly. In addition, the simulated system was based on a conversational model that provides analogues of human backchannel and propositional confirmations. Initial tasks involving service transactions embedded propositional-level confirmations in a compact transaction &quot;receipt,&quot; an approach that contributed to the simulation's clarity and speed. Finally, emphasis was placed on automating features to reduce attentional demand on the simulation assistant, which also contributed to the fast pace and low rate of technical errors in the present simulation.</Paragraph> </Section> <Section position="4" start_page="370" end_page="372" type="metho"> <SectionTitle> 2. SIMULATION METHOD </SectionTitle> <Paragraph position="0"> Basic simulation features for the studies completed to date are summarized below, and have been detailed elsewhere [16], although some adaptations to these specifications are in progress to accommodate planned research.</Paragraph> <Section position="0" start_page="370" end_page="371" type="sub_section"> <SectionTitle> 2.1. Procedure and Instructions </SectionTitle> <Paragraph position="0"> Volunteer participants coming into the Computer Dialogue Laboratory at SRI are told that the research project aims to develop and test a new pen/voice system for use on future portable devices. To date, subjects have included a broad spectrum of white-collar professionals, excluding computer scientists. All participants so far have believed that the &quot;system&quot; was a fully functional one. Following each session, they are debriefed about the nature and rationale for conducting a simulation.</Paragraph> <Paragraph position="1"> During the study, subjects receive written instructions about how to enter information on an LCD tablet when writing, when speaking, and when free to use both modalities. When writing, they are told to handwrite information with the electronic stylus directly onto active areas on the tablet. They are free to print or to write in cursive. When speaking, subjects are instructed to tap and hold the stylus on active areas as they speak into the microphone. During free choice, people are completely free to use either modality in any way they wish. Participants also receive written instructions about how to use the system to complete realistic tasks, which currently focus on the broad class of service-oriented transactions (e.g., car rental reservations, personal banking, real estate selection). Then they practice several scenarios using spoken and written input until the system and the tasks are completely clear.</Paragraph> <Paragraph position="2"> People are encouraged to speak and write naturally.</Paragraph> <Paragraph position="3"> They are asked to complete the tasks according to instructions, while working at their own pace. Other than providing motivation to complete the tasks and specifying the input modality, an effort is made not to influence the specific manner in which subjects express themselves.
They are encouraged to focus on completing the tasks and are told that, if their input cannot be processed for any reason, this will be clear immediately since the system will respond with ??? to prompt them to try again. Subjects are told how to remove or replace information as needed. Otherwise, they are told that input will be confirmed by the system on a transaction receipt, which they can monitor to check that their requests are being met (see next section for details). Of course, participants' input actually is received by an informed assistant, who performs the role of interpreting and responding as the system would.</Paragraph> <Paragraph position="4"> The simulation assistant is instructed to respond as accurately and rapidly as possible to any spoken or written information corresponding to predefined receipt fields.</Paragraph> <Paragraph position="5"> Essentially, the assistant tracks the subject's input, clicking with a mouse on predefined fields on a Sun SPARCstation to send confirmations back to the subject. Under some circumstances, the assistant is instructed to send a ??? prompt instead of a confirmation. For example, subjects receive ??? feedback when input is judged to be inaudible or illegible, when the subject forgets to supply task-critical information, or when input clearly is inappropriate, ambiguous, or underspecified. In general, however, the assistant is instructed to use ??? feedback sparingly in order to minimize intervention with people's natural tendencies to speak or write. If the subject commits a procedural error, such as forgetting to click before entering speech or attempting to enter information using the wrong modality, then the assistant is instructed not to respond until the subject recovers and correctly engages the system. The assistant's task is sufficiently automated that he or she is free to focus attention on monitoring the accuracy of incoming information, and on maintaining sufficient vigilance to respond promptly with confirmations.</Paragraph> </Section> <Section position="1" start_page="371" end_page="371" type="sub_section"> <SectionTitle> 2.2. Presentation Format </SectionTitle> <Paragraph position="0"> For studies completed to date, two different prompting techniques have been used to guide subjects' spoken and written input -- one unconstrained and one form-based.</Paragraph> <Paragraph position="1"> In the relatively unconstrained presentation format, subjects must take the initiative to ask questions or state needs in one general workspace area. No specific system prompts direct their input. They simply continue providing information until their transaction receipt is completed, correctly reflecting their requests. In this case, guidance is provided primarily by the task itself and the receipt. When the presentation format is a form, labeled fields are used to elicit specific task content, for example: Car pickup location |________|. In this case, the interaction is more system-guided, and linguistic and layout cues are used to channel the content and order of people's language as they work.</Paragraph> <Paragraph position="2"> For other studies in which people work with visual information (e.g., graphic/cartographic tasks), different graphic dimensions of presentation format are manipulated. In all studies, the goal is to examine the impact of presentation format on people's language and performance as they either speak or write to a simulated system. As a more specific aim, assessments are being conducted of the extent to which different formats naturally constrain linguistic variability, resulting in opportunities for more robust natural language processing.</Paragraph>
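<Paragraph position="3"> Taken together, the instructions above reduce the assistant's moment-to-moment behavior, under either format, to the small decision procedure described in Section 2.1. The following minimal sketch restates those feedback rules in Python; the event fields and function names are hypothetical illustrations, since the paper gives the rules only in prose.
from dataclasses import dataclass

# Sketch of the assistant's feedback rules (Section 2.1). Field and
# function names are hypothetical; the paper states the rules in prose.

@dataclass
class InputEvent:
    receipt_field: str              # task-critical field being addressed
    procedural_error: bool = False  # e.g., spoke without clicking first,
                                    # or used the wrong modality
    unclear: bool = False           # inaudible/illegible input, missing
                                    # task-critical info, or ambiguous/
                                    # underspecified content

def assistant_response(event: InputEvent):
    """Map one subject input to the simulated system's response."""
    if event.procedural_error:
        return None                 # no response until the subject recovers
    if event.unclear:
        return "???"                # prompt to try again; used sparingly
    return ("confirm", event.receipt_field)  # click the predefined field

For instance, assistant_response(InputEvent("pickup_location")) yields a confirmation of the pickup-location field, while an unclear event yields the ??? prompt.</Paragraph>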
</Section> <Section position="2" start_page="371" end_page="371" type="sub_section"> <SectionTitle> 2.3. Conversational Feedback </SectionTitle> <Paragraph position="0"> With respect to system feedback, a conversational model of human-computer interaction was adopted. As a result, analogues are provided of human backchannel and propositional-level confirmations. These confirmations function the same for different input modalities and presentation formats. With respect to backchannel signals, subjects receive *** immediately following spoken input, and an electronic ink trace following written input.</Paragraph> <Paragraph position="1"> These confirmations are presented in the tablet's active area or a narrow &quot;confirmation panel&quot; just below it. Subjects are told that this feedback indicates that their input has been legible/audible and processable by the system, and that they should continue.</Paragraph> <Paragraph position="2"> In addition to this backchannel-level signal, subjects are told to verify that their requests are being met successfully by checking the content of the receipt at the bottom of the tablet. This receipt is designed to confirm all task-critical information supplied during the interaction, thereby providing propositional confirmations. It remains visible throughout the transaction, and is completed gradually as the interaction proceeds. Although the receipt varies for different tasks, its form and content remain the same for different modalities and presentation formats.</Paragraph> <Paragraph position="3"> Apart from confirmation feedback, the simulation also responds to people's questions and commands by transmitting textual and tabular feedback. For example, if a subject selects the car model that he or she wants and then says, &quot;Do you have infant seats?&quot; or &quot;Show me the car options,&quot; a brief table is displayed in which available items like infant seats and car phones are listed along with their cost.</Paragraph> </Section> <Section position="3" start_page="371" end_page="372" type="sub_section"> <SectionTitle> 2.4. Automated Features </SectionTitle> <Paragraph position="0"> To simplify and speed up system responding, the correct receipt information associated with each task is preloaded for the set of tasks that a subject is to receive. A series of preprogrammed dependency relations between specified task-critical information and associated receipt fields is used to support the automation of propositional confirmations. As mentioned earlier, with this arrangement the assistant simply needs to click on certain predefined fields to send appropriate acknowledgments automatically as the subject gradually supplies relevant information. Of course, if the subject makes a performance error, the assistant must type in and confirm the erroneous information manually. In such cases, however, canonical answers are maintained so that they can be confirmed quickly when people self-correct, which they tend to do over 50% of the time. The automated simulation strategy described above works well when research can take advantage of task scenarios that entail a limited set of correct answers.</Paragraph>
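<Paragraph position="1"> How such preloaded dependency relations might be organized can be suggested with a minimal sketch. The structure, field names, and task values below are illustrative assumptions; the paper describes the mechanism but not an implementation.
# Hypothetical sketch of Section 2.4's preloaded dependency relations:
# each task-critical item maps to a receipt field with a canonical
# (preloaded) answer, so one click by the assistant sends the
# propositional confirmation.

CANONICAL_ANSWERS = {                      # preloaded per task scenario
    "pickup_location": "downtown office",  # illustrative values only
    "pickup_date": "May 12",
    "car_model": "compact",
}

receipt = {}  # visible throughout the transaction; completed gradually

def type_manually(value):
    """Stand-in for the assistant hand-typing an erroneous value; the
    canonical answer stays available for quick confirmation when the
    subject self-corrects."""
    return value

def confirm_field(field, subject_value):
    """Assistant clicks a predefined field to confirm the subject's input."""
    if subject_value == CANONICAL_ANSWERS[field]:
        receipt[field] = subject_value     # one click, automatic confirmation
    else:
        receipt[field] = type_manually(subject_value)  # performance error
    return receipt

Because confirmation reduces to a single click in the common case, this organization is one source of the sub-second response delays reported below.</Paragraph>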
<Paragraph position="2"> An additional automated feature of the present simulation technique is a &quot;random error generator,&quot; which is designed to ensure that subjects encounter at least a minimal level of simulated system errors, in part to support the credibility of the simulation. In this research, if subjects do not receive at least one ??? response from the system during a set of two tasks, then the simulation generates one. This results in a minimum baseline rate of one simulated error per 33 items of information supplied, or 3%, which in this research has been considered a relatively error-free environment. The simulated errors are distributed randomly across all task-critical information supplied for the set of tasks.</Paragraph> <Paragraph position="3"> The described method for organizing simulated response feedback was responsible in part for the fast pace of the present simulation. In studies conducted to date, response delays during the simulation have averaged 0.4 second between a subject's input and visible confirmation on the tablet receipt, with less than a 1-second delay in all conditions. The rate of technical errors in executing the assistant's role according to instructions has been low, averaging 0.05 such errors per task. Furthermore, any major error by the assistant results in discarding that subject's data, which currently amounts to 6% of subjects tested. The present simulation also appears to be adequately credible, since no participants to date have doubted that it was a fully functional system. As a result, no data has been discarded for this reason.</Paragraph> </Section> <Section position="4" start_page="372" end_page="372" type="sub_section"> <SectionTitle> 2.6. Simulation Environment </SectionTitle> <Paragraph position="0"> The computing equipment that supports this simulation technique includes two Sun workstations, one a SPARCstation 2, which are linked via Ethernet. A Wacom HD648A integral transparent digitizing tablet/LCD display is interfaced to the SPARC 2 through a Vigra S-bus VGA card. An accompanying cordless digitizing pen is used for writing, clicking to speak, pointing, or otherwise operating the tablet. A Crown PCC 160 microphone transmits spoken input from the subject to the simulation assistant, who listens through a pair of stereo speakers from a remote location. The assistant also views an image of the subject working at the tablet, along with an image of all visible input and feedback occurring on the tablet.</Paragraph> <Paragraph position="1"> The user interface is based on the X-windows system, employing MIT Athena widgets. X-windows is used for its ability to display results on multiple screens, including the subject's tablet and the assistant's workstation, and because the resulting program runs on equipment from several manufacturers. Two aspects of the system architecture are designed for rapid interface adaptability. First, the Widget Creation Language (WCL) enables nonprogrammers to alter the user interface layout. Second, a simple textual language and interpreter were created to enable declarative specification of widget behavior and interrelations. Some widget behavior also is written in the C programming language.</Paragraph>
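<Paragraph position="2"> Neither WCL's notation nor the custom textual language is shown in the paper, but the division of labor can be suggested with a brief sketch: layout and widget interrelations live in declarative data that nonprogrammers can edit, while only the named behaviors require procedural code. The Python rendering and all names below are hypothetical illustrations, not the actual notation.
# Illustrative sketch of declaratively specified interface structure and
# behavior (Section 2.6). Neither WCL's resource syntax nor the custom
# textual language appears in the paper; names here are hypothetical.

WIDGET_SPEC = {
    "speak_area": {"class": "Command", "on_press": "start_speech"},
    "confirmation_panel": {"class": "Label"},
    "receipt": {"class": "Form",
                "children": ["pickup_location", "pickup_date", "car_model"]},
}

BEHAVIORS = {"start_speech": lambda: print("microphone engaged")}

bindings = {}  # widget name -> bound callback

def build_interface(spec):
    """Interpret the declarative spec: editing WIDGET_SPEC reshapes the
    layout without touching procedural code."""
    for name, attrs in spec.items():
        print(f"create {attrs['class']} widget '{name}'")
        for child in attrs.get("children", []):
            print(f"  attach field '{child}'")
        if "on_press" in attrs:
            bindings[name] = BEHAVIORS[attrs["on_press"]]  # bind, not invoke
</Paragraph>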
<Paragraph position="3"> Various modifications to the standard X-windows operation have been deployed to ensure the adequate real-time responding needed for acceptable handwriting quality and speed. To avoid objectionable lag in the system's electronic ink echo, a high-performance workstation (i.e., a Sun SPARCstation 2) is used to process the subject's input.</Paragraph> </Section> <Section position="5" start_page="372" end_page="372" type="sub_section"> <SectionTitle> 2.7. Data Capture </SectionTitle> <Paragraph position="0"> With respect to data collection, all human-computer interactions are videotaped for subsequent analysis.</Paragraph> <Paragraph position="1"> The recording is a side-by-side split-screen image, created using a JVC KM-1200U special-effects generator. Videotaping is conducted unobtrusively with a remote genlocked Panasonic WV-D5000 video camera filming through a one-way mirror. Data capture includes a close-up of the subject working at the LCD tablet, and a real-time record of interactions on the tablet, including the subject's input, simulated feedback, and the gradually completed receipt. This image is recorded internally from the assistant's workstation, is processed through a Lyon Lamb scan converter, and then is merged using the special-effects generator and preserved on videotape for later analysis. In addition to being transmitted to the simulation assistant, the subject's speech is recorded and stored in analog form on a time-coded videotape, and later is transcribed for data analysis. All handwritten input is recorded on-line during real-time tablet interactions, and then is preserved on videotape and available for hardcopy printout.</Paragraph> </Section> </Section> <Section position="5" start_page="372" end_page="373" type="metho"> <SectionTitle> 3. RESEARCH DESIGN </SectionTitle> <Paragraph position="0"> In studies conducted at SRI to date, the experimental design usually has been a completely crossed factorial with repeated measures, i.e., a within-subjects design. Primary factors of interest have included: (1) communication modality (speech-only, pen-only, combined pen/voice), and (2) presentation format (form-based, unconstrained). In a typical study, each subject completes a series of 12 tasks, two representing each of the six main conditions. The order of presenting conditions is counterbalanced across subjects (see the sketch at the end of this section).</Paragraph> <Paragraph position="1"> This general design has been selected for its relative efficiency and power and, in particular, for its ability to control linguistic variability due to individual differences. In brief, for example, this design permits comparing how the same person completing the same tasks displays one type of language and performance while speaking, but then switches this language and performance when writing.</Paragraph>
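<Paragraph position="2"> To make the counterbalancing concrete, the minimal sketch below generates per-subject condition orders. The simple rotation scheme is an assumption for illustration; the paper states that order was counterbalanced but does not specify the scheme used.
# Illustrative counterbalancing for the 3 (modality) x 2 (format) design.
# The rotation scheme is an assumption; the paper does not specify it.

from itertools import product

CONDITIONS = list(product(("speech-only", "pen-only", "pen/voice"),
                          ("form-based", "unconstrained")))  # 6 conditions

def task_order(subject_index):
    """Rotate the six conditions per subject; running two tasks in each
    condition yields the 12 tasks per subject described above."""
    k = subject_index % len(CONDITIONS)
    rotated = CONDITIONS[k:] + CONDITIONS[:k]
    return [cond for cond in rotated for _ in range(2)]
</Paragraph> </Section> </Paper>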