<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3701"> <Title>Usability Issues in an Interactive Speech-to-Speech Translation System for Healthcare</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Increasing globalization and immigration have led to growing demands on US institutions for healthcare and government services in languages other than English. These institutions are already overwhelmed: the State of Minnesota, for example, had no Somali-speaking physicians for some 12,000 Somali refugees and only six Hmong-speaking physicians to serve 50,000 Hmong residents (Minnesota Interpreter Standards Advisory Committee, 1998). San Francisco General Hospital, to cite another example, receives approximately 3,500 requests for interpretation per month, or 42,000 per year, in 35 different languages.</Paragraph> <Paragraph position="1"> Moreover, requests for medical interpretation services are distributed among all the wards and clinics, adding a logistical challenge to the problem of a high and growing demand for interpretation services (Paras et al., 2002). Similar situations are found throughout the United States.</Paragraph> <Paragraph position="2"> It is natural to hope that automatic real-time translation in general, and spoken language translation (SLT) in particular, can help to meet this communicative need. From the viewpoint of research and development, the high demand in healthcare makes this area especially attractive for fielding early SLT systems and seeking early adopters.</Paragraph> <Paragraph position="3"> With this goal in view, several speech translation systems have aimed at the healthcare area.</Paragraph> <Paragraph position="4"> (See www.sehda.com, DARPA's CAST program, www.phraselator.com, etc.)
However, these efforts have encountered several issues or limitations.</Paragraph> <Paragraph position="5"> First, they have been confined to narrow domains. In general, SLT applications have been able to achieve acceptable accuracy only by staying within restricted topics, in which fixed phrases could be used (e.g., www.phraselator.com), or in which grammars for automatic speech recognition (ASR) and machine translation (MT) could be optimized. For example, MedSLT (Bouillon et al., 2005) is limited to some 600 specific words per sub-domain. IBM's MASTOR system, with 30,000 words in each translation direction, has much broader coverage, but remains comparable in lexicon size to commercial MT systems of the early 1980s.</Paragraph> <Paragraph position="6"> Granted, restriction to narrow domains may often be appropriate, given the large effort involved in compiling extensive lexical resources and the time required for deployment. A tightly focused approach permits relatively quick development of new systems and provides a degree of flexibility to experiment with different architectures and different languages.</Paragraph> <Paragraph position="7"> Our emphasis, however, is on breaking out of narrow domains. We seek to maximize versatility by providing exceptional capacity to move from topic to topic while maintaining adequate accuracy. To provide a firm foundation for such versatility, we &quot;give our systems a liberal arts education&quot; by incorporating very broad-coverage ASR and MT technology. Our MT lexicons, for example, contain roughly 300,000 words in each direction.</Paragraph> <Paragraph position="8"> But of course, as coverage increases, perplexity increases in proportion, and with it the ASR and MT errors that perplexity induces, especially in the absence of tight integration between these components. To compensate, we provide a set of facilities that enable users on both sides of the language barrier to interactively monitor and correct these errors.
Putting users in the speech translation loop in this way does in fact permit conversations to range widely (Seligman, 2000). We believe that this highly interactive approach will prove applicable to the healthcare area.</Paragraph> <Paragraph position="9"> We have described these interactive techniques in (Dillinger and Seligman, 2004; Zong and Seligman, forthcoming). We will review them only briefly here, in Section 2.</Paragraph> <Paragraph position="10"> A second limitation of current speech translation systems for healthcare is that bilingual (bidirectional) communication has been difficult to enable. While speech-to-speech translation has sometimes proven practical from the English side, translation from the non-English side has been more difficult to achieve. Partly, this limitation arises from human factors issues: while naive observers might expect spoken input to be effortless for anyone who can talk, the reality is that users must learn to use most speech interfaces, and that this learning process can be difficult for users who are less literate or less computer literate. Further, many healthcare venues make speech input difficult: they may be noisy, microphones may be awkward to situate or to pass from speaker to speaker, and so on.</Paragraph> <Paragraph position="11"> Our group's approach to training- or venue-related difficulties for speech input is to provide an array of alternative input modes. In addition to providing input through dictated speech, users of our system can freely alternate among three other input modes, using handwriting, a touch screen, and standard bilingual keyboards.</Paragraph> <Paragraph position="12"> In this paper, we will focus on practical usability issues in the design of user interfaces for highly interactive approaches to SLT in healthcare applications.
With respect to interactivity per se, we will discuss the following specific issues: * In a highly interactive speech translation system, monitoring and correction of ASR and MT are vital for accuracy and confidence, but can be time-consuming - in a field where time is always at a premium.</Paragraph> <Paragraph position="13"> * Interactivity demands a minimum degree of computer and print literacy, which some patients may lack.</Paragraph> <Paragraph position="14"> To address these issues, we have developed a facility called Translation Shortcuts(TM), to be explained throughout Section 3.</Paragraph> <Paragraph position="15"> Section 4 will describe our approach to multi-modal input. As background, however, Section 2 will first quickly review our approach to highly interactive - and thus uniquely broad-coverage - spoken language translation. Before concluding, we will point out planned future developments in Section 5.</Paragraph> </Section> </Paper>