<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3701"> <Title>Usability Issues in an Interactive Speech-to-Speech Translation System for Healthcare</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Highly Interactive, Broad-coverage SLT </SectionTitle> <Paragraph position="0"> We now briefly summarize our group's approach to highly interactive, broad-coverage SLT.</Paragraph> <Paragraph position="1"> The twin goals of accuracy and broad coverage have generally been in opposition: speech translation systems have gained tolerable accuracy only by sharply restricting both the range of topics that can be discussed and the sets of vocabulary and structures that can be used to discuss them. The essential problem is that both speech recognition and translation technologies are still quite error-prone. While the error rates may be tolerable when each technology is used separately, the errors combine and even compound when they are used together. The resulting translation output is generally below the threshold of usability - unless restriction to a very narrow domain supplies sufficient constraints to significantly lower the error rates of both components.</Paragraph> <Paragraph position="2"> As explained, our group's approach has been to concentrate on interactive monitoring and correction of both technologies.</Paragraph> <Paragraph position="3"> First, users can monitor and correct the speaker-dependent speech recognition system to ensure that the text that will be passed to the machine translation component is completely correct. Voice commands (e.g. Scratch That or Correct <incorrect text>) can be used to repair speech recognition errors. Thus, users of our SLT system enrich the interface between ASR and MT.</Paragraph> <Paragraph position="4"> Next, during the MT stage, users can monitor, and if necessary correct, one especially important aspect of the translation - lexical disambiguation.</Paragraph> <Paragraph position="5"> Our system's approach to lexical disambiguation is twofold: first, we supply a Back-Translation, or re-translation of the translation. Using this paraphrase of the initial input, even a monolingual user can make an initial judgment concerning the quality of the preliminary machine translation output. (Other systems, e.g. IBM's MASTOR, have also employed re-translation. Our implementations, however, exploit proprietary technologies to ensure that the lexical senses used during back-translation accurately reflect those used in forward translation.) In addition, if uncertainty remains about the correctness of a given word sense, we supply a proprietary set of Meaning Cues(TM) - synonyms, definitions, etc. - which have been drawn from various resources, collated in a database (called SELECT(TM)), and aligned with the respective lexica of the relevant MT systems. With these cues as guides, the user can monitor the current, proposed meaning and select (when necessary) a different, preferred meaning from among those available. Automatic updates of the translation and back-translation then follow. Future versions of the system will allow personal word-sense preferences specified in this way during the current session to be stored and reused in future sessions, enabling a gradual tuning of word-sense preferences to individual needs. Facilities will also be provided for sharing such preferences across a working group.</Paragraph>
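To make this verification loop concrete, here is a minimal Python sketch. It is our illustration rather than the system's implementation: the toy translation tables, the cue entries, and the function names are all invented stand-ins for the MT engines and the SELECT database.

```python
# Minimal sketch (not the deployed system) of verification via
# back-translation plus Meaning Cues.  The toy translation tables and
# cue entries below are invented for illustration; in the real system
# they would come from the MT engines and the SELECT database.

FORWARD = {("bank", "finance"): "banco", ("bank", "river"): "orilla"}
BACK = {"banco": "bank (financial institution)",
        "orilla": "bank (edge of a river)"}
CUES = {"bank": {"finance": "synonyms: depository; 'money in the bank'",
                 "river": "synonyms: shore; 'the river bank'"}}

def verify(word, sense, choose_sense):
    """Translate, show the back-translation, and let the user pick a
    different sense from the Meaning Cues until satisfied."""
    while True:
        target = FORWARD[(word, sense)]
        print("Back-translation:", BACK[target])
        new_sense = choose_sense(word, CUES[word], current=sense)
        if new_sense == sense:      # user accepts the current meaning
            return target
        sense = new_sense           # update, retranslate, re-verify

# A monolingual user who recognizes from the cues that the river
# sense was intended:
print(verify("bank", "finance", lambda w, cues, current: "river"))
```

The point of the loop is that the monolingual user never needs to read the target language: acceptance or correction is driven entirely by the back-translation and the cues.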
<Paragraph position="7"> Given such interactive correction of both ASR and MT, wide-ranging, and even jocular, exchanges become possible (Seligman, 2000).</Paragraph> <Paragraph position="8"> As we have said, such interactivity within a speech translation system can enable increased accuracy and confidence, even for wide-ranging conversations.</Paragraph> <Paragraph position="9"> Accuracy of translation is, in many healthcare settings, critical to patient safety. When a doctor is taking a patient's history or instructing the patient in a course of treatment, even small errors can have clinically relevant effects. Even so, at present, healthcare workers often examine patients and instruct them in a course of treatment through gestures and sheer good will, with no translation at all, or use untrained human interpreters (friends, family, volunteers, or staff) in an error-prone attempt to solve the immediate problem (Flores, et al., 2003). As a result, low-English-proficiency patients are often less healthy and receive less effective treatment than English speakers (Paras, et al., 2002). We hope to demonstrate that highly interactive real-time translation systems in general, and speech translation systems in particular, can help to bridge the language gap in healthcare when human interpreters are not available.</Paragraph> <Paragraph position="10"> Accuracy in an automatic real-time translation system is necessary, but not sufficient. If healthcare workers have no means to independently assess the reliability of the translations obtained, practical use of the system will remain limited.</Paragraph> <Paragraph position="11"> Highly interactive speech translation systems can foster, on both sides of the conversation, the confidence that is necessary to bring such systems into wide use. In fact, in this respect at least, they may sometimes prove superior to human interpreters, who normally do not provide clients with the means for judging translation accuracy.</Paragraph> <Paragraph position="12"> The value of enabling breadth of coverage, as well as accuracy and confidence, should also be clear: for many purposes, the system must be able to translate a wide range of topics outside of the immediate healthcare domain - for example, when a patient tries to describe what was going on when an accident occurred. The ability to ask about interests, family matters, and other life concerns is vital for establishing rapport, managing expectations and emotions, etc.</Paragraph>
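Before leaving this section, the utterance-repair commands described above (Scratch That, Correct <incorrect text>) can be illustrated with a brief sketch. The command grammar and data structures below are hypothetical; they show the idea, not the product's code.

```python
# Toy model of the voice commands described above ("Scratch That",
# "Correct <incorrect text>").  The command handling and data
# structures are our illustration, not the system's implementation.

def apply_command(phrases, command, replacement=None):
    """phrases: the dictated utterance as a list of phrases, newest last."""
    command = command.lower()
    if command == "scratch that" and phrases:
        phrases.pop()                      # delete the last dictated phrase
    elif command.startswith("correct "):
        wrong = command[len("correct "):]
        # Swap the offending text for the correction, which the user may
        # supply by respeaking, handwriting, or typing.
        phrases[:] = [p.replace(wrong, replacement or "") for p in phrases]
    return phrases

utt = ["i have a pain", "in my arm"]
apply_command(utt, "Scratch That")                      # -> ['i have a pain']
print(apply_command(utt, "Correct pain", "sharp pain")) # -> ['i have a sharp pain']
```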
</Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Translation Shortcuts </SectionTitle> <Paragraph position="0"> Having summarized our approach to highly interactive speech translation, we now turn to examination of practical interface issues for this class of SLT system. This section concentrates on Translation Shortcuts(TM).</Paragraph> <Paragraph position="1"> Shortcuts are designed to provide two main advantages. First, re-verification of a given utterance is unnecessary. That is, once the translation of an utterance has been verified interactively, it can be saved for later reuse, simply by activating a Save as Shortcut button on the translation verification screen. The button gives access to a dialogue in which a convenient Shortcut Category for the Shortcut can be selected or created. At reuse time, no further verification will be required. (In addition to such dynamically created Personal Shortcuts, any number of prepackaged Shared Shortcuts can be included in the system.) Second, access to stored Shortcuts is very quick, with little or no need for text entry. Several facilities contribute to meeting this design criterion: * A Shortcut Search facility can retrieve a set of relevant Shortcuts given only keywords or the first few characters or words of a string. The desired Shortcut can then be executed with a single gesture (mouse click or stylus tap) or voice command. NOTE: If no Shortcut is found, the system automatically gives users access to the full power of broad-coverage, interactive speech translation. Thus, a seamless transition is provided between the Shortcuts facility and full, broad-coverage translation. * A Translation Shortcuts Browser is provided, so that users can find needed Shortcuts by traversing a tree of Shortcut categories. Using this interface, users can execute Shortcuts even if their ability to input text is quite limited, e.g. by tapping or clicking alone.</Paragraph> <Paragraph position="2"> Figure 1 shows the Shortcut Search and Shortcuts Browser facilities in use. Points to notice: * On the left, the Translation Shortcuts Panel has slid into view and been pinned open. It contains the Translation Shortcuts Browser, split into two main areas, Shortcuts Categories (above) and Shortcuts List (below).</Paragraph> <Paragraph position="3"> * The Categories section of the Panel shows current selection of the Conversation category, containing everyday expressions, and its Staff subcategory, containing expressions most likely to be used by healthcare staff members. There is also a Patients subcategory, used for patient responses. Categories for Administrative topics and Patient's Current Condition are also visible, and new ones can be freely created.</Paragraph> <Paragraph position="5"> * Below the Categories section is the Shortcuts List section, containing a scrollable list of alphabetized Shortcuts. (Various other sorting criteria will be available in the future, e.g. sorting by frequency of use, recency, etc.) * Double-clicking on any visible Shortcut in the List will execute it. Clicking once will select and highlight a Shortcut. Typing Enter will execute the currently highlighted Shortcut (here "Good morning"), if any.</Paragraph> <Paragraph position="6"> * It is possible to automatically relate options for a patient's response to the previous staff member's utterance, e.g. by automatically going to the sibling Patients subcategory if the prompt was given from the Staff subcategory (see the sketch below).</Paragraph> <Paragraph position="7"> Because the Shortcuts Browser can be used without text entry, simply by pointing and clicking, it enables responses by minimally literate users. In the future, we plan to enable use even by completely illiterate users, through two devices: we will enable automatic pronunciation of Shortcuts and categories in the Shortcuts Browser via text-to-speech, so that these elements can in effect be read aloud to illiterate users; and we will augment Shared Shortcuts with pictorial symbols, as clues to their meaning.</Paragraph> <Paragraph position="8"> A final point concerning the Shortcuts Browser: it can be operated entirely by voice commands, although this mode is more likely to be useful to staff members than to patients.</Paragraph>
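The category-tree behavior just described, including the automatic jump to the sibling Patients subcategory, might be sketched as follows. The data structure and helper functions are our illustration; only the category names are taken from Figure 1.

```python
# Sketch of the Shortcuts Browser tree.  The category names follow those
# visible in Figure 1; the data structure and functions are illustrative
# only, not the shipped implementation.

CATEGORIES = {
    "Conversation": {
        "Staff":    ["Good morning.", "How are you feeling today?"],
        "Patients": ["Good morning.", "I feel a little better."],
    },
    "Administrative": {},
    "Patient's Current Condition": {},
}

def shortcuts_in(category, subcategory):
    """Shortcuts List contents (alphabetized) for the selected node."""
    return sorted(CATEGORIES.get(category, {}).get(subcategory, []))

def subcategory_for_reply(category, subcategory):
    """After a prompt from the Staff subcategory, jump to the sibling
    Patients subcategory so likely responses are immediately visible."""
    if subcategory == "Staff":
        return (category, "Patients")
    return (category, subcategory)

cat, sub = subcategory_for_reply("Conversation", "Staff")
print(shortcuts_in(cat, sub))   # -> ['Good morning.', 'I feel a little better.']
```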
<Paragraph position="9"> We turn our attention now to the Input Window, which does double duty for Shortcut Search and arbitrary text entry for full translation. We will consider the search facility first, as shown in Figure 2.</Paragraph> <Paragraph position="10"> * Shortcuts Search begins automatically as soon as text is entered by any means - voice, handwriting, touch screen, or standard keyboard - into the Input Window.</Paragraph> <Paragraph position="11"> * The Shortcuts Drop-down Menu appears just below the Input Window, as soon as there are results to be shown. The user has entered "Good" and a space, so the search program has received its first input word. The drop-down menu shows the results of a keyword-based search.</Paragraph> <Paragraph position="12"> * If the finished input exactly matches a stored Shortcut, visual indicators show that this is in fact a Shortcut, so that verification will not be necessary. However, final text not matching a Shortcut, e.g. "Good job," will be passed to the routines for full translation with verification.</Paragraph>
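The incremental search behavior illustrated in Figure 2 can be sketched as follows. We paraphrase "keywords or the first few characters or words of a string" as word-prefix matching; the actual matching algorithm may differ.

```python
# Sketch of the incremental keyword search behind the drop-down menu.
# Word-prefix matching is our guess at the matching strategy; the real
# algorithm is not documented here.

SHORTCUTS = ["Good morning.", "Good evening.", "Where does it hurt?"]

def search(query):
    words = query.lower().split()
    def matches(shortcut):
        tokens = shortcut.lower().split()
        # every query word must be a prefix of some word in the Shortcut
        return all(any(t.startswith(w) for t in tokens) for w in words)
    return [s for s in SHORTCUTS if matches(s)] if words else []

print(search("Good "))     # -> ['Good morning.', 'Good evening.']
print(search("Good job"))  # -> []  (falls through to full translation)
```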
</Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Multimodal input </SectionTitle> <Paragraph position="0"> As mentioned, an unavoidable issue for speech translation systems in healthcare settings is that speech input is not appropriate for every situation.</Paragraph> <Paragraph position="1"> Current speech-recognition systems are unfamiliar to many users. Our system attempts to overcome this training issue to some extent by incorporating standard commercial-grade dictation systems for broad-coverage and ergonomic speech recognition. These products already have established user bases in the healthcare community.</Paragraph> <Paragraph position="2"> Even so, some training may be required: optional generic Guest profiles are supplied by our system for male and female voices in both languages; but optional voice enrollment, requiring five minutes or so, is helpful for achieving the best results. Such training time is practical for healthcare staff, but will be realistic for patients only when they are repeat visitors, hospital-stay patients, etc.</Paragraph> <Paragraph position="3"> As mentioned, other practical usability issues for the use of speech input in healthcare settings include problems of ambient noise (e.g. in emergency rooms or ambulances) and problems of microphone and computer arrangement (e.g. to accommodate not only desktops but counters or service windows which may form a barrier between staff and patient).</Paragraph> <Paragraph position="4"> To deal with these and other usability issues, we have found it necessary to provide a range of input modes: in addition to dictated speech, we enable handwritten input, the use of touch screen keyboards for text input, and the use of standard keyboards. All of these input modes must be completely bilingual, and language switching must be arranged automatically when there is a change of active participant. Further, it must be possible to change input modes seamlessly within a given utterance: for example, users must be able to dictate the input if they wish, but then be able to make corrections using handwriting or one of the remaining two modes (see the sketch at the end of this section). Figure 3 shows such seamless bilingual operation: the user has dictated the sentence "Tengo nauseas" in Spanish, but there was a speech-recognition error, which is being corrected by handwriting.</Paragraph> <Paragraph position="5"> Of course, even this flexible range of input options does not solve all problems. As mentioned, illiterate patients pose special problems. Naive users tend to suppose that speech is the ideal input mode for illiterate users. Unfortunately, however, the careful and relatively concise style of speech that is required for automatic recognition is often difficult to elicit, so recognition accuracy remains low; and the ability to read and correct the results is obviously absent. Just as obviously, the remaining three text input modes will be equally ineffectual for illiterate users.</Paragraph> <Paragraph position="6"> As explained, our current approach to low literacy is to supply Translation Shortcuts for the minimally literate, and - in the future - to augment Shortcuts with text-to-speech and iconic pictures.</Paragraph> <Paragraph position="7"> Staff members will usually be at least minimally literate, but they present their own usability issues. Their typing skills may be low or absent. Handling the computer and/or microphone may be awkward in many situations, e.g. when examining a patient or taking notes. (Speech translation systems are expected to function in a wide range of physical settings: in admissions or financial aid offices, at massage tables for physical therapy with patients lying face down, in personal living rooms for home therapy or interviews, and in many other locations.) To help deal with these awkwardness issues, our system provides voice commands, which enable hands-free operation. Both full interactive translation and the Translation Shortcuts facility (using either the Browser or Search elements) can be run hands-free. To a limited degree, the system can be used eyes-free as well: text-to-speech can be used to pronounce the back-translation, so that preliminary judgments of translation quality can be made without looking at the computer screen.</Paragraph>
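The seamless multimodal behavior described in this section might be modeled as below. The event names, the language-switching hook, and the sample misrecognition are all invented for illustration; this is a sketch of the idea, not the system's architecture.

```python
# Sketch of seamless multimodal input: dictation opens the utterance and
# any other mode can revise it before translation.  All names and the
# sample misrecognition are hypothetical.

class InputWindow:
    def __init__(self):
        self.text = ""
        self.language = "en"

    def on_speaker_change(self, language):
        """Language switching is automatic when the active participant
        changes."""
        self.language = language
        self.text = ""

    def on_input(self, mode, payload):
        """mode: 'dictation', 'handwriting', 'touch_keyboard', or
        'keyboard' -- all four feed the same utterance buffer."""
        if mode == "dictation":
            self.text += payload
        else:                          # corrections from the other modes
            wrong, right = payload
            self.text = self.text.replace(wrong, right)
        return self.text

w = InputWindow()
w.on_speaker_change("es")                       # patient's turn
w.on_input("dictation", "Tengo nose as")        # misrecognized dictation
print(w.on_input("handwriting", ("nose as", "nauseas")))  # 'Tengo nauseas'
```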
</Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Future developments </SectionTitle> <Paragraph position="0"> We have already mentioned plans to augment the Translation Shortcuts facility with text-to-speech and iconic pictures, thus moving closer to a system suitable for communication with completely illiterate or incapacitated patients.</Paragraph> <Paragraph position="1"> Additional future directions follow.</Paragraph> <Paragraph position="2"> * Server-based architectures: We plan to move toward completely or partially server-based arrangements, in which only a very thin client application - for example, a web interface - will run on the client device. Such architectures will permit delivery of our system on smart phones in the Blackberry or Treo class. Delivery on handhelds will considerably diminish the issues of physical awkwardness discussed above, and anytime/anywhere/any-device access to the system will considerably enlarge its range of uses.</Paragraph> <Paragraph position="3"> * Pooling Translation Shortcuts: As explained above, the system currently supports both Personal (do-it-yourself) and Shared (prepackaged) Translation Shortcuts. As yet, however, there are no facilities for pooling Personal Shortcuts among users, e.g. those in a working group. In the future, we will add facilities for exporting and importing Shortcuts.</Paragraph> <Paragraph position="4"> * Translation memory: Translation Shortcuts can be seen as a variant of Translation Memory, a facility that remembers past successful translations so as to circumvent error-prone reprocessing. At present, however, we save Shortcuts only when explicitly ordered. If all other successful translations were saved, there would soon be far too many to navigate effectively in the Translation Shortcuts Browser. In the future, however, we could record these translations in the background, so that there would be no need to re-verify new input that matched against them. Messages would advise the user that verification was being bypassed in case of a match (see the sketch after this list).</Paragraph> <Paragraph position="5"> * Additional languages: The full SLT system described here is presently operational only for bidirectional translation between English and Spanish. We expect to expand the system to Mandarin Chinese next. Limited working prototypes now exist for Japanese and German, though we expect these languages to be most useful in application fields other than healthcare.</Paragraph> <Paragraph position="6"> * Testing: Systematic usability testing of the full system is under way. We look forward to presenting the results at a future workshop.</Paragraph> </Section>
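As an illustration of the background translation-memory idea in the bullet above, consider the following sketch. It assumes exact-match lookup and is a guess at the general shape, not the system's design; today the system saves only explicit Shortcuts.

```python
# Sketch of background translation memory: every successfully verified
# translation is recorded, and an exact match on later input bypasses
# re-verification, with a message to the user.  Illustrative only.

memory = {}   # verified source text -> verified translation

def translate_with_memory(source, translate_and_verify):
    if source in memory:
        print("Note: reusing a previously verified translation; "
              "verification bypassed.")
        return memory[source]
    target = translate_and_verify(source)   # full interactive path
    memory[source] = target                 # recorded in the background
    return target

translate_with_memory("How are you?", lambda s: "Como esta usted?")
translate_with_memory("How are you?", lambda s: "(not called)")  # memory hit
```
</Paper>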