<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2051">
<Title>Spontaneous Speech Understanding for Robust Multi-Modal Human-Robot Communication</Title>
<Section position="4" start_page="0" end_page="391" type="relat">
<SectionTitle>2 Related Work</SectionTitle>
<Paragraph position="0">Some of the most thoroughly explored speech processing systems are telephone-based information systems.</Paragraph>
<Paragraph position="1">Their design differs considerably from that of situated HRI. They are uni-modal, so all information has to be gathered from speech. The speech input also differs, as users utter longer phrases which are generally grammatically correct. Such systems are often based on a large corpus and can therefore be trained well enough to achieve satisfactory speech recognition results. A prominent example is the telephone-based weather forecast information service JUPITER (Zue et al., 2000).</Paragraph>
<Paragraph position="2">Over the past years, interest has increased in mobile robot applications, where the challenges are even more complex. While many of these problems (person tracking, attention, path finding) are already in the focus of research, robust speech understanding has not yet been extensively explored in the context of HRI. Moreover, the interpretation of situated dialogs in combination with additional knowledge sources is rarely considered. Recent projects with related scope are the mobile robots CARL (Lopes et al., 2005) and ALBERT (Rogalla et al., 2002), and the robotic chandelier Elvis (Juster and Roy, 2004). The main task of the robot CARL is robust language understanding in the context of knowledge acquisition and management.</Paragraph>
<Paragraph position="3">It combines deep and shallow parsing to achieve robustness. ALBERT is designed to understand speech commands in combination with gestures and object detection, with the task of handling dishes. The home lighting robot Elvis receives instructions about the lighting preferences of a user via speech and gestural input. The robot itself has a fixed position, but the user may walk around in the entire room.</Paragraph>
<Paragraph position="4">It uses keyword spotting to analyze the semantic content of speech. As speech recognition in such robot scenarios is a complex and difficult task, the speech understanding analysis in these systems is constrained to a small set of commands and is not oriented towards spontaneous speech. However, deep speech understanding is necessary for more complex human-robot interaction.</Paragraph>
<Paragraph position="5">There is little research on the semantic analysis of spontaneous speech. A widely used approach to interpreting sentences is the idea of case grammar (Bruce, 1975): each verb has a set of named slots that can be filled by other constituents, typically nouns. The syntactic case information of the words in a sentence marks their semantic roles, and thus the corresponding slots can be filled. Another approach to processing spontaneous speech by using semantic information, for the Air Travel Information Service (ATIS) task, is implemented in the Phoenix system (Ward, 1994). Slots in frames represent the basic semantic entities known to the system; a parser using semantic grammars maps input onto these frame representations. The idea of our approach is similar to that of the Phoenix system, in that we also use semantic entities for extracting information; a minimal sketch of this frame-based style of interpretation is given below. Much effort has also been made in the field of parsing strategies combined with semantic information.</Paragraph>
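<Paragraph position="6">To make the frame-and-slot idea concrete, the following Python sketch shows the kind of mapping a case-grammar or Phoenix-style parser performs. The frame, slot names, and toy lexicon are hypothetical illustrations, not the actual implementation of any of the systems discussed above.

# Minimal sketch of frame-based semantic interpretation in the style of
# case grammar / the Phoenix parser. Frame names, slots, and the toy
# lexicon are illustrative assumptions, not any system's real inventory.

FRAMES = {
    "bring": {"object": None, "recipient": None, "location": None},
}

# Toy semantic lexicon: surface words mapped to (slot, value) pairs.
LEXICON = {
    "cup": ("object", "cup"),
    "plate": ("object", "plate"),
    "me": ("recipient", "speaker"),
    "table": ("location", "table"),
}

def interpret(utterance):
    """Map a word sequence onto the first frame whose trigger verb
    occurs in it, filling slots from the lexicon. Unknown words are
    simply skipped, which is also what lends keyword- and frame-based
    approaches some robustness against recognition errors."""
    words = utterance.lower().split()
    for verb, slots in FRAMES.items():
        if verb in words:
            frame = dict(slots)
            for w in words:
                if w in LEXICON:
                    slot, value = LEXICON[w]
                    frame[slot] = value
            return verb, frame
    return None

print(interpret("could you bring me the cup from the table"))
# ('bring', {'object': 'cup', 'recipient': 'speaker', 'location': 'table'})
</Paragraph>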
<Paragraph position="7">Such parsing-oriented systems mostly support task-oriented dialog systems, e.g., the ATIS task as in (Popescu et al., 2004) and (Milward, 2000), or virtual world scenarios (Gorniak and Roy, 2005), which do not have to deal with uncertain visual input. The aim of the FrameNet project (Fillmore and Baker, 2001) is to create a lexicon resource for English in which every entry receives a semantic frame description.</Paragraph>
<Paragraph position="8">In contrast to the other approaches presented here, we focus on deep semantic analysis of situated spontaneous speech. Written language applications have the advantage of being trainable on large corpora, which is not the case for situated speech-based applications. Furthermore, the interpretation of situated speech depends on environmental information. Utterances in this context are normally less complex; still, our approach is based on a lexicon that allows a broad variety of utterances. It also takes speech recognition problems into account by ignoring inconsistent word hypotheses and scoring interpretations according to their semantic completeness, as sketched below. By adding pragmatic information, natural dialog processing is facilitated.</Paragraph>
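<Paragraph position="9">The completeness scoring just mentioned can be sketched as follows, assuming interpretations are represented as slot-value frames as above; the candidate frames and the tie-breaking behavior are illustrative assumptions, not our system's exact scoring function.

# Hedged sketch of scoring interpretations by semantic completeness:
# an interpretation scores by the fraction of its frame's slots that
# consistent word hypotheses were able to fill.

def completeness(frame):
    """Fraction of slots in the frame that received a value."""
    filled = sum(1 for value in frame.values() if value is not None)
    return filled / len(frame)

def best_interpretation(interpretations):
    """Pick the semantically most complete interpretation; on ties,
    max() keeps the earlier candidate."""
    return max(interpretations, key=completeness)

# Two competing interpretations of a noisy recognizer output:
candidates = [
    {"object": "cup", "recipient": None, "location": None},
    {"object": "cup", "recipient": "speaker", "location": "table"},
]
print(best_interpretation(candidates))
# {'object': 'cup', 'recipient': 'speaker', 'location': 'table'}
</Paragraph>
</Section></Paper>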