<?xml version="1.0" standalone="yes"?> <Paper uid="N04-3011"> <Title>Use and Acquisition of Semantic Language Model</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Any spoken language understanding system must deal with two critical issues: how to accurately infer the user's intention from speech, and how to do so robustly amidst prevalent spontaneous speech effects, where users inevitably stutter, hesitate, and self-correct on a regular basis. To address these issues, it has been proposed (Miller et al., 1994; Wang, 2000; Esteve et al., 2003) that one can extend the statistical pattern recognition framework commonly used for automatic speech recognition (ASR) to the spoken language understanding (SLU) problem. The &quot;pattern&quot; to be recognized for ASR is a string of words; for SLU, it is a tree of semantic objects representing the domain entities and tasks that describe the user's intention. Just as a language model plays the pivotal role in ASR by guiding the recognizer toward plausible string hypotheses, pattern recognition based SLU relies on what is often called the semantic language model (SLM) to detect semantic objects and construct a parse tree from the user's utterance. Because the end outcome is a parse tree, the SLM is usually realized with structured language model techniques, so that the semantic structure of the utterance can be included in modeling the language (Wang, 2000; Erdogan et al., 2002).</Paragraph> <Paragraph position="1"> In this article, we describe an application of the SLM in the semantic synchronous understanding (SSU) framework for multimodal conversational systems.
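As an illustration of the contrast drawn above, the SLU &quot;pattern&quot; is a tree of semantic objects rather than a flat word string. The following is a minimal sketch of such a tree for a PIM-style utterance; all class and slot names here are hypothetical illustrations, not the schema actually used in this work.

```python
# Sketch: an SLU result as a tree of semantic objects (domain entities and
# tasks), in contrast to ASR's flat word-string output. Names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SemObj:
    """One node of the semantic parse tree: a domain task or entity."""
    name: str                              # e.g. "ScheduleMeeting", "Attendee"
    text: str = ""                         # utterance span this node covers
    children: List["SemObj"] = field(default_factory=list)

    def flatten(self, depth: int = 0):
        """Yield (depth, name, text) triples in pre-order, for display."""
        yield depth, self.name, self.text
        for child in self.children:
            yield from child.flatten(depth + 1)

# Toy parse for the utterance "schedule a meeting with John tomorrow":
# the root is the task; the children are the entities that fill its slots.
parse = SemObj("ScheduleMeeting", "schedule a meeting with John tomorrow", [
    SemObj("Attendee", "John"),
    SemObj("Date", "tomorrow"),
])

for depth, name, text in parse.flatten():
    print("  " * depth + f"{name}: {text}")
```

A structured language model assigns probabilities over trees of this shape jointly with the word string, which is what lets the semantic structure constrain recognition.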
A key idea of SSU is to recognize and parse the user's utterance immediately, accepting only the speech segments that conform to the SLM's predictions while the user is still speaking.</Paragraph> <Paragraph position="2"> Since the SLM can be updated in real time during the course of the interaction, irrelevant expressions, including spontaneous speech effects, can be gracefully rejected based on what makes sense in the dialog context.</Paragraph> <Paragraph position="3"> In Sec. 2, we describe a study on the efficacy of SSU for a mobile personal information management (PIM) application called MiPad (Huang et al., 2000). The SLM used there was manually derived as a combined CFG and N-gram model (Microsoft, 1999; Wang, 2002) by consulting the structure of the PIM back end, without any user data.</Paragraph> <Paragraph position="4"> Obviously, the linguistic coverage of the SLM can be further improved with modern data-driven learning techniques. In Sec. 3, we describe one such learning technique that uses the manually crafted model as a bootstrapping template to enrich the SLM when a suitable amount of training data becomes available.</Paragraph> </Section> </Paper>