<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1032">
  <Title>Extracting Key Semantic Terms from Chinese Speech Query for Web Searches</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> We are entering an information era, in which information has become one of the major resources in our daily activities. With its widespread adoption, the Internet has become the largest body of information for all to share. Currently, most (Chinese) search engines support only term-based information retrieval, where users must type their queries on a keyboard in front of a computer. However, a large segment of the population in China and the rest of the world is illiterate or lacks the skills to use a computer. These people are thus unable to take advantage of the vast amount of freely available information.</Paragraph>
    <Paragraph position="1"> Since almost every person can speak and understand spoken language, research on &quot;(Chinese) natural language speech query retrieval&quot; would enable average people to access information through current search engines without special computer skills or training. They could simply access the search engine using common devices that they are already familiar with, such as the telephone or PDA.</Paragraph>
    <Paragraph position="2"> One of the most important challenges in implementing a speech-based information retrieval system is obtaining, from the spoken natural language query, the correct query terms that convey its main semantics. This requires integrating research on natural language query processing and speech recognition.</Paragraph>
    <Paragraph position="3"> Natural language query processing has been an active area of research for many years, and many techniques have been developed (Jacobs and Rau, 1993; Kupiec, 1993; Strzalkowski, 1999; Yu et al., 1999). Most of these techniques, however, focus only on written language, with few devoted to the study of spoken language query processing.</Paragraph>
    <Paragraph position="4"> Speech recognition converts acoustic speech signals into a stream of text. Because of the complexity of the human vocal tract, the observed speech signals differ even across multiple utterances of the same sequence of words by the same person (Lee et al., 1996). Furthermore, speech signals are influenced by differences across speakers, dialects, transmission distortions, and speaking environments. All of these contribute to the noise and variability of speech signals. Since one of the main sources of errors in Chinese speech recognition is substitution (Wang, 2002; Zhou, 1997), in which a wrong but similar-sounding term is used in place of the correct term, a confusion matrix that records confused sound pairs has been used in attempts to eliminate this error. Confusion matrices have been employed effectively in spoken document retrieval (Singhal et al., 1999; Srinivasan et al., 2000) and in minimizing speech recognition errors (Shen et al., 1998). However, when such a method is used directly to correct speech recognition errors, it tends to bring in too many irrelevant terms (Ng, 2000).</Paragraph>
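To make the substitution problem and the confusion-matrix remedy concrete, here is a minimal sketch (Python; the syllable pairs, scores, and threshold are invented for illustration and are not taken from the cited systems) that expands each recognized syllable with its commonly confused alternatives:

    # A minimal sketch of confusion-matrix expansion (hypothetical data).
    # Each entry maps a recognized syllable to similar-sounding syllables
    # with a confusion score; real systems estimate these from ASR errors.
    CONFUSION = {
        "quan": [("chuan", 0.30), ("qun", 0.15)],
        "zhi":  [("zi", 0.25)],
    }

    def expand_syllables(syllables, threshold=0.2):
        """Return alternative syllable candidates above a score threshold."""
        candidates = []
        for s in syllables:
            alts = [s]  # always keep the recognized syllable itself
            for alt, score in CONFUSION.get(s, []):
                if score >= threshold:
                    alts.append(alt)
            candidates.append(alts)
        return candidates

    # expand_syllables(["ren", "quan"]) -> [["ren"], ["quan", "chuan"]]

Even with a score threshold, the candidate space grows multiplicatively with query length, which is one way to see why direct correction tends to bring in many irrelevant terms.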
    <Paragraph position="5"> Because important terms in a long document are often repeated several times, there is a good chance that such terms will be correctly recognized at least once by a speech recognition engine with a reasonable word recognition rate. Many spoken document retrieval (SDR) systems take advantage of this fact to reduce speech recognition and matching errors (Meng et al., 2001; Wang et al., 2001; Chen et al., 2001). In contrast to SDR, very little work has been done on Chinese spoken query processing (SQP), the use of spoken queries to retrieve textual documents. Moreover, spoken queries in SQP tend to be very short, with few repeated terms.</Paragraph>
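The benefit of repetition is easy to quantify under a simplifying assumption of independent recognition attempts (our illustration, not a claim from the cited work): if each occurrence of a term is recognized correctly with probability p, a term appearing n times is recognized at least once with probability 1 - (1 - p)^n. With p = 0.5, a term repeated four times in a document is caught at least once with probability 1 - 0.5^4 = 0.9375, whereas a term occurring once in a short query is caught only half the time.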
    <Paragraph position="6"> In this paper, we aim to integrate spoken language and natural language research to process spoken queries that contain speech recognition errors. The main contribution of this research is a divide-and-conquer strategy to alleviate speech recognition errors. It first employs a Chinese query model to isolate the Core Semantic String (CSS) that conveys the semantics of the spoken query. It then breaks the CSS into basic components corresponding to phrases, and uses a multi-tier strategy to map the basic components to known phrases in a dictionary in order to further eliminate the errors.</Paragraph>
    <Paragraph position="7"> In the rest of this paper, Section 2 gives an overview of the proposed approach. Section 3 describes the query model, while Section 4 outlines the multi-tier approach to eliminating errors in the CSS. Section 5 discusses the experimental setup and results. Finally, Section 6 contains our concluding remarks.</Paragraph>
    <Paragraph position="8">
2 Overview of the Proposed Approach

There are many challenges in supporting Web search by speech queries. One of the main challenges is that current speech recognition technology is still far from perfect, especially for average users who have had no speech training. For such an unrestricted user group, the speech recognition engine may achieve an accuracy of less than 50%. Because of this, the key phrases derived from the speech query could be erroneous or could miss the main semantics of the query altogether, which would severely degrade the effectiveness of the resulting system.</Paragraph>
    <Paragraph position="9"> Given the speech-to-text output with errors, the key issue is how to analyze the query in order to capture the Core Semantic String (CSS) as accurately as possible. The CSS is defined as the key term sequence in the query that conveys the main semantics of the query. For example, given the query:</Paragraph>
    <Paragraph position="11"> (Please tell me the information on how the U.S. separates the most-favored-nation status from the human rights issue in China). The CSS in the query is underlined.</Paragraph>
    <Paragraph position="12"> We can segment the CSS into several basic components that correspond to key concepts, such as: 美国 (U.S.), 中国 (China), 人权问题 (human rights issue), 最惠国待遇 (the most-favored-nation status), and 分开 (separate).</Paragraph>
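One simple way to obtain such components is forward maximum matching against a phrase dictionary. The sketch below (Python) illustrates the idea; the dictionary holds just the example's own phrases, and this is an illustration rather than the paper's actual method, which Section 4 describes:

    # Forward maximum matching over a (hypothetical) phrase dictionary.
    PHRASES = {"美国", "中国", "人权问题", "最惠国待遇", "分开"}
    MAX_LEN = max(len(p) for p in PHRASES)

    def fmm_segment(css):
        """Greedily match the longest known phrase at each position."""
        i, components = 0, []
        while i < len(css):
            for j in range(min(len(css), i + MAX_LEN), i, -1):
                if css[i:j] in PHRASES:
                    components.append(css[i:j])
                    i = j
                    break
            else:
                i += 1  # skip a character that starts no known phrase
        return components

    # fmm_segment("美国最惠国待遇中国人权问题分开")
    # -> ["美国", "最惠国待遇", "中国", "人权问题", "分开"]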
    <Paragraph position="13"> Because of the difficulty of handling speech recognition errors involving multiple CSS segments, we limit our research to queries that contain only one CSS. However, we allow a CSS to include multiple basic components, as in the example above. This is reasonable, as most queries posed by users on the Web tend to be short, with only a few characters (Pu, 2000).</Paragraph>
    <Paragraph position="14"> Thus the accurate extraction of the CSS and its separation into basic components is essential to alleviating speech recognition errors. First, isolating the CSS from the rest of the speech enables us to ignore errors in other parts of the speech, such as greetings and polite remarks, which have no effect on the outcome of the query. Second, by separating the CSS into basic components, we can limit the propagation of errors and employ the set of known phrases in the domain to help correct the errors in each component separately.</Paragraph>
    <Paragraph position="15">  To achieve this, we process the query in three main stages as illustrated in Figure 1. First, given the user's oral query, the system uses a speech recognition engine to convert the speech to text. Second, we analyze the query using a query model (QM) to extract CSS from the query with minimum errors. QM defines the structures and some of the standard phrases used in typical queries.</Paragraph>
    <Paragraph position="16"> Third, we divide the CSS into basic components, and employ a multi-tier approach to match the basic components to the nearest known phrases in order to correct the speech recognition errors. The aim here is to improve recall without excessive loss in precision. The resulting key components are then used as the query to a standard search engine.</Paragraph>
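Putting the three stages together, a skeleton of the flow might look as follows (Python; the function bodies are placeholders standing in for the components that Sections 3 and 4 describe):

    # Hypothetical skeleton of the three-stage flow in Figure 1.
    def recognize(speech):
        """Stage 1: convert speech to text with an ASR engine (placeholder)."""
        ...

    def extract_css(text, query_model):
        """Stage 2: use the query model (QM) to strip greetings, polite
        remarks, and other standard phrases, keeping only the CSS."""
        ...

    def correct_components(css, phrase_dict):
        """Stage 3: split the CSS into basic components and map each one
        to the nearest known phrase via multi-tier matching."""
        ...

    def process_query(speech, query_model, phrase_dict):
        text = recognize(speech)
        css = extract_css(text, query_model)
        return correct_components(css, phrase_dict)  # terms for the search engine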
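One plausible reading of the multi-tier matching step is sketched below (Python; the two tiers shown, the pinyin table, and the similarity threshold are all assumptions made for illustration; the actual tiers are defined in Section 4):

    from difflib import SequenceMatcher

    # Hypothetical phrase dictionary with pinyin keys for sound matching.
    PHRASE_PINYIN = {
        "人权问题": "ren quan wen ti",
        "最惠国待遇": "zui hui guo dai yu",
    }

    def match_component(component, component_pinyin, min_ratio=0.75):
        """Tier 1: exact match against known phrases.
        Tier 2: nearest phrase by pinyin similarity (sound-alike errors).
        Returns None if no tier yields a confident match."""
        if component in PHRASE_PINYIN:          # tier 1: exact match
            return component
        best, best_ratio = None, min_ratio      # tier 2: phonetic similarity
        for phrase, pinyin in PHRASE_PINYIN.items():
            ratio = SequenceMatcher(None, component_pinyin, pinyin).ratio()
            if ratio > best_ratio:
                best, best_ratio = phrase, ratio
        return best

    # A substitution error such as 人群问题 ("ren qun wen ti") still maps
    # to 人权问题, whose pinyin "ren quan wen ti" is very close.

Lowering min_ratio maps more components to known phrases (higher recall) but admits more spurious matches (lower precision), mirroring the trade-off noted above.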
    <Paragraph position="17"> The following sections describe the details of our approach.</Paragraph>
  </Section>
</Paper>