<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1073">
  <Title>[6] K. Hacioglu, W. Ward, &quot;Dialog-Context Dependent Language Modeling Using N-Grams and Stochastic Context-Free Grammars&quot;,</Title>
  <Section position="4" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3. CU MOVE
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Task Overview
</SectionTitle>
      <Paragraph position="0"> The &quot;CU Move&quot; system represents our work towards achieving graceful human-computer interaction in automobile environments. Initially, we have considered the task of vehicle route planning and navigation. As our work progresses, we will expand our dialog system to new tasks such as information retrieval, summarization, and multimedia access.</Paragraph>
      <Paragraph position="1"> The problem of voice dialog within vehicle environments poses some important speech research challenges. Speech recognition in car environments is generally fragile, with word error rates (WER) ranging from 30% to 65% depending on driving conditions.</Paragraph>
      <Paragraph position="2"> These changing conditions include changes in the speaker (task stress, emotion, Lombard effect, etc.) as well as in the acoustic environment (road/wind noise from windows, air conditioning, engine noise, exterior traffic, etc.).</Paragraph>
      <Paragraph position="3"> In developing the CU-Move system [13,14], a number of research challenges must be overcome to achieve reliable and natural voice interaction within the car environment. Since the speaker is performing a task (driving the vehicle), the driver will experience a measured level of task stress, which should therefore be included in the speaker-modeling phase. Previous studies have clearly shown that speaker stress and the Lombard effect can cause speech recognition systems to fail rapidly. In addition, microphone type and placement for in-vehicle speech collection can affect the level of acoustic background noise and speech recognition performance.</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.2 Signal Processing
</SectionTitle>
      <Paragraph position="0"> Our research on robust recognition in automobile environments concentrates on the development of an intelligent microphone array. Here, we employ a Gaussian Mixture Model (GMM) based environmental classification scheme to characterize the noise conditions in the automobile. By integrating an environmental classification system into the microphone array design, decisions can be made as to how best to utilize a noise-adaptive frequency-partitioned iterative enhancement algorithm [15,16] or model-based adaptation algorithms [17,18] during decoding to optimize speech recognition accuracy on the beamformed signal.</Paragraph>
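      <Paragraph position="1"> As an illustration only, the following Python sketch shows one common way a GMM-based environment classifier can be organized: each noise condition has its own diagonal-covariance GMM, and a block of feature frames is assigned to the condition whose model gives the highest total log-likelihood. The feature dimensionality, the two condition labels, and all model parameters are assumptions made up for this sketch; the feature set and noise classes actually used in CU-Move are not specified here.
<![CDATA[
```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the mixture sum numerically stable.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_environment(frames, models):
    """Return the label whose GMM scores the frames with the highest total log-likelihood."""
    scores = {
        label: sum(log_gmm(frame, gmm) for frame in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical 2-D noise features and made-up model parameters.
MODELS = {
    "idle": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "highway_windows_open": [
        (0.6, [4.0, 3.0], [1.5, 1.0]),
        (0.4, [6.0, 5.0], [2.0, 2.0]),
    ],
}

if __name__ == "__main__":
    noisy_frames = [[4.2, 3.1], [5.8, 4.7], [3.9, 2.8]]
    print(classify_environment(noisy_frames, MODELS))
```
]]></Paragraph>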
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.3 Data Collection
</SectionTitle>
      <Paragraph position="0"> A five-channel microphone array was constructed using Knowles microphones, and a housing was built for a multi-channel data recorder (Fostex) for in-vehicle data collection. An additional reference microphone is situated behind the driver's seat. Fig. 3 shows the  Fostex multi-channel data recorder (right).</Paragraph>
      <Paragraph position="1"> As part of the CU-Move system formulation, a two-phase data collection plan has been initiated. Phase I focuses on collecting acoustic noise and probe speech from a variety of cars and driving conditions. Phase II focuses on an extensive speaker collection across multiple U.S. sites. A total of eight vehicles have been selected for acoustic noise analysis: a compact car, a minivan, a cargo van, a sport utility vehicle (SUV), compact and full-size trucks, a sports car, and a full-size luxury car. A fixed 10-mile route through Boulder, CO was used for Phase I data collection. The route consisted of city (25 &amp; 45 mph) and highway (45 &amp; 65 mph) driving. It included stop-and-go traffic and prescribed locations where the driver/passenger windows, turn signals, wiper blades, and air conditioning were operated. Each data collection run per car lasted approximately 35-45 minutes. A detailed acoustic analysis of Phase I data can be found in [13]. Our plan is to begin Phase II speech/dialogue data collection during spring 2001, which will include (i) phonetically balanced utterances, (ii) task-specific vocabularies, (iii) natural extemporaneous speech, and (iv) human-to-human and Wizard-of-Oz (WOZ) interaction with the CU-Communicator and CU-Move dialog systems.</Paragraph>
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.4 Prototype Dialog System
</SectionTitle>
      <Paragraph position="0"> Finally, we have developed a prototype dialog system for data collection in the car environment. The dialog system is based on the MIT Galaxy-II Hub architecture with base system components derived from the CU Communicator system [1].</Paragraph>
      <Paragraph position="1"> Users interacting with the dialog system can enter their origin and destination addresses by voice. Currently, 1107 street names for the Boulder, CO area are modeled. The system can resolve street addresses by business name via interaction with an Internet telephone book. This allows users to pose more natural route queries (e.g., &quot;I need an auto repair shop&quot;, or &quot;I need to get to the Boulder Marriott&quot;). The dialog system automatically retrieves the driving instructions from the Internet using an online WWW route direction provider. Once downloaded, the driving directions are queried locally from an SQL database. During interaction, users mark their location on the route by providing spoken odometer readings. Odometer readings are needed since GPS information has not yet been integrated into the prototype dialog system. Given the odometer reading of the vehicle as an estimate of position, route information such as turn descriptions, distances, and summaries can be queried during travel (e.g., &quot;What's my next turn&quot;, &quot;How far is it&quot;, etc.).</Paragraph>
      <Paragraph position="2"> The prototype system uses the CMU Sphinx-II speech recognizer with cellular telephone acoustic models along with the Phoenix Parser [10] for semantic parsing. The dialog manager is mixed-initiative and event-driven. For route guidance, the natural language generator formats the driving instructions before presentation to the user by the text-to-speech server. For example, the direction &quot;Park Ave W. becomes 22nd St.&quot; is reformatted to &quot;Park Avenue West becomes Twenty Second Street&quot;. Here, knowledge of the task domain can be used to significantly improve the quality of the output text. For speech synthesis, we have developed a Hub-compliant server that interfaces to the AT&amp;T NextGen speech synthesizer.</Paragraph>
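      <Paragraph position="3"> The reformatting of driving instructions for speech output can be illustrated with a small Python sketch. It is not the natural language generator of the prototype; it is a minimal, assumed rule set that expands a handful of street-type and compass abbreviations and spells out ordinals from 1 to 99, which is enough to reproduce the example given above. The abbreviation table and the function names are hypothetical, and the rules ignore ambiguous cases (for instance, St. meaning Saint).
<![CDATA[
```python
import re

# Assumed, deliberately small abbreviation table (case-sensitive keys).
ABBREVIATIONS = {
    "Ave": "Avenue", "St": "Street", "Rd": "Road", "Blvd": "Boulevard",
    "N": "North", "S": "South", "E": "East", "W": "West",
}

ORDINAL_ONES = ["", "First", "Second", "Third", "Fourth", "Fifth",
                "Sixth", "Seventh", "Eighth", "Ninth"]
ORDINAL_TEENS = {10: "Tenth", 11: "Eleventh", 12: "Twelfth", 13: "Thirteenth",
                 14: "Fourteenth", 15: "Fifteenth", 16: "Sixteenth",
                 17: "Seventeenth", 18: "Eighteenth", 19: "Nineteenth"}
CARDINAL_TENS = {2: "Twenty", 3: "Thirty", 4: "Forty", 5: "Fifty",
                 6: "Sixty", 7: "Seventy", 8: "Eighty", 9: "Ninety"}
ORDINAL_TENS = {2: "Twentieth", 3: "Thirtieth", 4: "Fortieth", 5: "Fiftieth",
                6: "Sixtieth", 7: "Seventieth", 8: "Eightieth", 9: "Ninetieth"}


def ordinal_words(n):
    """Spell out an ordinal between 1 and 99, e.g. 22 becomes 'Twenty Second'."""
    if not 1 <= n <= 99:
        raise ValueError("only ordinals from 1 to 99 are supported")
    if n < 10:
        return ORDINAL_ONES[n]
    if n < 20:
        return ORDINAL_TEENS[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return ORDINAL_TENS[tens]
    return CARDINAL_TENS[tens] + " " + ORDINAL_ONES[ones]


def normalize_direction(text):
    """Expand known abbreviations and numeric ordinals; leave other tokens unchanged."""
    words = []
    for token in text.split():
        core = token.rstrip(".,")  # drop trailing punctuation before lookup
        match = re.fullmatch(r"(\d+)(st|nd|rd|th)", core)
        if match and 1 <= int(match.group(1)) <= 99:
            words.append(ordinal_words(int(match.group(1))))
        elif core in ABBREVIATIONS:
            words.append(ABBREVIATIONS[core])
        else:
            words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    print(normalize_direction("Park Ave W. becomes 22nd St."))
```
]]></Paragraph>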
    </Section>
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3.5 Future Work
</SectionTitle>
    <Paragraph position="0"> We have developed a Hub-compliant server that interfaces a Garmin GPS-III global positioning device to a mobile computer via a serial port link. The GPS server reports vehicle velocity in the X, Y, and Z directions as well as real-time updates of vehicle position in latitude and longitude. HRL Laboratories has developed a route server that interfaces to a major navigation content provider. The HRL route server can take GPS coordinates as inputs and can describe route maneuvers in terms of GPS coordinates. In the near term, we will interface our GPS server to the HRL route server in order to provide real-time updating of vehicle position. This will eliminate the need for periodic location updates by the user and will also allow more interesting dialogs to be established (e.g., the computer might proactively tell the user about upcoming points of interest).</Paragraph>
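    <Paragraph position="1"> To make the planned GPS-based position updating concrete, the Python sketch below tracks progress along a route whose maneuvers are tagged with latitude and longitude. It uses the standard haversine great-circle distance and advances to the next maneuver once the vehicle comes within an arrival radius. The maneuver records, coordinates, the 30 m radius, and the function names are all assumptions for illustration; the interfaces of the GPS server and the HRL route server are not described here and are not modeled.
<![CDATA[
```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_route(position, maneuvers, next_index, arrive_radius_m=30.0):
    """Return (index of the next pending maneuver, distance to it in metres).

    Maneuvers whose points lie within arrive_radius_m of the current position
    are treated as reached and skipped. When none remain, the index equals
    len(maneuvers) and the distance is 0.0.
    """
    lat, lon = position
    while next_index < len(maneuvers):
        point = maneuvers[next_index]
        distance = haversine_m(lat, lon, point["lat"], point["lon"])
        if distance > arrive_radius_m:
            return next_index, distance
        next_index += 1
    return next_index, 0.0


# Hypothetical route with made-up coordinates and instructions.
ROUTE = [
    {"text": "Turn right onto First Street", "lat": 40.0150, "lon": -105.2705},
    {"text": "Turn left onto Second Street", "lat": 40.0000, "lon": -105.2705},
]

if __name__ == "__main__":
    index, metres = track_route((40.0160, -105.2705), ROUTE, 0)
    print(ROUTE[index]["text"], "in about", round(metres), "metres")
```
]]></Paragraph>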
  </Section>
</Paper>