<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2018">
  <Title>THE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE*</Title>
  <Section position="3" start_page="0" end_page="126" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> One of the first tasks confronting researchers developing a spoken language system is the collection of data for analysis, system training, and evaluation. Since people do not always produce grammatically well-formed sentences during a spoken dialogue with a computer, the currently available read speech databases may not capture the acoustic and linguistic variability found in goal-directed spontaneous speech. As a first attempt to create a spontaneous speech database 1, we have recently collected a large amount of data from 100 subjects during simulated dialogues with the VOYAGER spoken language system. The purpose of this paper is to document the database construction process and to provide some preliminary linguistic and acoustic analyses.</Paragraph>
    <Paragraph position="1"> VOYAGER is a system that knows about the physical environment of a specific geographical area as well as certain objects inside this area, and can provide assistance on how to get from one location to another within this area. It currently focuses on the geographic area of the city of Cambridge, Massachusetts, between MIT and Harvard University, and can deal with several distinct concepts including directions, distance and time of travel between objects, relationships such as &quot;nearest,&quot; and simple properties such as phone numbers or types of food served. VOYAGER also has a limited amount of discourse knowledge which enables it to respond to queries such as: &quot;How do I get there?&quot; It can also deal with certain clarification fragments such as: &quot;The bank in Harvard Square.&quot; A detailed description of the VOYAGER system can be found elsewhere in these proceedings [1].</Paragraph>
    <Paragraph position="2"> VOYAGER is made up of three components. The first component, the SUMMIT speech recognition system [2], converts the speech signal into a set of word hypotheses. The natural language component, TINA [3], provides a linguistic interpretation of the set of words. The parse tree generated by TINA is translated into a query language form, which is used to produce a response. Currently, VOYAGER can generate responses in the form of text, graphics, and synthetic speech. The back end is an enhanced version of a direction assistance program developed by Jim Davis of MIT's Media Laboratory [4].</Paragraph>
    <Paragraph position="3"> *This research was supported by DARPA under Contract N00014-89-J-1332, monitored through the Office of Naval Research. 1 We loosely use the term spontaneous speech to mean the speech produced by a person &quot;on the fly&quot; when interacting with a computer for problem solving.</Paragraph>
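    <Paragraph position="4"> The three-component organization described above (recognizer, natural language component, back end) can be pictured as a simple chain of stages. The following is only a minimal Python sketch under stated assumptions, not the VOYAGER implementation: the function names recognize, interpret, respond, and run_pipeline, the Query type, and the keyword rules are all hypothetical, and a text transcript stands in for the speech signal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Query:
    """Hypothetical query-language form derived from the interpretation."""
    action: str
    arguments: Dict[str, str]


def recognize(speech_signal: str) -> List[str]:
    # Stand-in for a recognizer such as SUMMIT. Assumption: the "signal" is
    # already a text transcript, so we only normalize it and split it into
    # a list of word hypotheses.
    return speech_signal.lower().strip("?.! ").split()


def interpret(words: List[str]) -> Query:
    # Stand-in for a natural language component such as TINA. Assumption:
    # toy keyword matching replaces a real parse tree.
    if "nearest" in words:
        return Query("find_nearest", {"kind": words[-1]})
    if "get" in words and "to" in words:
        destination = " ".join(words[words.index("to") + 1:])
        return Query("directions", {"destination": destination})
    return Query("unknown", {})


def respond(query: Query) -> str:
    # Stand-in for the back end. Assumption: text responses only, with no
    # graphics or synthetic speech.
    if query.action == "find_nearest":
        return "Looking up the nearest " + query.arguments["kind"] + "."
    if query.action == "directions":
        return "Computing directions to " + query.arguments["destination"] + "."
    return "Sorry, I did not understand."


def run_pipeline(speech_signal: str) -> str:
    # Chain the three stages: signal, then words, then query, then response.
    return respond(interpret(recognize(speech_signal)))
```

Keeping each stage behind a narrow interface (words in, query out, response out) means any one stage could in principle be replaced without touching the others, which is the point of the sketch rather than any of its toy rules.</Paragraph>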
  </Section>
</Paper>