<?xml version="1.0" standalone="yes"?> <Paper uid="H89-1046"> <Title>ALTERNATIVE SOURCES OF DATA</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> INITIAL DRAFT GUIDELINES FOR THE DEVELOPMENT OF THE NEXT-GENERATION SPOKEN LANGUAGE SYSTEMS SPEECH RESEARCH DATABASE </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Texas Instruments OBJECTIVE </SectionTitle> <Paragraph position="0"> To best serve the strategic needs of the DARPA SLS research program by creating the next-generation speech database(s).</Paragraph> <Paragraph position="1"> To promote progress on the important SLS research problems: -- phonetic modeling (acoustic-phonetic decoding) -- higher level modeling of the speech mechanism -- language modeling (above the level of speech) To be adequate, practical, and timely -- comprehensive enough to support true learning -- limited enough to be accomplished -- soon enough to be valuable</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> BACKGROUND </SectionTitle> <Paragraph position="0"> The DARPA speech research program has made significant progress in the development of speech recognition technology. This progress has led to an increase in the overall scope of the program, from &quot;speech recognition&quot; to &quot;spoken language recognition&quot;. This change in the nature of the research effort requires that a new speech database be developed to better support the new research objectives.</Paragraph> <Paragraph position="1"> The value of establishing a few well-designed databases, shared by all of the research contributors, was demonstrated in the first phase of the DARPA speech research program. Such databases integrate the various research efforts by providing a common research medium which affords a shared understanding of the problem and by self-direction of technical efforts through unequivocal demonstrations of the strengths/weaknesses of competing approaches.</Paragraph> </Section> <Section position="4" start_page="0" end_page="257" type="metho"> <SectionTitle> DATABASE REQUIREMENTS </SectionTitle> <Paragraph position="0"> Primary requirements: -- &quot;natural&quot; (task-oriented) speech, unconstrained vocabulary or fixed syntax -- support for all speech recognition problem areas -- emphasis on speaker independent recognition by fixed</Paragraph> <Paragraph position="2"> The desiderata for the next-generation spoken language system speech database include the following: First, the speech must be &quot;natural&quot;. This implies that the speech be spontaneous as well as unconstrained by vocabulary or syntax.</Paragraph> <Paragraph position="3"> Second, the speech should involve or simulate interactive man/machine problem solving. 
This implies that the speaker's focus of attention is not on the speech act itself but rather on the task which the speech serves.</Paragraph> <Paragraph position="4"> Third, the database should be sufficiently representative and general so that it can support SLS technology development that is useful beyond the specific task domain of the database.</Paragraph> <Paragraph position="5"> Fourth, the database development effort must be doable, with the data available to the researchers soon enough to drive the research effort.</Paragraph> <Paragraph position="6"> Fifth, the database should be relevant to DARPA needs and should demonstrate a high intrinsic value of spoken language systems for DoD applications.</Paragraph> <Paragraph position="7"> There is one other important consideration in the creation of an SLS database. The database should not be so difficult that it discourages the enthusiasm so essential for strong effort and steady progress in SLS technology research and development. This will be difficult for a spontaneous SLS database. Perhaps the database might be graded into categories of difficulty, thus allowing gradual progress toward solutions for the most difficult problems.</Paragraph> <Paragraph position="9"/> </Section> <Section position="5" start_page="257" end_page="258" type="metho"> <SectionTitle> ALTERNATIVE SOURCES OF DATA </SectionTitle> <Paragraph position="0"> From among the three general categories of speech data, namely &quot;read&quot; speech, &quot;performance task&quot; speech, and &quot;natural task&quot; speech, probably the most satisfactory candidate is &quot;natural task&quot; speech. A discussion of the problems with the different categories of speech databases follows: 1) &quot;Read&quot; speech database problems: The principal objection to &quot;read&quot; speech data is that it does not support the development of technology to recognize natural spoken language. The argument that transcripts of spoken language could be read is weak. It is not even clear that a reasonable representation of natural speech could be presented to the subjects to be read. Most agree that &quot;read&quot; speech is not suitable, and so I will not argue further against this category.</Paragraph> <Paragraph position="1"> 2) &quot;Performance task&quot; simulation problems: Several factors combine to make &quot;performance task&quot; simulation impractical. First, the effort to create a meaningful simulation would be difficult at best. An affordable effort could not support the creation of a target application, and would necessarily be limited to creating a simulation of a spoken language interface. But working within existing target applications will limit the effective language domain, because few if any existing applications have the large domain and vocabulary that the SLS effort proposes to study. How to support a large, meaningful language with this simulation is a critical challenge. Second, a large effort would be required to create meaningful task scenarios for the subjects, to find (or train) a sufficient number of knowledgeable subjects, and to orient and interest the subjects in the specifics of the assigned task.
Third, the need to use a trained and skilled human translator (to listen to the subjects and input appropriate commands to the system simulation) will limit the amount of data that can be collected to a level insufficient for productive research.</Paragraph> <Paragraph position="3"> 3) &quot;Natural task&quot; speech database problems: There are several negative aspects to a &quot;natural task&quot; speech database, mostly related to the difficulty of the recognition task: Because of the in situ nature of the speech, there will be significant additional dimensions of variation in the speech signal. These will include such things as greater acoustic variation and less or no conscious control by the subject over his speech. Further, the usage of vocabulary and syntax will be highly skewed, with rare forms occurring just often enough to maximize the error rate through insufficient exposure to training data. (All spontaneous speech databases will be subject to such skewing.) The benefits of a &quot;natural task&quot; database, however, may outweigh or overcome the shortcomings. Most important, a natural task database allows study of the underlying principles that govern natural spoken language without the risk of corruption or bias of the language which might be caused by artifacts induced by simulation or data collection constraints. Also, the &quot;natural task&quot; scenario will support a higher-level language, in terms of intelligent command/control dialog, than could be expected from a database query system such as a spreadsheet or a personnel database. Finally, a most important benefit of the &quot;natural task&quot; scenario is that it can much more efficiently provide a large speech database. This can help to overcome the skewed distribution of vocabulary and syntax.</Paragraph> <Paragraph position="4"> Of the candidate tasks listed above, the first two, ATC operations and travel planning, probably fit the SLS database objectives best. They are both highly interactive and relatively well controlled. (ATC controllers use head-mounted microphones, and travel planning uses the telephone.) Also, both support, at least in principle, easy and efficient speech data collection. (ATC operations already record their data, although the bandwidth is typically limited to less than 4 kHz.) Dictation would be a good candidate, except that it is really not an interactive speech task, and therefore it does not support the SLS objective directly.</Paragraph> </Section> <Section position="6" start_page="258" end_page="259" type="metho"> <SectionTitle> EVALUATION ISSUES </SectionTitle> <Paragraph position="0"> In progressing from &quot;speech recognition&quot; to &quot;spoken language systems&quot;, two new dimensions will be added to performance evaluation. First, &quot;coverage&quot; will be an important, perhaps dominant, factor in system performance. Although this has always been an important issue in the practical use of speech recognition, it has typically been ignored, and evaluation has considered only speech data that falls within the formal language model. Second, it will no longer be adequate to evaluate speech at the level of orthographic transcription. It will be necessary to &quot;understand&quot; the speech sufficiently to determine the appropriate response to it. This is the greatest strategic challenge facing the SLS research effort, and one for which there will be no general near-term solution.
The hope is that technology may be developed to support specific task-domain applications. The immediate challenge is to define an evaluation methodology for spoken language understanding.</Paragraph> </Section> <Section position="7" start_page="259" end_page="261" type="metho"> <SectionTitle> IMPORTANT PROBLEM DIMENSIONS IN THE DATABASE </SectionTitle> <Paragraph position="0"/> </Section> </Paper>