<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1069"> <Title>SESSION 12: SLS AND PROSODY</Title> <Section position="1" start_page="0" end_page="353" type="abstr"> <SectionTitle> SESSION 12: SLS AND PROSODY </SectionTitle> <Paragraph position="0"> This last session of the Meeting was really two sub-sessions on two quite different topics. The session opened with two thought-provoking papers from MIT on what might be called a philosophy of building a spoken language system.</Paragraph> <Paragraph position="1"> The first paper, by Seneff, Hirschman and Zue, concerned the ATIS domain. The authors state what seems, once it is pointed out, an obvious truth, but one that too few system builders seem to understand: when attempting a project like ATIS, the time to worry about the kinds of interaction that will take place in a dialogue is at the very beginning of the project. The MIT ATIS system has not yet been fully designed, but the paper discusses, and gives some examples of, the kinds of information exchange it is going to have to handle.</Paragraph> <Paragraph position="2"> The second paper, by Hirschman, Seneff, Goodine and Phillips, was of the same flavor, but in a more practical vein. It describes some actual Natural Language improvements to the MIT Voyager system, having to do with merging of acoustic and grammatical evidence during the tree-search part of the recognition algorithm. Some simple (new) ideas have already provided a 33% improvement in recognition score.</Paragraph> <Paragraph position="3"> In the discussion period, it was clear that questioners had no quarrel with the aims expressed by the talkers; all questions had to do with the architecture, the actual implementation, of the algorithms discussed in the papers.</Paragraph> <Paragraph position="4"> The final three papers were on prosody. 
Suggestions that prosody might be used in automatic speech understanding go back at least to the ARPA SUR project of the early 1970s; however, no recognition system has ever actually used prosody.</Paragraph> <Paragraph position="5"> In the early days, when compute time was a primary issue, the notion was that prosodic information could be used to order competing theories, and thus speed the search. Today, with computational resources faster and cheaper, the emphasis is on the use of prosody for disambiguation.</Paragraph> <Paragraph position="6"> The papers in this sub-session reported work that is very early in the process of folding prosody into the recognition process; in fact, all three might be described as feasibility tests. None reported work in which prosodic information is automatically extracted from an unknown incoming utterance.</Paragraph> <Paragraph position="7"> The first paper, by Price, Ostendorf, Shattuck-Hufnagel and Fong, and the second, by Wang and Hirschberg, are somewhat in the nature of thought experiments. Price et al. are interested in whether or not prosody actually can be used to disambiguate; they show that human listeners can indeed use it to some extent. Wang and Hirschberg show that in the ATIS domain, which is syntactically simple, it is possible to predict fairly well, given the text of a sentence, where intonation boundaries will occur.</Paragraph> <Paragraph position="8"> The third paper, by Wightman, Veilleux and Ostendorf, describes what comes closest to a practical experiment. Here, given incoming sentences where the words are known in advance, their algorithm measures certain phoneme durations and uses them in a simple disambiguation task, with very encouraging results.</Paragraph> <Paragraph position="9"> Questions focused on three topics. The first was the statistical gathering of prosodic evidence. There was general agreement that there is not now any set of statistics on prosody from a large corpus. 
Further, since prosodic units are much longer than, for example, phonemes, it will take a lot of text to get reliable estimates. It was thought that some of the corpora now being collected will be large enough. It is perhaps of interest that there was no discussion of just what measurements we need to find the distribution of.</Paragraph> <Paragraph position="10"> The second topic was whether 90% accuracy in prosody (reported in one of the papers) was good or bad. The fact that the question was asked, and that in the discussion there was no solid opinion on either side, is very revealing of our state of knowledge of prosody and its usefulness in automatic understanding of speech.</Paragraph> <Paragraph position="11"> The third topic was the general usefulness of prosody.</Paragraph> <Paragraph position="12"> Opinions expressed were that prosody is not just good for speeding up search; that when prosody can be successfully extracted from speech it will be a useful addition to the probabilistic recognition framework; and that in fact there are many situations in which prosody will be the only way to get at the true meaning of a spoken sentence.</Paragraph> </Section> </Paper>