<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1091">
  <Title>Evaluating the Use of Prosodic Information in Speech Recognition and Understanding</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> The goal of this project is to investigate the use of different levels of prosodic information in speech recognition and understanding. The two main thrusts of the current work are the use of prosodic information in parsing and in the detection and correction of disfluencies, but we have also investigated duration modeling for continuous speech recognition. The research involves determining a representation of prosodic information suitable for use in speech understanding systems, developing reliable algorithms for detection of prosodic cues in speech, investigating architectures for integrating prosodic cues in speech understanding systems, and assessing potential performance improvements by evaluating prosody algorithms in actual spoken language systems (SLS). This research is sponsored jointly by ARPA and NSF, NSF grant no. IPA8905249.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> Recent results on this project are summarized below, with names of the students primarily responsible for the work indicated in parentheses.</Paragraph>
    <Paragraph position="1"> * Extended the prosodic prominence and break acoustic models by implementing an iterative pruning algorithm (N. Veilleux), integrating text and acoustic models</Paragraph>
    <Paragraph position="2"> (D. Macannuco), and developing a new energy feature based on results of recent linguistic studies.</Paragraph>
    <Paragraph position="3"> * Continued work in prosody-parse scoring, running tests on a larger set of sentences, optimizing for word error rate, and achieving a 10% reduction in word error by combining the prosody-parse score with acoustic and language scores from the MIT ATIS system. Experiments on the SRI ATIS system are in progress. (N. Veilleux) * Further explored parametric duration modeling in CSR using maximum likelihood clustering and speaking rate adaptation. Observed a 10% reduction in word error on the RM task, but no improvement in initial experiments on the WSJ task. (C. Fong) * In analyses of the ATIS corpus, found that hesitations are associated with intonation patterns similar to those in filled pauses (in addition to long pauses and lengthened segments) and occur at locations with relatively higher perplexity in the language model</Paragraph>
    <Paragraph position="4"> (N. Veilleux and A. Schlosser), that filled pauses occur almost exclusively in low-probability word sequences and have longer schwa duration than in the determiner &quot;a&quot;, and that there are differences in the F0 patterns of fluent vs. disfluent single word repetitions (E. Shriberg).</Paragraph>
    <Paragraph position="5"> * Developed methods for automatic detection of fragments from acoustic cues and patterns in the N-Best recognizer output using decision trees. (M. Hendrix) * Developed a taxonomy for disfluencies and analyzed distributional properties of 5000 hand-labeled disfluencies from the ATIS corpus, the Switchboard corpus, and a third comparison corpus of human-human air travel planning speech. Findings include general models for predicting overall disfluency rates, relative rates of disfluency types, and relationships between disfluency type and type-independent features (e.g. presence of a word fragment or editing phrase). (E. Shriberg) * Analyzed patterns of occurrences of word-initial glottalization, finding a high coincidence rate with phrase onset and prominence marking, which suggests new cues for prosodic pattern detection. (L. Dilley) [This work is also funded by an NIH grant.]</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="448" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> * Further refine the break index and prominence recognition algorithms to improve accuracy on the ATIS corpus, and investigate the use of detected prominence in the SRI ATIS system as a knowledge source for rejecting or correcting template matcher output.</Paragraph>
    <Paragraph position="1"> * Improve the performance of the parse scoring algorithm by exploring new syntactic features, and assess performance on the SRI vs. MIT ATIS systems.</Paragraph>
    <Paragraph position="2"> * Refine the fragment detection algorithm and extend it to the detection of other disfluencies by integrating acoustic cues with pattern-matching text cues, and evaluate its usefulness in the SRI ATIS system.</Paragraph>
  </Section>
</Paper>