<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4040"> <Title>A Lexically-Driven Algorithm for Disfluency Detection</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> We have presented a TBL approach to detecting disfluencies that uses primarily lexical features. Our system performed comparably to other systems that relied on both prosodic and lexical features. Our speaker-style (high-frequency word) feature enabled us to detect rarer disfluencies, although it was not a large factor in our performance. It nevertheless appears to be a promising technique for future research.</Paragraph> <Paragraph position="1"> The technique described here shows promise for extension to disfluency detection in other languages. Because TBL is a weakly statistical technique, it does not require a large training corpus and could be applied to new languages more rapidly. Assuming the basic forms of disfluencies in other languages are similar to those in English, very few modifications would be required.</Paragraph> <Paragraph position="2"> The longer edits that the system currently misses may be detectable using parsing, with the intuition that a parser trained on fluent speech may perform poorly in the presence of longer edits. Techniques using parse trees to identify disfluencies have shown success in the past (Hindle, 1983). The system could use portions of the parse structure as features and could relabel entire subtrees of the parse tree. Repeated words are another feature of the longer edits, which we might exploit by performing a weighted alignment of the edit and the repair. Eventually, more elaborate acoustic cues may prove necessary to identify these edits, at which point a model of interruption points could be included as a feature in the rules learned by the system.</Paragraph> </Section> </Paper>