<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1035"> <Title>Automatic Segmentation of Multiparty Dialogue</Title> <Section position="3" start_page="0" end_page="273" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> Prior research on segmentation of spoken &quot;documents&quot; uses approaches that were developed for text segmentation and that are based solely on textual cues. These include algorithms based on lexical cohesion (Galley et al., 2003; Stokes et al., 2004), as well as models using annotated features (e.g., cue phrases, part-of-speech tags, coreference relations) that have been found to correlate with segment boundaries (Gavalda et al., 1997; Beeferman et al., 1999). Blei et al. (2001) and van Mulbregt et al. (1999) use topic language models and variants of the hidden Markov model (HMM) to identify topic segments. Recent systems achieve good results for predicting topic boundaries when trained and tested on human transcriptions. For example, Stokes et al. (2004) report an error rate (Pk) of 0.25 on segmenting broadcast news stories using unsupervised lexical cohesion-based approaches. However, topic segmentation of multiparty dialogue appears to be a considerably harder task: Galley et al. (2003) report an error rate (Pk) of 0.319 for the task of predicting major topic segments in meetings.1 Although recordings of multiparty dialogue lack the distinct segmentation cues commonly found in text (e.g., headings, paragraph breaks, and other typographic cues) or in news story segmentation (e.g., the distinction between anchor and interview segments), they contain conversation-based features that may be of use for automatic segmentation. These include silence, overlap rate, speaker activity change (Galley et al., 2003), and cross-speaker linking information, such as adjacency pairs (Zechner and Waibel, 2000). Many of these features can be expected to be complementary. 
For segmenting spontaneous multiparty dialogue into major topic segments, Galley et al. (2003) have shown that a model integrating lexical and conversation-based features outperforms one based solely on lexical cohesion.</Paragraph> <Paragraph position="1"> However, the automatic segmentation models in prior work were developed for predicting top-level topic segments. In addition, compared to read speech and two-party dialogue, multiparty dialogue typically exhibits a considerably higher word error rate (WER) (Morgan et al., 2003).</Paragraph> <Paragraph position="2"> We expect incorrectly recognized words to impair the robustness of lexical cohesion-based approaches and the extraction of conversation-based discourse cues and other features. Past research on broadcast news story segmentation using ASR transcripts has shown performance degradations of 5% to 38%, depending on the evaluation metric (van Mulbregt et al., 1999; Shriberg et al., 2000; Blei and Moreno, 2001). However, no prior study has directly reported the extent of this degradation on the more subtle task of topic segmentation in spontaneous multiparty dialogue. In this paper, we extend prior work by investigating the effect of using ASR output on the models that have previously been proposed. In addition, we aim to find useful features and models for the subtopic prediction task.</Paragraph> </Section> </Paper>
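The Pk scores quoted above (0.25 for broadcast news, 0.319 for meetings) follow the metric of Beeferman et al. (1999): the probability that two positions a fixed distance k apart are inconsistently judged to lie in the same or in different segments. A minimal sketch in Python (the function name, the per-position segment-id representation, and the default choice of k are illustrative assumptions, not taken from the papers cited):

```python
def pk(ref, hyp, k=None):
    """Pk segmentation error (Beeferman et al., 1999).

    ref, hyp: per-position reference/hypothesis segment ids,
    e.g. [0, 0, 0, 1, 1, 2, ...]. Lower Pk is better (0.0 = perfect).
    """
    n = len(ref)
    if k is None:
        # Conventional choice: half the mean reference segment length.
        k = max(1, round(n / (2 * len(set(ref)))))
    errors = 0
    for i in range(n - k):
        same_ref = ref[i] == ref[i + k]   # same segment in the reference?
        same_hyp = hyp[i] == hyp[i + k]   # same segment in the hypothesis?
        if same_ref != same_hyp:          # window ends disagree -> error
            errors += 1
    return errors / (n - k)
```

For instance, a hypothesis identical to the reference scores 0.0, while one that posits no boundaries at all is penalized only at windows straddling a true boundary, illustrating why Pk is less harsh than exact boundary matching.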