<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1015"> <Title>Combining Multiple Knowledge Sources for Discourse Segmentation</Title> <Section position="3" start_page="0" end_page="108" type="intro"> <SectionTitle> 2 Discourse Segmentation 2.1 Related Work </SectionTitle> <Paragraph position="0"> Segmentation has played a significant role in much work on discourse. The linguistic structure of Grosz and Sidner's (1986) tri-partite discourse model consists of multi-utterance segments whose hierarchical relations are isomorphic with intentional structure.</Paragraph> <Paragraph position="1"> In other work (e.g., (Hobbs, 1979; Polanyi, 1988)), segmental structure is an artifact of coherence relations among utterances, and few if any specific claims are made regarding segmental structure per se. Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) is another tradition of defining relations among utterances, and informs much work in generation. In addition, recent work (Moore and Paris, 1993; Moore and Pollack, 1992) has addressed the integration of intentions and rhetorical relations.</Paragraph> <Paragraph position="2"> Although all of these approaches have involved detailed analyses of individual discourses or representative corpora, we believe there is a need for more rigorous empirical studies.</Paragraph> <Paragraph position="3"> Researchers have begun to investigate the ability of humans to agree with one another on segmentation, and to propose methodologies for quantifying their findings. Several studies have used expert coders to locally and globally structure spoken discourse according to the model of Grosz and Sidner (1986), including (Grosz and Hirschberg, 1992; Hirschberg and Grosz, 1992; Nakatani et al., 1995; Stifelman, 1995). Hearst (1994) asked subjects to place boundaries between paragraphs of expository texts, to indicate topic changes. 
Moser and Moore (1995) had an expert coder assign segments and various segment features and relations based on RST. To quantify their findings, these studies use notions of agreement (Gale et al., 1992; Moser and Moore, 1995) and/or reliability (Passonneau and Litman, 1993; Passonneau and Litman, to appear; Isard and Carletta, 1995).</Paragraph> <Paragraph position="4"> By asking subjects to segment discourse using a non-linguistic criterion, the correlation of linguistic devices with independently derived segments can then be investigated in a way that avoids circularity.</Paragraph> <Paragraph position="5"> Together, (Grosz and Hirschberg, 1992; Hirschberg and Grosz, 1992; Nakatani et al., 1995) comprise an ongoing study using three corpora: professionally read AP news stories, spontaneous narrative, and read and spontaneous versions of task-oriented monologues. Discourse structures are derived from subjects' segmentations, then statistical measures are used to characterize these structures in terms of acoustic-prosodic features. Grosz and Hirschberg's work also used the classification and regression tree system CART (Breiman et al., 1984) to automatically construct and evaluate decision trees for classifying aspects of discourse structure from intonational feature values. Morris and Hirst (1991) structured a set of magazine texts using the theory of (Grosz and Sidner, 1986), developed a thesaurus-based lexical cohesion algorithm to segment text, then qualitatively compared their segmentations with the results. Hearst (1994) presented two implemented segmentation algorithms based on term repetition, and compared the boundaries produced to the boundaries marked by at least 3 of 7 subjects, using information retrieval metrics. Kozima (1993) had 16 subjects segment a simplified short story, developed an algorithm based on lexical cohesion, and qualitatively compared the results. 
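The lexical cohesion methods above share a common core: measure term overlap between adjacent stretches of text and hypothesize a boundary where overlap drops. A minimal sketch of a term-repetition segmenter in that spirit (the block size, threshold, cosine scoring, and whitespace tokenization are illustrative assumptions, not the settings of any of the published algorithms):

```python
from collections import Counter
import math

def cosine(a, b):
    # cosine similarity between two term-frequency Counters
    # (a Counter returns 0 for missing terms, so no key checks needed)
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values()))
    den *= math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def term_repetition_boundaries(sentences, block_size=3, threshold=0.1):
    """Hypothesize a boundary at each gap where lexical overlap between
    the block_size sentences before and after the gap falls below threshold."""
    boundaries = []
    for gap in range(block_size, len(sentences) - block_size + 1):
        left = Counter(w for s in sentences[gap - block_size:gap]
                       for w in s.lower().split())
        right = Counter(w for s in sentences[gap:gap + block_size]
                        for w in s.lower().split())
        if threshold > cosine(left, right):
            boundaries.append(gap)
    return boundaries
```

On real text one would typically also remove stopwords and stem terms before counting, since overlap in function words masks topic shifts.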
Reynar (1994) proposed an algorithm based on lexical cohesion in conjunction with a graphical technique, and used information retrieval metrics to evaluate the algorithm's performance in locating boundaries between concatenated news articles.</Paragraph> <Section position="1" start_page="108" end_page="108" type="sub_section"> <SectionTitle> 2.2 Our Previous Results </SectionTitle> <Paragraph position="0"> We have been investigating a corpus of monologues collected and transcribed by Chafe (1980), known as the Pear stories. As reported in (Passonneau and Litman, 1993), we first investigated whether units of global structure consisting of sequences of utterances could be reliably identified by naive subjects. We analyzed linear segmentations of 20 narratives performed by naive subjects (7 new subjects per narrative), where speaker intention was the segment criterion. Subjects were given transcripts, asked to place a new segment boundary between lines (prosodic phrases) 1 wherever the speaker had a new communicative goal, and to briefly describe the completed segment. Subjects were free to assign any number of boundaries. The qualitative results were that segments varied in size from 1 to 49 phrases in length (Avg. = 5.9), and the rate at which subjects assigned boundaries ranged from 5.5% to 41.3%. Despite this variation, we found statistically significant agreement among subjects across all narratives on location of segment boundaries (.114 × 10^-6 < p < .6 × 10^-9).</Paragraph> <Paragraph position="1"> We then looked at the predictive power of linguistic cues for identifying the segment boundaries agreed upon by a significant number of subjects. We used three distinct algorithms based on the distribution of referential noun phrases, cue words, and pauses, respectively. Each algorithm (NP-A, CUE-A, PAUSE-A) was designed to replicate the subjects' segmentation task (break up a narrative into contiguous segments, with segment breaks falling between prosodic phrases). 
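In outline, each subject's segmentation can be represented as the set of inter-phrase slots they marked, with the agreed-upon boundaries being those marked by enough of the subjects. A sketch under that representation (the majority threshold of 4 of 7 is an illustrative stand-in for the statistical significance criterion the study actually used):

```python
def agreed_boundaries(subject_marks, min_subjects=4):
    """Keep the inter-phrase slots marked as segment boundaries by at
    least min_subjects of the subjects. The threshold here is an
    illustrative majority cutoff, not the study's statistical test."""
    counts = {}
    for marks in subject_marks:
        for slot in marks:
            counts[slot] = counts.get(slot, 0) + 1
    return sorted(slot for slot, c in counts.items() if c >= min_subjects)

def boundary_rate(marks, num_slots):
    # fraction of the available inter-phrase slots one subject marked
    return len(marks) / num_slots
```

For example, with seven subjects' boundary sets, only slots marked by four or more of them survive as agreed boundaries.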
NP-A used three features, while CUE-A and PAUSE-A each made use of a single feature. The features are a subset of those described in section 3.</Paragraph> <Paragraph position="2"> To evaluate how well an algorithm predicted segmental structure, we used the information retrieval (IR) metrics described in section 3. As reported in (Passonneau and Litman, to appear), we also evaluated a simple additive method for combining algorithms in which a boundary is proposed if each separate algorithm proposes a boundary. We tested all pairwise combinations, and the combination of all three algorithms. No algorithm or combination of algorithms performed as well as humans. NP-A performed better than the other unimodal algorithms, and a combination of NP-A and PAUSE-A performed best. We felt that significant improvements could be gained by combining the input features in more complex ways rather than by simply combining the outputs of independent algorithms.</Paragraph> </Section> </Section> </Paper>
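The additive combination amounts to set intersection over each algorithm's proposed boundary positions, scored against the human-derived boundaries. A sketch (the function names and boundary-position representation are assumptions; only precision and recall are shown, whereas the paper's full metric set is given in its section 3):

```python
def combine(boundary_sets):
    """Additive combination: propose a boundary only at positions where
    every individual algorithm proposes one (set intersection)."""
    combined = set(boundary_sets[0])
    for proposals in boundary_sets[1:]:
        combined &= set(proposals)
    return combined

def precision_recall(proposed, target):
    # IR-style scoring of proposed boundary positions against the
    # boundaries agreed upon by the human subjects
    proposed, target = set(proposed), set(target)
    hits = len(proposed & target)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(target) if target else 0.0
    return precision, recall
```

Intersecting outputs can only remove proposed boundaries, so it tends to raise precision at the cost of recall; this is one reason to combine input features directly rather than algorithm outputs.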