<?xml version="1.0" standalone="yes"?> <Paper uid="J97-1005"> <Title>Discourse Segmentation by Human and Automated Means</Title> <Section position="3" start_page="0" end_page="104" type="intro"> <SectionTitle> SEGMENT X Meanwhile, </SectionTitle> <Paragraph position="0"> there are three little boys, up on the road a little bit, and they see this little accident. And u-h they come over, and they help [him], and you know, help [him] pick up the pears and everything.
SEGMENT Y A-nd the one thing that struck me about the- three little boys that were there, is that one had ay uh I don't know what you call them, but it's a paddle, and a ball-, is attached to the paddle, and you know you bounce it? And that sound was really prominent.
SEGMENT Z Well anyway, so- u-m tsk all the pears are picked up, and [he]'s on his way again,
Figure 1 Discourse segment structure and linguistic devices.
... Figure 1--which describe how three boys come to the aid of another boy who fell off of a bike--are more closely related to one another than to those in the intervening segment Y--which describe the paddleball toy owned by one of the three boys. The second discourse feature of interest is that the usage of a wide range of lexicogrammatical devices seems to constrain or be constrained by this more abstract structure. Consider the interpretation of the referent of the boxed pronoun he in segment Z. The referent of the underlined noun phrase one in segment Y is the most recently mentioned male referent: without the segmentation, the reasoning required to reject it in favor of the intended referent of he is quite complex. However, segment Z begins with certain features that indicate a resumption of the speaker goals associated with segment X, such as the use of the phrase well anyway, and the repeated mention of the event of picking up the pears. In terms of the segmentation shown here, the referents introduced in segment X are more relevant for interpreting the pronoun in segment Z. 
Note also that cue words (italicized) explicitly mark the boundaries of all three segments. Our work is motivated by the hypothesis that natural language technologies can more sensibly interpret discourse, and can generate more comprehensible discourse, if they take advantage of this interplay between segmentation and linguistic devices. In Section 2, we give a brief overview of related work. In Section 3, we present our analysis of segmentation data collected from a population of naive subjects. Our results demonstrate an extremely significant pattern of agreement on segment boundaries. In Section 4, we use boundaries abstracted from the data produced by our subjects to quantitatively evaluate algorithms for segmenting discourse. In Section 4.1, we discuss the coding and evaluation methods. In Section 4.2, we test an initial set of algorithms for computing segment boundaries from a particular type of linguistic feature, either referential noun phrases, cue phrases, or pauses. In Section 4.3.1, we analyze the errors of our initial algorithms in order to identify a set of enriched input features, and to determine how to combine information from the three linguistic knowledge sources. In Section 4.3.2, we use machine learning to automatically construct segmentation algorithms from large feature sets. Our results suggest that it is possible to approach human levels of performance, given multiple knowledge sources. In Section 5, we discuss the significance of our results and briefly highlight our current directions.</Paragraph> </Section> </Paper>