File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/p97-1011_intro.xml
Size: 4,126 bytes
Last Modified: 2025-10-06 14:06:15
<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1011"> <Title>Learning Features that Predict Cue Usage</Title> <Section position="2" start_page="0" end_page="80" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Discourse cues are words or phrases, such as because, first, and although, that mark structural and semantic relationships between discourse entities. They play a crucial role in many discourse processing tasks, including plan recognition (Litman and Allen, 1987), text comprehension (Cohen, 1984; Hobbs, 1985; Mann and Thompson, 1986; Reichman-Adar, 1984), and anaphora resolution (Grosz and Sidner, 1986). Moreover, research in reading comprehension indicates that felicitous use of cues improves comprehension and recall (Goldman, 1988), but that their indiscriminate use may have detrimental effects on recall (Millis, Graesser, and Haberlandt, 1993).</Paragraph> <Paragraph position="1"> Our goal is to identify general strategies for cue usage that can be implemented for automatic text generation. From the generation perspective, cue usage consists of three distinct, but interrelated problems: (1) occurrence: whether or not to include a cue in the generated text, (2) placement: where the cue should be placed in the text, and (3) selection: what lexical item(s) should be used.</Paragraph> <Paragraph position="2"> Prior work in text generation has focused on cue selection (McKeown and Elhadad, 1991; Elhadad and McKeown, 1990), or on the relation between cue occurrence and placement and specific rhetorical structures (RSsner and Stede, 1992; Scott and de Souza, 1990; Vander Linden and Martin, 1995).</Paragraph> <Paragraph position="3"> Other hypotheses about cue usage derive from work on discourse coherence and structure. Previous research (Hobbs, 1985; Grosz and Sidner, 1986; Schiffrin, 1987; Mann and Thompson, 1988; Elhadad and McKeown, 1990), which has been largely descriptive, suggests factors such as structural features of the discourse (e.g., level of embedding and segment complexity), intentional and informational relations in that structure, ordering of relata, and syntactic form of discourse constituents.</Paragraph> <Paragraph position="4"> Moser and Moore (1995; 1997) coded a corpus of naturally occurring tutorial explanations for the range of features identified in prior work. Because they were also interested in the contrast between occurrence and non-occurrence of cues, they exhaustively coded for all of the factors thought to contribute to cue usage in all of the text. From their study, Moscr and Moore identified several interesting correlations between particular features and specific aspects of cue usage, and were able to test specific hypotheses from the hterature that were based on constructed examples.</Paragraph> <Paragraph position="5"> In this paper, we focus on cue occurrence and placement, and present an empirical study of the hypotheses provided by previous research, which have never been systematically evaluated with naturally occurring data. Wc use a machine learning program, C4.5 (Quinlan, 1993), on the tagged corpus of Moser and Moore to induce decision trees. The number of coded features and their interactions makes the manual construction of rules that predict cue occurrence and placement an intractable task.</Paragraph> <Paragraph position="6"> Our results largely confirm the suggestions from the hterature, and clarify them by highhghting the most influential features for a particular task. Discourse structure, in terms of both segment structure and levels of embedding, affects cue occurrence the most; intentional relations also play an important role. For cue placement, the most important factors are syntactic structure and segment complexity.</Paragraph> <Paragraph position="7"> The paper is organized as follows. In Section 2 we discuss previous research in more detail. Section 3 provides an overview of Moser and Moore's coding scheme. In Section 4 we present our learning experiments, and in Section 5 we discuss our results and conclude.</Paragraph> </Section> class="xml-element"></Paper>