<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1023">
  <Title>Empirically Estimating Order Constraints for Content Planning in Generation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In a language generation system, a content planner typically uses one or more &quot;plans&quot; to represent the content to be included in the output and the ordering between content elements.</Paragraph>
    <Paragraph position="1"> Some researchers rely on generic planners (e.g., (Dale, 1988)) for this task, while others use plans based on Rhetorical Structure Theory (RST) (e.g., (Bouayad-Agha et al., 2000; Moore and Paris, 1993; Hovy, 1993)) or schemas (e.g., (McKeown, 1985; McKeown et al., 1997)). In all cases, the constraints on the application of rules (e.g., plan operators), which determine content and order, are usually hand-crafted, sometimes through manual analysis of target text.</Paragraph>
    <Paragraph position="2"> In this paper, we present a method for learning the basic patterns contained within a plan and the ordering among them. As training data, we use semantically tagged transcripts of domain experts performing the task our system is designed to mimic: an oral briefing on the status of a patient who has undergone coronary bypass surgery. Because our target output is spoken language, there is some level of variability between individual transcripts. It is difficult for a human to see patterns in the data, and thus supervised learning based on hand-tagged training sets cannot be applied. We need a learning algorithm that can discover ordering patterns in apparently unordered input.</Paragraph>
    <Paragraph position="3"> We based our unsupervised learning algorithm on techniques used in computational genomics (Durbin et al., 1998), where patterns representing meaningful biological features are discovered from large amounts of seemingly unorganized genetic sequences. In our application, a transcript is the equivalent of a sequence, and we search for patterns that occur repeatedly across multiple sequences. We can think of these patterns as the basic elements of a plan, representing small clusters of semantic units that are similar in size, for example, to the nucleus-satellite pairs of RST.1 By learning ordering constraints over these elements, we produce a plan that can be expressed as a constraint-satisfaction problem. [Footnote 1: Note, however, that we do not learn or represent intention.]</Paragraph>
    <Paragraph position="4"> [Figure: age, gender, pmh, pmh, pmh, pmh, med-preop, med-preop, med-preop, drip-preop, med-preop, ekg-preop, echo-preop, hct-preop, procedure, . . . Caption (partially lost): ... the transcript shown in Figure 1.]</Paragraph>
    <Paragraph position="5"> In this paper, we focus on learning the plan elements and the ordering constraints between them. Our system uses combinatorial pattern matching (Rigoutsos and Floratos, 1998) combined with clustering to learn plan elements. Subsequently, it applies counting procedures to learn ordering constraints among these elements.</Paragraph>
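The two learning steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: plain n-gram support counting stands in for the combinatorial pattern matching of Rigoutsos and Floratos, clustering is omitted, and the function names, the toy sequences, and the thresholds are all assumptions made for illustration. Only the tag names are taken from the figure.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(sequences, length, min_support):
    """Return contiguous tag patterns of the given length that occur in at
    least min_support different sequences (a simple n-gram stand-in)."""
    support = Counter()
    for seq in sequences:
        # Collect each pattern once per sequence, so support counts sequences.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {p: n for p, n in support.items() if n >= min_support}


def ordering_constraints(sequences, min_ratio):
    """Return pairs (a, b) such that, among sequences containing both labels,
    the first a precedes the first b in at least min_ratio of them."""
    before = Counter()
    for seq in sequences:
        first = {}
        for pos, label in enumerate(seq):
            first.setdefault(label, pos)  # keep the first occurrence only
        for a, b in combinations(sorted(first), 2):
            if first[a] > first[b]:
                a, b = b, a  # make a the label that appears earlier
            before[(a, b)] += 1
    constraints = []
    for (a, b), n in before.items():
        total = n + before.get((b, a), 0)
        if n / total >= min_ratio:
            constraints.append((a, b))
    return sorted(constraints)


# Hypothetical toy transcripts, invented for this sketch.
sequences = [
    ["age", "gender", "pmh", "med-preop", "procedure"],
    ["gender", "age", "pmh", "med-preop", "procedure"],
    ["age", "gender", "pmh", "procedure"],
]
patterns = frequent_patterns(sequences, length=2, min_support=2)
constraints = ordering_constraints(sequences, min_ratio=1.0)
```

With these toy inputs, age and gender appear in both orders, so no constraint is kept between them, while both are constrained to precede pmh. A lower min_ratio would tolerate some of the variability that spoken transcripts show.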
    <Paragraph position="6"> Our system produced a set of 24 schemata units, which we call &quot;plan elements&quot;2, and 29 ordering constraints between these basic plan elements. We compared these to the elements contained in the original hand-crafted plan, which was constructed based on hand-analysis of transcripts, input from domain experts, and experimental evaluation of the system (McKeown et al., 2000).</Paragraph>
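Ordering constraints between plan elements can be read as precedence relations, and any sequence that respects all of them is a solution to the resulting constraint-satisfaction problem. The sketch below is one possible reading, not the system described in this paper: it assumes the constraints are simple (before, after) pairs and uses a topological sort from the Python standard library (graphlib, Python 3.9 or later) to find one valid ordering. The function names and the example constraints are hypothetical.

```python
from graphlib import TopologicalSorter


def order_elements(elements, constraints):
    """Return one ordering of elements that satisfies every (before, after)
    pair; graphlib.CycleError is raised if the constraints conflict."""
    sorter = TopologicalSorter({e: set() for e in elements})
    for before, after in constraints:
        sorter.add(after, before)  # 'after' must come later than 'before'
    return list(sorter.static_order())


def satisfies(order, constraints):
    """Check that every (before, after) pair is respected by the ordering."""
    position = {e: i for i, e in enumerate(order)}
    return all(position[b] > position[a] for a, b in constraints)


# Hypothetical constraints over tags from the figure, invented for this sketch.
constraints = [("age", "pmh"), ("pmh", "procedure")]
order = order_elements(["procedure", "age", "pmh"], constraints)
```

When the constraints leave several orderings open, static_order returns just one of them, so a planner would need further criteria to choose among the valid sequences.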
    <Paragraph position="7"> The remainder of this article is organized as follows. First, the data used in our experiments is presented, and its overall structure and acquisition methodology are analyzed. In Section 3, our techniques are described, together with their grounding in computational genomics. The quantitative and qualitative evaluations are discussed in Section 4. Related work is presented in Section 5. Conclusions and future work are discussed in Section 6.</Paragraph>
  </Section>
</Paper>