<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1034">
  <Title>Integrating Discourse Markers into a Pipelined Natural Language Generation Architecture</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Preliminary Evaluation
</SectionTitle>
    <Paragraph position="0"> Evaluation of multi-paragraph text generation is exceedingly difficult, as empirically-driven methods are not sufficiently sophisticated, and subjective human evaluations that require multiple comparisons of large quantities of text are both difficult to control and time-consuming. Evaluating our approach is even more difficult in that interference between discourse markers and revision is not a highly frequent occurrence in multi-page text. For instance, in our corpora we found that these interference effects occurred 23% of the time for revised clauses and 56% of the time with discourse markers. In other words, almost one of every four clause revisions potentially forces a change in discourse marker lexicalizations, and one in every two discourse markers occurs near a clause revision boundary.</Paragraph>
    <Paragraph position="1"> However, the &quot;penalty&quot; associated with incorrectly selecting discourse markers is fairly high, leading to confusing sentences, although there is no cognitive science evidence that states exactly how high for a typical reader, despite recent work in this direction (Tree and Schrock, 1999). Furthermore, there is little agreement on exactly what constitutes a discourse marker, especially between the spoken and written dialogue communities (e.g., many members of the latter consider &quot;uh&quot; to be a discourse marker). We thus present an analysis of the frequencies of various features from three separate New York Times articles generated by the STORYBOOK system. We then describe the results of running our combined revision and discourse marker module with the discourse plans used to generate them.</Paragraph>
    <Paragraph position="2"> While three NYT articles do not constitute a substantial enough evaluation in ideal terms, the cost of evaluation in such a knowledge-intensive undertaking will continue to be prohibitive until large-scale automatic or semiautomatic techniques are developed.</Paragraph>
    <Paragraph position="3"> The left side of Table 1 presents an analysis of the frequencies of revisions and discourse markers as found in each of the three NYT articles. In addition, we have indicated the number of times that, in our judgment, revisions and discourse markers co-occurred (i.e., a discourse marker was present at the junction site of the clauses being aggregated).</Paragraph>
    <Paragraph position="4"> The right side of the table indicates the difference between the accuracy of two different versions of the system: separate signifies the initial configuration of the STORYBOOK system, where discourse marker insertion and revision were performed as separate processes, while integrated signifies that discourse markers were lexicalized during revision as described in this paper. The difference between these two numbers thus represents the number of times per article that the integrated clause aggregation and discourse marker module was able to improve the resulting text.</Paragraph>
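The per-article comparison described above can be sketched as a simple subtraction. This is only an illustrative sketch: the class name, field names, and all counts below are hypothetical placeholders, not the values reported in Table 1 and not code from the STORYBOOK system.

```python
from dataclasses import dataclass


@dataclass
class ArticleCounts:
    """Per-article counts; every value used below is a hypothetical placeholder."""
    name: str
    separate_correct: int    # sites handled correctly by the separate configuration
    integrated_correct: int  # sites handled correctly by the integrated configuration


def improvement(counts: ArticleCounts) -> int:
    """Number of sites where the integrated module improved on the separate one."""
    return counts.integrated_correct - counts.separate_correct


# Hypothetical numbers for illustration only; they are NOT the values in Table 1.
articles = [
    ArticleCounts("article_1", separate_correct=3, integrated_correct=5),
    ArticleCounts("article_2", separate_correct=4, integrated_correct=4),
    ArticleCounts("article_3", separate_correct=2, integrated_correct=6),
]

improvements = {a.name: improvement(a) for a in articles}
total_improvement = sum(improvements.values())

for name, gain in improvements.items():
    print(f"{name}: integrated improves on separate at {gain} site(s)")
print(f"total: {total_improvement}")
```

Under this reading, a difference of zero for an article means the integrated module left the text no better than the separate configuration at the co-occurrence sites counted.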
    <Paragraph position="5"> 7 Conclusion
Efficiency and software engineering considerations dictate that current large-scale NLG systems must be constructed in a pipeline fashion that minimizes backtracking and communication between modules.</Paragraph>
    <Paragraph position="6"> Yet discourse markers and revision both operate at the clause level, which creates the potential for interference effects if they are not resolved at the same location in a pipelined architecture. We have analyzed recent theoretical and applied work in both discourse markers and revision, showing that although no previous NLG system has yet integrated both components into a single architecture, an architecture for multi-paragraph generation that separated the two into distinct, unlinked modules would not be able to guarantee that the final text contained appropriately lexicalized discourse markers. Instead, our combined revision and discourse marker module in an implemented pipelined NLG system is able to correctly insert appropriate discourse markers despite changes made by the revision system. A corpus analysis indicated that significant interference effects between revision and discourse marker lexicalization are possible. Future work may show that similar interference effects are possible as successive modules are added to pipelined NLG systems.</Paragraph>
  </Section>
</Paper>