<?xml version="1.0" standalone="yes"?> <Paper uid="J00-2005"> <Title>Squibs and Discussions Pipelines and Size Constraints</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Some types of documents need to fit on a limited number of pages. For example, this article, because it is a squib, must fit on eight pages in the style (font, layout, etc.) specified by Computational Linguistics. However, in certain cases it is useful to include as much information as possible given the size limit; for example, I want to convey as much information as possible about my research in the allowed eight pages.</Paragraph> <Paragraph position="1"> Maximizing the amount of content subject to a size limit is also a problem for some natural language generation (NLG) systems. For example, the STOP system (Reiter, Robertson, and Osman 1999) produces personalized smoking-cessation leaflets that must fit on four A5 pages, in a certain style; but it is useful if the leaflets can convey as much information as possible given this size constraint.</Paragraph> <Paragraph position="2"> One problem with performing this optimization in an NLG system is that the size of a document is primarily determined by how much content it contains, that is by decisions made during the content determination process. However, an NLG system cannot accurately determine the size of a document until the document has been completely processed by the NLG system and (in some cases) by an external document presentation system, such as LaTeX or Microsoft Word. This is because the size of the document is highly dependent on its exact surface form. 
This is a phenomenon that may be familiar to readers who have tried to revise a paper to fit a page-limit constraint by making small changes to wording or even orthography.</Paragraph> <Paragraph position="3"> In consequence, it may be difficult to satisfy the size constraint while &quot;filling up&quot; the allowed pages in a pipelined NLG system that performs content determination in an early pipeline module, before the surface form of the document is known. This is especially true if each pipeline module is restricted to sending a single solution to the next pipeline module, instead of multiple possible solutions.</Paragraph> <Paragraph position="4"> In this paper I give a brief summary of the pipeline debate and of STOP, present my experimental results, and then discuss the implications of this work.</Paragraph> <Paragraph position="5"> * Department of Computing Science, Aberdeen AB24 3UE, UK. E-mail: ereiter@csd.abdn.ac.uk. © 2000 Association for Computational Linguistics. Computational Linguistics, Volume 26, Number 2. 2. Pipelines in NLG. For the past 20 years, the NLG community has generally agreed that modularizing NLG systems is sensible. This has become even more true in recent years, because of a growing trend to incorporate existing modules (especially realization systems such as FUF/SURGE [Elhadad and Robin 1997]) into new systems. While different systems use different numbers of modules, all recent systems that I am aware of are divided into modules.</Paragraph> <Paragraph position="6"> This leads to the question of how modules should interact. In particular, is it acceptable to arrange modules in a simple pipeline, where a later module cannot affect an earlier module? Or is it necessary to allow revision or feedback, where a later module can request that an earlier module modify its results? 
If a pipeline is used, should modules pass a single solution down the line, or should they pass multiple solutions and let subsequent modules choose between these? Many authors have argued that pipelines cannot optimally handle certain linguistic phenomena. For example, Danlos and Namer (1988) point out that in French, whether a pronoun unambiguously refers to an entity depends on word ordering.</Paragraph> <Paragraph position="7"> This is because the pronouns le or la (which convey gender information) are abbreviated to l' (which does not contain gender information) when the word following the pronoun starts with a vowel. But in a pipelined NLG system, pronominalization decisions are typically made earlier than word-ordering decisions; for example, in the three-stage pipelined architecture presented by Reiter and Dale (2000), pronominalization decisions are made in the second stage (microplanning), but word ordering is chosen during the third stage (realization). This means that the microplanner will not be able to make optimal pronominalization decisions in cases where le or la are unambiguous, but l' is not, since it does not know word order and hence whether the pronoun will be abbreviated.</Paragraph> <Paragraph position="8"> Many other such cases are described in Danlos's book (Danlos 1987). The common theme behind many of these examples is that pipelines have difficulties satisfying linguistic constraints (such as unambiguous reference) or performing linguistic optimizations (such as using pronouns instead of longer referring expressions whenever possible) in cases where the constraints or optimizations depend on decisions made in multiple modules. 
This is largely due to the fact that pipelined systems cannot perform general search over a decision space that includes decisions made in more than one module.</Paragraph> <Paragraph position="9"> Despite these arguments, most applied NLG systems use a pipelined architecture; indeed, a pipeline was used in every one of the systems surveyed by Reiter (1994) and Paiva (1998). This may be because pipelines have many engineering advantages, and in practice the sort of problems pointed out by Danlos and other pipeline critics do not seem to be a major problem in current applied NLG systems (Mittal et al. 1998).</Paragraph> </Section> </Paper>