<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0319">
  <Title>Has a Consensus NL Generation Architecture Appeared, and is it Psycholinguistically Plausible?</Title>
  <Section position="3" start_page="0" end_page="163" type="metho">
    <SectionTitle>
2 The Systems Surveyed
</SectionTitle>
    <Paragraph position="0"> The analysis presented here is based on a survey of generation systems that: 1. Were written (or at least substantially extended) since the late 1980s. This excludes early systems such as Davey's PROTEUS or Jacobs's KING.</Paragraph>
    <Paragraph position="1"> 2. Are complete systems that start from an intention, a query, or some data that needs to be communicated, and produce actual sentences as output. This rules out systems that only implement part of the generation process, such as Cawsey's EDGE system (discourse planning) or my own FN (noun-phrase construction).</Paragraph>
    <Paragraph position="2"> 3. Were motivated, at least to some degree, by the desire to interface to application programs. This excludes systems that were primarily intended to be computational explorations of a particular linguistic theory, such as Patten's SLANG, or computational models of observed linguistic behavior, such as Hovy's PAULINE.</Paragraph>
    <Paragraph position="3"> 4. Are well enough known that I could easily obtain information about them.</Paragraph>
    <Paragraph position="4"> In short, the idea was to survey recent systems that looked at the entire generation problem, and that were motivated by applications and engineering considerations as well as linguistic theory. The systems examined were: FUF \[Elhadad, 1992\]: Developed at Columbia University and used in several projects there, including COMET and ADVISOR II; I will use the term 'FUF' in this paper to refer to both FUF itself and the various related systems at Columbia. Several other universities have also recently begun to use FUF in their research. FUF is based on Kay's functional unification formalism \[Kay, 1979\].</Paragraph>
    <Paragraph position="5"> IDAS \[Reiter et al., 1992\]: Developed at Edinburgh University, IDAS was a prototype online documentation system for users of complex machinery. From a theoretical perspective, IDAS's main objective was to show that a single representation and reasoning system can be used for both domain and linguistic knowledge \[Reiter and Mellish, 1992\]. JOYCE \[Rambow and Korelsky, 1992\]: Developed at Odyssey Research Associates, JOYCE is taken as a representative of several NL generation systems produced by ORA and CoGenTex, including GOSSIP, FOG, and LFS. These systems are all aimed at commercial or government applications (in JOYCE's case, producing summaries of software designs), and are all based on Mel'čuk's Meaning-Text theory \[Mel'čuk, 1988\].</Paragraph>
  </Section>
  <Section position="4" start_page="163" end_page="163" type="metho">
    <SectionTitle>
PENMAN \[Penman Natural
</SectionTitle>
    <Paragraph position="0"> Language Group, 1989\]: Under development at ISI since the early 1980s, PENMAN has been used in several demonstration systems. As usual, I will use 'PENMAN' to refer to both PENMAN itself and the systems that were built around it. PENMAN's theoretical basis is systemic linguistics \[Halliday, 1985\] and rhetorical-structure theory.</Paragraph>
    <Paragraph position="1"> SPOKESMAN \[Meteer, 1989\]: SPOKESMAN was developed at BBN for various applications, and has some of the same design goals as McDonald's MUMBLE system \[McDonald, 1983\], including in particular the desire to build a system that at least in some respects is psycholinguistically plausible. SPOKESMAN uses Tree-Adjoining Grammars \[Joshi, 1987\] for syntactic processing. Footnote 1: The selection rules are of course not completely well defined, which means there was inevitably some arbitrariness when I used them to select particular systems to include in the survey. I encourage any reader who believes that I have unfairly omitted a system to contact me, so that this system can be included in future versions of the survey.</Paragraph>
    <Paragraph position="2"> All of the examined systems produce English, and they are also mostly aimed at producing technical texts (instead of, say, novels or newspaper articles); it would be interesting to examine systems aimed at other languages or other types of applications, and see if this caused any architectural differences.</Paragraph>
  </Section>
  <Section position="5" start_page="163" end_page="164" type="metho">
    <SectionTitle>
3 An Overview of the Consensus
Architecture
</SectionTitle>
    <Paragraph position="0"> As can be seen, the chosen systems have widely different theoretical bases. It is therefore quite interesting that they all seem to have ended up with broadly similar architectures, in that they break up the generation process into a similar set of modules, and they all use a pipeline architecture to connect the modules; i.e., the modules are linearly ordered, and information flows from each module to its successor in the pipeline, with no feedback from later modules to earlier modules. The actual modules possessed by the systems (discussed in more detail in Section 4, as is the pipeline architecture) are: Content Determination: This maps the initial input of the generation system (e.g., a query to be answered, or an intention to be satisfied) onto a semantic form, possibly annotated with rhetorical (e.g., RST) relations.</Paragraph>
    <Paragraph position="1"> Sentence Planning: Many names have been used for this process; here I use one suggested by Rambow and Korelsky \[1992\]. The basic goal is to map conceptual structures onto linguistic ones: this includes generating referring expressions, choosing content words and (abstract) grammatical relationships, and grouping information into clauses and sentences.</Paragraph>
    <Paragraph position="2"> Surface Generation: I use this term in a fairly narrow sense here, to mean a module that takes as input an abstract specification of information to be communicated by syntax and function words, and produces as output a surface form that communicates this information (e.g., maps :speechact imperative into an English sentence that lacks a surface subject). All of the examined systems had separate sentence-planning and surface-generation modules, and the various intermediate forms used to pass information between these modules conveyed similar kinds of information.</Paragraph>
    <Paragraph position="3"> Morphology: Most of the systems have a fairly simple morphological component, presumably since English morphology is quite simple.</Paragraph>
    <Section position="1" start_page="164" end_page="164" type="sub_section">
      <SectionTitle>
7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994
</SectionTitle>
      <Paragraph position="0"> Formatting: IDAS, JOYCE, and PENMAN also contain mechanisms for formatting (in the LaTeX sense) their output, and/or adding hypertext annotations to enable users to click on portions of the generated text.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="164" end_page="167" type="metho">
    <SectionTitle>
4 A More Detailed Examination of the Architecture
</SectionTitle>
    <Paragraph position="0"> This section describes the consensus architecture in more detail, with particular emphasis on some of the design decisions embodied in it that more theoretically motivated researchers have disagreed with. It furthermore examines the plausibility of these decisions from a psycholinguistic perspective, and argues that in many respects they agree with what is known about how humans generate text.</Paragraph>
    <Section position="1" start_page="164" end_page="165" type="sub_section">
      <SectionTitle>
4.1 Modularized Pipeline Architecture
</SectionTitle>
      <Paragraph position="0"> The consensus architecture divides the generation process into multiple modules, with information flowing in a 'pipeline' fashion from one module to the next. By pipeline, I mean that the modules are arranged in a linear order, and each module receives information only from its predecessor (and the various linguistic and domain knowledge bases), and sends information only to its successor. Information does not flow 'backwards' from a module to its predecessor, and global 'blackboards' that all modules can access and modify are not used. I do not mean by 'pipeline' that generation must be incremental in the sense that, say, syntactic processing of the first sentence is done at the same time as semantic processing of the second; I believe most of the systems examined could in fact do this, but they have not bothered to do so (probably because it would not be of much benefit to the application programs of interest).</Paragraph>
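The one-way flow described above can be sketched as a simple function composition. The module names, data structures, and example output below are illustrative inventions, not taken from any of the surveyed systems; the point is only that each stage consumes its predecessor's output and nothing else.

```python
# Illustrative sketch of the consensus one-way pipeline: each module
# receives data only from its predecessor and passes results forward.
# All module names and intermediate representations are hypothetical.

def content_determination(goal):
    # Map a communicative goal onto a semantic form (here, a list of
    # propositions standing in for a semantic net).
    return [("engine", "part-of", "car")]

def sentence_planning(semantic_form):
    # Map conceptual structures onto abstract linguistic ones
    # (content words and grammatical relations).
    return [{"head": "engine", "possessor": "car"}]

def surface_generation(deep_syntax):
    # Express grammatical relations via word order and function words.
    return ["the car 's engine"]

def morphology(sentences):
    # Simple morphological post-processing (attach the clitic 's).
    return [s.replace(" 's", "'s") for s in sentences]

def generate(goal):
    # One-way flow: no feedback from later modules to earlier ones,
    # and no shared blackboard.
    stage = goal
    for module in (content_determination, sentence_planning,
                   surface_generation, morphology):
        stage = module(stage)
    return stage

print(generate("describe-part"))  # ["the car's engine"]
```

Because each module sees only its predecessor's output, any one module can be replaced or debugged in isolation, which is exactly the engineering advantage the section goes on to discuss.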
      <Paragraph position="1"> 4.1.1 Design decision: avoid integrated architecture Many NL generation researchers have argued against dividing the generation process into modules; perhaps the best-known are Appelt \[1985\] and Danlos \[1984\]. Others, such as Rubinoff \[1992\], have accepted modules but have argued that the architecture must allow feedback between later modules and earlier modules, which argues against the one-way information flow of the pipeline architecture.</Paragraph>
      <Paragraph position="2"> The argument against pipelines and modules is almost always some variant of 'there are linguistic phenomena that can only be properly handled by looking at constraints from different levels (intentional, semantic, syntactic, morphological), and this is difficult to do in a pipeline system'. To take one fairly random example, Danlos and Namer \[1988\] have pointed out that since the French masculine and feminine pronouns le and la are abbreviated to l' before a word that starts with a vowel, and since in some cases le and la may be unambiguous references while l' is not, the referring expression system must have some knowledge of surface word order and selected content and function words before it can decide whether a pronoun is acceptable; this will not be possible if referring expressions are chosen before syntactic structures are built, as happens in the consensus architecture.</Paragraph>
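The Danlos and Namer observation can be made concrete with a small sketch. The functions and the treatment of vowels below are a deliberate simplification for illustration (French elision rules, including h-muet, are more subtle than this); the point is that the ambiguity check depends on the following surface word, which a pipeline fixes only after referring expressions are chosen.

```python
# Sketch of the Danlos & Namer point: 'le' and 'la' elide to l' before
# a vowel-initial word, so the masculine/feminine contrast (which may
# disambiguate the referent) is lost. Deciding whether a pronoun stays
# unambiguous therefore needs knowledge of surface word order.
# This is an illustrative simplification, not a full elision rule.

VOWELS = set("aeiouhAEIOUH")  # h-muet treated uniformly for simplicity

def realize_pronoun(pronoun, next_word):
    # Apply elision before a vowel-initial following word.
    if next_word and next_word[0] in VOWELS:
        return "l'"
    return pronoun

def pronoun_stays_unambiguous(pronoun, next_word):
    # After elision the gender contrast disappears, so the pronoun may
    # no longer pick out a unique referent.
    return realize_pronoun(pronoun, next_word) != "l'"

print(realize_pronoun("la", "aime"))            # l'
print(pronoun_stays_unambiguous("la", "voit"))  # True
```

In the consensus architecture the referring-expression generator runs before `next_word` is known, which is precisely the difficulty the paragraph above describes.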
      <Paragraph position="3"> There is undoubtedly some truth to these arguments, but the applications builder also has to consider the engineering reality that the sorts of systems proposed by Appelt, Danlos, and Namer are extremely difficult to build from an engineering perspective. The engineering argument for modularization is particularly strong; Marr has put this very well in \[Marr, 1976, page 485\]: Any large computation should be split up and implemented as a collection of small subparts that are as nearly independent of one another as the overall task allows. If a process is not designed in this way a small change in one place will have consequences in many other places. This means that the process as a whole becomes extremely difficult to debug or improve, whether by a human designer or in the course of natural evolution, because a small change to improve one part has to be accompanied by many simultaneous compensatory changes elsewhere.</Paragraph>
      <Paragraph position="4"> Marr argues that a modularized structure makes sense both for human engineers and for the evolutionary process that produced the human brain. The evidence is indeed strong that the human brain is highly modularized. This evidence comes from many sources (e.g., cognitive experiments and PET scans of brain activity), but I think perhaps the most convincing evidence is from studies of humans with brain damage. Such people tend to lose specific abilities, not suffer overall degradation that applies equally to all abilities. Ellis and Young \[1988\] provide an excellent summary of such work, and list patients who, for example * can produce syntactically correct utterances but can not organize utterances into coherent wholes, i.e., can perform surface generation but not content determination.</Paragraph>
      <Paragraph position="5"> * can generate word streams that tell a narrative but are not organized into sentences, i.e., can perform content determination but not surface generation. * can produce coherent texts organized in grammatical structures, but have a severely restricted vocabulary; i.e., have impaired lexical choice (these patients still have conceptual knowledge, they just have problems lexicalizing it).</Paragraph>
      <Paragraph position="6"> The main engineering argument for arranging modules into a pipeline instead of a more complex structure</Paragraph>
    </Section>
    <Section position="2" start_page="165" end_page="165" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> is again simplicity and ease of debugging. In a one-way pipeline of N modules there are only N-1 interfaces between modules, while a pipeline with 'two-way' information flow has 2(N-1) interfaces, and a system that fully connects each module with every other module will have N(N-1) interfaces. A system that has a two-way interface between every possible pair of modules will undoubtedly be able to handle many linguistic phenomena in a more powerful, elegant, principled, etc., manner than a system that arranges modules in a simple one-way pipeline; such a system will also, however, be much more difficult to build and (especially) debug.</Paragraph>
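The interface counts above follow directly from counting directed connections between N modules; a minimal sketch makes the growth rates explicit (the function names are illustrative):

```python
# Directed interface counts for N modules under the three connection
# schemes discussed in the text.

def pipeline_interfaces(n):
    return n - 1              # one-way pipeline: one link per joint

def two_way_pipeline_interfaces(n):
    return 2 * (n - 1)        # pipeline with feedback at every joint

def fully_connected_interfaces(n):
    return n * (n - 1)        # every module talks to every other module

for n in (4, 5, 6):
    print(n, pipeline_interfaces(n),
          two_way_pipeline_interfaces(n),
          fully_connected_interfaces(n))
# For n=5 modules: 4 one-way, 8 two-way, 20 fully connected interfaces.
```

The linear versus quadratic growth is the core of the debugging argument: each extra interface is another boundary whose behavior must be specified and tested.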
      <Paragraph position="1"> It is easy to argue that a one-way pipeline is worse at handling some linguistic phenomena than a richly connected architecture, but this is not the end of the story for the system-building engineer; he or she has to balance the cost of the pipeline being inefficient and/or inelegant at handling some phenomena against the benefit of the pipeline being a much easier structure to build and debug. We have insufficient engineering data at present to make any well-substantiated claims about whether the one-way pipeline has the optimal cost/benefit tradeoff or not (and in any case this will probably depend somewhat on the circumstances of each application \[Reiter and Mellish, 1993\]), but the circumstantial evidence on this question is striking; despite the fact that so many theoretical papers have argued against pipelines and very few (if any) have argued for pipelines, every one of the applications-oriented systems examined in this survey chose to use the one-way pipeline architecture.</Paragraph>
      <Paragraph position="2"> In other words, an applications systems builder can not look at particular linguistic phenomena in isolation; he or she must weigh the benefits of 'properly' handling these phenomena against the cost of implementing the proposed architecture. In the French pronoun case described by Danlos and Namer, for example, the applications builder might argue that in the great majority of cases no harm will in fact be done if the referring-expression generator simply ignores the possibility that pronouns may be abbreviated to l', especially given humans' ability to use context to disambiguate references; and if a situation does arise where it is absolutely essential that the human reader be able to correctly disambiguate a reference, then perhaps pronouns should not be used in any case. Given this, and the very high engineering cost of building an integrated architecture of the sort proposed by Danlos and Namer, is implementing such an architecture truly the most effective way of using scarce engineering resources? Psycholinguistic research on self-monitoring and self-repair (summarized in \[Levelt, 1989, pages 458-499\]) suggests that there is some feedback in the human language generation system, so the human language processor is probably more complex than a simple one-way pipeline; but it may not be much more complex. To the best of my knowledge, most of the observed self-repair phenomena could be explained by an architecture that added a few feedback loops from later stages of the pipeline back to the initial planner; this would only slightly add to the number of inter-module interfaces (perhaps N+1 instead of N-1, say), and hence would have a much lower engineering cost than implementing the fully connected 'every module communicates with every other module' architecture.</Paragraph>
      <Paragraph position="3"> Whether the human language engine is organized as a 'pipeline plus a few feedback loops' or an 'every module talks to every other module' architecture is unknown at this point; hopefully new psycholinguistic experiments will shed more light on this issue. I think it would be very interesting, for example, to test human French speakers on situations of the sort described by Danlos and Namer, and see what they actually did in such contexts; I do not believe that such an experiment has (to date) been performed.</Paragraph>
    </Section>
    <Section position="3" start_page="165" end_page="166" type="sub_section">
      <SectionTitle>
4.2 Content Determination
</SectionTitle>
      <Paragraph position="0"> Content determination takes the initial input to the generation system, which may be, for example, a query to be answered or an intention to be satisfied, and produces from it a 'semantic form', 'conceptual representation', or 'list of propositions', i.e., a specification of the meaning content of the output text. I will in this paper use the term semantic representation for this meaning specification. Roughly speaking, the semantic representations used by all of the examined systems can be characterized as some kind of 'semantic net' (using the term in its broadest sense, as in \[Sowa, 1991\]) where the primitive elements in the net are conceptual instead of linguistic (e.g., domain KB concepts instead of English words). In some cases the semantic nets also include discourse and rhetorical relations between portions of the net; subsequent portions of the generator use these to generate discourse connectives (e.g., However), control formatting (e.g., the use of bulletized lists), etc.</Paragraph>
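A minimal stand-in for this kind of semantic representation can be sketched as follows; the class and relation names are hypothetical and not drawn from any surveyed system:

```python
# A minimal stand-in for the semantic representation described above:
# a net whose primitive elements are domain concepts (not words),
# optionally linked by rhetorical relations. All names are
# hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class Proposition:
    predicate: str   # domain KB relation, e.g. "part-of"
    args: tuple      # domain KB concepts, not English words

@dataclass
class SemanticNet:
    propositions: list
    # Rhetorical links between propositions, as (relation, i, j) triples.
    rhetorical: list = field(default_factory=list)

net = SemanticNet(
    propositions=[
        Proposition("broken", ("engine-1",)),
        Proposition("replace", ("user-1", "spark-plug-3")),
    ],
    # An RST-style CAUSE link; later modules might realize it as a
    # discourse connective such as "so".
    rhetorical=[("cause", 0, 1)],
)
print(len(net.propositions), net.rhetorical[0][0])  # 2 cause
```

Note that the primitives (`engine-1`, `part-of`-style predicates) are knowledge-base entities; mapping them onto words is deliberately left to later modules in the pipeline.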
      <Paragraph position="1"> The systems examined use quite different content-determination mechanisms (i.e., there was no consensus); schemas \[McKeown, 1985\] were the most popular approach.</Paragraph>
      <Paragraph position="2"> 4.2.1 Design decision: integrated content determination and rhetorical planning Content determination in the systems examined basically performs two functions: Deep content determination: Determine what information should be communicated to the hearer.</Paragraph>
      <Paragraph position="3"> Rhetorical planning: Organize this information in a rhetorically coherent manner.</Paragraph>
      <Paragraph position="4"> Hovy \[1988\] has proposed an architecture where these tasks are performed separately (in particular, the</Paragraph>
    </Section>
    <Section position="4" start_page="166" end_page="166" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> application program performs deep content determination, while the generation system performs rhetorical planning). Among the systems examined, however, Hovy is unique in taking this approach; the builders of the other systems (including Moore and Paris \[1989\], who also worked with PENMAN) apparently believe that these two processes are so closely related that they should be performed simultaneously.</Paragraph>
      <Paragraph position="1"> I am not aware of any psychological data that directly address this issue. However, Hovy's architecture requires the language-producing agent to completely determine the content of a paragraph before he/she/it can begin to utter it (since the rhetorical planner determines what the first sentence is, and it is not called until deep content determination is completed), and intuitively it seems implausible to me that human speakers do this; it also goes against incremental theories of human speech production \[Levelt, 1989, pages 24-27\].</Paragraph>
    </Section>
    <Section position="5" start_page="166" end_page="167" type="sub_section">
      <SectionTitle>
4.3 Sentence Planning
</SectionTitle>
      <Paragraph position="0"> The sentence planner converts the semantic representation, which is specified in terms of domain entities, into an abstract linguistic representation that specifies content words and grammatical relationships. I will use Mel'čuk's term deep syntactic form for this representation.</Paragraph>
      <Paragraph position="1"> All of the systems analyzed possess a deep syntactic representation; none attempt to go from semantics to surface form in a single step. IDAS and PENMAN use variants of the same deep syntactic language, SPL \[Kasper, 1989\]. FUF and JOYCE use deep syntactic languages that are based (respectively) on functional unification and meaning-text theory, but these convey much the same information as SPL. SPOKESMAN uses the realization specification language of MUMBLE \[McDonald, 1983\] as its deep syntactic representation; I have found it difficult to compare this language to the others, but McDonald (personal communication) agrees that it conveys essentially the same information as SPL.</Paragraph>
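The kind of information such a deep syntactic form conveys can be illustrated with a small sketch. The attribute names below are invented for illustration and are expressed as a Python structure; they are not actual SPL or PENMAN input.

```python
# An illustration of what a deep syntactic form conveys: content words,
# grammatical relations, and speech-act/tense features, but no surface
# word order or function words yet. Attribute names are hypothetical;
# this is not genuine SPL.
deep_syntactic_form = {
    "process": "scold",                  # content word already chosen
    "speechact": "assertion",
    "tense": "past",
    "actor": {"head": "John"},
    "actee": {"head": "John",
              "coref": "actor"},         # the surface generator must
                                         # realize this as a reflexive
}
print(deep_syntactic_form["process"], deep_syntactic_form["tense"])
```

Everything surface-specific (auxiliaries, word order, the reflexive pronoun itself) is deliberately absent; supplying it is the job of the surface generator described in Section 4.4.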
      <Paragraph position="2"> Unfortunately, while all of the systems possessed a module which converted semantic representations into deep syntactic ones, each system used a different name for this module. In FUF it is the 'lexical chooser', in IDAS it is the 'text planner', in JOYCE it is the 'sentence planner', in SPOKESMAN it is the 'text structurer', and in PENMAN it doesn't seem to have a name at all, e.g., Hovy \[1988\] simply refers to 'pre-generation text-planning tasks'. I use the JOYCE term here because I think it is the least ambiguous.</Paragraph>
      <Paragraph position="3"> The specific tasks performed by the sentence planner include:  1. Mapping domain concepts and relations into content words and grammatical relations.</Paragraph>
      <Paragraph position="4"> 2. Generating referring expressions for individual domain entities.</Paragraph>
      <Paragraph position="5"> 3. Grouping propositions into clauses and sentences.  Relatively little is said in the papers about clause grouping and referring-expression generation, but more information is available on the first task, mapping domain entities onto linguistic entities. All the examined systems except perhaps PENMAN use a variant of what I have elsewhere called the 'structure-mapping' approach \[Reiter, 1991\] (footnote 2); I do not know what approach PENMAN uses (the papers are not clear on this). Structure-mapping is based on a dictionary that lists the semantic-net equivalents of linguistic resources \[Meteer, 1991\] such as content words and grammatical relationships. This dictionary might, for example, indicate that the English word sister is equivalent (in the domain knowledge-base of interest) to the structure Sibling with attribute Sex:Female, and that the domain relation Part-of can be expressed with the grammatical possessive, e.g., the car's engine. Given this dictionary, the structure-mapping algorithm iteratively replaces semantic structures by linguistic ones, until the entire semantic net has been recoded into a linguistic structure. There may be several ways of recoding a semantic representation into a linguistic one, which means structure-mapping systems have a choice between using the first acceptable reduction they find, or doing a search for a reduction that maximizes some optimality criterion (e.g., fewest number of words).</Paragraph>
      <Paragraph position="6"> The papers I read were not very clear on this issue, but I believe that while most of the systems surveyed use the first acceptable reduction found, FUF in some cases searches for an optimal reduction.</Paragraph>
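The structure-mapping idea, including the 'first acceptable reduction' strategy, can be sketched in a few lines. The dictionary contents and the flat attribute-value encoding of the net are illustrative simplifications (real systems map over graph structures, not single nodes):

```python
# Sketch of the 'structure-mapping' approach: a dictionary lists the
# semantic-net equivalents of linguistic resources, and the algorithm
# rewrites semantic structure into linguistic structure, taking the
# first acceptable reduction it finds. Dictionary contents and the
# node encoding are illustrative.

# Each entry: semantic pattern -> linguistic resource.
DICTIONARY = [
    ({"concept": "Sibling", "Sex": "Female"}, ("word", "sister")),
    ({"concept": "Sibling"}, ("word", "sibling")),
    ({"relation": "Part-of"}, ("grammar", "possessive")),
]

def matches(pattern, node):
    # A pattern matches if all its attribute-value pairs hold of the node.
    return all(node.get(k) == v for k, v in pattern.items())

def map_node(node):
    # First acceptable reduction: scan the dictionary in order and
    # return the first matching linguistic resource.
    for pattern, resource in DICTIONARY:
        if matches(pattern, node):
            return resource
    raise LookupError(f"no linguistic resource for {node}")

print(map_node({"concept": "Sibling", "Sex": "Female"}))  # ('word', 'sister')
print(map_node({"relation": "Part-of"}))  # ('grammar', 'possessive')
```

A system searching for an optimal reduction, as FUF apparently sometimes does, would instead collect all matching entries and score the resulting alternatives (e.g., by word count) rather than stopping at the first match.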
      <Paragraph position="7"> 4.3.1 Design decision: separation of lexical choice from surface realization The consensus architecture clearly separates lexical choice of content words (done during sentence planning) from syntactic processing (performed during surface generation). In other words, it does not use an integrated 'lexicogrammar', which systemic theorists in particular (e.g., \[Matthiessen, 1991\]) have argued for, and which is implicit in some unification-based approaches, such as the semantic head-driven algorithm \[Shieber et al., 1990\].</Paragraph>
      <Paragraph position="8"> Despite these theoretical arguments, none of the systems examined used an integrated lexicogrammar, including unification-based FUF and systemic-based PENMAN (footnote 3). Footnote 2: Even though I have previously argued against structure-mapping because it does not do a good job of handling lexical preferences \[Reiter, 1991\], I nevertheless ended up using this technique when I moved from my Ph.D. research to the more applications-oriented IDAS project. Perhaps this is another example of engineering considerations overriding theoretical arguments.</Paragraph>
      <Paragraph position="9"> Footnote 3: The PENMAN papers do not explicitly say where lexical choice is performed. However, all examples of PENMAN SPL input that I have seen have essentially had content</Paragraph>
    </Section>
    <Section position="6" start_page="167" end_page="167" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> In contrast, earlier unification-based systems, such as the tactical component of McKeown's TEXT system \[McKeown, 1985\], did integrate lexical and syntactic processing in a single 'tactical generator'; also, systemic systems that have been less driven by application needs than PENMAN, such as GENESYS \[Fawcett and Tucker, 1990\], have used integrated lexicogrammars. There is psychological evidence that at least some lexical processing is separated from syntactic processing, e.g., the patient mentioned in Section 4.1.1 who was able to perform content determination and syntactic generation but had a very restricted speaking vocabulary. I think it is also very suggestive that humans have different learning patterns for content and function words; the former are 'open-class' and easily learned, while the latter are 'closed-class' and people tend to stick to the ones they learned as children. There is less evidence on the location of lexical choice in the psycholinguistic pipeline, and on whether it is performed in one stage or distributed among several stages.</Paragraph>
    </Section>
    <Section position="7" start_page="167" end_page="167" type="sub_section">
      <SectionTitle>
4.4 Surface Generation
</SectionTitle>
      <Paragraph position="0"> Surface generation has been used to mean many different things in the literature. I use it here to refer to the portion of the generation system that knows how grammatical relationships are actually expressed in English (or whatever the target language is). For example, it is the surface generator that knows what function words and word order relationships are used in English for imperative, interrogative, and negated sentences; it is the surface generator that knows which auxiliaries are required for the various English tenses; and it is the surface generator that knows when pronominalization is syntactically required (John scolded himself, not John scolded John).</Paragraph>
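Two of the examples just given (subjectless imperatives and syntactically required reflexives) can be sketched as a toy surface generator. The feature names and the naive "-ed" past-tense rule are illustrative simplifications, not any surveyed system's grammar:

```python
# Toy illustration of the narrow sense of 'surface generation' used
# above: given an abstract specification, choose function words and
# word order. Feature names and the crude "-ed" morphology are
# hypothetical simplifications.

def surface_generate(spec):
    verb = spec["verb"]
    obj = spec["object"]
    if spec.get("speechact") == "imperative":
        # English imperatives lack a surface subject.
        return f"{verb} {obj}!"
    subj = spec["subject"]
    if subj == obj:
        # Pronominalization is syntactically required here:
        # 'John scolded himself', not 'John scolded John'.
        obj = "himself"
    return f"{subj} {verb}ed {obj}."

print(surface_generate({"speechact": "imperative",
                        "verb": "check", "object": "the oil"}))
# check the oil!
print(surface_generate({"subject": "John", "verb": "scold",
                        "object": "John"}))
# John scolded himself.
```

Note that the input already names the content words; consistent with the consensus architecture, this module only decides how grammatical relationships are expressed, not what to say.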
      <Paragraph position="1"> 4.4.1 Design decision: top-down algorithm with (almost?) no backtracking The grammars and grammar representations used by the systems examined are quite different, but all systems process the grammars with a top-down algorithm that uses minimal, if any, backtracking. None of the systems use the semantic head-driven generation algorithm \[Shieber et al., 1990\], although this is probably the single best-known algorithm for surface generation; Elhadad \[1992, chapter 4\] claims that such an algorithm is only necessary for systems that attempt to simultaneously perform both lexical choice and surface generation, which none of the examined systems do. Perhaps more interestingly, four of the five systems do not allow backtracking, and the fifth, FUF, allows backtracking but does not seem to use it much (if at all) during surface generation (backtracking is used in FUF during sentence planning). Footnote 3 (continued): words already specified, which suggests that lexical choice is performed before syntactic processing in PENMAN. This is</Paragraph>
      <Paragraph position="2"> interesting, since backtracking is usually regarded as an essential component of unification-based generation approaches; it is certainly used in the semantic-head-driven algorithm, and in the TEXT generator \[McKeown, 1985\].</Paragraph>
      <Paragraph position="3"> From a psycholinguistic perspective, many people have argued that human language production is incremental (see the summary in \[Levelt, 1989, pages 24-27\]), which means that of necessity it cannot include much backtracking. The garden-path phenomenon shows that there are limits to how much syntactic backtracking people perform during language understanding. This evidence is of course suggestive rather than definitive; it seems likely that there are limitations on how much (if any) backtracking humans will perform during syntactic processing (see also the arguments in \[McDonald, 1983\]), but there is no hard proof of this (as far as I am aware).</Paragraph>
    </Section>
    <Section position="8" start_page="167" end_page="167" type="sub_section">
      <SectionTitle>
4.5 Morphology and Formatting
</SectionTitle>
      <Paragraph position="0"> These modules will not be further examined here, mainly because little information is given in the papers on the details of how morphology and formatting are implemented.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="167" end_page="168" type="metho">
    <SectionTitle>
5 A Controversial (?) View
</SectionTitle>
    <Paragraph position="0"> I would like to conclude with a perhaps controversial personal opinion. There have been many cases where NL generation researchers (including myself) have claimed that a certain linguistic phenomenon is best handled by a certain architecture. Even if this is true, however, if it turns out that adopting this architecture will substantially complicate the design of the overall generation system, and that the most common cases of the phenomenon of interest can be adequately handled by adding a few heuristics to the appropriate stage of a simpler architecture, then the engineering-oriented NL worker must ask him- or herself if the benefits of the proposed architecture truly outweigh its costs. For instance, one cannot simply argue that an integrated architecture is superior to a pipeline because it is better suited to handling certain kinds of pronominalization; it is also necessary to evaluate the engineering cost of shifting to an integrated architecture, and determine if, for example, better overall performance for the amount of engineering resources available could be obtained by keeping the general pipeline architecture, and instead investing some of the engineering resources 'saved' by this decision into building more sophisticated heuristics into the pronominalization module.</Paragraph>
    <Paragraph position="1"> In doing so, I believe (and again this is a personal belief that probably cannot be substantiated by the existing evidence) that the NL engineer is coming close to the 'reasoning' of the evolutionary process that created the human language system. Evolution does not care about elegant declarative formalisms or</Paragraph>
    <Section position="1" start_page="168" end_page="168" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> 'proper' (as opposed to 'hacky') handling of special cases; evolution's goal is to maximize performance in real-world situations, while maintaining an architecture that can be easily tinkered with by future evolutionary processes. In short, evolution is an engineer, not a mathematician. 4 It is thus perhaps not surprising if NL generation systems designed to be used in real-world applications end up with an architecture that seems to bear some resemblance to the architecture of the human language processor; 5 and future attempts to build applications-oriented generation systems may end up giving us real insights into how language processing works in humans, even if this is not the main purpose of these systems. Similarly, psycholinguistic knowledge of how the human language generator works may suggest useful algorithms for NL engineers; one such case is described in \[Reiter and Dale, 1992\].</Paragraph>
      <Paragraph position="1"> Cross-fertilization between psycholinguistics and NL engineering will only arise, however, if the results of engineering analyses are reported in the research literature, especially when they suggest going against some theoretical principle. Unfortunately, to date the results of such analyses have all too often been regarded more as embarrassments (since they contradict theory) than as valuable observations, and hence have not been published. I would like to conclude this paper by encouraging generation researchers to regard the results of engineering analyses to be as interesting and as important to the understanding of language as conventional linguistic analyses. After all, as Woods \[1975\] has pointed out, while descriptive analyses of language can at best tell us what the brain does, engineering analyses can potentially offer insights on why the brain functions as it does.</Paragraph>
    </Section>
  </Section>
</Paper>