<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1425"> <Title>A FLEXIBLE SHALLOW APPROACH TO TEXT GENERATION</Title> <Section position="7" start_page="244" end_page="245" type="evalu"> <SectionTitle> 4 Costs and Benefits </SectionTitle> <Paragraph position="0"> As Reiter and Mellish note, the use of shallow techniques needs to be justified through a cost-benefit analysis [Reiter and Mellish, 1993]. We specify the range of possible applications our approach is useful for, exemplified by the report generator developed for the TEMSIS project.</Paragraph> <Paragraph position="1"> This application took an effort of about eight person-months, part of which was spent implementing interfaces to the TEMSIS server and to the database and acquainting ourselves with details of the domain. The remaining time was spent on (1) the elicitation of user requirements and the definition of a small text corpus, (2) the design of IR according to the domain distinctions required for the corpus texts, and (3) text organization, adaptation of TG/2 and grammar development. The grammars comprise 105 rules for the German and 122 for the French version. There are about twenty test predicates and IR access functions, most of which are needed for both languages. The French version was designed on the basis of the German one and took little more than a week to implement. The system covers a total of 384 different report structures that differ in at least one linguistic aspect.</Paragraph> <Section position="1" start_page="244" end_page="245" type="sub_section"> <SectionTitle> 4.1 Benefits </SectionTitle> <Paragraph position="0"> Altogether, the development effort was very low. We believe that reusing an in-depth surface generator for this task would not have scored better. Our method has a number of advantages: (1) Partial reusability. Despite its domain-dependence, parts of the system are reusable. The TG/2 interpreter has been adopted without modifications.
Moreover, a sub-grammar for time expressions in the domain of appointment scheduling was reused with only minor extensions. (2) Modeling flexibility. Realization techniques of different granularity (canned text, templates, context-free grammars) allow the grammar writer to model general, linguistic knowledge as well as more specific task and domain-oriented wordings.</Paragraph> <Paragraph position="1"> (3) Processing speed. Shallow processing is fast. In our system, the average generation time of less than a second can almost be neglected (the overall run-time is longer due to database access). (4) Multi-lingual extensions. Additional languages can be included with little effort because the IR is neutral towards particular languages.</Paragraph> <Paragraph position="2"> (5) Variations in wording. Alternative formulations are easily integrated by defining conflicting rules in TGL. These are ordered according to a set of criteria that cause the system to prefer certain formulations to others (cf. [Busemann, 1996]): Grammar rules leading to preferred formulations are selected first from a conflict set of concurring rules. The preference mechanisms will be used in a future version to tailor the texts for administrative and public uses.</Paragraph> </Section> <Section position="2" start_page="245" end_page="245" type="sub_section"> <SectionTitle> 4.2 Costs </SectionTitle> <Paragraph position="0"> As argued above, the orientation towards the application task and domain yields some important benefits. On the other hand, there are limitations in reusability and flexibility: (1) IR cannot be reused for other applications. The consequences for the modules interfaced by IR, the text organizer and the text realizer, are a loss in generality.
Since both modules keep a generic interpreter apart from partly domain-specific knowledge, the effort of transporting the components to new applications is, however, restricted to modifying the knowledge sources.</Paragraph> <Paragraph position="1"> (2) By associating canned text with domain acts, TG/2 behaves in a domain- and task-specific way. This keeps the flexibility in the wording, which can only partly be influenced by the text organizer, inherently lower than with in-depth approaches.</Paragraph> <Paragraph position="2"> 4.3 When does it pay off? We take it for granted that the TEMSIS generation application stands for a class of comparable tasks that can be characterized as follows. The generated texts are information-conveying reports in a technical domain. The sublanguage allows for a rather straightforward mapping onto IR expressions, and IR expressions can be realized in a context-independent way. For these kinds of applications, our methods provide sufficient flexibility by omitting unnecessary or known information from both the schemes and its IR expressions, and by including particles to increase coherency. The reports could be generated in multiple languages. We recommend the opportunistic use of shallow techniques for this type of application.</Paragraph> <Paragraph position="3"> Our approach is not suitable for tasks involving deliberate sentence planning, the careful choice of lexemes, or a sophisticated distribution of information onto linguistic units. Such tasks would not be compatible with the loose coupling of our components via IR. In addition, they would require complex tests to be formulated in TGL rules, rendering the grammar rather obscure.
Finally, if the intended coverage of content is to be kept extensible or is not known precisely enough at an early phase of development, the eventual redesign of the intermediate structure and associated mapping rules for text organization may severely limit the usefulness of our approach.</Paragraph> </Section> </Section> </Paper>