<?xml version="1.0" standalone="yes"?> <Paper uid="J95-1002"> <Title>Expressing Rhetorical Relations in Instructional Text: A Case Study of the Purpose Relation</Title> <Section position="2" start_page="0" end_page="30" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Natural language provides an extensive set of lexical and grammatical forms for expressing concepts, many of which may, taken out of context, appear to be interchangeable. They are not interchangeable. Writers systematically choose the particular form from this set that they feel will produce the most effective expression given the communicative context. An important task of the text generation researcher is to inform the text generation process with a specification of both the range of these forms and the contexts in which they are used.</Paragraph> <Paragraph position="1"> The current study addresses this issue in the context of expressing procedural relations between actions in instructional text, that is, in written, procedural directions. The complexity of procedural relations typically expressed in such text has given rise to complex variations of expression in language. Consider, for example, the problem of expressing purpose relations. Such expressions could take many conceivable forms, all of which are perfectly grammatical: (1a) Pull out sharply in order to remove the phone.</Paragraph> <Paragraph position="2"> (1b) To remove phone, pull out sharply.</Paragraph> <Paragraph position="3"> * Information Technology Research Institute, University of Brighton, Lewes Road, Brighton BN2 4AT, UK. E-mail: knvl@itri.bton.ac.uk † Department of Computer Science, University of Colorado, Boulder, CO 80309-0430, USA. 
E-mail: martin@cs.colorado.edu © 1995 Association for Computational Linguistics Computational Linguistics Volume 21, Number 1 (1c) Pull out sharply for phone removal.</Paragraph> <Paragraph position="4"> (1d) Pull out sharply for removing the phone.</Paragraph> <Paragraph position="5"> (1e) For the phone, pull out sharply.</Paragraph> <Paragraph position="6"> (1f) Remove phone by pulling out sharply.</Paragraph> <Paragraph position="7"> (1g) Remove the phone. Pull out sharply.</Paragraph> <Paragraph position="8"> (1h) The purpose of pulling out sharply is to remove the phone.</Paragraph> <Paragraph position="9"> (1i) Pulling out sharply achieves the purpose of removing the phone. (1j) Removing the phone involves pulling out sharply.</Paragraph> <Paragraph position="10"> (1k) The method for removing the phone is to pull out sharply.</Paragraph> <Paragraph position="11"> As can be seen, purpose expressions occur either before or after the expression of their related sub-actions (referred to here as the issue of slot) and are expressed in a number of grammatical forms (the issue of form). They may be linked with a variety of conjunctions or prepositions (the issue of linker) and may or may not be combined into a single sentence with the expression of their sub-actions (the issue of clause combining). The current study addresses these four issues of choice in the context of instructional text.</Paragraph> <Paragraph position="12"> Text generation systems must know which forms to produce and when to produce them. Formal linguistic analyses are useful for weeding out grammatically unacceptable forms, but they do not provide a principled means of determining which of the grammatically acceptable forms should be used in any given communicative context. 
As an alternative, the current study has employed the following four-step process for identifying both the relevant forms of expression and the contexts in which they are used:</Paragraph> <Paragraph position="14"> 1. Collect a corpus of text from the relevant genre and encode a full range of the lexical and grammatical features of all of the text.</Paragraph> <Paragraph position="15"> 2. Perform a linguistic analysis of part of the corpus. This analysis involves determining the range of forms used in the corpus and then using an iterative cycle of hypothesis formation and testing to determine the communicative contexts in which each is used.</Paragraph> <Paragraph position="16"> 3. Implement the results of this analysis in the text generation system.</Paragraph> <Paragraph position="17"> 4. Compare, in detail, the output of the system with the text found in the corpus, differentiating between the predictions concerning text that was specifically used in the analysis (the training set) and text that was not (the testing set).</Paragraph> <Paragraph position="18"> This process begins and ends with the corpus, providing an empirically based approach to identifying the range of lexical and grammatical forms that are used in real text and to determining the contextual issues that are relevant to choosing among them. Although the corpus study has become a common methodology in natural language generation, seldom are the representation and analysis techniques given in any detail, and detailed evaluations of the resulting text are not provided. These details are provided, for our study, in this paper.</Paragraph> <Paragraph position="19"> [Keith Vander Linden and James H. Martin, Expressing Rhetorical Relations] Our corpus is divided into training and testing portions. The training portion, used in step 2, constitutes approximately one-third of the full corpus and consists entirely of cordless telephone manuals. 
The methodology is successfully applied to this portion, showing that there are, in fact, patterns of expression in cordless telephone manuals that can be identified and implemented. The study is then extended by testing the system's predictions on a separate and more diverse portion of the corpus that includes instructions for different types of devices and processes. This additional testing serves both to guard against over-fitting to the data in the training portion and to give a measure of how far beyond the telephone domain the predictions can legitimately be applied. No testing was done on noninstructional texts, and no claims are made concerning the applicability of the system's predictions in those areas.</Paragraph> <Paragraph position="20"> Following a review of relevant work in the area of natural language generation, this paper will discuss how these four steps have been applied to the generation of rhetorical relations in instructional text. It will detail what rhetorical relations in instructional text are and how they were collected and represented. It will then discuss how the corpus analysis was performed and how the results were implemented in IMAGENE, an instructional text generation system. The details of IMAGENE's treatment of purpose expressions are given as representative of the coverage and form of the full system (more details concerning the other relations can be found elsewhere [Vander Linden 1993c]). It will conclude with a discussion of how well IMAGENE's predictions match the text in the training and the testing portions of the corpus.</Paragraph> </Section> </Paper>