File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/w02-1702_abstr.xml
Size: 5,749 bytes
Last Modified: 2025-10-06 13:42:41
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1702"> <Title>Cascading XSL filters for content selection in multilingual document generation</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Content selection is a key factor in any successful document generation system.</Paragraph> <Paragraph position="1"> This paper shows how a content selection algorithm has been implemented using an efficient combination of XML/XSL technology and the framework of RST for discourse modeling. The system generates multilingual documents adapted to user profiles in a learning environment for the web. This CourseViewGenerator applies simplified RST schemes to the elaboration of a master document in XML from which content segments are chosen to suit the user's needs. The personalisation of the document is achieved through the application of a sequence of filtering levels of text selection based on the user aspects given as input. These cascading filters are implemented in XSL.</Paragraph> <Paragraph position="2"> Introduction It is widely accepted that content selection plays a crucial role in text generation (Reiter and Dale 2000). This process is normally seen as a goal-directed activity in which text segments are fitted into the discourse structure of the text so as to convey a coherent communicative goal (Grosz and Sidner 1986). Content planning techniques, such as textual schemas (McKeown 1985) or plan operators (Moore and Paris 1993), have been successfully used as models of text generation. There are cases, though, in which these techniques face limitations, for example when the structure of the discourse is difficult to anticipate (Mellish et al. 1998).
Nevertheless, when a set of well-defined communicative goals exists, complex goals can be broken down into sequences of utterances and generation becomes an efficient &quot;top-down&quot; process (Marcu 1997).</Paragraph> <Paragraph position="3"> This paper presents a macro-level content selection algorithm that applies user profiles to constrain and discriminate the contents of a text, whose discourse structure is represented using a simplified version of Rhetorical Structure Theory (Mann and Thompson 1988).</Paragraph> <Paragraph position="4"> The algorithm has been implemented using XML/XSL-based technology in a multilingual document generation system for educational purposes. The main objective of this CourseViewGenerator system (Barrutieta, 2001 and Barrutieta et al., 2001) is to automatically produce multilingual learning documents that suit the student's needs at each particular stage of the learning process. Figure 1 shows the overall architecture of the system.</Paragraph> <Paragraph position="5"> We will begin by explaining the different parts of the system before addressing in more detail the content selection algorithm itself. The system starts by constructing a master document of the kind Hirst et al. (1997) proposed. This master document consists of a full-fledged text with references to all necessary multimedia elements (figures, tables, pictures, links, etc.). In our case, this master document takes the shape of a simple text file with all relevant information tagged in XML. Tags carry information about the logical composition of the text as well as metadata about its discourse structure. The text is seen as raw data, and tags encapsulate these raw data as metadata. The structure of the discourse is represented using a simplified version of RST. RST is simplified in the sense that the granularity of discourse segments does not transcend the boundaries of the sentence.</Paragraph> <Paragraph position="6"> Table 1
illustrates this coarse-grained version of RST in which discourse relations are represented as XML tags.</Paragraph> <Paragraph position="7"> Like any standard RST discourse tree, this simplified RST contains a nucleus for each text paragraph, and one or several satellites linked by a discourse relation to the nucleus within the same paragraph. The nucleus is an absolutely essential segment of the text, as it carries the main message that the author wants to convey. Satellites play an important supporting role for the nucleus, but they can be replaced or erased without changing the overall message. In our system, satellites are selected or discarded depending on the reader's profile.</Paragraph> <Paragraph position="8"> The reader's profile is defined through a set of user aspects. These take the form of multivalue parameters that were sketched after a number of surveys were conducted among teachers, students and other experts from the educational context. As a result of these surveys a user model was proposed (Barrutieta et al., 2002). Table 2 illustrates a simplified version of the model.</Paragraph> <Paragraph position="9"> Based on this user model, we will now discuss the content selection algorithm (henceforth CSA). The CSA determines which segments of the discourse are going to be used in order to make explicit the set of parameters that conform to the user's profile.
In principle, nuclei will always be chosen (as they convey the main message of the text); satellites, however, will be selected depending on their relation to the nucleus and the user aspects that are activated at the time of generation.</Paragraph> <Paragraph position="10"> The selection algorithm works in three consecutive phases: parallel selection, horizontal filtering and vertical filtering.</Paragraph> <Paragraph position="11"> Vertical filtering is the most important phase of the three as it is here that the parts of the discourse tree are selected or discarded.</Paragraph> </Section> </Paper>
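Illustrative sketch (not part of the paper). The selection rule described above, where nuclei are always kept and satellites are kept or dropped according to their relation and the active user aspects, can be pictured as a chain of filters applied one after another to a tagged master document. The sketch below uses Python's standard xml.etree.ElementTree rather than the XSL stylesheets the paper uses. The tag names (paragraph, nucleus, satellite), the attributes (lang, relation), the profile keys and the RELATIONS_BY_LEVEL mapping are all hypothetical. The two filters shown do not reproduce the paper's parallel, horizontal and vertical phases, whose details are not given in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical master-document fragment. Tag and attribute names are
# assumptions made for this sketch, not the paper's actual markup.
MASTER = """<paragraph>
  <nucleus lang="en">Main message.</nucleus>
  <nucleus lang="es">Mensaje principal.</nucleus>
  <satellite lang="en" relation="example">An example.</satellite>
  <satellite lang="es" relation="example">Un ejemplo.</satellite>
  <satellite lang="en" relation="background">Some background.</satellite>
  <satellite lang="en" relation="elaboration">Further detail.</satellite>
</paragraph>"""

# Hypothetical user model: which satellite relations each level wants.
RELATIONS_BY_LEVEL = {
    "novice": {"example", "background"},
    "expert": {"elaboration"},
}


def filter_language(root, profile):
    """Drop every segment whose lang attribute differs from the profile's."""
    for seg in list(root):  # copy the child list so removal is safe
        if seg.get("lang") != profile["lang"]:
            root.remove(seg)
    return root


def filter_relation(root, profile):
    """Keep all nuclei; keep a satellite only if its relation is wanted."""
    wanted = RELATIONS_BY_LEVEL[profile["level"]]
    for seg in list(root):
        if seg.tag == "satellite" and seg.get("relation") not in wanted:
            root.remove(seg)
    return root


def select_content(xml_text, profile, filters=(filter_language, filter_relation)):
    """Run the filters in sequence; each one sees the previous one's output."""
    root = ET.fromstring(xml_text)
    for apply_filter in filters:
        root = apply_filter(root, profile)
    return [seg.text for seg in root]


if __name__ == "__main__":
    print(select_content(MASTER, {"lang": "en", "level": "novice"}))
```

With the sample data, a profile of {"lang": "en", "level": "novice"} keeps the English nucleus plus the example and background satellites, while {"lang": "es", "level": "expert"} keeps only the Spanish nucleus. Because each filter receives the tree left by the previous one, later filters only need to consider segments that survived the earlier stages.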