File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/w98-1407_metho.xml
Size: 17,169 bytes
Last Modified: 2025-10-06 14:15:15
<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1407"> <Title>AUTOMATIC GENERATION OF SUBWAY DIRECTIONS: SALIENCE GRADATION AS A FACTOR FOR DETERMINING MESSAGE AND FORM</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> AUTOMATIC GENERATION OF SUBWAY DIRECTIONS: SALIENCE GRADATION AS A FACTOR FOR DETERMINING MESSAGE AND FORM </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> A frequently encountered problem in urban life is navigation. In order to get to some place we use private means or public transportation, and if we lack clear directions we tend to ask for help. In this paper we deal with descriptions of subway routes and their automatic generation. In particular, we try to show how the relative importance of a given piece of information can affect not only the message but also the form.</Paragraph> </Section> <Section position="3" start_page="0" end_page="60" type="metho"> <SectionTitle> 1 Introduction: the problem </SectionTitle> <Paragraph position="0"> A frequently encountered problem in urban life is the use of public transportation: we have to get from here to there, and quite often we don't know how. As it is not always possible to get help from a person (be it because nobody is available, or because nobody is able to speak our language), we might appreciate the assistance of a machine.</Paragraph> <Paragraph position="1"> In order to convey &quot;useful information&quot; to the user, we must define what &quot;usefulness&quot; means. For example, if we tell someone how to get from one place to another, we hardly ever specify all the intermediate steps, in particular if there is no ambiguity. Also, not all information is of equal weight. 
Yet, as we will show, the notion of &quot;relative importance&quot; of information is gradual in nature rather than discrete, where discrete would mean a simple binary value (important vs. unimportant) 1. [Footnote 1: In this respect we deviate from most current generation systems.]</Paragraph> <Paragraph position="2"> All this is reflected, of course, in the content and form of the final text. Relative importance is signaled by different means at the text level (headers, paragraphs, etc.) and at the sentence level (word choice, syntactic structure: main clause versus subordinate clause, topic-comment structures).</Paragraph> <Paragraph position="3"> Concerning the prominence status (i.e. relative importance of a piece of information), semioticians and text linguists have reached a similar conclusion by distinguishing between the &quot;foreground/background&quot; or &quot;primary/secondary level&quot; of a text [Bar66, vD77, AP89, Com92]. According to Combettes [Com92], the &quot;primary level&quot; deals with the core meaning, i.e. events and facts that make the text progress, while the &quot;secondary level&quot; deals with descriptions, evaluations, comments, and reformulations. The distinction of levels, with information of varying shades (salience gradation), implies that it should be possible to identify corresponding linguistic &quot;markers&quot; for each one of them. Yet, as Combettes has pointed out [Com92], the means used for marking the relative importance of information may vary from one type of text to another. Nevertheless, certain markers do hold</Paragraph> <Paragraph position="5"> regardless of the text type. 
This is particularly true for certain syntactic devices such as subordinate clauses, appositions, and nominalizations, all of which are, according to Combettes, markers of the secondary level, unlike main clauses, which mark the primary level.</Paragraph> <Paragraph position="6"> Analyzing a corpus of route descriptions in French, we have identified correlations between the salience status of specific conceptual chunks (landmarks, segment distance, etc.) and linguistic structures (independent vs. subordinate clauses). In section 2, we will show how the salience status of some types of information may affect the content and form of the final text. In section 3 we will illustrate our use of these data in a generator of subway route descriptions.</Paragraph> <Paragraph position="7"> 2 A case study: subway route descriptions Route descriptions are interesting for at least two reasons: first of all, as navigation aids in general they help to solve a real-world problem; second, despite their apparent simplicity, especially with regard to surface form, they require the solution of a number of non-trivial linguistic and discourse problems, problems which are intimately rooted in human cognition. Our analysis is based on a corpus containing 30 subway route descriptions in French. The data were collected from ten subjects via e-mail. Each one of them had to describe three routes in the Parisian subway. These routes differ in terms of length and complexity. The first route involves 9 stops and one transfer. It is the longest. The second one contains 4 stops and no transfer. It is the simplest. 
The third route, though very short (4 stops), is the most complex one as it involves two transfers.</Paragraph> <Section position="1" start_page="0" end_page="60" type="sub_section"> <SectionTitle> 2.1 Analysis of the underlying content </SectionTitle> <Paragraph position="0"> The information contained in subway route descriptions can be divided into two broad categories: &quot;global&quot; and &quot;local&quot; information. We describe each one of them below, illustrating particular information types with examples taken from the corpus.</Paragraph> <Paragraph position="1"> Global information: * identification of the route by specifying departure and destination, e.g. Pour aller de Saint-Lazare à Jussieu... / To go from Saint-Lazare to Jussieu... * comments concerning the complexity of the whole route, e.g. C'est simple et rapide, pas de changement. / It's simple and fast, no transfer. * information concerning the distance of the whole trip, e.g. Ça doit faire 7 ou 8 stations en tout. / This should make 7 or 8 stops for the whole trip. Local information: * stop of departure, e.g. À partir de Jussieu, tu prends... / Starting from Jussieu, you take... * destination, e.g. tu arrives à Gare de Lyon / you arrive at Gare de Lyon * lines to take, e.g. prendre la ligne 5 / take line number 5 * transfers, e.g. changer à Opéra / change at Opera * directions to take, e.g. tu prends la direction Gallieni / you take the direction Gallieni * partial distances to cover, e.g. il y a une seule station / there is only one stop According to Wunderlich and Reinelt [WR82], &quot;local information&quot; is the core of route descriptions, while &quot;global information&quot; is additional as it serves mainly interactional purposes. In the remainder of our analysis we will concentrate on the &quot;local route information&quot; and the way it is expressed in the domain of subway route descriptions, the objective being to determine whether some information is obligatory or not. 
Of course, we could have defined on a priori grounds what should be mentioned explicitly and what not. Yet, we preferred to ground our work on empirical data.</Paragraph> <Paragraph position="2"> We assume that &quot;obligatory information&quot; is information that is contained in all descriptions of the corpus, whereas &quot;optional information&quot; occurs only occasionally 2. We have also tried to find explanations for the omission of optional information. For example, the stations of departure and destination could be considered as optional, since they are already known by the &quot;questioner&quot; (either because they are a part of the question, or because they are given with the context/situation). Indeed, our data reveal that, while the destination stop is always mentioned, the departure is mentioned in only 50% of the cases (e.g. À Jussieu, tu prends... / At Jussieu, you take...). In the light of these data we conclude that it is useful to make a distinction between given and new, or known and unknown information. The problem concerning the &quot;known&quot; information is to decide whether to make it explicit or not. This is not a conceptual problem, since the known information must already be present at the conceptual level; the choice is pragmatic in nature (what information should be conveyed, because it is really useful?), with possible stylistic side effects. For example, the fact that the destination (known information) is mentioned systematically in the corpus seems to be based on &quot;stylistic&quot; considerations: if it were not, the description would look incomplete. On the other hand, decisions concerning &quot;new&quot; information do involve conceptual choices. 
They consist in determining whether to include a given piece of information in the message or not, and in determining its degree of salience.</Paragraph> <Paragraph position="3"> The rest of our paper deals only with the analysis of &quot;new&quot; information, since we are mainly interested in the choices at the conceptual level and their consequences for the linguistic form. As the data show, information concerning transfer stations and directions of lines is obligatory: both types of information systematically occur in the corpus. The corpus also reveals that information concerning partial distances (number of stops to travel on a given line) and the names of the lines (e.g. &quot;line 7&quot; or &quot;orange line&quot;) is optional.</Paragraph> <Paragraph position="4"> It should be noted that partial distance may be represented in two ways in the domain of subway route descriptions: either as the length of a route segment (e.g. &quot;two stops&quot;), or as the result of the number of stops counted (e.g. &quot;second stop&quot;). This kind of information is not mentioned at all in 30% of the cases. We have noticed that the inclusion/exclusion of information concerning partial distances depends on contextual factors such as the &quot;value&quot; of the distance itself (one stop vs. several) and the position on the route (last route segment or not). A &quot;one-stop distance&quot; is more important than a segment containing several stops. Also, the distance of the last segment seems to be more important than the distances of the intermediate segments (unless they are equal to one stop). Other strategies concerning information on partial distances have been observed: some subjects mentioned all of them in each of their descriptions, regardless of the number of stops and the relative position of the segment, while others did not mention them at all. Another kind of optional information is the names of the lines to take. 
This may vary from place to place, but at least in Paris it is the direction (final destination) of the train that tells the user which train to take. The names of the lines, represented by numbers, were omitted in one third of the descriptions in the corpus.</Paragraph> <Paragraph position="5"> In the next section we describe the results of our linguistic analysis. We will show what specific</Paragraph> <Paragraph position="7"> linguistic resources (independent clauses vs. subordinate structures) are used for expressing obligatory or optional parts of information.</Paragraph> </Section> <Section position="2" start_page="60" end_page="60" type="sub_section"> <SectionTitle> 2.2 Correspondence between conceptual saliency and linguistic resources </SectionTitle> <Paragraph position="0"> It comes as no surprise that independent clauses are the major syntactic structure used. Their function is to convey information of primary importance. Our analysis of the corpus shows that independent clauses are mostly used in order to convey &quot;obligatory&quot; information, namely information specifying the names of the stations where to get off and the directions to take. This is the case in example 1 below 3, where only these two chunks of information are contained in the independent clauses.</Paragraph> <Paragraph position="1"> Ex. 1 À Saint-Lazare, prendre la direction Gallieni. Descendre à Opéra (deux stations plus loin). Prendre alors la direction Mairie d'Ivry/Villejuif jusqu'à Jussieu (7-ème station).</Paragraph> <Paragraph position="2"> At Saint-Lazare, take the direction Gallieni. Get off at Opera (two stops ahead). Then take the direction Mairie d'Ivry/Villejuif until Jussieu (7th stop).</Paragraph> <Paragraph position="3"> However, independent clauses may also convey optional information. In this case, we consider it as a way of signaling prominence. 
For example, in our corpus there are cases where a &quot;one-stop distance&quot; (distance being optional information) is expressed by an independent clause: Ex. 2 A Bastille, prendre le métro n° 1 direction Château de Vincennes et descendre à la prochaine station qui est la Gare de Lyon.</Paragraph> <Paragraph position="4"> At Bastille, take the line number 1, direction Chateau de Vincennes, and get off at the next stop, which is Gare de Lyon. The names of the lines (optional information), together with information concerning the direction (obligatory information), are also quite frequently mentioned in independent clauses (see example 3). Again, we consider this a way of signaling high prominence: Ex. 3 A Saint-Lazare, prendre la ligne 3 direction Gallieni et changer à Opéra. Prendre ensuite la ligne 7 direction Mairie d'Ivry et descendre à Jussieu.</Paragraph> <Paragraph position="5"> At Saint-Lazare, take the line 3 direction Gallieni and change at Opera. Then, take the line 7 direction Mairie d'Ivry and get off at Jussieu.</Paragraph> <Paragraph position="6"> Subordinate structures are generally used to convey optional information or information of minor importance. In our case, this is information concerning partial distances and names of lines. In example 4 below, the information concerning partial distance is included only for the last segment, where it is expressed by an &quot;anaphoric clause&quot;. Example 5 illustrates a strategy whereby the prominence of the names of the lines is decreased: they are expressed in bracketed appositions.</Paragraph> <Paragraph position="7"> Ex. 4 A Saint-Lazare prendre le métro n° 3 direction Gallieni, changer à 
Opéra et prendre le métro n° 7 direction Mairie d'Ivry/Villejuif et descendre à Jussieu (c'est la 7-ème station).</Paragraph> <Paragraph position="8"> At Saint-Lazare take the number 3, direction Gallieni, change at Opera and take the number 7 direction Mairie d'Ivry/Villejuif and get off at Jussieu (it's the 7th stop).</Paragraph> <Paragraph position="9"> Ex. 5 Prendre direction Gallieni (ligne 3). Sortir à Opéra (2 stations). Prendre direction Mairie d'Ivry (ligne 7). Descendre à Jussieu (7-ème station).</Paragraph> <Paragraph position="10"> Take direction Gallieni (line 3) and change at Opera (2 stops). Take the direction Mairie d'Ivry (line 7) and get off at Jussieu (7th stop).</Paragraph> <Paragraph position="11"> [Footnote 3: Examples from the corpus are followed by their English equivalents.]</Paragraph> <Paragraph position="12"> We distinguish between two cases of subordinate structures: subordinate clauses and appositions. The former include relative clauses (e.g. &quot;descends à Opéra qui est la 2-ème station&quot; / &quot;get off at Opera, which is the second stop&quot;) and anaphoric clauses (e.g. &quot;tu prends la direction Mairie d'Ivry, c'est la ligne 7&quot; / &quot;you take direction Mairie d'Ivry, it's the line 7&quot;). We divide appositions into nominal and prepositional appositions. Nominal appositions occur after an independent clause and may be used with various punctuation devices such as comma, colon, or brackets. In our corpus, they generally occur in brackets, for example: &quot;descendre à Gare de Lyon (station suivante)&quot; / &quot;get off at Gare de Lyon (the following stop)&quot;. Prepositional appositions occur before an independent clause. They are used to mention &quot;known&quot; information like &quot;get-on stations&quot; (the &quot;departure station&quot; or a &quot;get-on station&quot; that has been mentioned before as a &quot;get-off&quot; or &quot;transfer station&quot;), for example: &quot;Descendre à Bastille. 
De Bastille, prendre...&quot; / &quot;Get off at Bastille. From Bastille, take...&quot;.</Paragraph> <Paragraph position="13"> In order to be able to automatically generate route descriptions in line with these linguistic data, we have defined a set of rules that map the relative salience of a given piece of information onto one or several syntactic structures (cf. section 3 below, table 1 and table 2).</Paragraph> </Section> </Section> </Paper>