<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1069">
  <Title>Multilinguality in a Text Generation System for Three Slavic Languages. Geert-Jan Kruijff, Elke Teich, John Bateman, Ivana Kruijff-Korbayová,</Title>
  <Section position="3" start_page="475" end_page="475" type="metho">
    <SectionTitle>
2 Language-independent Content
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="475" end_page="475" type="sub_section">
      <SectionTitle>
Specifications
</SectionTitle>
      <Paragraph position="0"> The content constructed by a user via the Agile GUI is specified in terms of Assertion-boxes, or A-boxes. These A-boxes are considered to be entirely neutral with respect to the language that will be used to express their content. Thus, a single A-box can be used to generate text in multiple languages. A-boxes specify content by instantiating concepts from the DM or UM and placing these concepts in relation to one another by means of configurational concepts. The configurational concepts define admissible ways in which content can be structured. Figure 2 gives the configurational concepts distinguished within Agile.</Paragraph>
      <Paragraph position="1"> Procedure. A procedure has three slots: (i) GOAL (obligatory, filled by a USER-ACTION), (ii) METHODS (optional, filled by a METHOD-LIST), (iii) SIDE-EFFECT (optional, filled by a USER-EVENT).
Method. A method has three slots: (i) CONSTRAINT (optional, filled by an OPERATING-SYSTEM), (ii) PRECONDITION (optional, filled by a PROCE…
Configurational concepts are devoid of actual content. The content is provided by instantiations of concepts that represent various user actions, interface events, and interface modalities and functions. Taken together, these instantiations provide the basic propositional content for instructional texts and are taken as input for the text planning process.</Paragraph>
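      <Paragraph>The slot definitions of the configurational concepts amount to a small typed record schema. The Python sketch below renders the Procedure concept with its three slots. It models the Method concept with only its CONSTRAINT slot, because the rest of that slot list is truncated in the figure. All class names, the concept name DM::draw and the operating-system value are illustrative assumptions, not Agile's actual representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserAction:
    """Placeholder for an instantiated USER-ACTION concept (hypothetical)."""
    concept: str


@dataclass
class UserEvent:
    """Placeholder for an instantiated USER-EVENT concept (hypothetical)."""
    concept: str


@dataclass
class Method:
    # CONSTRAINT (optional): filled by an OPERATING-SYSTEM.
    # The remaining Method slots are not modelled in this sketch.
    constraint: Optional[str] = None


@dataclass
class Procedure:
    goal: UserAction                                     # GOAL (obligatory)
    methods: List[Method] = field(default_factory=list)  # METHODS (optional METHOD-LIST)
    side_effect: Optional[UserEvent] = None              # SIDE-EFFECT (optional)

    def __post_init__(self) -> None:
        # The obligatory GOAL slot must be filled by a USER-ACTION.
        if not isinstance(self.goal, UserAction):
            raise TypeError("GOAL must be filled by a USER-ACTION")


# A content-free configuration carries content once concepts are instantiated.
example = Procedure(goal=UserAction("DM::draw"), methods=[Method(constraint="Windows")])
```

In this sketch, `__post_init__` is the only place that enforces the obligatory/optional distinction. Optional slots simply default to empty values.</Paragraph>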
    </Section>
  </Section>
  <Section position="4" start_page="475" end_page="477" type="metho">
    <SectionTitle>
3 Strategic Generation: From
</SectionTitle>
    <Paragraph position="0"> To realize an A-box as a text, we go through successive stages of text planning, sentence planning, and lexico-grammatical generation (cf. also Reiter &amp; Dale, 1997). At each stage there is an increase in sensitivity to, or dependency on, the target language in which output will be generated. Although the text planner itself is language-independent, the text planning resources may differ from language to language as much as is required. This is exactly analogous to the situation we find within the individual language grammars as represented within KPML: we therefore represent the text planning resources in the same fashion. For the text type and languages of concern here, however, variation across languages at the text planning stage turned out to be minimal.</Paragraph>
    <Paragraph position="1"> The organization of an A-box is used to guide the text planning process. Here, we draw a distinction between text structure elements (TSEs), the elements from which a (task-oriented) text is built up, and text templates, which condition the way TSEs are to be realized linguistically. We focus on the relation between concepts on the one hand and TSEs on the other.</Paragraph>
    <Paragraph position="2"> We are specifically interested in the configurational concepts that are used to configure the content specified in an A-box because we want to maintain a close connection between how the content can be defined in an A-box and how that content is to be spelled out in text.</Paragraph>
    <Section position="1" start_page="475" end_page="476" type="sub_section">
      <SectionTitle>
3.1 Structuring and Styling
</SectionTitle>
      <Paragraph position="0"> A text structure element is a predefined component that needs to be filled by one or more specific parts of the user's content definition.</Paragraph>
      <Paragraph position="1"> Using the reader-oriented terminology common in technical authoring guides, we distinguish a small (recursively defined) set of TSEs; these are listed in Figure 3.</Paragraph>
      <Paragraph position="2"> The TSEs are placed in correspondence with the configurational concepts of the DM (cf. Figure 2); this enables us to build a text structure that follows the structuring of the content in an A-box (cf. Figure 4).</Paragraph>
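      <Paragraph>Because TSEs correspond to configurational concepts, building a text structure can be pictured as a recursive walk over the A-box that mirrors its configuration. The sketch below assumes a procedure-shaped A-box encoded as nested dictionaries. Only TASK-TITLE is taken from the text. The element kinds TASK, METHOD and SIDE-EFFECT and the dictionary key "steps" are hypothetical stand-ins, because Figures 3 and 4 are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TSE:
    kind: str                                      # text structure element type
    content: str = ""                              # A-box content at a leaf
    children: List["TSE"] = field(default_factory=list)


def build_text_structure(procedure: dict) -> TSE:
    """Mirror a procedure-shaped A-box as a tree of TSEs."""
    # The goal of a procedure surfaces as the task title.
    node = TSE("TASK", children=[TSE("TASK-TITLE", content=procedure["goal"])])
    for method in procedure.get("methods", []):
        method_node = TSE("METHOD")
        # Sub-procedures recurse, reflecting the recursive TSE definitions.
        for sub in method.get("steps", []):
            method_node.children.append(build_text_structure(sub))
        node.children.append(method_node)
    if procedure.get("side_effect"):
        node.children.append(TSE("SIDE-EFFECT", content=procedure["side_effect"]))
    return node


def kinds(node: TSE) -> List[str]:
    """Pre-order list of element kinds, handy for inspecting the tree."""
    out = [node.kind]
    for child in node.children:
        out.extend(kinds(child))
    return out


abox = {"goal": "draw a polyline",
        "methods": [{"steps": [{"goal": "specify start point"},
                               {"goal": "specify end point"}]}]}
print(kinds(build_text_structure(abox)))
# ['TASK', 'TASK-TITLE', 'METHOD', 'TASK', 'TASK-TITLE', 'TASK', 'TASK-TITLE']
```

Because the walk only follows the A-box's configuration, the resulting tree is language-neutral. Language-specific decisions are left to the templates.</Paragraph>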
      <Paragraph position="3"> Orthogonal to the notion of text structure element is the notion of text template. Whereas TSEs capture what needs to be realized, the text template captures how that content is to be realized. Thus, a template defines a style for expressing the content. As we discuss below, we define text templates in terms of constraints on the realization of specific (individual) TSEs. For example, whereas in Bulgarian and Czech, headings (to which the TASK-TITLE element corresponds; cf. Figure 4) are usually realized as nominal groups, in the Russian AutoCAD manual headings are realized as nonfinite purpose clauses, as they are in English.</Paragraph>
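      <Paragraph>The separation of "what" (the TSE) from "how" (the template) can be sketched as a lookup that returns realization constraints for a given element and language, without touching the text structure itself. The table below encodes only the heading conventions just described. The function name, the constraint key "rank" and its values are hypothetical labels.

```python
from typing import Dict

# Hypothetical heading template: per-language constraint on the grammatical
# rank of the TASK-TITLE element, following the heading conventions above.
HEADING_RANK: Dict[str, str] = {
    "bg": "nominal-group",
    "cs": "nominal-group",
    "ru": "nonfinite-purpose-clause",
    "en": "nonfinite-purpose-clause",
}


def template_constraints(tse_kind: str, language: str) -> Dict[str, str]:
    """Return realization constraints ("how") for one TSE ("what")."""
    if tse_kind == "TASK-TITLE":
        return {"rank": HEADING_RANK[language]}
    return {}  # other elements are left unconstrained in this sketch
```

Swapping in a different table changes the style of the output while leaving the TSE tree untouched.</Paragraph>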
    </Section>
    <Section position="2" start_page="476" end_page="477" type="sub_section">
      <SectionTitle>
3.2 Text Planning &amp; Sentence Planning
</SectionTitle>
      <Paragraph position="0"> The major component of the text planner is formed by a systemic network for text structuring. This network, called the text structuring region, defines an additional level of linguistic resources for the level of genre. The region constructs text structures in a way that is very similar to the way in which the systemic networks of the grammars of the tactical generator build up grammatical structures. In fact, using KPML to implement text structuring greatly facilitates the interaction between global-level text generation (strategic generation) and lexico-grammatical expression (tactical generation). Moreover, this approach has the advantage that constraints on output realization can easily be accumulated and propagated. For example, the text planner can impose constraints on the lexico-grammatical realization of particular text plan elements, such as the realization of text headings by a nominalization in Czech and Bulgarian or by an infinitival purpose clause in Russian. This is one contribution to overcoming the notorious generation gap problem, which arises when a text planning module lacks control over the fine-grained distinctions that are available in a grammar. In our case, both text planning and sentence planning are integrated into one and the same system and are distinguished by stratification.</Paragraph>
      <Paragraph position="2"> Following on from the orthogonality of text templates and text structure elements, the text structuring region consists of two parts. One part deals with interpreting the A-box in terms of TSEs: traversing the network of this part of the region produces a text structure for the A-box conforming to the definitions above. The second part of the region imposes constraints on the realization of the TSEs introduced by the first part. Diverse constraints can be imposed depending on the user's choice of style, e.g., personal (featuring predominantly imperatives) vs. impersonal (featuring indicatives). The result of text planning is a text plan.</Paragraph>
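      <Paragraph>The two-part organization of the region can be pictured as two passes. The first pass yields TSEs, and the second decorates them with style-dependent constraints. In the sketch below, the first-pass output is simplified to a flat list of (kind, content) pairs. The TSE name STEP and the constraint key "mood" are hypothetical. Only the personal/imperative and impersonal/indicative pairing comes from the text.

```python
from typing import Dict, List, Tuple

# First-pass output, simplified: (TSE kind, A-box content) pairs.
Structure = List[Tuple[str, str]]

# The personal style favours imperatives, the impersonal style indicatives.
STYLE_MOOD: Dict[str, str] = {"personal": "imperative", "impersonal": "indicative"}


def impose_style(structure: Structure, style: str) -> List[Dict[str, str]]:
    """Second pass: attach style-dependent constraints to first-pass TSEs."""
    plan = []
    for kind, content in structure:
        node = {"tse": kind, "content": content}
        if kind == "STEP":  # hypothetical TSE for an instruction step
            node["mood"] = STYLE_MOOD[style]
        plan.append(node)
    return plan


structure = [("TASK-TITLE", "draw a polyline"), ("STEP", "select the Polyline tool")]
plan = impose_style(structure, "personal")
```

Keeping the passes separate means the same structure can be restyled without being rebuilt.</Paragraph>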
      <Paragraph position="3"> This can be thought of as a hierarchical structure (built from TSEs) with bits of A-box content at its leaves, together with additional constraints imposed by the text planning process: e.g., that the Title segment of the document should not be realized as a full clause but rather as a nominal phrase or a purposive dependent clause. The text plan may also include constraints on the preferred layout of the document elements; this information is passed on via HTML annotations.</Paragraph>
      <Paragraph position="4"> The sentence planner then takes this text plan as input and creates SPL formulae to express the content identified by the text plan's leaves. The resulting SPLs can also group one or more leaves together (aggregation), depending on decisions taken by the text planner concerning discourse relations. Furthermore, constraints on realization that were introduced by the text planner are also included in the SPLs at this stage.</Paragraph>
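      <Paragraph>SPL terms follow the Penman-style shape (variable / concept :role filler ...). As a hedged illustration of one step described above, the helper below renders a text plan leaf as an SPL-style string and appends the realization constraints handed down by the text planner as extra keywords. Aggregation is not modelled. The concept names DM::draw and DM::polyline and the use of :speechact to carry the mood constraint are illustrative assumptions, not Agile's actual SPL inventory.

```python
from typing import Dict, Optional


def to_spl(var: str, concept: str, roles: Dict[str, str],
           constraints: Optional[Dict[str, str]] = None) -> str:
    """Render an SPL-style term (var / concept :role filler ...).

    Constraints from the text planner are appended as extra keywords, so
    they reach the tactical generator together with the content.
    """
    items = list(roles.items()) + list((constraints or {}).items())
    body = "".join(f" :{key} {value}" for key, value in items)
    return f"({var} / {concept}{body})"


spl = to_spl("d", "DM::draw", {"actee": "(p / DM::polyline)"},
             {"speechact": "imperative"})
print(spl)
# (d / DM::draw :actee (p / DM::polyline) :speechact imperative)
```</Paragraph>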
      <Paragraph position="5"> Of particular interest multilingually is the way concepts may require different kinds of realization in different languages. For example, languages need not, of course, realize concepts as single words: in Czech the concept Menu is realized as "menu", but the interface modality Dialogbox is realized as the multiword expression "dialogové okno" (whose components, i.e., an adjective and a nominal head, may undergo various grammatical operations independently).</Paragraph>
      <Paragraph position="6"> The Agile sentence planner handles such cases by inserting SPL forms corresponding to the literal semantics of the complex expressions required; these are then expressed via the tactical generator in the usual way. The resulting SPL formulae thus represent the language-specific semantics of the sentences to be generated. If, on the other hand, a concept maps to a single word, the sentence planner leaves the further specification of how the concept should be realized to the lexico-grammar and its concept-to-word mappings. More extensive differences between languages are handled by further conditionalizing the text and sentence planner resources according to language.</Paragraph>
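      <Paragraph>The split between multiword and single-word concepts can be sketched as follows. Concepts listed as multiword are expanded into a head term plus a modifier term, so that the grammar can inflect each component separately. All other concepts pass through unchanged and are left to the lexico-grammar. The table, the concept names DM::dialogbox, DM::window and DM::dialog, and the role :property-ascription are hypothetical choices made for this illustration.

```python
from typing import Dict, Tuple

# Hypothetical Czech table: concepts needing a multiword realization are
# mapped to (head concept, modifier concept). Concept names are illustrative.
CZECH_MULTIWORD: Dict[str, Tuple[str, str]] = {
    "DM::dialogbox": ("DM::window", "DM::dialog"),  # "dialogové okno"
}


def expand_concept(var: str, concept: str) -> str:
    """Return an SPL-style term for a concept in Czech.

    Multiword concepts are spelled out as their literal semantics (head plus
    modifier), so the grammar can inflect each component independently;
    single-word concepts are passed through and left to the lexico-grammar.
    """
    if concept in CZECH_MULTIWORD:
        head, modifier = CZECH_MULTIWORD[concept]
        return f"({var} / {head} :property-ascription ({var}m / {modifier}))"
    return f"({var} / {concept})"
```</Paragraph>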
    </Section>
  </Section>
  <Section position="5" start_page="477" end_page="479" type="metho">
    <SectionTitle>
4 Tactical Generation: From
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="477" end_page="477" type="sub_section">
      <SectionTitle>
Sentence Plans to Sentences
</SectionTitle>
      <Paragraph position="0"> The tactical generation component that constructs sentences (and other grammatical units) from the SPL formulae specified in the text plan relies on linguistic resources for Bulgarian, Czech and Russian. The necessary grammars and lexicons have been constructed employing the methods described in Section 1. As noted there, the crucial characteristic of this model of multilingual representation is that it allows for the representation of both commonalities and differences between languages, as required to cover the observable contrastive-linguistic phenomena. This holds even among typologically rather distant languages.</Paragraph>
      <Paragraph position="1"> We first illustrate this with respect to some of the contrastive-linguistic phenomena covered by this model, employing examples from English, Bulgarian, Czech and Russian. We then show the organization of the lexicons and briefly describe lexical choice.</Paragraph>
    </Section>
    <Section position="2" start_page="477" end_page="479" type="sub_section">
      <SectionTitle>
4.1 Semantic and grammatical cross-linguistic variation
</SectionTitle>
      <Paragraph position="0"> One of the tenets of our model of cross-linguistic variation is that languages tend to be rather similar semantically and to differ syntactically. We can thus expect identical SPL expressions for Bulgarian, Czech and Russian in many cases, although these may be realized by diverging syntactic structures.</Paragraph>
      <Paragraph position="1"> However, we also allow for the case in which there is no commonality at this level and even the SPL expressions diverge.² Example 1 illustrates the latter case (high semantic divergence, plus grammatical divergence), and Example 2 the former (semantic commonality, plus grammatical divergence). A difference between English and Russian prepositional phrases is that the relation expressed by the PP is realized by the choice of preposition in English, whereas in Russian it is additionally realized by case government. In the area of spatial PPs, the choice of a particular preposition in English corresponds to a distinction in the dimensionality of the object that realizes the range of the relation expressed by the PP. For both PPs expressing a location and PPs expressing movement, English distinguishes between three-dimensional objects (in, into), one-or-two-dimensional objects (on, onto) and zero-dimensional objects (at, to).</Paragraph>
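      <Paragraph>The English side of this contrast reduces to a lookup keyed on the dimensionality of the range object and on whether the PP expresses static location or movement. The sketch below encodes exactly the three-way distinction just described. The function and table names are hypothetical.

```python
from typing import Dict, Tuple

# Dimensionality of the range object -> (static location, movement) preposition.
ENGLISH_SPATIAL: Dict[int, Tuple[str, str]] = {
    3: ("in", "into"),
    2: ("on", "onto"),
    1: ("on", "onto"),
    0: ("at", "to"),
}


def english_preposition(dimensionality: int, motion: bool) -> str:
    """Select an English spatial preposition; no case government is involved."""
    static, moving = ENGLISH_SPATIAL[dimensionality]
    return moving if motion else static
```</Paragraph>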
      <Paragraph position="2"> In Russian, in contrast, zero-or-three-dimensional objects (preposition: v) are opposed to one-or-two-dimensional objects (preposition: na). A further difference, between the expression of static location and of movement, is expressed by case selection: na/v + locative case expresses static location, v/na + accusative case expresses movement (entering or reaching an object), and the preposition k + dative case expresses movement towards an object (not quite reaching or entering it).
²This distinguishes our approach from interlingua-based systems, which typically require a common semantic (or conceptual) input.</Paragraph>
      <Paragraph position="3"> In the converse relation, motion away from an object, s is selected for movement from within an object, and ot for movement away from the vicinity of an object. Here, both prepositions govern genitive case. The dimensionality of the object is only relevant for the distinction between v/na and s/ot, but not for k. Since the conceptualizations of spatial relations differ between English and Russian, the input SPL expressions diverge, as shown in Figure 5. Rather than using domain model concepts, these SPL expressions restrict themselves to Upper Model concepts in order to highlight the cross-linguistic contrast. This example illustrates well how it is often necessary to 'semanticize' events differently in different languages in order to achieve the most natural results. Note that Czech is very similar to Russian here.</Paragraph>
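      <Paragraph>In Russian, the preposition and the governed case are selected together. The sketch below covers the static-location, goal and towards cases exactly as stated above, using transliterated prepositions. The ablative prepositions s/ot are deliberately left out. The function name and the relation labels are hypothetical.

```python
from typing import Tuple


def russian_spatial(dimensionality: int, relation: str) -> Tuple[str, str]:
    """Return (preposition, governed case) for a Russian spatial PP.

    relation: "location" (static), "goal" (entering or reaching the object)
    or "towards" (approaching without quite reaching or entering it).
    """
    if relation == "towards":
        return ("k", "dative")  # dimensionality plays no role here
    # Zero- or three-dimensional objects take v, one- or two-dimensional take na.
    prep = "v" if dimensionality in (0, 3) else "na"
    if relation == "location":
        return (prep, "locative")
    if relation == "goal":
        return (prep, "accusative")
    raise ValueError(f"unsupported relation: {relation}")
```

Unlike the English lookup, the Russian function groups zero- with three-dimensional objects, and it returns a case alongside the preposition.</Paragraph>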
      <Paragraph position="4"> Units (1)-(4) below show an example of cross-linguistic commonality at the level of semantic input and divergence at the level of grammar.</Paragraph>
      <Paragraph position="5"> These units all function as self-sufficient task titles for the descriptions of particular actions that can be performed with the given software.</Paragraph>
      <Paragraph position="6"> (1) En: To draw a polyline
(2) Bu: Чертане на полилиния
There are two major differences among (1)-(4) that need to be accounted for. (i) They exhibit divergent grammatical ranks: (1) and (4) are (nonfinite) clauses, while (2) and (3) are nominal groups (nominalizations). (ii) They show divergent syntactic realizations: (2) and (3) differ in that in Bulgarian, which does not have case, the relation between the syntactic head чертане (chertane) and the modifier полилиния (polilinia) is expressed by the preposition на (na), whereas in Czech, which has case, this relation is expressed by genitive case (křivky). Despite these differences, only the first divergence has any consequences for the SPL expressions required; the basic semantic commonality among (1)-(4) is preserved. This is shown in Figure 6 by means of the standard linguistic conditionalization provided by KPML for all levels of linguistic description. The conditionalization shows that both the English (1) and the Russian (4) are nonfinite clauses, while the Bulgarian (2) and the Czech (3) are nominalizations. These SPL expressions also show the use of domain concepts as produced by the text planner, rather than Upper Model concepts as in Figure 5.
The second difference is handled internally by the generation grammars. Here, Bulgarian and Czech share the basic functional-grammatical description of postmodifiers for nominalizations (Figure 7). The difference in structure shows only in syntagmatic realization and is separate from the functional description: for Bulgarian, the postmodifier marker на (na: 'of') is inserted, and for Czech, the nominal group realizing the Postmodifier is assigned genitive case.³</Paragraph>
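      <Paragraph>The two kinds of divergence can be kept apart in code as well. In the sketch below, TITLE_RANK stands in for the language conditionalization of the shared SPL (the actual KPML notation is not reproduced here). The function realize_postmodifier stands in for the grammar-internal realization, in which Bulgarian and Czech share one functional description but differ in surface form. The transliterated forms chertane and polilinia come from example (2). The table and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]  # (word, case assigned to it, or None)

# Shared semantics; only the grammatical rank is conditionalized by language.
TITLE_RANK = {"en": "nonfinite-clause", "ru": "nonfinite-clause",
              "bg": "nominalization", "cs": "nominalization"}


def realize_postmodifier(language: str, head: str, modifier: str) -> List[Token]:
    """Realize Head + Postmodifier of a nominalization syntagmatically.

    Bulgarian (no case) inserts the marker "na"; Czech instead assigns
    genitive case to the nominal group realizing the Postmodifier.
    """
    if language == "bg":
        return [(head, None), ("na", None), (modifier, None)]
    if language == "cs":
        return [(head, None), (modifier, "genitive")]
    raise ValueError("only the nominalizing languages bg and cs are covered")
```</Paragraph>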
    </Section>
    <Section position="3" start_page="479" end_page="479" type="sub_section">
      <SectionTitle>
4.2 Lexical choice and lexicons
</SectionTitle>
      <Paragraph position="0"> The lexical items for each language are selected from the lexicon via the domain model. A DM concept is annotated with one or more lexical items from each language. If there is more than one item per language, the choice is constrained by features imposed by the grammar.</Paragraph>
      <Paragraph position="1"> For example, the concept DM::draw is annotated with two lexical items, which are the imperfective and perfective forms of the verb draw in Czech, Bulgarian and Russian. If the grammar selects imperfective aspect, the first is chosen; if the grammar selects perfective aspect, the second is chosen. The same mechanism is also used for the choice between a verb and its nominalization, among others. With the help of the lexicon, the inflectional properties collected for a particular lexical item during generation are translated into a format suitable for external morphological modules, which are then called.</Paragraph>
      <Paragraph position="2"> The result of the external module, the inflected form, is passed back to the KPML system and inserted into the grammatical structure.</Paragraph>
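      <Paragraph>The selection-then-inflection loop described above can be sketched in three steps. An aspect feature chosen by the grammar picks one of the lemmas attached to a concept. The collected features are serialized for an external morphology module. Finally, that module's output is returned as the inflected form. The lexicon entries (transliterated for Russian), the "key=value" tag format and all function names are hypothetical. The morphology module is passed in as a plain callable, so any real analyser would have to be wrapped to fit.

```python
from typing import Callable, Dict, Tuple

# Hypothetical lexicon: concept -> language -> (imperfective, perfective) lemmas.
LEXICON: Dict[str, Dict[str, Tuple[str, str]]] = {
    "DM::draw": {
        "cs": ("kreslit", "nakreslit"),
        "ru": ("chertit'", "nachertit'"),  # transliterated
    },
}


def choose_lexeme(concept: str, language: str, aspect: str) -> str:
    """Pick the lexical item whose aspect matches the grammar's selection."""
    imperfective, perfective = LEXICON[concept][language]
    if aspect == "imperfective":
        return imperfective
    if aspect == "perfective":
        return perfective
    raise ValueError(f"unknown aspect: {aspect}")


def to_external_format(features: Dict[str, str]) -> str:
    """Serialize collected inflectional features (hypothetical tag format)."""
    return "|".join(f"{key}={value}" for key, value in sorted(features.items()))


def inflect(lemma: str, features: Dict[str, str],
            morphology: Callable[[str, str], str]) -> str:
    """Call an external morphology module and return the inflected form."""
    return morphology(lemma, to_external_format(features))
```

For testing, a stub such as `lambda lemma, tags: lemma + "[" + tags + "]"` can stand in for the external module.</Paragraph>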
    </Section>
  </Section>
</Paper>