<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1069">
  <Title>Multilinguality in a Text Generation System For Three Slavic Languages Geert-Jan Kruijff, Elke Teich, John Bateman, Ivana Kruijff-Korbayová,</Title>
  <Section position="2" start_page="0" end_page="475" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes the Agile system for the multilingual generation of instructional texts as found in software user-manuals in Bulgarian, Czech and Russian. The current prototype focuses on the automatic drafting of CAD/CAM software documentation; routine passages as found in the AutoCAD user-manual have been taken as target texts. The application scenario of the Agile system is as follows. First, a user constructs, with the help of a GUI, language-independent task models that specify the contents of the documentation to be generated. The user additionally specifies the language (currently Bulgarian, Czech or Russian) and the register of the text to be generated. The Agile system then produces continuous instructional texts realizing the specified content and conforming to the style of software manuals. These texts are intended to serve as drafts for final revision; this 'drafting' scenario is therefore analogous to that first explored within the Drafter project.</Paragraph>
    <Paragraph position="1"> Within the Agile project, however, we have explored a more thoroughly multilingual architecture, making substantial use of existing linguistic resources and components.</Paragraph>
    <Paragraph position="2"> The design of the Agile system overall rests on the following three assumptions.</Paragraph>
    <Paragraph position="3"> First, the input of the system should be specified irrespective of any particular output language. This means that the user must be able to express the content that she wants the texts to convey, irrespective of what natural language(s) she masters and in what language(s) the output text should be realized. Such language-independent content specification can take the form of some knowledge representation pertaining to the application domain.</Paragraph>
    <Paragraph position="4"> Second, the texts generated as the output of the system should be well-formulated with respect to the expectations of native speakers of each particular language covered by the system.</Paragraph>
    <Paragraph position="5"> Since differences among languages may appear at any level, language-sensitive decisions about the realization of the specified content must be possible throughout the generation process.</Paragraph>
    <Paragraph position="6"> And third, the notion of multilinguality employed in the system should be recursive, in the sense that the modules responsible for the generation should themselves be multilingual.</Paragraph>
    <Paragraph position="7"> The text generation tasks which are common to the languages under consideration should be performed only once. Ideally, there should be one process of generation yielding output in multiple languages rather than a sequence of monolingual processes. This view of 'intrinsic multilinguality' builds on the approach set out in Bateman et al. (1999). Each module of the system is fully multilingual in that it simultaneously enables both integration of linguistic resources, defining commonalities between languages, and resource integrity, in that the individuality of each of the language-specific resources of a multilingual ensemble is always preserved. We consider these assumptions and the view of multilinguality entailed by them to be crucial for the design of effective multilingual text generation systems. The results so far achieved by the Agile system support this and also offer a solid experiential basis for the development of further multilingual generation systems.</Paragraph>
    <Paragraph position="8"> The overall operation of the Agile system is as follows. After the user has specified some intended text content (described in Section 2) via the Agile GUI, the system proceeds to generate the texts required. To do this, a text planner (Section 3) first assigns parts of the task model to text elements and arranges them in a hierarchical fashion, yielding a text plan. Then, a sentence planner organizes the content of the text elements into sentence-sized chunks and creates the corresponding input for the tactical generator, expressed in standard sentence planning language (SPL) formulae. Finally, the tactical generator generates the linguistic realizations corresponding to these SPLs, i.e., the text (Section 4). In the stage of the project reported here, we concentrated particularly on procedural texts. These offer step-by-step descriptions of how to perform domain tasks using the given software tools. A simplified version of one such procedural text is given (for English) in Figure 1. This architecture mirrors the reference architecture for generation discussed in Reiter  realizing the intended meaning of the input semantic representation without backtracking or revision.</Paragraph>
    <Paragraph position="9"> Several important properties have characterized the method of development leading to the Agile system. These are to a large extent responsible for the effectiveness of the system.</Paragraph>
    <Paragraph position="10"> These include: Re-use and adaptation of available resources. We have re-used substantial bodies of existing linguistic resources at all levels relevant for the system; this played a crucial role in achieving the sophisticated generation capabilities now displayed by the system in each of its languages of expertise. Prior to the project there were no substantial automatic generation systems for any of the languages covered. The core modules for strategic and tactical generation were all implemented using the Komet-Penman Multilingual system (KPML: cf. Bateman et al., 1999), a Common Lisp based grammar development environment. In addition, we adopted the Penman Upper Model (UM) as used within Penman/KPML as the basis for our linguistic semantics; a more restricted domain model (DM) relevant to the CAD/CAM domain was defined as a specialization of the UM concepts. The DM was inspired by the domain model of the Drafter project, but presents a generalization of the latter in that it allows for embedding tasks and instructions to any arbitrary recursive depth (i.e., more complex text plans). Already existing lexical resources and morphological modules available to the project were re-used for Bulgarian, Czech and Russian: the Czech and Bulgarian components were modules written in C (e.g., Hajič &amp; Hladká, 1997, for Czech) that were interfaced with KPML using a standard set of API-methods (cf. Bateman &amp; Sharoff, 1998). Finally, because no grammars suitable for generation in Bulgarian, Czech and Russian existed, a grammar for English (NIGEL: Mann &amp; Matthiessen, 1985) was re-used to build them; for the theoretical basis of this technique see Teich (1995).</Paragraph>
    <Paragraph position="11"> To draw a polyline. First start the PLINE command using one of these methods: Windows: From the Polyline flyout on the Draw toolbar, choose Polyline. DOS and UNIX: From the Draw menu, choose Polyline.</Paragraph>
    <Paragraph position="12"> 1. Specify the start point of the polyline.</Paragraph>
    <Paragraph position="13"> 2. Specify the next point of the polyline.</Paragraph>
    <Paragraph position="14"> 3. Press Return to end the polyline.</Paragraph>
    <Paragraph position="15"> Combination of two methods of resources development. Two methods were combined to enable us to develop basic general-language grammars and sublanguage grammars for CAD/CAM instructional texts at the same time. One method is the system-oriented one aimed at building a computational resource with a view of the whole language system: this is a method strongly supported by the KPML development environment. The other method is instance-oriented, and is guided by a detailed register analysis. The latter method was particularly important given the Agile goal of being able to generate texts belonging to rather diverse text types, e.g., impersonal vs. personal; procedural, functional descriptions, overviews etc.</Paragraph>
    <Paragraph position="16"> Cross-linguistic resource-sharing. A cross-linguistic approach to linguistic specifications and implementation was taken by maximizing resource sharing, i.e. taking into account similarities and differences among the treated languages so that development tasks have been distributed across different languages and re-used wherever possible.</Paragraph>
  </Section>
</Paper>