<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1036">
  <Title>XML and Multilingual Document Authoring: Convergent Trends</Title>
  <Section position="4" start_page="244" end_page="246" type="metho">
    <SectionTitle>
2 GF : the Grammatical Framework
</SectionTitle>
    <Paragraph position="0"> The Grammatical Framework (GF; (Ranta, 2000)) is a special-purpose programming language combining constructive type theory with an annotation language for concrete syntax. A grammar, in the sense of GF, defines, on one hand, an abstract syntax (a system of types and typed syntax trees), and on the other hand, a mapping of the abstract syntax into a concrete syntax. The abstract syntax has category declarations, such as cat Country ; cat City ; and combinator (or function) declarations, such as fun Ger : Country ; fun Fra : Country ; fun Ham : City ; fun Par : City ; fun cap : Country -&gt; City ; The type of a combinator can be either a basic type, such as the type City of the combinator Ham, or a function type, such as the type of the combinator cap. Syntax trees formed by combinators of function types are complex functional terms, such as cap Fra of type City.</Paragraph>
    <Paragraph position="1"> The concrete syntax part of a GF grammar gives linearization rules, which assign strings (or, in general, more complex linguistic objects) to syntax trees. For the abstract syntax above, we may have lin Ger = &quot;Germany&quot; ; lin Fra = &quot;France&quot; ; lin Ham = &quot;Hamburg&quot; ; lin Par = &quot;Paris&quot; ; lin cap Co = &quot;the capital of&quot; ++ Co ; Thus the linearization of cap Fra is the capital of France.</Paragraph>
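    <Paragraph> The split between abstract syntax and concrete syntax described above can be sketched in Python. This is only an illustration under stated assumptions, not GF itself: trees are encoded as nested tuples, and the names SIGNATURE, LIN, type_of and linearize are hypothetical.

```python
# Hypothetical Python rendering of the toy grammar; not GF's implementation.
# A tree is a tuple: (combinator, subtree, ...).

# Abstract syntax: combinator -> (argument categories, result category).
SIGNATURE = {
    "Ger": ((), "Country"),
    "Fra": ((), "Country"),
    "Ham": ((), "City"),
    "Par": ((), "City"),
    "cap": (("Country",), "City"),
}

# Concrete syntax (English): combinator -> function from argument strings to a string.
LIN = {
    "Ger": lambda: "Germany",
    "Fra": lambda: "France",
    "Ham": lambda: "Hamburg",
    "Par": lambda: "Paris",
    "cap": lambda co: "the capital of " + co,
}


def type_of(tree):
    """Return the category of a tree, or raise TypeError if it is ill-typed."""
    head, args = tree[0], tree[1:]
    arg_cats, result = SIGNATURE[head]
    if len(args) != len(arg_cats):
        raise TypeError(f"{head} expects {len(arg_cats)} argument(s)")
    for arg, expected in zip(args, arg_cats):
        actual = type_of(arg)
        if actual != expected:
            raise TypeError(f"{head} expects {expected}, got {actual}")
    return result


def linearize(tree):
    """Map a well-typed tree to its English string."""
    type_of(tree)  # reject ill-typed trees first
    head, args = tree[0], tree[1:]
    return LIN[head](*[linearize(arg) for arg in args])


print(type_of(("cap", ("Fra",))))    # City
print(linearize(("cap", ("Fra",))))  # the capital of France
```

In this sketch the ill-typed tree cap Par is rejected by type_of before any string is produced, which mirrors the role the abstract syntax plays in the text.</Paragraph>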
    <Section position="1" start_page="244" end_page="245" type="sub_section">
      <SectionTitle>
2.1 GF in XML
</SectionTitle>
      <Paragraph position="0"> Functional terms have a straightforward encoding in XML, representing a term of the form f a_1 ... a_n by the XML object &lt;f&gt; a'_1 ... a'_n &lt;/f&gt; where each a'_i is the encoding of a_i. In this encoding, [...] The simple encoding does not pay attention to the types of the objects, and has no interesting DTD. To express type distinctions, we will hence use a slightly more complicated representation, in which the category and combinator declarations of GF are represented as DTDs in XML, so that GF type checking becomes equivalent to XML validation. The representation of the GF grammar of the previous section is the DTD [...] In this DTD, each category is represented as an ELEMENT definition, listing all combinators producing trees of that category. The combinators themselves are represented as EMPTY elements. The XML representation of the capital of France is [...] The first property guarantees that type checking in the sense of GF (and type theory) can be used for validation of XML objects. The second property guarantees that GF objects can be stored in the XML format. (The second property is already guaranteed by the simpler encoding, which ignores types.) Other properties one would desire are the following: [...] These properties cannot be satisfied, in general. The reason is that GF grammars may contain dependent types, i.e. types depending on objects. We will return to this notion shortly. But let us first consider the use of GF for multilingual generation.</Paragraph>
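      <Paragraph> As a rough illustration of the simple, untyped encoding only (not the DTD-based one), the following Python sketch maps a term to an XML element named after its head combinator, with one child per argument. The tuple encoding of terms and the helper names encode and decode are assumptions of this sketch; the standard library module xml.etree.ElementTree does the serialization.

```python
# Hypothetical helpers; the tuple encoding of terms is an assumption of this sketch.
import xml.etree.ElementTree as ET


def encode(tree):
    """Encode the term (f, a1, ..., an) as an element f whose children encode a1..an."""
    head, args = tree[0], tree[1:]
    element = ET.Element(head)
    for arg in args:
        element.append(encode(arg))
    return element


def decode(element):
    """Inverse of encode: read an element back into a nested tuple."""
    return (element.tag,) + tuple(decode(child) for child in element)


term = ("cap", ("Fra",))
xml_text = ET.tostring(encode(term), encoding="unicode")
print(xml_text)
print(decode(ET.fromstring(xml_text)) == term)  # True
```

The round trip through decode corresponds to the observation that GF objects can be stored in the XML format even when types are ignored.</Paragraph>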
    </Section>
    <Section position="2" start_page="245" end_page="245" type="sub_section">
      <SectionTitle>
2.2 Multilingual generation in GF
</SectionTitle>
      <Paragraph position="0"> Multilingual generation in GF is based on parallel grammars: two (or more) GF grammars are parallel if they have the same abstract syntax. They may differ in concrete syntax. A grammar parallel to the one above is defined by the concrete syntax  param Case = nom | gen ; oper nom1 : Str -&gt; Case =&gt; Str = \s -&gt; tbl {{nom} =&gt; s, {gen} =&gt; s+&quot;n&quot;} ; oper nom2 : Str -&gt; Case =&gt; Str = \s -&gt; tbl {{nom} =&gt; s+&quot;ki&quot;, {gen} =&gt; s+&quot;gin&quot;} ; lincat Country = Case =&gt; Str ; lincat City = Case =&gt; Str ; lin Ger = nom1 &quot;Saksa&quot; ; lin Fra = nom1 &quot;Ranska&quot; ; lin Ham = nom1 &quot;Hampuri&quot; ;</Paragraph>
      <Paragraph position="2"> This grammar renders GF objects in Finnish. In addition to linearization rules, it has rules introducing parameters and operations, and rules defining the linearization types corresponding to basic types: the linearization type of Country, for instance, is not just string (Str), but a function from cases to strings.</Paragraph>
      <Paragraph position="3"> Not only the linearization rules proper, but also parameters and linearization types vary a lot from one language to another. In our example, we have the parameter of case with two values (in larger grammars for Finnish, as many as 16 may be required!), and two patterns for inflecting Finnish nouns. The syntax tree cap Fra produces the [...] which are the nominative and the genitive form, respectively.</Paragraph>
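      <Paragraph> A linearization type that is a function from cases to strings can be imitated in Python by a dictionary keyed by case. The sketch below follows the two inflection patterns nom1 and nom2 given in the text. The rule for cap does not survive in the text above, so the version here (country in the genitive, head noun built with nom2 from an assumed stem) is a labeled guess for illustration only.

```python
# Hypothetical Python analogue of the Finnish concrete syntax.
# A linearization is a table (dict) from case to string, not a plain string.

def nom1(stem):
    """First inflection pattern: the genitive adds "n"."""
    return {"nom": stem, "gen": stem + "n"}


def nom2(stem):
    """Second inflection pattern: nominative adds "ki", genitive adds "gin"."""
    return {"nom": stem + "ki", "gen": stem + "gin"}


LIN = {
    "Ger": lambda: nom1("Saksa"),
    "Fra": lambda: nom1("Ranska"),
    "Ham": lambda: nom1("Hampuri"),
    # ASSUMPTION: the rule for cap is not in the text above. This version puts
    # the country in the genitive and inflects a head noun built with nom2.
    "cap": lambda co: {
        case: co["gen"] + " " + nom2("pääkaupun")[case] for case in ("nom", "gen")
    },
}


def linearize(tree):
    """Map a tuple-encoded tree to its table of Finnish forms."""
    head, args = tree[0], tree[1:]
    return LIN[head](*[linearize(arg) for arg in args])


print(linearize(("Fra",)))           # {'nom': 'Ranska', 'gen': 'Ranskan'}
print(linearize(("cap", ("Fra",))))  # nominative and genitive of the whole phrase
```

The abstract trees are the same tuples one would use for English; only the LIN table and the shape of its values change, which is the point of parallel grammars.</Paragraph>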
    </Section>
    <Section position="3" start_page="245" end_page="246" type="sub_section">
      <SectionTitle>
2.3 Dependent types
</SectionTitle>
      <Paragraph position="0"> DTDs in XML are capable of representing simple types, i.e. types without dependencies. Even a simple type system can contribute a lot to the semantic control of documents. For instance, the above grammar permits the formation of the English noun phrase the capital of France but not of the capital of Paris. Both of these expressions would be well-formed w.r.t. an &quot;ordinary&quot; grammar, in which both France and Paris would be classified simply as noun phrases.</Paragraph>
      <Paragraph position="1"> Dependent types are types depending on objects of other types. An example is the following alternative declaration of Country and City: cat Country ; cat City (Co:Country) ; Under this definition, there are no objects of type City (which is no longer a well-formed type), but of types City Ger and City Fra. Thus we define e.g.</Paragraph>
      <Paragraph position="2"> fun Ham : City Ger ; fun Par : City Fra ; fun cap : (Co:Country) -&gt; City Co ; Observe the use of the variable Co in the type of the combinator cap: the variable is bound to the argument type and then used in the value type. The capital of a country is by definition a city of the same country. This involves a generalization of function types with dependent types.</Paragraph>
      <Paragraph position="3"> Now consider a simplified format of postal addresses: an address is a pair of a country and a city. The GF rule is either fun addr : Country -&gt; City -&gt; Address ; lin addr Co C = C ++ &quot;,&quot; ++ Co ; using simple types or</Paragraph>
      <Paragraph position="5"> using dependent types. The invalid address Hamburg, France is well-typed by the former definition but not by the latter. Using the latter definition gives a simple mechanism of semantic control of addresses. The same idea can obviously be extended to full addresses with street names and numbers. Such dependencies cannot, however, be expressed in DTDs: both of the address rules above correspond to one and the same ELEMENT definition, [...] is valid w.r.t. the DTD, but the corresponding GF object addr Fra Ham is not well-typed.</Paragraph>
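      <Paragraph> The effect of indexing City by a country can be sketched in Python by letting categories be tuples that carry the index. Everything here is a hypothetical illustration: the tuple encoding, the helper names, and in particular the dependent form of the addr rule, which is assumed to be addr : (Co:Country) -> City Co -> Address.

```python
# Hypothetical sketch: categories are tuples, so City is indexed by a country tree.

COUNTRIES = {"Ger", "Fra"}
CITY_OF = {"Ham": "Ger", "Par": "Fra"}  # fun Ham : City Ger ; fun Par : City Fra


def type_of(tree):
    """Return the (possibly dependent) category of a tree, or raise TypeError."""
    head, args = tree[0], tree[1:]
    if head in COUNTRIES and not args:
        return ("Country",)
    if head in CITY_OF and not args:
        return ("City", (CITY_OF[head],))
    if head == "cap" and len(args) == 1:
        # fun cap : (Co:Country) -> City Co
        expect(args[0], ("Country",))
        return ("City", args[0])
    if head == "addr" and len(args) == 2:
        # ASSUMED dependent rule: addr : (Co:Country) -> City Co -> Address
        country, city = args
        expect(country, ("Country",))
        expect(city, ("City", country))
        return ("Address",)
    raise TypeError(f"unknown or misapplied combinator {head}")


def expect(tree, category):
    """Raise TypeError unless the tree has exactly the given category."""
    actual = type_of(tree)
    if actual != category:
        raise TypeError(f"expected {category}, got {actual}")


def well_typed(tree):
    try:
        type_of(tree)
        return True
    except TypeError:
        return False


print(well_typed(("addr", ("Fra",), ("Par",))))           # True
print(well_typed(("addr", ("Fra",), ("Ham",))))           # False
print(well_typed(("addr", ("Fra",), ("cap", ("Fra",)))))  # True
```

A check that compared only the category name City, without its index, would accept all three trees, which is the situation the text attributes to DTD validation.</Paragraph>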
    </Section>
    <Section position="4" start_page="246" end_page="246" type="sub_section">
      <SectionTitle>
2.4 Computation rules
</SectionTitle>
      <Paragraph position="0"> In addition to categories and combinators, GF grammars may contain definitions, such as def cap Fra = Par ; Definitions belong to the abstract syntax. They define a normal form for syntax trees (recursively replace definienda by definientia), as well as a paraphrase relation (sameness of normal form). These notions are, of course, reflected in the concrete syntax: the addresses the capital of France, France and Paris, France are paraphrases, and the latter is the normal form of the former.</Paragraph>
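      <Paragraph> Normal forms and the paraphrase relation can be sketched as rewriting on tuple-encoded trees. The names DEFS, normalize and paraphrases are hypothetical, the English linearization table is repeated only to make the example self-contained, and termination is assumed (the definitions must not be circular).

```python
# Hypothetical sketch of definitions as rewrite rules on tuple-encoded trees.

DEFS = {("cap", ("Fra",)): ("Par",)}  # def cap Fra = Par

LIN = {
    "Fra": lambda: "France",
    "Par": lambda: "Paris",
    "cap": lambda co: "the capital of " + co,
    "addr": lambda co, c: c + ", " + co,  # lin addr Co C = C ++ "," ++ Co
}


def normalize(tree):
    """Normalize the subtrees first, then replace a definiendum by its definiens."""
    head, args = tree[0], tree[1:]
    tree = (head,) + tuple(normalize(arg) for arg in args)
    if tree in DEFS:
        return normalize(DEFS[tree])
    return tree


def linearize(tree):
    head, args = tree[0], tree[1:]
    return LIN[head](*[linearize(arg) for arg in args])


def paraphrases(t1, t2):
    """Two trees are paraphrases if they have the same normal form."""
    return normalize(t1) == normalize(t2)


long_form = ("addr", ("Fra",), ("cap", ("Fra",)))
short_form = ("addr", ("Fra",), ("Par",))
print(linearize(long_form))                # the capital of France, France
print(linearize(normalize(long_form)))     # Paris, France
print(paraphrases(long_form, short_form))  # True
```

Because normalization works on the abstract tree, the same paraphrase relation carries over unchanged to any other concrete syntax.</Paragraph>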
      <Paragraph position="2"/>
    </Section>
    <Section position="5" start_page="246" end_page="246" type="sub_section">
      <SectionTitle>
2.5 GF editing tools
</SectionTitle>
      <Paragraph position="0"> An editing tool has been implemented for GF, using metavariables to represent yet undefined parts of expressions. The user can work on any metavariable in various different ways, e.g.</Paragraph>
      <Paragraph position="1">
* by choosing a combinator from a menu,
* by entering a string that is parsed,
* by reading a previously defined object from a file,
* by using an automatic search of suitable instantiations.
These functionalities and their metatheory have been used for about a decade in a number of syntax editors for constructive type theory, usually known as proof editors (Magnusson and Nordström, 1994). From this point of view, the GF editor is essentially a proof editor together with supplementary views, provided by the concrete syntax. The current implementation of GF is a plugin module of the proof editor Alfa (Hallgren, 2000). The window dump in Figure 2 shows a GF session editing a mathematical proof. Five views are provided: abstract syntax in type-theoretical notation, English, French, Finnish, and XML. One metavariable is seen, expecting the user to find a Proof of the proposition that there exists a number x' such that x is smaller than x', where x is an arbitrary number given in the context (for the sake of Universal Introduction).</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="246" end_page="248" type="metho">
    <SectionTitle>
3 IG : Interaction Grammars
</SectionTitle>
    <Paragraph position="0"> We have just described an approach to solving the limitations of usual XML tools for multilingual document authoring which originates in the tradition of constructive type theory and mathematical proof editors. We will now sketch an approach strongly inspired by GF but which formally is more in the tradition of logic-programming based unification grammars, and which is currently under development at Xerox Research Centre Europe (see (Brun et al., 2000) for a more extended description of this project).</Paragraph>
    <Paragraph position="2"> Definite Clause Grammars, or DCGs (Pereira and Warren, 1980), are possibly the simplest unification-based extension of context-free grammars, and have good reversibility properties which make them suited to both parsing and generation. A typical view of what a DCG rule looks like is the following (footnote 5): a(a1(B,C,...)) --&gt; &lt;text1&gt;, b(B), &lt;text2&gt;, c(C), &lt;text3&gt;, {constraints(B,C,...)}.</Paragraph>
    <Paragraph position="3"> This rule expresses the fact that (1) some abstract structure a1(B,C,...) is in category a if the structure B is in category b, the structure C in category c, ..., and furthermore a certain number of constraints are satisfied by the structures B, C, ...; (2) if the structures B, C, ... can be &quot;rendered&quot; by character strings StringB, StringC, ..., then the structure a1(B,C,...) can be rendered by the string obtained by concatenating the text &lt;text1&gt; (that is, a certain constant sequence of terminals), then StringB, then &lt;text2&gt;, then StringC, etc.</Paragraph>
    <Paragraph position="4"> (Footnote 5. Reminder: according to the usual logic programming conventions, lowercase letters denote predicates and functors, whereas uppercase letters denote metavariables that will be instantiated with terms.) In this formalism, a grammar for generating English addresses (see preceding section) might look like:</Paragraph>
    <Paragraph position="6"> country(Co).</Paragraph>
    <Paragraph position="7"> The analogies with the GF grammars of the previous section are clear. What is traditionally called a category (or nonterminal, or predicate) in the logic programming terminology can also be seen as a type (address, country, city), and functors such as ger, par, addr, cap can be seen as combinators.</Paragraph>
    <Paragraph position="8"> If, in this DCG, we &quot;forget&quot; all the constant strings by replacing them with the empty string, we obtain the following &quot;abstract grammar&quot;:</Paragraph>
    <Paragraph position="10"> This program is language-independent and recursively defines a set of well-formed trees to which it assigns types (thus cap(fra) is a well-formed tree of type city).</Paragraph>
    <Paragraph position="11"> As they stand, such definite clause grammars and programs, although suitable for simple generation tasks, are not directly suited to the process of interactive multilingual document authoring. In order to make them more appropriate for that task, we need to specialize and adapt DCGs in the way that we now describe.</Paragraph>
    <Paragraph position="12"> Parallel grammars. The first move is to allow for parallel English, French, ... grammars, which all have the same underlying abstract grammar (program). So in addition to the English grammar given above, we have the French grammar:</Paragraph>
    <Paragraph position="14"> (Footnote 6. In the sense that rewriting the nonterminal goal address(addr(Co,C)) to the empty string in the DCG is equivalent to proving the goal address(addr(Co,C)) in the program (Deransart and Maluszynski, 1993).)</Paragraph>
    <Paragraph position="15"> Dependent Categories. The grammars we have given are deficient in one important respect: there is no dependency between the city and the country in the same address. In order to remedy this problem, a standard logic programming move would be to reformulate the abstract grammar (and similarly for the language-dependent ones) as:</Paragraph>
    <Paragraph position="17"> The expression city(C, Co) is usually read as the relation &quot;C is a city of Co&quot;, which is fine for computational purposes, but this reading obscures the notion that the object C is being typed as a city; more precisely, it is being typed as a city of Co. In order to make this reading more apparent, we will write the grammar as:</Paragraph>
    <Paragraph position="19"> That is, we allow the categories to be indexed by terms (a move which is a kind of &quot;currying&quot; of a relation into a type for its first argument). Dependent categories are similar to the dependent types of constructive type theory.
Heterogeneous trees. Natural language authoring is different from natural language generation in one crucial respect. Whenever the abstract tree to be generated is incomplete (for instance the tree cap(Co)), that is, has some leaves which are yet uninstantiated variables, the generation process should not proceed by nondeterministically enumerating texts for all the possible instantiations of the initial incomplete structure. Instead it should display to the author as much of the text as it can in its present &quot;knowledge state&quot;, and enter into an interaction with the author to allow her to further refine the incomplete structure, that is, to further instantiate some of the uninstantiated leaves. To this purpose, it is useful to introduce along with the usual combinators (addr, fra, cap, etc.) new combinators of arity 0 called typenames, which are notated type, and are of type type.</Paragraph>
    <Paragraph position="20"> These combinators are allowed to stand as leaves (e.g. in the tree cap(country)) and the trees thus obtained are said to be heterogeneous. The typenames are treated by the text generation process as if they were standard semantic units, that is, they are associated with text units which are generated &quot;at their proper place&quot; in the generated output. These text units are specially phrased and highlighted to indicate to the author that some choice has to be made to refine the underlying type (e.g. obtaining the text &quot;la capitale de PAYS&quot;). This choice has the effect of further instantiating the incomplete tree with &quot;true&quot; combinators, and the generation process is iterated.</Paragraph>
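    <Paragraph> The authoring loop over heterogeneous trees can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the IG implementation: trees are nested tuples, the rendering is in English rather than French, the uppercase placeholder wording is invented, and the helper names render and refine are hypothetical. No type checking of the author's choice is attempted here.

```python
# Hypothetical sketch of authoring with heterogeneous trees (tuple-encoded).
# Lowercase names follow the logic programming convention of the text.

LIN = {
    "ger": lambda: "Germany",
    "fra": lambda: "France",
    "ham": lambda: "Hamburg",
    "par": lambda: "Paris",
    "cap": lambda co: "the capital of " + co,
    "addr": lambda co, c: c + ", " + co,
}

# Typenames: arity-0 leaves standing for a choice the author still has to make.
# ASSUMPTION: the uppercase placeholder wording is invented for this sketch.
TYPENAMES = {"country": "COUNTRY", "city": "CITY"}


def render(tree):
    """Render as much text as the tree allows; typenames show as placeholders."""
    head, args = tree[0], tree[1:]
    if head in TYPENAMES:
        return TYPENAMES[head]
    return LIN[head](*[render(arg) for arg in args])


def refine(tree, path, choice):
    """Replace the subtree at path (a tuple of argument positions) by choice."""
    if not path:
        return choice
    index = path[0] + 1  # position 0 of the tuple holds the combinator name
    return tree[:index] + (refine(tree[index], path[1:], choice),) + tree[index + 1:]


draft = ("cap", ("country",))
print(render(draft))  # the capital of COUNTRY

final = refine(draft, (0,), ("fra",))
print(render(final))  # the capital of France
```

Each call to refine stands for one interaction step: the author picks a highlighted placeholder, chooses a combinator for it, and the text is regenerated from the updated tree.</Paragraph>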
    <Paragraph position="21"> Extended semantics-driven compositionality. The simple DCG view presented at the beginning of this section sees the process of generating text from an abstract structure as basically a compositional process on strings, that is, a process where strings are recursively associated with subtrees and concatenated to produce strings at the next subtree level. But such a direct process of constructing strings has well-known limitations when the semantic and syntactic levels do not have such a direct correspondence (simple example: ordering a list of modifiers around a noun). We are currently experimenting with a powerful extension of string compositionality where the objects compositionally associated with abstract subtrees are not strings, but syntactic representations with rich internal structure. The text itself is obtained from the syntactic representation associated with the total tree by simply enumerating its leaves.</Paragraph>
    <Paragraph position="22"> The picture we get of an IG grammar is finally the following: a_{D,...}(a1(B,C,...))-Syn --&gt; b_{E,...}(B)-SynB, c_{F,...}(C)-SynC, {constraints(B,C,...,D,E,F,...)}, {compose_english(SynB, SynC, Syn)}.</Paragraph>
    <Paragraph position="23"> The rule shown is a rule for English: the syntactic representations are language dependent. Parallel rules for the other languages are obtained by replacing the compose_english constraint (which is unique to this rule) by constraints appropriate to the other languages under consideration.</Paragraph>
  </Section>
</Paper>