<?xml version="1.0" standalone="yes"?> <Paper uid="E06-2005"> <Title>XMG - An expressive formalism for describing tree-based grammars</Title> <Section position="3" start_page="0" end_page="103" type="metho"> <SectionTitle> 2 Linguistic formalism </SectionTitle> <Paragraph position="0"> As mentioned above, the XMG system produces a grammar from a linguistic meta-description called a metagrammar. This description is specified using the XMG metagrammar formalism, which supports three main features: 1. the reuse of tree fragments; 2. the specialization of fragments via inheritance; 3. the combination of fragments by means of conjunctions and disjunctions. These features reflect the idea that a metagrammar should allow the description of two main axes: (i) the specification of elementary pieces of information (fragments), and (ii) the combination of these to represent alternative syntactic structures. Describing syntax. In a tree-based metagrammar, the basic informational units to be handled are tree fragments. In the XMG formalism, these units are put into classes. A class associates a name with a content. At the syntactic level, a content is a tree description (footnote 2). The tree descriptions supported by the XMG formalism are defined by the following tree description language:</Paragraph> <Paragraph position="2"> where x and y represent node variables, $\rightarrow$ is immediate dominance (x is directly above y), $\rightarrow^{+}$ is strict dominance (x is above y), $\rightarrow^{*}$ is large dominance (x is above or equal to y), $\prec$ is immediate precedence, $\prec^{+}$ is strict precedence, and $\prec^{*}$ is large precedence (footnote 3).</Paragraph> <Paragraph position="3"> x[f:E] constrains feature f with associated expression E on node x (a feature can, for instance, refer to the syntactic category of the node) (footnote 4).</Paragraph> <Paragraph position="4"> Tree fragments can furthermore be combined using conjunction and/or disjunction. These two operators allow the metagrammar designer to achieve a high degree of factorization. 
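The factorization obtained with these two operators can be sketched in Python. This is a minimal illustration and not XMG syntax: the encoding of a fragment as a frozenset of constraint tuples, the helper names conj and disj, and the example fragments are all assumptions made for this sketch, and node-variable unification is not modeled.

```python
from itertools import product

# Hypothetical encoding, for illustration only: a fragment is a frozenset of
# constraints, and the content of a class is a list of alternative fragments.

def conj(left, right):
    """Conjunction: merge each alternative of left with each alternative of right."""
    return [a.union(b) for a, b in product(left, right)]

def disj(left, right):
    """Disjunction: keep the alternatives of both operands side by side."""
    return list(left) + list(right)

# Made-up fragments; ("dom", "S", "NP") stands for "S immediately dominates NP".
subject = [frozenset({("dom", "S", "NP"), ("prec", "NP", "VP")})]
active = [frozenset({("dom", "VP", "V")})]
passive = [frozenset({("dom", "VP", "Aux"), ("prec", "Aux", "V")})]

# The subject fragment is written once and shared by both verb alternatives.
verb_family = conj(subject, disj(active, passive))
```

Here verb_family holds one alternative per verb fragment, each including the shared subject constraints, which is the kind of reuse the two operators are meant to provide.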
Moreover, the XMG system also supports inheritance between classes, thus offering more flexibility and structure sharing by allowing one to reuse and specialize classes.</Paragraph> <Paragraph position="5"> Identifiers' scope. When describing a broad-coverage grammar, dealing with identifier scope is a non-trivial issue.</Paragraph> <Paragraph position="6"> In previous approaches to metagrammar compilation (Candito, 1999; Gaiffe et al., 2002), node identifiers had global scope.</Paragraph> <Paragraph position="7"> Footnote 2: As we shall see later, a content can in fact be multi-dimensional and integrate, for instance, both semantic and syntax/semantics interface information. [...] how top and bottom are encoded in TAG.</Paragraph> <Paragraph position="8"> When designing broad-coverage metagrammars, however, such a strategy quickly reduces modularity and complicates grammar maintenance. To start with, the grammar writer must remember each node name and its interpretation, and in a large-coverage grammar the number of these node names amounts to several hundred. Further, it is easy to erroneously use the same name twice or, on the contrary, to mistype a name identifier, in both cases introducing errors in the metagrammar. In XMG, identifiers are local to a class and can thus be reused freely. Global and semi-global (i.e., global to a subbranch in the inheritance hierarchy) naming is also supported, however, through a system of import/export inspired by Object-Oriented Programming. When defining a class as a subclass of another one, the XMG user can specify which identifiers are viewable (i.e., which identifiers have been exported in the superclass). Extension to semantics. The XMG formalism further supports the integration of semantic information in the grammar. More generally, the language manages dimensions of descriptions so that the content of a class can consist of several elements belonging to different dimensions. 
Each dimension is then processed differently according to the output that is expected (trees, sets of predicates, etc.).</Paragraph> <Paragraph position="9"> Currently, XMG includes a semantic representation language based on Flat Semantics (see Gardent and Kallmeyer, 2003):</Paragraph> <Paragraph position="11"> where $\ell : p(E_1, \ldots, E_n)$ represents the predicate p with parameters $E_1, \ldots, E_n$, labeled $\ell$; ! is logical negation; and $E_i \ll E_j$ is the scope relation between $E_i$ and $E_j$ (used to deal with quantifiers).</Paragraph> <Paragraph position="12"> Thus, one can write classes whose content consists of tree descriptions and/or semantic formulas. The XMG formalism furthermore supports the sharing of identifiers across dimensions, hence allowing for a straightforward encoding of the syntax/semantics interface (see figure 1).</Paragraph> </Section> <Section position="4" start_page="103" end_page="105" type="metho"> <SectionTitle> 3 Compiling a MetaGrammar into a Grammar </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="103" end_page="104" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> We now focus on the compilation process and on the constraint logic programming techniques we use. As we have seen, an XMG metagrammar consists of classes that are combined. Provided these classes can be referred to by means of names, we can view a class as a Clause associating a name with a content, or Goal, to borrow vocabulary from Logic Programming. In XMG, this Goal will be either a tree Description, a semantic Description, a Name (class call) or a combination of classes (conjunction or disjunction). Finally, the evaluation of a specific class can be seen as being triggered by a query.</Paragraph> <Paragraph position="2"> In other words, we view our metagrammar language as a specific kind of Logic Program, namely a Definite Clause Grammar (or DCG). 
In this DCG, the terminal symbols are descriptions.</Paragraph> <Paragraph position="3"> To extend the approach to the representation of semantic information as introduced in Section 2, clause (4) is modified as follows:</Paragraph> <Paragraph position="5"> Note that, with this modification, the XMG language no longer corresponds to a Definite Clause Grammar but to an Extended Definite Clause Grammar (see Van Roy, 1990), where the symbol += represents the accumulation of information for each dimension.</Paragraph> <Paragraph position="6"> Virtual Machine. The evaluation of a query is done by a specific Virtual Machine inspired by the Warren Abstract Machine (see Ait-Kaci, 1991). First, it computes the derivations contained in the description, i.e. in the Extended Definite Clause Grammar, and secondly it performs unification of non-standard data types (nodes, node features for TAG). Finally, it outputs a description, more precisely one description per dimension (syntax, semantics).</Paragraph> <Paragraph position="7"> In the case of TAG, the virtual machine produces a tree description. We still need to solve this description in order to obtain trees (i.e. the items of the resulting grammar).</Paragraph> <Paragraph position="8"> Constraint-based tree description solver. The tree description solver we use is inspired by (Duchier and Niehren, 2000). The idea is to: 1. associate an integer with each node x in the description; 2. then refer to x by means of the tuple $(Eq_x, Up_x, Down_x, Left_x, Right_x)$, where $Eq_x$ (respectively $Up_x$, $Down_x$, $Left_x$, $Right_x$) denotes the set of nodes in the description that are equal to (respectively above, below, to the left of, and to the right of) x (see figure 2). Note that these sets are sets of integers.</Paragraph> <Paragraph position="9"> The operations supported by the XMG language (i.e. dominance, precedence, etc.) are then converted into constraints on these sets. For instance, let us consider two nodes x and y of the description. 
Assuming we associate x with the integer i and y with j, and writing $N^i_{Up}$ for the Up set of the node numbered i (and likewise for the other sets), we can translate the dominance relation of x over y into the following set constraints: $N^i_{EqUp} \subseteq N^j_{Up} \wedge N^j_{EqDown} \subseteq N^i_{Down} \wedge N^i_{Left} \subseteq N^j_{Left} \wedge N^i_{Right} \subseteq N^j_{Right}$</Paragraph> </Section> <Section position="2" start_page="104" end_page="105" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> Here $N^i_{EqUp}$ denotes the union of $N^i_{Eq}$ and $N^i_{Up}$, and similarly $N^j_{EqDown}$ denotes the union of $N^j_{Eq}$ and $N^j_{Down}$.</Paragraph> <Paragraph position="1"> That is, (1) the set of integers representing nodes that are equal to or above x is included in the set of integers representing nodes that are above y, (2) the dual holds, i.e. the set of integers representing nodes that are below x contains the set of integers representing nodes that are equal to or below y, (3) the set of integers representing nodes that are on the left of x is included in the set of integers representing those on the left of y, and (4) symmetrically for the nodes on the right (footnote 6).</Paragraph> <Paragraph position="2"> Parameterized constraint solver. To recap, from a grammar designer's point of view, a queried class need not define complete trees but rather a set of tree descriptions. The solver is then called to generate all the matching valid minimal trees from those descriptions. This feature provides the users with a way to concentrate on what is relevant in the grammar, thus taking advantage of underspecification, and to delegate the tiresome work to the solver.</Paragraph> <Paragraph position="3"> Actually, the solver can be parameterized to perform various checks or enforce additional constraints on the tree descriptions besides tree-shaping them. These parameters are called principles in the XMG terminology. Some are specific to a target formalism (e.g. TAG trees must have at most one foot node) while others are independent. The most interesting one is a resources/needs mechanism for node unification called the color principle (see Crabbé and Duchier, 2004).</Paragraph> <Paragraph position="4"> At the end of this tree description solving process we obtain the trees of the grammar. 
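The set-based encoding of dominance used by the solver can be illustrated on a concrete tree. The following Python sketch rests on several assumptions: it computes the five sets of every node from an already built tree and then checks the inclusion constraints for a pair of nodes, whereas a real solver posts these inclusions as constraints over set variables and searches for models. The function names node_sets and dominance_holds and the example tree are hypothetical.

```python
def node_sets(children, root):
    """Map each node to its (eq, up, down, left, right) sets in an ordered tree."""
    parent = {c: p for p, kids in children.items() for c in kids}

    def ancestors(n):
        out = set()
        while n in parent:
            n = parent[n]
            out.add(n)
        return out

    def preorder(n):
        out = [n]
        for c in children.get(n, []):
            out.extend(preorder(c))
        return out

    order = preorder(root)
    sets = {}
    for pos, n in enumerate(order):
        eq = {n}
        up = ancestors(n)
        # Descendants of n are exactly the other nodes that have n as an ancestor.
        down = {m for m in order if n in ancestors(m)}
        # Nodes visited before n in preorder are its ancestors or lie to its left.
        left = set(order[:pos]).difference(up)
        right = set(order).difference(eq, up, down, left)
        sets[n] = (eq, up, down, left, right)
    return sets


def dominance_holds(sets, i, j):
    """Check the four inclusions stating that node i is strictly above node j."""
    eq_i, up_i, down_i, left_i, right_i = sets[i]
    eq_j, up_j, down_j, left_j, right_j = sets[j]
    return (
        eq_i.union(up_i).issubset(up_j)              # equal or above i: above j
        and eq_j.union(down_j).issubset(down_i)      # equal or below j: below i
        and left_i.issubset(left_j)                  # left of i: left of j
        and right_i.issubset(right_j)                # right of i: right of j
    )


# Hypothetical tree: node 1 has children 2 and 3; 2 has children 4 and 5; 3 has child 6.
tree = {1: [2, 3], 2: [4, 5], 3: [6]}
sets = node_sets(tree, 1)
```

With this tree, dominance_holds(sets, 2, 5) is satisfied because node 2 is above node 5, while dominance_holds(sets, 2, 6) fails on the first inclusion since node 2 is not among the nodes above node 6.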
Note that the use of constraint programming techniques to solve tree descriptions allows us to compute grammars faster than the previous approaches (see section 4).</Paragraph> </Section> </Section> </Paper>