Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT), pages 190–191,
Vancouver, October 2005. c©2005 Association for Computational Linguistics
From Metagrammars to Factorized TAG/TIG Parsers
Éric Villemonte de la Clergerie
INRIA - Rocquencourt - B.P. 105
78153 Le Chesnay Cedex, FRANCE
Eric.De_La_Clergerie@inria.fr
Abstract
This document shows how the factorized
syntactic descriptions provided by Meta-
Grammars coupled with factorization op-
erators may be used to derive compact
large coverage tree adjoining grammars.
1 Introduction
Large coverage Tree Adjoining Grammars (TAGs)
tend to be very large, with several thousands of
tree schemata, i.e., trees with at least one anchor
node. Such large grammars are difficult to develop
and maintain. Of course, their sizes have also a
strong impact on parsing efficiency. The size of such
TAGs mostly arises from redundancies, due to the
extended domain of locality provided by trees.
Recently, Meta-Grammars (Candito, 1999) have
been introduced to factorize linguistic information
through a multiple inheritance hierarchy of small
classes, each of them grouping elementary con-
straints on nodes. A MG compiler exploits these bits
of information to generate a set of trees. While MGs
help reducing redundancies at a descriptive level, the
resulting grammars remain very large.
We propose to exploit the fact that MGs are al-
ready factorized to get compact grammars through
the use of factorized trees, as provided by system
DYALOG (Thomasset and Villemonte de la Clerg-
erie, 2005).
This proposal has been validated by quickly de-
veloping and testing a large coverage French MG.
2 Generic factorization operators
The first factorization operators provided by DYA-
LOG are the disjunction, Kleene star, and optional-
ity operators. A finer control of optionality is pro-
vided through the notion of guards, used to state
conditions on the presence or absence of a node (or
of a node sequence). An expression (G+,x;G−)
means that the guard G+ (resp. G−) should be sat-
isfied for x to be present (resp. absent). A guard
G is a boolean expression on equations between FS
paths and is equivalent to a finite set of substitu-
tions ΣG. Used to handle local free-word order-
ings, the interleaving (or shuffling) of two sequences
(ai)i=1···n##(bj)j=1···m returns all sequences con-
taining all ai and bj in any order that preserves the
original orderings (i.e., ai < ai+1 and bj < bj+1).
These operators do not increase the expressive
power or the worst-case complexity of TAGs. They
are implemented without expansion, ensuring good
performances and more natural parsing output (with
no added non-terminals).
3 Meta-Grammars
MGs allow modular descriptions of syntactic phe-
nomena, using elementary constraints grouped into
classes. A class may inherit constraints from sev-
eral parent classes and can also provide a resource
or require a resource. Constraints on nodes include
equality, precedence, immediate and indirect dom-
inances. The constraints may also be on node and
class decorations, expressed with Feature Structures.
The objective of our MG compiler, also devel-
oped with DYALOG, is to cross the terminal classes
(i.e. any class without descendants) in order to ob-
tain neutral classes where each provided resource
190
has been consumed and conversely. Constraints are
accumulated during crossing and are only kept the
neutral classes whose accumulated constraints are
satisfiable, taking into account their logical conse-
quence. Minimal trees satisfying the constraints of
the neutral classes are then produced.
Getting factorized trees results from several
mechanisms. A node may group alternatives, and
may be made optional or repeatable (for Kleene
stars). When generating trees, underspecified prece-
dences between sibling nodes are handled by the in-
terleaving operator.
Positive and negative guards may be attached to
nodes and are accumulated in a conjunctive way dur-
ing the crossing phase, i.e. N ⇒ G1 and N ⇒ G2 is
equivalent to N ⇒ (G1,G2). The compiler checks
the satisfiability of the guards, removing the alter-
natives leading to failures and equations in guards
which become trivially true. The remaining guards
are emitted as DYALOG guards in the trees.
4 Grammar anatomy
In just a few months, we have developed, for French,
a MG with 191 classes, used to generate a very
compact TAG of only 126 trees. Only 27 trees are
anchored by verbs and they are sufficient to cover
canonical, passive and extracted verbal construc-
tions with at most 2 arguments (including objects,
attributes, completives, infinitives, prepositional ar-
guments, wh-completives). These trees would cor-
respond to several thousand trees, if the factoriza-
tion operators were expanded. This strong com-
paction rate stems from the presence of 820 guards,
92 disjunctions (to handle choices in realizations),
26 interleavings (to handle verb argument positions)
and 13 Kleene stars (to handle coordinations). The
grammar is mostly formed of simple trees (with less
than 17 nodes), and a few complex trees (26 trees
between 30 and 46 nodes), essentially anchored by
verbs.
For instance, tree #1111, used for canonical verb
constructions, results from the crossing of 25 ter-
minal classes, and has 43 nodes, plus 3 disjunction
nodes (for the different realizations of the subject
and other verb arguments) and 1 interleaving node
1browsable online at http://atoll.inria.fr/
perl/frmg/tree.pl.
(between the verb arguments and a possible post-
verbal subject). The tree is controlled by 35 guards,
governing, for instance, the presence and position of
a subject and of clitics.
Such a tree covers much more verb sub-
categorization frames than the number of frames
usually attached to a given verb. The anchoring of a
tree α by a word w is done by unifying two feature
structures Hα and Hw, called hypertags (Kinyon,
2000), that list the syntactic properties covered by
α and allowed by w. The link between Hτ and the
allowed syntactic constructions is done through the
variables occurring inHτ and in the guards and node
decorations.
5 Evaluation
The resulting French grammar has been compiled,
with DYALOG, into an hybrid TAG/TIG parser,
by identifying the left and right auxiliary insertion
trees. Following a left-to-right top-down tabular
parsing strategy, the parser may be used to get ei-
ther full or partial parses.2 Coverage rate for full
parsing is around 95% for two test suites (EURO-
TRA and TSNLP) and around 42% on various cor-
pora (including more than 300K sentences of a raw
journalistic corpus).
Our MG is still very young and needs to be im-
proved to ensure a better coverage. However, we can
already conclude that coupling MGs with factorized
trees is a generic and powerful approach to control
the size of grammars and to get efficient parsers.
The various tools and linguistic resources men-
tioned in this abstract are freely available at http:
//atoll.inria.fr/.

References
M.-H. Candito. 1999. Organisation modulaire et
paramétrable de grammaires électroniques lexical-
isées. Ph.D. thesis, Université Paris 7.
A. Kinyon. 2000. Hypertags. In Proc. of COLING,
pages 446–452.
F. Thomasset and É. Villemonte de la Clergerie. 2005.
Comment obtenir plus des méta-grammaires. In Pro-
ceedings of TALN’05, volume 1, pages 3–12, Dourdan,
France, June. ATALA.
