Proceedings of the 8th International Workshop on Tree Adjoining Grammar and Related Formalisms, pages 17–24,
Sydney, July 2006. c©2006 Association for Computational Linguistics
The Metagrammar Goes Multilingual:
A Cross-Linguistic Look at the V2-Phenomenon
Alexandra Kinyon
Department of CIS
University of Pennsylvania
kinyon@linc.cis.upenn.edu
Tatjana Scheffler
Department of Linguistics
University of Pennsylvania
tatjana@ling.upenn.edu
Aravind K. Joshi
Department of CIS
University of Pennsylvania
joshi@linc.cis.upenn.edu
Owen Rambow
CCLS
Columbia University
rambow@cs.columbia.edu
SinWon Yoon
UFRL
Universit´e Paris 7
swyoon@linguist.jussieu.fr
Abstract
We present an initial investigation into
the use of a metagrammar for explic-
itly sharing abstract grammatical specifi-
cations among languages. We define a
single class hierarchy for a metagrammar
which allows us to automatically gener-
ate grammars for different languages from
a single compact metagrammar hierarchy.
We use as our linguistic example the verb-
second phenomenon, which shows con-
siderable variation while retaining a ba-
sic property, namely the fact that the verb
can appear in one of two positions in the
clause.
1 An Overview of Metagrammars
A metagrammar (MG) factors common properties
of TAG elementary trees to avoid redundancy, ease
grammar development, and expand coverage with
minimal effort: typically, from a compact man-
ually encoded MG of a few dozen classes, one
or more TAGs with several hundreds of elemen-
tary trees are automatically generated. This is
appealing from a grammar engineering point of
view, and also from a linguistic point of view:
cross-linguistic generalizations are expressed di-
rectly in the MG. In this paper, we extend some
earlier work on multilingual MGs (Candito, 1998;
Kinyon and Rambow, 2003) by proposing cross-
linguistic and framework-neutral syntactic invari-
ants, which we apply to TAG. We focus on the
verb-second phenomenon as a prototypical exam-
ple of cross-language variation.
The notion of Metagrammar Metagrammars
were first introduced by Candito (1996) to manu-
ally encode syntactic knowledge in a compact and
abstract class hierarchy which supports multiple
inheritance, and from which a TAG is automati-
cally generated offline. Candito’s class hierarchy
imposes a general organization of syntax into three
dimensions:
a0 Dimension 1: to encode initial subcategoriza-
tion frames i.e. TAG tree families
a0 Dimension 2: to encode valency alternations
/ redistribution of syntactic functions
a0 Dimension 3: to encode the surface realiza-
tion of arguments.
Each class in the MG hierarchy is associated
with a partial tree description The tool computes
a set of well-formed classes by combining exactly
one terminal class from dimension 1, one termi-
nal class from dimension 2, and a1 terminal classes
from dimensions 3 (a1 being the number of argu-
ments subcategorized by the lexical head anchor-
ing the elementary tree(s) generated). The con-
junction of the tree descriptions associated with
each well-formed class in the set yields a minimal
satisfying description, which results in the gener-
ation of one or more elementary trees. Candito’s
tool was used to develop a large TAG for French
as well as a medium-size TAG for Italian Candito
(1999), so multilinguality was addressed from the
start, but each language had its dedicated hierar-
chy, with no sharing of classes despite the obvious
similarities between Italian and French. A related
approach was proposed by (Xia, 2001); the work
of Evans, Gazdar, and Weir (2000) also has some
common elements with MG.
Framework- and language-neutral syntactic
invariants Using a MG, and following Can-
dito, we can postulate cross-linguistic and cross-
framework syntactic invariants such as:
17
a0 The notion of subcategorization
a0 The existence of a finite number of syntactic
functions (subject, object etc.)
a0 The existence of a finite number of syntactic
categories (NP, PP, etc.)
a0 The existence of valency alternations (Can-
dito’s dimension 2)
a0 The existence, orthogonal to valency alterna-
tions, of syntactic phenomena which do not
alter valency, such as wh-movement (Can-
dito’s dimension 3).
These invariants — unlike other framework-
specific syntactic assumptions such as the exis-
tence of “movement” or “wh-traces” — are ac-
cepted by most if not all existing frameworks, even
though the machinery of a given framework may
not necessarily account explicitly for each invari-
ant. For instance, TAG does not have an explicit
notion of syntactic function: although by conven-
tion node indices tend to reflect a function, it is not
enforced by the framework’s machinery.1
Hypertags Based on such framework- and
language-neutral syntactic properties, Kinyon
(2000) defined the notion of Hypertag (HT), a
combination of Supertags (ST) Srinivas (1997)
and of the MG. A ST is a TAG elementary tree,
which provides richer information than standard
POS tagging, but in a framework-specific man-
ner (TAG), and also in a grammar-specific manner
since a ST tagset can’t be ported from one TAG
to another TAG. A HT is an abstraction of STs,
where the main syntactic properties of any given
ST is encoded in a general readable Feature Struc-
ture (FS), by recording which MG classes a ST in-
herited from when it was generated. Figure 1 illus-
trates the a0ST, HTa1 pair for Par qui sera accom-
pagn´ee Marie ‘By whom will Mary be accompa-
nied’. We see that a HT feature structure directly
reflects the MG organization, by having 3 features
“Dimension 1”, “Dimension 2” and “Dimension
3”, where each feature takes its value from the MG
terminal classes used to generate a given ST.
The XMG Tool Candito’s tool brought a sig-
nificant linguistic insight, therefore we essentially
retain the above-mentioned syntactic invariants.
However, more recent MG implementations have
been developed since, each adding its significant
contribution to the underlying metagrammatical
hypothesis.
In this paper, we use the eXtensible MetaGram-
mar (XMG) tool which was developed by Crabb´e
1But several attempts have been made to explicitly add
functions to TAG, e.g. by Kameyama (1986) to retain the
benefits of both TAG and LFG, or by Prolo (2006) to account
for the coordination of constituents of different categories,
yet sharing the same function.
S
PP
P
par
Na2Wha3
(qui)
S
Aux
sera
Va4
accompagn´ee
Na5 a3
(Marie)
a6
a7
DIMENSION1 STRICTTRANSITIVE
DIMENSION2 PERSONALFULLPASSIVE
DIMENSION3 a8SUBJECT INVERTEDSUBJECTCOMPLEMENT WHQUESTIONEDBYCOMPLEMENT
a9
a10
a11
Figure 1: A a0SuperTag, HyperTaga1 pair for ac-
compagn´ee (‘accompanied’) obtained with Can-
dito’s MetaGrammar compiler
(2005). In XMG, an MG consists of a set of
classes similar to those in object-oriented pro-
gramming, which are structured into a multiple
inheritance hierarchy. Each class specifies a par-
tial tree description (expressed by dominance and
precedence constraints). The nodes of these tree
fragment descriptions may be annotated with fea-
tures. Classes may instantiate each other, and they
may be parametrized (e.g., to hand down features
like the grammatical function of a substitution
node). The compiler unifies the instantiations of
tree descriptions that are called. This unification
is additionally guided by node colors, constraints
that specify that a node must not be unified with
any other node (red), must be unified (white), or
may be unified, but only with a white node (black).
XMG allows us to implement a hierarchy similar
to that of Candito, but it also allows us to modify
and extend it, as no structural assumptions about
the class hierarchy are hard-coded.
2 The V2 Phenomenon
The Verb-Second (V2) phenomenon is a well-
known set of data that demonstrates small-scale
cross-linguistic variation. The examples in (1)
show German, a language with a V2-constraint:
(1a) is completely grammatical, while (1b) is not.
This is considered to be due to the fact that the
finite verb is required to be located in “second po-
sition” (V2) in German. Other languages with a
V2 constraint include Dutch, Yiddish, Frisian, Ice-
landic, Mainland Scandinavian, and Kashmiri.
(1) a. Auf
on
dem
the
Weg
path
sieht
sees
der
the
Junge
boy
eine
a
Ente.
duck
‘On the path, the boy sees a duck.’
18
b. * Auf
on
dem
the
Weg
path
der
the
Junge
boy
sieht
sees
eine
a
Ente.
duck
Int.: ‘On the path, the boy sees a duck.’
Interestingly, these languages differ with re-
spect to how exactly the constraint is realized.
Rambow and Santorini (1995) present data from
the mentioned languages and provide a set of pa-
rameters that account for the exhibited variation.
In the following, for the sake of brevity, we will
confine the discussion to two languages: German,
and Yiddish. The German data is as follows (we
do not repeat (1a) from above):
(2) a. Der
the
Junge
boy
sieht
sees
eine
a
Ente
duck
auf
on
dem
the
Weg.
path
‘On the path, the boy sees a duck.’
b. . . . ,
. . . ,
dass
that
der
the
Junge
boy
auf
on
dem
the
Weg
path
eine
a
Ente
duck
sieht.
sees
‘. . . , that the boy sees a duck on the path.’
c. Eine
a
Ente
duck
sieht
sees
der
the
Junge.
boy
‘The boy sees a duck.’
The Yiddish data:
(3) a. Dos
the
yingl
boy
zet
sees
oyfn
on-the
veg
path
a
a
katshke.
duck
‘On the path, the boy sees a duck.’
b. Oyfn
on-the
veg
path
zet
sees
dos
the
yingl
boy
a
a
katshke.
duck.
‘On the path, the boy sees a duck.’
c. . . . ,
. . . ,
az
that
dos
the
yingl
boy
zet
sees
a
a
katshke
duck
‘. . . , that the boy sees a duck.’
While main clauses exhibit V2 in German, embed-
ded clauses with complementizers are verb-final
(2b). In contrast, Yiddish embedded clauses must
also be V2 (3c).
3 Handling V2 in the Metagrammar
It is striking that the basic V2 phenomenon is the
same in all of these languages: the verb can ap-
pear in either its underlying position, or in sec-
ond position (or, in some cases, third). We claim
that what governs the appearance of the verb
in these different positions (and thus the cross-
linguistic differences) is that the heads—the verbal
head and functional heads such as auxiliaries and
complementizers—interact in specific ways. For
example, in German a complementizer is not com-
patible with a verbal V2 head, while in Yiddish it
is. We express the interaction among heads by as-
signing the heads different values for a set of fea-
tures. Which heads can carry which feature values
is a language-specific parameter. Our implementa-
tion is based on the previous pen-and-pencil anal-
ysis of Rambow and Santorini (1995), which we
have modified and extended.
The work we present in this paper thus has
a threefold interest: (1) we show how to han-
dle an important syntactic phenomenon cross-
linguistically in a MG framework; (2) we partially
validate, correct, and extend a previously proposed
linguistically-motivated analysis; and (3) we pro-
vide an initial fragment of a MG implementa-
tion from which we generate TAGs for languages
which are relatively less-studied and for which no
TAG currently exists (Yiddish).
4 Elements of Our Implementation
In this paper, we only address verbal elementary
trees. We define a verbal realization to be a com-
bination of three classes (or “dimensions” in Can-
dito’s terminology): a subcategorization frame,
a redistribution of arguments/valency alternation
(in our case, voice, which we do not further dis-
cuss), and a topology, which encodes the posi-
tion and characteristics of the verbal head. Thus,
we reinterpret Candito’s “Dimension 3” to con-
centrate on the position of the verbal heads, with
the different argument realizations (topicalized,
base position) depending on the available heads,
rather than defined as first-class citizens. The sub-
cat and argument redistributions result in a set of
structures for arguments which are left- or right-
branching (depending on language and grammat-
ical function). Figure 2 shows some argument
structures for German. The topology reflects the
basic clause structure, that is, the distribution of ar-
guments and adjuncts, and the position of the verb
(initial, V2, final, etc.). Our notion of sentence
topology is thus similar to the notion formalized
by Gerdes (2002). Specifically, we see positions
of arguments and adjuncts as defined by the posi-
tions of their verbal heads. However, while Gerdes
(2002) assumes as basic underlying notions the
fields created by the heads (the traditional Vorfeld
for the topicalized element and the Mittelfeld be-
tween the verb in second position and the verb in
clause-final position), we only use properties of
the heads. The fields are epiphenomenal for us.As
mentioned above, we use the following set of fea-
tures to define our MG topology:
a0 I (finite tense and subject-verb agreement):
creates a specifier position for agreement
which must be filled in a derivation, but al-
lows recursion (i.e., adjunction at IP).
a0 Top (topic): a feature which creates a spec-
ifier position for the topic (semantically rep-
resented in a lambda abstraction) which must
be filled in a derivation, and which does not
allow recursion.
a0 M (mood): a feature with semantic content
(to be defined), but no specifier.
a0 C (complementizer): a lexical feature intro-
duced only by complementizers.
We can now define our topology in more detail.
It consists of two main parts:
19
German:
What Features Introduced Directionality
1 Verb (clause-final) +I head-final
2 Verb (V2, subject-inital) +M, +Top, +I head-initial
3 Verb (V2, non-subject-initial) +M, +Top head-initial
4 Complementizer +C, +M head-initial
Yiddish:
What Features Introduced Directionality
1 Verb +I head-initial
2 Verb (V2, subject-inital) +M, +Top, +I head-initial
3 Verb (V2, non-subject-initial) +M, +Top head-initial
4 Complementizer +C head-initial
Figure 4: Head inventories for German and Yiddish.
1: a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP a1
M a1
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I a1
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
v
2: a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP +
M +
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
v a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I a1
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
3: a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP +
M +
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
v a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
4: a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP a1
M +
C +
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
comp a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
Figure 5: Head structures for German corresponding to the table in Figure 4 (above)
1:
a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP a1
M a1
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
v a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I a1
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
2:
a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP +
M +
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
v a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I a1
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
3:
a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP +
M +
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
v a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP a1
M a1
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
4:
a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP +
M +
C +
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
comp a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I +
TOP +
M +
C a1
white
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
Figure 6: Head structures for Yiddish corresponding to the table in Figure 4 (below)
20
a6
a0
a0
a0
a7
CAT V
I +
TOP +
black
a10
a2
a2
a2
a11
NPa0a1a2a3
a6
a0
a0
a0
a7
CAT V
I +
TOP +
white
a10
a2
a2
a2
a11
a6
a0
a0
a0
a7
CAT V
I +
TOP a1
black
a10
a2
a2
a2
a11
NPa0a1a2a3
a6
a0
a0
a0
a7
CAT V
I +
TOP a1
white
a10
a2
a2
a2
a11
a6
a0
a0
a0
a7
CAT V
I a5
TOP +
black
a10
a2
a2
a2
a11
NPa4a5a4a6a0a1 a6
a0
a0
a0
a7
CAT V
I a5
TOP +
white
a10
a2
a2
a2
a11
a6
a0
a0
a0
a7
CAT V
I a5
TOP a1
black
a10
a2
a2
a2
a11
NPa4a5a4a6a0a1 a6
a0
a0
a0
a7
CAT V
I a5
TOP a1
white
a10
a2
a2
a2
a11
Figure 2: The argument structures
a7
CAT V
white a8
a6
a0
a0
a0
a0
a0
a0
a0
a0
a7
CAT V
I a1
TOP a1
M a1
C a1
black
a10
a2
a2
a2
a2
a2
a2
a2
a2
a11
a9
Figure 3: The projection structure; feature values
can be filled in at the top feature structure to con-
trol the derivation.
a0 The projection includes the origin of the
verb in the phrase structure (with an empty
head since we assume it is no longer there)
and its maximal projection. It is shown in
Figure 3. The maximal projection expresses
the expected feature content. For example,
if we want to model non-finite clauses, the
maximal projection will have [a1I], while root
V2 clauses will have [+Top], and embedded
finite clauses with complementizers will have
[+I,+C].
a0 Structures for heads, which can be head-
initial or head-final. They introduce catego-
rial features. Languages differ in what sort of
heads they have. Which heads are available
for a given language is captured in a head in-
ventory, i.e., a list of possible heads for that
language (which use the head structure just
mentioned). Two such lists are shown in Fig-
ure 4, for German and Yiddish. The corre-
sponding head structures are shown in Fig-
ures 5 and 6.
A topology is a combination of the projection
and any combination of heads allowed by the
language-specific head inventory. This is hard
to express in XMG, so instead we list the spe-
cific combinations allowed. One might ask how
we derive trees for language without the V2 phe-
nomenon. Languages without V2 will usually
have a smaller set of possible heads. We are work-
ing on a metagrammar for Korean in parallel with
our work on the V2 languages. Korean is very
much like German without the V2 phenomenon:
the verbal head can only be in clause-final position
(i.e., head 1 from Figure 5. However, passiviza-
tion and scrambling can be treated the same way
in Korean and German, since these phenomena are
independent of V2.
5 Sample Derivation
Given a feature ordering (C a1 M a1 Top a1 I) and
language-specific head inventories as in Figure 4,
we compile out MGs for German (Figure 5) and
Yiddish (Figure 6).2 The projection and the ar-
gument realizations do not differ between the two
languages: thus, these parts of the MG can be
reused. The features, which were introduced for
descriptive reasons, now guide the TAG compila-
tion: only certain heads can be combined. Further-
more, subjects and non-subjects are distinguished,
as well as topicalized and non-topicalized NPs
(producing 4 kinds of arguments so far). The com-
piler picks out any number of compatible elements
from the Metagrammar and performs the unifica-
tions of nodes that are permitted (or required) by
2All terminal nodes are “red”; spine nodes have been an-
notated with their color.
21
the node descriptions and the colors. By way of
example, the derivations of elementary trees which
can be used in a TAG analysis of German (2c) and
Yiddish (3c) are shown in Figures 7 and 8, respec-
tively.
6 Conclusion and Future work
This paper showed how cross-linguistic general-
izations (in this case, V2) can be incorporated into
a multilingual MG. This allows not only the reuse
of MG parts for new (often, not well-studied) lan-
guages, but it also enables us to study small-scale
parametric variation between languages in a con-
trolled and formal way. We are currently modify-
ing and extending our implementation in several
ways.
The Notion of Projection In our current ap-
proach, the verb is never at the basis of the pro-
jection, it has always been removed into a new
location. This may seem unmotivated in certain
cases, such as German verb-final sentences. We
are looking into using the XMG unification to ac-
tually place the verb at the bottom of the projection
in these cases.
Generating Top and Bottom Features The
generated TAG grammar currently does not have
top and bottom feature sets, as one would expect
in a feature-based TAG. These are important for
us so we can force adjunction in adjunct-initial V2
sentences (where the element in clause-initial po-
sition is not an argument of the verb). We intend
to follow the approach laid out in Crabb´e (2005) in
order to generate top and bottom feature structures
on the nodes of the TAG grammar.
Generating test-suites to document our
grammars Since XMG offers more complex
object-oriented functionalities, including in-
stances, and therefore recursion, it is now
straightforward to directly generate parallel mul-
tilingual sentences directly from XMG, without
any intermediate grammar generation step. The
only obstacle remains the explicit encoding of
Hypertags into XMG.
Acknowledgments
We thank Yannick Parmentier, Joseph Leroux,
Bertrand Gaiffe, Benoit Crabb´e, the LORIA XMG
team, and Julia Hockenmaier for their invaluable
help; Eric de la Clergerie, Carlos Prolo and the
Xtag group for their helpful feedback, comments
and suggestions on different aspects of this work;
and Marie-H´el`ene Candito for her insights. This
work was supported by NSF Grant 0414409 to the
University of Pennsylvania.

References
Candito, M. H. 1998. Building parallel LTAG for French and
Italian. In Proc. ACL-98. Montreal.
Candito, M.H. 1996. A principle-based hierarchical repre-
sentation of LTAGs. In Proc. COLING-96. Copenhagen.
Candito, M.H. 1999. Repr´esentation modulaire et
param´etrable de grammaires ´electroniques lexicalis´ees.
Doctoral Dissertation, Univ. Paris 7.
Cl´ement, L., and A. Kinyon. 2003. Generating parallel mul-
tilingual LFG-TAG grammars using a MetaGrammar. In
Proc. ACL-03. Sapporo.
Clergerie, E. De La. 2005. From metagrammars to factorized
TAG/TIG parsers. In IWPT-05. Trento.
Crabb´e, B. 2005. Repr´esentation informatique de grammaires
fortement lexicalis´ees. Doctoral Dissertation, Univ. Nancy
2.
Evans, R., G. Gazdar, and D. Weir. 2000. Lexical rules
are just lexical rules. In Tree Adjoining Grammars, ed.
A. Abeill´e and O. Rambow. CSLI.
Gerdes, K. 2002. DTAG. attempt to generate a useful TAG for
German using a metagrammar. In Proc. TAG+6. Venice.
Kameyama, M. 1986. Characterising LFG in terms of TAG.
In Unpublished report. Univ. of Pennsylvania.
Kinyon, A. 2000. Hypertags. In Proc. COLING-00. Sar-
rebrucken.
Kinyon, A., and O. Rambow. 2003. Generating cross-
language and cross-framework annotated test-suites using
a MetaGrammar. In Proc. LINC-EACL-03. Budapest.
Prolo, C. 2006. Handling unlike coordinated phrases in TAG
by mixing Syntactic Category and Grammatical Function.
In Proc. TAG+8. Sidney.
Rambow, Owen, and Beatrice Santorini. 1995. Incremental
phrase structure generation and a universal theory of V2.
In Proceedings of NELS 25, ed. J.N. Beckman, 373–387.
Amherst, MA: GSLA.
Srinivas, B. 1997. Complexity of lexical descriptions and its
relevance for partial parsing. Doctoral Dissertation, Univ.
of Pennsylvania.
Xia, F. 2001. Automatic grammar generation from two per-
spectives. Doctoral Dissertation, Univ. of Pennsylvania.
XTAG Research Group. 2001. A lexicalized tree adjoin-
ing grammar for English. Technical Report IRCS-01-03,
IRCS, University of Pennsylvania.
