Proceedings of the Fourth International Natural Language Generation Conference, pages 9–11,
Sydney, July 2006. ©2006 Association for Computational Linguistics
A generation-oriented workbench for Performance Grammar: 
Capturing linear order variability in German and Dutch
Karin Harbusch¹, Gerard Kempen²,³, Camiel van Breugel³, Ulrich Koch¹
¹ Universität Koblenz-Landau, Koblenz, {harbusch, koch}@uni-koblenz.de
² Max Planck Institute for Psycholinguistics, Nijmegen, gerard.kempen@mpi.nl
³ Department of Psychology, Leiden University, cvbreuge@liacs.nl

Abstract 
We describe a generation-oriented 
workbench for the Performance 
Grammar (PG) formalism, highlighting 
the treatment of certain word order and 
movement constraints in Dutch and 
German. PG enables a simple and uni-
form treatment of a heterogeneous col-
lection of linear order phenomena in 
the domain of verb constructions 
(variously known as Cross-serial Dependencies,
Verb Raising, Clause Union, Extraposition,
Third Construction, Particle Hopping, etc.). The central
data structures enabling this feature are 
clausal “topologies”: one-dimensional 
arrays associated with clauses, whose 
cells (“slots”) provide landing sites for 
the constituents of the clause. Move-
ment operations are enabled by unifica-
tion of lateral slots of topologies at ad-
jacent levels of the clause hierarchy. 
The PGW generator assists the gram-
mar developer in testing whether the 
implemented syntactic knowledge al-
lows all and only the well-formed per-
mutations of constituents. 
1 Introduction 
Workbenches for natural-language grammar 
formalisms typically provide a parser to test 
whether given sentences are treated adequately 
— D-PATR for Unification Grammar (Kart-
tunen, 1986) or XTAG for Tree-Adjoining 
Grammars (Paroubek et al., 1992) are early
examples. However, a parser is not a convenient
tool for checking whether the current
grammar implementation licenses all and only
the strings qualifying as well-formed expressions
of a given input. Sentence generators that
try out all possible combinations of grammar
rules applicable to the current input are better
suited.
Few workbenches in the literature come 
with such a facility. LinGO (Copestake & 
Flickinger, 2000), for Head-Driven Phrase
Structure Grammar, provides a generator in 
addition to a parser. For Tree Adjoining 
Grammars, several workbenches with genera-
tion components have been built: InTeGenInE 
(Harbusch & Woch, 2004) is a recent example.
Fine-tuning the grammar such that it neither
over- nor undergenerates is a major problem
for semi-free word order languages (e.g., German;
cf. Kallmeyer & Yoon, 2004). Working
out a satisfactory solution to this problem is
logically prior to designing a generator capable
of selecting, from the set of all possible paraphrases,
those that sound “natural,” i.e., the
ones human speakers/writers would choose in
the situation at hand (cf. Kempen & Harbusch,
2004).
Verb constructions in German and Dutch
exhibit extremely intricate word order patterns
(cf. Seuren & Kempen, 2003). One of the factors
contributing to this complexity is the phenomenon
of clause union, which allows constituents
of a complement clause to be interspersed
between those of the dominating
clause. The resulting sequences exhibit, among
other things, cross-serial dependencies and
clause-final verb clusters. Further complications
arise from all sorts of ‘movement’ phenomena
such as fronting, extraction, dislocation,
extraposition, scrambling, etc. Given the
limited space available, we cannot describe the
Performance Grammar (PG) formalism and the
linearization algorithm that enables generating
a broad range of linear order phenomena in
Dutch, German, and English verb constructions.
Instead, we refer to Harbusch & Kempen
(2002) and Kempen & Harbusch (2002, 2003).
Here, we present the generation-oriented PG 
Workbench (PGW), which assists grammar 
developers, among other things, in testing 
whether the implemented syntactic and lexical 
knowledge allows all and only well-formed 
permutations. 
In Section 2, we describe PG’s topology-
based linearizer implemented in the PGW gen-
erator, whose software design is sketched in 
Section 3. Section 4 shows the PGW at work 
and draws some conclusions. 
2 Linearization in PG and PGW 
Performance Grammar (PG) is a fully lexicalized
grammar that belongs to the family of tree
substitution grammars and deploys disjunctive 
feature unification as its main structure build-
ing mechanism. It adheres to the ID/LP format 
(Immediate Dominance vs. Linear Precedence) 
and includes separate components generating 
the hierarchical and the linear structure of sen-
tences. Here, we focus on the linearization 
component. 
PG's hierarchical structures consist of unor-
dered trees composed of elementary building 
blocks called lexical frames. Every word is 
head of a lexical frame, which specifies the 
subcategorization constraints of the word. As-
sociated with every lexical frame is a topology. 
Topologies serve to assign a left-to-right order 
to the branches of lexical frames. In this paper, 
we will only be concerned with topologies for
verb frames (clauses). We assume that clausal 
topologies of Dutch and German contain ex-
actly nine slots — see (1). 
 (1) Wat wil je dat ik doe? / what want you that I do / ‘What do you want me to do?’

     F1    M1    M2    …    M6    E1    E2
     •     wil   je                     •
     ↑                                  ⇑
     Wat   dat   ik         doe

The slot labeled F1 makes up the Forefield 
(from Ger. Vorfeld); slots M1-M6 make up the 
Midfield (Mittelfeld); slots E1 and E2 define
the Endfield (Nachfeld). Every constituent 
(subject, head, direct object, complement, etc.) 
has a small number of placement options, i.e. 
slots in the topology associated with its “own” 
clause. 
How is the Direct Object NP wat ‘what’ 
‘extracted’ from the complement clause and
‘promoted’ into the main clause? Movement of 
phrases between clauses is due to lateral to-
pology sharing. If a sentence contains more 
than one verb, each of their lexical frames in-
stantiates its own topology. This applies to 
verbs of any type — main, auxiliary or copula. 
In such cases, the topologies are allowed to
share identically labeled lateral (i.e. left-
and/or right-peripheral) slots, conditionally
upon several restrictions (not to be explained
here; but see Harbusch & Kempen, 2002).
After two slots have been shared, they are no 
longer distinguishable; in fact, they are unified 
and become the same object. In example (1), 
the embedded topology shares its F1 slot with 
the F1 slot of the matrix clause. This is indicated
by the dashed borders of the bottom F1
slot. Sharing the F1 slots effectively causes the
embedded Direct Object wat to be preposed
into the main clause (black dot in F1 above the 
single arrow in (1)). The dot in E2 above the 
double arrow marks the position selected by 
the finite complement clause. 
The overt surface order is determined by a 
read-out module that traverses the hierarchy of 
topologies in left-to-right, depth-first manner.
E.g., wat is already seen while the reader scans 
the higher topology. 
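The interplay of slot sharing and read-out in example (1) can be sketched in a few lines of Python. This is an illustrative toy, not the PGW's Java implementation: the names `new_topology`, `share`, and `read_out` are hypothetical, the slot inventory simply spells out the nine labels F1, M1 to M6, E1, E2, and the restrictions on sharing are not modelled. Sharing is rendered as object identity, so a shared slot is read out once, at the highest topology that contains it.

```python
# Nine slot labels assumed for Dutch/German clausal topologies.
SLOTS = ["F1", "M1", "M2", "M3", "M4", "M5", "M6", "E1", "E2"]


def new_topology():
    """Return a fresh topology: one (initially empty) list per slot."""
    return {label: [] for label in SLOTS}


def share(upper, lower, labels):
    """Unify the named slots: both topologies now hold the same list object."""
    for label in labels:
        lower[label] = upper[label]


def read_out(topology, seen=None):
    """Traverse the topology hierarchy left to right, depth first."""
    seen = [] if seen is None else seen
    words = []
    for label in SLOTS:
        slot = topology[label]
        if any(slot is s for s in seen):
            continue  # shared slot already read out at a higher level
        seen.append(slot)
        for item in slot:
            if isinstance(item, dict):  # an embedded clause
                words.extend(read_out(item, seen))
            else:
                words.append(item)
    return words


# Example (1): the embedded clause shares its F1 slot with the matrix clause.
matrix = new_topology()
embedded = new_topology()
share(matrix, embedded, ["F1"])

matrix["M1"].append("wil")
matrix["M2"].append("je")
matrix["E2"].append(embedded)   # finite complement clause lands in E2

embedded["F1"].append("Wat")    # lands in the shared F1 slot
embedded["M1"].append("dat")
embedded["M2"].append("ik")
embedded["M6"].append("doe")

sentence = " ".join(read_out(matrix))
print(sentence)
```

Because `matrix["F1"]` and `embedded["F1"]` are one and the same list after `share`, the word placed by the embedded clause is encountered while the matrix topology is being scanned.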
3 A sketch of PGW’s software design
The PGW is a computational grammar development
tool for PG. Written in Java, it comes
with an advanced graphical direct-manipulation
user interface. All lexical and
grammatical data have been encoded in a rela-
tional database schema. This contrasts with the 
predominance of hierarchical databases in 
present-day computational linguistics. Rela-
tional lexical databases tend to be easier to 
maintain and update than hierarchical ones, 
especially for linguists with limited programming
experience. The software was designed
with an eye toward easy cross-language portability
of the encoded information. For German
we developed a lexicon converter that maps the
German CELEX database automatically to the
PGW format (Koch, 2004).
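As a rough illustration of what a relational encoding of lexical and placement data might look like, the following Python sketch uses an in-memory SQLite database. The schema is entirely hypothetical (the table names `lexical_frame` and `placement_option` and all column names are assumptions, not the PGW schema); the slot values are the ones that appear in example (2).

```python
import sqlite3

# Hypothetical two-table schema: lexical frames and the slots their
# constituents may land in. This is NOT the actual PGW database layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lexical_frame (
        id       INTEGER PRIMARY KEY,
        language TEXT NOT NULL,
        lemma    TEXT NOT NULL,
        pos      TEXT NOT NULL
    );
    CREATE TABLE placement_option (
        frame_id    INTEGER NOT NULL REFERENCES lexical_frame(id),
        constituent TEXT NOT NULL,
        slot        TEXT NOT NULL
    );
    """
)

conn.execute("INSERT INTO lexical_frame VALUES (1, 'nl', 'meezingen', 'verb')")
conn.executemany(
    "INSERT INTO placement_option VALUES (?, ?, ?)",
    [(1, "direct object", "M3"), (1, "particle", "M4"), (1, "head", "M6")],
)

# Look up the placement options recorded for one lemma.
rows = conn.execute(
    """
    SELECT p.constituent, p.slot
    FROM placement_option AS p
    JOIN lexical_frame AS f ON f.id = p.frame_id
    WHERE f.lemma = ?
    ORDER BY p.slot
    """,
    ("meezingen",),
).fetchall()
print(rows)
```

In such a layout, changing a placement option is a single-row update rather than an edit to a nested structure, which is one way to read the maintainability argument made above.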
4 Generating verb constructions in 
Dutch and German 
In order to convey an impression of the capa-
bilities of the PGW, we show it at work in gen-
erating verb constructions that involve rather 
delicate linearization phenomena: “Particle
Hopping” in Dutch (2), and “Scrambling” in
German (3). 
The finite complement clause (2) includes
the verb meezingen ‘sing along with,’ where
mee ‘with’ is a preposition functioning as a separable
particle. The three other verbs are auxiliaries.
According to a topology sharing rule for
Dutch, clauses headed by auxiliaries are free to
share 4, 5 or 6 left-peripheral slots of their own
topology with that of their complement. The most
restrictive sharing option is shown in (2).
(2) .. dat ze dit (lied) zouden kunnen hebben meegezongen
    .. that they this (song) would be-able-to have along-sung
    ‘.. that they might have sung along this (song)’

    M1    M2    M3          M4    M6         E1
    dat   ze                      zouden
                ↑                            ⇑
                                  kunnen
                ↑                            ⇑
                                  hebben
                ↑                            ⇑
                dit lied    mee   gezongen

The Direct Object NP dit (lied) ‘this (song)’ 
lands in M3 of the lowest topology. As this slot 
belongs to the four left-peripheral ones, it is 
always shared and its content gets promoted all 
the way up into the highest clause (see single 
arrows). Particle mee always lands in the fifth
slot (M4), i.e. in the optionally shared area. 
Hence, its surface position depends on the ac-
tual number of shared left-peripheral slots. In 
(2), with minimal slot sharing, mee stays in its 
standard position immediately preceding the 
head verb. In case of non-minimal topology 
sharing, the particle may move leftward until 
(but no farther than) the direct object, thus 
yielding exactly the set of grammatical place-
ment options. 
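To see how the number of shared slots determines the particle position in (2), the following Python sketch enumerates the sharing options 4, 5, and 6 for each of the three auxiliaries and collects the resulting word orders. It is a simplified illustration under stated assumptions, not the PGW code: the classes `Slot` and `Topology` and the functions `build` and `read_out` are hypothetical; non-finite complements are assumed to land in E1 and head verbs in M6; further restrictions on sharing are ignored; and particle and participle are kept as separate tokens (written together as meegezongen when adjacent).

```python
from itertools import product

SLOTS = ["F1", "M1", "M2", "M3", "M4", "M5", "M6", "E1", "E2"]


class Slot:
    """A landing site; shared slots are literally the same object."""
    def __init__(self):
        self.items = []


class Topology:
    def __init__(self, parent=None, shared_left=0):
        # The first `shared_left` slots are unified with the parent's slots.
        self.slots = {}
        for i, label in enumerate(SLOTS):
            if parent is not None and i < shared_left:
                self.slots[label] = parent.slots[label]
            else:
                self.slots[label] = Slot()

    def place(self, label, item):
        self.slots[label].items.append(item)


def read_out(topology, seen=None):
    """Left-to-right, depth-first traversal; each slot is visited once."""
    seen = set() if seen is None else seen
    words = []
    for label in SLOTS:
        slot = topology.slots[label]
        if id(slot) in seen:
            continue
        seen.add(id(slot))
        for item in slot.items:
            if isinstance(item, Topology):
                words.extend(read_out(item, seen))
            else:
                words.append(item)
    return words


def build(n1, n2, n3):
    """Build example (2); n1..n3 = slots each auxiliary shares downward."""
    zouden = Topology()
    kunnen = Topology(zouden, n1)
    hebben = Topology(kunnen, n2)
    zingen = Topology(hebben, n3)
    zouden.place("M1", "dat")
    zouden.place("M2", "ze")
    zouden.place("M6", "zouden")
    zouden.place("E1", kunnen)      # assumption: complement lands in E1
    kunnen.place("M6", "kunnen")
    kunnen.place("E1", hebben)
    hebben.place("M6", "hebben")
    hebben.place("E1", zingen)
    zingen.place("M3", "dit lied")  # direct object
    zingen.place("M4", "mee")       # separable particle
    zingen.place("M6", "gezongen")
    return zouden


orders = sorted(
    {" ".join(read_out(build(*n))) for n in product((4, 5, 6), repeat=3)}
)
for order in orders:
    print(order)
```

Under these assumptions the 27 sharing combinations collapse into four distinct orders, with mee immediately preceding gezongen, hebben, kunnen, or zouden, and never preceding dit lied, since M3 is shared in every configuration.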
The quality of PGW’s treatment of Scram-
bling in German can be assessed in terms of a 
set of 30 word order variations of sentence (3), 
discussed by Rambow (1994), who also provides
grammaticality ratings for all members of
the set. Seuren (2003) presents similar grammaticality
judgments obtained from an independent
group of native speakers. As the rating
scores appeared to vary considerably (cf. (3a) 
and (3b)), we checked which permutations are 
actually generated by the PGW. It turned out 
easy to find a set of topology sharing values 
that generates all and only the paraphrases with 
high or satisfactory grammaticality scores. 
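The check described here amounts to comparing two sets of strings. A minimal Python sketch, assuming the generator output and the rated permutations are available as plain strings; the function name `diagnose` and the candidate output below are hypothetical and reproduce neither actual PGW output nor the published ratings beyond (3a) and (3b):

```python
def diagnose(generated, acceptable):
    """Compare generator output with the permutations judged acceptable."""
    generated, acceptable = set(generated), set(acceptable)
    return {
        "overgenerated": sorted(generated - acceptable),
        "undergenerated": sorted(acceptable - generated),
    }


# (3a) is rated acceptable, (3b) is starred in the paper.
s3a = "weil niemand das Fahrrad zu reparieren zu versuchen verspricht"
s3b = "weil zu versuchen das Fahrrad niemand zu reparieren verspricht"

# Hypothetical output of a deliberately too permissive sharing setting.
candidate_output = [s3a, s3b]

report = diagnose(candidate_output, acceptable=[s3a])
print(report)
```

An empty report for both keys would correspond to a grammar that generates all and only the acceptable permutations.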
In conclusion, although the performance 
data discussed here are very limited, we believe
they justify positive expectations with
respect to the potential of a topology-based
linearizer to approximate closely the grammaticality
judgments of native speakers and
thus to avoid over- and undergeneration.
 (3) a. … weil niemand das Fahrrad zu reparieren zu versuchen verspricht
        because nobody the bike to repair to try promises
        ‘… because nobody promises to try to repair the bike’
     b. *… weil zu versuchen das Fahrrad niemand zu reparieren verspricht

References 
Copestake, A. and Flickinger, D. 2000. An open
source grammar development environment and
broad-coverage English grammar using HPSG.
Procs. of the 2nd Internat. Conf. on Language
Resources and Evaluation (LREC), Athens.
Harbusch, K. and Kempen, G. 2002. A quantitative
model of word order and movement in English,
Dutch and German complement constructions.
Procs. of the 19th Internat. Conf. on Computational
Linguistics (COLING), Taipei.
Harbusch, K. and Woch, J. 2004. Integrated natural
language generation with Schema-TAGs. In:
Habel, C., Pechmann, T. (eds.) Language Production.
Berlin: Mouton De Gruyter.
Karttunen, L. 1986. D-PATR: A Development Environment
for Unification Grammars. Tech.
Rep. CSLI-86-61. CSLI, Stanford, CA.
Kempen, G. and Harbusch, K. 2002. Performance
Grammar: A declarative definition. In Nijholt,
A., Theune, M., Hondorp, H. (eds.), Computational
Linguistics in the Netherlands 2001. Amsterdam:
Rodopi, 146-162.
Kempen, G. and Harbusch, K. 2003. Dutch and
German verb constructions in Performance
Grammar. In Seuren & Kempen, 2003.
Koch, U. 2004. The Specification of a German Performance
Grammar from CELEX Data in
Preparation for a German Performance Grammar
Workbench. Master's Thesis at the Computer
Science Department, U. Koblenz-Landau.
Paroubek, P., Schabes, Y., and Joshi, A.K. 1992.
XTAG – A Graphical Workbench for Developing
Tree-Adjoining Grammars. Procs. of the 3rd
Applied Natural Language Processing Conference
(ANLP), Trento, 216-223.
Rambow, O. 1994. Formal and Computational Aspects
of Natural Language Syntax. Ph.D. Thesis,
U. Pennsylvania, Philadelphia.
Seuren, P.A.M. 2003. Verb clusters and branching
directionality in German and Dutch. In (Seuren
& Kempen, 2003), 247-296.
Seuren, P. and Kempen, G. (eds.). 2003. Verb Constructions
in Dutch and German. Amsterdam:
Benjamins.
