The Role of Testing in Grammar Engineering 
Martin Volk 
University of Koblenz-Landau 
Institute of Computational Linguistics 
Rheinau 3-4 
5400 Koblenz, Germany 
(+49)-261-9119-469 
e-mail volk~brian.uni-koblenz.de 
1 Introduction 
In the past grammars have been developed either with an 
art approach (building up a coherent system; prescrip- 
tive grammars like the Latin grammars of the Middle 
Ages) or with a science approach (describing the laws of 
nature; descriptive and contrastive grammars). We pro- 
pose to regard grammar development in Computational 
Linguistics as an engineering task analogous to software 
engineerkig: one that requires analysis, specification, im- 
plementation, testing, integration and maintenance. 
The different phases in the software development pro- 
cess correspond to phases in grammar development in 
the following way: 
* The problem analysis and definition phase corre- 
sponds to an analysis of the linguistic data (texts 
of written or spoken language). 
• Problem specification means setting up grammar 
rules that describe the observed data. The gram- 
mar formalism thus provides the formal language 
for the specification. 
• The implementation phase includes putting the ru- 
les into the specific format of the computational 
grammar system. This is a computer program to 
process a grammar in the framework of a grammar 
theory. The implementation effort decreases the clo- 
ser the format of the grammar approaches the for- 
mat of the computational system. 
* The testing phase comprises checking the grammar 
implementation against the linguistic data, i.e. jud- 
ging grammaticality and the assigned structure. 
• Integration, installation and maintenance are no dif- 
ferent in the context of an NLP system than in other 
software systems. 
Our project focusses on the testing aspect of the pro- 
cess. Testing can never be exhaustive but must be re- 
presentative. We therefore propose an archive of test 
sentences (hence: ATS) to govern the incremental deve- 
lopment and the testing of grammar implementations. 
2 Construction of an ATS 
Towards the goal of a comprehensive collection of test 
sentences we have restricted ourselves to the construc- 
tion of an ATS that represents specific syntactic pheno- 
mena. The archive aims to be a representative sample 
of all syntactic phenomena of a natural language, in our 
case German. 
The ATS must contain grammatical sentences as well 
as ungrammatical strings. The grammatical sentences 
are systematically collected by varying one syntactic 
phenomenon at a time. E.g. we vary first the subca- 
tegory of the verb and then we vary verb tense, etc. For 
every phenomenon we have to construct ungrammatical 
strings to check against overgeneration by the grammar. 
These strings are found by manipulating the ph,'nome- 
non in question in such ways as to make it ungramma- 
tical. A syntactic feature must be varied over all values 
of its domain. E.g. the feature case in German has the 
values nominative, genitive, dative, and accusative. A 
noun phrase (NP) that needs to appear in the nomina- 
tive in a given sentence will then result in three ungram- 
matical strings when the other cases are assigned. 
It must be noted that the set of ungrammatical strings 
gets very large when it comes to problems such as 
word order where the permutation of all the words in 
a sentence is necessary to enumerate all possibilities. In 
this case we need heuristics to find the most appropriate 
test sentences. E.g. in German the order of NPs in 
a sentence is variable with certain restrictions whereas 
the word order within the NP is relatively fixed. The- 
refore we are much more likely to encounter problems 
in NP order when setting up our grammar. As a result 
we will have to focus on ungrammatical strings with NP 
order problems. In contrast to grammatical sentences 
ungrammatical strings are only seldom found naturally 
(e.g. errors of language learners) and it will be intere- 
sting to study whether these occurrences (at least the 
most frequent ones) correspond to our constructed ex- 
amples. 
The judgement of grammatical versus ungrammatical 
strings is subjective and has little to say about the ac- 
ceptability of a sentence in a real-world communication. 
Thus our ATS will model competence rather than per- 
formance in the Chomskyan sense. 
For the practical use the ATS must be organized in 
a modular fashion, enabling the user to adapt and ex- 
tend the archive according to his needs and convictions. 
Furthermore it must be documented why a sentence has 
been entered into the archive, since every sentence dis- 
plays a multitude of syntactic information, only some of 
which is relevant in our domain. 
  
 257 
3 Testing with an ATS 
The ATS can be useful in many respects. First, it can 
govern the development process of a grammar in that it 
provides the linguistic data in a concise fashion. Gram- 
mar rules can be incrementally constructed to account 
for the given sentences. Organizing the ATS around syn- 
tactic phenomena of increasing complexity supports in- 
cremental grammar development. After each phenome- 
non has been formalized, the grammar can be checked 
against the test sentences, thus facilitating the retrie- 
val of problematical sections in the grammar. The goal 
is to set up the archive in such a way that incremental 
grammar development does not require total retesting 
of all the previous material but only of the recent re- 
levant phenomena. But foremost the ATS is meant to 
support the testing of the grammar for completeness (all 
sentences that should be accepted are accepted by the 
grammar) and soundness (the grammar does not accept 
any sentence that should not be accepted). 
In testing a grammar we need to distinguish between 
using the grammar for analysis or synthesis. In ana- 
lysis the ATS can be used to provide input sentences 
for the parsing process. In synthesis the ATS can be 
used for comparison with the generated sentences and 
thus minimize the human judgement effort to the lefto- 
ver sentences. The grammatical ones of this group can 
be used to complete the ATS. 
We see three major advantages in using an ATS. 
1. Organizing the ATS in equivalence classes around 
syntactic phenomena facilitates incremental gram- 
mar development and problem driven testing. 
2. Compared to working with NL corpora the ATS avo- 
ids redundant testing of frequently occuring syntac- 
tic phenomena. The ATS can thus serve as a testbed 
before the grammar is used on real ("unconstruc- 
ted" ) texts. 
3. The ATS will help to compare the performance of 
different implementations within the same forma- 
lism as well as across linguistic theories. Running 
an LFG implementation against a GPSG implemen- 
tation will show the respective adequacy of the theo- 
ries for particular syntactic phenomena. 
In order to apply an ATS in practice we have built a 
workbench for experimentation with grammar develop- 
ment that contains an archive of this nature (see Volk 
and Ridder, 1991). The workbench is designed as a tuto- 
rial system for students of syntax analysis in Computa- 
tional Linguistics. Its ATS contains 350 sentences for 15 
syntactic phenomena. The workbench comes with a lexi- 
con, a morphology component, a special purpose editor 
and output procedures for linguistic type tree output. 
The program is written in Prolog and runs on PCs. 
4 Similar work 
Concerning the ATS the approach most similar to our 
own is by Nerbonne and coworkers (1991). They assem- 
ble a database of constructed sentences in an attempt 
to exhaustively cover syntactic phenomena. So far they 
have tackled coordination and verb government with se- 
veral hundred sentences for both. They have not yet. 
included their "diagnostic database" in any test system. 
This work also reports on attempts to build up sentence 
collections for English. 
Other comparable approaches towards grammar engi- 
neering are by Erbach (1991) and by Erbach and Arens 
(1991). The first describes a system that allows for the 
parametrization of and experimentation with different 
parsing strategies. The user can specify priorities to in- 
crementally optimize the parsing process. We believe, 
however, that lacking a broad collection of test sentences 
this system cannot be sufficiently evaluated and there- 
fore we see our approach as complementary to theirs. 
In another attempt Erbach and Arens (1991) try to 
evaluate grammars by generating a "representative set of 
sentences" in a systematic way. They limit their lexicon 
and grammar, and starting with sentences of length 1, 
they generate sentences with increasing length. It is not 
clear how they intend to check the resulting sentences 
other than by human judgement. An ATS that is ad- 
apted to their lexicon could be compared against these 
sentences. 
5 Future directions 
Future work will focus on two aspects. First, we will 
try to apply testing techniques from software enginee- 
ring to our domain of grammar development. In parti- 
cular we hope to demonstrate that building a grammar 
implementation is a special case of declarative program- 
ming, since most recent grammar formalisms (notably 
the unification-based theories) are declarative in nature. 
This needs to go together with test statistics and pre- 
cise information on how to incrementally improve the 
grammar. 
Second, in the long run it will be necessary to add 
sentences that exceed pure syntactic testing and check 
for semantic regularities. It is by no means clear how 
the test sentences for semantics should be collected since 
there is no set of agreed upon semantic features that 
could be varied. 

References 

Gregor Erbach. An Environment for Ezperimentatwn 
with Parsing Strategies. (IWBS Report 167) Stuttgart: 
Wissenschaftliches Zentrum der IBM Deutschland. April 
1991. 

Gregor Erbach and Roman Arens. Evaluation von 
Grammatiken fiir die Analyse natiirlicher Sprache dutch 
Generierung einer repr~sentativen Satzmenge. Procee- 
dings of GWAI-91, pages 126-129, Bonn, September 
1991. 

John Nerbonne et al. A diagT~ostic tool for German syn- 
taz. (Research Report RR-91-t8)Saarbriicken: DFKI. 
July 1991. 

Martin Volk and Hanno Ridder. GTU (Grammalik Test 
Umgebung) Manual. (Manuscript) Institute of Com- 
putational Linguistics. University of Koblenz-Landau. 
1991. 
