Parallel Distributed Grammar Engineering for Practical Applications
Stephan Oepen , Emily M. Bender , Uli Callmeier~, Dan Flickinger ~, Melanie Siegel|
 CSLI Stanford ~YY Technologies |DFKI GmbH
Stanford (CA) Mountain View (CA) Saarbr¨ucken (Germany)
8<
:
oe
bender
dan
9=
;@csli.stanford.edu
 uc
dan
 
@yy.com siegel@dfki.de
Abstract
Based on a detailed case study of paral-
lel grammar development distributed across
two sites, we review some of the require-
ments for regression testing in grammar en-
gineering, summarize our approach to sys-
tematic competence and performance profil-
ing, and discuss our experience with gram-
mar development for a commercial applica-
tion. If possible, the workshop presentation
will be organized around a software demon-
stration.
1 Background
The production of large-scale constraint-based
grammars and suitable processing environments is
a labour- and time-intensive process that, maybe,
has become somewhat of a growth industry over
the past few years, as companies explore products
that incorporate grammar-based language process-
ing. Many broad-coverage grammars have been
developed over several years, sometimes decades,
typically coordinated by a single grammarian who
would often draw on additional contributors (e.g.
the three HPSG implementations developed as part
of the VerbMobil effort, see Flickinger, Copes-
take, & Sag, 2000, M¨uller & Kasper, 2000, and
Siegel, 2000; or the LFG implementations devel-
oped within the ParGram consortium, Butt, King,
Ni˜no, & Segond, 1999).
More recently, we also find genuinely shared
and distributed development of broad-coverage
grammars, and we will use one such initiative as an
example—viz. an open-source HPSG implementa-
tion for Japanese jointly developed between DFKI
Saarbr¨ucken (Germany) and YY Technologies
(Mountain View, CA)—to demonstrate the techno-
logical and methodological challenges present in
distributed grammar and system engineering.
2 Parallel Distributed Grammar
Development—A Case Study
The Japanese grammar builds on earlier work per-
formed jointly between DFKI and the Computa-
tional Linguistics Department at Saarland Univer-
sity (Germany) within VerbMobil; much like for
the German VerbMobil grammar, two people were
contributing to the grammar in parallel, one build-
ing out syntactic analyses, the other charged with
integrating semantic composition into the syntax.
This relatively strict separation of responsibilities
mostly enabled grammarians to serialize incre-
mental development of the resource: the syntacti-
cian would supply a grammar with extended cov-
erage to the semanticist and, at the onset of the fol-
lowing iteration, start subsequent work on syntax
from the revised grammar.
In the DFKI – YY cooperation the situation was
quite different. Over a period of eight months,
both partners had a grammarian working on syn-
tax and semantics simultaneously on a day-to-
day basis; both grammarians were submitting
changes to a joint, version-controlled source repos-
itory and usually would start the work day by re-
trieving the most recent revisions. At the same
time, product building and the development of
so-called ‘domain libraries’ (structured collections
of knowledge about a specific domain that is in-
stantiated from semantic representations delivered
from grammatical analysis) at YY already incorpo-
rated the grammar and depended on it for actual,
customer-specific contracts. Due to a continuous
demand for improvements in coverage and analy-
sis accuracy, the grammar used in the main product
line would be updated from the current develop-
ment version about once or twice a week. Parallel
to work on the Japanese grammar (and simultane-
ous work on grammars for English and Spanish),
both the grammar development environment (the
open-source LKB system; Copestake, 2002) and
the HPSG run-time component powering the YY
linguistic analysis engine (the open-source PET
parser; Callmeier, 2002) continued to evolve, as
did the YY-proprietary mapping of meaning repre-
sentations extracted from the HPSG grammars into
domain knowledge—all central parts of a complex
system of interacting components and constraints.
As has been argued before (see, for exam-
ple, Oepen & Flickinger, 1998), the nature of a
large-scale constraint-based grammar and the sub-
tle interactions of lexical and constructional con-
straints make it virtually impossible to predict how
a change in one part of the grammar affects over-
all system behaviour. A relatively minor repair in
one lexical class, numeral adjectives as in ‘three
books were ordered’ for instance, will have the po-
tential of breaking the interaction of that class with
the construction deriving named (numeric) entities
from a numeral (e.g. as in ‘three is my favourite
number’) or the partitive construction (e.g. as in
‘three have arrived already’). A ripple effect of
a single change can thus corrupt the semantics
produced for any of these cases and in the con-
sequence cause failure or incorrect behaviour in
the back-end system. In addition to these qual-
ity assurance requirements on grammatical cover-
age and correctness, the YY application (like most
applications for grammar-based linguistic analy-
sis) utilizes a set of hand-constructed parse rank-
ing heuristics that enables the parser to operate
in best-first search mode and to return only one
reading, i.e. the analysis that is ranked best by the
heuristic component. The parse ranking machin-
ery builds on preferences that are associated with
individual or classes of lexical items and construc-
tions. The set of preferences is maintained in par-
allel to the grammar, in a sense providing a layer
of performance-oriented annotations over the basic
building blocks of the core competence grammar.
Without discussing the details of the parse ranking
approach, it creates an additional element of un-
certainty in assessing grammar changes: since the
preference for a specific analysis results implic-
itly from a series of local preferences (of lexical
items and constructions contributing to the com-
plete derivation), introducing additional elements
(i.e. new local or global ambiguity) into the search
space and subjecting them to the partial ordering
can quickly skew the overall result.
Summing up, the grammar and application engi-
neering example presented here illustrates a num-
ber of highly typical requirements on the engi-
neering environment. First, all grammarians and
system engineers participating in the development
process need to keep frequent, detailed, and accu-
rate records of a large number of relevant parame-
ters, including but not limited to grammatical cov-
erage, correctness of syntactic analyses and cor-
responding semantic forms, parse selection accu-
racy, and overall system performance. Second, as
modifications to the system as a whole are made
daily—and sometimes several times each day—all
developers must be able to assess the impact of
recent changes and track their effects on all rele-
vant parameters; gathering the data and analyzing
it must be simple, fast, and automated as much as
possible. Third, not all modifications (to the gram-
mar or underlying software) will result in ‘mono-
tonic’ or backwards-compatible effects. A change
in the treatment of optional nominal complements,
for example, may affect virtually all derivation
trees and render a comparison of results at this
level uninformative. At the same time, a primarily
syntactic change of this nature will not cause an ef-
fect in associated meaning representations, so that
a semantic equivalence test over analyses should
be expected to yield an exact match to earlier re-
sults. Hence, the machinery for representation and
comparison of relevant parameters needs to facil-
itate user-level specification of informative tests
and evolution criteria. Finally, the metrics used in
tracking grammar development cannot be isolated
from measurements of system resource consump-
tion and overall performance (specific properties
of a grammar may trigger idiosyncrasies or soft-
ware bugs in a particular version of the process-
ing system); therefore, and to enable exchange of
reference points and comparability of experiments,
grammarians and system developers alike should
use the same, homogenuous set of relevant param-
eters.
3 Integrated Competence and
Performance Profiling
The integrated competence and performance pro-
filing methodology and associated engineering
platform, dubbed [incr tsdb()] (Oepen & Callmeier,
2000)1 and reviewed in the remainder of this sec-
1See ‘http://www.coli.uni-sb.de/itsdb/’
for the (draft) [incr tsdb()] user manual, pronunciation rules,
and instructions on obtaining and installing the package.
tion, was designed to meet all of the requirements
identified in the DFKI – YY case study. Generally
speaking, the [incr tsdb()] environment is an in-
tegrated package for diagnostics, evaluation, and
benchmarking in practical grammar and system
engineering. The toolkit implements an approach
to grammar development and system optimization
that builds on precise empirical data and system-
atic experimentation, as it has been advocated by,
among others, Erbach & Uszkoreit (1990), Erbach
(1991), and Carroll (1994). [incr tsdb()] has been
integrated with, as of June 2002, nine different
constraint-based grammar development and pars-
ing systems (including both environments in use at
YY, i.e. the LKB and PET), thus providing a pre-
standard reference point for a relatively large (and
growing) community of NLP developers. The [incr
tsdb()] environment builds on the following com-
ponents and modules:
 test and reference data stored with annota-
tions in a structured database; annotations
can range from minimal information (unique
test item identifier, item origin, length et al.)
to fine-grained linguistic classifications (e.g.
regarding grammaticality and linguistic phe-
nomena presented in an item), as they are rep-
resented in the TSNLP test suites, for example
(Oepen, Netter, & Klein, 1997);
 tools to browse the available data, identify
suitable subsets and feed them through the
analysis component of processing systems
like the LKB and PET, LiLFeS (Makino,
Yoshida, Torisawa, & Tsujii, 1998), TRALE
(Penn, 2000), PAGE (Uszkoreit et al., 1994),
and others;
 the ability to gather a multitude of precise and
fine-grained (grammar) competence and (sys-
tem) performance measures—like the num-
ber of readings obtained per test item, various
time and memory usage statistics, ambigu-
ity and non-determinism metrics, and salient
properties of the result structures—and store
them in a uniform, platform-independent data
format as a competence and performance pro-
file; and
 graphical facilities to inspect the resulting
profiles, analyze system competence (i.e.
grammatical coverage and overgeneration)
and performance (e.g. cpu time and memory
usage, parser search space, constraint solver
’
&
$
%
Parser 3Parser 2Parser 1Grammar 3Grammar 2Grammar 1
TestSet 3TestSet 2Test
Set 1
ParallelVirtualMachine
C and Lisp APIRelational
DBMSBatch ControlStatisticsUserInterfaceANSI CCommon-LispTcl/Tk
Figure 1: Rough sketch of [incr tsdb()] architec-
ture: the core engine comprises the database man-
agement, batch control and statistics component,
and the user interface.
workload, and others) at variable granulari-
ties, aggregate, correlate, and visualize the
data, and compare among profiles obtained
from previous grammar or system versions or
other processing environments.
As it is depicted in Figure 1, the [incr tsdb()]
architecture can be broken down into three major
parts: (i) the underlying database management sys-
tem (DBMS), (ii) the batch control and statistics
kernel (providing a C and Lisp application pro-
gram interface to client systems that can be dis-
tributed across the network), and (iii) the graphi-
cal user interface (GUI). Although, historically, the
DBMS was developed independently and the ker-
nel can be operated without the GUI, the full func-
tionality of the integrated competence and perfor-
mance laboratory—as demonstrated below—only
emerges from the combination of all three com-
ponents. Likewise, the flexibility of a clearly de-
fined API to client systems and its ability to par-
allelize batch processing and distribute test runs
across the network have greatly contributed to the
success of the package. The following paragraphs
review some of the fundamental aspects in more
detail, sketch essential functionality, and comment
on how they have been exploited in the DFKI – YY
cooperation.
Abstraction over Processors The [incr tsdb()]
environment, by virtue of its generalized pro-
file format, abstracts over specific processing en-
vironments. While grammar engineers in the
DFKI – YY collaboration regularly use both the
LKB (primarily for interactive development) and
PET (mostly for batch testing and the assessment
of results obtained in the YY production envi-
ronment), usage of the [incr tsdb()] profile anal-
ysis routines in most aspects hides the specifics
of the token processor used in obtaining a profile.
Both platforms interprete the same typed feature
structure formalism, load the same set of gram-
mar source files, and (unless malfunctioning) pro-
duce equivalent results. Using [incr tsdb()], gram-
marians can obtain summary views of grammati-
cal coverage and overgeneration, inspect relevant
subsets of the available data, break down analysis
views according to various aggregation schemes,
and zoom in on specific aggregates or individual
test items as appropriate. Moreover, processing
results obtained from the (far more efficient) PET
parser (that has no visualization or debugging sup-
port built in), once recorded as an [incr tsdb()] pro-
file, can be used in conjunction with the LKB (con-
tingent on the use of identical grammars), thereby
facilitating graphical inspection of parse trees and
semantic formulae.
Parallelization of Test Runs The [incr tsdb()] ar-
chitecture (see Figure 1) separates the batch con-
trol and statistics kernel from what is referred to
as client processors (i.e. parsing systems like the
LKB or PET) through an application program inter-
face (API) and the Parallel Virtual Machine (PVM;
Geist, Bequelin, Dongarra, Manchek, & Sun-
deram, 1994) message-passing protocol layer. The
use of PVM—in connection with task scheduling,
error recovery, and roll-over facilities in the [incr
tsdb()] kernel—enables developers to transparently
parallelize and distribute execution of batch pro-
cessing. At YY, grammarians had a cluster of net-
worked Linux compute servers configured as a sin-
gle PVM instance, so that execution of a test run—
using the efficient PET run-time engine—could be
completed as a matter of a few seconds. The com-
bination of near-instantaneous profile creation and
[incr tsdb()] facilities for quick, semi-automated as-
sessment of relevant changes (see below) enabled
developers to pursue a strongly empiricist style of
grammar engineering, assessing changes and their
effects on actual system behavior in small incre-
ments (often many times per hour).
Structured Comparison One of the facilities
that has proven particularly useful in the dis-
tributed grammar engineering setup outlined in
Section 2 above is the flexible comparison of com-
petence and performance profiles. The [incr tsdb()]
package eases comparison of results on a per-
item basis, using an approach similar to Un x
diff(1), but generalized for structured data sets.
By selection of a set of parameters for intersec-
tion (and optionally a comparison predicate), the
user interface allows browsing the subset of test
items (and associated results) that fail to match
in the selected properties. One dimension that
grammarians found especially useful in intersect-
ing profiles is on the number of readings assigned
per item—detecting where coverage was lost or
added—and on derivation trees (bracketed struc-
tures labeled with rule names and identifiers of lex-
ical items) associated with each parser analysis—
assessing where analyses have changed. Addition-
ally, using a user-supplied equivalence predicate,
the same technique was regularly used at YY to
track the evolution of meaning representations (as
they form the interface from linguistic analysis into
the back-end knowledge processing engine), both
for all readings and the analysis ranked best by the
parse selection heuristics.
Zooming and Interactive Debugging In
analysing a new competence and performance
profile, grammarians typically start from summary
views (overall grammatical coverage, say), then
single out relevant (or suspicious) subsets of
profile data, and often end up zooming in to
the level of individual test items. For most [incr
tsdb()] analysis views the ‘success’ criteria can be
varied according to user decisions: in assessing
grammatical coverage, for example, the scoring
function can refer to virtually arbitrary profile
elements—ranging from the most basic coverage
measure (assigning at least one reading) to more
refined or application-specific metrics, the produc-
tion of a well-formed meaning representation, say.
Although the general approach allows output an-
notations on the test data (full or partial constituent
structure descriptions, for example), developers so
far have found the incremental, semi-automated
comparison against earlier results a more adequate
means of regression testing. It would appear
that, especially in an application-driven and
tightly scheduled engineering situation like the
DFKI – YY partnership, the pace of evolution
and general lack of locality in changes (see the
examples discussed in Section 2) precludes the
construction of a static, ‘gold-standard’ target for
comparison. Instead, the structured comparison
facilities of [incr tsdb()] enable developers to
incrementally approximate target results and, even
12-sep-2001 (13:24 h) – 14-feb-2002 (17:14 h)
40
45
50
55
60
65
70
75
80
85
90
95
Grammatical Coverage (Per Cent)
(generated by [incr tsdb()] at 29-jun-2002 (20:49 h))
 
        
    
 
 
 
 
 
  
 
   
  
 
  
 
 
     
 
 
 
     
 
 
 
 
 
 
      
      
  
      
 — ‘banking’
 — ‘trading’
12-sep-2001 (13:24 h) – 14-feb-2002 (17:14 h)
0
10
20
30
40
50
60
70
80
90
Ambiguity (Average Number of Analyses)
(generated by [incr tsdb()] at 29-jun-2002 (20:59 h))
 
 
   
 
             
 
 
 
 
  
 
 
   
   
    
 — ‘banking’
 — ‘trading’
Figure 2: Evolution of grammatical coverage and average ambiguity (number of readings per test item) over
a five-month period; ‘banking’ and ‘trading’ are two data sets (of some 700 and 400 sentences, respectively)
of domain data.
in a highly dynamic environment where grammar
and processing environment evolve in parallel,
track changes and identify regression with great
confidence.
4 Looking Back—Quantifying Evolution
Over time, the [incr tsdb()] profile storage accu-
mulates precise data on the grammar development
process. Figure 2 summarizes two aspects of
grammatical evolution compiled over a five-month
period (and representing some 130 profiles that
grammarians put aside for future reference): gram-
matical coverage over two representative samples
of customer data—one for an on-line banking ap-
plication, the other from an electronic stock trad-
ing domain—is contrasted with the development
of global ambiguity (i.e. the average number of
analyses assinged to each test item). As should
be expected, grammatical coverage on both data
sets increases significantly as grammar develop-
ment focuses on these domains (‘banking’ for the
first three months, ‘trading’ from there on). While
the collection of available profiles, apparently, in-
cludes a number of data points corresponding to
‘failed’ experiments (fairly dramatic losses in cov-
erage), the larger picture shows mostly monotonic
improvement in coverage. As a control experi-
ment, the coverage graph includes another data
point for the ‘banking’ data towards the end of the
reporting period. Two months of focussed devel-
opment on the ‘trading’ domain have not nega-
tively affected grammatical coverage on the data
set used earlier. Corresponding to the (desirable)
increase in coverage, the graph on the right of Fig-
ure 2 depicts the evolution of grammatical ambi-
guity. As hand-built linguistic grammars put great
emphasis on the precision of grammatical analy-
sis and the exclusion of ungrammatical input, the
overall average of readings assigned to each sen-
tence varies around relatively small numbers. For
the moderately complex email data2 the grammar
often assigns less than ten analyses, rarely more
than a few dozens. However, not surprisingly
the addition of grammatical coverage comes with
a sharp increase in ambiguity (which may indi-
cate overgeneration): the graphs in Figure 2 clearly
show that, once coverage on the ‘trading’ data was
above eighty per cent, grammarians shifted their
engineering focus on ‘tightening’ the grammar, i.e.
the elimination of spurious ambiguity and overgen-
eration (see Siegel & Bender, 2002, for details on
the grammar).
Another view on grammar evolution is pre-
sented in Figure 3, depicting the ‘size’ of the
Japanese grammar over the same five-month de-
velopment cycle. Although measuring the size of
2Quantifying input complexity for Japanese is a non-
trivial task, as the count of the number of input words would
depend on the approach to string segmentation used in a spe-
cific system (the fairly aggressive tokenizer of ChaSen, Asa-
hara & Matsumoto, 2000, in our case); to avoid potential for
confusion, we report input complexity in the (overtly system-
specific) number of lexical items stipulated by the grammar
instead: around 50 and 80, on average, for the ‘banking’ and
‘trading’ data sets, respectively (as of February 2002).
12-sep-2001 (13:24 h) – 14-feb-2002 (17:14 h)
8800
9000
9200
9400
9600
9800
10000
10200
88
90
92
94
96
98
100
102
104
106
Grammar Size
(generated by [incr tsdb()] at 30-jun-2002 (16:09 h))    
   
       
      
  
 
 
 
    
               
  
  
     
 
   
  
 
   
 
  
   
   
 
 
    
    
   
   
     — types
 — rules
Figure 3: Evolution of grammar size (in the num-
bers of types, plotted against the left axis, and
grammar rules, plotted against the right axis) over
a five-month period.
computational grammars is a difficult challenge,
for the HPSG framework two metrics suggest them-
selves: the number of types (i.e. the size of the
grammatical ontology) and the number of gram-
mar rules (i.e. the inventory of construction types).
As would be expected, both numbers increase
more or less monotonically over the reporting pe-
riod, where the shift of focus from the ‘banking’
into the ‘trading’ domain is marked with a sharp
increase in (primarily lexical) types. Contrasted
to the significant gains in grammatical coverage
(a relative improvement of more than seventy per
cent on the ‘banking’ data), the increase in gram-
mar size is moderate, though: around fifteen and
twenty per cent in the number of types and rules,
respectively.
5 Conclusions
At YY and cooperating partners (primarily DFKI
Saarbr¨ucken and CSLI Stanford), grammarians
(for all languages) as well as developers of both the
grammar development tools and of the production
system all used the competence and performance
profiling environment as part of their daily engi-
neering toolbox. The combination of [incr tsdb()]
facilities to parallelize test run processing and a
break-through in client system efficiency (using
the PET parser; Callmeier, 2002) has created an ex-
perimental development environment where gram-
marians can obtain near-instantaneous feedback on
the effects of changes they explore.
For the Japanese grammar specifically, the
grammar developers at both ends would typically
spend the first ten to twenty minutes of the day ob-
taining fresh profiles for a number of shared test
sets and diagnostic corpora, thereby assessing the
most recent set of changes through empirical anal-
ysis of their effects. In conjunction with a certain
rigor in documentation and communication, it was
the ability of both partners to regularly, quickly,
and semi-automatically monitor the evolution of
the joint resource with great confidence that has
enabled truly parallel development of a single,
shared HPSG grammar across continents. Within
a relatively short time, the partners succeeded
in adapting an existing grammar to a new genre
(email rather than spoken language) and domain
(customer service requests rather than appointment
scheduling), greatly extending grammatical cov-
erage (from initially around forty to above ninety
per cent on representative customer corpora), and
incorporating the grammar-based analysis engine
into a commercial product. And even though in
February 2002, for business reasons, YY decided
to reorganize grammar development for Japanese,
the distributed, parallel grammar development ef-
fort positively demonstrates that methodological
and technological advances in constraint-based
grammar engineering have enabled commercial
development and deployment of broad-coverage
HPSG implementations, a paradigm that until re-
cently was often believed to still lack the maturity
for real-world applications.
Acknowledgements
The DFKI – YY partnership involved a large group
of people at both sites. We would like to thank
Kirk Oatman, co-founder of YY and first CEO,
and Hans Uszkoreit, Scientific Director at DFKI,
for their initiative and whole-hearted support to
the project; it takes vision for both corporate and
academic types to jointly develop an open-source
resource. Atsuko Shimada (from Saarland Uni-
versity), as part of a two-month internship at YY,
has greatly contributed to the preparation of repre-
sentative data samples, development of robust pre-
processing rules, and extensions to lexical cover-
age. Our colleague and friend Asahara-san (of the
Nara Advanced Institute of Technology, Japan),
co-developer of the open-source ChaSen tokenizer
and morphological analyzer for Japanese, was in-
strumental in the integration of ChaSen into the
YY product and also helped a lot in adapting and
(sometimes) fixing tokenization and morphology.

References
Asahara, M., & Matsumoto, Y. (2000). Extended
models and tools for high-performance part-of-
speech tagger. In Proceedings of the 18th In-
ternational Conference on Computational Lin-
guistics (pp. 21 – 27). Saarbr¨ucken, Germany.
Butt, M., King, T. H., Ni˜no, M.-E., & Segond, F.
(1999). A grammar writer’s cookbook. Stan-
ford, CA: CSLI Publications.
Callmeier, U. (2002). Preprocessing and encoding
techniques in PET. In S. Oepen, D. Flickinger,
J. Tsujii, & H. Uszkoreit (Eds.), Collabora-
tive language engineering. A case study in ef-
ficient grammar-based processing. Stanford,
CA: CSLI Publications. (forthcoming)
Carroll, J. (1994). Relating complexity to practi-
cal performance in parsing with wide-coverage
unification grammars. In Proceedings of the
32nd Meeting of the Association for Computa-
tional Linguistics (pp. 287 – 294). Las Cruces,
NM.
Copestake, A. (2002). Implementing typed fea-
ture structure grammars. Stanford, CA: CSLI
Publications.
Erbach, G. (1991). An environment for exper-
imenting with parsing strategies. In J. My-
lopoulos & R. Reiter (Eds.), Proceedings of
IJCAI 1991 (pp. 931 – 937). San Mateo, CA:
Morgan Kaufmann Publishers.
Erbach, G., & Uszkoreit, H. (1990). Grammar
engineering. Problems and prospects (CLAUS
Report # 1). Saarbr¨ucken, Germany: Compu-
tational Linguistics, Saarland University.
Flickinger, D., Copestake, A., & Sag, I. A. (2000).
HPSG analysis of English. In W. Wahlster
(Ed.), Verbmobil. Foundations of speech-to-
speech translation (Artificial Intelligence ed.,
pp. 321 – 330). Berlin, Germany: Springer.
Geist, A., Bequelin, A., Dongarra, J., Manchek, W.
J. R., & Sunderam, V. (Eds.). (1994). PVM —
parallel virtual machine. A users’ guide and tu-
torial for networked parallel computing. Cam-
bridge, MA: The MIT Press.
Makino, T., Yoshida, M., Torisawa, K., & Tsu-
jii, J. (1998). LiLFeS — towards a practical
HPSG parser. In Proceedings of the 17th In-
ternational Conference on Computational Lin-
guistics and the 36th Annual Meeting of the
Association for Computational Linguistics (pp.
807 – 11). Montreal, Canada.
M¨uller, S., & Kasper, W. (2000). HPSG analy-
sis of German. In W. Wahlster (Ed.), Verbmo-
bil. Foundations of speech-to-speech transla-
tion (Artificial Intelligence ed., pp. 238 – 253).
Berlin, Germany: Springer.
Oepen, S., & Callmeier, U. (2000). Measure for
measure: Parser cross-fertilization. Towards
increased component comparability and ex-
change. In Proceedings of the 6th International
Workshop on Parsing Technologies (pp. 183 –
194). Trento, Italy.
Oepen, S., & Flickinger, D. P. (1998). Towards
systematic grammar profiling. Test suite tech-
nology ten years after. Journal of Computer
Speech and Language, 12 (4) (Special Issue on
Evaluation), 411 – 436.
Oepen, S., Netter, K., & Klein, J. (1997). TSNLP
— Test Suites for Natural Language Process-
ing. In J. Nerbonne (Ed.), Linguistic Databases
(pp. 13 – 36). Stanford, CA: CSLI Publica-
tions.
Penn, G. (2000). Applying constraint handling
rules to HPSG. In Proceedings of the first in-
ternational conference on computational logic
(pp. 51 – 68). London, UK.
Siegel, M. (2000). HPSG analysis of Japanese. In
W. Wahlster (Ed.), Verbmobil. Foundations of
speech-to-speech translation (Artificial Intelli-
gence ed., pp. 265 – 280). Berlin, Germany:
Springer.
Siegel, M., & Bender, E. M. (2002). Efficient
deep processing of japanese. In Proceedings
of the 19th International Conference on Com-
putational Linguistics. Taipei, Taiwan.
Uszkoreit, H., Backofen, R., Busemann, S., Di-
agne, A. K., Hinkelman, E. A., Kasper, W.,
Kiefer, B., Krieger, H.-U., Netter, K., Neu-
mann, G., Oepen, S., & Spackman, S. P.
(1994). DISCO — an HPSG-based NLP
system and its application for appointment
scheduling. In Proceedings of the 15th Interna-
tional Conference on Computational Linguis-
tics. Kyoto, Japan.
