Experiments in Reusability of Grammatical Resources 
Doug Arnold*, Toni Badia†, Josef van Genabith‡, Stella Markantonatou*,
Stefan Momma‡, Louisa Sadler*, Paul Schmidt§
*Dept of Language and Linguistics, University of Essex, Colchester CO4 3SQ, UK
†Universitat Pompeu Fabra, La Rambla 32, 08002 Barcelona, Spain
‡IMS-CL, Azenbergstraße 12, University of Stuttgart, D-W7000 Stuttgart, Germany
§IAI, Martin-Luther-Straße 14, D-W6600 Saarbrücken 3, Germany
doug;marks;louisa@essex.ac.uk, tbadia@upf.es, 
steff;josef@ims.uni-stuttgart.de, paul@iai.uni-sb.de 
Abstract
Substantial formal grammatical and lexical resources exist in various NLP
systems and in the form of textbook specifications. In the present paper we
report on experimental results obtained in manual, semi-automatic and
automatic migration of entire computational or textbook descriptions (as
opposed to a more informal reuse of ideas or the design of a single
"polytheoretic" representation) from a variety of formalisms into the ALEP
formalism. 1 The choice of ALEP (a comparatively lean, typed feature structure
formalism based on rewrite rules) was motivated by the assumption that the
study would be most interesting if the target formalism is relatively
mainstream without overt ideological commitments to particular grammatical
theories. As regards the source formalisms we have attempted migrations of
descriptions in HPSG (which uses fully-typed feature structures and has a
strong 'non-derivational' flavour), ETS (an untyped stratificational formalism
which essentially uses rewrite rules for feature structures and has run-time
non-monotonic devices) and LFG (which is an untyped constraint and CF-PSG
based formalism with extensions such as existential, negative and global
well-formedness constraints).
1 The work reported in this paper was supported by the CEC as part of the
project ET10/52.
1 Introduction
Reusability of grammatical resources is an important 
idea. Practically, it has obvious economic benefits in 
allowing grammars to be developed cheaply; for the- 
oreticians it is important in allowing new formalisms 
to be tested out, quickly and in depth, by providing 
large-scale grammars. It is timely since substantial 
computational grammatical resources exist in vari- 
ous NLP systems, and large scale descriptions must 
be quickly produced if applications are to succeed. 
Meanwhile, in the CL community, there is a percep- 
tible paradigm shift towards typed feature structure 
and constraint based systems and, if successful, mi- 
gration allows such systems to be equipped with large 
bodies of descriptions drawn from existing resources. 
In principle, there are two approaches to achiev- 
ing the reuse of grammatical and lexical resources. 
The first involves storing or developing resources in 
some theory neutral representation language, and is 
probably impossible in the current state of knowl- 
edge. In this paper, we focus on reusability through 
migration--the transfer of linguistic resources (gram- 
matical and lexical descriptions) from one compu- 
tational formalism into another (a target computa- 
tional formalism). Migration can be completely man- 
ual (as when a linguist attempts to encode the analy- 
ses of a particular linguistic theory in some compu- 
tationally interpreted formalism), semi-automatic or 
automatic. The starting resource can be a paper de- 
scription or an implemented, runnable grammar. 
The literature on migration is thin, and practical experience is episodic at
best. Shieber's work (e.g. [Shieber 1988]) is relevant, but it was concerned
with relations between formalisms rather than with migrating grammars per se.
He studied the extent to which the formalisms of FUG, LFG and GPSG could be
reduced to PATR-II. Although these studies explored the expressivity of the
different grammar
formalisms (both in the strong mathematical and in 
the functional sense, i.e. not only which class of 
string sets can be described, but also what can be 
stated directly or naturally, as opposed to just being 
encoded somehow or other), the reduction was not 
intended to be the basis of migration of descriptions 
written in the formalisms. In this respect the work 
described below differs substantially from Shieber's 
work: our goal has to be to provide grammars in the 
target formalisms that can be directly used for fur- 
ther work by linguists, e.g. extending the coverage or 
restructuring the description to express new insights, 
etc. 
The idea of migration raises some general ques- 
tions. 
• What counts as successful migration? (e.g. 
what properties must the output/target descrip- 
tion have and which of these properties are cru- 
cial for the reuse of the target description?). 
• How conceptually close must source and target 
be for migration to be successful? 
• How far is it possible to migrate descriptions ex- 
pressed in a richer formalism (e.g. one that uses 
many expressive devices) into a poorer formal- 
ism? For example, which higher level expres- 
sive devices can be directly expressed in a 'lean' 
formalism, which ones might be compiled down 
into a lean formalism, and which ones are truly 
problematic? Are there any general hints that 
might be given for any particular class of higher 
level expressive devices? When should effort be 
put into finding encodings for richer devices, and 
when should the effort go into simply extending 
the target formalism? 
• How important is it that the source formalism 
have a well-defined semantics? How far can 
difficulties in this area be off-set if the gram- 
mars/descriptions are well-documented? 
• How does the existence of non-monotonic devices within a source formalism
affect migratability, and is it possible to identify, for a given source
grammar, uses of these mechanisms that are not truly non-monotonic in nature
and could thus still be modelled inside a monotonic description?
• To what extent are macros and preprocessors a 
useful tool in a step-wise migration from source 
to target? 
We can provide some answers in advance of experimentation. In particular,
successful migration implies that the target description must be practically
usable, that is, understandable and extensible. There is one exception to
this, which is where a large grammatical resource is migrated solely to test
the (run-time) capabilities of a target formalism. Practically, usability
implies at least I/O equivalence with the source grammar but should ideally
also imply the preservation of general properties such as modularity,
compactness and user-friendliness of the specification.
This paper reports on and derives some lessons 
from a series of on-going experiments in which 
we have attempted automatic, semi-automatic and 
manual migration of implemented grammatical and 
lexical resources and of textbook specifications, writ- 
ten in various 'styles', to the ALEP formalism (see 
below). The choice of ALEP was motivated by the
assumption that the study would be most interesting if
the target formalism is relatively mainstream. 2 As
regards the 'style' and expressivity of source for- 
malisms, we have carried out migrations from HPSG, 
which uses fully-typed feature structures and a vari- 
ety of richly expressive devices, from ETS grammars 
and lexicons 3 (ETS is an untyped stratificational 
formalism essentially using rewrite rules for feature 
structures), and from an LFG grammar 4 (LFG is a 
standard untyped AVS formalism with some exten- 
sions, with a CFG backbone). 
2 The Migration Experiments 
2.1 The Target Formalism 
The target formalism, ALEP, is a first prototype implementation of the
formalism specified in the ET-6 design study (the ET-6 formalism [Alshawi et
al. 1991]). ET-6 was intended to be an efficient, mainstream CL formalism
without ideological commitments to particular grammatical theories and
suitable for large-scale implementations. It is declarative, monotonic and
reversible, although in ET-6 and in ALEP it is possible to model certain
non-monotonic operations (e.g. getting some treatment of defaults out of
parametrised macros). ALEP is CF-PSG rule based and supports typed feature
structures and simple inheritance between types. Type information and
inheritance are effective only at compile time. ALEP provides atoms, lists,
booleans and
terms as basic types. Complex structured types and 
simple inheritance relations are defined by the user in 
a type system specification. In addition to standard 
grammar rules which are effective during a parse 
(generation) the formalism provides refinement rules 
which operate on the output of the parser and spec- 
ify values which are still undefined after parsing by 
using only unification. Although the core formalism is rather conservative,
for reasons of efficiency, it is intended to support the eventual inclusion of
a periphery including external constraint processing modules. Similarly, it
does not (yet) directly provide potentially computationally expensive
expressive devices such as set-valued features and operations on sets,
functionally dependent values, separation of ID and LP statements, multiple
inheritance, or membership and concatenation constraints on lists. The idea is
that such extensions should be provided, probably as external modules, as and
when they are found to be necessary. 5
2 Of course, for practical purposes one might want to migrate resources to a
non-standard formalism, provided it is relatively easy to understand.
3 Developed at Saarbrücken, Essex and UMIST during the EUROTRA project.
4 Developed at Stuttgart as part of the EUROTRA accompanying research, see
[Meier 1992].
2.2 Manual Migration from HPSG 
Although both HPSG and ALEP use typed feature 
structures and support type inheritance, they dif- 
fer crucially in that HPSG specifications are con- 
sciously non-derivational and strongly modularised 
in terms of sets of principles, immediate dominance 
schemata and linear precedence statements operat- 
ing as constraints on typed feature structures. To 
achieve this, HPSG employs a number of powerful 
descriptive devices, including list and set operations 
(often expressed as functionally dependent values), 
and multiple type inheritance. The question for the HPSG → ALEP conversion,
then, is to what extent the latter, rather lean formalism can support in a
reasonable way the style of linguistic specification found in HPSG (the source
specification for this experiment was the description of English provided in
[Pollard & Sag 1992]).
Various approaches to conversion are possible. For 
example, it would be possible to define a user lan- 
guage permitting the expression of principles (in 
much the same way as some formalisms permit fea- 
ture percolation principles to be separately stated) 
and a compiler into ALEP allowing their effects to 
be expanded into the rules. In this spirit, following the work of Mellish
[Mellish 1988], the technique of encoding boolean combinations of atomic
feature values so that satisfaction can be checked by unification is adopted
in the ET-6 formalism [Alshawi et al. 1991].
Since there were open questions as to what could be directly expressed in
ALEP, in this conversion experiment we first took a direct approach,
essentially employing ALEP as a feature term rewriting system for HPSG
specifications. The focus of this conversion was mainly on exploring the
limits of the expressivity of ALEP and thus identifying which higher level
expressive devices could not be treated.
The resulting translation is not as perspicuous, 
modular, compact and maintainable as the original 
HPSG specification. Migration results in a fragmen- 
tation and particularisation of the linguistic infor- 
mation encoded in the original specification. This is 
because (i) HPSG principles and schemata have to be compiled out into
(possibly large) sets of ALEP phrase-structure rules; and (ii) some
descriptions cast in a richly expressive formalism have to be simulated and
can often only be approximated in ALEP. For example, ID-2 and the Valence
Principle as it applies to ID-2, shown in (1), have to be approximated with
sets of ALEP rules of the form in (2), because ALEP lacks the functional
constraint val_append.
5 Apart from investigating issues involved in migration of descriptions, one
motivation for these experiments is to explore just which devices are
essential for expressing linguistically motivated grammatical descriptions.
(1) ID-2 and Valence Principle (simplified):
    [ SYNSEM | LOC | CAT | COMPS @1
      DTRS [ HDTR | SYNSEM | LOC | CAT | COMPS val_append(@1, @2)
             COMPDTRS @2 ] ]
(2) ALEP rules for ID-2:
    id_2_0 = sign:{ .... comps => [] .... } ->
        [sign:{ .... comps => [] .... }] head 1.
    id_2_1 = sign:{ .... comps => [] .... } ->
        [sign:{ .... comps => [X] .... },
         sign:{ .... synsem => X .... }] head 1.
    id_2_2 = sign:{ .... comps => [] .... } ->
        [sign:{ .... comps => [X,Y] .... },
         sign:{ .... synsem => X .... },
         sign:{ .... synsem => Y .... }] head 1.
    id_2_3 = .............
Of course, by adopting binary branching trees and 
altering the ID and Subcategorisation principles it 
would be possible to avoid some of this verbosity, but 
for the purposes of our experiment we considered it 
important to investigate the migration of the source 
formalism as is. 
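The expansion behind the rules in (2) is mechanical: each rule fixes the length of the COMPS list and names its elements. A minimal Python sketch of that enumeration follows. The function names and the upper bound on the number of complements are assumptions made for illustration, and the attribute name synsem is our reading of the rules in (2). The sketch only produces rule text; it is not part of ALEP or of the experiment reported here.

```python
def id2_rule(n: int) -> str:
    """Return the text of the instance of the ID-2 schema with n complements."""
    variables = [f"X{i}" for i in range(1, n + 1)]
    # Head daughter: its COMPS list names exactly n elements.
    head = "sign:{ .... comps => [%s] .... }" % ",".join(variables)
    # One complement daughter per named element of the head's COMPS list.
    daughters = [head] + [
        f"sign:{{ .... synsem => {v} .... }}" for v in variables
    ]
    body = ",\n     ".join(daughters)
    return (
        f"id_2_{n} = sign:{{ .... comps => [] .... }} ->\n"
        f"    [{body}] head 1."
    )


def id2_rules(max_comps: int) -> list[str]:
    """One rule per COMPS length from 0 up to max_comps (a hypothetical bound)."""
    return [id2_rule(n) for n in range(max_comps + 1)]


if __name__ == "__main__":
    print("\n".join(id2_rules(2)))
```

The number of rules grows linearly with the longest COMPS list the grammar has to cover, which is the verbosity that a functional constraint such as val_append would avoid.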
Note that the resulting ALEP specification in (2) is 
as compact, perspicuous and maintainable as in any 
rule based grammar formalism, although it compares 
badly with HPSG in these terms. While initially it 
seemed that it was possible to create a usable, ex- 
tensible and understandable ALEP grammar on the 
basis of HPSG specifications, there is one feature of 
HPSG which remains problematic, that of set-valued 
features and set operations. The difficulty comes in 
modelling principles such as the HPSG Quantifier 
Inheritance Principle (QIP), which relies on
operations such as set union and complementation.
In ALEP set union can be approximated to a certain 
extent in terms of list concatenation in a difference 
list based threading approach. However, since the 
current implementation of ALEP does not even pro- 
vide membership constraints on list representations, 
element and set difference constraints can only be ap- 
proximated in terms of a multitude of minimally dif- 
fering rules naming elements in set representations. 
This approach is only safe if the following two con- 
ditions hold: 
• the sets involved are finite 
• elements in the difference list representations of 
sets are unique 
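The two conditions above can be illustrated with a small Python sketch. Plain Python lists stand in for difference lists here, and all names are hypothetical. The first part shows that concatenation agrees with set union only when no element occurs on both sides; the second part counts the (store, retrieved) combinations that would have to be spelled out if set difference were encoded by naming elements. That count is meant only as an indication of growth, not as the exact number of ALEP rules required.

```python
from itertools import combinations


def union_by_concat(xs, ys):
    """Approximate set union by list concatenation."""
    return xs + ys


def is_faithful(xs, ys):
    """True if concatenation yields exactly the elements of the set union."""
    return len(union_by_concat(xs, ys)) == len(set(xs) | set(ys))


def difference_cases(universe):
    """Enumerate every (store, retrieved) pair with retrieved a subset of store.

    Each pair corresponds to one case that has to be named explicitly when
    no membership or set difference constraint is available.
    """
    cases = []
    for k in range(len(universe) + 1):
        for store in combinations(universe, k):
            for j in range(len(store) + 1):
                for retrieved in combinations(store, j):
                    cases.append((store, retrieved))
    return cases


if __name__ == "__main__":
    print(is_faithful(["q1"], ["q2"]))   # disjoint: concatenation is safe
    print(is_faithful(["q1"], ["q1"]))   # shared element: a duplicate appears
    for n in range(1, 6):
        print(n, len(difference_cases([f"q{i}" for i in range(n)])))
```

For a universe of n elements the enumeration has 3^n cases, so the approach is only practical for very small finite sets.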
Even for small sets, however, any exhaustive implementation of set difference
in terms of naming elements in the representation results in an unacceptable
number of rules and associated parse time. In some cases we were able to avoid
this problem by relegating e.g. quantifier retrieval to sets of refinement
rules which operate on parse objects which are effectively underspecified for
quantifier scope.
    [ SYNSEM: [LOC: [CONTENT: [QUANTS: RETR ∪ HQUANTS]]]
      QSTORE: (HQSTORE ∪ QUANTS1 ∪ ... ∪ QUANTSn) − RETR
      RETRVD: RETR
      DTRS: [ HDTR: [SYNSEM: [LOC: [CONTENT: [QUANTS: HQUANTS]]]]
              DTR1: [QSTORE: QUANTS1]
              ...
              DTRn: [QSTORE: QUANTSn] ] ]
Figure 1: Quantifier Inheritance Principle (simplified)
It soon became clear that sets of refinement rules 
are not a general solution for the modelling of el- 
ement or set complement constraints in HPSG be- 
cause they operate on the output of a parse and 
hence cannot decide about the 'phrase' structure of 
a sign. Introducing and filling gaps, however, is central to the structure of
a sign. The Nonlocal Feature Principle (NFP), which is at the heart of the
HPSG treatment of unbounded dependency constructions (UDCs), ensures that
SYNSEM | NONLOC | INHER values are discharged in terms of a set difference
specification which cannot be implemented in terms of sets of refinement
rules, since it directly decides about the well-formedness of strings in terms
of the phrase structure of the sign.
    [ ... DTRn: [SYNSEM: [NONLOC: [INHER: Sn]]] ... ]
Figure 2: Nonlocal Feature Principle (simplified)
Furthermore, parasitic gap phenomena in English, as in That was the rebel
leader who rivals of _ shot _, suggest that at least as far as the NFP is
concerned it is problematic to assume that elements in the difference list
representations of sets are unique. This assumption is crucial to modelling
set union in terms of list concatenation.
Formally, HPSG principles can either be given the 
status of proper types or that of typed feature struc- 
ture templates acting as constraints on other feature 
structures. In ALEP the first option is not available 
to us since apart from subtype or supertype infor- 
mation the type system specification does not allow 
the specification of a type other than in terms of 
its root attributes and the type of their correspond- 
ing values and more importantly it does not support 
multiple inheritance required to inherit principles to 
other types. In order to recapture some of the loss of 
modularity in compiling out HPSG principles over 
sets of ALEP rules we thus tried to pursue the sec- 
ond option using m4 macros to directly state princi- 
ples. m4 is a standard UNIX facility which allows for 
parameterised and non-parameterised macros, con- 
ditional expansions and numeric operations. Macros 
are expanded externally to ALEP and not during com- 
pilation time. Each HPSG principle can be rep- 
resented as a feature structure template which in 
turn can be specified in terms of a macro definition, or so it seems. The
problem here, however, is
that since HPSG principles mutually constrain signs,
the conjunction of such principles (at least in simple 
cases) corresponds to the unification (or merging) of 
their feature structure template representations (if 
the conjunction is satisfiable). What standard macro 
facilities achieve is effectively a simple lexical expan- 
sion of strings and it is impossible to get the merging 
effect of unification of template feature structures out 
of a modular macro specification of such templates. 
Basically, three options are available to us: 
(i) To get the overlapping effect of unification we 
integrate different principles into one macro. 
(ii) We define extended types with special attributes 
for each of the relevant HPSG principles which 
are expanded by modular macro definitions of 
the principles and get the unification effect from 
ALEP at compile time through proper coindexa- 
tion. 
    phrase{phrase => @S{PHRASE},
           hfp    => @S{HEAD_FEATURE_PRINC},
           sp     => @S{SEMANTICS_PRINC},
           qip    => @S{QUANTIF_INHERIT_PRINC},
           valp   => @S{VALENCY_PRINC}}
(iii) We use a more powerful 'macro' processor like 
e.g. Prolog which provides the unification effect 
and define a map into ALEP. 
In the case of (i) the modularity of HPSG with separately stated, but
interacting principles is lost. (ii) has the disadvantage that the ALEP
specifications grow in size, while in the case of (iii) we are not considering
the expressivity of the target formalism itself.
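The difference between textual macro expansion and the merging effect of unification can be made concrete with a short Python sketch. Nested dictionaries stand in for feature structure templates, and the two example templates are invented for illustration; variables, reentrancy and types are deliberately left out, so this is far weaker than unification in ALEP or HPSG.

```python
class UnificationFailure(Exception):
    """Raised when two templates assign clashing atomic values."""


def unify(a, b):
    """Merge two templates, recursing where both specify the same attribute."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            result[key] = unify(result[key], value) if key in result else value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} clashes with {b!r}")


def expand_macros(*bodies):
    """Mimic purely lexical expansion: paste the macro bodies side by side."""
    return ", ".join(bodies)


if __name__ == "__main__":
    # Two hypothetical principle templates constraining the same sign.
    head_feature = {"synsem": {"loc": {"cat": {"head": "verb"}}}}
    valency = {"synsem": {"loc": {"cat": {"comps": []}}}}
    print(unify(head_feature, valency))
    print(expand_macros("synsem => {head => verb}", "synsem => {comps => []}"))
```

Unification produces a single synsem value carrying both constraints, whereas the expanded text mentions synsem twice and leaves the merging to whatever reads it. This is the gap that options (i) to (iii) try to close in different ways.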
2.3 Automatic Migration from ETS B-rules 
In this section we draw some general conclusions 
following from our experience of attempting auto- 
matic migration from an untyped rule-based formal- 
ism. Specifically, the source for this experiment was 
the structure-building rules of some relatively large 
ETS grammars. The ETS formalism is "badly be- 
haved" in that it contains a rich array of devices ad- 
ditional to the structure-building or B-rules, many of 
which are non-monotonic, and which apply at run-time
(they are mainly output filters and various types
of feature percolation rules). We have written an au- 
tomatic compiler in Prolog which calculates a very 
simple type system and automatically migrates the 
structure rules and lexical descriptions. With respect 
to the source formalism in question, the following 
points are to be noted: 
• The run-time non-monotonic devices found in 
ETS are extremely problematic to take into ac- 
count in automatic direct migration. We doubt 
whether it would be possible to write an intelli- 
gent compiler which directly encoded the effect 
of these devices in the resultant ALEP rule set. If 
they are ignored in the migration process, then 
of course the source and target descriptions are 
not I/O equivalent. 
• The B-rules themselves allow optionality, Kleene 
closure, positive Kleene closure and disjunction 
over (sequences of) daughters to any degree of 
embedding within each other. In ALEP such 
rules have to be compiled out into a normal 
form which allows only for optionality over sin- 
gle daughters and no disjunctions of daughters. 
The size of the resulting rule set is such that 
it cannot be reasonably maintained. The size 
also means that it is impossible for a linguist to 
manually "correct" an overgenerating grammar 
resulting from the omission of filters and feature 
rules above. 
• In some cases, it became apparent during the 
migration process that the intended semantics 
of the (very complex) phrase structure rules was 
unclear (e.g. regarding the scope of variables in 
Kleene starred constituents). 
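The growth in rule set size noted in the second point can be seen in a small Python sketch of the compiling-out step. The rule encoding (tuples tagged "opt", "or", "star" and "plus"), the naming of auxiliary categories and the function name are all assumptions for illustration. For simplicity the sketch also multiplies out optional single daughters, which ALEP itself would allow, and it is not the Prolog compiler used in the experiment.

```python
from itertools import count


def compile_rule(lhs, rhs):
    """Flatten one rule into (lhs, daughters) pairs without operators.

    rhs is a list whose items are category names or tagged tuples:
    ("opt", item, ...), ("star", item, ...), ("plus", item, ...) wrap a
    sequence of items; ("or", [items], [items], ...) lists alternatives,
    each alternative being a list.
    """
    rules = []
    fresh = count(1)

    def expand(seq):
        alternatives = [[]]
        for item in seq:
            options = expand_item(item)
            alternatives = [a + b for a in alternatives for b in options]
        return alternatives

    def expand_item(item):
        if isinstance(item, str):
            return [[item]]
        op, *args = item
        if op == "opt":
            return [[]] + expand(args)
        if op == "or":
            return [alt for branch in args for alt in expand(branch)]
        if op in ("star", "plus"):
            # Closure is replaced by a recursive auxiliary category.
            aux = f"{lhs}_aux{next(fresh)}"
            for body in expand(args):
                if body:  # skip empty bodies to avoid a vacuous loop
                    rules.append((aux, body))
                    rules.append((aux, body + [aux]))
            return [[aux]] if op == "plus" else [[], [aux]]
        raise ValueError(f"unknown operator: {op}")

    for daughters in expand(rhs):
        rules.append((lhs, daughters))
    return rules


if __name__ == "__main__":
    for rule in compile_rule("VP", ["V", ("opt", "NP"), ("star", "PP")]):
        print(rule)
```

A rule with k independent optional parts expands into 2^k flat rules, and every closure adds an auxiliary category that has no counterpart in the source grammar.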
One conclusion is that one of the crucial ingredi- 
ents is the quality and detail of the documentation 
of grammars. With good documentation it is often 
possible to get around the effects of unclear rule semantics,
because the rule writer's intention can be
understood. The lack of such documentation is se- 
rious, since it means the migrator has to try to in- 
tuit the intended behaviour by attempting to run the 
source grammars in the source formalism. 
Similarly, so long as the intended interpretation is 
clear, it may be possible to deal with non-monotonic 
devices. This is most obvious where the non- 
monotonic effects do not persist to run-time (but 
see also our discussion of the LFG migration below). 
For example, the ALVEY grammar [Carroll 1991] has
them, but since there is an object grammar stage
in which all this is compiled out, the non-monotonic
devices can be avoided by taking the object gram- 
mar as the input to migration. The issue is then 
whether it is possible to automatically 'recompact' 
the target grammar in some linguistically useful way, 
or whether all extension and maintenance should be 
done in the source formalism. 
Note further that even if the grammars resulting 
from a migration are not linguistically useful (for ex- 
ample, because the grammar is not maintainable or 
extensible), they may serve some purpose in testing 
the capacity of the target formalism to operate (ef- 
ficiently) with very large rule sets (for example, in 
our experimentation, a rule set of some 1,500 rules 
derived by automatic migration caused ALEP to fail 
to compute the link relation). 
ETS lexical descriptions are more successfully mi- 
gratable because their semantics is clear. Simple 
parameterised macros have been used in a semi- 
automatic migration process. 
2.4 Automatic LFG importation into ALEP 
LFG is an untyped constraint-based linguistic formalism
with rich expressive devices built around a
CFG backbone. The formalism has been imple- 
mented in various systems, including XEROX PARC's 
Grammar Writer's Workbench, and the CHARON 
system developed as part of the accompanying re- 
search for EUROTRA-D carried out at the University 
of Stuttgart. Our automatic migration experiment 
started from grammars written for the latter system. 
We have written a Prolog program that translates 
automatically from an LFG notation that is very 
close to the original specification in [Bresnan 1982]
into ALEP. For reasons explained further below, the 
program cannot succeed in all cases. It is, however, 
capable of detecting those cases reliably, and gener- 
ates warnings where the fully automatic translation 
fails. 6 Examples for typical rules from the source 
grammar are shown in figure 3. 7 
The translation of the rule format illustrated in fig- 
ure 3 into a PROLOG readable form is performed by 
a subcomponent of the CHARON system. The auto- 
matic translation procedure makes use of the output 
of this precompilation step. 
The rule format supports optionality of constituents, nested optionalities and
Kleene starred rule parts, which have to be expanded in the ALEP translation.
ALEP only supports optionality of single daughters in the RHS of rules. In our
case, this part of the expansion is done by the preprocessor. The result of
compiling out Kleene starred rules and optionalities is that the object
grammar quickly reaches a size that can no longer be reasonably maintained and
the target description contains elements (in this case auxiliary categories)
which are not part of the linguistic intuition of the grammar writer.
6 The program was developed by Dieter Kohl at IMS.
7 The caret sign and the lowercase v are ASCII representations of the
metavariables ↑ and ↓, respectively.
    VP'' -> VP'
            [v
            {/ (^ VCOMP) = v
             / ^ = v /}
            ---- V].
    C1 -> C
          VP2
          ^ = v
          {/ ^ = v
             {/ (^ VTYPE) = v2
              / (^ VTYPE) = v1 /}
           / (^ FCOMP) = v
             {/ (^ VTYPE) = v:fin
              / (^ VTYPE) = inf /}
           /}.
Figure 3: Sample grammar rules from the source description
The second characteristic feature of rules like the 
ones shown in figure 3 is the massive use of complex 
disjunctions over feature structures (indicated by the 
{/ and /} pairs). Although the ALEP formalism supports
disjunctions over complex feature structures,
due to problems in the implementation available at 
the time of the experiment, they had to be multiplied 
out into a possibly large number of separate rules. 
The next example (figure 4) shows a typical lexical 
entry from the source grammar. 
    bietet: V, (^ OBJ AGR CAS) = acc
               (^ PRED) = "bieten <(^ SUBJ)(^ OBJ)>"
               (^ SUBJ AGR NUM) = sg
               (^ SUBJ AGR CAS) = nom
               (^ TENSE) = present
               (^ INF) = -
               (^ FORM) =c an   <---
               (^ VERBTYPE) = particle.
Figure 4: Sample lexicon entry from the source description
The basic part of the annotations of the LFG rules 
and lexicon, i.e. the defining equations, are mapped 
easily into ALEP. The work here is divided between 
the CHARON preprocessor which converts feature de- 
scriptions (the equations) into feature terms, and the 
output routine which maps feature terms into ALEP 
rules and lexicon entries. 
In LFG, path specifications in equations can be variables, as in the
(^ (v PCASE)) case, where the attribute under which the f-structure associated
with v is placed is determined by the value of a feature inside v. ALEP does
not support variable path expressions; therefore we have to enumerate all
possible paths in a large disjunction, which adds another factor to the
multiplicative expansion of the rule set. Similar facts hold for the
implementation of functional uncertainty, where we have to deal with regular
expressions over paths. 8
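A short Python sketch shows how enumerating a variable path expression adds a factor to the expansion. The caret and lowercase v stand for the two LFG metavariables, the PCASE values are invented, and the function names are hypothetical; the real inventory of values would have to be read off the source grammar.

```python
from itertools import product


def enumerate_variable_path(feature, values):
    """Replace (^ (v FEATURE)) = v by one disjunct per possible value.

    Each disjunct fixes the value of the feature inside the daughter and
    uses that value as the attribute name in the mother.
    """
    return [
        [f"(v {feature}) = {value}", f"(^ {value}) = v"]
        for value in values
    ]


def multiply_out(*disjunctions):
    """Combine independent disjunctions into a flat list of alternatives."""
    return [sum(choice, []) for choice in product(*disjunctions)]


if __name__ == "__main__":
    pcase = enumerate_variable_path("PCASE", ["OBL_TO", "OBL_BY", "OBL_ON"])
    vtype = [["(^ VTYPE) = v2"], ["(^ VTYPE) = v1"]]
    for alternative in multiply_out(pcase, vtype):
        print(alternative)
```

With three possible values and one further binary disjunction in the same rule, six separate rule variants result, which is the multiplicative effect described above.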
LFG permits "special" types of equation besides 
the standard defining ones. Constraining (=c type) 
equations in our source grammar typically occur in 
lexical entries as the one shown in figure 4, where 
a given form of e.g. a verb has to be distinguished, 
because it is only used in particular contexts. The 
equation is then typically a specification of a special 
subclass of a more general class of verbs (here a verb 
which can occur with a separable prefix). Where this 
is the case, in the migrated description the relevant 
distinction can be made in the type system, ensur- 
ing that non-membership in the particular subtype 
is explicitly stated for all (relevant) members of the 
supertype. 
Another, potentially very powerful expressive de- 
vice in the LFG formalism is the use of existential 
and negative existential constraints (in the CHARON 
notation expressed as !(^ INF) and ~(^ INF), respectively).
Current implementations of LFG delay
the evaluation of such constraints, because in gen- 
eral, they can only be tested at the end of the pro- 
cessing of a whole utterance. It turns out, however, 
that quite often existential and negative existential 
constraints can be disposed of, if a full type sys- 
tem is available. Careful examination of the source 
grammars reveals that the prevalent use of such con- 
straints is exactly to model what feature appropriate- 
ness conditions in a type system do: they restrict the 
application of particular rule types to feature struc- 
tures where a given set of features is either present or 
absent. To model this by using the type system in- 
stead, we introduce subtypes of the structure where 
the path has to or must not exist. 
If the source grammar only uses negative existen- 
tial constraints for atomic valued features, we could 
easily formulate a proper type system, and do away 
with '~' and '!' in a rather straightforward manner.
Typical uses of e.g. negative existential constraints 
are shown in the rule and lexical entry in figure 5. 
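The idea of trading existential and negative existential constraints for subtypes can be sketched in Python as follows. The type representation, the naming scheme for subtypes and the constraint encoding ('!' for existential, '~' for negative existential) are assumptions made for illustration and do not reproduce the ALEP type system specification language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FsType:
    """A type together with the features appropriate for it."""
    name: str
    features: frozenset


def split_on_feature(base, feature):
    """Introduce two subtypes: one where the feature exists, one where it cannot."""
    with_f = FsType(f"{base.name}_with_{feature.lower()}", base.features | {feature})
    without_f = FsType(f"{base.name}_without_{feature.lower()}", base.features - {feature})
    return with_f, without_f


def satisfies(fs_type, constraint):
    """Check a ('!', F) or ('~', F) constraint against feature appropriateness."""
    op, feature = constraint
    present = feature in fs_type.features
    return present if op == "!" else not present


if __name__ == "__main__":
    verb = FsType("verb", frozenset({"PRED", "TENSE"}))
    with_inf, without_inf = split_on_feature(verb, "INF")
    print(with_inf.name, satisfies(with_inf, ("!", "INF")))
    print(without_inf.name, satisfies(without_inf, ("~", "INF")))
```

Once rules and lexical entries mention the appropriate subtype, the check is carried out by ordinary typed unification and no delayed constraint is needed.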
LFG uses set values for collecting e.g. adjuncts 
which do not have any other distinguishing function 
on the f-structure level. ALEP does not support the 
direct expression of sets as values. Given the facts 
of German word order, generation would seem to re- 
quire sets of ADJUNCTS as values, rather than lists. 
Here we do in fact lose some expressivity if we try
to model adjuncts in ALEP using lists, because the 
canonical set operations are not available. 
    C -> V  V
         {/ (^ VTYPE) = v2
          / (^ VTYPE) = v1 /}
         ~(^ INF).
    kennen: V, (^ PRED) = "kennen<(^ SUBJ)(^ OBJ)>"
               (^ OBJ AGR CAS) = acc
               {/ (^ SUBJ AGR NUM) = pl
                  (^ SUBJ AGR CAS) = nom
                  (^ TENSE) = present
                  ~(^ INF)
                / (^ INF PERF) = -
                  (^ UNACC) = - /}.
Figure 5: Examples for negative existential constraints in the rules and the
lexicon
8 In a recent experiment, the implementors of the CHARON system added support
for functional uncertainty modelled via an interpretation of paths as
sequences and general operations on these sequences.
Finally, we have to be able to express the (non-monotonic) global completeness
and coherence constraints which help to control subcategorisation. Of these
two, the coherence condition can be easily controlled by defining types with
the appropriate number of features, one for each of the subcategorised
functions. The introduction of additional syntactic functions which are not
subcategorised for is then prevented by the type system. The completeness
condition, however, which is supposed to guarantee that all syntactic
functions in the subcategorisation frame are filled, cannot be handled that
easily. The main problem here is that, while we are able to require that a
certain feature be present in a feature structure, we cannot express
restrictions on the degree of instantiation of the value of that feature.
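The asymmetry between coherence and completeness can be shown with a minimal Python sketch. Dictionaries stand in for f-structures, None stands for an uninstantiated value, and the inventory of governable functions is an assumption; none of the names come from the source grammars.

```python
# Hypothetical inventory of subcategorisable grammatical functions.
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "COMP", "XCOMP"}


def is_coherent(fs, frame):
    """No governable function occurs that the predicate does not select."""
    return all(f in frame for f in fs if f in GOVERNABLE)


def has_all_features(fs, frame):
    """What a type can guarantee: every selected function is present."""
    return all(f in fs for f in frame)


def is_complete(fs, frame):
    """What completeness asks for: every selected function is actually filled."""
    return all(fs.get(f) is not None for f in frame)


if __name__ == "__main__":
    frame = ("SUBJ", "OBJ")
    fs = {"PRED": "kennen", "SUBJ": {"PRED": "Maria"}, "OBJ": None}
    print(is_coherent(fs, frame))       # coherence holds
    print(has_all_features(fs, frame))  # the feature OBJ is present
    print(is_complete(fs, frame))       # but its value is not instantiated
```

The second and third checks come apart exactly when a feature is present but its value is still uninstantiated, which is the case a type system alone cannot rule out.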
There is, of course, another option: If we model 
subcategorisation more explicitly, introducing 'sub- 
cat lists' as data structures in much the same way 
as HPSG does, we can add the requirement that PS 
rules consume elements of the subcat list. Besides 
the question whether such a modelling is still com- 
patible with the spirit of LFG theory as it stands, 
the proposed solution does not solve the problem 
for a German LFG grammar: in order to model 
the variability of German word order, we have to 
be able to pick arbitrary elements from the subcat 
list, rather than relying on a fixed order in which ele- 
ments are picked. Since list operations (or functional 
constraints in general) are not available in ALEP, this 
can currently not be modelled perspicuously.
In summary, then, the philosophy of the grammar 
can be maintained, and a type system can be pro- 
vided. To a certain extent, it can express LFG's 
non-monotonic devices such as existential, negative 
existential and constraining equations and the global 
well-formedness constraints of completeness and coherence.
The target grammar is less compact, because
generalisations are lost through the multiplicatory
effect of spelling out optionalities, Kleene
stars and variables over attribute names.
2.5 Technical description of the automatic 
conversion procedure 
The automatic conversion has to accomplish three 
basic tasks: 
• A conversion of the grammar rules into ALEP 
format 
• A conversion of lexical entries into the ALEP 
lexicon format 
• The extraction of a certain amount of type in- 
formation from the LFG grammar to be used in 
the ALEP descriptions. 9 
We will not go into details of the CHARON pre- 
compilation, since the techniques employed are stan- 
dard (expansion of optionality and Kleene star con- 
stituents, as well as compilation of feature descrip- 
tions into feature terms). As regards the extraction 
of type information from the untyped LFG descrip- 
tion, more explanation is needed, however. 
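For readers unfamiliar with the standard expansion step, the following
Python sketch shows one common way of spelling out optional and Kleene
star constituents in a context-free rule. It is an illustration under
our own assumptions, not the CHARON implementation: the "X?" and "X*"
suffix notation, the "_star" naming scheme and the function name
expand_rule are all hypothetical.

```python
from itertools import product
from typing import List, Tuple

Rule = Tuple[str, List[str]]


def expand_rule(lhs: str, rhs: List[str]) -> List[Rule]:
    """Expand 'X?' (optional) and 'X*' (Kleene star) symbols in one rule.

    'X?' is spelt out as present/absent, doubling the number of rules.
    'X*' is replaced by a fresh category X_star, defined by the two
    auxiliary rules X_star -> [] and X_star -> X X_star.
    """
    choices: List[List[List[str]]] = []
    extra: List[Rule] = []
    for sym in rhs:
        if sym.endswith("?"):
            choices.append([[sym[:-1]], []])
        elif sym.endswith("*"):
            base = sym[:-1]
            star = base + "_star"
            choices.append([[star]])
            for rule in [(star, []), (star, [base, star])]:
                if rule not in extra:
                    extra.append(rule)
        else:
            choices.append([[sym]])
    rules = [
        (lhs, [s for part in combo for s in part])
        for combo in product(*choices)
    ]
    return rules + extra
```

A rule with n optional constituents yields 2^n expanded rules, which
is the multiplicatory effect on grammar size noted in the summary above.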
In the current incarnation of the conversion rou- 
tine, the following strategies are used: 
• each attribute is assigned (at least) one type 
name 
• atomic-valued features and PREDS are used dur- 
ing compilation to compute value ranges for 
their corresponding types 
• features with complex values have their possi- 
ble values (and the attributes therein) collected 
during compilation, and the compiler then de- 
termines the corresponding types at the end of 
the compilation. 
• the output routines take care of the fact that 
types that represent atomic values or terms are 
spelt out correctly (i.e. that they do not show 
up as type definitions, but are inserted directly) 
• if we encounter more than one type name for 
the value of a given attribute, further processing 
is necessary, because reentrancies are involved 
or we have an interaction with the c-structure
skeleton which has to be handled separately. 
In all cases where the compilation cannot produce
satisfactory results, the intermediate structures
are printed out instead, together with a comment
saying which steps failed and indicating where further
hand-tuning is required.
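The first three strategies can be sketched as follows. This Python
fragment is a simplified illustration, not the conversion routine
itself: it assumes feature structures are given as nested dictionaries,
and the names collect_types, needs_hand_tuning and type_name (with its
"_FS_t" suffix, loosely modelled on names such as agr_FS_t in the
compiled output) are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set, Tuple


def collect_types(
    structures: Iterable[Dict[str, Any]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Collect value ranges and embedded attributes per attribute.

    Returns (atomic, complex_): `atomic` maps an attribute to the set of
    atomic values seen for it; `complex_` maps an attribute to the set of
    attributes seen inside its complex values.
    """
    atomic: Dict[str, Set[str]] = defaultdict(set)
    complex_: Dict[str, Set[str]] = defaultdict(set)

    def visit(fs: Dict[str, Any]) -> None:
        for attr, value in fs.items():
            if isinstance(value, dict):
                complex_[attr].update(value.keys())
                visit(value)
            else:
                atomic[attr].add(value)

    for fs in structures:
        visit(fs)
    return dict(atomic), dict(complex_)


def needs_hand_tuning(atomic: Dict[str, Set[str]],
                      complex_: Dict[str, Set[str]]) -> Set[str]:
    """Attributes seen with both atomic and complex values."""
    return set(atomic) & set(complex_)


def type_name(attr: str) -> str:
    """Hypothetical naming scheme: one type name per attribute."""
    return f"{attr.lower()}_FS_t"
```

Attributes reported by needs_hand_tuning correspond to the situation
in which automatic processing is not sufficient and intermediate
structures would have to be inspected by hand.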
In particular, 
• sets are encoded as open ended lists, thus not 
solving the free order problem mentioned above 
• the uniqueness condition is marked through the 
use of a term for the value of PRED 
• for compilation steps which modify the original
structure of the grammar (e.g. turning inequations
in finite domains into disjunctions, mapping
constraining equations onto defining ones,
if the automatic inference of the proper subtypes
is not yet possible, etc.) a warning is issued in
the resulting ALEP code in the form of a comment
• headness information is selected according to
the following heuristics:
  - derivations to ε have no head information
    associated (naturally)
  - unary-branching nodes have a trivial head
  - for non-unary-branching rules
    * those categories that can rewrite to ε are
      eliminated from the list of head candidates
      (if all daughter nodes are eliminated
      this way, the first daughter is selected
      as the head, and a comment appears
      with the rule)
    * if pure preterminal nodes are among the
      remaining ones, the first one is selected
      as the head
    * otherwise, all left-recursive nodes are
      eliminated (with a similar strategy for
      taking the remaining leftmost node, if
      all nodes would be eliminated)
    * among the remaining nodes, again the
      leftmost node is selected as the head
    * if everything is left-recursive, the leftmost
      node is selected, and a comment is
      generated accordingly in the output.
9 We also have to provide the ALEP runtime system
with information about headness in grammar rules,
which is crucial for the proper operation of at least one
of the parser modules provided with the system.
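The head-selection heuristics listed above can be sketched in Python
as follows. This is one possible reading of the heuristics, not the
code of the conversion routine: the function name select_head and its
parameters are hypothetical, and the sets of nullable, preterminal and
left-recursive categories are assumed to have been computed beforehand.

```python
from typing import List, Optional, Set, Tuple


def select_head(
    daughters: List[str],
    nullable: Set[str],
    preterminal: Set[str],
    left_recursive: Set[str],
) -> Tuple[Optional[int], Optional[str]]:
    """Return (1-based head position or None, optional comment)."""
    # Derivations to the empty string carry no head information.
    if not daughters:
        return None, None
    # Unary-branching rules have a trivial head.
    if len(daughters) == 1:
        return 1, None
    # Eliminate daughters that can rewrite to the empty string.
    candidates = [i for i, cat in enumerate(daughters) if cat not in nullable]
    if not candidates:
        return 1, "all daughters can rewrite to empty; first daughter chosen"
    # Prefer the first pure preterminal among the remaining candidates.
    for i in candidates:
        if daughters[i] in preterminal:
            return i + 1, None
    # Otherwise eliminate left-recursive candidates and take the leftmost.
    remaining = [i for i in candidates if daughters[i] not in left_recursive]
    if not remaining:
        return candidates[0] + 1, "all candidates left-recursive; leftmost chosen"
    return remaining[0] + 1, None
```

For a rule with daughters vp and v, where v is a preterminal, the
sketch selects the second daughter, in line with the "head 2"
annotation of the compiled rule in figure 6.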
Compiling out the rule given in figure 3 yields
(among others) the ALEP structure in figure 6; the
result of the compilation of the lexical entry from
figure 5 is shown in figure 7 (again, only one of the
disjuncts is shown).
vp2_vp_v =
ld: { spec => get_Specifier_t: { },
      syn => vp2_Syntax_t: { },
      fs => @V_FS vp_Cat_t:
            { vcomp => Vp_I_FS},
      pho => phones: { string => Vp_Str,
                       rest => Rest } } ->
[ld: { syn => vp_Syntax_t: { },
       fs => Vp_I_FS,
       pho => phones: { string => Vp_Str,
                        rest => V_Str } },
 ld: { syn => v_Syntax_t: { },
       fs => V_FS,
       pho => phones: { string => V_Str,
                        rest => Rest } }]
head 2.
Figure 6: Compiled rule from figure 1 
3 Conclusion 
Our experiments have demonstrated that migrations 
of various sorts can be performed with a reasonable 
degree of success. 
kennen ~
ld: {spec => get_Specifier_t: {},
     pho => phones:
            {string => [kennen | R],
             rest => R},
     syn => v_Syntax_t: { },
     subcat => [ld:
                {syn => dp_Syntax_t: {},
                 fs => Subj},
                ld:
                { syn => dp_Syntax_t: {},
                  fs => Obj}],
     fs => cpl_Cat_t:
           { pred => pred_FS_t:
                     {semname => kennen,
                      semargs => subj_obj},
             subj => @Subj dp_Cat_t:
                     {pred => _},
             obj => @Obj dp_Cat_t:
                    {pred => _,
                     agr => agr_FS_t:
                            { cas => acc}},
             inf => inf_FS_t_kv: {perf => -},
             unacc => -}}.
Figure 7: Compiled lexical entry from figure 5
As regards the general questions about migration 
posed at the beginning, we can formulate some (par- 
tial) answers. 
• Successful migration obviously involves more 
than just I/O equivalence of source and target 
descriptions. One also looks for similar degrees 
of 'descriptive adequacy' (i.e. compactness, per- 
spicuity, maintainability etc.). Clearly reusabil- 
ity implies usability. However, this is not an ab- 
solute property, and a small loss of such proper- 
ties can be acceptable. It is clear, however, that 
the loss of maintainability that we have experi- 
enced in some of the migration activities above 
is unacceptable. 
• How conceptually close must source and target 
be for migration to be successful? We have seen 
that in principle it is possible to migrate resources
across certain formal/ideological divides:
for example, from HPSG, which has no rules
but uses types extensively, to ALEP, which has a
weaker type system and is CF-PSG rule based;
and from LFG (which does not use typed feature
structures) to ALEP. The migration of HPSG
specifications into the rule based ALEP entails 
a considerable degree of fragmentation and par- 
ticularisation of the linguistic information en- 
coded in the original specification. To a certain 
extent this can be recaptured if the target for- 
malism provides an integrated template facility 
which is not restricted to simple lexical expan- 
sion. We have also suggested that good docu- 
mentation can alleviate the effects of distance 
between formalisms. 
• With respect to the migration of descriptions us- 
ing richer expressive devices, it is clear that it is 
sometimes possible to dispense with the richer 
devices, and that some descriptions couched in 
richer formalisms do not use them in any crucial
way. The HPSG conversion experiment, how- 
ever, has clearly shown that for set valued fea- 
tures, and operations on sets, a naive encoding 
is simply unacceptable. 
• We have seen that the effect of non-monotonic 
devices in a source formalism can be serious,
especially when it is combined with unclear rule
semantics (cf. the ETS conversion experiment).
However, the existence of an 'object' formalism 
where the non-monotonic devices are compiled 
out (like in the case of the ALVEY grammars) is 
an asset, and again, good documentation helps. 
Particularly in the case of the LFG conversion 
experiment it became clear that often there is a 
crucial difference between the availability of certain
non-monotonic devices and their actual use.
E.g. it was found that existential constraints are 
often used to express subtype information. If 
the type system is rich enough, this information 
can be modelled in the type system specification 
in the target formalism. 
• As expected, we have found macros and pre- 
processors a useful tool, especially in the semi- 
automatic migration of lexical resources. In 
order to approximate a principles based style 
of linguistic description like in HPSG the tar- 
get formalism should be extended with an in- 
tegrated template facility which determines sat- 
isfiability of templates (principles) in terms of 
unification. 
