EFFICIENT PARSING FOR FRENCH* 
Claire Gardent 
University Blaise Pascal - Clermont II and University of Edinburgh, Centre for Cognitive Science, 
2 Buccleuch Place, Edinburgh EH89LW, SCOTLAND, LrK 
Gabriel G. B~s, Pierre-Franqois Jude and Karine Baschung, 
Universit~ Blaise Pascal - Clermont II, Formation Doctorale Linguistique et Informatique, 
34, Ave, Carnot, 63037 Clermont-Ferrand Cedex, FRANCE 
ABSTRACT 
Parsing with categorial grammars often leads to 
problems such as proliferating lexical ambiguity, spu- 
rious parses and overgeneration. This paper presents a 
parser for French developed on an unification based 
categorial grammar (FG) which avoids these pro- 
blem s. This parser is a bottom-up c hart parser augmen- 
ted with a heuristic eliminating spurious parses. The 
unicity and completeness of parsing are proved. 
INTRODUCTION 
Our aim is twofold. First to provide a linguistical- 
ly well motivated categorial grammar for French 
(henceforth, FG) which accounts for word order varia- 
tions without overgenerating and without unnecessary 
lexical ambiguities. Second, to enhance parsing effi- 
ciency by eliminating spurious parses, i.e. parses with 
• different derivation trees but equivalent semantics. 
The two goals are related in that the parsing strategy 
relies on properties of the grammar which are indepen- 
dently motivated by the linguistic data. Nevertheless, 
the knowledge embodied in the grammar is kept inde- 
pendent from the processing phase. 
1. LINGUISTIC THEORIES AND 
WORD ORDER 
Word order remains a pervasive issue for most 
linguistic analyses. Among the theories most closely 
related to FG, Unification Categorial Grammar 
(UCG : Zeevat et al. 1987), Combinatory Categorial 
Grammar (CCG : Steedman 1985, Steedman 1988), 
Categorial Unification Grammar (CUG : Karttunen 
1986) and Head-driven Phrase Structure Grammar 
(I-IPSG: Pollard & Sag 1988) all present inconvenien- 
ces in their way of dealing with word order as regards 
parsing efficiency and/or linguistic data. 
* The workreported here was carried outin the ESPR/T Project 393 
ACORD, ,,The Construction and Interrogation of Knowledge 
Bases using Natural Language Text and Graphics~. 
280 
In UCG and in CCG, the verb typically encodes the 
notion of a canonical ordering of the verb arguments. 
Word order variations are then handled by resorting to 
lexical ambiguity and jump rules ~ (UCG) or to new 
combinators (CCG). As a result, the number of lexical 
and/or phrasal edges increases rapidly thus affecting 
parsing efficiency. Moreover, empirical evidence does 
not support the notion of a canonical order for French 
(cf. B~s & Gardent 1989). 
In contrast, CUG, GPSG (Gazdar et al. 1985) and 
HPSG do not assume any canonical order and subcate- 
gorisation information is dissociated from surface 
word order. Constraints on word order are enforced by 
features and graph unification (CUG) or by Linear Pre- 
cedence (LP) statements (HPSG, GPSG). The pro- 
blems with CUG are that on the computational side, 
graph-unification is costly and less efficient in a Prolog 
environment than term unification while from the 
linguistic point of view (a) NP's must be assumed 
unambiguous with respect to case which is not true for 
- at least - French and (b) clitic doubling cannot be ac- 
counted for as a result of using graph unification 
between the argument feature structure and the functor 
syntax value-set. In HPSG and GPSG (cf. also Uszko- 
reit 1987), the problem is that somehow, LP statements 
must be made to interact with the corresponding rule 
schemas. That is, either rule schemas and LP state- 
ments are precompiled before parsing and the number 
of rules increases rapidly or LP statements are checked 
on the fly during parsing thus slowing down proces- 
sing. 
2. THE GRAMMAR 
The formal characteristics of FG underlying the 
parsing heuristic are presented in §4. The characteris- 
tics of FG necessary to understand the grammar are re- 
sumed here (see (B~s & Gardent 89) for a more detailed 
presentation). 
t Ajumpmle of the form X/Y, YfZ ---~ X/Z where X/Yis atype raised 
NP and Y/Z is a verb. 
FG accounts for French linearity phenomena, em- 
bedded sentences and unbounded dependencies. It is 
derived from UCG and conserves most of the basic 
characteristics of the model : monostratality, lexica- 
lism, unification-based formalism and binary combi- 
natory rules restricted to adjacent signs. Furthermore, 
FG, as UCG, analyses NP's as type-raised categories. 
FG departs from UCG in that (i) linguistic entities 
such as verbs and nouns, sub-categorize for a set 
- rather than a list-of valencies ; (ii) a feature system 
is introduced which embodies the interaction of the 
different elements conditioning word order ; (iii) FG 
semantics, though derived directly from InL ~, leave the 
scope of seeping operators undefined. 
The FG sign presents four types of information re- 
levant to the discussion of this paper : (a) Category, Co) 
Valency set ; (c) Features ; (d) Semantics. Only two 
combinatory rules-forward and backward concatena- 
tion - are used, together with a deletion rule. 
A Category can be basic or complex. A basic ca- 
tegory is of the form Head, where Head is an atomic 
symbol (n(oun), np or s(entence)). Complex categories 
are of the form C/Sign, where C is either atomic or 
complex, and Sign is a sign called the active sign. 
With regard to the Category information, the FG 
typology of signs is reduced to the following. 
(1)Type Category Linguistic entities 
f0 Head verb, noun 
fl Head/f0 NP, PP, adjective, adverb, 
auxiliary, negative panicles 
f2 (fl)/signi 
(a) sign i = f0 
Co) sign i = fl 
Determiner, complementi- 
zer, relative pronoun 
Preposition 
Thus, the result of the concatenation of a NP (fl) 
with a verb (f0) is a verbal sign (f0). Wrt the concate- 
nation rules, f0 signs are arguments; fl signs are either 
functors of f0 signs, or arguments of f2 signs. Signs of 
type 1"2 are leaves and fanctors. 
Valencies in the Valency Set are signs which ex- 
press sub-categorisation. The semantics ofa fO sign is 
a predicate with an argumental list. Variables shared by 
the semantics of each valency and by the predicate list, 
relate the semantics of the valency with the semantics 
of the predicate. Nouns and verbs sub-categorize not 
only for "normal" valencies such as nom(inative), 
dat(ive), etc, but also for a mod(ifier) valency, which is 
consumed and recursively reintroduced by modifiers 
(adjectives, laP's and adverbs). Thus, in FG the com- 
: In/. (Indexed language) is the semantics incorporated to UCG ; it 
derives from Kamp's DRT. From hereafter werefer to FG semantics 
as InL'. 
281 
plete combinatorial potential of a predicate is incorpo- 
rated into its valency set and a unified treatment of 
nominal and verbal modifiers is proposed. The active 
sign of a fl functor indicates the valency - ff any - 
which the functor consumes. 
No order value (or directional slash) is associated 
with valencies. Instead, Features express adjacent and 
non-adjacent constraints on constituent ordering, 
which are enforced by the unification-based combina- 
tory rules. Constraints can be stated not only between 
the active sign of a functor and its argument, but also 
between a valency, of a sign., the sign. and the active J J 
. J sign of the fl functor consuming valency~ while con- 
catenating with sign~ As a result, the valency of a verb 
or era noun imposes constraints not only on the functor 
which consumes it, but also on subsequent concatena- 
tions. The feature percolation system underlies the 
partial associativity property of the grammar (cf. §4). 
As mentioned above, the Semanticspart of the sign 
contains an InL' formula. In FG different derivations of 
a string may yield sentence signs whose InL' formulae 
are formally different, in that the order of their sub-for- 
mulae are different, but the set of their sub-formulae 
are equal. Furthermore, sub-formulae are so built that 
formulae differing in the ordering of their sub-formu- 
lae can in principle be translated to a semantically equi- 
valent representation in a first order predicate logic. 
This is because : (i) in InL', the scope of seeping 
operators is left undefined ; (ii) shared variables ex- 
press the relation between determiner and restrictor, 
and between seeping operators and their semantic 
arguments ; (iii) the grammar places constants (i.e. 
proper names) in the specified place of the argumental 
list of the predicate. For instance, FG associates to (2) 
the InL' formulae in (3a) and (3b) : 
(2) Un garcon pr~sente Marie ~ une fille 
(3) (a) \[15\] \[indCX) & garcon(X) & ind(Y) & fiRe(Y) & 
presenter (E,X,marie,Y)\] 
Co) \[E\] \[indCO & fille(Y) & ind(X) & gar~on(X) & 
presenter (E,X,marie,Y)\] 
While a seeping operator of a sentence constituent 
is related to its argument by the index of a noun (as in 
the above (3)), the relation between the argument of a 
seeping operator and the verbal unit is expressed by the 
index of the verb. For instance, the negative version of 
(2) will incorporate the sub-formula neg (E). 
In InL' formulae, determiners (which are leaves 
and f2 signs, el. above), immediately precede their res- 
trictors. In formally different InL' formulae, only the 
ordering of seeping operators sub-formulae can differ, 
but this can be shown to be irrelevant with regard to the 
semantics. In French, scope ambiguity is the same for 
members of each of the following pairs, while the 
ordering of their corresponding semantic sub-formu- 
lae, thanks to concatenation of adjacent signs, is ines- 
capably different. 
(4) (a) Jacques avait donn6 un livre (a) ~ tousles dtu- 
diants ( b ). 
(a) Jacques avait donn6 d tousles dtudiants(b) un 
livre (a). 
(b) Un livre a 6t~ command6 par chaque ~tudiant 
(a) dune librairie (b). 
Co') Un livre a6t6 command6d une librairie (b)par 
chaque dtudiant (a). 
At the grammatical level (i.e. leaving aside prag- 
matic considerations),the translation of an InL' formu- 
la to a scoped logical formula can be determined by the 
specific scoping operator involved (indicated in the 
sub-formula) and by its relation to its semantic argu- 
ment (indicated by shared variables). This translation 
must introduce the adequate quantifiers, determine 
their scope and interpret the'&' separator as either ^ or 
-->, as well as introduce .1. in negative forms. For ins- 
tahoe, the InL' formulae in (Y) translate ~ to : 
(5) 3E, 3X, 3Y (garqon(X)^ fille(Y) ^ pr6senter 
(E,X~narie,Y)). 
We assume here the possibility of this translation 
without saying any more on it. Since this translation 
procedure cannot be defined on the basis of the order of 
the sub-formulae corresponding to the scoping opera- 
tors, InL' formulae which differ only wrt the order of 
their sub-formulae are said to be semantically equiva- 
lent. 
3. THE PARSER 
Because the subcategorisation information is re- 
presented as a set rather than as a list, there is no 
constraint on the order in which each valency is 
consumed. This raises a problem with respect to par- 
sing which is that for any triplet X,Y,Z where Y is a 
verb and X and Z are arguments to this verb, there will 
often be two possible derivations i.e., (XY)Z and 
xo'z). 
The problem of spurious parses is a well-known 
one in extensions of pure categorial grammar. It deri- 
ves either from using other rules or combinators for de- 
rivation than just functional application (Pareschi and 
Steedman 1987, Wittenburg 1987, Moortgat 1987, 
Morrill 1988) or from having anordered set valencies 
(Karttunen 1986), the latter case being that of FG. 
Various solutions have been proposed in relation to 
this problem. Karttunen's solution is to check that for 
any potential edge, no equivalent analysis is already 
In (5) 3E can be paraphrased as "There exists an event". 
282 
stored in the chart for the same string of words. Howe- 
ver as explained above, two semantically equivalent 
formulae of InL' need not be syntactically identical. 
Reducing two formulae to a normal form to check their 
equivalence or alternatively reducing one to the other 
might require 2* permutations with n the number of 
predicates occaring in the formulae. Given that the test 
must occur each time that two edges stretch over the 
same region and given that itrequires exponential time, 
this solution was disguarded as computationaUy inef- 
ficient. 
Pareschi's lazy parsing algorithm (Pareschi, 1987) 
has been shown (I-Iepple, 1987) to be incomplete. 
Wittenburg's predictive combinators avoid the parsing 
problem by advocating grammar compilation which is 
not our concern here. Morilrs proposal of defining 
equivalence classes on derivations cannot be transpo- 
sed to FG since the equivalence class that would be of 
relevance to our problem i.e., ((X,Z)Y, X(ZY)) is not 
an equivalence class due to our analysis of modifiers. 
Finally, Moortgat's solution is not possible since it 
relies on the fact that the grammar is structurally com- 
plete ~ which FG is not. 
The solution we offer is to augment a shift-reduce 
parser with a heuristic whose essential content is that 
no same functor may consume twice the same valency. 
This ensures that for all semantically unambiguous 
sentences, only one parse is output. To ensure that a 
parse is always output whenever there is one, that is to 
ensure that the parser is complete, the heuristic only 
applies to a restricted set of edge pairs and the chart is 
organized as aqueue. Coupled with the parlial-associa- 
tivity of FG, this strategy guarantees that the parser is 
complete (of. §4). 
3.1 THE HEURISTIC 
The heuristic constrains the combination of edges 
in the following way 2. 
Let el be an edge stretching from $1 to E1 labelled 
with the typefl~, a predicate identifier pl and a sign 
Sign1, let e2 be an edge stretching from E1 to $2 
labelled with type fl and a sign Sign,?, then e2 will 
reduce with el by consuming the valency Val of pl if 
e2 has not already reduced with an edge el'by consu- 
ming the valency Valofpl where el 'stretches from $1" 
to E1 and $1' ~ $1. 
In the rest of this section, examples illustrate how 
A structurally complete grammar is one such that : 
If a sequence of categories X I.. Xn reduces to Y, there is a red u~on 
to Y for any bracketing of Xl .. Ym into constituents (Moortgat, 
19S7). 
2 A mote complete difinition is given in the description of the 
parsing algorithm below. 
this heuristic eliminates spurious parses, while allo- 
wing for real ambiguities. 
Avoiding spurious parses 
Consider the derivation in (6) 
(6) Jean aime Marie 
0-Edl - I - Ed2-2 - F.A3- 3 
0 ...... Ed4 ...... 2 Ed4 ffi Edl(Ed2,pl,subj) 
0 ...... Ed5 ...... 2 *Ed5 = Edl(Ed2,pl, obj) 
I ...... Ed6 ....... 3 Ed6 = Ed3(Ed2,pLobj) 
l ....... Ed7 ....... 3 Ed7 ffi EcL3(Ed2,pLsubj) 
0 ...... Ed8 .................. 3 Ed8 = Edl(Ed6,pl, subj) 
0 .... Ed9 .................. 3 *Ed9 = FA3(Ed4,pl,obj) 
0 ...... Edl0 .................. 3 *Edl0= Edl(EdT,pl,obj) 
where Ed4 = Edl(Ed2,pl,subj) indicates that the edge 
Ed 1 reduces with Ed2 by consuming the subject valen- 
cy of the edge Ed2 with predicate pl. 
Ed5 and EdlO are ruled out by the grammar since 
in French no lexical (as opposed to clirics and wh-NP) 
object NP may appear to the left of the verb. Ed9 is 
ruled out by the heuristic since Ed3 has already consu- 
med the object valency of the predicate pl thus yiel- 
ding Ed6. Note also that Edl may consume twice the 
subject valency ofpl thus yielding Ed4 and Ed8 since 
the heuristic does not apply to pairs of edges labelled 
with signs Of type fl and f0 respectively. 
Producing as many parses as there are readings 
The proviso that a functor edge cannot combine 
with two different edges by consuming twice the same 
valency on the same predicate ensures that PP attach- 
ment ambiguities are preserved. Consider (7) for ins- 
tance 1. 
(7) Regarde le chien darts la rue 
0 --Edl --- 1 ---Ed2 - 2 - Ed3 .... 3 --- Ed4 ....... 4 
I ..... Ed5 ........... 3 
0 ................... Ed6 ........... 3 
2 .............. Ed7 .......... 4 
1 ...... Ed8 .............................. 4 
0 ................... Ed9 .............................. 4 
0 ................... Edl0 ............................. 4 
with Ed7 = Ed4(Ed3,p2,mod) 
Ed8 = Ed2(Ed7) 
Ed9 = Ed8(Edl,pl,obj) 
EdlO = Ed4(Ed6,p l,mod) 
where pl and p2 are the predicate identifiers labelling 
the edges Edl and Ed3 respectively. 
The above heuristic allows a functor to concatenate 
twice by consuming two different valencies. This case 
t For the sake of clarity, all irelevant edges have been omitted. This 
practice will hold throughout the sequel. 
283 
of real ambiguity is illustrated in (8). 
(8) Quel homme pr6sente Marie ~t Rose ? 
0 .... Edl .... 1 ---Ed2--2--Ed3---3-- Ed4--- 4 
1 .......... Ed4 ........ 3 
1 .......... Ed5 ........ 3 
0 ................ Ed6 ................... 3 
0 ................ Ed7 ................... 3 
where Ed4 = (Ed3,pl,nom) 
and Ed5 = (Ed3,pl,obj) 
Thus, only edges of the same length correspond to 
two different readings. This is the reason why the 
heuristic allows a functor to consume twice the same 
valency on the same predicate iff it combines with two 
edges E andE' thatstretch over the same region. A case 
in point is illustrated in (9) 
(9) Quel homme pr6sente Marie ~ Rose ? 
0 .... Edl .... 1 ---Ed2--2--Ed3---3-- Ed4--- 4 
1 .......... Ed5 ........ 3 
1 .......... Ed6 ........ 3 
1 ......... Ed7 ...................... 4 
1 ......... Ed8 ...................... 4 
0 .... Ed9 ............................................. 4 
0 .... Edl0 ........................................... 4 
where a Rose concatenates twice by consuming twice 
the same - dative - valency of the same predicate. 
3.2 THE PARSING ALGORITHM 
The parser is a shift-reduce parser integrating a 
chart and augmented with the heuristic. 
An edge in the chart contains the following infor- 
marion : 
edge \[Name, Type, Heur, S,E, Sign\] 
where Name is the name of the edge, S and E identifies 
the startingand the ending vertex and Sign is the sign 
labelling the edge. Type and Heur contain the info'r- 
marion used by the heuristic. Type is either f0, fl and 
t2 while the content of Heur depends on the type of the 
edge and on whether or not the edge has already 
combined with some other edge(s). 
Heur 
f0 pX where X is an integer. 
pX identifies the predicate associated with any 
edge. 
type fO 
fl before combination : Vat 
where Var is the anonymous variable. This indica- 
tes that there is as yet no information available that 
could violate the heuristic. 
after combination : Heur-List 
where Heur-List is a list of triplets of the form 
\[Edge,pX.Val\] and Edge indicates an argument 
edge with which the functor edge has combined by 
consuming valency Val of the predicate pX label- 
ling Edge. 
f2 nil 
The basic parsing algorithm is that of a normal 
shift-reduce parser integrating a chart rather than a 
stack i.e., 
1. Starting from the beginning of the sentence, for 
each word W either shift or reduce, 
2. Stop when there is no more word to shift and no 
more reduce to perfomi, 
3. Accept or reject. 
Shifting a word W consists in adding to the chart as 
many lexical edges as there are lexical entries associa- 
ted with W in the lexicon. Reducing an edge E consists 
in trying to reduce E with any adjacent edge E' already 
stored in the chart. The operation applies recursively in 
that whenever a new edge E" is created it is immedia- 
tely added to the chart and tried for reduction. The 
order in which edges tried for reduction are retrieved 
from the chart corresponds to organising the chart as a 
queue i.e., f'n'st-in- ftrst-out. Step 3 consists in checking 
the chart for an edge stretching from the beginning to 
the end of the chart and labelled with a sign of category 
s(entence). If there is such an edge, the string is 
accepted - else it is rejected. 
The heuristic is integrated in the reduce procedure 
which can be defined as follows. 
Two edges Edge 1 and Edge2 will reduce to a new 
edge Edge3 iff - 
Either (a) 
1. Edgel = \[el,Typel,H1,E2,Signl\] and 
2. Edge2 = \[e2,Type2,H2,E2,Sign2\] and 
<Typel,Type2> # <f0,fl> and 
3. apply(Sign 1,Sign2,Sign3) and 
4. Edge3 = \[e3,Type3,H3,E3,Sign3\] and 
<$3,E3> = <S I,E2> 
or (b) 
1. Edgel = \[el,f0,pl,S1,E1,Signl\] and 
2. Edge2 = \[e2,fl,I-I2,S2,E2,Sign2\] and 
E1 = $2 and 
3. bapply(Signl,Sign2,Sign3) by consuming the 
valency Val and 
4. H2 does not contain a triplet of the form 
\[el',pl,Val\] where Edge 1' = \[el',f0,pl,S'l,S2\] 
and S'I"-S1 
5. Edge3 = \[e3,f0,pl,S1,E2,Sign3\] 
6. The heuristic information H2 in Edge2 is upda- 
ted to \[e 1,p 1,Val\]+I-I2 
where '+ 'indicates list concatenation and under the 
proviso that the triplet does not already belong to H2. 
Where apply(Sign1 ,Sign2,Sign3) means that Sign 1 
can combine with Sign2 to yield Sign3 by one of the 
two combinatory rules of FG and bapply indicates the 
backward combinatory rule. 
284 
This algorithm is best illustrated by a short exam- 
ple. Consider for instance, the parsing of the sentence 
Pierre aime Marie. Stepl shifts Pierre thus adding 
Edgel to the chart. Because the grammar is designed 
to avoid spurious lexical ambiguity, only one edge is 
created. 
Edgel = \[el,fl,_,0,1,Signl\] 
Since there is no adjacent edge with which Edgel 
could be reduced, the next word is shifted i.e., aime 
thus yielding Edge2 that is also added to the chart. 
Edge2 = \[e2,f0,p 1,1,2,S ign2\] 
Edge2 can reduce with Edgel since Signl can 
combine with Sign2 to yield Sign3 by consuming the 
subject valency of the predicate pl. The resulting edge 
Edge3 is added to the chart while the heuristic infor- 
mation of the functor edge Edgel is updated : 
Edge3 = \[e3,f0,p 1,0,2,Sign3 \] 
Edgel = \[el,fl ,\[\[e3,pl,subj\]\],0,1 ,Sign 1 \] 
No more reduction can occur so that the last word 
Marie is shifted thus adding Edge4 to the chart. 
Edge,4 = \[e4,fl,_,2,3,Sig4\] 
Edg4 first reduces with Edeg2 by consuming the sub- 
ject valency ofpl thus creating Edge5. It also reduces 
with Edge2 by consuming the object valency ofpl to 
yield Edge6. 
Edge5 = \[e5,f0,pl,l,3,Sign5\] 
Edge6 - \[e6,f0,p 1,1,3,S ign6\] 
Edge4 is updated as follows. 
Edge4 = \[e4,fl,\[\[e2,pl,subj\],\[e2,pl,obj\]\],2,3,Sign4\] 
At this stage, the chart contains the following edges. 
Pierre aime Marie 
0--el ~ 1 ~e2--2~e4--3 
0 e3 ~ 3 
1 e5~3 
1 e6~3 
Now Edge1 can reduce with Edge6 by consuming 
the subject valency of pl thus yielding Edge7. Howe- 
ver, the heuristic forbids Edge4 to consume the object 
valency of pl on Edge3 since Edge4 has already 
consumed the object valency of pl when combining 
with Edge2. In this way, the spurious parse Edge8 is 
avoided. 
The final chart is as follows. 
Pierre aime Marie 
0--elw l--e2 ..... 2we4-- 3 
0 e3 3 
1 e5 -- 3 
1 e6~ 3 
0 e7 3 
*0------ e8 3 
with 
Edge7 = \[e7,f0,pl,0,3,Sign7\] 
Edge4 = \[e4 ,fl, \[ \[e2 ,p 1 ,s ubj\], \[e2 ,p 1, obj\] \] ,2,3 ,S ign4\]" 
Edge 1 = \[e 1, fl,\[ \[e2,p I ,sub j\] \] ,0,1 ,Sign 1 \] 
4. UNICITY AND COMPLETNESS 
OF THE PARSING 
DEFINITIONS 
1. An indexed lexical f0 is a pair <X,i> where X is a 
lexical sign of f0 type (c.f. 2) and i is an integer. 
2. PARSE denotes the free algebra recursively defined 
by the following conditions. 
2.1 Every lexical sign of type fl or f2, and every 
indexed lexical f0 is a member of PARSE. 
2.2 If P and Q are elements of PARSE, i is an integer, 
and k is a name of a valency then (P+aQ) is a 
member of PARSE. 
2.3 If P and Q are elements of PARSE, (P+imQ) is a 
member of PARSE, where I~ is a new symbol} 
3. For each member, P, of PARSE, the string of the 
leaves of P is defined recursively as usual : 
3.1 If P is a lexical functor or a lexical indexed argu- 
ment, L(P) is the string reduced to P. 
3.2 L(P+~tQ) is the string obtained by concatenation of 
L(P) and L(Q). 
4. A member P of PARSE, is called a well indexed 
parse (WP) if two indexed leaves which have different 
ranges in L(P), have different indicies. 
5. The partial function, SO:'), from the set of WP to the 
set of signs, is defined recursively by the following 
conditions : 
5.1 IfP is a leave S(P) = P 
5.2 S(F+ikA) = Z \[resp. S(A+ikF) = Z\] (km ) 
If S (F) is a functor of type fl, S(A) is an argument and 
Z is the result sign by the FC rule \[resp. BC rule\] 
when S(F) consumes the valency named k in the 
leave of S(A) indexed by i. 
5.3 S(P+ilnA ) = Z \[res. S(A+i~-" ) = Z\] if S(F) is a functor 
of type fl or f2, S(A) is an argument sign and Z is 
the result sign by the FC rule \[resp. BC rule\]. 
6. For each pair of signs X and Y we denote X.=. Y if X 
and Y are such that their non semantic parts are formal- 
ly equal and their semantic part is semantically equiva- 
lent. 
I In 2.3 the index i is just introduced for notational convenience and 
will not be used ; k,l.., will denote a valency name or the symbol m. 
285 
7. IfP and Q are WP 
P =Qiff 
7.1 S(P) and S(Q) are defined 
7.2 S(P) = S(Q) and 
7.3 L(P) = L(Q) 
8. A WP is called acceptedif it is accepted by the parser 
augmented with the heuristic described in §3. 
THEOREM 
1. (Unicity) IfP and Q are accepted WP's and ifP = Q, 
then P and Q are formally equal. 
2. (Completeness) IfP is a WP which is accepted by the 
grammar, and S(P) is a sign corresponding to a gram- 
matical sentence, then there exists a WP Q such that : 
a) Q is accepted, and 
b)P =Q. 
NOTATIONAL CONVENTION 
F, F'...(resp. A,A',...) will denote WP's such that S(F), 
S(F')...are functors of type fl (resp. S(A), S(A') .... are 
arguments of type f0). 
The proof of the theorem is based on the following 
properties 1 to 3 of the grammar. Property 1 follows 
directly from the grammar itself (cf. §2) ; the other two 
are strong conjectures which we expect to prove in a 
near future. 
PROPERTY 1 If S(K) is defined and L(K) is not a 
lexical leaf, then : 
a) If K is of type f0, there exist i,k,F and A such that : 
K = F+ikA or K = A+ikF 
b) If K is of type fl, there exist Fu of type f2 and Ar of 
type f0 or of type fl such that : 
K = Fu+imAr 
c) K is not of type f2. 
PROPERTY 2 (Decomposition unicity) : 
For every i and k 
if F+i~A = F+ixA', or A+i~F -- A'+i~t.F 
then i= i', k = k', A--A' and F = F' 
PROPERTY 3 (Partial associativity) : 
For every F,A,F' such that L(F) L(A) L(F') is a sub- 
string of a string oflexical entries which is accepted by 
the grammar as a grammatical sentence, 
a) If S\[F+i~(A+aF)\] and S\[(F+ikA)+u F'\] are defined, 
then F+ii(A+ilF' ) = (F+~A)+IIF 
b) If S\[A+nF \] and S\[(F+ikA)+aF \] are defined, 
then S\[F+ik(A+nF)\] is also defined. 
LEMMA 1 
If F+ikA = A'+jtF' 
then A'+j~F' is not accepted. 
Proof : L(F) is a proper substring of L(A), so there 
exists A" such that : 
a) S(A"+jlF) is defined, and 
b) L(A") is a substfing of L(A) 
But A' begins by F and F is not contained in A", so A" 
is an edge shorter than A'. Thus A'+F' is not accepted. 
LEMMA 2 
If S\[(A+tkF)+uF'\] is defined and 
A+ikF is accepted, then 
(A+tkF)+uF is also accepted. 
Proof : Suppose, a contrario, that (A+ikF)+nF is not 
accepted. Then there must exist an edge 
A' = A"+i~F such that : 
a) S(A'+nF) is defined, and 
b) A' is shorter than A+ikF 
This implies that A" is shorter than A. 
Therefore A+ikF would not be accepted. 
PROOF OF THE PART 1 OF THE THEOREM 
Tile proof is by induction on the lengh, lg(P), of 
L(P). So we suppose a) and b) : 
a) (induction hypothesis). For every P' and Q' such that 
P' and Q' are accepted, if P' =_ Q', and 
lg(P') < n, then P' =Q' 
b) P and Q are accepted, P = Q and 
lg(P) = n 
and we have to prove that 
C) P= Q. 
First cas : if lg(P) = 1, then we have 
P = L(P) = L(Q) = Q. 
Second cas : if lg(P) > 1, then we have 
lg(Q) > 1 since L(P) = L(Q). Thus there exist P't, P'2, 
Q't, Q'2, i, k, j, 1, such that 
P = P'~+u P'2 and Q = Q't+~Q'2 
By the Lemma 1 P't and Q't must be both functors 
or both arguments. And ifP'~ and Q'~ are functors (res. 
arguments) then P'2 and Q'2 are arguments (resp. func- 
tors). So by Property 2, we have : 
i = i', k = k', P'l -- Q't, and P'2 =- Q' 2 . 
Then the induction hypothesis implies that P't = Q't and 
that P'2 = Q'2" Thus we have proved that P = Q. 
PROOF OF THE PART 2 OF THE THEOREM 
Let P be a WP such that S(P) is define and cortes- 
286 
ponds to a grammatical sentence. We will prove, by 
induction on the lengh of L(K), that for all the subtrees 
K of P, there exists K' such that : 
a) K' is accepted, and 
b) K_=_K'. 
We consider the following cases (Property 1) 
1. IfKis a leaf then K' = K 
2. If K = F+tkA, then by the induction hypothesis 
there exist F' and A' such that : 
(i) F' and A' are accepted, and 
(ii) F_=_ F', A = A'. 
Then F'+A' is also accepted. So that K' can be choosed 
as F'+A'. 
3. If K = A+ikF, we define F, A' as in (2) and we 
consider the following subcases : 
3.1 If A' is a leaf or if A' = FI+jlA1 where S(AI+~ F') 
is not def'med, then A'+~F is accepted, and we can 
take it as K. 
3.2 If A' = Al+ilF1, then by the Lemma 2 A'+~kF' is 
accepted. Thus we can define K' as A'+u F'. 
3.3 IfA' = FI+nA1 and S(AI+~ F) is defined. 
Let A2 = Al+ikF. 
By the Property 3 S(FI+jlA2) is defined 
and K = A'+tkF = FI+jlA2. 
Thus this case reduces to case 2. 
4. If K = Fu+~Ar, where Fu is of type f2 and Ar is of 
type f0 or fl, then by induction hypothesis there exists 
At' such that Ar ~_ Ar' and At' is accepted. Then K can 
be defined as Fu+i®Ar'. 
5. IMPLEMENTATION AND COVE- 
RAGE 
FG is implemented in PIMPLE, a PROLOG term 
unification implementation of PATR II (cf. Calder 
1987) developed at Edinburgh University (Centre for 
Cognitive Studies). Modifications to the parsing algo- 
rithm have been introduced at the "Universit6 Blaise 
Pascal", Clermont-Ferrand. The system runs on a SUN 
M 3/50 and is being extensively tested. It covers at 
present : declarative, interrogative and negative sen- 
tences in all moods, with simple and complex verb 
forms. This includes yes/no questions, constituent 
questions, negative sentences, linearity phenomena 
introduced by interrogative inversions, semi free cons- 
tituent order, clitics (including reflexives), agreement 
phenomena (including gender and number agreement 
between obj NP to the left of the verb and participles), 
passives, embedded sentences and unbounded depen- 
dencies. 
REFERENCES 
B~s, G.G. and C. Gardent (1989) French Order without 
Order. To appear in the Proceedings of the Fourth 
European ACL Conference (UMIST, Manchester, 
10-12 April 1989), 249-255. 
Calder, J. (1987) PIMPLE ; A PROLOG Implementa- 
tion of the PATR-H Linguistic Environment. Edin- 
burgh, Centre for Cognitive Science. 
Gazdar, G., Klein, E., Pullum, G., and Sag., I. (1985) 
Generalized Phrase Structure Grammar. London: 
Basil Blackwell. 
Kamp, H. (1981) A Theory of Truth and Semantic 
Representation. In Groenendijk, J. A. G., Janssen, 
T. M. V. and Stokhof, M. B. J. (eds.) Formal 
Methods in the Study of Language, Volume 136, 
277-322. Amsterdam : Mathematical Centre 
Tracts. 
Karttunen, L. (1986) Radical Lexicalism. Report No. 
CSLI-86-68, Center for the Study of Language and 
Information, Paper presented at the Conference on 
Alternative Conceptions of Phrase Structure, July 
1986, New York. 
Morrill, G. (1988) Extraction and Coordination in 
Phrase Structure Grammar and Categorial Gram- 
mar. PhD Thesis, Centre for Cognitive Science, 
University of Edinburgh. 
Pareschi, R. (1987) Combinatory Grammar, Logic 
Programming, and Natural Language. In Haddock, 
N. J., Klein, E. and Morill, G. (eds.) Edinburgh 
Working Papers in Cognitive Science, Volume I ; 
Categorial Grammar, Unification Grammar and 
Parsing. 
Pareschi, R. and Steedman, M. J. (1987) A Lazy Way 
to Chart-Parse with Extended Categorial Gram- 
mars. In Proceedings of the 25 th Annual Meeting of 
the Association for Computational Linguistics, 
Stanford University, Stanford, Ca., 6-9 July, 1987. 
Pollard, C. J. (1984) Generalized Phrase Structure 
Grammars, Head Grammars, and Natural Lan- 
guages. PhD Thesis, Stanford University. 
Pollard, C. J. and Sag, I. (1988) An Information-Based 
Approach to Syntax and Semantics : Volume 1 
Fundamentals. Stanford, Ca. : Center for the Study 
of Language and Information. 
S teedman, M. (1985) Dependency and Coordination in 
the Grammar of Dutch and English. Language, 61, 
523 -568. 
Steedman, M. (1988) Combinators and Grammars. In 
Oehrle, R., Bach, E. and Wheeler, D. (eds.) Catego - 
rial Grammars and Natural Language Structures, 
Dordrecht, 1988. 
Uszkoreit, H. (1987) Word Order and Constituent 
Structure in German. Stanford, CSLI. 
Wittenburg, K. (1987) Predictive Combinators : a 
Method for Efficient Processing of Combinatory 
Categorial Grammar. In Proceedings of the 25th 
Annual Meeting of the Association for C omputatio- 
nalLinguistics, Stanford University, Stanford, Ca., 
6-9 July, 1987. 
Zeevat, H. (1986) A Specification of InL. Internal 
ACORD Report. Edinburgh, Centre for Cognitive 
Science. 
Zeevat, H. (1988) Combining Categorial Grammar 
and Unification. In Reyle, U. and Rohrer, C. (eds.) 
Natural Language Parsing and Linguistic Theo- 
ries, 202-229. Dordrecht : D. Reidel. 
Zeevat, H., Klein, E. and Calder, J. (1987) An Inlroduc- 
tion to Unification Categorial Grammar. In Had- 
dock, N. J., Klein, E. and Morrill, G. (eds.) Edin- 
burgh Working Papers in Cognitive Science, Vo- 
lume 1 : Categorial Grammar, Unification Gram- 
mar and Parsing 
287 
