The implementation of a computational grammar of French 
using the Grammar Development Environment 
Louisette Emirkanian Lyne Da Sylva Lorne H. Bouchard 
Universit6 du Qu6bec g Montrdal UQAM and Universit6 du Qu6bec 'h Montr6al 
emirk~nian.louisette@uqam.ca Universit6 de Montr6al bouchard.lorneJl~uqam.ca 
dasylva@iro.mnontreal.ca 
Abstract 
The design and implementation of a 
large-coverage computational grammar 
of French is described. This grammar 
is compared to a comprehensive compu- 
tational grammar of English which was 
implemented using the same computer 
workbench. Although many similari- 
ties may be observed in the two gram- 
mars, there are important structural dif 
ferences which can be traced back to fea- 
tures specific to the F~'ench language, no- 
tably agreement and cliticization. 
1 Introduction 
We present a large-coverage computational gram- 
mar of French (CGF) 1 and discuss some imple- 
mentation aspects of' this grammar by comparing 
it to the Alvey Natural Language Tools Grmm 
mar (ANLT Grammar) (Grover et al., 1.993) which 
wins also implemented using the Grammar De- 
velopment Environment (GDE) (Bogm'aev et al., 
1988). The grammar of French was implemented 
as part of a larger project, the goals of which are 
the development and application of Generalized 
Phrase Structure Grammar (GPSG) (Gazdar et 
al., 1985) in general and the design and implemen- 
tation of a comprehensive computational gram- 
mar for French in particular. 
In this paper, we highlight the many similari- 
ties between the two grammars but especially dif- 
ferences due to phenomena particular to French. 
This leads to a description of the major structural 
differences between tile two grammars. 
We shall not justify the reasons for our choice 
of GPSG but will simply point out that GPSG 
is a formal theory which is unification-based and 
hence well-suited for computational linguistics. 
The Grammar Development Environment is a 
computer workbench for the development and 
evaluation of computational grammars of natural 
1This resem'ch is flmded by SSHRCC (40-93-0607) 
and by FCAR (95ERl198) 
language which are described in a style close to 
that of GPSG. It embodies a modified version of 
GPSC which is easier to implement than its the- 
oretical counterpart. The GDE was developed as 
part of the Alvey Natural Language Tools project 
in the UK. 
2 Comparison of the two 
grammars 
The ANLT grammar was used as a model and ini- 
tial source of inspiration for the design and imple- 
mentation of the grammar of l~'ench, despite obvi- 
ous differences between tile two languages. Both 
grammm-s strive to account for the same t)roper- 
ties of natural language although they differ on 
points of detail, the differences being essentially 
structural. 
2.1 Similarities 
X-bar theory is used in (Gazdar et al., 1985) to 
characterize constituent structm'e. In our gram- 
mar, as in the ANLT grammar, X-bar schemata 
are respected in general, although there are differ- 
ences of detail. There is no specifier in the verb 
phrase (VP), thus the V2 immediately dominates 
the V; complex specifiers in noun phrases (NP) 
and adjectival and adverbial phrases (AdjP and 
AdvP) are given special treatment: there are X2 
level specifiers of X2 constituents, as tbr example 
in: 
(I) N2 -4 R2\[POSS +\], H2 
the man's black hat 
N2 --+ N2\[TYPR coll\], H2 
une foule de ces 6tudiants (a crowd of those 
students) 
Furthermore, some constituents have a specifier 
at level X1 even if there is a specifier at level X2 : 
(2) N2 -4 Spec, H2 
tous les enfants, all the children 
N2 -4 Det, HI 
les enfants, the children 
Adjectival, adverbial and prepositional phrases 
are treated in a similar fashion in both grammars. 
1024 
Thus in ninny respects our grammar follows 
closely the ANLT grammar. 
2.2 Differences 
The structure of tile NP and that of the VP in 
the CGF differ from those in the ANLT gram- 
mar. This is a reflection of phenomena which are 
characteristic of French: cliticization, present in 
IS"ench but not in English, and agreement, which 
is more limited in English titan in l~¥eneh. We 
overview these phenomena and then examine the 
structures. 
2.2.1 Agreement 
There are nnmy more cases of agreement in 
French than in English and mmw more lexical 
items exhibit agreement fi~,atures (determiners, 
adjectives, past participles and conjugated verbs). 
In particular, two sets of rules are required to ac- 
count for AdvP and AdjP constituents in CGF, 
since only the latter is subject to agreement. The 
agreeInent patterns are more complex as well. In 
the VP for example the past participle ('.an agree 
with a direct object when it is anteposed (3), or 
with its subject when it is conjugated with ~.trc 
(for a non-pronominal verb); otherwise it remains 
invariant. 
(3) la table que Paul a faite (the table that Paul 
made) 
il les a vus (he saw them) 
In quantified NPs, we examine two cases: 
1. When the subject of a w:rb is an NP quanti- 
fied by an adverb or an adjective, the comple- 
ment of tile quantifier determines agreement: 
(4) beaucoup de garcons sont arrivds (ninny 
boys have arrived(masc, pl.)) 
bcaucoup de filles sont m'rivdes (many 
girls have arrived(ti-~m, pl.)) 
2. When tile subject of a verb is a collective 
noun, the verb <:an agree either with the col- 
lective or with the element it quantifies: 
(5) une foule de garcons est arrivdc (a crowd 
of boys has arrived(fern, sg.)) 
une tbule de garcons sont arrivds (a 
crowd of boys have arrived(masc, pl.)) 
2.2.2 Clitics 
Cliticization refers to the phenomenon where a 
w~rbal complement can be pronominalized and ad- 
joined to the verb: 
(6) Pierre volt le garqon (Peter sees the boy) 
Pierre le volt ("Peter him sees") 
This phenomenon is interesting because it bears 
a superficial resemblance to long distance depen- 
dencies, which are dealt with by SLASH in GPSG. 
3 Structural differences 
The data relative to cliticization and agreement 
have dictated the structure of the French VP and 
NP. 
3.1 The structure of the French VP 
The structure for the VP in English developed in 
the ANLT grammar corresponds to that in (Gaz- 
dar et al., 1982): a binary branching structure 
where each verbal element takes a VP complement 
of a specifc type (the "cascading structure"). 
The same type of structure for the French VP is 
generally assumed in transformational or genera- 
tive grammar and also in GPSG analyses (Miller, 
1991). We haw', departed from these traditional 
analyses and have implelnented a fiat structure for 
the compomld tenses, while retaining the cascad- 
ing one for the passive. For example, the structure 
for a direct transitive verb is expressed by the rule 
in (7) and the passive by the rnle in (8): 
(7) SV --~ H\[AUX +\], V\[AUX -\], N2 
(8) SV -+ H\[SUBCAT ~tre\], X2\[PRD\] 
In rule (8), the X2 can be realized as a p~ssive 
participle or any predicative complement, as in 
the GPSG analysis of passives. 
3.1.1 Arguments for a flat structure 
The arguments in favour of distinguishing the 
two structures are numerous. Many have been 
discussed in (Abeill6 A Godard, 1994) and also 
in (Emirkanian A Da Sylva, 1995). Directly rel- 
evant: to our discussion is the fact that only the 
passive participle may be cliticized (9a)2 not the 
participle involved in compound tenses (gb): 
(9) a. il est aired par Marie (he is loved by Mary) 
il l'est ("he it is") 
b. il a mang~5 une poinme (he ate an apple) 
* il l'a ("he it has") 
Thus in the passive structure, the copula be- 
haves like control verbs such as the modal vouloir 
(to want) which take a V2 complement: il veut 
partir (he wants to leave) yields il le veut ("he it 
wants" ). 
Our analysis of the French VP also provides an 
account of past partieilfle agreeinent. In GPSG, 
agreement is handled by the Control-Agreement 
Principle (CAP). a However, we have been unable 
to account for past participle agreement in French 
using the CAP (see (Emirkanian et al., in press)); 4 
2With or without its eolnplements. See (Abeill5 A 
Godard, 1994) for more details. 
aln the GDE implementation of the grammar of 
English (Grover et al., 1993), the CAP seems satis- 
factorily transposed: it is implemented by a relatively 
snmll set of propagation rules, which respect the gen- 
erality of the CAP. 
aThis article shows on the other hand how, tbr the 
past participle, the CAP can account for agreement in 
predicative structures, for example the passive. 
1025 
the latter account makes use of other devices, such 
as the Feature Cooccurrence Restrictions and the 
Feature Specification Defaults. 
3.1.2 GDE Implementation of a flat VP 
The insertion of the auxiliary into the VP struc- 
ture to produce a flat VP is done by a metarule. 
However, implementing this flat structure in the 
GDE was found to be highly problematic. In the 
GDE, grammars are pre-compiled into ordered 
phrase structure rules, and the number of these 
rules necessary to account for the VP turned out 
to be extremely large. To give an idea of the size 
of the resulting grammar, consider the lexical ID 
rule for a verb requiring an NP and a PP comple- 
ments: 
(I0) V2 --~ H\[3\], N2, P2\[~\] 
Paul donne un livre ~ Marie (Paul gives a 
book to Mary) 
The following metarules operate on this ID rule: 
passiveS; direct object extraction (SLASH N2); di- 
rect object cliticization (accusative); direct object 
cliticization (oblique); indirect object extraction 
(SLASH P2\[h\]); indirect object cliticization (da- 
tive); indirect object cliticization (locative); direct 
object extraction (SLASH N2\[de\]); adverb inser- 
tion; auxiliary insertion; supercompound aux in- 
sertion; negative adverb (pas) insertion; subject 
clitic insertion. 
Although not all rules are compatible with each 
other (in particular, no direct object extraction 
rule can apply to the result of the passive rule), the 
combinatorics are complex: in a test grammar, we 
found that the number of phrase structure rules 
corresponding to this ID rule was of the order of 
2 la, or over 8000. This is of course unacceptable, 
as the CGF comprises 45 different lexical ID rules. 
Not all of them give rise to so many rules, but the 
compilation time and size of the output grammar 
made this solution impractical. 
Instead, we implemented the structure of the 
VP as a verbal complex of the auxiliary and the 
past participle which is sister to the complements. 
The above metarules apply either to the verbal 
complex or to the complements, thus reducing 
considerably the combinatorics. The resulting 
structure is sufficiently equivalent from a theoret- 
ical perspective (i.e., the past participle and its 
complements do not form a constituent) and it al- 
lowed us to implement the bulk of our grammar. 
The size of our grammar is now as follows: 185 ID 
rules and 81 metarules. After compilation, these 
185 ID rules expand to 2630 expanded ID rules 
and 3053 phrase structure rules. 
Let us now turn to our account of the structure 
of the NP. 
5The optionality of the agent gives rise to two ID 
rules from the output of this metarule. 
3.2 The structure of the French NP 
The data relative to cliticization and agreement 
have shaped the structure of the NP in various 
ways. Our study of the French NP has been influ- 
enced mainly by the work of (Milner, 1978). We 
will concentrate here oil quantitative structures 
(such as beaueoup d'enfants, many children) and 
partitive structures (such as beaucoup de ces en- 
rants, many of these children). These involve a 
quantifier (adverb, pronoun, collective or adjec- 
tive) whose "complement" contains respectively a 
determiner-less NP with de (de filles) or an NP 
with a definite determiner (de ces filles). 
The ANLT grammar's treatment of partitive 
NPs such as "all of the children", akin to trois de 
cos filles and also to beaueoup de filles, assumes 
a three-way branching structure, given by the fbl- 
lowing rule: 
(11) N2\[+SPEC\] --~ A2, P\[ofj, N2\[+SPEC\] 
This structure must be rejected for French 
based on data from cliticization: the de and what 
follows must form a constituent, which may be 
cliticized as en like a P2 \[PFORM de\] can be 
(12) je parle de Marie (I speak of Mary) 
j'en parle ("I of-her speak") 
je vois beaucoup de filles (I see many girls) 
j'en vois beaucoup ("I of-them see many") 
But rather than treat this constituent as a P2, 
we treat it as an N2 with \[PFORM de\], because 
of agreement data: agreement features from the 
complement must be allowed to percolate upwards 
to explain possible agreement patterns with it, as 
in une foule d'hommes sont venus (a crowd of men 
have come). This can be achieved only if the com- 
plement is allowed to be an N2 head of the higher 
NP, and not a P2 (there is no justification for 
agreement features on a PP in French). (Miller ~, 
1991) also argues in favour of treating de and d in 
French as markers on N2, rather than as prepo- 
sitions, yielding N2 \[PFORM de\] and N2 \[PFORM 5\]. 
The following rules for a subset of French quan- 
titative constructions highlight the prevalence of 
this N2:6 
(13) a. N2 \[PFORM nil\] -+ Adv2 \[+QTE\], H2 \[de\] 
beaucoup de filles (many girls) 
b. N2\[PFORM nil\] --* N2\[Coll\], H2\[de\] 
une foule de filles (a crowd of girls ) 
c. N2\[PFORM nil\] -+ A2\[+QTE\] , H2\[de\] 
trois filles (three girls) 
d. N2\[PFORM nil\] -~ H2\[de\] 
des filles (girls) 
6We have omitted the description of the inter- 
nal structure of this g2 \[de\]. Our grammar accounts 
for the absence of an explicit de in certain contexts 
(i.e.,trois filles) as well as its presence in extraposed 
contexts (fen ai trois, de fllles) and its contraction 
with the determiner des to produce simply de (de filles 
- r~gle de caeophonie). 
1026 
The following examples featuring cliticization 
and dislocated structures, where tim de reappears, 
also argue for the N2\[de\]: 
(14) a. il voit be.aucoup d(; filles 
("he sees many of gMs") 
il en voit l)eaucoul) ("he of them sees lnany") 
il en voit 1)eaucoul), (le filles 
("he of them sees many, of girls") 
(14) b. il volt une foule de lilles (he sees a crowd 
of girls) 
il en volt une fbulc, de tilles ("he of-them sees 
a crowd, of girls") 
(14) c. il volt trois filles 
(he sees three girls) 
il en voit trois ("he of them sees three") 
il en volt trois, de filles 
("he of-them sees three, of girls") 
Wc I)osit an anah)gous structure fbr NPs with 
the determiner des in rule (d), i.e., a partitive 
structure where, the quantifier is not st)ecified. In- 
deed, cliticization in cn is possit)te (14d). 
(14) d. il voit (h;s filles (he sees gMs) 
il en voit ("he of-theln sees") 
il en voit, des filles ("he of them sees, girls") 
In all structures described above, we postulate 
an N2 \[de\] which may be cliticized. In that cruse, 
it leaves behind a "stranded" quantifier (except in 
the e;use of rule (d), where that quantifier is null). 
4 Conclusion 
Overall, the coverage of the CGF is comparable to 
that of the ANLT grammar, although some, of the 
phenomena treated are different. I'resently, the 
CGF can parse isolated sentences using a limited, 
though representative, hmM-compiled lexicon. In 
order to analyse sentences occurring in real text, 
a large coverage lexicon is required. Research in 
progress aims at automating the lexicon Imilding 
pro(',ess. 
Louisette Emirkanian & Lyne Da Sylva. 1995. 
Quclqucs phdnomi;nes d'accord dans los g'ram- 
maircs syntagmatiqucs. Ms. D6partement (te 
linguistiquc, Universitd du Qu6bee g Montrdal. 
Gerald Gazdar, Ewan Klein, Geoffrey Pullmn & 
Ivan Sag. 1985. Generalized Phrase StT"ucture 
Grammar. Cambridge MA: Harvard University 
Press, 276 1)1)- 
Gerald Gazdar, Geoffrey Pullum & Ivan Sag. 
1982. Auxiliaries an(1 Related Phenomena in a 
Restrictive Theory of Grammar. In Language, 
58(3), pp. 591-638. 
Claire Grover, John Carroll & Ted Briscoe. 
1993. Th, e Alvey Natural Language Tools Gram- 
mar (~th Release). 31;ch. Report, 162 (re- 
vised), Computer Laboratory, University of 
Cambridge. 
Philip Miller. 1991. Clities and constituents in 
l"h'rase St'r~,ct'ur'e Grammar. Doctoral disserta- 
tion, University of Utrec, ht. Published (1992), 
New York: Garland Publishers. 
Jean-Claude Milner. 1978. De la syntaxe g 
l'inteTprdtation : quantitds, insultes, exclama- 
tions. 1)aris: Editions du Seuil. 
R,eferences 
Anne, Abeill6 & l)ani?~le Godard. 1994. The co,m- 
plemcntation of Frc'nch a'uxiliarie~s. Ms. UFRL 
& CNRS, Universit6 de Paris 7. 
Bran Boguraev, John Canoll, '11i;(1 Briscoe & 
Claire Grover. 1988. Software Support fbr 
Practical Grammar I)evelopnmnt. In COL- 
ING'88, Budapest, pp. 54 56, .lohn Von Ncu- 
maml Society for Computing Sciences. 
l,ouisette Emirkanian, Lyne Da Sylwt & Lorne 
Bouchard. In press. L'acco,:d dans une p;ram- 
maire conltmtationnelh', du franqais, in Emirka-- 
nian, \[,. & L. Bouehard (eds), Traitcmcnt 
automatiquc du fl'arl,~:ais dcr'it, Cahiers scien- 
tifiques de I'ACI,'AS. 
1027 
