Parsing with Dependency Relations 
and Robust Parsing 
Jacques Courtin, Damien Genthial 
CLIPS - IMAG CAMPUS 
BP 53 
38040 GRENOBLE CEDEX 9 
Phone: +33 476 51 49 15 
E-Mail: Jacques.Courtin@imag.fr, Damien.Genthial@imag.fr 
Abstract 
After briefly recalling our view of dependency grammars, we present two dependency parsers. The first uses dependency relations to express dependency rules more concisely and to parse efficiently. The second uses typed feature structures to add some semantic knowledge to dependency trees and parses in a more robust, left-to-right manner.
1. Introduction 
Our team has been working with dependency 
grammars for more than twenty-five years 
(Courtin 73). Two dependency parsers built by 
our team are presented in this paper. The first 
one uses the notion of dependency relations in 
order to implement dependency grammars effi- 
ciently; it is described in the first part of the text. 
The second one was built with the following 
objectives: adding the use of some semantic 
knowledge in the process of syntactic parsing 
and obtaining a robust parser (second part of 
the text). 
2. Parsing with dependency relations 
The linguistic model we use for dependency is inspired by the Tesnière model (Tesnière 59), which we briefly recall in order to define our terminology precisely.
2.1. The linguistic model 
Relationship between words is the fundamental 
concept associated with dependency structures 
(DS). Given two words of the language, a rela- 
tion is established between them, defining a 
dominated word (or dependent) and a domi- 
nating word (or governor). This relation can be 
represented by an arc between two nodes, where 
each node is labelled by a word. The arc de- 
scends from the governor to the dependent. 
Example: the dependency structure for the sentence "we present two parsers":
        present
       /       \
     we       parsers
                 |
                two
We can also use a linear notation with brackets 
and write: (we) present ((two) parsers). 
But the graphical representation is more read- 
able and shows clearly the hierarchy between the 
governor and its dependents, which of course, 
can also have dependents. 
Dependency grammars 
A dependency grammar (formalism used by 
(Hays 64)) on a vocabulary V is made of: 
• a family of subsets Ci of V such that the union of the Ci is equal to V;
• a set of rules, each having one of the two following forms:
i) *(X)
ii) X(X1 ... Xi * Xi+1 ... Xn)
The Ci are word classes or lexico-syntactic categories and are denoted by their names (Determiner, Noun, Adjective, ...). The Xi in the rules above are category names.
The star shows the place of the governor relative to its dependents, so in a type ii) rule, X1 ... Xi are the left dependents of the governor X and Xi+1 ... Xn are its right dependents. When n = 0, the rule is written X(*) and is a terminating rule; type i) rules are initial rules.
Grammar example: 
We use the following categories: Determiner (D), 
Noun (N), Adjective (A), Verb (V). 
*(V)
V(N,*,N)
N(D,*) N(D,A,*) N(A,*)
D(*) V(*) A(*) N(*)
V={drinks, eats} D={the, a}
N={dog, cat, cup, milk}
A={black, white, hot}
With this grammar one can build the structure:
            drinks
           /      \
        cat        milk
       /   \         |
    the    black    hot
Generation 
Dependency grammars are generative, working 
with the following generating rules: 
a) choose a type i) rule (which determines the 
main governor), 
b) choose and apply type ii) rules until we ob- 
tain a complete structure, entirely made of 
terminating rules. 
With the example grammar above, we can make the following derivation (which matches the sentence "the black cat drinks hot milk"):
*(V)
*(V(N,*,N))
*(V(N(D,A,*),*,N))
*(V(N(D,A,*),*,N(A,*)))
*(V(*)(N(D,A,*),*,N(A,*)))
...
*(V(*)(N(*)(D(*),A(*),*),*,N(*)(A(*),*)))
Remark: 
For a given governor, the dependency grammar 
must contain as many rules as there are possible 
configurations of dependents below this gover- 
nor. For example, if we want nominal phrases 
with at least a noun, an optional determiner and 
0,1 or 2 adjectives before the noun, we will have 
the grammar: 
N(*) N(A,*) N(A,A,*) 
N(D, *) N(D,A, *) N(D,A,A, *) 
The formalism proposed below shows a better 
way to describe the same things. 
2.2. Dependency relations 
The method used in the PILAF¹ system (Courtin 77) to build dependency structures is a direct analysis: we transform the input word chain into a dependency tree by using a form of dependency grammar and no intermediate structure. But the algorithm does not directly use Tesnière-type dependency grammars because, as we have seen, these grammars impose a combinatorial description of all the possible configurations of dependents for a given governor. To overcome this drawback, we introduce dependency relations between two lexico-syntactic categories.
¹ Procédures Interactives Linguistiques Appliquées au Français (Interactive Linguistic Procedures Applied to French)
Example: 
To say that N governs A we simply write:
N -> A
Dependency Relations (DR) must not only code 
the relation itself but also: 
• the relative positions of the dependent and 
the governor: is it a left dependent or a right 
one ? 
• the relative positions of all dependents of a 
given governor. 
Example: 
We want to describe the sentence "The black cat drinks hot milk", which gives the sequence of categories:
D A N V A N
and the dependency tree given above.
Dependency relations must stipulate that a noun can appear on the left or on the right below a verb and that, below a noun, the determiner precedes the adjective. So we attach to each relation a vector of integers (either positive or negative) and we write:
GOV -> DEP := (x1, ..., xn),
which says that we can have 0, 1, ..., n dependents of the DEP category below a governor of the GOV category.
The integers are presented in ascending order and give the relative positions of the DEP dependents below GOV. For any given governor, the integer values also determine the relative positions of all its different possible dependents.
Example: 
N -> A := (-15, -14)
N -> D := (-16)
V -> N := (-20, +20)
Positive integers concern right dependents and negative integers left dependents. The integer of the second relation stipulates that the determiner, if any, will be placed before the adjectives, because -16 is less than -15 and -14. From the first relation we can see that no word can be placed, below the noun, between the two adjectives (there is no integer between -15 and -14).
These relations can be drawn as the following trees:
      V                  N
    /   \             /  |  \
   N     N           D   A   A
  -20   +20        -16 -15 -14
An important thing to be noted is that each inte- 
ger position gives the possibility for a dependent 
to be present at that position, but never imposes 
that presence. 
So the three relations above are equivalent to the 
following dependency grammar: 
N(A,*) N(A,A,*) 
N(D,*) N(D,A,*) N(D,A,A, *) 
V(N,*) V(*,N) V(N,*,N) 
N(*) A(*) D(*) V(*) 
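The equivalence stated above can be checked mechanically. The following is a minimal sketch, not the PILAF implementation: it assumes that every integer position is an optional slot, enumerates all subsets of slots for each governor, and prints the corresponding Hays-style rules. The names `RELATIONS`, `CATEGORIES` and `expand` are ours.

```python
from itertools import combinations

# Dependency relations of the example: (governor, dependent) -> integer vector.
RELATIONS = {
    ("N", "A"): (-15, -14),
    ("N", "D"): (-16,),
    ("V", "N"): (-20, 20),
}
CATEGORIES = ["N", "A", "D", "V"]


def expand(gov):
    """Return the set of rules X(... * ...) licensed for governor `gov`."""
    # One slot per integer, sorted so that dependents come out in word order.
    slots = sorted(
        (pos, dep)
        for (g, dep), vec in RELATIONS.items()
        if g == gov
        for pos in vec
    )
    rules = set()
    for k in range(len(slots) + 1):
        for chosen in combinations(slots, k):  # each slot is optional
            left = [dep for pos, dep in chosen if pos < 0]
            right = [dep for pos, dep in chosen if pos > 0]
            rules.add(f"{gov}({','.join(left + ['*'] + right)})")
    return rules


grammar = set().union(*(expand(c) for c in CATEGORIES))
for rule in sorted(grammar):
    print(rule)
```

Two different slot subsets can yield the same rule (a single adjective may sit at -15 or at -14), which is why the rules are collected in a set.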
It can be noted that these relations are in some 
sense similar to disjunctive forms of Sleator's 
link grammars (Sleator and Temperley 91). 
2.3. Parsing algorithm 
This algorithm supposes that the morphological 
step is finished and that it has produced the se- 
quence of lexico-syntactic categories for the 
input sentence, each word corresponding to one 
category - or several if the word is ambiguous. 
So the parser's inputs are: 
• the sequence X1...Xn of categories computed by the morphological parser;
• the set of dependency relations and the associated integer vectors.
We add to the Xi sequence the pseudo-category SENT = X0, which helps determine the possible governor of the sentence (to initiate the parsing process). If, for example, the possible main governors of a sentence are coordination conjunction (C) and verb (V), we will have the relations:
SENT -> V := (+1)
SENT -> C := (+1)
As we can have only one governor for a sentence, these two relations are mutually exclusive. This is expressed by the value of the integer, +1, which is the same for the two possible dependents of SENT.
In order to build the dependency tree (or trees) 
associated with the given sequence of categories, 
the parser first initializes the square array of 
figure 1. 
Like (Sleator and Temperley 91), we only want projective structures (or planar structures), i.e. trees which can be traversed by a left-to-right infix algorithm to recover the original linear order of the sentence. The motivations for this limitation to projective structures are the following:
• it is important to be able to retrieve, from the tree, the original linear form of the sentence;
• this limitation leads to greater parsing efficiency: for each governor, the search for its dependents will be made in two separate spaces: a left and a right space.
             GOV   X0=SENT    X1     X2    ...    Xn
DEP
X0=SENT               ∅      P01    P02    ...   P0n
X1                   P10      ∅     P12    ...   P1n
X2                   P20     P21     ∅     ...   P2n
...
Xn                   Pn0     Pn1    Pn2    ...    ∅
Pij: set of integers determined by the relations Xj -> Xi.
Pii = ∅
Figure 1: Square array for a sentence
So for a given governor Xj, all its left dependents must have an index i < j (the index order matches the word order in the sentence). The same is true for right dependents, with index k > j. So we can remove from the top-right triangle of the array all positive numbers and from the bottom-left triangle all negative ones. We then have the two properties:
• for all i, j with 1 ≤ i, j ≤ n and i > j: if Pij ≠ ∅ then ∀ p ∈ Pij, we have p > 0;
• for all i, j with 1 ≤ i, j ≤ n and i < j: if Pij ≠ ∅ then ∀ p ∈ Pij, we have p < 0.
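The initialization and pruning of the array can be sketched as follows. This is an illustration under our own assumptions (one category per word, relations stored in a dictionary named `RELATIONS`, a function named `build_array`), not the original code. Rows are dependents and columns are governors, as in figure 1.

```python
# (governor, dependent) -> integer vector, as in the examples of section 2.2.
RELATIONS = {
    ("SENT", "V"): (1,),
    ("N", "A"): (-15, -14),
    ("N", "D"): (-16,),
    ("V", "N"): (-20, 20),
}


def build_array(cats):
    """Return P, where P[i][j] is the pruned integer set for Xj -> Xi."""
    xs = ["SENT"] + list(cats)  # X0 is the pseudo-category SENT
    n = len(xs)
    P = [[set() for _ in range(n)] for _ in range(n)]
    for i in range(n):          # i: index of the dependent (row)
        for j in range(n):      # j: index of the governor (column)
            if i == j:
                continue        # Pii is always empty
            vec = RELATIONS.get((xs[j], xs[i]), ())
            # A dependent on the left (i < j) keeps negative positions only,
            # a dependent on the right (i > j) keeps positive ones only.
            P[i][j] = {p for p in vec if (p < 0) == (i < j)}
    return P


P = build_array(["D", "A", "N", "V", "A", "N"])
print(P[3][4], P[6][4])  # the two nouns below the verb: {-20} {20}
```

Note that the array only records what is locally possible: `P[1][6]` is not empty (the determiner could depend on the second noun as far as positions are concerned), and it is the projective tree construction that rules such attachments out.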
After having initialized the array and removed the useless parts of it, the parser builds, with a top-down recursive algorithm, all dependency structures compatible with the array:
a) for each possible governor of the sentence (SENT column):
b) build all left sub-trees and all right sub-trees (recursively);
c) build the final structures by merging the partial right and left ones.
One can say that we catch the SENT category and "pull" the structures out of the array. The algorithm succeeds if at least one "pulled" structure contains all the words of the input sentence.
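A possible reading of steps a) to c) is sketched below. It is a simplified reconstruction, not the original algorithm: it assumes one category per word, looks relations up directly instead of going through the array, and assigns to each dependent the smallest integer position still available, so that dependents of one governor receive strictly increasing positions. All function names are ours. Trees are returned in the bracket notation of section 2.1.

```python
RELATIONS = {
    ("SENT", "V"): (1,),
    ("N", "A"): (-15, -14),
    ("N", "D"): (-16,),
    ("V", "N"): (-20, 20),
}


def parse(words, cats):
    """Return all projective trees covering the whole input."""

    def slot(gov, dep, last, sign):
        """Smallest free position of the given sign for dep below gov."""
        vec = RELATIONS.get((cats[gov], cats[dep]), ())
        ok = [p for p in vec if p * sign > 0 and (last is None or p > last)]
        return min(ok) if ok else None

    def side(gov, lo, hi, last, sign):
        """All ways to cover words lo..hi with consecutive subtrees of gov."""
        if lo > hi:
            return [[]]
        out = []
        for end in range(lo, hi + 1):        # extent of the first subtree
            for root in range(lo, end + 1):  # its root, a dependent of gov
                p = slot(gov, root, last, sign)
                if p is None:
                    continue
                for sub in tree(root, lo, end):
                    for rest in side(gov, end + 1, hi, p, sign):
                        out.append([sub] + rest)
        return out

    def tree(root, lo, hi):
        """All trees rooted at word `root` covering exactly words lo..hi."""
        out = []
        for left in side(root, lo, root - 1, None, -1):
            for right in side(root, root + 1, hi, None, +1):
                l = "(" + ",".join(left) + ")" if left else ""
                r = "(" + ",".join(right) + ")" if right else ""
                out.append(l + words[root] + r)
        return out

    n = len(words)
    return [
        t
        for j in range(n)
        if ("SENT", cats[j]) in RELATIONS  # step a): possible main governors
        for t in tree(j, 0, n - 1)
    ]


print(parse("the black cat drinks hot milk".split(), list("DANVAN")))
```

Requiring each subtree to cover a contiguous span is what enforces projectivity here; the enumeration is exhaustive and blind, like the one described in the text.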
With real sentences, of course, we have lexical 
ambiguities or structural ambiguities. In both 
cases, the algorithm is non-deterministic and 
builds all possible solutions by blind combinato- 
rial enumeration. 
Dependency relations, associated with the algorithm described above, constitute a grammatical model with very few constraints. It is easy to see that the parser will succeed on more sentences than those of the language. This feature can be viewed as an advantage in the framework of a man-machine communication system, where the essential quality of an utterance is to be interpretable, even if it is not syntactically correct: "Close file", for example, is ungrammatical but we can interpret it and execute the associated command.
On the other hand, this lack of constraints penalizes efficiency: the algorithm will build a lot of incorrect structures because we cannot state, for example, that a given governor must have at least one dependent at a given position, or that a given relation only applies in a given context.
These limits, and the need to add some semantic knowledge to the syntactic parsing process, led us to design the new method for dependency tree construction presented in the second part.
2.4. Conclusion 
Despite its relatively limited power of expression, this parser builds dependency structures extremely quickly ("instantaneously" on a personal computer) as long as the input sentence is not too long and not too ambiguous (say, when the number of trees produced is less than 20).
The parser has been put to use in a system for detection and correction of syntactic errors (Strube de Lima 90). The main purpose was to check the numerous agreement rules for gender, number and person in written French sentences. For this type of application, it was of course essential for the parser not to take into account morphological properties of words while building dependency structures.
Because of its lack of constraints and its high practical efficiency, this algorithm could be used in applications for man-machine interfaces where exchanges are short and language is often approximate.
3. Robust Parsing 
The use of the preceding parser in a system for 
detection and correction of syntactic errors in 
French has raised the following problems: 
• even for a simple task such as detection and 
correction of agreement errors in written 
texts, you need a powerful parsing mecha- 
nism able to determine, for example, the an- 
tecedent of a relative pronoun; 
• a system for error correction cannot rely on the correctness of its input in order to build the structure it needs to do even minimal work. So the knowledge of the system has to be improved, i.e. in our case, some semantic information on words has to be added in order to determine more precisely the relations between them;
• the syntactic parser of such a system must also be robust and produce an output even if the input is completely ill-formed.
These problems led us to define a new dependency parser which is able to manipulate some semantic information and which is error resistant. This work resulted in a prototype of a dependency tree transducer called CADET², which we describe in the following sections.
3.1. A language for writing depend- 
ency grammars 
We have attempted to design a language for the description of dependency structures retaining the precision of Tesnière's grammars, but more appropriate for automatic treatment. Our basic idea is that the governor-dependent relation should not be expressed for two categories in general, but for two words which are instantiations of these categories in a given sentence. We therefore think it is necessary, when describing a governor-dependent relation, to indicate the context in which the relation is valid.
To build dependency structures, we must be able to determine, for any two words, characterized by their lexical category (determiner, noun, verb, ...), which one governs the other. More generally, given two dependency trees, we must know how to merge them into a unique tree.
Example:
   N      V               V
   |      |    ---->     / \
   D      N             N   N
                        |
                        D
(D)N, V(N) -> ((D)N)V(N)
We have defined a language based on rewriting rules; each rule applies to a dependency forest and produces a dependency tree. A set of such rules constitutes a dependency grammar which can be applied to a sentence by means of an interpreter. This interpreter is in fact a tree transducer driven by the rules.
² Constructeur d'Arbres de DEpendances Typés (Typed Dependency Trees Builder).
Example of a simple rule (the "--" begins comments):
N_V [                            -- Name of the rule
  (1:{N}, (0, $F:{pv})2:{V})     -- Forest
  =>
  ((1, $F) 2) ]                  -- Resulting tree
This rule applies to any forest which includes a 
sequence of an N and a V, whose left dependents 
are only preverbal particles pv. It builds a new 
tree where the N is added as a dependent of the 
V. 
The advantage of these rules, compared to sim- 
ple binary relations, is that it is possible to ex- 
press the context of each category which 
appears. It is thus possible to restrict a governor 
to one or two dependents only, or to forbid 
more than one occurrence of a given category .... 
One can also define linked pairs of binary rela- 
tions, as for coordination conjunctions (C): 
N_C [
  (1:{N}, 2:{C}, 3:{N})
  =>
  ((1) 2 (3)) ]
On the other hand, they present the drawback of 
the primitive dependency grammars: there must 
be a rule for almost every pair of lexical catego- 
ries (LC). To avoid this problem, we have cho- 
sen to use a hierarchy of LCs instead of the 
usual linear set of LCs (Genthial & al. 90). This 
hierarchy is a set, partially ordered by the is-a 
relation (figure 2). 
We can, in this manner, express very general rules like the two given above (N_V and N_C) or more specific ones like:
aux_ppas [
  (1:{xbe; xhave}, 2:{pastp})
  =>
  ((1) 2) ]
By means of the is-a({cnoun, pnoun}, N) and is-a({xbe, xhave, verb, pastp}, V) relations, the N_V rule for instance may be applied to all the following pairs of categories:
(cnoun, xbe) (pnoun, xbe)
(cnoun, xhave) (pnoun, xhave)
(cnoun, verb) (pnoun, verb)
(cnoun, pastp) (pnoun, pastp)
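The way a general rule covers several pairs of basic categories can be illustrated with a few lines of code. This is only a sketch: the dictionary `IS_A` encodes the is-a links quoted above (the link from adj to A is our reading of figure 2), and `is_a` and `pairs_matching` are hypothetical helper names, not part of CADET.

```python
# Child category -> parent category in the hierarchy.
IS_A = {
    "cnoun": "N", "pnoun": "N",
    "xbe": "V", "xhave": "V", "verb": "V", "pastp": "V",
    "adj": "A",  # assumed from figure 2
}


def is_a(cat, general):
    """True if `cat` equals `general` or lies below it in the hierarchy."""
    while cat is not None:
        if cat == general:
            return True
        cat = IS_A.get(cat)
    return False


def pairs_matching(first, second):
    """Pairs of basic categories accepted by a two-tree rule pattern.

    `first` and `second` are the category sets written between braces
    in a rule, e.g. {"N"} or {"xbe", "xhave"}.
    """
    basics = list(IS_A)
    return [
        (a, b)
        for a in basics
        for b in basics
        if any(is_a(a, g) for g in first) and any(is_a(b, g) for g in second)
    ]


print(pairs_matching({"N"}, {"V"}))                # the eight pairs of N_V
print(pairs_matching({"xbe", "xhave"}, {"pastp"}))  # the two pairs of the auxiliary rule
```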
We can thus define a set of basic categories which describe words in a very specific way, and use these categories for lexical indexing. The categories can then be grouped in "meta-categories" according to the structures we want to build. Finally, we can write the rules which effectively build these structures.
We can also write grammars in an incremental 
fashion, starting with the highest categories (e.g. 
N, V, A, C, P) then testing the rules on our cor- 
pus, progressively adding more precise rules for 
the lowest categories to treat specific phenom- 
ena. 
So, by using this method, we can avoid the usual 
compromise between a very fine set of LCs 
(which multiplies morphological ambiguities 
and syntactic rules) and a very general set 
(which multiplies syntactic ambiguities). We also 
obtain a fairly robust syntactic parsing: all un- 
known words are given the most general cate- 
gory (CLS), to which any rule can apply, thus 
an unknown word does not stop the parsing 
process. 
Similar type hierarchies have already been used 
in work on language semantics to represent the 
taxonomy of semantic types. We shall therefore 
use the same formalism for the representation of 
syntactic and semantic knowledge (see §3.3). 
      N                      V                    A
    /   \           /    /      \     \           |
 cnoun  pnoun     xbe  xhave   verb  pastp       adj
We use the following abbreviations: cnoun and pnoun for common and proper nouns, xbe and xhave for the auxiliaries be and have, pastp for past participle, adj for adjective, P for preposition and C for coordination conjunction.
Figure 2: Example of hierarchy
3.2. Building dependency structures 
Given a set of rewriting rules, the tree transducer proceeds by a left-to-right scanning of the input text. Each time a word is recognized by the morphological parser, it is transmitted to the syntactic module, which includes it in the current state of the analysis. As the data manipulated by the tree transducer must be trees or forests, each word is transformed into a one-node tree, where the root bears the information associated with the word.
In order to manage multiple interpretations of the same word or of the same sentence, the transducer maintains a list of forests, where each forest is one possible analysis of the sentence.
These forests, which are the current state of the analysis, are called stacks because each time a new word is recognized, a one-node tree is pushed on each forest and the parsing always resumes on the top of each forest.
Given a list of stacks, the transducer applies each applicable rule to the top of all stacks and, each time a rule applies, a new stack is produced and added at the end of the list. Doing so, the transducer will also apply the rules to the newly produced stacks, cyclically. If more than one rule applies to a particular stack, more than one stack will be produced, but if at least one rule applies to a stack, this stack will be removed from the list.
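The stack-rewriting loop just described can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the CADET interpreter: rules are binary, they only test the categories of the two topmost roots, and each rule is reduced to a triple (category 1, category 2, index of the governor). The five rules and the ambiguous phrase "la belle ferme" are those of the example that follows; `Tree`, `RULES`, `apply_rules`, `parse` and `show` are names of our own.

```python
from typing import NamedTuple


class Tree(NamedTuple):
    cat: str
    word: str
    left: tuple = ()   # left dependents, in word order
    right: tuple = ()  # right dependents, in word order


# name -> (category of tree 1, category of tree 2, index of the governor)
RULES = {
    "D_N": ("D", "N", 2),
    "A_N": ("A", "N", 2),
    "N_A": ("N", "A", 1),
    "N_V": ("N", "V", 2),
    "V_N": ("V", "N", 1),
}


def apply_rules(stack):
    """Return every stack obtained by rewriting the two topmost trees."""
    if len(stack) < 2:
        return []
    t1, t2 = stack[-2], stack[-1]
    out = []
    for c1, c2, gov in RULES.values():
        if (t1.cat, t2.cat) != (c1, c2):
            continue
        if gov == 2:   # tree 1 becomes the leftmost dependent of tree 2
            merged = t2._replace(left=(t1,) + t2.left)
        else:          # tree 2 becomes the rightmost dependent of tree 1
            merged = t1._replace(right=t1.right + (t2,))
        out.append(stack[:-2] + (merged,))
    return out


def parse(words):
    """words: list of (word, possible categories). Returns the final stacks."""
    stacks = [()]
    for word, cats in words:
        # Push a one-node tree on every stack, once per possible category.
        stacks = [s + (Tree(c, word),) for s in stacks for c in cats]
        i, rewritten = 0, set()
        while i < len(stacks):  # the list grows while it is being scanned
            new = apply_rules(stacks[i])
            if new:
                rewritten.add(i)
                stacks.extend(new)
            i += 1
        stacks = [s for k, s in enumerate(stacks) if k not in rewritten]
    return stacks


def show(t):
    """Bracket notation: (left dependents)word(right dependents)."""
    left = "(" + ",".join(show(d) for d in t.left) + ")" if t.left else ""
    right = "(" + ",".join(show(d) for d in t.right) + ")" if t.right else ""
    return left + t.word + right


PHRASE = [("la", ("D",)), ("belle", ("A", "N")), ("ferme", ("A", "N", "V"))]
readings = [show(s[0]) for s in parse(PHRASE) if len(s) == 1]
print(readings)
```

Each rule application replaces two trees by one, so every stack shrinks when it is rewritten and the inner loop terminates. Stacks that still hold more than one tree at the end are kept in the list but do not count as interpretations.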
Example: (adapted from French)
We consider only four categories: D, A, N, V (for determiner, adjective, noun and verb) and we give the following very simple rules:
D_N [(1:{D}, 2:{N}) => ((1) 2)]
A_N [(1:{A}, 2:{N}) => ((1) 2)]
N_A [(1:{N}, 2:{A}) => (1 (2))]
N_V [(1:{N}, 2:{V}) => ((1) 2)]
V_N [(1:{V}, 2:{N}) => (1 (2))]
Figure 3 shows the evolution of the list of stacks during the parsing of the French nominal phrase "la belle ferme", which is ambiguous and leads to the following sequence of categories:
D  {A, N}  {A, N, V}
We first introduce the word "la" as a one-node tree bearing the D category. As no rule can apply to this tree, we then introduce the word "belle", which is ambiguous. The ambiguity gives two forests, which are described on list (1). The D_N rule applies to this list and gives list (2).
Introducing the word "ferme" leads to list (3), on which we detail rule application. So the rule A_N applied to the second stack of the list produces a new forest (or stack) which is appended to the list. When the transducer ends with the original list, it finds the newly produced stacks and proceeds with them, applying grammar rules. The D_N rule will then be applied to the newly produced forest (D, (A)N). The process stops when the transducer reaches the end of the list and, after removing the stacks where a rule has applied, we obtain list (4).
A correct interpretation (according to a given grammar) of the input sentence can be found in each stack which contains exactly one tree: this tree is a dependency structure of the sentence.
Figure 3: Stacks evolution
Our example gives three correct structures:
(la)belle(ferme)      the firm beauty
((la)belle)ferme      the beauty closes
(la,belle)ferme       the beautiful farm
The algorithm is guaranteed to stop because we have added a constraint: rewriting rules are written in such a way that the length of a stack must reduce each time a rule is applied to it. A detailed discussion of termination and an evaluation of the algorithm can be found in (Genthial 91).
3.3. Type hierarchies
We have chosen to represent knowledge about words and trees with a unique formalism: Ψ-terms (Aït-Kaci 84). Ψ-terms are typed feature structures which permit the description of types (in the sense of classical programming languages such as Pascal), i.e. sets of values.
Example:
UL(lex => "eats";
   cat => verb;
   subj => UL(sem => S:ANIMATE);
   obj => UL(sem => O:EATABLE);
   sem => INGEST(agent => S;
                 patient => O))
The use of reference tags like S or O allows structure sharing, so Ψ-terms are not trees but graphs.
Simple types are defined in the signature, which is a set partially ordered by the is-a relation. This order is extended to Ψ-terms by the unique operation used to manipulate them: unification. The unification of two simple types is defined as the set of lower bounds of these two types (in the is-a relation). Unification allows implicit inheritance of properties, and can be efficiently implemented (Aït-Kaci & al. 89).
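As an illustration of the unification of simple types, here is a small sketch over a toy signature. The signature itself is entirely hypothetical (only ANIMATE, EATABLE and CANINE appear in the paper, and the links between them are our assumption), and the function returns the maximal common lower bounds, which is one common way of reading "the set of lower bounds". An empty result means that unification fails.

```python
# Hypothetical signature: type -> set of direct supertypes (is-a links).
IS_A = {
    "ANIMATE": {"TOP"},
    "EATABLE": {"TOP"},
    "CANINE": {"ANIMATE"},
    "FELINE": {"ANIMATE"},
    "MILK": {"EATABLE"},
    "CHICKEN": {"ANIMATE", "EATABLE"},
}
TYPES = set(IS_A) | {"TOP"}


def ancestors(t):
    """The type t together with everything above it in the is-a order."""
    seen, todo = {t}, [t]
    while todo:
        for parent in IS_A.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def unify(a, b):
    """Maximal types lying below both a and b (empty set on failure)."""
    lower = {t for t in TYPES if {a, b} <= ancestors(t)}
    return {
        t for t in lower
        if not any(u != t and u in ancestors(t) for u in lower)
    }


print(unify("CANINE", "ANIMATE"))   # {'CANINE'}
print(unify("ANIMATE", "EATABLE"))  # {'CHICKEN'}
print(unify("CANINE", "EATABLE"))   # set()
```

Inheritance comes for free in this scheme: anything stated about ANIMATE holds for CANINE because unifying the two simply yields CANINE.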
In our parser, a Ψ-term is attached to each node of a tree, and we have added to transduction rules expressions which enable us to test and modify those Ψ-terms. We can thus simultaneously build a syntactic structure (the dependency tree) and a semantic structure (a Ψ-term, which also contains morphological and syntactic information), which is built by unification (see also (Hellwig 86) on the use of unification for dependency parsing).
Example of rules and application:
We have two words:
UL(lex => "dog";
   cat => cnoun;
   sem => CANINE)
UL(lex => "eats";
   cat => verb;
   subj => UL(sem => S:ANIMATE);
   obj => UL(sem => O:EATABLE);
   sem => INGEST(agent => S;
                 patient => O))
and the rule:
subject [ (1:{N}, 2:{V})
  /Unif(1, 2.subj)/        -- Conditions
  =>
  ((1) 2);
  ASSIGN(2.subj, 1); ]     -- Actions
The root of the resulting tree is decorated by:
UL(lex => "eats";
   cat => verb;
   subj => UL(lex => "dog";
              cat => cnoun;
              sem => S:CANINE);
   obj => UL(sem => O:EATABLE);
   sem => INGEST(agent => S;
                 patient => O))
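The effect of the subject rule on the two Ψ-terms can be reproduced with a small sketch. It makes several simplifying assumptions that are ours and not the paper's: terms are mutable Python objects, reference tags are modelled by sharing the same object, the signature is a single-parent chain in which CANINE is-a ANIMATE, atomic values such as "dog" are treated as types, and unification is destructive (a failed unification may leave the target partly modified). `Term`, `glb` and `unify` are hypothetical names.

```python
# Hypothetical single-parent signature: type -> direct supertype.
IS_A = {"CANINE": "ANIMATE", "MILK": "EATABLE"}


class Term:
    """A typed feature structure: a type plus named sub-terms."""

    def __init__(self, type_, **feats):
        self.type = type_
        self.feats = feats


def below(a, b):
    """True if type a equals b or lies below b."""
    while a is not None:
        if a == b:
            return True
        a = IS_A.get(a)
    return False


def glb(a, b):
    """Greatest lower bound of two simple types, or None if they clash."""
    if below(a, b):
        return a
    if below(b, a):
        return b
    return None


def unify(target, source):
    """Destructively unify `source` into `target`; False on a type clash."""
    t = glb(target.type, source.type)
    if t is None:
        return False
    target.type = t
    for name, sub in source.feats.items():
        if name in target.feats:
            if not unify(target.feats[name], sub):
                return False
        else:
            target.feats[name] = sub
    return True


# The tags S and O are shared objects, so the term for "eats" is a graph.
S, O = Term("ANIMATE"), Term("EATABLE")
eats = Term(
    "UL",
    lex=Term("eats"), cat=Term("verb"),
    subj=Term("UL", sem=S), obj=Term("UL", sem=O),
    sem=Term("INGEST", agent=S, patient=O),
)
dog = Term("UL", lex=Term("dog"), cat=Term("cnoun"), sem=Term("CANINE"))

# Condition Unif(1, 2.subj); on success 2.subj carries the unified term.
ok = unify(eats.feats["subj"], dog)
print(ok, eats.feats["sem"].feats["agent"].type)
```

Because S is shared, refining the subject's semantics to CANINE also refines the agent of INGEST, which is the behaviour shown in the decorated root above.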
3.4. Conclusion 
The use of a category hierarchy simplifies the writing of the rules and introduces a way of handling unknown words which is not part of the mechanisms of the system but which is integrated in the objects it manipulates. We can then write rules without thinking about ill-formedness (i.e. it is not necessary to make the rules tolerant because the tolerance is implicit in the system).
More generally, the use of unification in conjunction with dependency parsing makes it possible to build syntactic structures efficiently while retaining the possibility of making very fine descriptions with Ψ-terms.

References 
Hassan Aït-Kaci (1984). A Lattice-Theoretic Approach to Computation Based on a Calculus of Partially-Ordered Type Structures. Ph.D., University of Pennsylvania, 1984.
Hassan Aït-Kaci et al. (1989). Efficient Implementation of Lattice Operations. ACM Transactions on Programming Languages and Systems 11:1, pp. 116-146.
Jacques Courtin (1973). Un analyseur syntaxique interactif pour la communication homme-machine. Intl Conference of Computational Linguistics, Pisa, Italy, August 1973, Vol. I.
Jacques Courtin (1977). Algorithmes pour le traitement interactif des langues naturelles. Thèse d'État, Grenoble I, Octobre 1977.
Damien Genthial, Jacques Courtin and Irene Kowarski (1990). Contribution of a Category Hierarchy to the Robustness of Syntactic Parsing. 13th CoLing, Helsinki, Finland, August 1990, Vol. 2, pp. 139-144.
Damien Genthial (1991). Contribution à la construction d'un système robuste d'analyse du français. Thèse, Université Joseph Fourier, 10 janvier 1991.
D. Hays (1964). Dependency theory: a formalism and some observations. Language 40, pp. 511-525.
Peter Hellwig (1986). Dependency Unification Grammar. 11th CoLing, Bonn, FRG, August 1986, pp. 195-198.
Daniel Sleator and Davy Temperley (1991). Parsing English with a Link Grammar. Technical Report CMU-CS-91-196, School of Computer Science, Pittsburgh, October 1991.
Vera Lucia Strube de Lima (1990). Contribution à l'étude du traitement des erreurs au niveau lexico-syntaxique dans un texte écrit en français. Thèse, Université Joseph Fourier, Mars 1990.
Lucien Tesnière (1959). Éléments de syntaxe structurale. Klincksieck, Paris.
