Decision Procedures for Dependency Parsing Using Graded 
Constraints 
Wolfgang Menzel and Ingo SchrSder 
(menzel I ingo.schroeder@informatik.uni-hamburg.de ) 
Fachbereich Informatik, Universit~t Hamburg 
Vogt-K611n-Strage 30, 22527 Hamburg, Germany 
I Abstract 
We present an approach to the parsing of depen- 
dency structures which brings together the no- 
tion of parsing as candidate elimination, the use 
of graded constraints, and the parallel disam- 
biguation of related structural representations. 
The approach aims at an increased level of ro- 
bustness by accepting constraint violations in a 
controlled way, combining redundant and possi- 
bly conflicting information on different represen- 
tational levels, and facilitating partial parsing as 
a natural mode of behavior. 
1 Introduction 
Language understanding is based on a vari- 
ety of contributions from different representa- 
tional levels. From this perspective, one of the 
most attractive features of dependency based 
grammar models seems to be their relational 
nature which allows to accommodate various 
kinds of relationships in a very similar fashion. 
Since the basic representational framework is a 
rather general one it can be (re-)interpreted in 
many different ways. Thus, dependency rela- 
tions lend themselves to model the surface syn- 
tactic structure of an utterance (with labels like 
subject-of, direct-object-of, determiner-of, etc.), 
its thematic structure (with labels like agent-of, 
theme-of, etc.) and even the referential struc- 
ture (with labels like referential-identity, part- 
of, possessor-of, etc.). This representational 
similarity obviates the necessity to integrate too 
many disparate informational contributions into 
a single tree-like representation. Instead, rep- 
resentational levels can be separated from each 
other in a clean manner with appropriate map- 
pings being defined to relate the different com- 
ponents to each other. 
Another less obvious advantage of depen- 
dency formalisms is their suitability for the ap- 
Syn 
Sem 
: : : i 
Die Knochen sieht die Katze. 
The bones sees the cat. 
1 2 3 4 5 
Figure 1: Collection of dependency trees: Each 
tree represents a description level. 
plication of eliminative parsing techniques. In 
contrast to the traditional view on parsing as 
a constructive process, which builds new tree 
structures from elementary building blocks and 
intermediate results, eliminative approaches or- 
ganize structural analysis as a candidate elimi- 
nation procedure, removing unsuitable interpre- 
tations from a maximum set of possible ones. 
Hence, parsing is constructed as a strictly mono- 
tonic process of ambiguity reduction. 
In this paper we describe different algorith- 
mic solutions to eliminative parsing. The novel 
contribution consists in the use of graded con- 
straints, which allow to model traditional gram- 
mar regularities as well as preferences and de- 
faults. Details of the linguistic modeling are 
presented by Heinecke and SchrSder (1998). 
2 Ellmlnative Parsing 
The idea of eliminative parsing is not a novel one 
and virtually every tagger can be considered a 
candidate elimination procedure which removes 
items from the maximum set of tags accord- 
ing to different decision criteria. Interestingly, 
dependency-based parsing can be viewed as a 
78 
generalized tagging procedure. One of the first 
parsing systems which built on this property 
is the Constraint Grammar approach (Karlsson 
et al., 1995). Underspecified dependency struc- 
tures are represented as syntactic tags 1 and dis- 
ambiguated by a set of constraints that exclude 
inappropriate readings. Maruyama (1990) first 
tried to extend the idea to allow the treatment 
of complete dependency structures. Therefore, 
he has to generalize the notion of a"tag" to pairs 
consisting of a label and the identifier of the 
dominating node, i. e., the tagset needs to be- 
come sensitive to the individual tokens of the ut- 
terance under consideration sacrificing the sta- 
tus of the tagset being fixed a-priori. As in 
the case of atomic tags, constraints are specified 
which delete inappropriate dependency relations 
from the initial space of possibilities. The ap- 
proach is not restricted to linear input strings 
but can also treat lattices of input tokens, which 
allows to accommodate lexical ambiguity as well 
as recognition uncertainty in speech understand- 
ing applications (Harper et al., 1994). 
Obviously, it is again the relational nature of 
dependency models which provides for the ap- 
plicability of candidate elimination procedures. 
Since the initial state of the analysis is given by 
an - admittedly large - set of possible depen- 
dency relations per token, the problem space re- 
mains finite for finite utterances. An analogous 
approach for constituency-based grammar mod- 
els would encounter considerable difficulties, be- 
cause the number and the kind of non-terminal 
nodes which need to be included in the tagset 
remains completely unclear prior to the parsing 
itself. 
Eliminative approaches to parsing come along 
with a number of interesting properties which 
make them particularly attractive as computa- 
tional models for language comprehension. 
1. As long as constraint checking is restricted 
to strictly local configurations of depen- 
dency relations the decision procedures in- 
herits this locality property and thus ex- 
hibits a considerable potential for con- 
current implementation (Helzerman and 
lIn this framework tags denote, for instance, the sub- 
ject of the sentence, a determlner modifying a noun to 
the right, a preposition modifying a noun to the lei~ etc. 
However, only the category of the dominating node is 
specified, not its exact identity. 
79 
Harper, 1992). 
2. Since partial structural descriptions are 
available concurrently they can be com- 
pared in a competitive manner. Note how- 
ever that such a comparison imposes addi- 
tional synchronization and communication 
requirements on parallel realizations. 
3. As the elimlnative approach considers pars- 
ing a procedure of disambiguation, the 
quality of the results to be expected be- 
comes directly related to the amount of 
effort one is prepared to spend. This is 
a clear contrast to constructive methods 
which, upon request usually will attempt 
to generate alternative interpretations, thus 
leading to a corresponding decrease of clar- 
ity about the structural properties of the 
input utterance (in terms of Karlsson et al. 
(1995)). 
4. The progress of disambiguation can easily 
be assessed by constantly monitoring the 
size of value sets. Moreover, under certain 
conditions the amount of remaining effort 
for obtaining a completely disambiguated 
solution can be estimated. This appears 
to be an important characteristic for the 
development of anytime procedures, which 
are able to adapt their behavior with re- 
spect to external resource limitations (Men- 
zel, 1994; Menzel, 1998). 
3 Graded Constraints 
Both the comparison of competitive structural 
hypotheses as well as the adaptation to resource 
limitations require to generalize the approach by 
allowing constraints of different strength. While 
traditional constraints only make binary deci- 
sions about the well-formedness of a configura- 
tion the strength of a constraint additionally re- 
fiects a human judgment of how critical a viola- 
tion of that particular constraint is considered. 
Such a grading, expressed as a penalty factor, 
allows to model a number of observations which 
are quite common to linguistic structures: 
• Many phenomena can more easily be de- 
scribed as preferences rather than strict 
regularities. Among them are structural 
conditions about attachment positions or 
linear ordering as well as selectional restric- 
tions. 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
Preferences usually reflect different fre- 
quencies of use and in certain cases can be 
extracted from large collections of sample 
data. 
Some linguistic cues are inherently uncer- 
tain (e. g., prosodic markers), and therefore 
resist a description by means of crisp rule 
sets. 
By introducing graded constraints the pars- 
ing problem becomes an optimiT.ation problem 
aiming at a solution which violates constraints 
that are as few and as weak as possible. This, 
on the one hand, leads to a higher degree of 
structural disambiguation since different solu- 
tion candidates now may receive a different 
score due to preference constraints. Usually, a 
complete disambiguation is achieved provided 
that enough preferential knowledge is encoded 
by means of constraints. Remaining ambigu- 
ity which cannot be constrained further is one 
of the major ditticulties for systems using crisp 
constraints (Harper et al., 1995). On the other 
hand, weighed constraints allow to handle con- 
tradictory evidence which is typical for cases of 
ill-formed input. Additionally, the gradings are 
expected to provide a basis for the realization of 
time adaptive behavior. 
One of the most important advantages which 
can be attributed to the use of graded con- 
straints is their ability to provide the mapping 
between different levels in a multi-level repre- 
sentation, where many instances of preferen- 
tial relationships can be found. This separation 
of structural representations facilitates a clear 
modularization of the constraint grammar al- 
though constraints are applied to a single com- 
putational space. In particular, the propaga- 
tion of gradings between representational lev- 
els supports a mutual compensation of informa- 
tion deficits (e. g., a syntactic disambiguation 
can be achieved by means of semantic support) 
and even cross-level conflicts can be arbitrated 
(e. g., a syntactic preference might be inconsis- 
tent with a selectional restriction). 
Combining candidate elimination techniques, 
graded constraints, and multi-level disambigua- 
tion within a single computational paradigm 
aims first of all at an increased level of robust- 
ness of the resulting parsing procedure (Menzel 
and Schr~der, 1998). Robustness is enhanced by 
80 
three different contributions: 
1. The use of graded constraints makes con- 
straint violations acceptable. In a cer- 
tain sense, the resulting behavior can be 
considered a kind of constraint retraction 
which is guided by the individual grad- 
ings of violated constraints. Therefore, a 
"blind" weakening of the constraint system 
is avoided and hints for a controlled appli- 
cation are preserved. 
2. The propagation of evidence among multi- 
ple representational levels exploits the re- 
dundancy of the grammar model about dif- 
ferent aspects of language use in order to 
compensate the loss of constraining infor- 
mation due to constraint retraction. Natu- 
rally, the use of additional representational 
levels also means an expansion of the search 
space, but this undesired effect can be dealt 
with because once a single point of relative 
certainty has been found on an arbitrary 
level, the system can use it as an anchor 
point from which constraining information 
is propagated to the other levels. For in- 
stance, if selectional restrictions provide 
enough evidence for a particular solution, 
an ambiguous case can be resolved. Even 
contradictory indications can be treated in 
that manner. In such a case conflict resolu- 
tion is obtained according to the particular 
strength of evidence resulting from the ob- 
served constraint violations. 
3. The seamless integration of partial parsing 
is achieved by allowing arbitrary categories 
(not just finite verbs) to serve as the top 
node of a dependency tree. Of course, these 
configurations need to be penalized appro- 
priately in order to restrict their selection 
to those cases where no alternative inter- 
pretations remain. Note that under this 
approach partial parsing is not introduced 
by me~n.~ of an additional mechanism but 
falls out as a general result of the underly- 
ing parsing procedure. 
Certainly, all the desired advantages men- 
tioned above become noticeable only if a con- 
straint modeling of grammatical relations can be 
provided which obeys the rather restrictive lo- 
cality conditions and efficient implementations 
of the disambiguation procedure become avail- 
able. 
4 Parsing As Constroint Satisfaction 
Parsing of natural language sentences can be 
considered a constraint satisfaction problem if 
one manages to specify exactly what the con- 
straint variables should be, how constraints can 
be used to find appropriate value assignments, 
and how these value assignments represent spe- 
cific structural solutions to the parsing problem. 
These are the questions we address in this sec- 
tion. 
The original definition of constraint de- 
pendency grammars by Maruyama (1990) 
is extended to graded constraint dependency 
grammars which are represented by a tuple 
(~, L, C, ~).. The lexicon !~ is a set of word 
forms each of which has some lexical informa- 
tion associated with it. The set of represen- 
tational levels L = {(/x, Lx),..., (l,,L,)} con- 
sists of pairs (li, Li) where li is a name of the 
ith representational level and l~ E Li is the 
jth appropriate label for level l,. Think of 
(5yn, {subj, obj, det}) as a simple example of a 
representational level. 
The constraints from the set C can be di- 
vided into disjunct subsets C* with C = Ui Ci 
depending on the constraints' arity i which de- 
notes the number of constraint variables related 
by the constraint. Mainly due to computational 
reasons, but also in order to keep the scope of 
constraints strictly local, at most binary con- 
straints, i. e., constraints with arity not larger 
then two, are considered: C = C 1 U C2. 2 
The assessment function ~ : C ~ \[0, 1\] maps 
a constraint c E C to a weight ~b(c) which in- 
dicates how serious one considers a violation of 
that constraint. Crisp constraints which may 
not be violated at all, i. e., they correspond to 
traditional constraints, have a penalty factor of 
zero (~(c) = 0) while others have higher grades 
(i. e., 0 < ~b(e) < 1) and thus may be violated 
by a solution. 3 
=The restriction to at most binary constraints does 
not decrease the theoretical expressiveness of the formal- 
ism but has some practical consequences for the gram- 
mar writer as he/she occasionally has to adopt rather 
artificial constructs for the description of some linguistic 
phenomena (Menzel and SchrSder, 1998). 
3Constraints c with ~b(c) = 1.0 are totally ineffective 
as will become clear in the next paragraphs. 
Given a natural language sentence W = 
(wl,... ,win) and a graded constraint depen- 
dency grammar the parsing problem can be 
stated as follows: For each representational 
level Ii and each word of the sentence wj a 
constraint variable ~ is established. Let the 
set of all constraint variables be V. The do- 
main dom(~) = Li x {0,1,..., j- 1, j-I- 1,... n} 
of variable ~, i. e., the set of possible values for 
that variable, consists of all pairs (l,/=) where ! 
is an appropriate label for level li (i. e., l E Li) 
and/¢ is the index of the dominating word wk 
(i. e., word wj is subordinated to word wh) or 
zero if the word wj is the root of the dependency 
structure on level li. 
A problem candidate p of the parsing prob- 
lem is a unique value assignment to each of the 
constraint variables. In other words, for each 
variable ~ a single value p(~) = d~ E dom(~) 
has to be chosen. 
The solution is the problem candidate that 
violates less and/or less important constraints 
than any other problem candidate. In order to 
make this intuitive notion more formal the func- 
tion ~ is extended to assess not only constraints 
but also problem candidates p. 
= R TIII 
a cEC o ~EV = 
where a, 1 < a < 2, is the arity 
and ~ is a tuple of variables 
A single constraint c can be violated once, 
more than once or not at all by a problem candi- 
date since constraints judge local configurations, 
not complete problem candidates. 
~(c, ~ = { ~b(C)l.0 :: ifelsedViolates c 
where dis a (unary or binary) tu- 
pie of values 
Note that satisfying a constraint does not 
change the grade of the problem candidate be- 
cause of the multiplicative nature of the assess- 
ing function. 
The final solution Ps is found by maximum 
selection. 
p. = argm= 
81 
I 
I 
I 
I 
i 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
Thus the system uniquely determines the 
dominating node for each of the input word 
forms. Additional conditions for well-formed 
structural representations like projectivity or 
the absence of cyclic dependency relations must 
be taken extra care of. 
In our current implementation the acyclico 
it), property is ensured by a special built-in 
control structure, while projectivity has to be 
established by means of specifically designed 
constraints. This enables the grammar writer 
to carefully model the conditions under which 
non-projective dependency structures may oc- 
cur. Note, however, that there are cases of non- 
projective structures that cannot be eliminated 
by using only local (i. e. at most binary) con- 
straints. 
Another problem arises from the fact that 
constraints are universally quantified and exis- 
tence conditions (like "there must be a subject 
for each finite verb") c~nnot be expressed di- 
rectly. This diilqculty, however, is easily over- 
come by the introduction of "reverse" depen- 
dencies on additional auxiliary levels, which are 
used to model the valency requirements of a 
dominating node. Since each valency to be sat- 
urated requires an auxiliary level, the overall 
number of levels in a multi-level representation 
may easily grow to more than some ten. 
Moreover, the formal description given so far 
is only valid for linear strings but not for word 
graphs. An extension to the treatment of word 
graphs requires the modification of the notion of 
a problem candidate. While in the case of linear 
input an assignment of exactly one value to each 
variable represents a possible structure, this is 
not valid for word graphs. Instead, only those 
variables that correspond to a word hypothesis 
on one particular path through the word graph 
must receive a unique value while all other vari- 
ables must be assigned no value at all. This 
additional path condition usually is also not en- 
coded as normal grammar constraints but must 
be guaranteed by the control mechanism. 
5 An Example 
To illustrate the formalization we now go 
through an example. To avoid unnecessary de- 
tails we exclude the treatment of auxiliary levels 
from our discussion, thus restricting ourselves 
to the modeling of valency possibilities and ab- 
82 
stracting from valency necessities. The problem 
is simplified further by selecting an extremely 
limited set of dependency labels. Consider again 
the example from Figure 1: 
(I) Die Knochenpl siehtsg die Katzesg. 
The bones sees the cat. 
"The cat sees the bones." 
Two representational levels, one for syntactic 
functions and one for semantic case-fillers, are 
introduced: 
L = { (Syn, {subj, oSj, det}), 
(Sam, {agent, theme, def}) } 
Figure 2 contains some of the constraints nec- 
essary to parse the example sentence. Basically, 
a constraint consists of a logical formula which 
is parameterized by variables (in our example X 
and Y) which can be bound to an edge in the 
dependency tree. It is associated with a name 
(e. g., SubjNumber) and a class (e. g., Subj) for 
identification and modularization purposes re- 
spectively. The constraint score is given just 
before the actual formula. Selector functions 
are provided which facilitate access to the label 
of an edge (e. g., X.labe\[) and to lexical prop- 
erties of the dominating node (e. g., X1"num) 
and the dominated one (e. g., X~num). Being 
universally quantified, a typical constraint takes 
the form of an implication with the premise de- 
scribing the conditions for its application. Ac- 
cordingly, the constraint SubjNumber of Figure 2 
reads as follows: For each subject (X.\[abel=subj) 
it holds that the dominated and the dominating 
nodes agree with each other in regard to n-tuber 
(X,i.num=X'tnum). 
Figure 1 from the introduction graphically 
presents the desired solution structure which is 
repeated as a constraint variable assignment in 
Figure 3. 4 
All (shown) constraints are satisfied by the 
variable assignment except SubjOrder which is 
violated once, viz. by the assignment _V~y n = 
(suSj, 3). Therefore, the structure has a score 
4The presentation of solutions as dependency trees 
becomes less intuitive as soon as more levels, especially 
auxiliary levels, are introduced. 
iili 
{X} : SubjNumber : Subj : 0.1 : 
X.label=subj -~ X~num=Xtnum 
'Subjects agree with finite verbs regarding number.' 
{X} : SubjOrder : Subj : 0.9 : 
X.label=subj -F X~pos<Xtpos 
'Subjects are usually placed in front of the verb. 
{X} : SemType : SelRestr : 0.8 : 
X.label (E { agent, theme } -~ 
type_match( Xlid. X.label, X,I.id ) 
'Verbs restrict the semantic types of their 
arguments.' 
{X, Y} : SubjAgentObjTheme : Mapping : 0.2 : 
XTid=YTid A X~id=Y~id -~ 
( X.label=subj ~-~ Y.label=agent ) ^ 
( X.label=obj ~ Y.label=theme ) 
'The subject is the agent and the object the theme.' 
{X, Y} : Unique : Unique : 0.0 : 
X.label E {subj,obj,agent,theme} ^ Xtid=Ytid 
-4 Y.labelpX.label 
'Some labels are unique for a given verb.' 
Figure 2: Some of the constraints needed for 
the disambiguation of the example sentence. 
equal to the constraint's score, namely 0.9. Fur- 
thermore there is no structure which has a bet- 
ter assessment. The next example is similar to 
the last, except that the finite verb appears in 
plural form now. 
(2) Die Knochenpl sehenpl die Katzesg. 
The bones see the cat. 
"The bones see the cat." 
A solution structure analogous to the one dis- 
cussed above would have a score of 0.09 be- 
cause not only the constraint SubjOrder but also 
the constraint SubjNumber would have been vi- 
olated. But the alternative structure where the 
V~y n = (det,2) VSem = (de:f, 2) 
V~yn = (obj,3) V~e m = (theme, 3) 
V~y n = (root, O) V~e m = (root,0) 
U~y n = (det, 5) V~e m = (de.f, 5) 
"~yn = (sub.j, 3) V~e m = (agent, 3) 
Figure 3: Constraint variable assignments cor- 
responding to the dependency trees in Figure 1 
subj/agent and the obj/theme edges are inter- 
changed (meaning that the bones do the seeing) 
has a better score of 0.8 this time because it only 
violates the constraint SemType. This result ob- 
viously resembles performance of human beings 
who first of all note the semantic oddness of the 
example (2) before trying to repair the syntactic 
deviations when reading this sentence in isola- 
tion. 
Thus, the approach successfully arbitrates be- 
tween conflicting information from different lev- 
els, using the constraint scores to determine 
which of the problem candidates is chosen as 
the final solution. 
6 Constraint Satisfaction Procedures 
A lot of research has been carried out in the field 
of algorithm~ for constraint satisfaction prob- 
lems (Meseguer, 1989; Kumar, 1992) and con- 
straint optimization problems (Tsang, 1993). 
Although CSPs are NP--complete problems in 
general and, therefore, one cannot expect a bet- 
ter than exponential complexity in the worst 
case, a lot of methods have been developed to 
allow for a reasonable complexity in most practi- 
cal cases. Some heuristic methods, for instance, 
try to arrive at a solution more efficiently at the 
expense of giving up the property of correctness, 
i. e., they find the globally best solution in most 
cases while they are not guaranteed to do so in 
all cases. 
This allows to influence the temporal char- 
acteristics of the parsing procedure, a possibil- 
ity which seems especially important in interac- 
tive applications: If the system has to deliver a 
reasonable solution within a specific time inter- 
val a dynamic scheduling of computational re- 
sources depending on the remaining ambiguity 
and available time is necessary. While differ- 
ent kinds of search are more suitable with re- 
gard to the correctness property, local pruning 
strategies lend themselves to resource adaptive 
procedures. 
6.1 Consistency-Based Methods 
As long as only crisp constraints are considered, 
procedures based on local consistency, particu- 
larly arc consistency can be used (Maruyama, 
1990; Harper et al., 1995). These methods try 
to delete values from the domain of constraint 
variables by considering only local information 
and have a polynomial worst case complexity. 
83 
Unfortunately, they possibly stop deleting val- 
ues before a unique solution has been found. In 
such a case, even if arc consistency has been es- 
tablished one cannot be sure whether the prob- 
lem has zero, one, or more than one solution 
because alternative value assignments may be 
locally consistent, but globally mutually incom- 
patible. Consequently, in order to find actual 
solutions an additional search has to be carried 
out for which, however, the search space is con- 
siderably reduced already. 
6.2 Search 
The most straightforward method for constraint 
parsing is a simple search procedure where the 
constraint variables are successively bound to 
values and these value assignments are tested for 
consistency. In case of an inconsistency alterna- 
tive values are tried until a solution is found or 
the set of possible values is exhausted. The ba- 
sic search algorithm is Branch & Bound which 
exploits the fact that the score of every subset of 
variable assignments is already an upper bound 
of the final score. Additional constraint viola- 
tions only make the score worse, bemuse the 
scores of constraints do not exceed a value of 
one. Therefore, large parts of the search space 
can be abandoned as soon as the score becomes 
too low. To further ~ improve the efficiency an 
agenda is used to sort the search space nodes 
so that the most promising candidates are tried 
first. By not allowing the agenda to grow larger 
than a specified size, one can exclude search 
states with low scores from further considera- 
tion. Note that correctness cannot be guaran- 
teed in that case anymore. Figure 4 presents the 
algorithm in pseudo code notation. 
Unfortunately, the time requirements of the 
search algorithms are almost unpredictable since 
an intermediate state of computation does not 
give a reliable estimation of the effort that re- 
mains to be done. 
6.3 Pruning 
As explained in Section 6.1 consistency-based 
procedures use local information to delete values 
from the domain of variables. While these meth- 
ods only do so if the local information suffices 
to guarantee that the value under consideration 
can safely be deleted, pruning goes one step fur- 
ther. Values are successively selected for dele- 
tion based on a heuristic (i. e., possibly incor- 
procedure ConstraintSearch 
set b := 0.0 ; best score so far 
set r := 0 ; set of solutions 
set a := {(0, V, 1.0)) ; agenda 
while a ~ 0 do ; process agenda 
get best item (a, V, s) from agenda a ; best first 
if V = 0 then 
if a = b then 
add (B, V, s) to r 
else 
set r := {(/3, IF, ,)} 
set b :-- a 
fl 
fl 
select v E V 
set V' := V\{~} 
foreach d E dora(v) do 
set B' := B u 
compute new score s ~ for B ~ 
if s ~ ~ b then add (B', V', s') 
to agenda 
fl 
done 
truncate agenda a (if desired) 
done 
; complete assignment? 
; best so far? 
; equally good 
; better 
; try next free variable 
; try all values 
; already worse? 
Figure 4: Search procedure for constraint pars- 
ing: Best-first branch & bound algorithm with 
limited agenda length (beam search) 
rect) assessment until a single solution remains 
(cf. Figure 5). The selection function considers 
only local information (as do the consistency- 
based methods) for efficiency reasons. Taking 
into account global optimality criteria would not 
help at all since then the selection would be as 
difficult as the whole problem, i. e., one would 
have to expect an exponential worst-case com- 
plexity. 
procedure pruning(V) 
while 3(~fi V): Idom(~)~ > I do 
select (~, d) to be deleted 
delete d from domain 
done 
Figure 5: The pruning algorithm repeatedly 
selects and deletes values from the domain of 
variables. 
Obviously, the selection heuristics plays the 
major role while pruning. 
Simple selection functions only consider the 
minimum support a value gets from another 
variable (Menzel, 1994). They combine the mu- 
tual compatibilities of the value under consid- 
eration and all possible values for another vari- 
84 
able. Then the minimum support for each value 
is determined and finally the value with the least 
support is selected for deletion. In other words, 
the value for which at least one variable's values 
have only low or no support is ruled out as a 
possible solution. 
Formally the following formulas (using the no- 
tation of Section 4) determine the value d of 
variable # to deleted next: 
score(d, d') 2 
s(v, d) = rni, d'~do,,,(,/) vev\{ } 
Idom(v')l 
(~, d) = axg min,(v, d) (~,d) 
where score(d, d') is the accumulated assessment 
of the pair of subordinations (d, d ~): 
score(d,d') = I\] ¢(c,d).¢(c,d'). I'\[ ¢(c,d,d') 
c6C z c6C 2 
While this heuristics works quite well for lin- 
ear strings it fails if one switches to word graphs. 
Figure 6 gives an example of a very simple word 
graph which cannot be handled correctly by the 
simple heuristics. 
i laughs 
start The children ! ~ stop 
0---~ ,,----'-~ • = e:. X...~j ,---~ 0 
: laugh 
0 1 2 3 
Figure 6: Simple word graph 
Alternative word hypotheses whose time 
spans are not disjunct do not support each other 
by definition. Therefore, the subordination of 
children under laugh in Figure 6 is equally disfa- 
vored by laughs as is the subordination of chil- 
dren under laughs by laugh. Unfortunately, this 
lack of support is not based on negative evidence 
but on the simple fact that laugh and laughs 
are temporal alternatives and may, thus, not 
be existent in a solution simultaneously. Since 
the simple heuristics does not know anything 
about this distinction it may arbitrarily select 
the wrong value for deletion. 
A naive extension to the above heuristics 
would be to base the assessment not on the min- 
imal support from all variables but on the corn- 
85 
bined support from those variables that share at 
least one path through the word graph with the 
variable under consideration. But the path cri- 
terion is computational\]y expensive to compute 
and, therefore, needs to be approximated dur- 
ing pruning. Instead of considering all possible 
paths through the graph, we compute the max- 
imum support at each time point t and on each 
level I and select the minimum of these values 
to be removed from the space of hypotheses: 
score(d, d') 2 
dr Edom(v * ) a(v,d,Z,t) = max 
Idom(v')l ~etime(,') 
level(.l)=~ 
8(v,d) = ,(v, d, Z, t) t,l 
= argmins(v.d) 
where time(v) denotes the time interval of the 
word hypothesis (cf. Figure 6 or 7) that corre- 
sponds to the variable v and level(v) denotes the 
representational level of variable v. 
For temporally overlapping nodes the proce- 
dure selects a single one to act as a representa- 
tive of all the nodes within that particular time 
slice. Therefore, information about the exact 
identity of the node which caused the lack of 
support is lost. But since the node which gives 
a maximum support is used as a time.slice rep- 
resentative it seems likely that any other choice 
might be even worse. 
Although preliminary experiments produced 
promising results (around 3 % errors) it can be 
expected that the quality of the results depends 
on the kind of grammar used and utterances an- 
alyzed. Since the problem deserves further in- 
vestigation, it is too early to give final results. 
The example in Figure 7 shows a simple case 
that demonstrates the shortcomings of the re- 
fined heuristics. Although these and children are 
not allowed to occur in a solution simultane- 
ously, exactly these two words erroneously re- 
main undeleted and finally make up the subject 
in the analysis. First, all values for the article a 
are deleted because of a missing number agree- 
ment with the possible dominating nodes and 
thereafter the values for the word houses are dis- 
carded since the semantic type does not match 
the selectional restrictions of the verb very well. 
I 
I 
I 
I 
I 
I 
I 
I 
I 
i 
I 
I 
I 
i 
I 
I 
I 
! 
The heuristics is not aware of the distinction 
between the time points and word graph nodes 
and, therefore, counts the determiner these as 
supporting the noun children. 
• a • children 
0 ---~,-- O~,,~ 0~0 
these 0 1 2 3 
Figure 7: Hypothetic simplified word graph 
which may be analyzed "incorrectly" by the time 
slice pruning heuristics. 
7 Efficiency Issues 
Although pruning strategies bear a great po- 
tential for efficient and time-adaptive parsing 
schemes, the absolute computational expenses 
for a "blind" application of constraints are still 
unacceptably high. Additional techniques have 
to be employed to decrease actual computation 
times. One of the starting points for such im- 
provements is the extremely large number of 
constraint evaluations during parsing: A few 
million constraint checks are quite common for 
realistic grammars and sentences of even modest 
size. 
Two approaches seem to be suitable for the re- 
duction of the number of constraint evaluations: 
• Reduced application of constraints: A de- 
tailed analysis of how constraints are ap- 
plied and under what circumstances they 
fail shows that most constraint checks are 
Figure 8: Window of the graphical grammar 
environment xcdg 
86 
'~seless" since the tested constraint is sat- 
isfied for some trivial reason. For in- 
stance, because most constraints are very 
specific about what levels are constrained 
and whether and how the dependency edges 
are connected, this information can be ex- 
ploited in order to reduce the number of 
constraint checks. By applying constraints 
only to the relevant levels the number of 
constraint evaluation has been cut down 
to (at most) 40%. Taking into account 
the topological structure of the edges un- 
der consideration improves the efficiency by 
another 30% to 50%. 
Reduction of the number of constraint vari- 
ables: A typical grammar contains a rela- 
tively large number of representational lev- 
els and for most word forms there are sev- 
eral entries in the lexicon. Since the lexi- 
cal ambiguity of the word form usually is 
relevant only to one or very few levels, con- 
straint variables need not be established for 
all lexical entries and all levels. For in- 
stance, the German definite determiner die 
has eight different morpho-syntactic feature 
combinations if one only considers varia- 
tions of gender, case, and number. All these 
forms behave quite similarly with respect 
to non-syntactic levels. Consequently, it 
makes no difference if one merges the con- 
straint variables for the non-syntactic lev- 
els except that now less constraint checks 
must be carried out. By considering the 
relevance of particular types of lexical am- 
biguity for constraint variables of different 
levels one achieves an efficient treatment of 
disjunctive feature sets in the lexicon (Foth, 
1998). This technique reduced the time re- 
quirements by 75% to 90% depending on 
the details of the grammatical modeling. In 
particular, a clean modularization, both in 
the constraint set and the dictionary en- 
tries, results in considerable gains of effi- 
ciency. 
In order to support the grammar writer, a 
graphical grammar environment has been devel- 
• oped (cf. Figure 8). It includes an editor for 
dependency trees (cf. Figure 9) which allows to 
detect undesired constraint violations easily. 
I 
Figure 9: Window of the editor for dependency 
trees 
8 Conclusion 
A parsing approach aiming at dependency struc- 
tures for different representational levels has 
been presented. The approach improves in ro- 
bustness by assessing partial structures, inte- 
grating multiple representational levels, and em- 
ploying partial parsing techniques. Knowledge 
about the grammar but also extralinguistic in- 
formation about the domain under considera- 
tion is encoded by meaus of graded constraints 
which allows for the arbitration between con- 
flicting information. Different decision proce- 
dures for the defined parsing problem have been 
introduced and some efficiency issues have been 
discussed. 
The approach has successfully been applied 
to a number of modestly sized projects (Menzel 
and Schr~Sder, 1998; Heinecke et al., 1998). 
Further investigations will focus on possibili- 
ties for incremental processing of speech input 
and the realization of resource adaptive behav- 
ior. 

References 
Kilian Foth. 1998. Disjunktive Lexikoninforma- 
tion im eliminativen Parsing. Studienarbeit, 
FB Informatik, Universit~t Hamburg. 
Mary P. Harper, L. H. Jarnieson, G. D. Mitchell, 
G. Ying, S. Potisuk, P. N. Srinivasan, 
R. Chen, C: B. Zoltowski, L. L. McPheters, 
B. Pellom, and R. A. Helzerman. 1994. In- 
tegrating language models with speech recog- 
nition. In Proceedings of the AAAI-9~ Work- 
shop on the Integration of Natural Language 
and Speech Processing, pages 139-146. 
Mary P. Harper, Randall A. Helzermann, C. B. 
Zoltowski, B. L. Yeo, Y. Ohan, T. Stew- 
ard, and B. L. Pellom. 1995. Implementa- 
tion issues in the development of the PARSEC 
parser. Software - Practice and Experience, 
25(8):831-862. 
Johannes Heineeke and Ingo Schr'6der. 1998. 
Robust analysis of (spoken) language. In 
Proc. KONVENS '98, Bonn, Germany. 
Johannes Heinecke, Jiirgen Kunze, Wolfgang 
Menzel, and Ingo SchrSder. 1998. Elimina- 
tire parsing with graded constraints. In Proc. 
Joint Conference COLING/A CL '98. 
Randall A. Helzerman and Mary P. Harper. 
1992. Log time parsing on the MasPar MP-1. 
In Proceedings of the 6th International Con- 
ference on Parallel Processing, pages 209-217. 
Fred Karlsson, Atro Voutilainen; Julaa Heikkil~i, 
and Arto Anttila, editors. 1995. Constraint 
Grammar - A Language-Independent System 
for Parsing Unrestricted Tezt. Mouton de 
Gruyter, Berlin, New York. 
Vipin Kumar. 1992. Algorithms for constraint 
satisfaction problems: A survey. A1 Maga- 
zine, 13(1):32--44. 
Hiroshi Maruyama. 1990. Structural disam- 
biguation with constraint propagation. In 
Proceedings of the e8th Annual Meetin 9 of the 
A CL, pages 31-38, Pittsburgh. 
Woffgang Menzel and Ingo SchrSder. 1998. 
Constraint-based diagnosis for intelligent lan- 
guage tutoring systems. In Proceedings of 
the ITFJKNOWS Conference at the 1FIP '98 
Congress, Wien/Budapest. 
Woffgang Menzel. 1994. Parsing of spoken lan- 
guage under time constraints. In A. Cohn, ed- 
itor, Proceedings of the 11th European Confer- 
ence on Artificial Intelligence , pages 560-564, 
Amsterdam. 
Wolfgang Menzel. 1998. Constraint Satisfac- 
tion for Robust Parsing of Spoken Language. 
Journal for Experimental and Theoretical Ar- 
tificial InteUigence, 10:77-89. 
Pedro Meseguer. 1989. Constraint satisfaction 
problems: An overview. A1 Communications, 
2(1):3-!7. 
E. Tsang. 1993. Foundations of Constraint Sat- 
isfaction. Academic Press, Harcort Brace and 
Company, London. 
