Parsing preferences with Lexicalized Tree Adjoining Grammars : 
exploiting the derivation tree 
Alexandra KINYON 
TALANA 
Universite Paris 7, case 7003, 
2pl Jussieu 75005 Paris France 
Alexandra.Kinyon@linguist.jussieu.fr 
Abstract 
Since Kimball (73) parsing preference 
principles such as "Right association" 
(RA) and "Minimal attachment" (MA) are 
often formulated with respect to 
constituent trees. We present 3 preference 
principles based on "derivation trees" 
within the framework of LTAGs. We 
argue they remedy some shortcomings of 
the former approaches and account for 
widely accepted heuristics (e.g. 
argument/modifier, idioms...). 
Introduction 
The inherent characteristics of LTAGs (i.e. 
lexicalization, adjunction, an extended domain of 
locality and "mildly context-sensitive" power) 
make them attractive to Natural Language 
Processing : LTAGs are parsable in polynomial 
time and allow an elegant and 
psycholinguistically plausible representation of 
natural language 1. Large coverage grammars 
were developed for English (Xtag group (95)) 
and French (Abeille (91)). Unfortunately, "large" 
grammars yield high ambiguity rates : Doran & 
al. (94) report 7.46 parses / sentence on a WSJ 
corpus of 18730 sentences using a wide coverage 
English grammar. Srinivas & al. (95) formulate 
domain independent heuristics to rank parses. 
But this approach is practical, English-oriented, 
not explicitly linked to psycholinguistic results, 
and does not fully exploit "derivation" 
information. In this paper, we present 3 
disambiguation principles which exploit 
derivation trees. 
1 e.g. Frank (92) discusses the psycholinguistic 
relevance of adjunction for Children Language 
Acquisition, Joshi (90) discusses psycholinguistic 
results on crossed and serial dependencies. 
1 Brief presentation of LTAGs 
An LTAG consists of a finite set of 
elementary trees of finite depth. Each 
elementary tree must <<anchor>> one or more 
lexical item(s). The principal anchor is called 
<<head>>, other anchors are called <<co-heads>>. All 
leaves in elementary trees are either <<anchor>>, 
<<foot node>> (noted *) or <<substitution node>> 
(noted ↓). These trees are of 2 types : auxiliary 
or initial 2. A tree has at most 1 foot node ; a tree 
with a foot node is an auxiliary tree. Trees that are not 
auxiliary are initial. Elementary trees combine 
with 2 operations : substitution and adjunction. 
Substitution is compulsory and is used essentially 
for arguments (subject, verb and noun 
complements). It consists in replacing, in a tree 
(elementary or not), a node marked for 
substitution with an initial tree whose root has the 
same category. Adjunction is optional (although 
it can be forbidden or made compulsory using 
specific constraints) and deals essentially with 
determiners, modifiers, auxiliaries, modals and 
raising verbs (e.g. seem). It consists in inserting, 
in a tree, in place of a node X, an auxiliary tree 
whose root has the same category. The descendants of 
X then become the descendants of the foot node 
of the auxiliary tree. Contrary to context-free 
rewriting rules, the history of derivation must be 
made explicit since the same derived tree can be 
obtained using different derivations. This is why 
parsing LTAGs yields a derivation tree, from 
which a derived tree (i.e. constituent tree) can be 
obtained (Figure 1) 3. Branches in a derivation 
tree are unordered. 
2 Traditionally initial trees are called α, and 
auxiliary trees β. 
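The two operations can be illustrated with a minimal Python sketch. The class `Node` and the functions `substitute`, `adjoin` and `yield_of` are hypothetical names introduced for this illustration and do not come from any LTAG parser ; feature structures and adjunction constraints are ignored. The example rebuilds the literal reading of "Yesterday John kicked the bucket" from flat elementary trees without a VP node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                  # category or lexical item
    children: List["Node"] = field(default_factory=list)
    subst: bool = False                         # leaf marked for substitution
    foot: bool = False                          # foot node of an auxiliary tree


def substitute(site: Node, initial: Node) -> None:
    """Replace a substitution node by an initial tree with the same root category."""
    assert site.subst and site.label == initial.label
    site.children = initial.children
    site.subst = False


def find_foot(tree: Node) -> Optional[Node]:
    """Return the foot node of a tree, or None if it has none."""
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(site: Node, auxiliary: Node) -> None:
    """Insert an auxiliary tree at node X; X's descendants move under the foot node."""
    foot = find_foot(auxiliary)
    assert foot is not None and site.label == auxiliary.label == foot.label
    foot.children = site.children
    foot.foot = False
    site.children = auxiliary.children


def yield_of(tree: Node) -> List[str]:
    """Read the leaves of a tree from left to right."""
    if not tree.children:
        return [tree.label]
    return [word for child in tree.children for word in yield_of(child)]


# Initial tree anchored by "kicked": S -> N(subst) V N(subst)
derived = Node("S", [Node("N", subst=True),
                     Node("V", [Node("kicked")]),
                     Node("N", subst=True)])
substitute(derived.children[0], Node("N", [Node("John")]))
substitute(derived.children[2], Node("N", [Node("bucket")]))
# Auxiliary tree for the determiner: N -> Det N*
adjoin(derived.children[2], Node("N", [Node("Det", [Node("the")]),
                                       Node("N", foot=True)]))
# Auxiliary tree for the adverb: S -> Adv S*
adjoin(derived, Node("S", [Node("Adv", [Node("Yesterday")]),
                           Node("S", foot=True)]))
print(" ".join(yield_of(derived)))
```

The order in which the two adjunctions and the two substitutions are applied does not change the resulting derived tree here, which is why the derivation tree can leave its branches unordered.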
Moreover, linguistic constraints on the well- 
formedness of elementary trees have been 
formulated : 
• Predicate Argument Cooccurrence Principle : 
there must be a leaf node for each realized 
argument of the head of an elementary tree. 
• Semantic consistency : No elementary tree is 
semantically void 
• Semantic minimality : an elementary tree 
corresponds at most to one semantic unit 
2 Former results on parsing preferences 
A vast literature addresses parsing preferences. 
Structural approaches introduced 2 principles : 
RA accounts for the preferred reading of the 
ambiguous sentence (a) : "yesterday" attaches to 
"left" and not to "said" (Kimball (73)). 
MA accounts for the preferred reading of (b) : 
"for Sue" attaches to "bought" and not to 
"flowers" (Frazier & Fodor (78)) 
(a) Tom said that Joe left yesterday 
(b) Tom bought the flowers for Sue 
These structural principles have been criticized 
though : Among other things, the interaction 
between these principles is unclear. This type of 
approach lacks provision for integration with 
semantics and/or pragmatics (Schubert (84)), 
does not clearly establish the distinction between 
arguments and modifiers (Ferreira & Clifton 
(86)) and is English-biased : evidence against RA 
has been found for Spanish (Cuetos & Mitchell 
(88)) and Dutch (Brysbaert & Mitchell (96)). 
Some parsing preferences are widely accepted, 
though: 
The idiomatic interpretation of a sentence is 
favored over its literal interpretation (Gibbs & 
Nayak (89)). 
Arguments are preferred over modifiers (Abney 
(89), Britt & al. (92)). 
Additionally, lexical factors (e.g. frequency of 
subcategorization for a given verb) have been 
shown to influence parsing preferences (Hindle & 
Rooth (93)). 
It is striking that these three most consensual 
types of syntactic preferences turn out to be 
difficult to formalize by resorting only to 
"constituent trees" , but easy to formalize in 
terms of LTAGs. 
Before explaining our approach, we must 
underline that the examples 4 presented later on 
are not necessarily counter-examples to RA 
and/or MA, but just illustrations : our goal is not to 
further criticize RA and MA, but to show that 
problems linked to these "traditional" structural 
approaches do not automatically condemn all 
structural approaches. 
3 Three preference principles based on 
derivation trees 
For the sake of brevity, we will not develop the 
importance of "lexical factors", but just note that 
LTAGs are obviously well suited to represent 
that type of preferences because of strong 
lexicalization 5. 
To account for the "idiomatic" vs "literal", and 
for the "argument" vs "modifier" preferences, we 
formulate three parsing preference principles 
based on the shape of derivation trees : 
1. Prefer the derivation tree with the fewest 
nodes 
2. Prefer to attach an α-tree low 6 
3. Prefer the derivation tree with the fewest 
β-tree nodes 
Principle 1 takes precedence over principle 2 and 
principle 2 takes precedence over principle 3. 
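The precedence between the principles can be sketched as successive filters over the candidate derivation trees. This is a minimal sketch under stated assumptions : `DNode`, `prefer` and the helper names are hypothetical, and Principle 2 is approximated by the summed depth of α-tree nodes, which is only one possible reading of "attach low" and not a definition given in this paper. The two candidates are the derivation trees of "Yesterday John kicked the bucket" (Figure 1).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DNode:
    name: str                                   # elementary tree, e.g. "alpha-John"
    kind: str                                   # "alpha" (initial) or "beta" (auxiliary)
    children: List["DNode"] = field(default_factory=list)


def walk(node, depth=0):
    """Yield every node of a derivation tree with its distance from the root."""
    yield node, depth
    for child in node.children:
        yield from walk(child, depth + 1)


def node_count(tree):                           # Principle 1: fewer is better
    return sum(1 for _ in walk(tree))


def alpha_depth(tree):                          # Principle 2 (ASSUMED measure): deeper is better
    return sum(depth for node, depth in walk(tree) if node.kind == "alpha")


def beta_count(tree):                           # Principle 3: fewer is better
    return sum(1 for node, _ in walk(tree) if node.kind == "beta")


def prefer(trees):
    """Filter candidates with Principle 1, then 2, then 3, keeping ties."""
    for measure, best in ((node_count, min), (alpha_depth, max), (beta_count, min)):
        target = best(measure(t) for t in trees)
        trees = [t for t in trees if measure(t) == target]
    return trees


idiomatic = DNode("alpha-kicked-the-bucket", "alpha", [
    DNode("alpha-John", "alpha"),
    DNode("beta-yesterday", "beta"),
])
literal = DNode("alpha-kicked", "alpha", [
    DNode("alpha-John", "alpha"),
    DNode("alpha-bucket", "alpha", [DNode("beta-the", "beta")]),
    DNode("beta-yesterday", "beta"),
])

print(node_count(idiomatic), node_count(literal))
print([t.name for t in prefer([literal, idiomatic])])
```

Because each filter only keeps the candidates that are tied on the previous measure, a lower-ranked principle can never override a higher-ranked one.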
3 Our examples follow linguistic analyses presented 
in (Abeillé (91)), except that we substitute sentential 
complements when no extraction occurs. Thus we 
use no VP node and no Wh nor NP traces. But this 
has no bearing on the application of our preference 
principles. 
4 These examples are kept simple on purpose, for 
the sake of clarity. 
5 Also, "lexical preferences" and "structural 
preferences" are not necessarily antagonistic and can 
both be used for practical purposes. 
6 By low we mean "as far as possible from the root". 
3.1 What these principles account for 
Principle 1 accounts for the preference 
"idiomatic" over "literal": In LTAGs, all the set 
elements of an idiomatic expression are present in 
a single elementary tree. Figure 1 shows the 2 
derivation trees obtained when parsing 
"Yesterday John kicked the bucket". The 
preferred one (i.e. idiomatic interpretation) has 
fewer nodes. 
FIGURE 1 (see note 7) 
Illustration of Principle 1 
[Panels : elementary trees for "Yesterday John kicked the bucket" ; preferred derivation tree ; dispreferred derivation tree ; both derivation trees yield the same derived tree] 
7 In derivation trees, plain lines indicate an 
adjunction, dotted lines a substitution. 
FIGURE 2 
Illustration of Principle 2 
[Panels : elementary trees for "John suspects the organizer of the demonstration" ; preferred derivation tree ; dispreferred derivation tree ; corresponding derived trees] 
for French (Abeillé & Candito (99)). We kept 
the 1074 grammatical ones (i.e. noted "1" in the 
TSNLP terminology) of category S or augmented 
to S (excluding coordination) that were accepted. 
A human picked one or more "correct" 
derivations for each sentence parsed 8. Principle 1, 
and then Principles 1 & 2 were applied on the 
derivation trees to eliminate some derivations. 
Table 1 shows the results obtained. 
                                     Before       After        After
                                     applying     applying     applying
                                     principles   principle 1  principles 1&2
Total # of sentences                 1074         1074         1074
Total # of derivations               3057         2474         2334
# of sentences with at least         1070         1055         1054
  1 correct parse                    (99.6 %)     (98.2 %)     (98.1 %)
# of ambiguous sentences             537          427          424
# of non ambiguous sentences         537          647          650
# of partially disambiguated         n.a.         89           86
  sentences
# of parses / sentence               2.85         2.3          2.17
TABLE 1 : results for TSNLP 
4.1 Comments on the results 
After disambiguating with principles 1 and 2, the 
proportion of sentences with at least one parse 
judged correct by a human only marginally 
decreased while the average number of parses per 
sentence went down from 2.85 to 2.17 (i.e. -24 
%). 
8 More than one derivation was deemed "correct" 
when non spurious ambiguity remained in modifier 
attachment (e.g. He saw the man with a telescope). 
Since "strict modifier attachment" is orthogonal 
to our concern, a sentence such as (f) still yields 
5 derivations, partly because of spurious 
ambiguity, partly because of adverbial 
attachment (i.e. "hier" attached to S or to V). 
(f) Il a travaillé hier (He worked yesterday) 
Therefore most sentences are not disambiguated by 
principles 1 or 2, especially those anchoring an 
intransitive verb. For sentences that are affected 
by at least one of these two principles, the 
average number of parses per sentence goes 
down from 6.77 to 2.94 after applying both 
principles (i.e. -56.5 %) (Table 2). 
                                     Before       After        After
                                     applying     applying     applying
                                     principles   principle 1  principles 1&2
# of sentences affected by           189          189          189
  at least one principle
# of derivations                     1279         696          556
# of parses / sentence               6.77         3.68         2.94
TABLE 2 : Results for sentences affected by 
at least one Principle 
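The averages and reductions reported for the two tables follow directly from the derivation and sentence counts, and can be rechecked with a few lines of Python. The variable and function names are ours and serve only this check.

```python
# (derivations, sentences): before, after Principle 1, after Principles 1 & 2
all_sentences = [(3057, 1074), (2474, 1074), (2334, 1074)]   # counts from Table 1
affected_only = [(1279, 189), (696, 189), (556, 189)]        # counts from Table 2


def parses_per_sentence(rows):
    """Average number of derivations per sentence for each column."""
    return [round(derivations / sentences, 2) for derivations, sentences in rows]


def reduction_percent(rows):
    """Relative drop in derivations between the first and the last column."""
    before, after = rows[0][0], rows[-1][0]
    return 100 * (before - after) / before


print(parses_per_sentence(all_sentences))
print(parses_per_sentence(affected_only))
print(round(reduction_percent(all_sentences)),
      round(reduction_percent(affected_only), 1))

# Sentences untouched by the principles keep all their derivations,
# so both tables must lose the same number of derivations.
removed_all = all_sentences[0][0] - all_sentences[-1][0]
removed_affected = affected_only[0][0] - affected_only[-1][0]
print(removed_all, removed_affected)
```

Both tables lose 723 derivations, which is consistent with the principles only eliminating parses of the 189 affected sentences.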
4.2 The gap between theory and 
practice 
Surprisingly, Principle 1 was used in only one 
case to prefer an idiomatic interpretation, but 
proved very useful in preferring arguments over 
modifiers : derivation trees with arguments often 
have fewer nodes because of co-heads. For 
instance it systematically favored the attachment 
of "by" phrases as passive with agent. 
Principle 2 favored lower attachment of 
arguments as in (g) but proved useful only in 
conjunction with Principle 1 : it provided further 
disambiguation by selecting derivation trees 
among those with an equally low number of 
nodes. 
Principle 2 says to attach an argument low (e.g. 
to the direct object of the main verb) rather than 
high (e.g. to the verb). In (c1), "of the 
demonstration" attaches to "organizer" rather 
than to "suspect", while in (c2) "of the crime" can 
only attach to the verb. Figure 2 shows how 
principle 2 yields the preferred derivation tree for 
sentence (c1). Similarly, in sentence (d1) "to 
whom" attaches to "say" rather than to "give", 
while in (d2) it attaches to "give" since "think" 
cannot take a PP complement. This agrees with 
psycholinguistic results such as "filled gap 
effects" (Crain & Fodor (85)). 
(c1) John suspects the organizer of the 
demonstration 
(c2) John suspects Bill of the crime 
(d1) To whom does Mary say that John 
gives flowers. 
(d2) To whom does Mary think that John 
gives flowers. 
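To make the comparison for (c1) concrete, the sketch below encodes the two derivation trees of Figure 2 as nested (name, children) pairs. The tree names are our transcription of the figure and the functions `size` and `depth_of` are hypothetical helpers. Both trees have the same number of nodes, so Principle 1 cannot decide between them ; what separates them is how far from the root the tree for "demonstration" attaches.

```python
# Derivation trees for "John suspects the organizer of the demonstration",
# written as (elementary_tree_name, [children]) pairs.
low_attachment = ("alpha1-suspects", [
    ("alpha-John", []),
    ("alpha2-organizer", [
        ("beta-the", []),
        ("alpha-demonstration", [("beta-the", [])]),
    ]),
])
high_attachment = ("alpha2-suspects", [
    ("alpha-John", []),
    ("alpha1-organizer", [("beta-the", [])]),
    ("alpha-demonstration", [("beta-the", [])]),
])


def size(tree):
    """Number of nodes in a derivation tree (the measure used by Principle 1)."""
    _, children = tree
    return 1 + sum(size(child) for child in children)


def depth_of(tree, target, depth=0):
    """Distance from the root to the first node called `target`, or None if absent."""
    name, children = tree
    if name == target:
        return depth
    for child in children:
        found = depth_of(child, target, depth + 1)
        if found is not None:
            return found
    return None


print(size(low_attachment), size(high_attachment))
print(depth_of(low_attachment, "alpha-demonstration"),
      depth_of(high_attachment, "alpha-demonstration"))
```

With equal sizes, the derivation in which α-demonstration sits two levels below the root is the one Principle 2 keeps.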
Principle 3 prefers arguments over modifiers. 
Figure 3 shows that principle 3 predicts the 
preferred derivation tree for (e) : "to be honest" is an 
argument of "prefer", which rules out "to be honest" as 
a sentence modifier (i.e. "To be honest, he prefers 
his daughter"). 
(e) John prefers his daughter to be honest. 
These three principles aim at attaching arguments 
as accurately as possible and do not deal with 
"strict" modifier attachment for the following 
reasons : 
• There is a lack of agreement concerning the 
validity of preference principles for 
"modifier attachment" 
• Principle 3, which deals the most with 
modifier attachment, turned out the least 
conclusive when confronted with empirical data 
• We wanted to evaluate how attaching 
arguments correctly affects ambiguity, all 
other factors remaining unchanged. 
4 Some results 
French sentences from the test suite developed in 
the TSNLP project (Estival & Lehman (96)) 
were originally parsed using Xtag with a domain 
independent wide-coverage grammar 
FIGURE 3 
Illustration of Principle 3 
[Panels : elementary trees for "John prefers his daughter to be honest" ; preferred derivation tree ; dispreferred derivation tree ; corresponding derived trees] 
(g) L'ingénieur obtient l'accord de l'entreprise 
(The engineer obtains the agreement of the 
company/from the company) 
Principle 3 did not prove as useful as the two 
others : first, it aims at favoring arguments over 
modifiers, but these cases were already handled 
by Principle 1 (again because of co-heads). 
Second, it consistently made wrong predictions 
in cases of lexical ambiguity (e.g. it favored "être" 
as a copula rather than as an auxiliary, although 
the auxiliary is much more common in French). 
Therefore we have postponed testing it until 
further refinement is found. 
5 Conclusion 
We have presented three application-independent, 
domain-independent and language-independent 
disambiguation principles formulated in terms of 
derivation trees within the framework of LTAGs. 
Because they are also straightforward to implement, 
these principles can be used for parse ranking 
applications or integrated into a parser to reduce 
non determinism. Preliminary results are 
encouraging as to the soundness of at least two of 
these principles. Further work will focus on 
testing these principles on larger corpora (e.g. Le 
Monde) as well as on other languages, refining 
them for practical purposes (e.g. addition of 
frequency information and principles for 
modifier attachment). Since it is the first time to 
our knowledge that parsing preferences are 
formulated in terms of derivation trees, it would 
also be interesting to see how this could be 
adapted to dependency-based parsing. 

References 
Abeillé A. (1991) Une grammaire lexicalisée 
d'arbres adjoints pour le français. PhD 
dissertation. Université Paris 7. 
Abeillé A., Candito M.H. (1999) FTAG : A LTAG 
for French. In Tree Adjoining Grammars. Abeillé, 
Rambow (eds). CSLI, Stanford. 
Abney S. (1989) A computational model of human 
parsing. Journal of Psycholinguistic Research, 18, 
129-144. 
Britt M., Perfetti C., Garrod S., Rayner K. (1992) 
Parsing and discourse : Context effects and their 
limits. Journal of Memory and Language, 31, 
293-314. 
Brysbaert M., Mitchell D.C. (1996) Modifier 
Attachment in sentence parsing : Evidence from 
Dutch. Quarterly Journal of Experimental 
Psychology, 49a, 664-695. 
Crain S., Fodor J.D. (1985) How can grammars help 
parsers? In Natural language parsing, 
94-127. D. Dowty, L. Karttunen, A. Zwicky (eds). 
Cambridge University Press. 
Cuetos F., Mitchell D.C. (1988) Cross linguistic 
differences in parsing : restrictions on the use of 
the Late Closure strategy in Spanish. Cognition, 
30, 73-105. 
Doran C., Egedi D., Hockey B.A., Srinivas B., 
Zaidel M. (1994) Xtag System - a wide coverage 
grammar for English. COLING'94. Kyoto. Japan. 
Estival D., Lehman S. (1997) TSNLP : des jeux de 
phrases test pour le TALN. TAL 38:1, 115-172. 
Ferreira F., Clifton C. (1986) The independence of 
syntactic processing. Journal of Memory and 
Language, 25, 348-368. 
Frank R. (1992) Syntactic Locality and Tree 
Adjoining Grammar : Grammatical Acquisition 
and Processing Perspectives. PhD dissertation. 
University of Pennsylvania. 
Frazier L., Fodor J.D. (1978) "The sausage machine" 
: a new two stage parsing model. Cognition 6. 
Gibbs R., Nayak (1989) Psycholinguistic studies on 
the syntactic behaviour of idioms. Cognitive 
Psychology, 21, 100-138. 
Hindle D., Rooth M. (1993) Structural ambiguity and 
lexical relations. Computational Linguistics, 19, 
pp. 103-120. 
Joshi A. (1990) Processing crossed and serial 
dependencies : an automaton perspective on the 
psycholinguistic results. Language and Cognitive 
Processes, 5:1, 1-27. 
Kimball J. (1973) Seven principles of surface 
structure parsing in natural language. Cognition 
2. 
Schubert L. (1984) On parsing preferences. 
COLING'84, Stanford. 247-250. 
Srinivas B., Doran C., Kulick S. (1995) Heuristics 
and Parse Ranking. 4th international workshop on 
Parsing Technologies. Prague. Czech Republic. 
Xtag group (1995) A LTAG for English. Technical 
Report IRCS 95-03. University of Pennsylvania. 
