Complexity of Description of Primitives: Relevance to 
Local Statistical Computations 
Aravind K. Joshi and B. Srinivas 
Department of Computer and Information Science 
University of Pennsylvania 
Philadelphia, PA 19104, USA 
{joshi, srini}@linc.cis.upenn.edu
Introduction 
In this paper we pursue the idea that by making the 
descriptions of primitive items (lexical items in the lin- 
guistic context) more complex, we can make the com- 
putation of linguistic structure more local 1. The idea is 
that by making the descriptions of primitives more com- 
plex, we can not only make more complex constraints 
operate more locally but also verify these constraints 
more locally. Statistical techniques work better when
such localities are taken into account 2. Of course, there
is a price for making the descriptions of primitives more 
complex. The number of different descriptions for each 
primitive item is now much larger than when the de- 
scriptions are less complex. For example, in a lexical- 
ized tree-adjoining grammar (LTAG), the number of 
trees associated with each lexical item is much larger 
than the number of standard parts-of-speech (POS) as- 
sociated with that item. Even when the POS ambiguity 
is removed the number of LTAG trees associated with 
each item can be large, on the order of 10 trees in the 
current English grammar in the XTAG system 3. This
is because in LTAG, roughly speaking, each lexical item
is associated with as many trees as the number of
different syntactic contexts in which the lexical item can
appear. This, of course, increases the local ambiguity
for the parser. The parser has to decide which complex
description (LTAG tree) out of the set of descriptions
associated with each lexical item is to be used for
a given reading of a sentence, even before combining
the descriptions together. The obvious solution is to
put the burden of this job entirely on the parser. The
parser will eventually disambiguate all the descriptions
and pick one per object, for a given reading of the
sentence. This is what the parser is expected to do for
disambiguating the standard POS, unless a separate POS
disambiguation module is used (Church, 1988). Many
parsers, including XTAG, use such a module, called a
POS tagger.

1 Let Σ be the alphabet consisting of the names of
elementary trees in an LTAG. Then Σ* is the set of all strings
over this alphabet including the null string. The trees γ1
and γ2 in a string of tree names are said to be Σ*-local if
they are separated by any string in Σ*. For brevity, we will
continue to use the term local instead of the term Σ*-local.
2 The work described here is completely different from
the work reported in (Resnik, 1992) and (Schabes, 1992)
concerning stochastic TAGs.
3 See the section on Data Collection.

LTAGs present a novel opportunity to reduce the
amount of disambiguation done by the parser. We
can treat the LTAG trees associated with each lexical
item as more complex parts-of-speech, which we call
supertags. In this paper, we report on some experiments
on direct supertag disambiguation, without parsing in
the strict sense, using lexical preference and local
lexical dependencies (acquired from a corpus parsed by the
XTAG system). The information extracted from the
XTAG-parsed corpus contains, for each item and its
supertag, a probability distribution of the distances of
other items and their supertags that are expected by it.
We have devised a method somewhat akin to the
standard POS tagger that disambiguates supertags without
doing any parsing.
The idea of using complex descriptions for primitives
to capture constraints locally has some precursors in AI.
For example, the Waltz algorithm (Waltz, 1975) for
labeling vertices of polygonal solid objects can be thought
of in these terms, although it is not usually described
in this way. There are no statistical computations in the
Waltz algorithm, however. The supertag disambiguation
experiments, as far as we know, are the first to
use these ideas in the linguistic context. Of course, we
also show how the supertag disambiguation naturally
lends itself to the application of statistical techniques.
In the following sections we will briefly describe our
approach and some preliminary results of supertag
disambiguation as an illustration of our main theme: the
relationship of the complexity of descriptions of
primitives to local statistical computations. A more complete
analysis of this technique and experimental results will
eventually be reported elsewhere.
Lexicalized Tree Adjoining Grammars 
Lexicalized Tree Adjoining Grammar (LTAG) is a
lexicalized tree rewriting grammar formalism (Schabes,
1990). The primary structures of LTAG are called
ELEMENTARY TREES. Each elementary tree has a lexical
item (anchor) on its frontier and serves as a complex
description of the anchor. An elementary tree
provides a domain of locality larger than that
provided by CFG rules over which syntactic and semantic
(predicate-argument) constraints can be specified.
Elementary trees are of two kinds: INITIAL TREES and
AUXILIARY TREES. Examples of initial trees (αs) and
auxiliary trees (βs) are shown in Figure 1. Nodes on
the frontier of initial trees are marked as substitution
sites by a '↓', while exactly one node on the frontier
of an auxiliary tree, whose label matches the label of
the root of the tree, is marked as a foot node by a '*'.
The other nodes on the frontier of an auxiliary tree are
marked as substitution sites. LTAG factors out
recursion from the statement of the syntactic dependencies.
Elementary trees (initial and auxiliary) are the domain
for specifying dependencies. Recursion is specified via
the auxiliary trees.
Elementary trees are combined by Substitution and
Adjunction operations. Substitution inserts elementary
trees at the substitution nodes of other elementary
trees. Adjunction inserts auxiliary trees into elementary
trees at the node whose label is the same as the
root label of the auxiliary tree. As an example, the
component trees (α8, α2, α3, α4, β8, α5, α6), shown in
Figure 1 can be combined to form the parse tree for the
sentence John saw a man with the telescope 4 as follows:
1. α8 substitutes at the NP0 node in α2.
2. α3 substitutes at the DetP node in α4, the result of
which is substituted at the NP1 node in α2.
3. α5 substitutes at the DetP node in α6, the result of
which is substituted at the NP node in β8.
4. The result of step (3) above adjoins to the VP node
of the result of step (2). The resulting parse tree is
shown in Figure 2.
The process of combining the elementary trees to
yield a parse of the sentence is represented by the
derivation tree, shown in Figure 2. The nodes of the
derivation tree are the tree names that are anchored by
the appropriate lexical item. The composition operation
is indicated by the nature of the arcs (broken line
for substitution, bold line for adjunction), while the
address of the operation is indicated as part of the node
label. The derivation tree can also be interpreted as a
dependency graph with unlabeled arcs between words
of the sentence as shown in Figure 2.
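As a rough illustration of how the derivation tree doubles as a dependency graph, the following Python sketch encodes the derivation of the example sentence and reads off unlabeled head-dependent arcs. The names `DerivationNode` and `dependency_arcs` are hypothetical and introduced only for this illustration; the tree names, anchors and addresses follow the four combination steps above and the derivation tree of Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class DerivationNode:
    supertag: str        # elementary tree name, e.g. "α2"
    anchor: str          # lexical item that anchors the tree
    operation: str = ""  # "substitution" or "adjunction"; empty for the root
    address: str = ""    # address in the parent tree where the operation applies
    children: list = field(default_factory=list)


def dependency_arcs(node):
    """Return unlabeled (head word, dependent word) pairs from a derivation tree."""
    arcs = []
    for child in node.children:
        arcs.append((node.anchor, child.anchor))
        arcs.extend(dependency_arcs(child))
    return arcs


# Derivation of "John saw a man with the telescope" (PP adjoined to the VP).
the = DerivationNode("α5", "the", "substitution", "1")
telescope = DerivationNode("α6", "telescope", "substitution", "2.2", [the])
with_ = DerivationNode("β8", "with", "adjunction", "2", [telescope])
a = DerivationNode("α3", "a", "substitution", "1")
man = DerivationNode("α4", "man", "substitution", "2.2", [a])
john = DerivationNode("α8", "John", "substitution", "1")
saw = DerivationNode("α2", "saw", children=[john, with_, man])

ARCS = dependency_arcs(saw)
for head, dependent in ARCS:
    print(f"{head} -> {dependent}")
```

Each arc links the anchor of a tree to the anchor of a tree that substitutes or adjoins into it, which is the notion of dependency used for supertags in the rest of the paper.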
We will call the elementary trees associated with each 
lexical item super part-of-speech tags or supertags. 
4 The parse with the PP attached to the NP has not been
shown.
Figure 1: Elementary trees of LTAG
Derivation tree of Figure 2, with the address of each operation in parentheses:
α2[saw]
    α8[John] (1)
    β8[with] (2)
        α6[telescope] (2.2)
            α5[the] (1)
    α4[man] (2.2)
        α3[a] (1)
Figure 2: Structures of LTAG (panels: Parse Tree, Derivation Tree, Dependency Graph)
Example of Supertagging 
As a result of localization in LTAG, a lexical item may
be associated with more than one supertag. The
example in Figure 3 illustrates the initial set of supertags
assigned to each word of the sentence John saw a man
with the telescope. The order of the supertags for each
lexical item in the example is completely irrelevant.
Figure 3 also shows the final supertag sequence assigned
by the supertagger, which picks the best supertag
sequence using statistical information (described in the
next section) about individual supertags and their
dependencies on other supertags. The chosen supertags
are combined to derive a parse, as explained in the
previous section.
The parser without the supertagger would have to
process combinations of the entire set of 28 trees; the parser
with it need only process combinations of 7 trees.
Sentence:              John  saw  a    man  with  the  telescope.
Initial Supertag set:  α1    α2   α3   α4   β1    α5   α6
                       α8    α9   α10  α11  β8    α12  α13
Final Assignment:      α8    α2   α3   α4   β8    α5   α6
Figure 3: Supertag Assignment for John saw a man with the telescope

Dependency model of Supertagging
One might think that an n-gram model of standard POS
tagging would be applicable to supertagging as well.
However, in the n-gram model for standard POS
tagging, dependencies between parts-of-speech of words
that appear beyond the n-word window cannot be
incorporated into the model. This limitation does not have
a significant effect on the performance of a standard
trigram POS tagger, since it is rare for dependencies
to occur between POS tags beyond a three-word
window. However, since dependencies between supertags
do not occur in a fixed sized window, the n-gram model
is unsuitable for supertagging. This limitation can be
overcome if no a priori bound is set on the size of the
window, but instead a probability distribution of the
distances of the dependent supertags for each supertag
is maintained. A supertag is dependent on another
supertag if the former substitutes or adjoins into the
latter.
Experiments and Results
Table 1 shows the data required for the dependency
model of supertag disambiguation. Ideally each entry
would be indexed by a (word, supertag) pair but, due
to sparseness of data, we have backed off to a (POS,
supertag) pair. Each entry contains the following
information.
• POS and Supertag pair. 
• List of + and -, representing the direction of the 
dependent supertags with respect to the indexed su- 
pertag. (Size of this list indicates the total number 
of dependent supertags required.) 
• Dependent supertag. 
• Signed number representing the direction and the or- 
dinal position of the particular dependent supertag 
mentioned in the entry from the position of the in- 
dexed supertag. 
• A probability of occurrence of such a dependency.
The sum probability over all the dependent supertags
at all ordinal positions in the same direction is one.
For example, the fourth entry in Table 1 reads
that the tree α2, anchored by a verb (V), has a left
and a right dependent (-, +) and the first word to
the left (-1) with the tree α8 serves as a dependent of
the current word. The strength of this association is
represented by the probability 0.300.
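To make the reading of an entry concrete, here is a minimal Python sketch of one dependency entry and of the lookup it implies. The names `DependencyEntry` and `find_dependent` are hypothetical, and the candidate sets are illustrative subsets of the supertags in Figure 3. The sketch assumes that a signed ordinal position k means "the |k|-th word in that direction whose candidate set contains the dependent supertag", which is how the example above reads for k = -1.

```python
from typing import NamedTuple, Optional, Sequence, Set


class DependencyEntry(NamedTuple):
    pos: str           # part of speech of the indexed word
    supertag: str      # indexed supertag
    directions: tuple  # e.g. ("-", "+"): one left and one right dependent
    dependent: str     # dependent supertag named in this entry
    ordinal: int       # signed: negative = to the left, positive = to the right
    prob: float        # probability of this dependency


# The fourth entry of Table 1, as described in the text.
FOURTH_ENTRY = DependencyEntry("V", "α2", ("-", "+"), "α8", -1, 0.300)


def find_dependent(candidates: Sequence[Set[str]], n: int,
                   entry: DependencyEntry) -> Optional[int]:
    """Position of the |ordinal|-th word, counting away from position n in the
    entry's direction, whose candidate set contains the dependent supertag.
    Returns None when no such word exists."""
    step = -1 if entry.ordinal < 0 else 1
    remaining = abs(entry.ordinal)
    i = n + step
    while 0 <= i < len(candidates):
        if entry.dependent in candidates[i]:
            remaining -= 1
            if remaining == 0:
                return i
        i += step
    return None


# Illustrative candidate sets for "John saw a man".
CANDIDATES = [{"α1", "α8"}, {"α2", "α9"}, {"α3", "α10"}, {"α4", "α11"}]

print(find_dependent(CANDIDATES, 1, FOURTH_ENTRY))
```

With "saw" at position 1, the lookup returns the position of "John", the first word to the left that has α8 among its supertags.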
(POS, Supertag)   Direction of          Dependent   Ordinal    Prob
                  Dependent Supertag    Supertag    position
(D, α5)           ()
                  ()
                  (-)                   α3          -1         0.999
(V, α2)           (-, +)                α8          -1         0.300
(V, α2)           (-, +)                            1          0.374
Table 1: Dependency Data

The dependency model of disambiguation works as
follows. Suppose α2 is a member of the set of supertags
associated with a word at position n in the sentence.
The algorithm proceeds to satisfy the dependency
requirement of α2 by picking up the dependency entries
for each of the directions. It picks a dependency data
entry (fourth entry, say) from the database that is
indexed by α2 and proceeds to set up a path with the
first word to the left that has the dependent supertag
(α8) as a member of its set of supertags. If the first
word that has α8 as a member of its set of supertags
is at position m, then an arc is set up between α2 and
α8. Also, the arc is verified so that it does not
kite-string-tangle 5 with any other arcs in the path up to
α2. The path probability up to α2 is incremented by
log 0.300 to reflect the success of the match. The path
probability up to α8 incorporates the unigram
probability of α8. On the other hand, if no word is found
that has α8 as a member of its set of supertags then
the entry is ignored. A successful supertag sequence is
one which assigns a supertag to each position such that
each supertag has all of its dependents and maximizes
the accumulated path probability. The direction of the
dependent supertag and the probability information are
used to prune the search. A more detailed and formal
description of this algorithm will appear elsewhere.
The implementation and testing of this model of
supertag disambiguation are underway. Preliminary
experiments on short fragments show a success rate of 88%,
i.e., a sequence of correct supertags is assigned.

5 Two arcs (a,c) and (b,d) kite-string-tangle if a < b <
c < d or b < a < d < c.
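The arc check and the path scoring used by the dependency model can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation referred to above: the names `kite_string_tangle` and `add_arc` are hypothetical, arcs are treated as unordered pairs of word positions, the natural logarithm is assumed, and the unigram probability 0.5 is a made-up value.

```python
import math


def kite_string_tangle(arc1, arc2):
    """True if arcs (a, c) and (b, d) satisfy a < b < c < d or b < a < d < c.
    Endpoints are sorted first, an assumption that treats arcs as unordered."""
    a, c = sorted(arc1)
    b, d = sorted(arc2)
    return a < b < c < d or b < a < d < c


def add_arc(path_arcs, path_logprob, new_arc, prob):
    """Try to extend a path with new_arc.
    Returns (arcs, log probability) on success, or None if the new arc
    tangles with an arc already in the path."""
    if any(kite_string_tangle(new_arc, old) for old in path_arcs):
        return None
    return path_arcs + [new_arc], path_logprob + math.log(prob)


# Crossing arcs tangle; nested arcs and arcs sharing an endpoint do not.
print(kite_string_tangle((0, 2), (1, 3)))
print(kite_string_tangle((0, 3), (1, 2)))
print(kite_string_tangle((0, 1), (1, 2)))

# Path up to α8 ("John", position 0) carries its unigram log probability
# (0.5 is a hypothetical value); the arc from α2 ("saw", position 1) to α8
# then adds log 0.300, the probability of the fourth entry of Table 1.
UNIGRAM_LOGPROB = math.log(0.5)
RESULT = add_arc([], UNIGRAM_LOGPROB, (1, 0), 0.300)
print(RESULT)
```

Working in log space turns the product of dependency probabilities along a path into a sum, so the best supertag sequence is the complete path with the largest accumulated value.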
Data Collection 
The data needed for disambiguating supertags
(Section ) have been collected by parsing the Wall Street
Journal 6, IBM-manual and ATIS corpora using the
wide-coverage English grammar being developed as
part of the XTAG system (XTAG Tech. Report, 1994).
The parses generated for these sentences are not
subjected to any kind of filtering or selection. All the
derivation structures are used in the collection of the
statistics.
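One simple way to turn such derivation structures into the probabilities of Table 1 is relative-frequency estimation, sketched below in Python. The estimation method is an assumption made for illustration, since the text does not state how the probabilities are computed; the function name `estimate_dependency_probs` and the toy observations are hypothetical. The only property taken from the text is that, for a given (POS, supertag) pair, the probabilities over all dependent supertags and ordinal positions in the same direction sum to one.

```python
from collections import Counter, defaultdict


def estimate_dependency_probs(observations):
    """observations: iterable of (pos, supertag, dependent, signed_ordinal)
    tuples, one per dependency seen in a derivation structure.
    Returns relative frequencies normalized per (pos, supertag, direction)."""
    counts = Counter(observations)
    totals = defaultdict(int)
    for (pos, tag, dep, ordinal), c in counts.items():
        direction = "-" if ordinal < 0 else "+"
        totals[(pos, tag, direction)] += c
    probs = {}
    for (pos, tag, dep, ordinal), c in counts.items():
        direction = "-" if ordinal < 0 else "+"
        probs[(pos, tag, dep, ordinal)] = c / totals[(pos, tag, direction)]
    return probs


# Toy observations (made up): dependents of the verb tree α2.
OBSERVATIONS = (
    [("V", "α2", "α8", -1)] * 3
    + [("V", "α2", "α8", -2)] * 1
    + [("V", "α2", "α4", 1)] * 2
)

PROBS = estimate_dependency_probs(OBSERVATIONS)
for key, p in sorted(PROBS.items()):
    print(key, p)
```

Because no parse is filtered out, every derivation of an ambiguous sentence would contribute observations under this scheme.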
XTAG is a large ongoing project to develop a
wide-coverage grammar for English, based on the LTAG
formalism. It also serves as an LTAG grammar
development system and includes a predictive left-to-right
parser, a morphological analyzer and a POS tagger.
The wide-coverage English grammar of the XTAG
system contains 317,000 inflected items in the morphology
(213,000 for nouns and 46,500 for verbs among others)
and 37,000 entries in the syntactic lexicon. The
syntactic lexicon associates words with the trees that they
anchor. There are 385 trees in all, in the grammar which
is composed of 411 different subcategorization frames.
Each word in the syntactic lexicon, on the average,
depending on the standard POS of the word, is an anchor
for about 8 to 40 elementary trees.

6 Sentences of length ≤ 15 words.
Conclusion 
In this paper we have shown that increasing the
complexity of descriptions of primitive objects, lexical items
in the linguistic context, enables more complex
constraints to be applied locally. However, increasing the
complexity of descriptions greatly increases the
number of such descriptions for the primitive object. In a
lexicalized grammar such as LTAG each lexical item is
associated with about 10 complex descriptions
(supertags) on average. A parser for LTAG, given
a sentence, disambiguates a large set of supertags to 
select one supertag for each lexical item before combin- 
ing them to derive a parse of the sentence. We have 
presented a new technique that performs the disam- 
biguation of supertags using local information such as 
lexical preference and local lexical dependencies as an 
illustration of our main theme of the relationship of 
complexity of descriptions of primitives to local statis- 
tical computations. This technique, like POS disam- 
biguation, reduces the disambiguation task that needs 
to be done by the parser. After the disambiguation, we 
have effectively completed the parse of the sentence and 
the parser needs 'only' to complete the adjunctions and 
substitutions. 

References 
Kenneth Ward Church. 1988. A Stochastic Parts Program
and Noun Phrase Parser for Unrestricted Text.
In 2nd Applied Natural Language Processing Conference
1988.
Philip Resnik. 1992. Probabilistic Tree-Adjoining
Grammar as a Framework for Statistical Natural
Language Processing. Proceedings of the Fourteenth
International Conference on Computational Linguistics
(COLING '92), Nantes, France, July 1992.
Yves Schabes, Anne Abeillé, and Aravind K. Joshi.
1988. Parsing strategies with 'lexicalized' grammars:
Application to tree adjoining grammars. In Proceedings
of the 12th International Conference on Computational
Linguistics (COLING '88), Budapest, Hungary,
August.
Yves Schabes. 1990. Mathematical and Computational
Aspects of Lexicalized Grammars. Ph.D. thesis,
University of Pennsylvania, Philadelphia, PA, August
1990. Available as technical report (MS-CIS-90-48,
LINC LAB179) from the Department of Computer
and Information Science.
Yves Schabes. 1992. Stochastic Lexicalized
Tree-Adjoining Grammars. Proceedings of the Fourteenth
International Conference on Computational Linguistics
(COLING '92), Nantes, France, July 1992.
David Waltz. 1975. Understanding Line Drawings of
Scenes with Shadows. In Psychology of Computer
Vision by Patrick Winston, 1975.
XTAG Technical Report. 1994. Department of Com- 
puter and Information Sciences, University of Penn- 
sylvania, Philadelphia, PA. In progress. 
