Integration of syntactic and lexical information in a hierarchical 
dependency grammar 
Cristina Barbero and Leonardo Lesmo and Vincenzo Lombardo 
Dipartimento di Informatica 
Università di Torino - Italy
Paola Merlo
Université de Genève - Switzerland
IRCS - University of Pennsylvania 
Abstract 
In this paper, we propose to introduce syntactic 
classes in a lexicalized dependency formalism. Sub- 
categories of words are organized hierarchically from 
a general, abstract level (syntactic categories) to a 
word-specific level (single lexical items). The formal- 
ism is parsimonious, and useful for processing. We 
also sketch a parsing model that uses the hierarchi- 
cal mixed-grain representation to make predictions 
on the structure of the input. 
1 Introduction 
Much recent work in linguistics and computational 
linguistics emphasizes the role of lexical information 
in syntactic representation and processing. 
This emphasis given to the lexicon is the result 
of a gradual process. The original trend in linguis- 
tics has been to individuate categories of words hav- 
ing related characteristics - the traditional syntactic 
categories like verb, noun, adjective, etc. - and to 
express the structure of a sentence in terms of con- 
stituents, or phrases, built around these categories. 
Subsequent considerations led to a lexicalization of
grammar. Linguistically, the constraints expressed 
on syntactic categories are too general to explain 
facts about words - e.g. the relation between a verb 
and its nominalization, "destroy the city" and "de- 
struction of the city" - or to account uniformly for a 
number of phenomena across languages - e.g. pas- 
sivization. In parsing, the use of individual item 
information reduces the search space of the possi- 
ble structures of a sentence. From a mathematical 
point of view, lexicalized grammars exhibit proper- 
ties - like finite ambiguity (Schabes, 1990) - that 
are of practical interest (especially in writing realistic
grammars). Dependency grammar is naturally
suitable for lexicalization, as the binary relations
representing the structure of a sentence are defined
with respect to the head (which is a word).
Pure lexicalized formalisms, however, also have
several disadvantages. Linguistically, the abstract 
level provided by syntactic rules is necessary to avoid 
the loss of generalization which would arise if class- 
level information were repeated in all lexical items. 
In parsing, a predictive component is required to 
guarantee the valid prefix property, namely the capability
of detecting as soon as possible whether a
substring is a valid prefix for the language defined 
by the grammar. Knowledge of syntactic categories, 
which does not depend on the input, is needed for a 
parser to be predictive. 
In this paper we address the problem of the in- 
teraction between syntactic and lexical information 
in dependency grammar. We introduce many inter- 
mediate levels between lexical items and syntactic 
categories, by organizing the grammar around the 
notion of subcategorization. Intuitively, a subcategorization
frame for a lexical item L is a specification
of the number and type of elements that L
requires in order for an utterance that contains L
to be well-formed. For example, within the syntactic
category VERB, different verbs require different
numbers of nominal dependents for a well-formed
sentence. In Italian (our case study), an intransitive
verb such as dormire, "sleep", subcategorizes for
only one nominal element (the subject), while a transitive
verb such as baciare, "kiss", subcategorizes for
two nominal elements (the subject and the object)
(footnote 1). Grammatical relations such as subject and object
are primitive concepts in a dependency paradigm, 
i.e. they directly define the structure of the sen- 
tence. Consequently, the dependency paradigm is 
particularly suitable to define the grammar in terms 
of constraints on subcategorization frames. 
Our proposal is to use subcategories organized in
a hierarchy: the upper level of the hierarchy corresponds
to the syntactic categories, the other levels
correspond to subcategories that are more and more
specific as one descends the hierarchy. This representation
is advantageous because of its compactness,
and because the hierarchical mixed-grained organization
of the information is useful in processing.
In fact, using the general knowledge at the upper
level of the hierarchy, we can make predictions on
the structure of the sentence before encountering the
lexical head.
Footnote 1: We include the subject relation in the subcategorization,
or valency, of a verb - cf. (Hudson, 1990) (Mel'cuk, 1988).
In most constituency theories, on the contrary, the subject is
not part of the valency of a verb.
Hierarchical formalisms have been proposed in 
some theories. Pollard and Sag (1987) suggested 
a hierarchical organization of lexical information: 
as far as subcategorization is concerned, they in- 
troduced a "hierarchy of lexical types". A specific 
formalisation of this hierarchy has never reached a 
wide consensus in the HPSG community, but sev- 
eral proposals have been developed - see for example 
(Meurers, 1997), which uses head subtypes and lexical
principles to express generalizations on the valency 
properties of words. 
Hudson (1990) adopts a dependency approach and
uses hierarchies to organize different kinds of linguistic
information, for instance a hierarchy including
word classes and lexical items. The subcategorization
constraints, however, are specified for each
lexical item (for instance STAND → STAND-intrans, STAND-trans):
this is highly redundant and misses
important generalizations.
In LTAG (Joshi and Schabes, 1996), pure syntac- 
tic information is grouped around shared subcatego- 
rization constraints (tree families). Hierarchical rep- 
resentations of LTAG have been proposed: (Vijay- 
Shanker and Schabes, 1992), (Becker, 1993), (Evans 
et al., 1995), (Candito, 1996), (Doran et al., 1997). 
However, none of these works proposes to use the hierarchical
representation in processing - only Vijay-Shanker
and Schabes (1992) mention, as a possible
future investigation, the definition of parsing strate- 
gies that take advantage of the hierarchical repre- 
sentation. 
The goal of our hierarchical formalism is twofold. 
On the one hand, we want to provide a hierarchical organization
for a lexicalized dependency formalism: similarly
to the hierarchical representations of LTAG,
the aim is to solve the problems of redundancy and
lexicon maintenance of pure lexicalized approaches.
On the other hand, we want to explore how a hierarchical
formalism can be used in processing in order
to get the maximum benefit from it.
The paper is organized as follows: in section 2 we 
describe a lexicalized dependency formalism that is a
simplified version of (Lombardo and Lesmo, 1998). 
Starting from this formalism, we define in section 
3 the hierarchy of subcategories. In section 4, we 
sketch a parsing model that uses the hierarchical 
grammar. In section 5, we describe an application 
of the formalism to the classification of 101 Italian 
verbs. Section 6 concludes the paper. 
2 A dependency formalism 
The basic idea of dependency is that the syntac- 
tic structure of a sentence is described in terms of 
binary relations (dependency relations) on pairs of 
words, a head (or parent), and a dependent (daugh- 
ter), respectively; these relations form a tree, the de- 
pendency tree. In this section we introduce a formal 
dependency system, which expresses the syntactic 
knowledge through dependency rules. The grammar 
and the lexicon coincide, since the rules are lexicalized:
the head of the rule is a word of a certain category,
namely the lexical anchor. The formalism is a
simplified version of (Lombardo and Lesmo, 1998);
we have left out the treatment of long-distance de- 
pendencies to focus on the subcategorization knowl- 
edge, which is to be represented in a hierarchy. 
A dependency grammar is a five-tuple $\langle W, C, S, D, H \rangle$, where
W is a finite set of words of a natural language;
C is a finite set of syntactic categories;
S is a non-empty set of categories ($S \subseteq C$) that can
act as head of a sentence;
D is the set of dependency relations, for instance
SUBJ, OBJ, XCOMP, P-OBJ, PRED;
H is a set of dependency rules of the form
$x{:}X\ (\langle r_1 Y_1 \rangle \ldots \langle r_{i-1} Y_{i-1} \rangle\ \#\ \langle r_{i+1} Y_{i+1} \rangle \ldots \langle r_m Y_m \rangle)$
1) $x \in W$ is the head of the rule;
2) $X \in C$ is its syntactic category;
3) an element $\langle r_j Y_j \rangle$ is a d-pair (which describes
a dependent); the sequence of d-pairs, including
the special symbol # (representing the
linear position of the head), is called the d-pair
sequence. We have that
3a) $r_j \in D$, $j \in \{1, \ldots, i-1, i+1, \ldots, m\}$;
3b) $Y_j \in C$, $j \in \{1, \ldots, i-1, i+1, \ldots, m\}$.
Intuitively, a dependency rule constrains one node 
(head) and its dependents in a dependency tree: the 
d-pair sequence states the order of elements, both 
the head (# position) and the dependents (d-pairs). 
The grammar is lexicalized, because each dependency
rule has a lexical anchor in its head ($x{:}X$).
A d-pair $\langle r_j Y_j \rangle$ identifies a dependent of category
$Y_j$, connected with the head via a dependency relation
$r_j$.
As an example, consider the grammar (footnote 2):
G = <
W : {gli, un, amici, eroe, lo, credevano}
C : {VERB, NOUN, DETERM, PRON}
S : {VERB}
D : {SOGG, OGG, PRED, SPEC}
H >,
where H includes the following dependency rules:
1. gli: DETERM (#);
2. un: DETERM (#);
3. amici: NOUN (<SPEC DETERM> #);
4. eroe: NOUN (<SPEC DETERM> #);
5. lo: PRON (#);
6. credevano: VERB (<SOGG NOUN> <OGG PRON> # <PRED NOUN>);
By applying the rules of the grammar, we obtain
the dependency tree in figure 1 for the sentence Gli
amici lo credevano un eroe, "The friends considered
him a hero".
[Figure 1: Dependency tree of the sentence Gli amici lo
credevano un eroe, "The friends considered him a hero",
given the grammar G. The word order is indicated by
the numbers 1, 2, ... associated with the nodes - amici,
"friends", is a left dependent of the head, as it precedes
the head in the linear order of the input string; eroe,
"hero", is a right dependent.]
Footnote 2: We use Italian terms to label grammatical relations -
see table 1. Since subcategorization frames are language-dependent,
we prefer to avoid confusion due to different terminology
across languages. For example, the relation Termine
- see the caption of figure 4 - actually corresponds to the
indirect object in English. However, I-Obj undergoes the double
accusative transformation into Obj, while Termine does
not.
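The grammar G and the tree of figure 1 can be written down directly as data. The following is a minimal sketch, not part of the original formalism: the names `RULES`, `Node` and `licensed` are hypothetical, and the check simply compares the observed left-to-right sequence of a head and its dependents with the d-pair sequence of the head's rule.

```python
from dataclasses import dataclass, field

HEAD = "#"  # marks the linear position of the head in a d-pair sequence

# Dependency rules of G: word -> (category, d-pair sequence).
RULES = {
    "gli": ("DETERM", [HEAD]),
    "un": ("DETERM", [HEAD]),
    "amici": ("NOUN", [("SPEC", "DETERM"), HEAD]),
    "eroe": ("NOUN", [("SPEC", "DETERM"), HEAD]),
    "lo": ("PRON", [HEAD]),
    "credevano": ("VERB", [("SOGG", "NOUN"), ("OGG", "PRON"), HEAD, ("PRED", "NOUN")]),
}


@dataclass
class Node:
    word: str
    position: int  # 1-based position of the word in the input string
    deps: list = field(default_factory=list)  # (relation, Node) pairs


def licensed(node):
    """True if the node and all its descendants instantiate their dependency rule."""
    _, sequence = RULES[node.word]
    # Collect the head and its dependents, then order them by linear position.
    items = [(node.position, HEAD)]
    for relation, child in node.deps:
        child_category = RULES[child.word][0]
        items.append((child.position, (relation, child_category)))
    observed = [item for _, item in sorted(items, key=lambda pair: pair[0])]
    return observed == sequence and all(licensed(child) for _, child in node.deps)


# Gli(1) amici(2) lo(3) credevano(4) un(5) eroe(6)
tree = Node("credevano", 4, [
    ("SOGG", Node("amici", 2, [("SPEC", Node("gli", 1))])),
    ("OGG", Node("lo", 3)),
    ("PRED", Node("eroe", 6, [("SPEC", Node("un", 5))])),
])
print(licensed(tree))  # True
```

Because each word of G has exactly one rule, a dictionary keyed by word is enough here; a word with several rules would need a list of d-pair sequences per entry.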
3 A hierarchy of subcategories 
The formalization of dependency grammar illus- 
trated above, like all lexicalizations, suffers from the 
problem of redundancy of the syntactic knowledge. 
In fact, for each $w \in W$, a different rule for each
configuration of the dependents for which w can act 
as a head must be included in the lexicon. Some 
tool is required to represent lexical information in a 
compact and perspicuous way. We propose to rem- 
edy the problem of redundancy by using a hierarchy 
of subcategorization frames. 
3.1 A basic hierarchy 
The description of the dependency rules is given on
the basis of a hierarchy of subcategories, each of
which has an associated subcategorization frame (footnote 3).
Each subcategorization frame is, in turn, a compact
representation of a set of dependency rules. The formal
definition of the hierarchy is the following.
A subcategorization hierarchy is a 6-tuple $\langle T, L, D, Q, F, \leq_T \rangle$, where:
T is a finite set of subcategories;
L is a mapping between W (the words, defined in the
grammar) and sets of subcategories, $L : W \rightarrow 2^T - \{\emptyset\}$.
That is, each word can "belong" to one or more
subcategories;
D is a set of dependency relations (as in section 2);
Q is a set of subcategorization frames. Each subcategorization
frame is a total mapping $q : D \rightarrow R \times 2^T$,
where R is the set of pairs of natural numbers
$\langle n_1, n_2 \rangle$ such that $n_1 \geq 0$, $n_2 \geq 0$ and $n_1 \leq n_2$;
F is a bijection between subcategories and subcategorization
frames, $F : T \rightarrow Q$;
$\leq_T$ is an ordering relation among subcategories.
Footnote 3: In this paper we focus our attention on verbal
subcategorization frames.
In order to define $\leq_T$, we need some notation:
$N_q(d)$, where $q \in Q$ and $d \in D$, is the first element of
$q(d)$, i.e. the number restrictions associated with
the relation d in the subcategorization frame q.
$V_q(d)$, where $q \in Q$ and $d \in D$, is the second element
of $q(d)$, i.e. the value restrictions associated with
the relation d in the subcategorization frame q.
Intuitively, $N_q(d)$ is the number of times the dependency
relation d can be instantiated according to the
subcategorization frame q; $V_q(d)$ is the set of subcategories
that can be in relation d with a subcategory
having q as a subcategorization frame.
Let $\leq_{R_N}$ be an order relation of number restrictions;
given two pairs of natural numbers $R_1$ and $R_2$,
$R_1 \leq_{R_N} R_2$ iff $\min(R_1) \geq \min(R_2) \wedge \max(R_1) \leq \max(R_2)$
namely, the range $R_1$ is inside the range $R_2$.
Let $\leq_{R_V}$ be an order relation of value restrictions;
given two sets of subcategories $V_1$ and $V_2$,
$V_1 \leq_{R_V} V_2$ iff $V_1 \subseteq V_2$
Now, we can say that, for each $t_1, t_2 \in T$:
$t_1 \leq_T t_2$ iff $\forall d \in D\ (N_{F(t_1)}(d) \leq_{R_N} N_{F(t_2)}(d) \wedge V_{F(t_1)}(d) \leq_{R_V} V_{F(t_2)}(d))$
The relation $\leq_T$ is a partial order on T. If we assume
the existence of a most general element TOP,
it can act as the root of a hierarchy defined on $\leq_T$.
In the definitions above, each subcategory in the
hierarchy defined by $\leq_T$ is associated, through F,
with a subcategorization frame. So, through L and 
F, each word in the lexicon is associated with one 
or more subcategorization frames. Actually, lexical 
ambiguity is due to L since F is a bijection. 
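The ordering among subcategories reduces to two simple checks per dependency relation: range inclusion for the number restrictions and set inclusion for the value restrictions. The sketch below illustrates this under stated assumptions: a frame is encoded as a Python dictionary from relation to a pair (range, set of values), the three frames are illustrative rather than taken from the verb classification, and the function names are hypothetical.

```python
def range_leq(r1, r2):
    """Number restrictions: True if the range r1 lies inside the range r2."""
    return r1[0] >= r2[0] and r1[1] <= r2[1]


def frame_leq(q1, q2):
    """Frame order behind the subcategory order: q1 restricts q2 on every relation."""
    return all(
        range_leq(q1[d][0], q2[d][0]) and q1[d][1] <= q2[d][1]
        for d in q2
    )


# Illustrative frames: relation -> ((min, max), admissible subcategories).
general = {"sogg": ((1, 1), {"N"}), "ogg": ((0, 1), {"N", "C"})}
transitive = {"sogg": ((1, 1), {"N"}), "ogg": ((1, 1), {"N"})}
intransitive = {"sogg": ((1, 1), {"N"}), "ogg": ((0, 0), set())}

print(frame_leq(transitive, general))       # True
print(frame_leq(general, transitive))       # False
print(frame_leq(transitive, intransitive))  # False
print(frame_leq(intransitive, transitive))  # False
```

The last two lines show why the order is only partial: the transitive and intransitive frames both restrict the general one, but neither restricts the other.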
In the rest of this section we show that each subcategorization
frame q defines a set of dependency rules,
in the sense used in section 2 for the formal definition
of the grammar. In this way, we get that the
hierarchy specifies a correspondence between words
and rules. Moreover, we show that the hierarchy
acts as a taxonomy: given that $\mathit{rules}(t_1) \subseteq H$ is the
set of dependency rules whose head is the syntactic
category $t_1$, we have that
$\forall t_1, t_2 \in T\ \forall dr \in H\ (t_1 \leq_T t_2 \wedge dr \in \mathit{rules}(t_1) \rightarrow dr \in \mathit{rules}(t_2))$
In order to specify the correspondence between subcategorization
frames and dependency rules, we first
define
$\mathit{Dep}_q(d) = \{ m \mid m = [\langle d, t \rangle \mid t \in V_q(d)] \wedge \min N_q(d) \leq \mathit{Card}(m) \leq \max N_q(d) \}$
Given a subcategorization frame q and a relation d,
$\mathit{Dep}_q(d)$ is the set of all multisets of pairs $\langle d, t \rangle$,
where t is a subcategory in $V_q(d)$. The multisets
come from the fact that the same relation can be
instantiated many times (depending on the range).
In order to compute the sets of dependency relations
that the subcategorization frame includes, we form
the cartesian product of the various $\mathit{Dep}_q(d)$:
$\mathit{Cart}_q = \prod_{d \in D} \mathit{Dep}_q(d)$
and we evaluate the union of each member of $\mathit{Cart}_q$;
each of them is extended by including the special
symbol #:
$\mathit{DepSet}_q = \{ m \mid m = (\bigcup_{s \in c} s) \cup \{\#\} \wedge c \in \mathit{Cart}_q \}$
where the union is a multiset union, preserving duplications.
Finally, by picking all the permutations
of each member of $\mathit{DepSet}_q$, we get the set of rules
(also called subcategorization patterns):
$\mathit{Rules}_q = \{ r \mid r \in \mathit{Permute}(m) \wedge m \in \mathit{DepSet}_q \}$
An example should make clear how the above definitions
work. Let's assume that
D = {sogg, ogg, compl}
q = {<sogg, <<1, 1>, {N}>>,
     <ogg, <<0, 1>, {N, C}>>,
     <compl, <<0, 2>, {P}>>}
(where C is short for CHESUB - subordinating conjunction
- and P for PREP).
Then we have:
Dep_q(sogg) = {{<sogg, N>}}
Dep_q(ogg) = {{}, {<ogg, N>}, {<ogg, C>}}
Dep_q(compl) = {{}, {<compl, P>},
     {<compl, P>, <compl, P>}}
Cart_q =
{ <{<sogg, N>}, {}, {}>,
  <{<sogg, N>}, {}, {<compl, P>}>,
  <{<sogg, N>}, {}, {<compl, P>, <compl, P>}>,
  <{<sogg, N>}, {<ogg, N>}, {}>,
  <{<sogg, N>}, {<ogg, N>}, {<compl, P>}>,
  <{<sogg, N>}, {<ogg, N>}, {<compl, P>, <compl, P>}>,
  <{<sogg, N>}, {<ogg, C>}, {}>,
  <{<sogg, N>}, {<ogg, C>}, {<compl, P>}>,
  <{<sogg, N>}, {<ogg, C>}, {<compl, P>, <compl, P>}> }
DepSet_q =
{ {<sogg, N>, #},
  {<sogg, N>, <compl, P>, #},
  {<sogg, N>, <compl, P>, <compl, P>, #},
  {<sogg, N>, <ogg, N>, #},
  {<sogg, N>, <ogg, N>, <compl, P>, #},
  {<sogg, N>, <ogg, N>, <compl, P>, <compl, P>, #},
  {<sogg, N>, <ogg, C>, #},
  {<sogg, N>, <ogg, C>, <compl, P>, #},
  {<sogg, N>, <ogg, C>, <compl, P>, <compl, P>, #} }
If we take all the permutations of the various subsets,
we finally obtain the rules. So that if we have
L("to sprong") = {t137}
F(t137) = q
we obtain dependency rules of the form in the previous
section:
to sprong : t137 (<sogg, N> #)
to sprong : t137 (# <sogg, N>)
to sprong : t137 (<sogg, N> <compl, PREP> #)
to sprong : t137 (<sogg, N> # <compl, PREP>)
This procedure has the goal of mapping the subcate- 
gorization frames onto the dependency rules. In the 
actual practice, the frames are not multiplied out be- 
fore processing (for instance, exactly 200 rules would 
be generated for our very simple example). Process- 
ing issues will be sketched in section 4. 
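The expansion of a frame into rules can be reproduced mechanically. The following sketch follows the definitions of Dep, Cart, DepSet and Rules given above; the Python encoding (a dictionary from relation to range and value set, tuples for multisets) and the function names are assumptions made for illustration. Permutations that differ only by swapping two identical d-pairs are counted once.

```python
from itertools import combinations_with_replacement, permutations, product

HEAD = "#"


def dep(frame, d):
    """Dep_q(d): all multisets of pairs (d, t), t in V_q(d), sized within N_q(d)."""
    (lo, hi), values = frame[d]
    return [
        tuple((d, t) for t in combo)
        for size in range(lo, hi + 1)
        for combo in combinations_with_replacement(sorted(values), size)
    ]


def dep_sets(frame):
    """DepSet_q: multiset union of each member of Cart_q, plus the head symbol."""
    cart = product(*(dep(frame, d) for d in frame))
    return [tuple(pair for part in choice for pair in part) + (HEAD,) for choice in cart]


def rules(frame):
    """Rules_q: the distinct permutations of every member of DepSet_q."""
    return {perm for m in dep_sets(frame) for perm in permutations(m)}


q = {
    "sogg": ((1, 1), {"N"}),
    "ogg": ((0, 1), {"N", "C"}),
    "compl": ((0, 2), {"P"}),
}
print(len(dep_sets(q)), len(rules(q)))  # 9 200
```

The count of 200 arises as 2 + 6 + 12 patterns without an object, plus 6 + 24 + 60 for each of the two object realizations.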
3.2 Ordering among dependents 
The hierarchy, and in particular the subcategoriza- 
tion frames, does not enforce a specific ordering 
among dependents of the same head. We propose an 
extension of the formalism that prevents some per- 
mutations of the rules from being generated. The 
definition of subcategorization frame is modified in 
the following way: 
Q is a set of ordered subcategorization frames. Each
of them is a pair consisting of a subcategorization
frame and a set of ordering constraints.
$\forall q \in Q\ [q : \langle \langle D \rightarrow R \times 2^T \rangle \times 2^O \rangle]$, where R is
as before and O is a set of pairs $\langle d_1, d_2 \rangle$ where
$d_1, d_2 \in D \cup \{\#\}$.
The pairs in O define a partial order on the relative
positions of the dependency relations and the
head. If both $d_1$ and $d_2$ are members of D, the
constraint specifies that the dependent whose grammatical
relation is $d_1$ (if any) must linearly precede
the dependent whose grammatical relation is $d_2$ (if
any). If the first (second) member of the constraint
is #, it is specified that the dependent whose grammatical
relation is $d_2$ ($d_1$ respectively), if any, must
follow (precede) the head. The "if any" clauses say
that in all cases where one of the two elements is 
optionally present (minimum of the range equal to 
0), the constraint is assumed to be respected in case 
the number of actual instantiations is 0. 
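The ordering relation is transitive, namely:
if $\langle e_1, e_2 \rangle \in O_q \wedge \langle e_2, e_3 \rangle \in O_q$ then $\langle e_1, e_3 \rangle \in O_q$
We require that the set of ordering constraints $O_q$
associated with any subcategorization frame be consistent:
a) for all $e_1 \in D \cup \{\#\}$, $\langle e_1, e_1 \rangle \notin O_q$;
b) for all $e_1, e_2 \in D \cup \{\#\}$, if $\langle e_1, e_2 \rangle \in O_q$
then $\langle e_2, e_1 \rangle \notin O_q$.
Finally, we modify the $\leq_T$ relation (which defines
the hierarchy):
for each $t_1, t_2 \in T$:
$t_1 \leq_T t_2$ iff $(O_{F(t_1)} \supseteq O_{F(t_2)}) \wedge \forall d \in D\ (N_{F(t_1)}(d) \leq_{R_N} N_{F(t_2)}(d) \wedge V_{F(t_1)}(d) \leq_{R_V} V_{F(t_2)}(d))$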
This corresponds to the requirement that a subcategory
$t_1$, which is more specific than $t_2$, does not
have looser constraints on linear order than $t_2$ has.
If we refer to our previous example, a possible $O_q$
is {<sogg, #>, <#, ogg>}, specifying that the subject
must precede the verbal head, which, in turn,
must precede the direct object. If each permutation
in $\mathit{Rules}_q$ is checked to verify if it satisfies the constraints,
then only 40 rules are left, corresponding
to the possible (free) positions of the (0 to 2) complements.
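The effect of the ordering constraints on the example can be checked by generating the unordered rules and discarding those that violate a constraint. This is a sketch under assumptions: the frame encoding and the names `rules`, `label` and `respects` are hypothetical, and a constraint is treated as satisfied whenever one of its two elements is absent from the rule, as the "if any" clauses require.

```python
from itertools import combinations_with_replacement, permutations, product

HEAD = "#"


def rules(frame):
    """Distinct head/dependent orderings licensed by an unordered frame."""
    options = [
        [tuple((d, t) for t in combo)
         for n in range(lo, hi + 1)
         for combo in combinations_with_replacement(sorted(values), n)]
        for d, ((lo, hi), values) in frame.items()
    ]
    return {
        perm
        for choice in product(*options)
        for perm in permutations(sum(choice, ()) + (HEAD,))
    }


def label(item):
    """The relation name of a d-pair, or the head symbol itself."""
    return item if item == HEAD else item[0]


def respects(rule, constraints):
    """True if, for every constraint (a, b), all a's precede all b's when both occur."""
    labels = [label(item) for item in rule]
    for a, b in constraints:
        pos_a = [i for i, x in enumerate(labels) if x == a]
        pos_b = [i for i, x in enumerate(labels) if x == b]
        if pos_a and pos_b and max(pos_a) > min(pos_b):
            return False
    return True


q = {
    "sogg": ((1, 1), {"N"}),
    "ogg": ((0, 1), {"N", "C"}),
    "compl": ((0, 2), {"P"}),
}
O_q = {("sogg", HEAD), (HEAD, "ogg")}

all_rules = rules(q)
kept = [r for r in all_rules if respects(r, O_q)]
print(len(all_rules), len(kept))  # 200 40
```

The 40 surviving rules split into 1 + 3 + 6 patterns without an object and 1 + 4 + 10 for each of the two object realizations.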
3.3 Inheritance 
We briefly mention here a notational convention 
which is useful to simplify the description of the sub- 
categorization frames; this convention is widespread 
in almost all taxonomic hierarchies. For details
about inheritance we refer the reader to the extensive
literature on semantic networks, frames and description
logics (Nebel, 1990).
We define:
$t_1 <_T t_2$ iff $t_1 \leq_T t_2 \wedge \neg(t_2 \leq_T t_1)$
If we define in the same way $<_{R_N}$ and $<_{R_V}$, it is
easy to verify that:
$t_1 <_T t_2$ iff $t_1 \leq_T t_2 \wedge (O_{F(t_1)} \supset O_{F(t_2)} \vee \exists d \in D\ (N_{F(t_1)}(d) <_{R_N} N_{F(t_2)}(d) \vee V_{F(t_1)}(d) <_{R_V} V_{F(t_2)}(d)))$
namely if $t_1 \leq_T t_2$ but they are not the same
subcategory, there must be a differentia keeping
them apart. This enables us to represent $t_1$ as
$\mathit{Ref}(t_2) + \mathit{Diff}(t_1, t_2)$, where $\mathit{Ref}(t_2)$ is a way to
identify $t_2$ from $t_1$, and $\mathit{Diff}(t_1, t_2)$ is a notation
for specifying the difference between the constraints
associated with $t_1$ and the ones associated with $t_2$.
So, we can say that the constraints associated with
$t_1$ are determined as the composition of the ones inherited
from $t_2$ and the ones specified locally (the
differentia) for $t_1$.
[Figure 2: An example of subsumption between two subcategories.
t137 carries the restrictions [1,1] {N}, [0,1] {N, Chesub}
and [0,2] {Prep}; t180, below it, carries the local
restrictions [1,1] {N} and [0,0].]
Graphically, an arc from $t_2$ to $t_1$ represents the subsumption
relation ($\mathit{Ref}(t_2)$ in previous terms), parsimoniously
represented by the immediate ancestor.
We show in figure 2 an example of subsumption between
two subcategories, t137 - corresponding to the
subcategorization frame q shown in the example of
paragraph 3.1 - and t180.
For the sake of clarity, we show the subcategorization
frame associated with t137 with a graph. In t180
(subsumed by t137), we specify the local constraint
restrictions: the number restrictions of OGG become
[1,1], and those of COMPL become [0,0]. Moreover,
the value restrictions of OGG become {N} (CHESUB
is ruled out). By inheriting the constraints of t137
and restricting them locally, we obtain that t180 requires
an obligatory nominal subject and an obligatory
nominal object, and cannot have any complement.
The order constraints - not shown in the figure
- are also inherited in the obvious way.
A more significant example is in figure 4, which we
will describe in section 5.
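The composition of inherited and local constraints for t137 and t180 can be sketched as a dictionary update that is guarded by a check that every local restriction really narrows the inherited one. The encoding and the names `inherit` and `diff_t180` are assumptions for illustration; ordering constraints are left out, and the value restriction of COMPL in t180 is assumed to be inherited unchanged.

```python
def range_leq(r1, r2):
    """True if the range r1 lies inside the range r2."""
    return r1[0] >= r2[0] and r1[1] <= r2[1]


def inherit(parent, diff):
    """Build a child frame from the parent's frame plus the local differentia."""
    for d, (rng, values) in diff.items():
        parent_rng, parent_values = parent[d]
        if not (range_leq(rng, parent_rng) and values <= parent_values):
            raise ValueError(f"restriction on {d!r} is looser than the inherited one")
    # Local restrictions override the inherited ones; the rest is copied as is.
    return {**parent, **diff}


t137 = {
    "sogg": ((1, 1), frozenset({"N"})),
    "ogg": ((0, 1), frozenset({"N", "Chesub"})),
    "compl": ((0, 2), frozenset({"Prep"})),
}
# Only the relations restricted from t137 to t180 are listed.
diff_t180 = {
    "ogg": ((1, 1), frozenset({"N"})),
    "compl": ((0, 0), frozenset({"Prep"})),
}

t180 = inherit(t137, diff_t180)
for relation, (rng, values) in t180.items():
    print(relation, rng, sorted(values))
```

Storing only the differentia keeps each class description short, which is the point of the notational convention: the subject restriction of t180 is never written down, it is simply inherited.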
4 Parsing issues 
Computational desiderata point towards a process- 
ing model that is input-driven, predictive, and able 
to prune the parsing space as early as possible. 
In this section, we propose an Earley-type parsing
model with left-corner filtering (footnote 4). The parser goes
left-to-right and builds a structure that is always
connected, by hypothesizing templates for the lexical
items which are predicted but not yet encountered
in the input. It uses the information in the
hierarchy, by descending from the top class towards
more specific classes. The descent is motivated by
the fact that lower subcategories provide stronger
constraints. It is possible to specify a procedure -
described in (Barbero, 1998) - that consults the hierarchy
just once, in a compilation phase (consulting
it during parsing would be very time-consuming), and
builds a parse table that guides the parser's moves. In
the following we give an intuitive description of the
algorithm by assuming the dependency tree as the data
structure instead of the sets of items that characterize
Earley's parsing style.
Footnote 4: The basis of our work is (Lombardo and Lesmo, 1996),
where the authors present an Earley-type recognizer for dependency
grammar, and propose the compilation of dependency
rules into parse tables.
Initially, the parser guesses the presence of a node 
of a root category in the dependency tree. Then, 
given a node n associated to the subcategory t and a 
word w, the parser can perform three types of action: 
PREDICTION, SCANNING and COMPLETION. 
1. Prediction: the parser guesses the presence of
the dependents of n (by using left-corner information),
given the constraints of the subcategory
t of n. When the parser analyses a dependent
which is distinctive for a possible specialization
from the subcategory t to one of its
children $t_1$ in the hierarchy, $t_1$ replaces t as the
subcategory of n (for instance, if a direct object
is hypothesized, we can directly descend from
VERB to VERB-TRANS).
2. Scanning: the parser scans the head word of n 
(the word w in the input). The subcategory of 
w must be in the subtree rooted by t (including 
t itself). The left dependents of n that have 
been hypothesized in the prediction phase must 
fulfill the specific requirements imposed by the 
subcategory of the head (otherwise, the path is 
abandoned). 
3. Completion: when the node n is "complete", 
namely all the dependents required by the sub- 
category t have been found, the next elements 
of the string can be analysed as dependents of 
the father node of n. If n has no father, i.e. it is 
the root of the dependency tree, and the end of 
the input string has been reached, the analysis 
ends successfully. 
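Two of the checks described above lend themselves to a small illustration: the descent in the hierarchy triggered by a distinctive dependent during prediction, and the subtree test applied during scanning. The sketch below is only a toy under loudly stated assumptions: the four-class hierarchy, the number restrictions on OGG, the placement of V-TR under VERB-TRANS, and the reading of "distinctive" as "obligatory in exactly one child but optional in the parent" are all hypothetical, and the actual procedure compiles this information into a parse table.

```python
# Toy hierarchy (hypothetical): class -> parent class.
PARENT = {
    "VERB": None,
    "VERB-INTR": "VERB",
    "VERB-TRANS": "VERB",
    "V-TR": "VERB-TRANS",
}
# Toy number restrictions (min, max) for the direct object relation OGG.
OGG_RANGE = {
    "VERB": (0, 1),
    "VERB-INTR": (0, 0),
    "VERB-TRANS": (1, 1),
    "V-TR": (1, 1),
}


def children(t):
    """Immediate subclasses of t in the toy hierarchy."""
    return [c for c, p in PARENT.items() if p == t]


def specialize_on_object(t):
    """Prediction: a hypothesized direct object moves t to the single child
    that makes OGG obligatory, provided t itself leaves it optional."""
    if OGG_RANGE[t][0] >= 1:
        return t  # OGG is already obligatory: nothing to specialize
    requiring = [c for c in children(t) if OGG_RANGE[c][0] >= 1]
    return requiring[0] if len(requiring) == 1 else t


def in_subtree(t, root):
    """Scanning: the subcategory of the scanned word must lie under root."""
    while t is not None:
        if t == root:
            return True
        t = PARENT[t]
    return False


node_class = specialize_on_object("VERB")  # a direct object has been hypothesized
print(node_class)                           # VERB-TRANS
print(in_subtree("V-TR", node_class))       # True: the head word can be scanned
print(in_subtree("VERB-INTR", node_class))  # False: this path is abandoned
```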
For example, the analysis of the sentence Gli amici
lo credevano un eroe, "The friends considered him a
hero", begins with the creation of a verbal root template
(figure 3, "Initialization"). The first word in
the input string is a determiner (Gli, "the"). A determiner
can be the left-corner of a nominal group,
so a prediction phase on the root node hypothesizes a
left dependent of category NOUN labelled as subject
(SOGG) (footnote 5). The control goes to this node, from which
a left dependent of category Determ is hypothesized.
This last one is associated with the input word Gli,
"the". The control returns to the node of category
NOUN, which is associated with the next word amici,
"friends". The node of category NOUN can be considered
"complete" (no other dependent is required),
and the control goes back to the root node.
At this point, the pronoun lo, "him", is read in input.
A direct object is hypothesized and associated
with it. A specialization from the top of the hierarchy
to the subcategory of transitive verbs is possible:
we know, in fact, that the root verb must be transitive,
because a direct object has been hypothesized.
The word credevano ("considered") is then read in
input, and it is associated with the root node (scanning
phase). Suppose that the verb credere, "consider",
belongs to a class V-TR that requires a nominal
subject (the hypothesis on the left dependent amici
turns out to be correct), an object and a predicative
complement.
The next input word, un, "a", is a determiner.
Again, a nominal group is hypothesized, composed
of a noun, playing the role of predicative complement,
and a dependent of the noun, which is of category
Determ and is associated with the word un.
The next input word, eroe, "hero", is associated with
the node playing the role of predicative complement.
The completion phase successfully ends the analysis
of the sentence, as all the dependents required by
the verb credevano (subject, object and predicative
complement) have been found in the input sentence.
Footnote 5: In Italian, it could also be the direct object. We show
here only one (non-deterministic) analysis path.
5 The classification of 101 Italian 
verbs 
In investigating the empirical properties of a hierarchical
grammar two issues must be addressed: the
linguistic adequacy of the classification, and the par- 
simony of the hierarchy. We present some quantita- 
tive analyses of a corpus, showing that the proposed 
hierarchy reduces considerably the redundancy of a 
grammar for naturally occurring texts, while at the 
same time being sufficiently fine-grained to represent 
even very idiosyncratic items. 
The hierarchy we propose encodes 101 Italian verbs 
taken from the grammar of Italian (Renzi, 1988) as 
the most representative of the main structures of 
Italian. 
5.1 Materials and Method 
The main sources of information used to carry out 
the classification are: (Renzi, 1988)'s Italian gram- 
mar, (Palazzi and Folena, 1992)'s Italian dictionary, 
and an Italian corpus of about 500 000 words. The
corpus includes daily newspaper articles (367578
words), scientific dissertations (40013), young students'
compositions (27531), Verga's novels (12905),
short news reports (6757), stories and various texts
(5012). It is a varied corpus, representative of several
literary genres of written Italian.
[Figure 3: Analysis of the sentence Gli amici lo credevano
un eroe, "The friends considered him a hero". The panels
show the successive parser actions: Initialization,
Prediction, Scanning and Completion.]
The information required by our formalism -- the
grammatical relations associated with the dependents,
their number ($N_q(d)$) and the set of categories
($V_q(d)$) that can realize them -- was obtained partly
by consulting Italian dictionaries, partly from
native speakers' intuitions, and mostly from the analysis
of the corpus.
All the sentences containing the verbs under anal- 
ysis were automatically extracted from the corpus, 
and the subcategorization patterns (rules) exhibited 
by the verbs in those sentences were manually collected.
We represented the set of subcategorization patterns 
(rules) as subcategorization frames, by associating 
with each grammatical relation - according to the 
formalism - the related number ($N_q(d)$) and value
($V_q(d)$) restrictions computed on the corpus. In this
test, we have kept the order between the dependents
of a verb free, so there are no ordering constraints.
Each class $t_1$ is connected to its superclass $t_2$.
$\mathit{Diff}(t_1, t_2)$, the difference between the constraints
associated with $t_1$ and the ones associated with $t_2$,
is expressed by specifying, for each relation that is
restricted from $t_2$ to $t_1$, the relation itself with the
new number and value restrictions.
5.2 Hierarchy 
Figure 4 illustrates a small portion of the resulting
hierarchy. This hierarchy is based on the dependency
relations for a generic Italian verb summarized
in Table 1 (footnote 6).
The whole hierarchy has 6 levels: the top level (class
VERB) represents the general constraints for Italian
verbs, the top+1 level distinguishes the constraints
for impersonal (V1), intransitive (VERB-INTR) and
transitive (VERB-TRANS) verbs, the top+2, top+3,
etc. levels represent specific classes of verbs (from
V2 to V50).
Footnote 6: Usually the adjuncts are not indicated as part of the
subcategorization frames of the verbs: they are not obligatorily
required by the verbs themselves. We have specified them
anyway, as the hierarchy represents the grammar - which includes
all the information about the dependents, adjuncts included.
Moreover, by specifying the information about the
adjuncts at the top level, we maintain the clarity of the
representation and the mapping on the formal grammar.
5.3 Results 
The graph in figure 5 shows the distribution of verbs 
by type, namely how the number of verbs covered by 
the classes grows in relation to the number of classes. 
We can see that the first (most common) class covers
15 verbs, the first and second most common classes
together cover 26 verbs, and so on. With the first 9
classes we cover 63 verbs, giving rise to a reduction 
of 85.7% compared to having a distinct subcatego- 
rization frame for each verb. With the first 18 classes 
we cover 81 verbs (reduction of 77.7%). The whole 
set of verbs requires, however, 50 classes (reduction 
of 50.5%): in fact, we have found many verbs with 
very idiosyncratic behaviours. 
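The reduction figures can be recomputed from the class and verb counts. The sketch below assumes that the reduction is measured as one minus the ratio of classes to verbs covered, i.e. against a baseline of one subcategorization frame per verb; the function name is hypothetical. Under that assumption the computation reproduces the reported percentages, with the second value coming out as 77.78%, which the text gives as 77.7%.

```python
def reduction(classes, verbs):
    """Percentage of frames saved when `classes` frames cover `verbs` verbs."""
    return 100.0 * (1 - classes / verbs)


# (number of classes, number of verbs covered), as reported in the text.
coverage = [(9, 63), (18, 81), (50, 101)]
results = [round(reduction(c, v), 1) for c, v in coverage]
print(results)  # [85.7, 77.8, 50.5]
```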
Table 2 shows the distribution of verbs by token 
(sum of the occurrences, in the corpus, of all the 
verbs referring to each class), level by level. The 
fact that some rare classes occur is interesting if compared
to the percentage of reduction in the representation.
There is a compression of 55.7%, while still
taking care of very low frequency patterns, where
compression is almost 0%.
In Table 3, we show, for each level, the number
of subcategorization patterns represented by all the
classes of that level, namely the sum of the patterns
of each class at that level. The number of patterns
decreases rapidly when descending the hierarchy.
The representation of the syntactic knowledge concerning
adjuncts is currently a research goal. Most authors tend to
avoid it in the representation of subcategorization frames -
see (Hudson, 1990) and the "adjoining" operation in LTAG
(Joshi and Schabes, 1996).
GRAMMATICAL RELATION            SYNTACTIC CATEGORY (value restriction)                            EXAMPLE + TRANSLATION
subject (SOGG)                  nominal group, N                                                  Paolo ama Maria, "Paolo loves Maria"
                                embedded clause headed by the complementizer che, "that", Chesub  Mi diverte che tu dica ciò, "It amuses me that you say this"
                                preposition di, "of", Prep[di]                                    Non mi interessa di venire, "I am not interested in coming"
                                infinitive verb, Verb[inf]                                        Sciare è bello, "Skiing is nice"
object (OGG)                    nominal group, N                                                  Gianni mangia una mela, "Gianni eats an apple"
                                embedded clause headed by the complementizer che, "that", Chesub  Credo che sia divertente, "I think it is amusing"
                                preposition di, "of", Prep[di]                                    Aspetto di partire, "I am waiting to leave"
predicative complement (PRED)   nominal group, N                                                  Considero Piero un amico, "I consider Piero a friend"
                                adjective group, Adj                                              Luigi è gentile, "Luigi is [...]"
                                prepositional group, Prep                                         Tuo zio è senza ritegno, "Your uncle is without reserve"
complements (COMPL), at most 3  prepositional group, Prep                                         Metto il vaso [...], "I put the vase on the table"
                                clitic, Clitic                                                    Gli ho dato un libro, "I gave him a book"
adjunct (AGGIUNTO)              prepositional group, Prep                                         Procedevo di buon passo, "I was walking at a brisk pace"
                                adverb, Adv                                                       Luigi corse velocemente, "Luigi ran quickly"
                                conjunction, Conjdip                                              Telefonami quando puoi, "Call me when you can"
                                verb of non-finite mood, Verb[non.fin]                            Camminavo fischiettando, "I was walking while whistling"
Table 1: The grammatical relations of a generic Italian verb, with their possible realizations and related examples.
For the patterns found in the texts, we observe a
decrease that is similar to, but less marked than, that
of the grammatical patterns. Even the more specific
classes describe a good portion of the patterns in the
texts, thus confirming the usefulness of very specific
information in the analysis.
Table 2 shows this point more clearly. The lower,
more specific levels, while having fewer classes, still
cover many occurrences of verbs in the text.
6 Conclusion 
The paper has presented a hierarchical organization
of a dependency formalism. The hierarchy is defined
by the subsumption relation on subcategories, defined
as a mapping between subcategories and
subcategorization frames. Subcategorization frames, in
turn, define the number of possible instantiations
of a dependency relation and the subcategories that
can realize it.
The hierarchical formalism has proved effective in
representing the syntactic and lexical knowledge
parsimoniously, that is, without redundancy, in an
empirical test on 101 Italian verbs.
Moreover, we have sketched a left-to-right predictive
parsing model that takes advantage of the hierarchical
knowledge representation in order to make predictions
on the structure of the input sentence.
In the near future we will address a large-scale
empirical test on Italian corpora, and the formal
specification of the parsing model, together with a
complexity analysis.
7 Acknowledgements 
This research was partly sponsored by the Swiss
National Science Foundation, with fellowship
8210-46569 to P. Merlo.
[Figure 4: diagram of the hierarchy. Node contents recoverable from the figure:]
VERB: SOGG {N, Chesub, Verb[inf], Prep[di]}; OGG {N, Chesub, Prep[di]}; PRED {N, Adj, Prep}; COMPL {Prep, Clitic}; AGGIUNTO [0, inf] {Prep, Adv, Conjdip, Verb[non.fin]}
V1: SOGG, OGG, PRED, COMPL [0,0] {}
VERB-INTR: SOGG [1,1] {N, Chesub, Verb[inf], Prep[di]}; OGG [0,0] {}
V2: TERMINE [0,1] {Prep[a], Clitic[dat]}
VERB-TRANS: SOGG {N, Chesub, Verb[inf]}; OGG {N, Chesub, Prep[di]}
V50: OGG {N}; PRED {Adj}; SEPARAZIONE {Prep[da]}
Figure 4: A portion of the hierarchy. Subclasses inherit and restrict the constraints at the top of the hierarchy. The
top class, VERB, has three daughters. V1 is the class of impersonal verbs, that can only have adjuncts as dependents
- the restriction is on the range, [0,0], of the other relations. For example, we can say Piove sui tetti della città, "It
rains on the roofs of the town". The classes VERB-INTR and VERB-TRANS correspond to intransitive and transitive
verbs, respectively. VERB-INTR requires an obligatory subject ([1,1]) and it cannot have a direct object ([0,0]).
VERB-TRANS requires an obligatory subject ([1,1]), that can be headed by a nominal element, a conjunction che or
an infinitive verb, and an obligatory object ([1,1]). A subclass of VERB-INTR, V2, is shown: its only restriction is on
the relation COMPL, which is specialized on the subrelation TERMINE, "Indirect Object", having a range [0,1], and
having Prep[a] and Clitic[dat] as associated categories (preposition lexically realized by a, "to", and dative clitic). For
example, sembrare, "seem", is a verb belonging to this class: we can say A Luigi Maria sembra bellissima, "To Luigi
Maria seems very beautiful". V50 is a subclass of VERB-TRANS: it restricts the sets of categories associated to the
relations OGG and PRED, and specializes the relation COMPL on the subrelation SEPARAZIONE, "Separation" (realized
by the preposition da, "from", Prep[da]). The verb allontanare, "distance", belongs to V50: Luigi mi allontanò da
te, "Luigi distanced me from you".
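The inheritance-with-restriction mechanism described in the caption of Figure 4 can be illustrated with a small data structure. This is only a minimal sketch under stated assumptions, not the authors' implementation. The names `Constraint`, `VerbClass`, `frame`, `is_consistent` and `cats` are hypothetical. The [0,1] ranges given to SOGG and OGG in the top class are assumptions, since those ranges are not stated in the text. Subrelations such as TERMINE and SEPARAZIONE are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

INF = float("inf")  # stands for the unbounded upper limit "inf" of AGGIUNTO


@dataclass(frozen=True)
class Constraint:
    """Range [low, high] of instantiations plus the categories realizing a relation."""
    low: int
    high: float
    categories: FrozenSet[str]

    def restricts(self, parent: "Constraint") -> bool:
        # A subclass may narrow the range and shrink the category set, never widen them.
        return (self.low >= parent.low and self.high <= parent.high
                and self.categories <= parent.categories)


@dataclass
class VerbClass:
    name: str
    parent: Optional["VerbClass"] = None
    local: Dict[str, Constraint] = field(default_factory=dict)

    def frame(self) -> Dict[str, Constraint]:
        # Inherited constraints, overridden by the locally stated ones.
        inherited = self.parent.frame() if self.parent else {}
        return {**inherited, **self.local}

    def is_consistent(self) -> bool:
        # Every local constraint must restrict the one inherited for the same relation.
        if self.parent is None:
            return True
        parent_frame = self.parent.frame()
        return all(rel in parent_frame and c.restricts(parent_frame[rel])
                   for rel, c in self.local.items())


def cats(*names: str) -> FrozenSet[str]:
    return frozenset(names)


SUBJ = cats("N", "Chesub", "Verb[inf]", "Prep[di]")
OBJ = cats("N", "Chesub", "Prep[di]")

VERB = VerbClass("VERB", local={
    "SOGG": Constraint(0, 1, SUBJ),  # range [0,1] is an assumption
    "OGG": Constraint(0, 1, OBJ),    # range [0,1] is an assumption
    "AGGIUNTO": Constraint(0, INF, cats("Prep", "Adv", "Conjdip", "Verb[non.fin]")),
})
VERB_INTR = VerbClass("VERB-INTR", VERB, {
    "SOGG": Constraint(1, 1, SUBJ),      # obligatory subject
    "OGG": Constraint(0, 0, cats()),     # no direct object
})
VERB_TRANS = VerbClass("VERB-TRANS", VERB, {
    "SOGG": Constraint(1, 1, cats("N", "Chesub", "Verb[inf]")),
    "OGG": Constraint(1, 1, OBJ),        # obligatory object
})

print(VERB_INTR.frame()["OGG"])
print(VERB_TRANS.is_consistent())
```

In this sketch a subclass states only the relations it restricts, and everything else (here, AGGIUNTO) is looked up through the parent chain. This mirrors the non-redundant representation the hierarchy is meant to provide.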
[Plot: number of verbs belonging to the classes (y-axis) against number of classes (x-axis).]
Figure 5: Distribution of verbs by type. 
LEVEL   DISTRIBUTION OF CLASS SIZE
1
2       5
3       423 322 318 308 170 169 148 136 99 77 67 54 54 51 47 46 45 21 20 16 14 12 11 10 3 2
4       321 292 229 116 103 89 78 50 41 20 6 5
5       397 332 212 111 52 15
6       299 239 2
Table 2: Distribution of class size by level.
LEVEL   PATTERNS
        GRAMM.    IN TEXT
1       484531    5674
2       244533    5674
3       102986    2643
4       20558     1351
5       1166      1135
6       134       540
Table 3: Number of possible and actual patterns at the
levels of the hierarchy.

References 
C. Barbero. 1998. On the granularity of information
in syntactic representation and processing: the use
of a hierarchy of syntactic classes. Ph.D. thesis,
Università di Torino.
T. Becker. 1993. HyTAG: a new type of Tree Adjoining
Grammars for Hybrid Syntactic Representation of Free
Order Languages. Ph.D. thesis, University of
Saarbruecken.
M.H. Candito. 1996. A principle-based hierarchical
representation of LTAGs. In Proceedings of COLING'96.
C. Doran, B. Hockey, P. Hopely, J. Rosenzweig,
A. Sarkar, B. Srinivas, F. Xia, A. Nasr, and
O. Rambow. 1997. Maintaining the Forest and
Burning out the Underbrush in XTAG. In Computational
Environments for Grammar Development and Language
Engineering (ENVGRAM).
R. Evans, G. Gazdar, and D. Weir. 1995. Encoding
Lexicalized Tree Adjoining Grammar with a Nonmonotonic
Inheritance Hierarchy. In Proceedings of ACL'95.
R. Hudson. 1990. English Word Grammar. Blackwell.
A. Joshi and Y. Schabes. 1996. Tree-Adjoining
Grammars. In Handbook of Formal Languages
and Automata. Springer-Verlag, Berlin.
V. Lombardo and L. Lesmo. 1996. An Earley-type
recognizer for Dependency Grammar. In Proceedings
of COLING'96.
V. Lombardo and L. Lesmo. 1998. Formal aspects
and parsing issues of dependency theory. In
Proceedings of ACL-COLING'98.
I. Mel'cuk. 1988. Dependency Syntax: Theory and
Practice. SUNY Press, Albany.
W.D. Meurers. 1997. Using lexical principles in
HPSG to generalize over valence properties. In
Proceedings of the Third Conference on Formal Grammar,
Aix-en-Provence, France.
B. Nebel. 1990. Reasoning and Revision in Hybrid
Representation Systems. In LNAI n. 422.
Springer-Verlag.
F. Palazzi and G. Folena. 1992. Dizionario della
lingua italiana. Loescher.
C. Pollard and I. Sag. 1987. Information-based syntax
and semantics, vol. 1 Fundamentals. CSLI.
L. Renzi. 1988. Grande grammatica italiana di
consultazione. Il Mulino.
Y. Schabes. 1990. Mathematical and Computational
Aspects of Lexicalized Grammars. Ph.D. thesis,
University of Pennsylvania.
K. Vijay-Shanker and Y. Schabes. 1992. Structure
sharing in lexicalized tree-adjoining grammars. In
Proceedings of COLING'92.
