Dependency and relational structure in treebank annotation
Cristina BOSCO, Vincenzo LOMBARDO
Dipartimento di Informatica, Universit`adiTorino
Corso Svizzera 185
10149 Torino,
Italia,
bosco,vincenzo@di.unito.it
Abstract
Among the variety of proposals currently mak-
ing the dependency perspective on grammar
more concrete, there are several treebanks
whose annotation exploits some form of Rela-
tional Structure that we can consider a general-
ization of the fundamental idea of dependency
at various degrees and with reference to diﬀer-
ent types of linguistic knowledge.
The paper describes the Relational Structure as
the common underlying representation of tree-
banks which is motivated by both theoretical
and task-dependent considerations. Then it
presents a system for the annotation of the Rela-
tional Structure in treebanks, called Augmented
Relational Structure, which allows for a system-
atic annotation of various components of lin-
guistic knowledge crucial in several tasks. Fi-
nally, it shows a dependency-based annotation
for an Italian treebank, i.e. the Turin Univer-
sity Treebank, that implements the Augmented
Relational Structure.
1 Introduction
Diﬀerent treebanks use diﬀerent annotation
schemes which make explicit two distinct but
interrelated aspects of the structure of the sen-
tence, i.e. the function of the syntactic units
and their organization according to a part-whole
paradigm. The first aspect refers to a form of
Relational Structure (RS), the second refers to
its constituent or Phrase Structure (PS). The
major diﬀerence between the two structures is
that the RS allows for several types of rela-
tions to link the syntactic units, whilst the PS
involves a single relation ”part-of”. The RS
can be seen as a generalization of the depen-
dency syntax with the syntactic units instanti-
ated to individual words in the dependency tree
(Mel’ˇcuk, 1988). As described in many theo-
retical linguistic frameworks, the RS provides a
useful interface between syntax and a seman-
tic or conceptual representation of predicate-
argument structure. For example, Lexical Func-
tional Grammar (LFG) (Bresnan, 1982) collo-
cates relations at the interface between lexicon
and syntax, Relational Grammar (RG) (Perl-
mutter, 1983) provides a description of the sen-
tence structure exclusively based on relations
and syntactic units not structured beyond the
string level.
This paper investigates how the notion of RS
has been applied in the annotation of tree-
banks, in terms of syntactic units and types
of relations, and presents a system for the
definition of the RS that encompasses several
uses in treebank schemata and can be viewed
as a common underlying representation. The
system, called Augmented Relational Struc-
ture (ARS) allows for an explicit representa-
tion of the three major components of linguistic
structures, i.e. morpho-syntactic, functional-
syntactic and semantic. Then the paper shows
how a dependency-based annotation can de-
scend on ARS, and describes the ARS-based an-
notation of a dependency treebank for Italian,
the Turin University Treebank (TUT), which is
the first available treebank for Italian, with a
few quantitative results.
The paper is organized as follows. The next
section investigates both the annotation of RS
in treebanks and the major motivations for the
use of RS from language-specific issues and NLP
tasks implementation; then we present the ARS
system; finally, we show the dependency anno-
tation of the TUT corpus.
2 Annotation of the Relational
Structure
In practice, all the existing treebank schemata
implement some form of relational structure.
Annotation schemata range from pure (depen-
dency) RS-based approaches to RS-PS combi-
nations (Abeill´e, 2003).
Some treebanks consider the relational infor-
mation as the exclusive basis of the annotation.
The Prague Dependency Treebank ((Hajiˇcov´a
and Ceplov´a, 2000), (B¨ohmov´a et al., 2003))
implements a three level annotation scheme
where both the analytical (surface syntactic)
and tectogrammatical level (deep syntactic and
topic-focus articulation) are dependency-based;
the English Dependency Treebank (Rambow
et al., 2002) implements a dependency-based
mono-stratal analysis which encompasses sur-
face and deep syntax and directly represents the
predicate-argument structure. Other projects
adopt mixed formalisms where the sentence is
split in syntactic subunits (phrases), but linked
by functional or semantic relations, e.g. the Ne-
gra Treebank for German ((Brants et al., 2003),
(Skut et al., 1998)), the Alpino Treebank for
Dutch (van der Beek et al., 2002), and the Lingo
Redwood Treebank for English (Oepen et al.,
2002). Also in the Penn Treebank ((Marcus
et al., 1993), (Marcus et al., 1994)) a limited
set of relations is placed over the constituency-
based annotation in order to make explicit the
(morpho-syntactic or semantic) roles that the
constituents play.
The choice of a RS-based annotation schema
can depend on theoretical linguistic motivations
(a RS-based schema allows for an explicit, fine-
grained representation of several linguistic phe-
nomena), task-dependent motivations (the RS-
based schema represents the linguistic informa-
tion involved in the task(s) at hand), language-
dependent motivations (the relational structure
is traditionally considered as the most adequate
representation of the object language).
Theoretical motivations for exploiting repre-
sentations based on forms of RS was developed
in the several RS-based theoretical linguistic
frameworks (e.g. Lexical Functional Grammar,
Relaional Grammar and dependency grammar),
which allow for capturing information involved
at various level (e.g. syntactic and semantic)
in linguistic structures, and grammatical for-
malisms have been proposed with the aim to
capture the linguistic knowledge represented in
these frameworks. Since the most immediate
way to build wide-coverage grammars is to
extract them directly from linguistic data (i.e.
from treebanks), the type of annotation used in
the data is a factor of primary importance, i.e.
a RS-based annotation allows for the extraction
of a more descriptive grammar
1
.
1
See (Mazzei and Lombardo, 2004a) and (Mazzei and
Lombardo, 2004b) for experiments of LTAG extraction
from TUT.
Task-dependent motivations rely on how the
annotation of the RS can facilitate some
processing aspects of NLP applications. The
explicit representation of predicative structures
allowed by the RS can be a powerful source
of disambiguation. In fact, a large amount of
ambiguity (such as coordination, Noun-Noun
compounds and relative clause attachment) can
be resolved using such a kind of information,
and relations can provide a useful interface
between syntax and semantics. (Hindle and
Rooth, 1991) had shown the use of dependency
in Prepositional Phrase disambiguation, and
the experimental results reported in (Hock-
enmaier, 2003) demonstrate that a language
model which encodes a rich notion of predicate
argument structure (e.g. including long-range
relations arising through coordination) can
significantly improve the parsing performances.
Moreover, the notion of predicate argument
structure has been advocated as useful in
a number of diﬀerent large-scale language-
processing tasks, and the RS is a convenient
intermediate representation in several applica-
tions (see (Bosco, 2004) for a survey on this
topic). For instance, in Information Extraction
relations allows for recognizing diﬀerent guises
in which an event can appear regardless of the
several diﬀerent syntactic patterns that can be
used to specify it (Palmer et al., 2001)
2
.In
Question Answering, systems usually use forms
of relation-based structured representations of
the input texts (i.e. questions and answers)
and try to match those representations (see
e.g. (Litkowski, 1999), (Buchholz, 2002)).
Also the in-depth understanding of the text,
necessary in Machine Translation task, requires
the use of relation-based representations where
an accurate predicate argument structure is a
critical factor (Han et al., 2000)
3
.
Language-dependent motivations rely on the
fact that the dependency-based formalisms has
been traditionally considered as the most ade-
quate for the representation of free word order
languages. With respect to constituency-based
2
Various approaches to IE (Collins and Miller, 1997)
address this issue by using relational representations,
that is forms of ”concept nodes” which specifies a trig-
ger word (usually a Verb) and also forms of mapping
between the syntactic and the semantic relations of the
trigger.
3
The system presented in (Han et al., 2000) gener-
ates the dependency trees of the source language (Ko-
rean) sentences, then directly maps them to the trans-
lated (English) sentences.
formalisms, free word order languages involves a
large amount of discontinuous constituents (i.e.
constituents whose parts are not contiguous in
the linear order of the sentence). In practice, a
constituency-based representation was adopted
for languages with rather fixed word order
patterns, like English (Penn Treebank), while
a dependency representation for languages
which allow variable degrees of word order
freedom, such as Czech (see Prague Depen-
dency Treebank) or Italian (as we will see later,
TUT). Nevertheless, in principle, since the
representation of a discontinuous constituent X
can be addressed in various ways (e.g. by in-
troducing lexically empty elements co-indexed
with the moved parts of X), the presence to
a certain extent of word order freedom does
not necessarily mean that a language has to be
necessarily annotated according to a relation-
based format rather than a constituency-based
one. Moreover, free word order languages can
present diﬃculties for dependency-based as
well as for constituency-based frameworks (e.g.
non-projective structures). The development
of dependency-based treebanks for English (see
English Dependency Treebank) together with
the inclusion of relations in constituency-based
treebanks (see Penn Treebank) too, confirms
the strongly prevailing relevance of motivations
beyond the language-dependent ones.
The types of knowledge that many applica-
tions actually need are RS-based representa-
tions where predicate argument structure and
the associated morphological and syntactic in-
formation can operate as an interface to a
semantic-conceptual representation. All these
types of knowledge have in common the fact
that they can be described according to the de-
pendency paradigm, rather than according to
the constituency paradigm. The many applica-
tions (in particular those referring to the Penn
Treebank) which use heuristics-based transla-
tion schemes from the phrase structure to lexical
dependency (”head percolation tables”) (Ram-
bow et al., 2002) show that the access to com-
prehensive and accurate extended dependency-
based representations has to be currently con-
sidered as a critical issue for the development of
robust and accurate NLP technologies.
Now we define our proposal for the representa-
tion of the RS in treebank annotation.
3 The Augmented Relational
Structure
A RS consists of syntactic units linked by re-
lations. An Augmented Relational Structure
(ARS) organizes and systematizes the informa-
tion usually associated in existing annotations
to the RS, and includes not only syntactic ,
but also linguistic information that can be rep-
resented according to a dependency paradigm
and that is proximate to semantics and un-
derlies syntax and morphology. Therefore the
eats
John the apple 
rel2rel1
Figure 1: A simple RS.
ARS expresses the relations and syntactic units
in terms of multiple components. We describe
the ARS as a dag where each relation is a
feature structure including three components.
Each component of the ARS-relations is use-
morph
fsynt
sem
v1
v2
v3
Figure 2: An ARS relation.
ful in marking both similarities and diﬀerences
among the behavior of units linked by the de-
pendency relations.
The morpho-syntactic component of the ARS
describes morpho-syntactic features of the
words involved in relations, such as their gram-
matical category. This component is useful for
making explicit the morpho-syntactic variants
of predicative structures. Instances of this kind
of variants often occur in intransitive, tran-
sitive and di-transitive predicative structures,
e.g. esplosione-esplodere (explosion - to ex-
plode) are respectively nominal and verbal vari-
ants of the intransitive structure ”something ex-
plodes”. By referring to the TUT, we can evalu-
ate the frequency of this phenomenon: in 1,500
sentences 944 Verbs occur (for a total of 4169
occurrences) and around the 30% of them are
present in the nominal variant too
4
.
The functional-syntactic component identifies
the subcategorized elements, that is it keeps
apart arguments and modifiers in the pred-
icative structures. Moreover, this component
makes explicit the similarity of a same predica-
tive structure when it occurs in the sentence in
diﬀerent morpho-syntactic variants. In fact, the
functional-syntactic components involved, e.g.,
in the transitive predicative structure ”some-
one declares something”, are the same in both
the nominal (dichiarazione [di innocenza]
OBJ
[di John]
SUBJ
- John’s declaration of innocence)
and verbal realization ([John]
SUBJ
dichiara [la
sua innocenza]
OBJ
- John declares his inno-
cence) of such a predication, i.e. SUBJ and
OBJ. The distinction between arguments and
modifiers has been considered as quite problem-
atic in the practice of the annotation, even if rel-
evant from the applicative and theoretical point
of view
5
, and is not systematically annotated in
the Penn Treebank (only in clear cases, the use
of semantic roles allows for the annotation of
argument and modifier) and in the Negra Tree-
bank. This distinction is instead usually marked
in dependency representations, e.g. in the En-
glish Dependency Treebank and in the Prague
Dependency Treebank.
The semantic component of the ARS-relations
specifies the role of words in the syntax-
semantics interface and discriminates among
diﬀerent kinds of modifiers and oblique com-
plements. We can identify at least three levels
of generality: verb-specific roles (e.g. Runner,
Killer, Bearer); thematic roles (e.g. Agent, In-
strument, Experiencer, Theme, Patient); gener-
alized roles (e.g. Actor and Undergoer). The
use of specific roles can cause the loss of useful
generalizations, whilst too generic roles do not
describe with accuracy the data. An example of
annotation of semantic roles is the tectogram-
matical layer of the Prague Dependency Tree-
bank.
ARS features a mono-stratal approach. By fol-
lowing this strategy, the annotation process can
be easier, and the result is a direct represen-
tation of a complete predicate argument struc-
ture, that is a RS where all the information
(morpho-syntactic, functional-syntactic and se-
mantic) are immediately available. An alterna-
tive approach has been followed by the Prague
4
This statistics does not take into consideration the
possible polysemic nature of words involved.
5
See, for instance, in LFG and RG.
Dependency Treebank, which is featured by
a three levels annotation. This case shows
that the major diﬀerence between the syntactic
(analytic) and the semantic (tectogrammatical)
layer consists in the inclusion of empty nodes
for recovering forms of deletion ((B¨ohmova et
al., 1999), (Hajiˇcov´a and Ceplov´a, 2000)). But
this kind of information does not necessarily re-
quires a separated semantic layer and can be
annotated as well in a mono-stratal represen-
tation, like the English Dependency Treebank
(Rambow et al., 2002) does.
The tripartite structure of the relations in ARS
guarantees that diﬀerent components can be ac-
cessed separately and analyzed independently
(like in (Montemagni et al., 2003) or in (Ram-
bow et al., 2002)). Furthermore, the ARS al-
lows for forms of annotation of relations where
not all the features are specified too. In fact, the
ARS-relations which specify only a part of com-
ponents allow for the description of syntactic
grammatical relations which do not correspond
with any semantic relation, either because they
have a void semantic content or because they
have a diﬀerent structure from any possible cor-
responding semantic relation (i.e. there is no
semantic relation linking the same ARS-units
linked by the syntactic one). Typical relations
void of semantic content can link the parts of ex-
pressions not compositionally interpretable (id-
ioms), for instance together with with in to-
gether with. While a classic example of a non-
isomorphic syntactic and semantic structure is
one which involves the meaning of quantifiers:
a determiner within a NP extends its scope be-
yond the semantic concept that results from the
interpretation of the NP. Another example is
the coordination where the semantic and syn-
tactic structure are often considered as non iso-
morphic in several forms of representation.
The ARS-relations including values for both
functional-syntactic and semantic components
may be used in the representation of grammat-
ical relations which participate into argument
structures and the so-called oblique cases (see
Fillmore and (Hudson, 1990)), i.e. where the
semantic structures are completely isomorphic
to the syntactic structures. For example, a
locative adjunct like in the garden in John was
eating fish in the garden is represented at the
syntactic level as a Prepositional Phrase play-
ing the syntactic function locative in the Verb
Phrase (in the Penn Treebank it could be an-
notated as a PP-LOC); the semantic concept
corresponding to the garden plays the semantic
role LOCATION in the ”eating” event stated
by the sentence.
4 TUT: a dependency-based
treebank for Italian
The TUT is the first available tree-
bank of Italian (freely downloadable at
http://www.di.unito.it/˜tutreeb/). The cur-
rent release of TUT includes 1,500 sentences
corresponding to 38,653 tokens (33,868 words
and 4,785 punctuation marks). The average
sentence length is of 22,57 words and 3,2
punctuation marks.
In this section, we concentrate on the major
features of TUT annotation schema, i.e. how
the ARS system can describe a dependency
structure.
4.1 A dependency-based schema
In Italian the order of words is fixed in non
verbal phrases, but verbal arguments and mod-
ifiers can be freely distributed without aﬀect-
ing the semantic interpretation of the sentence.
A study on a set of sentences of TUT shows
that the basic word order for Italian is Subject-
Verb-Complement (SVC), as known in litera-
ture (Renzi, 1988), (Stock, 1989), but in more
than a quarter of declarative sentences it is vi-
olated (see the following table
6
). Although the
Permutations Occurrences
SVC 74,26%
VCS 11,38%
SCV 7,98%
CSV 3,23%
VSC 2,29%
CVS 0,77%
Table 1: Italian word order
SVC order is well over the others, the fact that
all the other orders are represented quantita-
tively confirms the assumption of partial con-
figurationality intuitively set in the literature.
The partial configurationality of Italian can be
considered as a language-dependent motivation
for the choice of a dependency-based annota-
tion for an Italian treebank. The schema is
similar to that of the Prague Dependency Tree-
bank analytical-syntactic level with which TUT
6
The data reported in the table refer to 1,200 anno-
tated sentences where 1,092 verbal predicate argument
structures involving Subject and at least one other Com-
plement occur.
shares the following basic design principles typ-
ical of the dependency paradigm:
 the sentence is represented by a tree where
each node represents a word and each
edge represents a dependency labelled by a
grammatical relation which involves a head
and a dependent,
 each single word and punctuation mark
is represented by a single node, the so-
called amalgamated words, which are words
composed by lexical units that can occur
both in compounds and alone, e.g. Verbs
with clitic suﬃxes (amarti (to love-you) or
Prepositions with Article (dal (from-the)),
are split in more lexical units
7
,
 since the constituent structure of the sen-
tence is implicit in dependency trees, no
phrases are annotated
8
.
If the partial configurationality makes the
dependency-based annotation more adequate
for Italian, other features of this language
should be well represented by exploiting a
Negra-like format where the syntactic units are
phrases rather than single words. For instance,
in Italian, Nouns are in most cases introduced
by the Article: the relation between Noun and
Determiner is not very relevant in a dependency
perspective, while it contributes to the defini-
tion of a clearer notion of NP in Italian than
in languages poorer of Determiners like, e.g.,
Czech. The major motivation of a dependency-
based schema is therefore theoretical and, in
particular, to make explicit in the treebank an-
notation a variety of structures typical of the
object language.
Moreover, in order to make explicit in cases
of deletion and ellipsis the predicate argument
structure, we annotate in the TUT null ele-
ments. These elements allow for the annota-
tion of a variety of phenomena: from the ”equi”
deletion which aﬀects the subject of infinitive
Verb depending on a tensed Verb (e.g. John(1)
vuole T(1) andare a casa - John(1) want to
T(1) go home), to the various forms of gap-
ping that can aﬀect parts of the structure of
the sentence (e.g. John va(1) a casa e Mario
T(1) al cinema - John goes(1) home and Mario
7
Referring to the current TUT corpus, we see that
around 7,7% words are amalgamated.
8
If phrase structure is needed for a particular appli-
cation, it is possible to automatically derive it from the
dependency structure along with the surface word order.
dichiarava
In Sudja 
VERB,PREP
RMOD
TIME
il
VERB,NOUN
SUBJ
AGENT 
VERB,DET+DEF
OBJ
THEME
quei la 
PREP,DET+DEF
ARG 
#
fallimento
NOUN,DET+DEF
APPOSITION 
DENOM
DET+DEF,NOUN
ARG 
#
giorni zingara 
DET+DEF,NOUN
ARG 
#
DET+DEF,NOUN
ARG 
#
Figure 3: The TUT representation of In quei
giorni Sudja la zingara dichiarava il fallimento
(In those days Sudja the gipsy declared the
bankruptcy).
T(1) to the cinema), to the pro-dropped sub-
jects typical of Italian (as well as of Portuguese
and Spanish), i.e. the subject of tensed Verbs
which are not lexically realized in the sentence
(e.g. TVaacasa- T goes home). For phenom-
ena such as equi and gapping TUT implements
co-indexed traces, while it implements non co-
indexed traces for phenomena such as the pro-
drop subject.
4.2 An ARS-based schema
In TUT the dependency relations form the
skeleton of the trees and the ARS tripartite fea-
ture structures which are associated to these re-
lations resolve the interface between the mor-
phology, syntax and semantics. The ARS al-
lows for some form of annotation also of rela-
tions where only parts of the features are spec-
ified. In TUT this has been useful for under-
specifying relations both in automatic analysis
of the treebank (i.e. we can automatically ex-
clude the analysis of a specific component of
the relations) and in the annotation process (i.e.
when the annotator is not confident of a specific
component of a relation, he/she can leave such
acomponentvoid).
In the figure 3 we see a TUT tree.
All the relations annotated in the tree in-
clude the morpho-syntactic component, formed
by the morphological categories of the words in-
volved in the relation separated by a comma,
e.g. VERB,PREP for the relation linking the
root of the sentence with its leftmost child
(In). Some relation involves a morpho-syntactic
component where morphological categories are
composed by more elements, e.g. DET+DEF
(in DET+DEF,NOUN) for the relation linking
quei with giorni. The elements of the morpho-
syntactic component of TUT includes, in fact,
10 ”primary” tags that represent morphologi-
cal categories of words (e.g. DET for Deter-
miner, NOUN for Noun, and VERB for Verb),
and that can be augmented with 20 ”secondary”
tags (specific of the primary tags) which further
describe them by showing specific features, e.g.
DEF which specifies the definiteness of the De-
terminer or INF which specifies infiniteness of
Verb. Valid values of the elements involved in
TUT morpho-syntactic tags are 40.
By using the values of the functional-syntactic
component, TUT distinguishes among a variety
of dependency relations. In figure 3 we see the
distinction between argument, e.g. the relation
SUBJ linking the argument Sudja with the ver-
bal root of the sentence dichiarava,andtherela-
tion RMOD which represents a restrictive mod-
ifier and links the verbal root dichiarava with
in quei giorni. The dependents of Prepositions
and determiners are annotated as argument too,
according to arguments presented in (Hudson,
1990). Another distinction which is exploited
in the annotation of the sentence is that be-
tween restrictive modifier (i.e. RMOD which
links dichiarava with in quei giorni)andAP-
POSITION (i.e. non restrictive modifier linking
Sudja with la zingara), which are modifiers that
restrict the meaning of the head. Beyond these
basic distinctions, TUT schema draws other dis-
tinctions among the functional-syntactic rela-
tions and includes a large set of tags for a total
of 55 items, which are compounds of 27 pri-
mary and 23 secondary tags. These tags are
organized in a hierarchy (Bosco, 2004) accord-
ing to their diﬀerent degree of specification. In
the hierarchy of relations, Arguments (ARG) in-
clude Subject (SUBJ), Object (OBJ), Indirect
Object (INDOBJ), Indirect Complement (IN-
DCOMPL), Predicative Complements (of the
Subject (PREDCOMPL+SUBJ) and of the Ob-
ject (PREDCOMPL+OBJ)). The direct conse-
quence of its hierarchical organization is the
availability of another mechanisms of under-
specification in the annotation or in the anal-
ysis of annotated data. In fact, by referring to
the hierarchy we can both annotate and analyze
relations at various degrees of specificity.
The semantic component discriminates among
diﬀerent kinds of modifiers and oblique com-
plements. The approach pursued in TUT has
been to annotate very specific semantic roles
only when they are immediately and neatly
distinguishable. For instance, by referring to
the prepositional dependents introduced by da
9
(from/by), we find the following six diﬀerent
values for the semantic component:
- REASONCAUSE, e.g., in gli investitori sono
impazziti dalle prospettive di guadagno (the in-
vestors are crazy because of the perspectives of
gain)
-SOURCE,e.g.,intraggono benefici dalla
bonifica ([they] gain benefit from the drainage)
- AGENT, e.g., l’iniziativa `e appoggiata dagli
USA (the venture is supported by USA)
-TIME,e.g.,dal ’91 sono stati stanziati 450
miliardi (since ’91 has been allocated 450 bil-
lions)
- THEME, e.g., ci`o distingue l’Albania dallo
Zaire (that distinguishes the Albany from
Zaire)
- LOC, which can be further specialized in
LOC+FROM, e.g., in da qui `e partito l’assalto
(from here started the attack), LOC+IN, e.g., in
quello che succedeva dall’altra parte del mondo
(what happened in the other side of the world),
LOC+METAPH, e.g., in l’Albania `e passata dal
lancio dei sassi alle mitragliatrici (the Albany
has passed from the stone-throwing to the ma-
chineguns).
In figure 3 the semantic component has been
annotated only in four relations, which respec-
tively represent the temporal modifier In quei
giorni of the main Verb dichiarava,theappo-
sition la zingara of the Noun Sudja,andthe
arguments of the Verb, i.e. the subject Sudja la
zingara whichplaysthesemanticroleAGENT
and the object il fallimento which plays the se-
mantic role THEME. In the other relations in-
volved in this sentence a value for the semantic
component cannot be identified
10
,e.g. thear-
gument of a Preposition or Determiner cannot
be semantically specified as in the case of the
verbal arguments.
5 Conclusions
The paper analyzes the annotation of the RS in
the existing treebanks by referring to a notion
of RS which is a generalization of dependency.
9
In 1,500 TUT sentences we find 591 occurrences of
this Preposition.
10
In figure 3, we marked the semantic component of
these cases with sharp.
According to this notion, the RS includes types
of linguistic knowledge which are diﬀerent, but
which have in common that they can be repre-
sented by a dependency paradigm rather than
to a constituency-based one.
The paper identifies two major motivations
for exploiting RS in treebank annotation:
language-dependent motivations that have de-
termined the use of dependency for the repre-
sentation of treebanks of free word order lan-
guages, and task-dependent motivations that
have determined a wider use of relations in tree-
banks.
In the second part of the paper, we show a sys-
tem for the annotation of RS, i.e. the ARS,
andhowtheARScanbeusedforthean-
notation of a dependency-based treebank, the
TUT whose schema augments classical depen-
dency (functional-syntactic) relations with mor-
phological and semantic knowledge according to
the above mentioned notion of RS.

References
A. Abeill´e, editor. 2003. Building and using
syntactically annotated corpora. Kluwer, Dor-
drecht.
A. B¨ohmova, J. Panevov´a, and P. Sgall. 1999.
Syntactic tagging: procedure for the transi-
tion from analytic to the tectogrammatical
treestructures. In Proc. of 2nd Workshop on
Text, speech and dialog, pages 34–38.
A. B¨ohmov´a, J. Hajiˇc, E. Hajiˇcov´a, and
B. Hladk´a. 2003. The Prague Dependency
Treebank: A three level annotation scenario.
In Abeill´e (Abeill´e, 2003), pages 103–127.
C. Bosco. 2004. A grammatical rela-
tion system for treebank annotation.
Ph.D. thesis, University of Torino.
http://www.di.unito.it/˜bosco/.
T. Brants, W. Skut, and H. Uszkoreit. 2003.
Syntactic annotation of a German newspaper
corpus. In Abeill´e (Abeill´e, 2003), pages 73–
87.
J. Bresnan, editor. 1982. The mental represen-
tation of grammatical relations. MIT Press,
Cambridge.
S. Buchholz. 2002. Using grammatical rela-
tions, answer frequencies and the World Wide
Web for TREC Question Answering. In Proc.
of TREC 2001, pages 502–509.
M. Collins and S. Miller. 1997. Semantic tag-
ging using a probabilistic context free gram-
mar. In Proc. of 6th Workshop on Very Large
Corpora.
E. Hajiˇcov´a and M. Ceplov´a. 2000. Deletions
and their reconstruction in tectogrammatical
syntactic tagging of very large corpora. In
Porc. of COLING 2000, pages 278–284.
C. Han, B. Lavoie, M. Palmer, O. Rambow,
R. Kittredge, T. Korelsky, N. Kim, and
M. Kim. 2000. Handling structural diver-
gences and recovering dropped arguments in a
Korean/English machine translation system.
In Proc. of AMTA 2000, pages 40–54.
D. Hindle and M. Rooth. 1991. Structural am-
biguity and lexical relations. In Proc. of ACL
91, pages 229–236.
J. Hockenmaier. 2003. Parsing with generative
models of predicate-argument structure. In
Proc. of ACL 2003.
R. Hudson. 1990. English Word Grammar.
Basil Blackwell, Oxford and Cambridge.
K.C. Litkowski. 1999. Question-answering us-
ing semantic relation triples. In Proc. of
TREC-8, pages 349–356.
M. Marcus, B. Santorini, and M.A.
Marcinkiewicz. 1993. Building a large
annotated corpus of English: The Penn
Treebank. Computational Linguistics,
19:313–330.
M. Marcus, G. Kim, M.A. Marcinkiewicz,
R. MacIntyre, A. Bies, M. Ferguson, K. Katz,
and B. Schasberger. 1994. The Penn Tree-
bank: Annotating predicate argument struc-
ture. In Proc.ofHLT-94.
A. Mazzei and V. Lombardo. 2004a. Building a
large grammar for Italian. In Proc.ofLREC
2004, pages 51–54.
A. Mazzei and V. Lombardo. 2004b. A com-
parative analysis of extracted grammars. In
Proc. of ECAI 2004.
I.A. Mel’ˇcuk. 1988. Dependency Syntax: theory
and practice. SUNY, Albany.
S. Montemagni, F. Barsotti, M. Battista,
and N. Calzolari. 2003. Building the Ital-
ian syntactic-semantic treebank. In Abeill´e
(Abeill´e, 2003), pages 189–210.
S. Oepen, K. Toutanova, S. Shieber, C.D. Man-
ning, D. Flickinger, and T. Brants. 2002. The
LinGO Redwoods treebank: motivation and
prliminary applications. In Proc. of COLING
2002, pages 1253–1257.
M. Palmer, J. Rosenzweig, and S. Cotton. 2001.
Automatic predicate argument analysis of the
Penn Treebank. In Proc. of HLT 2001.
D.M. Perlmutter. 1983. Studies in Relational
Grammar 1. University of Chicago Press.
O. Rambow, C. Creswell, R. Szekely, H. Taber,
and M. Walker. 2002. A dependency tree-
bank for English. In Proc. of LREC 2002,
pages 857–863.
L. Renzi, editor. 1988. Grande grammatica
italiana di consultazione, vol. I. Il Mulino,
Bologna.
W. Skut, T. Brants, B. Krenn, and H. Uszkor-
eit. 1998. A linguistically interpreted corpus
of German in newspaper texts. In Proc. of
LREC 98, pages 705–713.
O. Stock. 1989. Parsing with flexibility, dy-
namic strategies, and idioms in mind. Com-
putational Linguistics, 15(1):1–17.
L. van der Beek, G. Bouma, R. Malouf, and
G. van der Noord. 2002. The Alpino depen-
dency treebank. In Proc. of CLIN 2001.
