Valency Frames of Czech Verbs in VALLEX 1.0
Zdeněk Žabokrtský
Center for Computational Linguistics,
Charles University,
Malostranské nám. 25,
CZ-11800 Prague, Czech Republic
zabokrtsky@ckl.mff.cuni.cz
Markéta Lopatková
Center for Computational Linguistics,
Charles University,
Malostranské nám. 25,
CZ-11800 Prague, Czech Republic
lopatkova@ckl.mff.cuni.cz
Abstract
The Valency Lexicon of Czech Verbs, Version 1.0
(VALLEX 1.0) is a collection of linguistically
annotated data and documentation resulting from an
attempt at a formal description of the valency frames
of Czech verbs. VALLEX 1.0 is closely related to the
Prague Dependency Treebank. In this paper, we briefly
outline the context in which VALLEX came into
existence and mention three similar projects for
English verbs. The core of the paper is a description
of the logical structure of the VALLEX data. Finally,
we suggest a few directions for future research.
1 Introduction
The Prague Dependency Treebank1 (PDT) meets the
widespread aspirations of building corpora with rich
annotation schemes. The annotation on the underlying
(tectogrammatical) level of language description
(Hajičová et al., 2000), which serves among other
things for training stochastic processes, makes it
possible to acquire a considerable amount of data for
rule-based approaches in computational linguistics
(and, of course, for ‘traditional’ linguistics).
Valency undoubtedly belongs to the core of all
rule-based methods.
PDT is based on the Functional Generative Description
of Czech (FGD), which has been developed by Petr Sgall
and his collaborators since the 1960s (Sgall et al.,
1986). Within FGD, the theory of valency has been
studied since the 1970s (see esp. Panevová, 1992). A
modification of this theory is used as the theoretical
background in VALLEX 1.0 (see Lopatková, 2003, for a
detailed description of the framework).
Valency requirements are considered for autosemantic
words: verbs, nouns, adjectives, and adverbs. The
principles of the valency theory are now being applied
to a huge amount of data, which brings both a great
opportunity to verify the functional criteria that have
been set up and the necessity to expand the ‘center’
(‘core’) of the language being described.
1 http://ufal.mff.cuni.cz/pdt
With the massive manual annotation in PDT, the problem
of assigning valency structure consistently became more
pressing. This was the first impulse leading to the
decision to create a valency lexicon. However, the
potential usability of the valency lexicon is certainly
not limited to the context of PDT; several possible
applications are illustrated in (Straňáková-Lopatková
and Žabokrtský, 2002).
The Valency Lexicon of Czech Verbs, Version 1.0
(VALLEX 1.0) is a collection of linguistically
annotated data and documentation resulting from this
attempt at a formal description of the valency frames
of Czech verbs. VALLEX 1.0 contains roughly 1400 verbs
(counting only perfective and imperfective verbs, but
not their iterative counterparts).2 They were selected
as follows: (1) We started with about 1000 of the most
frequent Czech verbs, according to their number of
occurrences in a part of the Czech National Corpus3
(only ‘být’ (to be) and some modal verbs were excluded
from this set because of their non-trivial status on
the tectogrammatical level of FGD). (2) Then we added
their perfective or imperfective aspectual
counterparts if they were missing; in other words, the
set of verbs in VALLEX 1.0 is closed under the relation
of ‘aspectual pair’.
The preparation of the first version of VALLEX took
more than two years. Although it is still a work in
progress requiring further linguistic research, the
first version has already been publicly released. The
whole of VALLEX 1.0 can be downloaded from the Internet
after filling in the on-line registration form at the
following address:
http://ckl.mff.cuni.cz/zabokrtsky/vallex/1.0/
2 Besides VALLEX, a larger valency lexicon (called
PDT-VALLEX; Hajič et al., 2003) has been created during
the annotation of PDT. PDT-VALLEX contains more verbs
(5200 verbs), but only frames occurring in PDT, whereas
in VALLEX the verbs are analyzed in their whole
complexity, in all their meanings. Moreover, richer
information is assigned to particular valency frames in
VALLEX.
3 http://ucnk.ff.cuni.cz
From the very beginning, VALLEX 1.0 was designed with
an emphasis on both human and machine readability.
Therefore both linguists and developers of Natural
Language Processing applications can use and critically
evaluate its content. In order to satisfy the different
needs of these potential users, VALLEX 1.0 contains the
data in the following three formats:
• Browsable version. The HTML version of the data allows
for easy and fast navigation through the lexicon. Verbs
and frames are organized in several ways, following
various criteria.
• Printable version. For those who prefer to have a paper
version in hand. For a sample from the printable
version, see the Appendix.
• XML version. Programmers can run sophisticated queries
(e.g. based on the XPath query language) on this
machine-tractable data, or use it in their
applications. The structure of the XML file is defined
by a DTD (Document Type Definition) file, which
naturally mirrors the logical structure of the data
(described in Sec. 3).
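As a rough illustration of the kind of query the XML version makes possible, the following Python sketch runs an XPath-style search over a small inline document. The element and attribute names (`vallex`, `word_entry`, `frame_entry`, `slot`, `lemma`, `functor`, `type`), the ASCII lemmas and the sample frames are hypothetical and serve only the illustration; the actual names are those defined by the DTD distributed with VALLEX 1.0 and may well differ. Only the attribute `abbrev` is mentioned in this paper (Sec. 3.12).

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the XML data; the real element and attribute
# names are defined by the VALLEX 1.0 DTD and may differ from these.
SAMPLE = """
<vallex>
  <word_entry lemma="dat">
    <frame_entry id="1">
      <slot functor="ACT" type="obl"/>
      <slot functor="ADDR" type="obl"/>
      <slot functor="PAT" type="obl"/>
    </frame_entry>
  </word_entry>
  <word_entry lemma="jit">
    <frame_entry id="1">
      <slot functor="ACT" type="obl"/>
      <slot functor="DIR3" type="obl" abbrev="1"/>
    </frame_entry>
  </word_entry>
</vallex>
"""


def lemmas_with_functor(root, functor):
    """Return lemmas of word entries that have a slot with the given functor."""
    found = []
    for entry in root.findall("word_entry"):
        # ElementTree supports a limited XPath subset, including [@attr='value'].
        if entry.findall(f".//slot[@functor='{functor}']"):
            found.append(entry.get("lemma"))
    return found


root = ET.fromstring(SAMPLE)
addressee_verbs = lemmas_with_functor(root, "ADDR")
direction_verbs = lemmas_with_functor(root, "DIR3")
```

A full XPath engine would allow richer queries; the standard-library subset used here is enough for simple attribute filters of this kind.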
2 Similar Projects for English Verbs4
2.1 FrameNet
FrameNet (Fillmore, 2002) groups lexical units
(pairings of words and senses) into sets according to
whether they permit parallel semantic descriptions. The
verbs belonging to a particular set share the same
collection of frame-relevant semantic roles. The
‘general-purpose’ semantic roles (such as Agent,
Patient, Theme, Instrument, Goal, and so on) are
replaced by more specific ‘frame-specific’ role names
(e.g. Speaker, Addressee, Message and Topic for
‘speaking verbs’).
2.2 Levin Verb Classes
Levin semantic classes (Levin, 1993) are constructed
from verbs which undergo a certain number of
alternations (where an alternation means a change in
the realization of the argument structure of a verb,
e.g. the ‘conative alternation’ Edith cuts the bread
vs. Edith cuts at the bread). These alternations are
specific to English. For Czech, particular types of
diatheses, for example, can be considered useful
alternations.
Both FrameNet and the Levin classification focus (at
least for the time being) only on selected meanings of
verbs.
4 For a comparison of PropBank, the Lexical Conceptual
Database, and PDT, see (Hajičová and Kučerová, 2002).
2.3 PropBank
In the PropBank corpus (Kingsbury and Palmer, 2002),
sentences are annotated with predicate-argument
structure. The human annotators use a lexicon
containing verbs and their ‘frames’, i.e. lists of
their possible complementations. The lexicon is called
‘Frame Files’. Frame Files are mapped to individual
members of Levin classes.
There is only a minimal specification of the
connections between the argument types and semantic
roles: in principle, a one-argument verb has arg0 in
its frame, a two-argument verb has arg0 and arg1, etc.
Frame Files store all the meanings of the verbs, with
their descriptions and examples.
3 Logical Structure of the VALLEX Data
3.1 Word Entries
On the topmost level, VALLEX 1.0 is divided into word
entries (the HTML ‘graphical’ layout of a word entry is
depicted in Fig. 1). Each word entry relates to one or
more headword lemmas5 (Sec. 3.2). The word entry
consists of a sequence of frame entries (Sec. 3.5)
relevant for the lemma(s) in question (where each frame
entry usually corresponds to one of the lemma’s
meanings). Information about the aspect (Sec. 3.16) of
the lemma(s) is assigned to each word entry as a whole.
Figure 1: HTML layout of a word entry.
Most of the word entries correspond to lemmas in a
simple one-to-one manner, but the following two
non-trivial situations (and even combinations of them)
appear in VALLEX 1.0 as well:
• lemma variants (Sec. 3.3)
• homonyms (Sec. 3.4)
The content of a word entry roughly corresponds to the
traditional term ‘lexeme’.
5 Remark on terminology: The terms used here either
belong to broadly accepted linguistic terminology, or
come from the Functional Generative Description (FGD),
which we have used as the background theory, or are
defined elsewhere in this text.
3.2 Lemmas
By the lemma of a verb we understand the infinitive
form of the respective verb, in the case of homonyms
(Sec. 3.4) followed by a Roman numeral in superscript
(which is to be considered an inseparable part of the
lemma in VALLEX 1.0).
The reflexive particles se or si are part of the
infinitive only if the verb is a reflexive tantum,
either primary (e.g. bát se) or derived (e.g. zabít se,
šířit se, vrátit se).
3.3 Lemma Variants
Lemma variants are groups of two (or more) lemmas that
are interchangeable in any context without any change
of meaning (e.g. dovědět se/dozvědět se). Usually the
only difference is a small alternation in the
morphological stem, which might be accompanied by a
subtle stylistic shift (e.g. myslet/myslit, the latter
being bookish). Moreover, although the infinitive forms
of the variants differ in spelling, some of their
conjugated forms are often identical (mysli (imper.sg.)
both for myslet and myslit).
The term ‘lemma variants’ should not be confused with
the term ‘synonymy’.
3.4 Homonyms
There are pairs of word entries in VALLEX 1.0 whose
lemmas have the same spelling but differ considerably
in their meanings (there is no obvious semantic
relation between them). They might also differ in
their etymology (e.g. nakupovat^I - to buy vs.
nakupovat^II - to heap), aspect (Sec. 3.16) (e.g.
stačit^I pf. - to be enough vs. stačit^II impf. - to
catch up with), or conjugated forms (žilo
(past.sg.neut) for žít^I - to live vs. žalo
(past.sg.neut) for žít^II - to mow). Such lemmas
(homonyms)6 are distinguished by Roman numerals in
superscript (written here after a caret). These
numerals should be understood as an inseparable part of
the lemma in VALLEX 1.0.
3.5 Frame Entries
Each word entry consists of a non-empty sequence of
frame entries, typically corresponding to the
individual meanings (senses) of the headword lemma(s)
(from this point of view, VALLEX 1.0 can be classified
as a Sense Enumerated Lexicon).
The frame entries are numbered within each word entry;
in the VALLEX 1.0 notation, the frame numbers are
attached to the lemmas as subscripts.
The ordering of frames is not completely random, but it
is not perfectly systematic either. So far it is based
only on the following weak intuition: primary and/or
the most frequent meanings should go first, whereas
rare and/or idiomatic meanings should go last. (We do
not guarantee that the ordering of meanings in this
version of VALLEX exactly matches their frequency of
occurrence in contemporary language.)
Each frame entry7 contains a description of the valency
frame itself (Sec. 3.6) and of the frame attributes
(Sec. 3.13).
6 Note on terminology: we have adopted the term
‘homonyms’ from Czech linguistic literature, where it
traditionally stands for what was stated above (words
identical in spelling but considerably different in
meaning); in English literature the term ‘homographs’
is sometimes used to express the same notion.
3.6 Valency Frames
In VALLEX 1.0, a valency frame is modeled as a sequence
of frame slots. Each frame slot corresponds to one
(either required or specifically permitted)
complementation8 of the given verb.
The following attributes are assigned to each slot:
• functor (Sec. 3.7)
• list of possible morphemic forms (realizations)
(Sec. 3.8)
• type of complementation (Sec. 3.11)
Some slots tend to occur together systematically. In
order to capture this type of regularity, we introduced
the mechanism of slot expansion (Sec. 3.12); the full
valency frame is obtained after performing these
expansions.
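The logical structure described so far (word entries containing numbered frame entries, each with a valency frame made of slots) can be sketched as a small data model. This is only an illustrative sketch, not the format of the VALLEX data: the class and field names, the one-line rendering and the sample frame for myslet/myslit are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slot:
    functor: str                                    # e.g. "ACT", "PAT", "DIR3"
    forms: List[str] = field(default_factory=list)  # empty list = implicit forms
    slot_type: str = "obl"                          # "obl", "opt" or "typ"
    abbrev: bool = False                            # True if the slot awaits expansion


@dataclass
class ValencyFrame:
    slots: List[Slot]

    def render(self) -> str:
        """Linearize the frame; this notation is invented for the sketch."""
        parts = []
        for s in self.slots:
            label = ",".join(s.forms)
            if label:
                parts.append(f"{s.functor}({label};{s.slot_type})")
            else:
                parts.append(f"{s.functor}({s.slot_type})")
        return " ".join(parts)


@dataclass
class FrameEntry:
    number: int          # frame number within the word entry
    frame: ValencyFrame


@dataclass
class WordEntry:
    lemmas: List[str]    # more than one lemma for lemma variants
    aspect: str          # e.g. "pf.", "impf."
    frame_entries: List[FrameEntry]


# Illustrative content only, not copied from the lexicon.
frame = ValencyFrame([
    Slot("ACT", ["1"]),
    Slot("PAT", ["na+4"]),
    Slot("MANN", slot_type="typ"),
])
entry = WordEntry(["myslet", "myslit"], "impf.", [FrameEntry(1, frame)])
rendered = entry.frame_entries[0].frame.render()
```

Keeping the aspect on the word entry and the slots on the frame mirrors the division of information described in Sec. 3.1 and in this section.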
3.7 Functors
In VALLEX 1.0, functors (labels of ‘deep roles’,
similar to theta-roles) are used for expressing types
of relations between verbs and their complementations.
According to FGD, functors are divided into inner
participants (actants) and free modifications (this
division roughly corresponds to the argument/adjunct
dichotomy). In VALLEX 1.0, we also distinguish an
additional group of quasi-valency complementations.
The functors which occur in VALLEX 1.0 are listed
below (for Czech sample sentences see (Lopatková et
al., 2002), page 43):
Inner participants:
• ACT (actor): Peter read a letter.
• ADDR (addressee): Peter gave Mary a book.
• PAT (patient): I saw him.
• EFF (effect): We made her the secretary.
• ORIG (origin): She made a cake from apples.
Quasi-valency complementations:
• DIFF (difference): The number has swollen by 200.
• OBST (obstacle): The boy stumbled over a stump.
• INTT (intent): He came there to look for Jane.
Free modifications:
• ACMP (accompaniment): Mother came with her children.
• AIM (aim): John came to a bakery for a piece of bread.
• BEN (benefactive): She made this for her children.
• CAUS (cause): She did so since they wanted it.
• COMPL (complement): They painted the wall blue.
• DIR1 (direction-from): He went from the forest to the
village.
• DIR2 (direction-through): He went through the forest to
the village.
• DIR3 (direction-to): He went from the forest to the
village.
• DPHR (dependent part of a phraseme): Peter talked horse
again.
• EXT (extent): The temperatures reached an all-time
high.
• HER (heritage): He named the new villa after his wife.
• LOC (locative): He was born in Italy.
• MANN (manner): They did it quickly.
• MEANS (means): He wrote it by hand.
• NORM (norm): Peter has to do it exactly according to
directions.
• RCMP (recompense): She bought a new shirt for $25.
• REG (regard): With regard to George she asked his
teacher for advice.
• RESL (result): Mother protects her children from any
danger.
• SUBS (substitution): He went to the theatre instead of
his ill sister.
• TFHL (temporal-for-how-long): They interrupted their
studies for a year.
• TFRWH (temporal-from-when): His bad reminiscences came
from this period.
• THL (temporal-how-long): We were there for three weeks.
• TOWH (temporal-to-when): He put it over to next
Tuesday.
• TSIN (temporal-since-when): I have not heard about him
since that time.
• TWHEN (temporal-when): His son was born last year.
7 Note on terminology: The content of a ‘frame entry’
roughly corresponds to the term lexical unit (‘lexie’
in Czech terminology).
8 Note on terminology: in this text, the term
‘complementation’ (dependent item) is used in its broad
sense, not related to the traditional argument/adjunct
(complement/modifier) dichotomy (or, if you want,
covering both ends of the dichotomy).
Note 1: Besides the functors listed above, the value
DIR also occurs in the VALLEX 1.0 data. It is used only
as a special symbol for slot expansion (Sec. 3.12).
Note 2: The set of functors as introduced in FGD is
richer than that shown above; moreover, it is still
being elaborated within the Prague Dependency Treebank.
We do not use its full (current) set in VALLEX 1.0 for
several reasons. Some functors do not occur with a verb
at all (e.g. APP - appurtenance, ‘my.APP dog’). Some
other functors can occur with a verb, but represent a
relation other than dependency (e.g. coordination, ‘Jim
or.CONJ Jack’). Still others can occur with verbs as
well, but their behaviour is absolutely independent of
the head verb, so they have nothing to do with valency
frames (e.g. ATT - attitude, ‘He did it willingly.ATT’).
3.8 Morphemic Forms
In a sentence, each frame slot can be expressed by a
limited set of morphemic means, which we call forms. In
VALLEX 1.0, the set of possible forms is defined either
explicitly (Sec. 3.9) or implicitly (Sec. 3.10). In the
former case, the forms are enumerated in a list
attached to the given slot. In the latter case, no such
list is specified, because the set of possible forms is
implied by the functor of the respective slot (in other
words, all forms possibly expressing the given functor
may appear).
3.9 Explicitly Declared Forms
The list of forms attached to a frame slot may contain
values of the following types:
• Pure (prepositionless) case. There are seven
morphological cases in Czech. In the VALLEX 1.0
notation, we use their traditional numbering: 1 -
nominative, 2 - genitive, 3 - dative, 4 - accusative,
5 - vocative, 6 - locative, and 7 - instrumental.
• Prepositional case. The lemma of the preposition (i.e.,
the preposition without vocalization) and the number of
the required morphological case are specified (e.g.,
z+2, na+4, o+6 ...). The prepositions occurring in
VALLEX 1.0 are the following: bez, do, jako, k, kolem,
kvůli, mezi, místo, na, nad, na úkor, o, od, ohledně,
okolo, oproti, po, pod, podle, pro, proti, před, přes,
při, s, u, v, ve prospěch, vůči, v zájmu, z, za.
(‘jako’ is traditionally considered a conjunction, but
it is included in this list, as it requires a
particular morphological case in some valency frames.)
• Subordinating conjunction. The lemma of the conjunction
is specified. The following subordinating conjunctions
occur in VALLEX 1.0: aby, ať, až, jak, zda,9 že.
• Infinitive construction. The abbreviation ‘inf’ stands
for an infinitive verbal complementation. ‘inf’ can
appear together with a preposition (e.g. ‘než+inf’),
but this happens very rarely in Czech.
• Construction with adjectives. The abbreviation
‘adj-digit’ stands for an adjective complementation in
the given case, e.g. adj-1 (Cítím se slabý - I feel
weak).
• Constructions with ‘být’. The infinitive of the verb
‘být’ (to be) may combine with some of the types above,
e.g. být+adj-1 (e.g. zdá se to být dostatečné - it
seems to be sufficient).
• Part of phraseme. If the set of possible lexical values
of the given complementation is very small (often
containing a single element), we list these values
directly (e.g. ‘napospas’ for the phraseme ‘ponechat
napospas’ - to expose).
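To make the notation for explicitly declared forms more concrete, the following Python sketch classifies a single form string into the types listed above. It is a simplified reading of the notation under stated assumptions, not a parser shipped with VALLEX: the function name and the result dictionaries are invented here, and conjunctions and phraseme parts are lumped together as ‘lexical’ values, since telling them apart would require the closed lists from the lexicon.

```python
import re


def classify_form(form: str) -> dict:
    """Classify one explicitly declared form string (simplified sketch)."""
    if re.fullmatch(r"[1-7]", form):
        return {"kind": "case", "case": int(form)}
    if form == "inf":
        return {"kind": "infinitive"}
    match = re.fullmatch(r"adj-([1-7])", form)
    if match:
        return {"kind": "adjective", "case": int(match.group(1))}
    if "+" in form:
        head, rest = form.split("+", 1)
        if head == "být":
            # Construction with the infinitive of 'být' wrapping another form.
            return {"kind": "být-construction", "inner": classify_form(rest)}
        if rest == "inf":
            return {"kind": "infinitive", "preposition": head}
        if re.fullmatch(r"[1-7]", rest):
            return {"kind": "prepositional", "preposition": head, "case": int(rest)}
    # Anything else is treated as a lexical value: a subordinating
    # conjunction (e.g. 'aby') or a part of a phraseme (e.g. 'napospas').
    return {"kind": "lexical", "value": form}
```

For instance, `classify_form("na úkor+2")` yields a prepositional form with the multi-word preposition kept intact, because the string is split only at the plus sign.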
3.10 Implicitly Declared Forms
If no forms are listed explicitly for a frame slot,
then the list of possible forms implicitly results from
the functor of the slot according to the following (as
yet incomplete) lists:
• LOC: adverb, na+6, v+6, u+2, před+7, za+7, nad+7,
pod+7, okolo+2, kolem+2, při+6, vedle+2, mezi+7,
mimo+4, naproti+3, podél+2 ...
• MANN: adverb, 7, na+4, ...
• DIR3: adverb, na+4, v+4, do+2, před+4, za+4, nad+4,
pod+4, vedle+2, mezi+4, po+4, okolo+2, kolem+2, k+3,
mimo+4, naproti+3 ...
• DIR1: adverb, z+2, od+2, zpod+2, zpoza+2, zpřed+2 ...
• DIR2: adverb, 7, přes+4, podél+2, mezi+7, ...
• TWHEN: adverb, 2, 4, 7, před+7, za+4, po+6, při+6,
za+2, o+6, k+3, mezi+7, v+4, na+4, na+6, kolem+2,
okolo+2, ...
• THL: adverb, 4, 7, po+4, za+4, ...
• EXT: adverb, 4, na+4, kolem+2, okolo+2, ...
• REG: adverb, 7, na+6, v+6, k+3, při+6, ohledně+2,
nad+7, na+4, s+7, u+2, ...
• TFRWH: z+2, od+2, ...
• AIM: k+3, na+4, do+2, pro+4, proti+3, aby, ať, že, ...
• TOWH: na+4 ...
• TSIN: od+2 ...
• TFHL: na+4, pro+4, ...
• NORM: podle+2, v duchu+2, po+6, ...
• MEANS: 7, v+6, na+6, po+6, z+2, že, s+7, na+4, za+4,
pod+7, do+2, ...
• CAUS: 7, za+4, z+2, kvůli+2, pro+4, k+3, na+4, že, ...
9 Note: the form ‘zda’ is in fact an abbreviation for
the pair of conjunctions ‘zda’ and ‘jestli’.
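The interplay of explicit and implicit forms can be sketched as a simple lookup: an explicit list, if present, wins; otherwise the default list of the functor applies. The table below copies only a few of the shorter lists given above (which are themselves described as incomplete), and the function name and the error behaviour are assumptions of this sketch.

```python
# Partial default form lists per functor, copied from the lists in this
# section; the source lists are open-ended, so these are not exhaustive.
IMPLICIT_FORMS = {
    "MANN": ["adverb", "7", "na+4"],
    "THL": ["adverb", "4", "7", "po+4", "za+4"],
    "EXT": ["adverb", "4", "na+4", "kolem+2", "okolo+2"],
    "TFRWH": ["z+2", "od+2"],
    "TOWH": ["na+4"],
    "TSIN": ["od+2"],
    "TFHL": ["na+4", "pro+4"],
}


def possible_forms(functor, explicit_forms=None):
    """Return the explicit list if given, else the functor's implicit list."""
    if explicit_forms:
        return list(explicit_forms)
    if functor not in IMPLICIT_FORMS:
        # Inner participants, for example, have no default list here.
        raise KeyError(f"no implicit forms recorded for functor {functor!r}")
    return list(IMPLICIT_FORMS[functor])
```

Raising an error for functors without a default list is one possible design choice; returning an empty list would hide slots whose forms were simply forgotten.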
3.11 Types of Complementations
Within the FGD framework, valency frames (in a narrow
sense) consist only of inner participants (both
obligatory10 and optional, ‘obl’ and ‘opt’ for short)
and obligatory free modifications; the dialogue test
was introduced by Panevová as a criterion for
obligatoriness. In VALLEX 1.0, valency frames are
enriched with quasi-valency complementations. Moreover,
a few non-obligatory free modifications occur in
valency frames too, since they are typically (‘typ’)
related to some verbs (or even to whole classes of
them) and not to others. (The other free modifications
can occur with the given verb too, but are not
contained in the valency frame, as mentioned above
(Sec. 3.7).)
The attribute ‘type’ is attached to each frame slot and
can have one of the following values: ‘obl’ or ‘opt’
for inner participants and quasi-valency
complementations, and ‘obl’ or ‘typ’ for free
modifications.
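The constraint on the ‘type’ attribute lends itself to a simple consistency check. The sketch below takes the functor groups from Sec. 3.7 and treats every other functor as a free modification; the function names are hypothetical and the check is only an illustration of the rule stated above, not a tool distributed with the lexicon.

```python
# Functor groups as listed in Sec. 3.7.
INNER_PARTICIPANTS = {"ACT", "ADDR", "PAT", "EFF", "ORIG"}
QUASI_VALENCY = {"DIFF", "OBST", "INTT"}


def allowed_types(functor):
    """Return the set of type values permitted for a slot with this functor."""
    if functor in INNER_PARTICIPANTS or functor in QUASI_VALENCY:
        return {"obl", "opt"}
    # Every other functor is treated as a free modification in this sketch.
    return {"obl", "typ"}


def is_valid_slot(functor, slot_type):
    """Check a (functor, type) combination against the rule above."""
    return slot_type in allowed_types(functor)
```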
3.12 Slot Expansion
Some slots tend to occur together systematically. For
instance, verbs of motion can often be modified by
direction-to and/or direction-through and/or
direction-from modifiers. We decided to capture this
type of regularity by introducing an abbreviation flag
for a slot. If this flag is set (in the VALLEX 1.0
notation it is marked with an upward arrow), the full
valency frame is obtained after slot expansion.
If one of the frame slots is marked with the upward
arrow (in the XML data, the attribute ‘abbrev’ is set
to 1), then the full valency frame is obtained by
substituting this slot with a sequence of slots as
follows:
• ↑DIR^typ → DIR1^typ DIR2^typ DIR3^typ
• ↑DIR1^obl → DIR1^obl DIR2^typ DIR3^typ
• ↑DIR2^obl → DIR1^typ DIR2^obl DIR3^typ
• ↑DIR3^obl → DIR1^typ DIR2^typ DIR3^obl
• ↑TSIN^obl → TSIN^obl THL^typ TTIL^typ
• ↑THL^typ → TSIN^typ THL^typ TTIL^typ
10 It should be emphasized that in this context the term
obligatoriness is related to the presence of the given
complementation in the deep (tectogrammatical)
structure, and not to its (surface) deletability in a
sentence (moreover, the relation between deep
obligatoriness and surface deletability is not at all
straightforward in Czech).
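Slot expansion as described in this section can be sketched as a table-driven substitution. The rule table below is our reading of the expansion rules, namely that the flagged slot keeps its own type while the slots added alongside it are ‘typ’; this reading, the function name and the tuple representation of slots are assumptions of the sketch. The label TTIL is used as it appears in the rules.

```python
# Expansion rules keyed by (functor, type) of the slot flagged with the
# upward arrow; the table reflects one reading of the rules in this section.
EXPANSIONS = {
    ("DIR", "typ"): [("DIR1", "typ"), ("DIR2", "typ"), ("DIR3", "typ")],
    ("DIR1", "obl"): [("DIR1", "obl"), ("DIR2", "typ"), ("DIR3", "typ")],
    ("DIR2", "obl"): [("DIR1", "typ"), ("DIR2", "obl"), ("DIR3", "typ")],
    ("DIR3", "obl"): [("DIR1", "typ"), ("DIR2", "typ"), ("DIR3", "obl")],
    ("TSIN", "obl"): [("TSIN", "obl"), ("THL", "typ"), ("TTIL", "typ")],
    ("THL", "typ"): [("TSIN", "typ"), ("THL", "typ"), ("TTIL", "typ")],
}


def expand_frame(slots):
    """Expand flagged slots.

    `slots` is a list of (functor, type, abbrev) triples; the result is the
    full frame as a list of (functor, type) pairs.
    """
    full = []
    for functor, slot_type, abbrev in slots:
        if abbrev:
            # Substitute the flagged slot with its sequence of slots.
            full.extend(EXPANSIONS[(functor, slot_type)])
        else:
            full.append((functor, slot_type))
    return full


# Illustrative abbreviated frame of a motion verb (not taken from the data).
abbreviated = [("ACT", "obl", False), ("DIR3", "obl", True)]
expanded = expand_frame(abbreviated)
```

Slots without the flag pass through unchanged, so a frame that contains no abbreviated slot is its own expansion.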
3.13 Frame Attributes
In VALLEX 1.0, frame attributes (more exactly,
attribute-value pairs) are either obligatory or
optional. The former have to be filled in every frame.
The latter might be empty, either because they are not
applicable (e.g. some verbs have no aspectual
counterparts), or because the annotation has not been
finished (e.g. the attribute class (Sec. 3.15) is
filled in only roughly one third of the frames).
Obligatory frame attributes:
• gloss: a verb or paraphrase roughly synonymous with the
given frame/meaning; this attribute is not supposed to
serve as a source of synonyms, let alone as a genuine
lexicographic definition; it should be used just as a
clue for fast orientation within the word entry.
• example: sentence(s) or sentence fragment(s) containing
the given verb used with the given valency frame.
Optional frame attributes:
• control (Sec. 3.14)
• class (Sec. 3.15)
• aspectual counterparts (Sec. 3.16)
• idiom flag (Sec. 3.17)
3.14 Control
The term ‘control’ relates in this context to a certain
type of predicates (verbs of control)11 and two
coreferential expressions, a ‘controller’ and a
‘controllee’. In VALLEX 1.0, control is captured in the
data only in the situation where a verb has an
infinitive modifier (regardless of its functor). Then
the controllee is the element that would be the
‘subject’ of the infinitive (which is structurally
excluded on the surface), and the controller is the
co-indexed expression. In VALLEX 1.0, the type of
control is stored in the frame attribute ‘control’ as
follows:
• if there is a coreferential relation between the
(unexpressed) subject (‘controllee’) of the infinitive
verb and one of the frame slots of the head verb, then
the attribute is filled with the functor of this slot
(‘controller’);
• otherwise (i.e., if there is no such coreference) the
value ‘ex.’ is used.
Examples:
• pokusit se (to try) - control: ACT
• slyšet (to hear), e.g. ‘slyšet někoho přicházet’ (to
hear somebody come) - control: PAT
• jít, in the sense ‘jde to udělat’ (it is possible to do
it) - control: ex
11 Note on terminology: in English literature the terms
‘equi verbs’ and ‘raising verbs’ are used in a similar
context.
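The conditions on the ‘control’ attribute can be turned into a small consistency check. In the sketch below a frame is represented as a mapping from functors to lists of forms; this representation, the function names, the spelling 'ex' (as in the examples above) and the sample frame for pokusit se are assumptions made for illustration and are not copied from the lexicon.

```python
def has_infinitive(forms):
    """True if a list of forms contains an infinitive ('inf' or e.g. 'než+inf')."""
    return any(f == "inf" or f.endswith("+inf") for f in forms)


def control_is_consistent(slots, control):
    """Check the 'control' value of a frame.

    `slots` maps functor -> list of forms; `control` is a functor name,
    the string 'ex', or None when the optional attribute is not filled.
    """
    if control is None:
        return True  # the attribute is optional
    infinitive_present = any(has_infinitive(forms) for forms in slots.values())
    if not infinitive_present:
        return False  # control is recorded only for frames with an infinitive
    # The controller must be one of the frame's own slots, unless 'ex' is used.
    return control == "ex" or control in slots


# Illustrative frame for 'pokusit se' (to try): actor plus infinitive patient.
pokusit_se = {"ACT": ["1"], "PAT": ["inf"]}
```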
3.15 Class
Some frames are assigned semantic classes like
‘motion’, ‘exchange’, ‘communication’, ‘perception’,
etc. However, we admit that this classification is
tentative and should be understood merely as an
intuitive grouping of frames, rather than a properly
defined ontology.
The motivation for introducing such a semantic
classification in VALLEX 1.0 was the fact that it
simplifies systematic checking of consistency and
allows for making more general observations about the
data.
3.16 Aspect, Aspectual Counterparts
Czech distinguishes perfective verbs (in VALLEX 1.0
marked as ‘pf.’ for short) and imperfective verbs
(marked as ‘impf.’); this characteristic is called
aspect. In VALLEX 1.0, the value of aspect is attached
to each word entry as a whole (i.e., it is the same for
all its frames and it is shared by the lemma variants,
if any).
Some verbs (e.g. informovat - to inform,
charakterizovat - to characterize) can be used in
different contexts either as perfective or as
imperfective (obouvidová slovesa, ‘biasp.’ for short).
Within imperfective verbs, there is a subclass of
iterative verbs (iter.). Czech iterative verbs are
derived in a more or less regular way by affixes such
as -va- or -iva-, and express extended and repetitive
actions (e.g. čítávat, chodívat). In VALLEX 1.0,
iterative verbs containing the double affix -va- (e.g.
chodívávat) are completely disregarded, whereas the
remaining iterative verbs occur as aspectual
counterparts in frame entries of the corresponding
non-iterative verbs (but still have no word entries of
their own).
A verb in a particular meaning can have aspectual
counterpart(s), i.e. verbs whose meaning is almost the
same except for the difference in aspect (that is why
the counterparts constitute a single lexical unit on
the tectogrammatical level of FGD; however, each of
them has its own word entry in VALLEX 1.0, because they
have different morphemic forms). The aspectual
counterpart(s) need not be the same for all the
meanings of the given verb; e.g., odpovědět is a
counterpart of odpovídat - to answer, but not of
odpovídat - to correspond. Therefore the aspectual
counterparts (if any) are listed in the frame attribute
‘asp. counterparts’ in VALLEX 1.0. Moreover, for
perfective or imperfective counterparts, not only the
lemmas are specified within the list, but (more
specifically) also the frame numbers of the counterpart
frames (which is of course not the case for the
iterative counterparts, for they have no word entries
of their own, as stated above).
One frame might have more than one counterpart for two
reasons. Either there are two counterparts with the
same aspect (impf. působit and impf. způsobovat for pf.
způsobit), or there are two counterparts with different
aspects (impf. scházet, pf. sejít, iter. scházívat).
3.17 Idiomatic frames
When building VALLEX 1.0, we focused mainly on primary
or usual meanings of verbs. We also noted many frames
corresponding to peripheral usages of verbs; however,
their coverage in VALLEX 1.0 is not exhaustive. We call
such frames idiomatic and mark them with the label
‘idiom’. An idiomatic frame is tentatively
characterized either by a substantial shift in meaning
(with respect to the primary sense), or by a small and
strictly limited set of possible lexical values in one
of its complementations, or by the occurrence of other
types of irregularity or anomaly.
4 Future Work
We plan to extend VALLEX in both quantitative and
qualitative respects. At the moment, word entries for
500 new verbs are being created, and further batches of
verbs will follow in the near future (again selected
with respect to their frequency). As for the
theoretical issues, we intend to focus on capturing the
structure of the set of frames/senses (e.g. the
relations between primary and metaphorical usages of a
verb), on improving the semantic classification of
frames, and on exploring the influence of
word-formation processes on valency frames (for
example, we expect regularities in the relations
between the valency frames of a basic verb and those of
a verb derived from it by prefixation).
Acknowledgements
VALLEX 1.0 has been created with the financial support
of the projects MSMT LN00A063 and GACR 405/04/0243.
We would like to thank our colleagues from CKL and
UFAL, especially Professor Jarmila Panevová, for their
extensive linguistic and technical advice.

References
Charles Fillmore. 2002. Framenet and the linking
between semantic and syntactic relations. In
Proceedings of COLING 2002, pages xxviii–xxxvi.
Jan Hajič, Jarmila Panevová, Zdeňka Urešová, Alevtina
Bémová, Veronika Kolářová, and Petr Pajas. 2003.
PDT-VALLEX: Creating a Large-coverage Valency Lexicon
for Treebank Annotation. In Proceedings of The Second
Workshop on Treebanks and Linguistic Theories, volume 9
of Mathematical Modeling in Physics, Engineering and
Cognitive Sciences, pages 57–68. Vaxjo University
Press, November 14–15, 2003.
Eva Hajičová and Ivona Kučerová. 2002. Argument/Valency
Structure in PropBank, LCS Database and Prague
Dependency Treebank: A Comparative Pilot Study. In
Proceedings of the Third International Conference on
Language Resources and Evaluation (LREC 2002), pages
846–851. ELRA.
Eva Hajičová, Jarmila Panevová, and Petr Sgall. 2000. A
Manual for Tectogrammatical Tagging of the Prague
Dependency Treebank.
Paul Kingsbury and Martha Palmer. 2002. From Treebank
to PropBank. In Proceedings of the 3rd International
Conference on Language Resources and Evaluation, Las
Palmas, Spain.
Beth C. Levin. 1993. English Verb Classes and
Alternations: A Preliminary Investigation. University
of Chicago Press, Chicago, IL.
Markéta Lopatková, Zdeněk Žabokrtský, Karolina
Skwarska, and Václava Benešová. 2002. Tektogramaticky
anotovaný valenční slovník českých sloves. Technical
Report TR-2002-15.
Markéta Lopatková. 2003. Valency in the Prague
Dependency Treebank: Building the Valency Lexicon.
Prague Bulletin of Mathematical Linguistics, (79–80).
Jarmila Panevová. 1992. Valency frames and the meaning
of the sentence. In Ph. L. Luelsdorff, editor, The
Prague School of Structural and Functional Linguistics,
pages 223–243, Amsterdam-Philadelphia. John Benjamins.
Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986.
The Meaning of the Sentence in Its Semantic and
Pragmatic Aspects. D. Reidel Publishing Company,
Dordrecht.
Hana Skoumalová. 2002. Verb frames extracted from
dictionaries. The Prague Bulletin of Mathematical
Linguistics 77.
Markéta Straňáková-Lopatková and Zdeněk Žabokrtský.
2002. Valency Dictionary of Czech Verbs: Complex
Tectogrammatical Annotation. In Proceedings of the
Third International Conference on Language Resources
and Evaluation (LREC 2002), volume 3, pages 949–956.
ELRA.
Naďa Svozilová, Hana Prouzová, and Anna Jirsová. 1997.
Slovesa pro praxi. Academia, Praha.
