SURFACE AND DEEP CASES 
JARMILA PANEVOVA 
Institute of Formal and Applied Linguistics 
Charles University 
Prague, Czechoslovakia 
HANA SKOUMALOVA
Institute of Theoretical and Computational Linguistics 
Charles University 
Prague, Czechoslovakia 
Abstract 
In this paper we show the
relation between "surface
(morphological) cases" and "deep
cases" (participants), and a
possible way to automate the
creation of a syntactic
dictionary whose frames contain
information about the deep cases
of particular lexical items
(Czech verbs) and their morphemic
counterparts.
Introduction 
In the project MATRACE¹
(MAchine TRAnslation between 
Czech and English) the first aim 
is to create two parallel text 
corpora (Czech and English), 
morphologically and syntactically 
tagged. Then it will be possible 
to use these corpora not only for 
creating an MT system but also 
for other linguistic research, 
needed e.g. for systems of NL 
understanding. For these purposes
we try to make the syntactic
representation "broader" so that
further work will be easier.
¹ Project MATRACE, a research
project of the Institute of
Formal and Applied Linguistics
and the Institute of Theoretical
and Computational Linguistics, is
carried out within the IBM
Academic Initiative project in
Czechoslovakia.
In the syntactic representation 
of a sentence, based on depend- 
ency grammar, we will specify not 
only the dependency and syntactic 
roles of the modifications but 
also their underlying counter- 
parts (i.e. "deep cases"). For 
this sort of tagging we need a 
dictionary with morphological and 
syntactic information, which 
consists of morphological para- 
digms of single words and their 
valency frames containing both 
syntactic and underlying roles of 
their members. As there is no
such dictionary in
machine-readable form, we have to
create it. Unfortunately, we
cannot even extract the words
with their frames from an
existing corpus, as the corpus is
still being created. What we
have is a morphological diction- 
ary, which is to be enriched by 
the syntactic information. The 
linguist adding this information 
should enter the surface frame 
and specify its underlying coun- 
terpart. We try to help him/her 
by automating the choice of the 
appropriate correspondence 
between "surface" and "deep" 
cases. 
In this paper we will
concentrate on the verb and its
valency slots. Generalizing our
method to nouns and adjectives
should not be difficult, as in
many cases the syntactic frame of
these words is simply derived
from that of the corresponding
verb.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 885 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
Theoretical background 
Using the framework of the 
functional generative description 
(FGD, see Sgall et al. 1986),
slightly simplified for the pur- 
pose of this paper, we distin- 
guish two levels: a level of 
underlying structure (US, with 
the participants or "deep cases") 
and a level of surface structure 
(SS, morphemic units as parts of 
this are used here). As for the 
modifications of verbs we distin- 
guish inner participants and free 
modifications (see Panevová
1974-5). This can be understood as the
paradigmatical classification of 
all possible verbal modificati- 
ons. The other dimension of their 
classification (combinatoric or 
syntagmatic dimension) concerns 
their obligatoriness and optiona- 
lity with the particular lexical 
item within the verbal frame. The 
verbal frame contains slots for 
obligatory and optional inner 
participants (which will be 
filled by the labels for "deep 
cases" and corresponding mor- 
phemic forms) and obligatory free 
modifications. The difference
between an obligatory and an
optional participant is important
for a parser; however, we will
leave this dichotomy aside in
this contribution.
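The frame structure described above can be pictured as a small record type. The following Python sketch is only an illustration: the class and field names (`Slot`, `VerbFrame`, `deep_case`, `forms`, `inner`, `obligatory`) are assumptions of this sketch, not the format of the MATRACE dictionary. Obligatoriness may be left unspecified, in line with the decision to leave that dichotomy aside here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    """One slot of a verbal frame (all names here are illustrative)."""
    deep_case: str                     # underlying role label, e.g. "Act"
    forms: List[str]                   # alternative morphemic forms, e.g. ["Acc", "Clause"]
    inner: bool = True                 # True: inner participant; False: free modification
    obligatory: Optional[bool] = None  # None: left unspecified

    def __post_init__(self) -> None:
        # Free modifications enter a frame only when they are obligatory.
        if not self.inner and self.obligatory is not True:
            raise ValueError("only obligatory free modifications belong to a frame")


@dataclass
class VerbFrame:
    lemma: str
    slots: List[Slot] = field(default_factory=list)

    def surface_frame(self) -> List[List[str]]:
        """Return only the morphemic forms, slot by slot."""
        return [slot.forms for slot in self.slots]

    def deep_frame(self) -> List[str]:
        """Return only the deep case labels, slot by slot."""
        return [slot.deep_case for slot in self.slots]


# dát 'give': Nominative, Dative and Accusative slots with their deep cases.
give = VerbFrame("dát", [Slot("Act", ["Nom"]), Slot("Addr", ["Dat"]), Slot("Obj", ["Acc"])])
```

In this picture, the linguist enters what `surface_frame()` returns, and the procedure discussed in this paper is meant to propose what `deep_frame()` returns.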
The following operational 
criteria for distinguishing 
between inner participants and 
free modifications are used: If 
the verbal modification can occur 
only once with a single verb 
token and if the governing verbs 
for a particular modification may 
be listed, the modification is 
considered as an "inner partici- 
pant". There are five partici- 
pants: Actor, Objective, 
Addressee, Origin and Effect. The 
other modifications (Time, 
Locative, Direction, Aim, Reason, 
Instrument, Regard, Manner etc.) 
can reoccur with a single verb 
token and may modify any verb. 
With some verbs free modifica- 
tions can also enter the respect- 
ive verb frame: either the con- 
struction is ungrammatical with- 
out them (to behave HOW, to last 
HOW LONG, to live WHERE etc.) or 
they are semantically obligatory, 
although they can be omitted on 
the SS level. This can be tested 
by a dialogue of the following 
type: 
A. My friend came. 
B. Where? 
A. *I don't know. 
Unacceptability of the answer "I 
don't know" indicates that the 
modification where is a part of 
a verbal frame of the verb to 
come. 
According to the theory
proposed by Panevová (1974-5,
esp. § 5) the following
consequences are accepted here: If
a verb has only one inner
participant then this participant
is Actor. If a verb has two
participants then these are Actor
and Objective. As for the 1st and
2nd participant our approach is
similar to Tesnière's (1959).
However, if three or even more
slots of a verbal frame are
occupied then semantic
considerations are involved. This
is different from Tesnière's
solution and does not fully
coincide with Fillmore's
proposals (Fillmore 1968, 1970).
Determining the Addressee, 
Origin and Effect is rather dif- 
ficult and requires taking into 
account the combination of sur- 
face cases in the frame (includ- 
ing the form of the Objective), 
the animacy of single members of 
the frame etc. Though there is no 
one-to-one mapping between "deep 
cases" and "surface cases", we 
are able to discover certain 
regularities and provide some 
generalization reflected in an 
algorithm. 
Observation 
In inflectional languages 
with (morphological) cases it is 
apparent that some cases are 
typical for certain participants. 
Objective is typically realized 
as the Accusative and Addressee
as the Dative case. In Czech
there are other typical
(prepositional) cases. Thus
z+Genitive (out of sb, st) or
od+Genitive (from sb, st) are
typical for Origin, na+Accusative
(at st), do+Genitive (to st) or
v+Accusative (into sb, st) are
typical for Effect etc. This well
known fact led us to the idea of
creating a program as a tool for
introducing verbal frames (to be
used even by researchers without
deep linguistic training) based
on correspondences between
surface and deep cases. At first
we sorted the Czech verbs into
four groups:
1. Verbs without Nominative in
their frames.
Examples:
prší
[(it) rains]
hučí mi(Act(Dat)) v hlavě
[(it) is buzzing to me in head]
(my head is buzzing)
This group contains verbs with
empty frames but also a few verbs
with very untypical frames. If
the frame contains only one
participant, then this is
obviously an Actor. If there are
at least two participants in the
frame and one of them is Dative,
then this is the Actor. If,
beside this, only one more
participant occurs in the frame,
it is necessarily the Objective.
All other verbs must be treated
individually by a linguist as a
kind of exception.
2. Verbs with Nominative and at
most one more inner participant.
Examples:
on(Act(Nom)) zemřel
[he died]
Jan(Act(Nom)) vidí Marii(Obj(Acc))
[John sees Mary]
ze semene(Obj(Prep(z)+Gen)) vyrostl strom(Act(Nom))
[from a seed grew a tree]
to(Obj(Nom)) se mi(Act(Dat)) líbí
[it to me appeals] (I like it)
According to the theory, if the
frame contains only one
participant, it is Actor; if it
contains two participants, one of
them is Actor and the other is
Objective. Nominative usually
represents the Actor, but there
is an exception to this rule: if
the other participant is in
Dative, then this participant is
the Actor and the Nominative
represents the Objective. The
reasonableness of this exception
can be shown by translating
particular verbs into other
languages, in which the surface
frames are different while there
is no obvious reason why the deep
frames should differ. Thus e.g.
the verb líbit se has
Nominative/Clause and Dative in
its surface frame, while in the
frame of the corresponding
English verb to like there are
Subject and Object/Clause, where
Subject corresponds to Czech
Dative and Object to Nominative.
3. Verbs with Nominative and two
or more other inner participants,
which occur only in "typical"
cases (i.e. Accusative, Dative,
z+Genitive, od+Genitive,
na+Accusative, do+Genitive,
v+Accusative). A verb belongs to
this group even if some of the
slots for inner participants can
be occupied either by a typical
case or any other (prepositional)
case or a clause or infinitive.
Examples:
Jan(Act(Nom)) dal Marii(Addr(Dat)) knihu(Obj(Acc))
[John gave Mary a book]
Otec(Act(Nom)) udělal dětem(Addr(Dat)) ze dřeva(Orig(Prep(z)+Gen)) panáčka(Obj(Acc))
[father made to children out of wood a puppet]
The verbs of the third group
behave "typically", which means
that Nominative represents the
Actor, Accusative the Objective,
Dative the Addressee etc.
4. Other verbs, i.e. verbs with
Nominative and two or more other
inner participants, which occur 
not only in typical cases. 
Examples:
Šéf(Act(Nom)) jmenoval Jana(Obj(Acc)) zástupcem(Eff(Instr))
[boss appointed John a deputy]
Jan(Act(Nom)) obklopil Marii(Addr(Acc)) péčí(Obj(Instr))
[John surrounded Mary with care]
In this group Nominative always
represents Actor, but to
determine the other participants
it is necessary to take into
account an additional aspect,
namely the prototypical animacy
of the participants; this enables
us to distinguish between the
deep frames of the last two
examples, jmenovat and obklopit.
The surface frames are identical
(Nominative, Accusative and
Instrumental), but while the verb
jmenovat has Accusative standing
for the Objective and
Instrumental for the Effect, the
verb obklopit has Accusative
standing for the Addressee and
Instrumental for the Objective.
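The sorting into four groups and the rules stated for the first two groups can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program described in this paper: a frame is represented as a list of slots, each slot being a list of alternative surface forms written as strings such as "Nom" or "z+Gen", and the function names are hypothetical. Where the text says that a verb must be treated individually, the sketch returns None.

```python
from typing import List, Optional

Forms = List[str]  # alternative surface forms of one inner participant

# The "typical" cases listed for the third group ("do" is given here with the
# Genitive, as in the paper's list of forms typical for Effect).
TYPICAL = {"Acc", "Dat", "z+Gen", "od+Gen", "na+Acc", "do+Gen", "v+Acc"}


def verb_group(frame: List[Forms]) -> int:
    """Return the group (1-4) of a verb from the slots of its inner participants."""
    others = [slot for slot in frame if "Nom" not in slot]
    if len(others) == len(frame):
        return 1                      # no Nominative in the frame
    if len(others) <= 1:
        return 2                      # Nominative and at most one more participant
    if all(any(form in TYPICAL for form in slot) for slot in others):
        return 3                      # every other slot admits a typical case
    return 4


def label_group1(frame: List[Forms]) -> Optional[List[str]]:
    """Deep cases for a group 1 frame; None means a linguist has to decide."""
    if len(frame) <= 1:
        return ["Act"] * len(frame)
    datives = [i for i, slot in enumerate(frame) if "Dat" in slot]
    if len(frame) == 2 and len(datives) == 1:
        labels = ["Obj", "Obj"]
        labels[datives[0]] = "Act"    # the Dative participant is the Actor
        return labels
    return None


def label_group2(frame: List[Forms]) -> List[str]:
    """Deep cases for a group 2 frame (Nominative plus at most one participant)."""
    if len(frame) == 1:
        return ["Act"]
    nom = 0 if "Nom" in frame[0] else 1
    other = 1 - nom
    labels = ["", ""]
    # Exception: a Dative partner is the Actor, the Nominative is the Objective.
    labels[other], labels[nom] = ("Act", "Obj") if "Dat" in frame[other] else ("Obj", "Act")
    return labels
```

For the surface frame of líbit se, `label_group2([["Nom", "Clause"], ["Dat"]])` yields Objective for the Nominative slot and Actor for the Dative slot, as in the example above.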
Algorithmization
The algorithms for the verbs
of the first two groups were
described in the previous
section.
The possible algorithmiza- 
tion of determining the corre- 
spondences between "surface" and 
"deep" cases of the verbs of the 
last two groups can be seen from 
the following table of several 
Czech verbs with different 
frames: 
Verb        Gloss          Pat           Addr
udělat      make           Acc
vzít        take           Acc           (Dat)
dostat      get            Acc
požadovat   ask (for)      Acc/Cl
měnit       change         Acc           (Dat)
zaplatit    pay            Acc/za+Acc    Dat
dědit       inherit        Acc
vyprávět    talk           Acc/Cl        (Dat)
vědět       know           Acc/Cl
spojit      connect        s+Instr       Acc
blahopřát   congratulate   k+Dat/Cl      Dat
obklopit    surround       Instr         Acc
stát se     become         Instr
jmenovat    appoint        Acc
žádat       ask (for)      o+Acc         Acc
hovořit     speak          o+Loc         (s+Instr)
pomáhat     help           s+Instr/INF   Dat
ptát se     ask            na+Acc/Cl     Acc
říkat       ask (for)      o+Acc         Dat
vsadit se   bet            o+Acc         s+Instr

Entries of the Orig and Eff columns, in source order (row
alignment not preserved): z+Gen, (od+Gen), od+Gen, (od+Gen),
na+Acc, (po+Loc), z+Gen, o+Loc, o+Loc, Instr
We can see that the prepositional 
cases "typical" for Origin occur 
only in the position of Origin, 
and Dative occurs only in the 
position of Addressee. After 
these members of the surface 
frame are determined, in most 
cases only one undetermined par- 
ticipant remains, which must be 
Objective. If two or three
participants remain, we have
to take into account the animacy
(typical for Addressee) and in- 
animacy of the participants and 
the set of prepositional cases 
which are typical for Effect. 
This algorithm is used in a
program which reads Czech verbs
from an input file and asks a
linguist (in interactive mode)
to fill in the surface verbal
frame.
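The steps described above for frames with a Nominative and two or more other participants can be sketched in Python as follows. This is a hedged illustration, not the authors' program. Several points are assumptions of the sketch: each participant is given as a pair of a surface form and a prototypical animacy flag supplied by the linguist; the animacy step is applied before the Effect forms; and animacy is used only when exactly one undecided participant is animate and all the others are inanimate. The function name `assign_deep_cases` is hypothetical. Slots that the sketch cannot decide are left as None for the linguist, which includes the frame of jmenovat, where both remaining participants are animate.

```python
from typing import List, Optional, Tuple

ORIGIN_FORMS = {"z+Gen", "od+Gen"}            # typical for Origin
EFFECT_FORMS = {"na+Acc", "do+Gen", "v+Acc"}  # typical for Effect

# (surface form, prototypically animate?); None stands for "unknown".
Participant = Tuple[str, Optional[bool]]


def assign_deep_cases(frame: List[Participant]) -> List[Optional[str]]:
    """Label a frame that contains a Nominative; None marks an undecided slot."""
    labels: List[Optional[str]] = [None] * len(frame)
    for i, (form, _) in enumerate(frame):
        if form == "Nom":
            labels[i] = "Act"
        elif form in ORIGIN_FORMS:
            labels[i] = "Orig"
        elif form == "Dat":
            labels[i] = "Addr"

    undecided = [i for i, label in enumerate(labels) if label is None]

    # Assumed order of the remaining steps: animacy first, then Effect forms.
    if len(undecided) > 1 and "Addr" not in labels:
        animate = [i for i in undecided if frame[i][1] is True]
        inanimate = [i for i in undecided if frame[i][1] is False]
        if len(animate) == 1 and len(inanimate) == len(undecided) - 1:
            labels[animate[0]] = "Addr"
            undecided.remove(animate[0])

    if len(undecided) > 1:
        effect = [i for i in undecided if frame[i][0] in EFFECT_FORMS]
        if len(effect) == 1:
            labels[effect[0]] = "Eff"
            undecided.remove(effect[0])

    # A single participant left over must be the Objective.
    if len(undecided) == 1:
        labels[undecided[0]] = "Obj"
    return labels


# obklopit 'surround': animate Accusative, inanimate Instrumental.
obklopit = assign_deep_cases([("Nom", True), ("Acc", True), ("Instr", False)])
# → ["Act", "Addr", "Obj"]
```

Leaving undecided slots empty rather than guessing fits the interactive setting: the linguist confirms or completes the proposal.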
Conclusions
Some general linguistic 
statements concerning relations 
between "centre" (prototypes) and 
"periphery" (marginality) in the 
domain of verb and its valency 
could be inferred from an appli- 
cation of the rules presented in 
our paper. In "nominative"
languages the verbal frame Act Obj
Addr can be considered as central
(while e.g. Act (Obj) Addr is not
typical). Moreover, the
correspondences between US and SS
such as Act -> Nom, Obj -> Acc,
Addr -> Dat can be treated as
prototypes (while e.g. the
correspondences Act -> Dat,
Addr -> Acc, Obj -> Instr
occur in Czech as marginal). The
strategy of our algorithm is 
based principally on an observa- 
tion of this type. We assume that 
this method can be easily adapted 
for any other inflectional lan- 
guage and perhaps also for such 
languages as English. Languages
may differ as to the
correspondences between a
particular deep case (US) and its
surface (morphemic) form, but the
idea of prototypical and marginal
relations seems to be valid and
is supported by the algorithmic
procedure for determining these
correspondences.
References:
Fillmore, Ch. (1968): The Case
for Case. In: Universals in
Linguistic Theory (ed. E.
Bach, R. T. Harms), New York,
pp. 1-88.
Fillmore, Ch. (1970): Subjects,
Speakers and Roles.
Synthese, Vol. 21, pp. 251-
274.
Panevová, J. (1974-5): On Verbal
Frames in Functional
Generative Description, Part I,
Prague Bulletin of
Mathematical Linguistics,
Vol. 22, 1974, pp. 3-40,
Part II, ibid., Vol. 23,
1975, pp. 17-37.
Sgall, P. - Hajičová, E. -
Panevová, J. (1986): The
Meaning of the Sentence in Its
Semantic and Pragmatic
Aspects, Prague - Dordrecht.
Tesnière, L. (1959): Éléments de
syntaxe structurale, Paris.
