A Karaka Based Approach to Parsing of Indian Languages 
Akshar Bharati Rajeev Sangal 
Department of Computer Science and Engineering 
Indian Institute of Technology Kanpur
Kanpur 208 016 India 
Abstract 
A karaka based approach for parsing of Indian languages is described. It has been used for building a parser of Hindi for a prototype Machine Translation system.
A lexicalized grammar formalism has been developed that allows constraints to be specified between 'demand' and 'source' words (e.g., between a verb and its karaka roles). The parser has two important novel features: (i) It has a local word grouping phase in which word groups are formed using 'local' information only. They are formed based on finite state machine specifications, thus resulting in a fast grouper. (ii) The parser is a general constraint solver. It first transforms the constraints to an integer programming problem and then solves it.
1. Introduction
Languages belonging to the Indian linguistic area share several common features. They are relatively free in word order, their nominals are inflected or take postposition case markers (collectively called vibhakti), and they have verb complexes consisting of sequences of verbs (possibly joined together into a single word), among other features. There are also commonalities in vocabulary, and in the senses spanned by a word in one language compared with those of its counterpart in another Indian language.
We base our grammar on the karaka (pronounced kaarak) structure. Although karakas are often thought of as similar to cases, they are fundamentally different:
"The pivotal categories of the abstract syntactic representation are the karakas, the grammatical functions assigned to nominals in relation to the verbal root. They are neither semantic nor morphological categories in themselves but correspond to semantics according to rules specified in the grammar and to morphology according to other rules specified in the grammar." [Kiparsky, 82].
Before describing our grammar formalism, let us look at the parser structure:

                        sentence
                            |
                            v
+----------------+   +------------------------+
| active lexicon |-->| morphological analyzer |
+----------------+   +------------------------+
                            |
                            | lexical entries
                            v
+-----------------+  +--------------------+
| verb form chart |->| local word grouper |
+-----------------+  +--------------------+
                            |
                            | word groups
                            v
+-----------------+  +-------------+
| karaka chart &  |->| core parser |
| lakshan charts  |  +-------------+
+-----------------+         |
                            v
               intermediate representation

The function of the morphological analyzer is to take each word in the input sentence and extract its root and other associated grammatical information. This information forms the input to the local word grouper (LWG).
2. Local Word Grouper (LWG) 
The function of this block is to form word groups on the basis of 'local information' (i.e., information based on adjacent words) that will need no revision later on. This implies that whenever there is a possibility of more than one grouping for some word, the LWG does not group that word with its neighbours.
This block has been introduced to 
reduce the load on the core parser 
resulting in increased efficiency and 
simplicity of the overall system. 
The following example illustrates 
the job done by the LWG. In the fol- 
lowing sentence in Hindi: 
ladake adhyapak ko haar pahana rahe hein 
boys teacher to garland garland -ing 
(Boys are garlanding the teacher.) 
the word 'ladake' forms one unit, the words 'adhyapak' and 'ko' form the next unit, and similarly 'pahana', 'rahe' and 'hein' form the last unit.
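The grouping step can be pictured as a small finite-state machine over the categories supplied by the morphological analyzer. The sketch below is a minimal illustration under stated assumptions, not the system's actual grouper: the category labels N, PSP, V and AUX and the function name `local_word_groups` are hypothetical, the categories are assigned by hand, and the check that leaves ambiguously groupable words ungrouped is not modelled.

```python
from typing import List, Tuple

# (word, category) pairs, as a morphological analyzer might supply them.
# The category labels are hypothetical: "N" noun, "PSP" postposition,
# "V" verb, "AUX" auxiliary.
Token = Tuple[str, str]


def local_word_groups(tokens: List[Token]) -> List[List[str]]:
    """Form word groups from adjacent words with a three-state machine.

    "start": the last group cannot be extended,
    "noun":  the last group is a noun that may take one postposition,
    "verb":  the last group is a verb that may take any number of auxiliaries.
    """
    groups: List[List[str]] = []
    state = "start"
    for word, cat in tokens:
        if state == "noun" and cat == "PSP":
            groups[-1].append(word)      # noun + postposition: 'adhyapak ko'
            state = "start"
        elif state == "verb" and cat == "AUX":
            groups[-1].append(word)      # verb + auxiliaries: 'pahana rahe hein'
        else:
            groups.append([word])        # any other word starts a new group
            state = {"N": "noun", "V": "verb"}.get(cat, "start")
    return groups


if __name__ == "__main__":
    sentence = [("ladake", "N"), ("adhyapak", "N"), ("ko", "PSP"), ("haar", "N"),
                ("pahana", "V"), ("rahe", "AUX"), ("hein", "AUX")]
    print(local_word_groups(sentence))
    # [['ladake'], ['adhyapak', 'ko'], ['haar'], ['pahana', 'rahe', 'hein']]
```

Each word is inspected once, so the grouper runs in time linear in the sentence length; 'haar' simply remains a unit of its own.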
3. Core Parser
The function of the core parser is to accept the input from the LWG and produce an 'intermediate language' representation (i.e., the parsed structure along with the identified karaka roles) of the given source language sentence. The core parser has to perform essentially two kinds of tasks:
1) karaka role assignment for verbs
2) sense disambiguation for verbs and nouns
For translating among Indian languages, assignment of karaka roles is sufficient; one need not do semantic role assignment after the karaka assignment.
Let us now look at the grammar. 
3.1 Grammar Formalism 
The notion of karaka* relation is central to the model. These are semantico-syntactic relations between the verb(s) and the nominals in a sentence. The computational grammar specifies a mapping from the nominals and the verb(s) in a sentence to karaka relations between them. Similarly, other rules of the grammar provide a mapping from karaka relations to (deep) semantic relations between the verb(s) and the nominals. Thus, the karaka relations by themselves do not give the semantics. They specify relations which mediate between the vibhakti of nominals and the verb form on one hand and semantic relations on the other [Bharati, Chaitanya & Sangal, 90].
..................................................
*Here, we use the word 'karaka' in an extended sense which includes 'hetu', 'tadarthya', etc. in addition to actual karakas.
For each verb, for one of its forms called the basic form, there is a default karaka chart. The default karaka chart specifies a mapping from vibhaktis to karakas when that verb form is used in a sentence. (The karaka chart has additional information besides vibhakti, pertaining to the 'yogyata' of the nominals. This serves to reduce the possible parses. Yogyata gives the semantic type that must be satisfied by the word group that serves in the karaka role.)
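A karaka restriction and the yogyata check it implies can be sketched as follows. This is an illustrative sketch under stated assumptions: the class and function names, the string "0" for the empty vibhakti, and the tiny is-a hierarchy (human under animate, cart under instrument) are all hypothetical; the two restrictions are those of the 'hitching' sense of jota shown in Fig. 1.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical is-a hierarchy used for the yogyata (semantic type) check.
PARENT: Dict[str, Optional[str]] = {
    "human": "animate", "animate": None,
    "cart": "instrument", "instrument": None, "land": None,
}


def is_a(sem_type: str, required: str) -> bool:
    """True if sem_type equals `required` or lies below it in the hierarchy."""
    current: Optional[str] = sem_type
    while current is not None:
        if current == required:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class KarakaRestriction:
    karaka: str                 # e.g. "karta", "karma"
    mandatory: bool             # not used by the check below; matters when parses are chosen
    vibhaktis: FrozenSet[str]   # allowed markers; "0" means no marker
    semantic_type: str          # yogyata: required semantic type


@dataclass(frozen=True)
class WordGroup:
    head: str
    vibhakti: str
    semantic_type: str


def satisfies(group: WordGroup, r: KarakaRestriction) -> bool:
    """A group qualifies only if both its vibhakti and its semantic type fit."""
    return group.vibhakti in r.vibhaktis and is_a(group.semantic_type, r.semantic_type)


def qualifying_karakas(group: WordGroup, chart: List[KarakaRestriction]) -> List[str]:
    return [r.karaka for r in chart if satisfies(group, r)]


# The two restrictions of the 'hitching' sense of jota (Fig. 1).
HITCH_CHART = [
    KarakaRestriction("karta", True, frozenset({"0"}), "human"),
    KarakaRestriction("karma", True, frozenset({"0", "ko"}), "cart"),
]

if __name__ == "__main__":
    for g in (WordGroup("Ram", "0", "human"),
              WordGroup("gaadii", "ko", "cart"),
              WordGroup("gaadii", "0", "cart")):
        print(g.head, g.vibhakti, qualifying_karakas(g, HITCH_CHART))
    # Ram 0 ['karta']
    # gaadii ko ['karma']
    # gaadii 0 ['karma']
```

In the last case 'gaadii' with the 0 vibhakti passes the vibhakti test for karta but fails its yogyata, so it qualifies only as karma; this is the sense in which yogyata reduces the number of possible parses.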
When a verb form other than the basic one occurs in a sentence, the applicable karaka chart is obtained by taking the default karaka chart and transforming it using the verb type and its form. The new karaka chart defines the mapping from vibhakti to karaka relations for the sentence. Thus, for example, 'jotata hai' (ploughs) in A.1 has the default karaka chart, which says that the karta takes no parsarg (Ram). However, for the forms of 'jota' in A.2, A.3 and A.4, the karaka chart is transformed so that the karta takes the vibhakti 'ne', 'ko' or 'se', respectively:
A.1 Ram khet ko jotata hai.
Ram farm ko-parsarg plough -s.
(Ram ploughs his farm.)
A.2 Ram ne khet ko jota.
Ram ne- farm ko- ploughed.
(Ram ploughed the farm.)
A.3 Ram ko khet jotana pada.
Ram ko- farm plough had-to.
(Ram had to plough the farm.)
A.4 Ram se khet nahi jota gaya.
Ram se- farm not plough could.
(Ram could not plough the farm.)
The above principle allows us to deal with actives and passives. The verb forms for active and passive are just two special cases of the forms a verb can take.
For example, the verb 'jota' in Hindi has four different meanings listed in the dictionary:
1) harness (e.g., Ram ne bail ko kolhu me jota, or Ram harnessed the bullock for (turning) the crusher.)
2) hitch the cart (e.g., Ram ne gaadii ko jota, or Ram hitched the cart.)
3) plough (e.g., Ram ne jamindar ka khet jota, or Ram ploughed the landlord's farm.)
4) exploit (e.g., Ram ne naukar ko kaam me jota diya, or Ram exploited his servant by putting him to (hard) work.)
For each of the four senses, a karaka chart can be created. A karaka chart specifies the mandatory karakas (i.e., those which must be filled for the sentence to be grammatical), optional karakas, and desirable karakas. For each of the karakas, it specifies the vibhakti (i.e., inflection or postposition marker) and the semantic specification (typically in the form of a semantic type) to be satisfied by the source word (group). Such a specification for a karaka in a karaka chart is called a karaka restriction.
Thus, the karaka chart for the 'hitching' sense of 'jota' has two mandatory karaka restrictions: one for karta karaka (pronounced kartaa kaarak) and the other for karma karaka (pronounced karm kaarak). The former karaka relation maps to the agent and the latter to the patient semantic relation. As shown in Fig. 1, the restriction for karta karaka says that a source word group satisfying it must be present in the sentence, its vibhakti must be 0, and its semantic type should be human.

restriction on karta karaka:
    karaka: karta
    mandatory: yes
    vibhakti: 0
    semantic expression: human
restriction on karma karaka:
    karaka: karma
    mandatory: yes
    vibhakti: 0-or-ko
    semantic expression: cart
Fig. 1: Karaka Chart for Jota (Sense 2)

3.2 Refining the Grammar Model
The actual grammar we use in the system is based on the model discussed above. However, it differs from it slightly so as to have a faster parser.
Instead of a separate karaka chart for each sense of a verb, we have a single merged karaka chart. It consists of a set of karaka restrictions where the restriction for a particular karaka relation is obtained by taking the logical-or of the necessary vibhakti and semantic types for the same karaka relation in the different karaka charts. For example, the semantic type in the restriction for karma karaka in the merged karaka chart is obtained by taking the logical-or of the semantic types in the karma karaka restrictions in the different karaka charts. Fig. 2 shows the merged karaka chart for jota.

Karaka   Necessity   Vibhakti   Semantic Type
--------------------------------------------------------------
karta    m           0          animate
karma    m           0-ko       animate or instrument or land
karana   d           se-dvara   animate or instrument
Fig. 2: Merged Karaka Chart for Jota (m = mandatory, d = desirable)

As the separate karaka charts are no longer available for distinguishing among the senses of the main verb, separate information is needed. This information is available in the form of lakshan charts or discrimination nets. These nets can be obtained by looking at the separate karaka charts and identifying features that help us in distinguishing among the different senses. An example lakshan chart for jota is given in Fig. 3.

[Fig. 3: Lakshan Chart for Jota]

Finally, besides the merged karaka charts associated with individual verbs, there is also a global table of common karakas. It pertains to adhikarana karaka (time and place), hetu (cause), etc. and is applicable to all the verbs. It can be used to account for source word groups that remain after satisfying the mandatory karakas. In this sense, it contains only optional karakas.

3.3 Parsing
For the task of karaka assignment, the core parser uses the fundamental principle of 'akanksha' (demand unit) and 'yogyata' (qualification of the source unit).
The linguistic units which play the role of demand and source word groups can vary depending on the parse cycle. In the case of simple sentences, only one cycle is needed, in which verb groups and some special noun groups (e.g., 'paas' (near), 'door' (far), etc.) play the role of demand word groups, and noun groups and predicative adjectives play the role of source word groups.
During the parsing process, each of the source word groups may be tested against each of the karaka restrictions in each of the karaka charts of the demand word groups. An appropriate data structure may be created storing the source word groups and the karaka restrictions (in karaka charts of demand groups) they satisfy. We call each such entry a candidate variable.
Typically, a number of source word groups will qualify for a particular demand. The job of the core parser is to make an appropriate assignment of the candidates, subject to certain constraints such as the following:
1) one candidate source word group cannot satisfy more than one demand of the same demand word.
2) every obligatory demand must be satisfied in some karaka chart of every demand word group.
3) every source word must have an assignment.
4) if more than one interpretation of a source word is available, then exactly one has to be selected.
The above problem is transformed to an integer programming problem. Assigning 1 to a candidate variable means that the particular karaka relation between the source word group and the demand word group holds; 0 means that it does not. All the various types of constraints mentioned above can be specified in a very natural manner using algebraic inequalities in integer programming. Having a set of candidate variables assigned to 1 not only identifies the karaka relations, which can be used to get the deep cases, but also identifies the karaka chart, which serves to identify the sense of the verb group, etc.
Moreover, integer programming also permits a linguist to express preferences among various candidates for a particular demand. For example, for most verbs an animate thing is more likely to be the karta than inanimate things, and among animate things, human beings are more likely candidates to be the karta than non-human candidates. In the absence of other information, these preferences would simply order the multiple parses, if any.
The parsing strategy actually adopted in the system makes use of the merged karaka chart and corresponds to Anvit-Abhidhanvad, a theory of the mimamsa school of the Indian grammatical tradition. In this approach, we first determine the karaka relationships among the demand and source word groups. (These are determined by testing the source word groups against the karaka restrictions in the merged karaka chart, and then solving the integer programming problem.) The word meaning is determined only later, using the lakshan charts on the karaka assignment.
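The integer programming formulation can be illustrated on a toy problem. In this minimal sketch each candidate variable is a (source, demand, karaka) triple, constraints 1 to 3 above are written as linear inequalities over 0/1 variables, and preferences enter as weights in the objective. Several things are assumptions rather than the system's design: brute-force enumeration of all 0/1 assignments stands in for a real integer programming solver (it is exponential and only suitable for illustration), each karaka is assumed to be filled at most once, constraint 4 and the lakshan chart step are not modelled, and the input is hypothetical: 'Ram' (human) and 'bail' (bullock, animate), both carrying the 0 vibhakti, so that under the merged chart of Fig. 2 each could be karta or karma.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

# A candidate variable: (source word group, demand word group, karaka).
Candidate = Tuple[str, str, str]
# A linear constraint: ({variable index: coefficient}, "<=" or ">=", bound).
Row = Tuple[Dict[int, int], str, int]


def build_constraints(cands: List[Candidate],
                      mandatory: Dict[str, Set[str]]) -> List[Row]:
    rows: List[Row] = []

    def where(pred: Callable[[Candidate], bool]) -> Dict[int, int]:
        return {i: 1 for i, c in enumerate(cands) if pred(c)}

    sources = sorted({c[0] for c in cands})
    demands = sorted({c[1] for c in cands})
    for s in sources:
        for d in demands:
            # (1) a source fills at most one demand of the same demand word
            rows.append((where(lambda c: c[0] == s and c[1] == d), "<=", 1))
        # (3) every source word group receives an assignment
        rows.append((where(lambda c: c[0] == s), ">=", 1))
    for d in demands:
        needed = mandatory.get(d, set())
        for k in sorted({c[2] for c in cands if c[1] == d} | needed):
            filled = where(lambda c: c[1] == d and c[2] == k)
            rows.append((filled, "<=", 1))      # assumption: one filler per karaka
            if k in needed:
                rows.append((filled, ">=", 1))  # (2) obligatory demand satisfied
    return rows


def holds(row: Row, x: Tuple[int, ...]) -> bool:
    coeffs, op, bound = row
    lhs = sum(a * x[i] for i, a in coeffs.items())
    return lhs <= bound if op == "<=" else lhs >= bound


def solve(cands: List[Candidate], weights: Dict[Tuple[str, str], int],
          mandatory: Dict[str, Set[str]]) -> List[Tuple[int, List[Candidate]]]:
    """Return all feasible 0/1 assignments, highest preference score first."""
    rows = build_constraints(cands, mandatory)
    parses = []
    for x in product((0, 1), repeat=len(cands)):
        if all(holds(r, x) for r in rows):
            chosen = [c for c, v in zip(cands, x) if v]
            score = sum(weights.get((s, k), 0) for s, _, k in chosen)
            parses.append((score, chosen))
    return sorted(parses, key=lambda p: -p[0])  # stable: ties keep enumeration order


MANDATORY = {"jota": {"karta", "karma"}}        # both marked 'm' in Fig. 2
CANDIDATES = [("Ram", "jota", "karta"), ("Ram", "jota", "karma"),
              ("bail", "jota", "karta"), ("bail", "jota", "karma")]
# Hypothetical weights: a human karta is preferred to a non-human one.
WEIGHTS = {("Ram", "karta"): 2, ("bail", "karta"): 1}

if __name__ == "__main__":
    for score, parse in solve(CANDIDATES, WEIGHTS, MANDATORY):
        print(score, parse)
    # 2 [('Ram', 'jota', 'karta'), ('bail', 'jota', 'karma')]
    # 1 [('Ram', 'jota', 'karma'), ('bail', 'jota', 'karta')]
```

Both role assignments are feasible; the preference weights only rank them, placing the parse with the human 'Ram' as karta first, in line with the preferences described above. A mandatory karaka with no candidate yields an empty constraint row that can never be met, so the problem becomes infeasible.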
4. Conclusions
The major features of our approach can be summarized as follows:
1) a parsing strategy based on 'akanksha' (demand) and 'yogyata' (qualification of the source unit). Note that the karaka charts expressing restrictions as above are similar to subcategorization and selectional restrictions, but are not identical to them. Subcategorization refers to deep cases, and selectional restrictions usually specify semantic types. Here we use karaka relations, and specify not just semantic types but also postposition markers. These ideas play a central role in our grammar and parser.
2) a parsing strategy that uses the merged karaka chart to do karaka assignment, and only later does the sense selection for nouns and verbs using lakshan charts.
3) formulation of the core parsing problem as an integer programming problem. It should be noted that integer programming is a general purpose technique making a large amount of power and flexibility available to the parser. This comes at the cost of efficiency if the number of variables to be handled simultaneously is large (though our current parser runs fairly fast). We are engaged in building a special constraint solver that will use this power only when necessary [Ramesh, 90].
Acknowledgement
We would like to acknowledge the principal source of ideas in this paper: Dr. Vineet Chaitanya.
The grammar and the parser described above are part of a machine translation system for Indian languages based on an interlingua [Sangal & Chaitanya, 87]. The generator in the system uses the same grammar. In principle, each of the stages of the parser is reversed [Sen Gupta, 89].

References

[Bharati, Chaitanya & Sangal, 90] A Computational Grammar for Indian Language Processing, A. Bharati, V. Chaitanya, and R. Sangal, Technical Report TRCS-90-96, Dept. of Computer Sc. & Engg., I.I.T. Kanpur, 1990.

[Kiparsky, 82] Some Theoretical Problems in Panini's Grammar, P. Kiparsky, Bhandarkar Oriental Research Institute, Pune, 1982.

[Ramesh, 90] Constraints in Logic Programming, P.V. Ramesh, M.Tech. thesis, Dept. of Computer Sc. & Engg., I.I.T. Kanpur, Mar. 1990.

[Sangal & Chaitanya, 87] An Intermediate Language for Machine Translation: An Approach based on Sanskrit using Conceptual Graph Notation, Computer Science & Informatics, J. of Computer Society of India, 17, 1, pp. 9-21, 1987.

[Sangal, Chaitanya & Karnick, 88] An Approach to Machine Translation in Indian Languages, Proc. of Indo-US Workshop on Systems and Signal Processing, Indian Institute of Science, Bangalore, Jan. 1988.

[Sen Gupta, 89] Some Aspects of Language Generation, Rimli Sen Gupta, M.Tech. thesis, Dept. of Electrical Engg., I.I.T. Kanpur, 1989.
