PARSING TURKISH USING THE LEXICAL FUNCTIONAL GRAMMAR FORMALISM 1 
Zelal Güngördü Kemal Oflazer
Centre for Cognitive Science 
University of Edinburgh 
Edinburgh, Scotland, U.K. 
gungordu@cogsci.ed.ac.uk
Department of Computer Engineering 
Bilkent University 
Bilkent, Ankara, Turkey 
ko@cs.bilkent.edu.tr
Abstract This paper describes our work on parsing Turk- 
ish using the lexical-functional grammar formalism. This 
work represents the first effort for parsing Turkish. Our 
implementation is based on Tomita's parser developed at 
Carnegie-Mellon University Center for Machine Transla- 
tion. The grammar covers a substantial subset of Turkish 
including simple and complex sentences, and deals with a 
reasonable amount of word order freeness. The complex 
agglutinative morphology of Turkish lexical structures is 
handled using a separate two-level morphological analyzer. 
After a discussion of key relevant issues regarding Turkish 
grammar, we discuss aspects of our system and present results
from our implementation. Our initial results suggest
that our system can parse about 82% of the sentences directly 
and almost all the remaining with very minor pre-editing. 
1 INTRODUCTION 
As part of our ongoing work on the development of computational
resources for natural language processing in Turkish, we have
undertaken the development of a parser for Turkish using the
lexical-functional grammar formalism, for use in a number of
applications. This work represents the first approach to the
computational analysis of Turkish, though there have been a number
of studies of Turkish syntax from a linguistic perspective (e.g.,
[Meskill 1970]). Our implementation is based on Tomita's parser
developed at Carnegie-Mellon University Center for Machine
Translation [Musha et.al. 1988, Tomita 1987]. Our grammar covers a
substantial subset of Turkish including simple and complex
sentences, and deals with a reasonable amount of word order
freeness.
Turkish has two characteristics that have to be taken into
account: agglutinative morphology, and rather free word order with
explicit case marking. We handle the rather complex agglutinative
morphology of the Turkish lexical structures using a separate
morphological processor based on the two-level paradigm
[Evans 1990, Oflazer 1993] that we have integrated with the
lexical-functional grammar parser. Word order freeness is dealt
with by relaxing the order of phrases in the phrase structure parts
of lexical-functional grammar rules by means of generalized phrases.
1 This work was done as a part of the first author's M.Sc. degree work
at the Department of Computer Engineering of Bilkent University, Ankara,
06533 Turkey.
2 LEXICAL-FUNCTIONAL GRAMMAR 
Lexical-functional grammar (LFG) is a linguistic theory
which fits nicely into computational approaches that use
unification [Shieber 1986]. A lexical-functional grammar
assigns two levels of syntactic description to every sen- 
tence of a language: a constituent structure and a functional 
structure. Constituent structures (c-structures) characterize 
the phrase structure configurations as a conventional phrase 
structure tree, while surface grammatical functions such as 
subject, object, and adjuncts are represented in functional 
structure (f-structure). Because of space limitations we will
not go into the details of the theory. One can refer to Kaplan
and Bresnan [Kaplan and Bresnan 1982] for a thorough
discussion of the LFG formalism.
3 TURKISH SYNTAX 
In this section, we would like to highlight two of the rele- 
vant key issues in Turkish grammar, namely highly inflected 
agglutinative morphology and free word order, and give a 
description of the structural classification of Turkish sen- 
tences that we deal with. 
3.1 Morphology 
Turkish is an agglutinative language with word structures
formed by productive affixations of derivational and inflectional
suffixes to root words [Oflazer 1993]. This extensive
use of suffixes causes morphological parsing of words to be
rather complicated, and results in ambiguous lexical
interpretations in many cases. For example:
(1) çocukları
a. child+PLU+3SG-POSS      his children
b. child+3PL-POSS          their child
c. child+PLU+3PL-POSS      their children
d. child+PLU+ACC           children (acc.)
Such ambiguity can sometimes be resolved at phrase
and sentence levels by the help of agreement requirements
though this is not always possible:
(2a) Onların çocukları geldiler.
     it+PLU+GEN child+PLU+3PL-POSS come+PAST+3PL
     (Their children came.)
Table 1: Percentage of different word orders in Turkish.
[Columns: Sentence Type, Children's Speech, Adult Speech. The cell
values are not recoverable from the source.]
(2b) Çocukları geldiler.
     child+PLU+3SG-POSS come+PAST+3PL
     (His children came.)
     child+PLU+3PL-POSS come+PAST+3PL
     (Their children came.)
For example, in (2a) only the interpretation (1c) (i.e., their
children) is possible because:
• the agreement requirement between the modifier and
the modified parts in a possessive compound noun
eliminates (1a).2
• the facts that gel (come) does not subcategorize for
an accusative marked direct object, and that in Turkish
the subject of a sentence must be nominative3 eliminate
(1d).
• the agreement requirement between the subject and the
verb of a sentence eliminates (1b).4
In (2b), both (1a) and (1c) are possible (his children, and
their children, respectively) because the modifier of the
possessive compound noun is a covert one: it may be either
onun (his) or onların (their). The other two interpretations
are eliminated due to the same reasons as in (2a).
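The elimination argument for (2a) and (2b) can be sketched as a small filter over the four readings in (1). This is an illustrative sketch only: the feature names (`number`, `poss`, `case`) and the function `surviving_readings` are assumptions introduced here, not the analyzer's actual output format, and the possessive check is strict, so the exception described in footnote 2 is deliberately left out in order to mirror the outcome stated in the text.

```python
# Hypothetical encoding of the four readings of "çocukları" from (1).
READINGS = {
    "1a": {"number": "PL", "poss": "3SG", "case": "NOM"},  # his children
    "1b": {"number": "SG", "poss": "3PL", "case": "NOM"},  # their child
    "1c": {"number": "PL", "poss": "3PL", "case": "NOM"},  # their children
    "1d": {"number": "PL", "poss": None, "case": "ACC"},   # children (acc.)
}


def subject_agr(reading):
    """Agreement feature the reading would have as a subject."""
    return "3PL" if reading["number"] == "PL" else "3SG"


def surviving_readings(modifier_agr, verb_agr):
    """Return the labels of readings compatible with the sentence.

    modifier_agr: agreement of an overt genitive modifier ("3SG" or "3PL"),
                  or None when the modifier is covert.
    verb_agr:     agreement feature of the verb.
    """
    kept = []
    for label, reading in READINGS.items():
        # "gel" takes no accusative object and the subject must be nominative.
        if reading["case"] != "NOM":
            continue
        # Modifier and possessive marking must agree (strict version).
        if modifier_agr is not None and reading["poss"] != modifier_agr:
            continue
        # Subject-verb agreement: a 3PL subject may take a 3SG or 3PL verb,
        # otherwise the features must be identical.
        subj = subject_agr(reading)
        if subj == "3PL":
            ok = verb_agr in ("3SG", "3PL")
        else:
            ok = verb_agr == subj
        if ok:
            kept.append(label)
    return kept


reading_2a = surviving_readings("3PL", "3PL")  # overt modifier "onların"
reading_2b = surviving_readings(None, "3PL")   # covert modifier
```

With an overt third person plural modifier only (1c) survives, while with a covert modifier both (1a) and (1c) remain, as described for (2a) and (2b).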
3.2 Word Order 
If we concern ourselves with the typical order of constituents,
Turkish can be characterized as being a subject-object-verb
(SOV) language, though the data in Table 1 from Erguvanlı
[Erguvanlı 1979] shows that other orders for constituents are
also common (especially in discourse).
In Turkish it is not the position, but the case of a noun
phrase that determines its grammatical function in the sentence.
Consequently the typical order of the constituents may
change rather freely without affecting the grammaticality
of a sentence. Due to various syntactic and pragmatic
constraints, sentences with the non-typical orders are not
2 The agreement of the modifier must be the same as the possessive of
the modified with the exception that if the modifier is third person plural
the possessive of the modified may be third person singular.
3 In Turkish, the nominative case is unmarked.
4 In a Turkish sentence, person features of the subject and the verb
should be the same. This is true also for the number features with one
exception: third person plural subjects may sometimes take third person
singular verbs.
stylistic variants of the typical versions which can be used
interchangeably in any context [Erguvanlı 1979]. For example,
a constituent that is to be emphasized is generally
placed immediately before the verb. This affects the places
of all the constituents in a sentence except that of the verb:
(3a) Ben çocuğa kitabı verdim.
     I child+DAT book+ACC give+PAST+1SG
     (I gave the book to the child.)
(3b) Çocuğa kitabı ben verdim.
     child+DAT book+ACC I give+PAST+1SG
     (It was me who gave the child the book.)
(3c) Ben kitabı çocuğa verdim.
     I book+ACC child+DAT give+PAST+1SG
     (It was the child to whom I gave the book.)
(3a) is an example of the typical word order whereas in
(3b) the subject, ben, is emphasized. Similarly, in (3c) the
indirect object, çocuğa, is emphasized.
In addition to these possible changes, the verb itself may
move away from its typical place, i.e., the end of the sentence.
Such sentences are called inverted sentences and are
typically used in informal prose and discourse.
However, this looseness of ordering constraints at sentence
level does not extend into all syntactic levels. There
are even constraints at sentence level:
• A nominative direct object should be placed immediately
before the verb.5 Hence, (5b) is ungrammatical:6
(5a) Ben çocuğa kitap verdim.
     I child+DAT book give+PAST+1SG
     (I gave a book to the child.)
(5b) *Çocuğa kitap ben verdim.
     child+DAT book I give+PAST+1SG
• Some adverbial complements of quality (those that are
actually qualitative adjectives) always precede the verb or,
if it exists, the indefinite direct object:
(6a) Yemeği iyi pişirdin.
     meal+ACC good cook+PAST+2SG
     (You cooked the meal well.)
(6b) İyi yemeği pişirdin.
     good meal+ACC cook+PAST+2SG
     (You cooked the good meal.)
(6c) İyi yemek pişirdin.
     good meal cook+PAST+2SG
     (You cooked a good meal./You cooked a meal well.)
Note that although (6b) is grammatical, iyi is no longer an
adverbial complement, but is an adjective that modifies
yemeği. Note also that (6c) is ambiguous: iyi can be
interpreted either as an adjective modifying yemek or as an
5 In Turkish, a transitive verb that subcategorizes for a direct object can
take either an accusative marked or a nominative marked (unmarked on
the surface) noun phrase for that object. The function of accusative case
marking is to indicate that the object refers to a particular definite entity,
though there are very rare cases where this is not the case.
6 Note that (3b,c) are grammatical since the direct object, kitabı, is
marked accusative.
adverb modifying pişirdin.7
3.3 Structural Classification of Sentences
The following summarizes the major classes of sentences in 
Turkish. 
• Simple Sentences: A simple sentence contains only one
independent judgement. The sentences in (2), (3), (4a),
(5a), and (6) are all examples of simple sentences.
• Complex Sentences: In Turkish, a sentence can be transformed
into a construction with a verbal noun, a participle
or a gerund by affixing certain suffixes to the verb of the
sentence. Complex sentences are those that include such
dependent (subordinate) clauses as their constituents, or as
modifiers of their constituents. Dependent clauses may
themselves contain other dependent clauses. So, we may
have embedded structures such as:
(7) Burada içilebilecek su bulamayacağımı zannetmek doğru olmazdı.
    burada           here+LOC
    içilebilecek     drink+PASS+POT+FUT+PART
    su               water
    bulamayacağımı   find+NEG-POT+FUT+PART+1SG-POSS+ACC
    zannetmek        think+INF
    doğru            right
    olmazdı          be+NEG+AOR+PAST+3SG
    (It wouldn't have been right for me to think that I wouldn't
    be able to find drinkable water here.)
The subject of (7) (burada içilebilecek su bulamayacağımı
zannetmek - to think that I wouldn't be able to find
drinkable water here) is a nominal dependent clause whose
definite object (burada içilebilecek su bulamayacağımı -
that I wouldn't be able to find drinkable water here) is an
adjectival dependent clause which acts as a nominal one.
The indefinite object of this definite object (içilebilecek su
- drinkable water) is a compound noun whose modifier
part is another adjectival dependent clause (içilebilecek -
drinkable), and modified part is a noun (su - water).
It should be noted that there are other types of sentences
in the classification according to structure. However, we
will not be concerned with them here because of space
limitations. (See Şimşek [Şimşek 1987], and Güngördü
[Güngördü 1993] for details.)
4 SYSTEM ARCHITECTURE AND IMPLEMENTATION
We have implemented our parser in the grammar development
environment of the Generalized LR Parser/Compiler
developed at Carnegie Mellon University Center for Machine
Translation. No attempt has been made to include
7 The second interpretation is possible since yemek is an indefinite direct
object.
[Diagram: the input sentence is passed to the morphological analyzer
and the lexicon with argument structures; the sentence with its
morphological analyses and argument structure(s) is then passed to the
parser with the Turkish LFG, which outputs the f-structure(s).]
Figure 1: The system architecture.
morphological rules as the parser lets us incorporate our
own morphological analyzer for which we use a full scale
two-level specification of Turkish morphology based on a
lexicon of about 24,000 root words [Oflazer 1993]. This
lexicon is mainly used for morphological analysis and has
limited additional syntactic and semantic information, and
is augmented with an argument structure database.8
Figure 1 shows the architecture of our system. When
a sentence is given as input to the program, the program
first calls the morphological analyzer for each word in the
sentence, and keeps the results of these calls in a list to
be used later by the parser.9 If the morphological analyzer
fails to return a structure for a word for any reason (e.g.,
the lexicon may lack the word or the word may be misspelled),
the program returns with an error message. After
the morphological analysis is completed, the parser is
invoked to check whether the sentence is grammatical. The
parser performs bottom-up parsing. During this analysis,
whenever it consumes a new word from the sentence, it
picks up the morphological structure of this word from the
list. If the word is a finite verb or an infinitival, the parser is
also provided with the subcategorization frame of the word.
At the end of the analysis, if the sentence is grammatical,
its f-structure is output by the parser.
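The control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation (which runs inside the Generalized LR Parser/Compiler environment): the names `run_pipeline`, `MorphologyError`, `toy_analyze` and `toy_parse`, the dictionary-based feature structures, and the toy lexicon are all hypothetical stand-ins introduced here.

```python
class MorphologyError(Exception):
    """Raised when the (hypothetical) analyzer returns nothing for a word."""


def run_pipeline(sentence, analyze, subcat_frames, parse):
    """Analyze every word first, then hand the stored results to the parser."""
    # Step 1: call the morphological analyzer once per word and keep the
    # results in a list; stop with an error if any word has no analysis.
    analyses = []
    for word in sentence.split():
        readings = analyze(word)
        if not readings:
            raise MorphologyError(f"no morphological analysis for {word!r}")
        analyses.append(readings)

    # Step 2: verbal readings are additionally given a subcategorization
    # frame looked up by root.
    for readings in analyses:
        for reading in readings:
            if reading.get("CAT") == "V":
                reading["SUBCAT"] = subcat_frames.get(reading["R"], [])

    # Step 3: the parser consumes the stored analyses word by word.
    return parse(analyses)


# Toy stand-ins for the analyzer, the argument structure database and the parser.
TOY_LEXICON = {
    "ben": [{"CAT": "N", "R": "ben", "CASE": "NOM", "AGR": "1SG"}],
    "geldim": [{"CAT": "V", "R": "gel", "AGR": "1SG"}],
}


def toy_analyze(word):
    return [dict(r) for r in TOY_LEXICON.get(word.lower(), [])]


def toy_parse(analyses):
    return analyses  # a real parser would build an f-structure here


result = run_pipeline("Ben geldim", toy_analyze, {"gel": []}, toy_parse)
```

Analyzing all words before parsing means a misspelled or unknown word is reported before any parsing effort is spent.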
8 The morphological analyzer returns a list of feature-value pairs. For
instance for the word evdekilerin (of those (things) in the house/your things
in the house) it will return
1. ((*CAT* N) (*R* "ev") (*CASE* LOC) (*CONV* ADJ
"ki") (*AGR* 3PL) (*CASE* GEN))
2. ((*CAT* N) (*R* "ev") (*CASE* LOC) (*CONV* ADJ
"ki") (*AGR* 3PL) (*POSS* 2SG))
9 Recall that there may be a number of morphologically ambiguous
interpretations of a word. In such a case, the morphological analyzer
returns all of the possible morphological structures in a list, and the parser
takes care of the ambiguity regarding the grammar rules.
Table 2: The number of rules for each category in the grammar.
Category                  Number of Rules
Noun phrases              17
Adjectival phrases        10
Postpositional phrases    24
Adverbial constructs      50
Verb phrases              21
Dependent clauses         14
Sentences                 6
Lexical look up rules     11
TOTAL                     153
5 THE GRAMMAR
In this section, we present an overview of the LFG specification
that we have developed for Turkish syntax. Our
grammar includes rules for sentences, dependent clauses,
noun phrases, adjectival phrases, postpositional phrases,
adverbial constructs, verb phrases, and a number of lexical
look up rules.10 Table 2 presents the number of rules
for each category in the grammar. There are also some
intermediary rules, not shown here.
Recall that the typical order of constituents in a sentence
may change due to a number of reasons. Since the order of
phrases is fixed in the phrase structure component of an LFG
rule, this rather free nature of word order in sentence level
constitutes a major problem. In order to keep from using a
number of redundant rules we adopt the following strategy
in our rules: We use the same place holder, <XP>, for all
the syntactic categories in the phrase structure component
of a sentence or a dependent clause rule, and check the
categories of these phrases in the equations part of the rule.
In Figure 2, we give a grammar rule for the sentence with two
constituents, with an informal description of the equation
part.11
Recall also that an indefinite object should be placed
immediately before the verb, and some adverbial complements
of quality (those that are actually qualitative adjectives)
always precede the verb or, if it exists, the indefinite direct
object. In our grammar, we treat such objects and adverbial
complements as parts of the verb phrase. So, we do not
check these constraints at the sentence or dependent clause
level.
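The <XP> strategy and the checks listed in Figure 2 can be illustrated with a short sketch. It is not the grammar's actual equation code: the function `apply_sentence_rule`, the dictionary representation of constituents, and the `object_cases` field are assumptions made for illustration. The sketch also generalizes the two-constituent rule to any number of constituents and uses case as a stand-in for thematic role when checking that each required object is filled exactly once.

```python
def apply_sentence_rule(constituents):
    """Build a flat f-structure from order-free constituents, or return None."""
    # 1) Exactly one constituent must be the verb phrase.
    verbs = [c for c in constituents if c["cat"] == "VP"]
    if len(verbs) != 1:
        return None
    verb = verbs[0]
    fs = {"VERB": verb, "SUBJ": None, "OBJS": [], "ADVCOMPLEMENTS": []}

    # 2) Classify the remaining constituents by category and case,
    #    regardless of their position in the sentence.
    for c in constituents:
        if c is verb:
            continue
        if c["cat"] == "ADVP":
            fs["ADVCOMPLEMENTS"].append(c)
        elif c["cat"] == "NP" and c.get("case") == "NOM" and fs["SUBJ"] is None:
            fs["SUBJ"] = c
        elif c["cat"] == "NP" and c.get("case") in verb["object_cases"]:
            fs["OBJS"].append(c)
        else:
            return None

    # 3) and 4) Every required object is present, and none is doubled.
    if sorted(o["case"] for o in fs["OBJS"]) != sorted(verb["object_cases"]):
        return None

    # 5) Subject-verb agreement; a covert subject inherits the verb's feature.
    subj = fs["SUBJ"]
    if subj is None:
        fs["SUBJ"] = {"cat": "NP", "case": "NOM", "agr": verb["agr"], "covert": True}
    elif subj["agr"] == "3PL":
        if verb["agr"] not in ("3SG", "3PL"):
            return None
    elif subj["agr"] != verb["agr"]:
        return None
    return fs


# Constituents of examples (3a) and (3b), with hypothetical feature values.
ben = {"cat": "NP", "lex": "ben", "case": "NOM", "agr": "1SG"}
cocuga = {"cat": "NP", "lex": "çocuğa", "case": "DAT", "agr": "3SG"}
kitabi = {"cat": "NP", "lex": "kitabı", "case": "ACC", "agr": "3SG"}
verdim = {"cat": "VP", "lex": "verdim", "agr": "1SG", "object_cases": ("DAT", "ACC")}

fs_3a = apply_sentence_rule([ben, cocuga, kitabi, verdim])
fs_3b = apply_sentence_rule([cocuga, kitabi, ben, verdim])
```

Both orders yield the same subject and the same set of objects, which is the point of checking categories in the equations rather than fixing them in the phrase structure part. Nominative (indefinite) direct objects do not appear here because the grammar handles them inside the verb phrase.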
6 PERFORMANCE EVALUATION 
In this section, we present some results about the performance
of our system on test runs with four different texts on
different topics. All of the texts are articles taken from
magazines. We used the CMU Common Lisp system running
10 Recall that no morphological rules are included. The lexical look up
rules are used just to call the morphological analyzer.
11 Note that x0, x1, and x2 refer to the functional structures of the
sentence, the first constituent and the second constituent in the phrase
structure, respectively.
(<S> <==> (<XP> <XP>)
  1) if x1's category is VP then
       assign x1 to the functional structure
       of the verb of the sentence
     if x2's category is VP then
       assign x2 to the functional structure
       of the verb of the sentence
  2) for i = 1 to 2 do
       if xi has already been assigned to
         the verb then do nothing
       if xi's category is ADVP then
         add xi to the adverbial complements
         of the sentence
       if xi's category is NP and
         xi's case is nominative then
         assign xi to the functional structure
         of the subject of the sentence
       if xi's category is NP then
         if the verb of the sentence can take
           an object with this case (consider
           also the voice of the verb)
           add xi to the objects of the verb
  3) check if the verb has taken all the
     objects that it has to take
  4) make sure that the verb has not
     taken more than one object with
     the same thematic role
  5) check if the subject and the verb
     agree in number and person:
     if the subject is defined (overt)
     then
       if the agreement feature of the
         subject is third person plural
       then the agreement feature of the
         verb may be either third person
         singular or third person plural
       else
         the agreement features of the
         subject and the verb must be
         the same
     else if the subject is undefined
       (covert) then assign the
       agreement feature of the verb
       to that of the subject
Figure 2: An LFG rule for the sentence level given with an
informal description of the equation part.
Table 3: Statistical information about the test runs.
Document   #S     #S in   #S     #S after   #P per   Secs per
                  Scope   ign.   Pre-ed.    Sent.    Sent.
1          43     30      0      55         4.28     12.26
2          51     41      2      62         5.02     8.92
3          56     48      1      64         4.87     10.28
4          80     70      0      97         3.25     7.46
Total      230    189     3      279        -        -
           100%   82%
#S: Number of sentences, #P: Number of parses.
in a Unix environment, on SUN Sparcstations at Centre for
Cognitive Science, University of Edinburgh.12
In all of the texts there were some sentences outside our 
scope. These were: 
• sentences that contain finite sentences as their constituents
or modifiers of their constituents,
• conditional sentences,
• finite sentences that are connected by coordinators
(and/or), and
• sentences with discontinuous constituents.13
We pre-edited the texts so that the sentences were in
our scope (e.g., separated finite sentences connected by
coordinators and parsed them as independent sentences, and
ignored the conditional sentences). Table 3 presents some
statistical information about the test runs. The first, second
and third columns show the document number, the total
number of sentences and the number of sentences that we
could parse without pre-editing, respectively. The other
columns show the number of sentences that we totally
ignored, the number of sentences in the pre-edited versions of
the documents, the average number of parses per sentence
generated and the average CPU time for each of the sentences in the
texts, respectively. It can be seen that our grammar can
successfully deal with about 82% of the sentences that we have
experimented with, with almost all remaining sentences
becoming parsable after minor pre-editing. This indicates
that our grammar coverage is reasonably satisfactory.
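The 82% figure follows directly from the first three count columns of Table 3. The short calculation below reproduces it; the variable names are illustrative, and only the counts printed in the table are used.

```python
# (total sentences, parsed without pre-editing, ignored) per document, from Table 3.
ROWS = [
    (43, 30, 0),
    (51, 41, 2),
    (56, 48, 1),
    (80, 70, 0),
]

total_sentences = sum(row[0] for row in ROWS)
in_scope = sum(row[1] for row in ROWS)
ignored = sum(row[2] for row in ROWS)

# Share of sentences parsed directly, without any pre-editing.
coverage = in_scope / total_sentences
coverage_percent = round(100 * coverage)
```

The sums match the totals row of the table (230 sentences, 189 in scope, 3 ignored), and 189/230 rounds to 82%.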
Below, we present the output for a sentence which shows
very nicely where the structural ambiguity comes out in
Turkish.14 The output for (8a) indicates that there are four
12 We should however note that the times reported are exclusive of
the time taken by the morphological processor, which with a 24,000
word root lexicon is rather slow and can process about 2-3 lexical
forms per second. We have, however, ported our morphological
analyzer to the XEROX TWOL system developed by Karttunen and Beesley
[Karttunen and Beesley 1992] and this system can process about 500 forms
a second. We intend to integrate this to our system soon.
13 Word order freeness in Turkish allows various kinds of discontinuous
constituents, e.g., an adverbial adjunct cutting in the middle of a compound
noun.
14 This example is not in any of the texts mentioned above. It is taken
from the first author's thesis [Güngördü 1993].
ambiguous interpretations for this sentence as indicated in
(8b-e):15
(8a) Küçük kırmızı top gittikçe hızlandı.
     küçük      little
     kırmızı    red / red paint+3SG-POSS
     top        ball
     gittikçe   go+GER / gradually
     hızlandı   speed up+PAST+3SG
(8b) The little red ball gradually sped up.
(8c) The little red (one) sped up as the ball went.
(8d) The little (one) sped up as the red ball went.
(8e) It sped up as the little red ball went.
The output of the parser for the first interpretation is
given in Figure 3. This output indicates that the subject of
the sentence is a noun phrase whose modifier part is küçük,
and modified part is another noun phrase whose modifier
part is kırmızı and modified part is top. The agreement
of the subject is third person singular, case is nominative,
etc. Hızlandı is the verb of the sentence, and its voice is
active, tense is past, agreement is third person singular, etc.
Gittikçe is a temporal adverbial complement.
Figures 4 through 7 illustrate the c-structures of the four
ambiguous interpretations (8b-e), respectively:16
• In (8b), the adjective kırmızı modifies the noun top,
and this noun phrase is then modified by the adjective
küçük. The entire noun phrase functions as the subject
of the main verb hızlandı, and the gerund gittikçe
functions as an adverbial adjunct of the main verb.
• In (8c), the adjective kırmızı is used as a noun, and is
modified by the adjective küçük.17 This noun phrase
functions as the subject of the main verb. The noun
top functions as the subject of the gerund gittikçe, and
this non-finite clause functions as an adverbial adjunct
of the main verb.
• In (8d), the adjective küçük is used as a noun, and
functions as the subject of the main verb. The noun
phrase kırmızı top functions as the subject of the gerund
gittikçe, and this non-finite clause functions as an
adverbial adjunct of the main verb.
• In (8e), the noun phrase küçük kırmızı top functions
as the subject of the gerund gittikçe (cf. (8b) where
it functions as the subject of the main verb), and this
non-finite clause functions as an adverbial adjunct of
the main verb. Note that the subject of the main verb
in this interpretation (i.e., it) is a covert one. Hence, it
does not appear in the c-structure shown in Figure 7.
15 In fact, this sentence has a fifth interpretation due to the lexical
ambiguity of the second word. In Turkish, kırmız is the name of a shining, red
paint obtained from an insect with the same name. So, (8a) also means 'His
little red paint sped up as the ball went.' However, this is very unlikely to
come to mind even for native speakers.
16 The c-structures given here are simplified by removing some nodes
introduced by certain intermediary rules to increase readability.
17 In Turkish, any adjective can be used as a noun.
;**** ambiguity 1 ***
((SUBJ
  ((*AGR* 3SG) (*CASE* NOM)
   (*DEF* -)
   (*CAT* NP)
   (MODIFIED
    ((*CAT* NP)
     (MODIFIER
      ((*CASE* NOM) (*AGR* 3SG)
       (*LEX* "kIrmIzI")
       (*CAT* ADJ)
       (*R* "kIrmIzI")))
     (MODIFIED
      ((*CAT* N) (*CASE* NOM)
       (*AGR* 3SG)
       (*LEX* "top")
       (*R* "top")))
     (*AGR* 3SG)
     (*CASE* NOM)
     (*LEX* "top")
     (*DEF* -)))
   (MODIFIER
    ((*SUB* QUAL) (*CASE* NOM)
     (*AGR* 3SG)
     (*LEX* "kUCUk")))
  ))
 (VERB
  ((*TYPE* VERBAL) (*VOICE* ACT)
   (*LEX* "hIzlandI")
   (*CAT* V)
   (*R* "hIzlan")
   (*ASPECT* PAST)
   (*AGR* 3SG)))
 (ADVCOMPLEMENTS
  ((*SUB* TEMP) (*LEX* "gittikce")
   (*CAT* ADVP)
   (*CONV*
    ((*WITH-SUFFIX* "dikce")
     (*CAT* V)
     (*R* "git"))))))
Figure 3: Output of the parser for the first ambiguous
interpretation of (8a) (i.e., (8b)).
[S [NP [ADJ küçük] [NP [ADJ kırmızı] [N top]]]
   [ADVP [GER gittikçe]]
   [VP [V hızlandı]]]
Figure 4: C-structure for (8b).
[S [NP [ADJ küçük] [N kırmızı]]
   [ADVP [NP [N top]] [GER gittikçe]]
   [VP [V hızlandı]]]
Figure 5: C-structure for (8c).
[S [NP [N küçük]]
   [ADVP [NP [ADJ kırmızı] [N top]] [GER gittikçe]]
   [VP [V hızlandı]]]
Figure 6: C-structure for (8d).
[S [ADVP [NP [ADJ küçük] [NP [ADJ kırmızı] [N top]]] [GER gittikçe]]
   [VP [V hızlandı]]]
Figure 7: C-structure for (8e).
7 CONCLUSIONS AND SUGGESTIONS 
We have presented a summary and highlights of our current
work on providing an LFG specification for Turkish
syntax. To the best of our knowledge this is the first such
effort for constructing a computational grammar for Turkish.
Our domain includes structurally simple and complex
Turkish sentences. The rather complex morphological analyses
of agglutinative word structures of Turkish are handled
by a full-scale two-level morphological specification
implemented in PC-KIMMO.
We have a number of directions for improving our grammar
and parser:
• Turkish is very rich in terms of adverbial constructs.
We handle a great deal of these constructs by using a
large number of rules. We are now in the process of
developing a tagger with a multi-word construct recognizer
to preprocess the text so that many multi-word
and idiomatic constructs can be handled outside the
grammar. In this way, multi-word constructs such as
yapar yapmaz (do+AOR+3SG do+NEG+AOR+3SG)
(as soon as (one) does (that)) where both lexical categories
are verbal but the compound construct is an
adverb, can be handled, and so can idiomatic constructs
like yanı sıra (side+3SG-POSS row) (besides) where
the function and semantics of the multi-word construct
have nothing to do with the function and semantics of
the constituent lexical forms.
• We are currently working on extending the subset of
sentences dealt with in respect of structure.
• We are currently working on augmenting our lexicon
with substantial lexical information and selectional
restriction information to be used with an integrated
ontological database.
8 ACKNOWLEDGEMENTS 
We would like to thank Carnegie-Mellon University, Center
for Machine Translation for making available to us their
LFG parsing system. We would also like to thank Elisabet
Engdahl and Matt Crocker of the Centre for Cognitive
Science, University of Edinburgh, for providing valuable
comments and suggestions. This work was done as a part
of a large scale NLP project (TU-LANGUAGE) which is
funded by a NATO Grant under the Science for Stability
Program.
References 
[Antworth 1990] E. L. Antworth, PC-KIMMO: A Two-level
Processor for Morphological Analysis.
Summer Institute of Linguistics, 1990.
[Erguvanlı 1979] E. E. Erguvanlı, The Function of Word
Order in Turkish Grammar. PhD thesis, Department
of Linguistics, University of California,
Los Angeles, 1979.
[Güngördü 1993] Z. Güngördü, "A lexical-functional
grammar for Turkish," M.Sc. thesis, Department
of Computer Engineering and Information
Sciences, Bilkent University, Ankara,
Turkey, July 1993.
[Kaplan and Bresnan 1982] R. Kaplan and J. Bresnan, The
Mental Representation of Grammatical Relations,
chapter Lexical-Functional Grammar:
A Formal System for Grammatical Representation,
pp. 173-281. MIT Press, 1982.
[Karttunen and Beesley 1992] L. Karttunen and K. R.
Beesley, "Two-level rule compiler," Technical
Report, XEROX Palo Alto Research Center,
1992.
[Meskill 1970] R. H. Meskill, A Transformational Analysis
of Turkish Syntax. Mouton, The Hague, Paris,
1970.
[Musha et.al. 1988] H. Musha, T. Mitamura and M. Kee,
The Generalized LR Parser/Compiler Version
8.1: User's Guide. Carnegie-Mellon University
- Center for Machine Translation, April
1988.
[Oflazer 1993] K. Oflazer, "Two-level description of Turkish
morphology," in Proceedings of the Sixth
Conference of the European Chapter of the
Association for Computational Linguistics,
April 1993. A full version is to appear in Literary
and Linguistic Computing, Vol.9 No.2,
1994.
[Shieber 1986] S. M. Shieber, An Introduction to
Unification-Based Approaches to Grammar.
CSLI Lecture Notes 4, 1986.
[Şimşek 1987] R. Şimşek, Örneklerle Türkçe Sözdizimi
(Turkish Syntax with Examples). Kuzey
Matbaacılık, 1987.
[Tomita 1987] M. Tomita, "An efficient augmented-
context-free parsing algorithm," Computational
Linguistics, vol. 13, 1-2, pp. 31-46,
January-June 1987.
