Integrating Morphology with Multi-word Expression Processing in Turkish
Kemal Oflazer and  zlem ˙etino glu
Human Language and Speech Technology Laboratory
Sabanc University
Istanbul, Turkey
{oflazer,ozlemc}@sabanciuniv.edu
Bilge Say
Informatics Institute
Middle East Technical University
Ankara, Turkey
bsay@ii.metu.edu.tr
Abstract
This paper describes a multi-word expression pro-
cessor for preprocessing Turkish text for various
language engineering applications. In addition to
the fairly standard set of lexicalized collocations
and multi-word expressions such as named-entities,
Turkish uses a quite wide range of semi-lexicalized
and non-lexicalized collocations. After an overview
of relevant aspects of Turkish, we present a descrip-
tion of the multi-word expressions we handle. We
then summarize the computational setting in which
we employ a series of components for tokenization,
morphological analysis, and multi-word expression
extraction. We  nally present results from runs over
a large corpus and a small gold-standard corpus.
1 Introduction
Multi-word expression extraction is an important
component in language processing that aims to
identify segments of input text where the syntactic
structure and the semantics of a sequence of words
(possibly not contiguous) are usually not composi-
tional. Idiomatic forms, support verbs, verbs with
speci c particle or pre/post-position uses, morpho-
logical derivations via partial or full word duplica-
tions are some examples of multi-word expressions.
Further, expressions such as time-date expressions
or proper nouns which can be described with sim-
ple (usually  nite state) grammars, and whose inter-
nal structure is of no real importance to the overall
analysis of the sentence, can also be considered un-
der this heading. Marking multi-word expressions
in text usually reduces (though not signi cantly)
the number of actual tokens that further processing
modules use as input, although this reduction may
depend on the domain the text comes from. It can
also reduce the multiplicative ambiguity as morpho-
logical interpretations of tokens are reduced when
they are coalesced into multi-word expressions with
usually a single interpretation.
Turkish presents some interesting issues for multi-
word expression processing as it makes substan-
tial use of support verbs with lexicalized direct or
oblique objects subject to various morphological
constraints. It also uses partial and full reduplica-
tion of forms of various parts-of-speech, across their
whole domain to form what we call non-lexicalized
collocations, where it is the duplication and contrast
of certain morphological patterns that signal a col-
location rather than the speci c root words used.
In this paper, we describe a multi-word expression
processor for preprocessing Turkish text for vari-
ous language engineering applications. In the next
section after a very short overview of relevant as-
pects of Turkish, we present a rather comprehen-
sive description of the multi-word expressions we
handle. We then summarize the structure of the
multi-word expression processor which employs a
series of components for tokenization, morpholog-
ical analysis, conservative non-statistical morpho-
logical disambiguation, and multi-word expression
extraction. We  nally present results from runs over
a large corpus and a small gold-standard corpus.
1.1 Related Work
Recent work on multi-word expression extraction,
use three basic approaches: statistical, rule-based,
and hybrid. Statistical approaches require a corpus
that contains signi cant numbers of occurrences of
multi-word expressions. But even if the corpus con-
sists of millions of words, usually, the frequencies
of multi-word expressions are too low for statisti-
cal extraction. Baldwin and Villavicencio (2002)
indicate that  two-thirds of verb-particle construc-
tions occur at most three times in the overall corpus,
meaning that any extraction method must be able
to handle extremely sparse data. They use a rule-
based method to extract multi-word expressions in
the form of a head verb and a single obligatory
preposition employing a tagger augmented with an
Second ACL Workshop on Multiword Expressions: Integrating Processing, July 2004, pp. 64-71
existing chunking system with which they  rst iden-
tify the particle chunked and then turn back for the
verb part of the construction.
Piao et al. (2003) employ their semantic  eld an-
notator USAS, containing 37,000 words and a tem-
plate list of 16,000 multi-word units, all constructed
manually from various resources, in order to extract
multi-word expressions. The evaluation indicates a
high precision (over 90%) but the estimated recall is
about 40%. Deeper investigation on the corpus has
indicated that two-thirds of the multi-word expres-
sions occur in the corpus once or twice, verifying
the fact that the statistical methods  ltering low fre-
quencies would fail.
Urizar et al. (2000) describe a Basque terminol-
ogy extraction system which covered multi-word
term extraction as a subset. As Basque is a highly
in ected agglutinative language like Turkish, mor-
phological information is exploited to better de ne
multi-word patterns. Their lemmatizer/tagger EU-
SLEM, consists of a tokenizer followed by two sub-
systems for the treatment of single word and multi-
word expressions, and a disambiguator. The pro-
posed term extraction tool uses the tagged input as
the input of a shallow parsing phase which consists
of regular expressions representing morphosyntac-
tic patterns. The  nal step uses statistical measures
to eliminate incorrect candidates.
The basic disadvantages of rule-based approaches
are that they usually lack  exibility, and it is a
time-consuming and never ending process to try to
cover a high percentage of the multi-word expres-
sions in a language with rules and prede ned lists.
The LINGO group which de nes multi-word ex-
pressions as  a pain in the neck for NLP (Sag et al.,
2002), suggests hybrid approaches using rule based
approaches to identify possible multi-word expres-
sions out of a corpus and using statistical methods
to enhance the results obtained.
2 Multi-word expressions in Turkish
Turkish is an Ural-Altaic language, having aggluti-
native word structures with productive in ectional
and derivational processes. Most derivational phe-
nomena take place within a word form, but there are
certain derivations involving partial or full redupli-
cations that are best considered under the notion of
multi-word expressions.
Turkish word forms consist of morphemes concate-
nated to a root morpheme or to other morphemes,
much like beads on a string. Except for a very
few exceptional cases, the surface realizations of
the morphemes are conditioned by various morpho-
phonemic processes such as vowel harmony, vowel
and consonant elisions. The morphotactics of word
forms can be quite complex when multiple deriva-
tions are involved. For instance, the derived mod-
i er sa glamla‚st rd  g m zdaki1 would be
represented as:2
sa glam+Adj
 DB+Verb+Become
 DB+Verb+Caus+Pos
 DB+Adj+PastPart+P1sg
 DB+Noun+Zero+A3sg+Pnon+Loc
 DB+Adj
This word starts out with an adjective root and af-
ter  ve derivations, ends up with the  nal part-of-
speech adjective which determines its role in the
sentence.
Turkish employs multi-word expressions in essen-
tially four different forms:
1. Lexicalized Collocations where all compo-
nents of the collocations are  xed,
2. Semi-lexicalized Collocations where some
components of the collocation are  xed and
some can vary via in ectional and derivational
morphology processes and the (lexical) seman-
tics of the collocation is not compositional,
3. Non-lexicalized Collocations where the collo-
cation is mediated by a morphosyntactic pat-
tern of duplicated and/or contrasting compo-
nents  hence the name non-lexicalized, and
4. Multi-word Named-entities which are multi-
word proper names for persons, organizations,
places, etc.
2.1 Lexicalized Collocations
Under the notion of lexicalized collocations, we
consider the usual  xed multi-word expressions
1Literally,  (the thing existing) at the time we caused (some-
thing) to become strong . Obviously this is not a word that one
would use everyday. Turkish words (excluding nonin ecting
frequent words such as conjunctions, clitics, etc.) found in typ-
ical text average about 10 letters in length.
2Please refer to the list of morphological features given in
Appendix A for the semantics of some of the non-obvious sym-
bols used here.
whose resulting syntactic function and semantics
are not readily predictable from the structure and
the morphological properties of the constituents.
Here are some examples of the multi-word expres-
sions that we consider under this grouping:3;4
(1) hi olmazsa
 hi (never)+Adverb
ol(be)+Verb+Neg+Aor+Cond+A3sg
 hi _olmazsa+Adverb
 at least (literally  if it never is )
(2) ipe sapa gelmez
 ip(rope)+Noun+A3sg+Pnon+Dat
sap(handle)+Noun+A3sg+Pnon+Dat
gel(come)+Verb+Neg+Aor+A3sg
 ipe_sapa_gelmez+Adj
 worthless (literally  (he) does not come to rope
and handle )
2.2 Semi-lexicalized Collocations
Multi-word expressions that are considered under
this heading are compound and support verb forma-
tions where there are two or more lexical items the
last of which is a verb or is a derivation involving
a verb. These are formed by a lexically adjacent,
direct or oblique object, and a verb, which for the
purposes of syntactic analysis, may be considered
as single lexical item: e.g., sayg dur- (literally to
stand (in) respect  to pay respect), kafay ye- (lit-
erally to eat the head  to get mentally deranged),
etc.5 Even though the other components can them-
selves be in ected, they can be assumed to be  xed
for the purposes of the collocation, and the colloca-
tion assumes its morphosyntactic features from the
last verb which itself may undergo any morpholog-
ical derivation or in ection process. For instance in
(3) kafay ye-
 kafa(head)+Noun+A3sg+Pnon+Acc
ye(eat)+Verb...
3In every group we  rst list the morphological features of
all the tokens, one on every line (with the glosses for the roots),
and then provide the morphological features of the multi-word
construct and then provide glosses and literal meanings.
4Please refer to the list of morphological features given in
Appendix A for the semantics of some of the non-obvious sym-
bols used here.
5Here we just show the roots of the verb with - denoting the
rest of the suf xes for any in ectional and derivational markers.
 kafay _ye+Verb...
 get mentally deranged ( literally  eat the head )
the  rst part of the collocation, the accusative
marked noun kafay , is the  xed part and the part
starting with the verb ye- is the variable part which
may be in ected and/or derived in myriads of ways.
For example the following are some possible forms
of the collocation:
 kafay yedim  I got mentally deranged 
 kafay yiyeceklerdi  they were about to get
mentally deranged 
 kafay yiyenler  those who got mentally de-
ranged 
 kafay yedi gi  the fact that (s/he) got mentally
deranged 
Under certain circumstances, the   xed part may
actually vary in a rather controlled manner subject
to certain morphosyntactic constraints, as in the id-
iomatic verb:
(4) kafa(y )  ek-
 kafa(head)+Noun+A3sg+Pnon+Acc
 ek(pull)+Verb...
 kafa_ ek+Verb...
 consume alcohol (but literally  to pull the head )
(5) kafalar  ek-
 kafa+Noun+A3pl+Pnon+Acc
 ek+Verb...
 kafa_ ek+Verb...
 consume alcohol (but literally  to pull the
heads )
where the  xed part can be in the nominative or the
accusative case, and if it is in the accusative case, it
may be marked plural, in which case the verb has to
have some kind of plural agreement (i.e.,  rst, sec-
ond or third person plural), but no possessive agree-
ment markers are allowed.
In their simplest forms, it is suf cient to recognize
a sequence of tokens one of whose morphologi-
cal analyses matches the corresponding pattern, and
then coalesce these into a single multi-word expres-
sion representation. However, some or all variants
of these and similar semi-lexicalized collocations
present further complications brought about by the
relative freeness of the constituent order in Turkish,
and by the interaction of various clitics with such
collocations.6
When such multi-word expressions are coalesced
into a single morphological entity, the ambiguity in
morphological interpretation is reduced as we see in
the following example:
(6) devam etti
 devam(continuation)+Noun+A3sg
+Pnon+Nom
*deva(therapy)+Noun+A3sg+P1sg+Nom
et(make)+Verb+Pos+Past+A3sg
*et(meat)+Noun+A3sg+Pnon+Nom
 DB+Verb+Past+A3sg
 devam_et+Verb+Pos+Past+A3sg
 (he) continued (literally  made a continuation )
Here, when this semi-lexicalized collocation is rec-
ognized, other morphological interpretations of the
components (marked with a * above) can safely be
removed, contributing to overall morphological am-
biguity reduction.
2.3 Non-lexicalized Collocations
Turkish employs quite a number of non-lexicalized
collocations where the sentential role of the collo-
cation has (almost) nothing to do with the parts-of-
speech and the morphological features of the indi-
vidual forms involved. Almost all of these colloca-
tions involve partial or full duplications of the forms
involved and can actually be viewed as morphologi-
cal derivational processes mediated by reduplication
across multiple tokens.
The morphological feature representations of such
multi-word expressions follow one of the patterns:
1) ! !
2) ! Z !,
3) ! +X ! +Y
4) !1 +X !2 +X
where ! is the duplicated string comprising the
root, its part-of-speech and possibly some additional
morphological features encoded by any suf xes. X
and Y are further duplicated or contrasted morpho-
logical patterns and Z is a certain clitic token. In
6The question and the emphasis clitics which are written as
separate tokens, can occasionally intervene between the com-
ponents of a semi-lexicalized collocation. We omit the details
of these due to space restrictions.
duplications of type 4, it is possible that !1 is dif-
ferent from !2.
Below we present list of the more interesting non-
lexicalized expressions along with some examples
and issues.
 When a noun appears in duplicate following the
 rst pattern above, the collocation behaves like a
manner adverb, modifying a verb usually to the
right. Although this pattern does not necessarily
occur with every possible noun, it may occur with
many (countable) nouns without much of a further
semantic restriction. Such a sequence has to be co-
alesced into a representation indicating this deriva-
tional process as we see below.
(7) ev ev (! !)
 ev(house)+Noun+A3sg+Pnon+Nom
ev+Noun+A3sg+Pnon+Nom
 ev+Noun+A3sg+Pnon+Nom DB+Adverb+By
 house by house (literally  house house )
 When an adjective appears in duplicate, the col-
location behaves like a manner adverb (with the se-
mantics of -ly adverbs in English), modifying a verb
usually to the right. Thus such a sequence has to
be coalesced into a representation indicating this
derivational process.
(8) yava‚s yava‚s (! !)
 yava‚s(slow)+Adj
yava‚s+Adj
 yava‚s+Adj DB+Adverb+Ly
 slowly (literally  slow slow )
This kind of duplication can also occur when the
adjective is a derived adjective as in
(9) h zl h zl (! !)
 h z(speed)+Noun+A3sg+Pnon+Nom
 DB+Adj+With
h z+Noun+A3sg+Pnon+Nom
 DB+Adj+With
 h z+Noun+A3sg+Pnon+Nom
 DB+Adj+With DB+Adverb+Ly
 rapidly (literally  with-speed with-speed )
 Turkish has a fairly large set of onomatopoeic
words which always appear in duplicate and func-
tion as manner adverbs. The words by themselves
have no other usage and literal meaning, and mildly
resemble sounds produced by natural or arti cial
objects. In these cases, the root word almost al-
ways is reduplicated but need not be, but both words
should be of the part-of-speech category +Dup that
we use to mark such roots.
(10) har l hurul (!1 +X !2 +X )
 har l+Dup
hurul+Dup
 har l_hurul+Adverb+Resemble
 making rough noises (no literal meaning)
 Duplicated verbs with optative mood and third
person singular agreement function as manner ad-
verbs, indicating that another verb is executed in a
manner indicated by the duplicated verb:
(11) ko‚sa ko‚sa (! !)
 ko‚s(run)+Verb+Pos+Opt+A3sg
ko‚s(run)+Verb+Pos+Opt+A3sg
 ko‚s+Verb+Pos+ DB+Adverb+ByDoingSo
 by running (literally  let him run let him run )
 Duplicated verbs in aorist mood with third person
agreement and  rst positive then negative polarity,
function as temporal adverbs with the semantics  as
soon as one has verbed 
(12) uyur uyumaz (! +X ! +Y )
 uyu+Verb+Pos+Aor+A3sg
uyu+Verb+Neg+Aor+A3sg
 uyu+Verb+Pos+ DB+Adverb+AsSoonAs
 as soon as (he) sleeps ( literally  (he) sleeps (he)
does not sleep )
It should be noted that for most of the non-
lexicalized collocations involving verbs (like (11)
and (12) above), the verbal portion before the in-
 ectional marking mood can have additional deriva-
tional markers and all such markers have to dupli-
cate.
(13) sa glamla‚st r r sa glamla‚st rmaz (!+X !+Y )
 sa glam+Adj DB+Verb+Become
 DB+Verb+Caus DB+Verb+Pos+Aor+A3sg
sa glam+Adj DB+Verb+Become
 DB+Verb+Caus DB+Verb+Neg+Aor+A3sg
 sa glam+Adj DB+Verb+Become+
 DB+Verb+Caus+Pos
 DB+Adverb+AsSoonAs
 as soon as (he) forti es (causes to become strong) 
Another interesting point is that non-lexicalized col-
locations can interact with semi-lexicalized collo-
cations since they both usually involve verbs. For
instance, when the verb of the semi-lexicalized col-
location example in (5) is duplicated in the form of
the non-lexicalized collocation in (12), we get
(14) kafalar  eker  ekmez
In this case,  rst the non-lexicalized collocation has
to be coalesced into
(15) kafalar  ek+Verb+Pos
 DB+Adverb+AsSoonAs
and then the semi-lexicalized collocation kicks in,
to give
(16) kafa_ ek+Verb+Pos
 DB+Adverb+AsSoonAs
( as soon as (we/you/they) get drunk )
Finally, the following non-lexicalized collocation
involving adjectival forms involving duplication and
a question clitic is an example of the last type of
non-lexicalized collocation.
(17) g zel mi g zel (! Z !)
 g zel+Adj
mi+Ques
g zel+Adj
 g zel+Adj+Very
 very beautiful (literally  beautiful (is it?) beauti-
ful )
2.4 Named-entities
Another class of multi-word expressions that we
process is the class of multi-word named-entities
denoting persons, organizations and locations. We
essentially treat these just like the semi-lexicalized
collocation discussed earlier, in that, when such
named-entities are used in text, all but the last com-
ponent are  xed and the last component will usually
undergo certain morphological processes demanded
by the syntactic context as in
Figure 1: The architecture of the multi-word expres-
sion extraction processor
(18) T rkiye B y k Millet Meclisi’nde ....7
Here, the last component is case marked and this
represents a case marking on the whole named-
entity. We package this as
(19) T rkiye_B y k_Millet_Meclisi
+Noun+Prop+A3sg+Pnon+Loc
To recognize these named entities we use a rather
simple approach employing a rather extensive
database of person, organization and place names,
developed in the context of a previous project, in-
stead of using a more sophisticated named-entity
extraction scheme.
3 The Structure of the Multi-word
Expression Processor
Our multi-word expression processor is a multi-
stage system as depicted in Figure 1. The  rst
component is a standard tokenizer which splits in-
put text into constituent tokens. These then go into
7In the Turkish Grand National Assembly.
a wide-coverage morphological analyzer (Oflazer,
1994) implemented using Xerox  nite state technol-
ogy (Karttunen et al., 1997), which generates, for all
tokens, all possible morphological analyses. This
module also performs unknown processing by pos-
tulating possible noun roots and then trying to parse
the rest of a word as a sequence of possible Turk-
ish suf xes. The morphological analysis stage also
performs a very conservative non-statistical mor-
phological disambiguation to remove some very un-
likely parses based on unambiguous contexts. Fig-
ure 2 shows a sample Turkish text that comes out of
morphological processing, about to go into multi-
word expression extraction.
Kistin kist+Noun+A3sg+P2sg+Nom
kist+Noun+A3sg+Pnon+Gen
sa gl  g m sa gl k+Noun+A3sg+P1sg+Acc
sa g+Adj DB+Noun+Ness+
A3sg+P1sg+Acc
s k nt ya s k nt +Noun+A3sg+Pnon+Dat
sokacak sok+Verb+Pos+Fut+A3sg
sok+Verb+Pos DB+Adj
+FutPart+Pnon
herhangi herhangi+Adj
bir bir+Det
bir+Num+Card
bir+Adj
bir+Adverb
etkisi etki+Noun+A3sg+P3sg+Nom
s z s z+Noun+A3sg+Pnon+Nom
konusu konu+Noun+A3sg+P3sg+Nom
de gil de g+Verb DB+Verb+Pass
+Pos+Imp+A2sg
de gil+Conj
de gil+Verb+Pres+A3sg
. .+Punc
Figure 2: Output of the morphological analyzer
The multi-word expression extraction processor has
three stages with the output of one stage feeding into
the next stage:
1. The  rst stage handles lexicalized collocations
and multi-word named entities.
2. The second stage handles non-lexicalized col-
locations.
3. The third stage handles semi-lexicalized col-
locations. The reason semi-lexicalized collo-
cations are handled last, is that any duplicate
verb formations have to be processed before
compound verbs are combined with their lexi-
calized complements (cf. examples (14)  (16)
above).
The output of the multi-word expression extraction
processor for the relevant segments in Figure 2 is
given in Figure 3.
The multi-word expression extraction processor has
been implemented in Perl. The rule bases for
the three stages are maintained separately and then
compiled of ine into regular expressions which are
then used by Perl at runtime.
...
s k nt ya_sokacak s k nt ya_sok+Verb
+Pos+Fut+A3sg
s k nt ya_sok+Verb
+Pos DB+Adj
+FutPart+Pnon
herhangi_bir herhangi_bir+Det
...
s z_konusu s z_konu+Noun+A3sg
+P3sg+Nom
...
Figure 3: Output of the multi-word expression ex-
traction processor
Table 1 presents statistics on the current rule base
of our multi-word expression extraction processor:
For named entity recognition, we use a list of about
Rule Type Number of Rules
Lexicalized Colloc. 363
Semi-lexicalized Colloc. 731
Non-lexicalized Colloc. 16
Table 1: Rules base statistics
60,000  rst and last names, a list of about 16,000
multi-word organization and place names.
4 Evaluation
To improve and evaluate our multi-word expression
extraction processor, we used two corpora of news
text. We used a corpus of about 730,000 tokens to
incrementally test and improve our semi-lexicalized
rule base, by searching for compound verb forma-
tions, etc. Once such rules were extracted, we tested
our processor on this corpus, and on a small corpus
of about 4200 words to measure precision and re-
call. Table 2 provides some statistics on these cor-
pora.
Table 3 shows the result of multi-word expression
extraction on the large (training) corpus. It should
be noted that we only mark multi-word named-
entities, not all. Thus many references to persons by
Corpus Number of Avg. Analyses
Tokens per Token
Large Corpus 729,955 1.760
Small Corpus 4,242 1.702
Table 2: Corpora Statistics
their last name are not marked, hence the low num-
ber of named-entities extracted.8 As a result of
MW Type Number Extracted
Lexicalized Colloc. 3,883
Semi-lexicalized Colloc. 9,173
Non-lexicalized Colloc. 220
Named-Entities 4,480
Total 17,750
Table 3: Multi-word expression extraction statistics
on the large corpus
this extraction, the average number of morphologi-
cal parses per token go from 1.760 down to 1.745.
Table 4 shows the result of multi-word expression
extraction on the small corpus. We also manu-
MW Type Number Extracted
Lexicalized Colloc. 15
Semi-lexicalized Colloc. 62
Non-lexicalized Colloc. 0
Named-Entities 99
Total 176
Table 4: Multi-word expression extraction statistics
on the small corpus
ally marked up the small corpus into a gold-standard
corpus to test precision and recall. The results in
Table 4 correspond to an overall recall of 65.2%
and a precision of 98.9%, over all classes of multi-
word expressions. When we consider all classes
except named-entities, we have a recall of 60.1%
and a precision of 100%. An analysis of the er-
rors and missed multi-word expressions indicates
that the test corpus had a certain variant of a com-
pound verb construction that we had failed to ex-
tract from the larger corpus we used for compil-
ing rules. Failing to extract the multi-word expres-
sions for that compound verb accounted for most
of the drop in recall. Since we are currently using
a rather naive named-entity extraction scheme,9 re-
8Since this is a very large corpus, we have no easy way of
obtaining accurate precision and recall  gures.
9As opposed to a general purpose statistical NE extractor
that we have developed earlier (T r et al., 2003).
call is rather low as there are quite a number of for-
eign multi-word named-entities (persons and orga-
nizations mostly) that do not exist in our database
of named-entities. On the other hand, since named-
entity extraction for English is a relatively mature
technology, we can easily integrate an existing tool
to improve our recall.
5 Conclusions
This paper has described a multi-word expression
extraction system for Turkish for handling vari-
ous types of multi-word expressions such as semi-
lexicalized and non-lexicalized collocations which
depend on the recognition of certain morphologi-
cal patterns across tokens. Our results indicate that
with about 1100 rules (most of which were extracted
from a large  training corpus searching for patterns
involving a certain small set of support verbs), we
were able get almost 100% precision and around
60% recall on a small  test corpus. We expect that
with additional rules from dictionaries and other
sources we will improve recall signi cantly.
6 Acknowledgments
We thank Orhan Bilgin for helping us compile the
multi-word expressions.

References
Timothy Baldwin and Aline Villavicencio. 2002.
Extracting the unextractable: A case study on
verb-particles. In Proceedings of the Sixth Confer-
ence on Computational Natural Language Learning
(CoNLL 2002), pages 99 105.
Lauri Karttunen, Tamas Gaal, and Andre Kempe.
1997. Xerox Finite-State Tool. Technical report,
Xerox Research Centre Europe.
Kemal Oflazer. 1994. Two-level description of
Turkish morphology. Literary and Linguistic Com-
puting, 9(2):137 148.
Scott S. L. Piao, Paul Rayson, Dawn Archer, An-
drew Wilson, and Tony McEnery. 2003. Extracting
multiword expressions with a semantic tagger. In
Proceedings of the ACL 2003 Workshop on Multi-
word Expressions: Analysis, Acquisition and Treat-
ment.
Ivan Sag, Timothy Baldwin, Francis Bond, Ann
Copestake, and Dan Flickinger. 2002. Multiword
expressions: A pain in the neck for nlp. In Pro-
ceedings of the Third International Conference on
Intelligent Text Processing and Computational Lin-
guistics (CICLING 2002), pages 1 15.
G khan T r, Dilek Zeynep Hakkani-T r, and Ke-
mal O azer. 2003. A statistical information extrac-
tion systems for Turkish. Natural Language Engi-
neering, 9(2).
R. Urizar, N. Ezeiza, and I. Alegria. 2000. Mor-
phosyntactic structure of terms in Basque for auto-
matic terminology extraction. In Proceedings of the
ninth EURALEX International Congress.
