Collocation Extraction: Needs, Feeds and Results of an Extraction System
for German
Julia Ritz
Institut f r Maschinelle Sprachverarbeitung (IMS)
Universit t Stuttgart
Azenbergstr. 12
70174 Stuttgart
Germany
Julia.Ritz@ims.uni-stuttgart.de
Abstract
This paper provides a speci cation of re-
quirements for collocation extraction sys-
tems, taking as an example the extraction
of noun + verb collocations from German
texts. A hybrid approach to the extrac-
tion of habitual collocations and idioms
is presented, aiming at a detailed descrip-
tion of collocations and their morphosyn-
tax for natural language generation sys-
tems as well as to support learner lexicog-
raphy.
1 Introduction
Since Firth  rst described collocations as habit-
ual word combinations in the 1950ies (cf. Firth,
1968), a number of papers focusing on collocation
extraction have been published (see the overviews
in (Evert, 2004; Bartsch, 2004)). Most studies
concentrate on the extraction from English. How-
ever, the procedures proposed in these studies can-
not necessarily be applied to other languages as
English stands out, e.g. with respect to con gu-
rationality. They rely on the fact that the syntax
of English (and of all con gurational languages)
provides positional clues to the grammatical func-
tion of noun phrases, and they exploit this con-
cept by means of window-based, adjacency-based
or pattern-based extraction, combined with associ-
ation measures to identify co-occurrences that are
more frequent than statistically expectable. What
these procedures do not cover is semantic-oriented
de nitions like (a) and (b).
a. A collocation is a combination of a free (’au-
tosematic’) element (the base) and a lexically
determined (’synsemantic’) element (the col-
locate, which may lose (some of) its meaning
in a collocation) (adapted from (Hausmann,
1979; Hausmann, 1989; Hausmann, 2003)).
b. A collocation is a word combination whose
semantic and/or syntactic properties cannot
be fully predicted from those of its compo-
nents, and which therefore has to be listed in
a lexicon (Evert, 2004).
We argue that linguistic knowledge could not
only improve results (Krenn, 2000b; Smadja,
1993) but is essential when extracting colloca-
tions from certain languages: this knowledge pro-
vides other applications (or a lexicon user, respec-
tively) with a  ne-grained description of how the
extracted collocations are to be used in context.
Additional requirements resulting from the
needs of dictionary users are described in (Haus-
mann, 2003; Heid and Gouws, 2005) and are of
interest not only in lexicography but can also be
transferred to the  eld of natural language gener-
ation. These requirements in uence the develop-
ment of collocation extraction systems, which mo-
tivates this paper.
The structure of the paper is as follows: In chap-
ter 2, the requirements, depending on factors like
the targeted language, are presented. We then dis-
cuss and suggest methods to meet the given needs.
A documentation of ongoing work on the extrac-
tion of noun + verb collocations from German
texts is given in chapter 3. Chapter 4 gives a con-
clusion and an outlook on work still to be done.
2 Collocation Extraction Tools:
Requirements
The development of a collocation extraction tool
depends on the following conditions:
1. properties of the targeted language
41
2. the targeted application
3. the kinds of collocations to be extracted
4. the degree of detail
Whereas issues 1 to 3 deal with the collocation
itself, issue 4 is focused at the collocation in con-
text, i.e. its behaviour (from a syntagmatic analy-
sis point of view) or, respectively, its use (from a
generation perspective).
2.1 Language factors
One of the most important factors is, of course, the
targeted language and its main characteristics with
respect to word formation and word order. De-
pending on word and constituent order, the pros
and cons of positional vs. relational extraction
patterns need to be considered. Positional pat-
terns (based on adjacency or a ’window’) are ad-
equate for con gurational languages, but in lan-
guages with rather free word order, words belong-
ing to a phrase or collocation do not necessarily
occur within a prede ned span1.
Extracting word combinations using relational
patterns (represented by part of speech (PoS) tags
or dependency rules) offers a higher level of ab-
straction and improves the results (cf. (Krenn,
2000b; Smadja, 1993)). However, this requires
part of speech tagging and possibly partial parsing.
A system extracting word combinations by apply-
ing relational patterns, obviously pro ts from lan-
guage speci c knowledge about phrase and sen-
tence structure and word formation. One example
is the order of adjective + noun pairs: in English
and German, the adjective occurs left of the noun,
whereas in French, the adjective can occur left or
right of the noun. Another example is compound-
ing, handled differently in different languages:
noun + noun in English, typically separated by
a white space (e.g. death penalty) vs. noun +
prepositional phrase in French (e.g. peine de mort)
vs. compound noun in German (e.g. Todesstrafe).
Consequently, language speci c word formation
rules need to be considered when designing ex-
traction patterns. For languages with a rich in-
 ectional morphology where the individual word
forms are rather rare, frequency counts and results
1In German, e.g., in usual verb second constructions with
a full verb in the left sentence bracket (topological  eld the-
ory see (W llstein-Leisten et al., 1997)), particles of particle
verbs appear in the right sentence bracket. The middle  eld
(containing arguments and possibly adjuncts of the verb) is
of undetermined length.
of statistical analyses are little reliable. To allow a
grouping of words sharing the same lemma, lem-
matisation is crucial.
2.2 Application factors
Other important factors are the targeted applica-
tion (i.e. analysis vs. generation) and, to some ex-
tent resulting from it, factors (3.) and (4.), above.
Depending on the purpose of the tool (or lexicon,
respectively), the collocation de nition chosen as
an outline may vary, e.g. including transparent and
regular collocations (cf. (Tutin, 2004)) for genera-
tion purposes, but excluding them for analysis pur-
poses. In addition, a more detailed description of
the use of collocations in context (e.g. information
about preferences with respect to the determiner,
etc.) is needed for generation purposes than for
text analysis.
2.3 Factors of collocation de nition
Collocations can be distinguished on two levels:
the formal level and the content level. On the for-
mal level, a collocation can be classi ed accord-
ing to the structural relation between its elements.
Typical patterns are shown in table 12 (taken from
(Heid and Gouws, 2005)).
On the content level, there are regular, transpar-
ent, and opaque collocations (according to (Tutin,
2004)) and, taking de nition (b) into account, id-
ioms as well. However, as a classi cation at the
content level needs detailed semantic description,
we see no means of accomplishing this goal other
than manually at the moment.
2.4 Contextual factors
(Hausmann, 2003; Heid and Gouws, 2005; Evert
et al., 2004) argue that collocations have strong
preferences with respect to their morphosyntax
(see examples (1) and (2)) and may be combined
(see example (3)). The collocation in example (1)
(’to charge somebody’) is restricted with respect
to the determiner (null determiner) of the base,
whereas the same base shows a strong preference
for a (de nite or inde nite) determiner when used
2Abbreviations in table 1:
advl - adverbial
prd - predicative
subj - subject
obj - object
pobj - prepositional object
dat - dative case
gen - genitive case
quant - quantifying
42
No. Type Example
1 N + Adj tiefer Schlaf
2 Adj + Adv tief rot
3 V + Adv tief schlafen
4 V + NPa0a2a1a4a3a6a5 Baukl tze staunen
5 V + Na7a9a8a11a10a13a12 Frage + sich stellen
6 V + Na1a14a0a14a15 Anforderungen + gen -
gen
7 V + Na16a17a10a13a12 Frage + aufwerfen
8 V + PPa18a19a16a17a10a13a12 zu + Darstellung + gelan-
gen
9 V + Adja18a19a20 a1 verr ckt spielen
10 N + Na21a6a22a24a23 Einreichung des Antrags
11 Na25 a8 a0 a23 a15 + N ein Schwarm Heringe
the category containing the base is underlined.
Table 1: Collocational patterns
with a different collocate (example (2), ’to drop
a lawsuit’). Example (3) shows two collocations
sharing the base can form a collocational sequence
(example taken from (Heid and Gouws, 2005)).
(1) ł Anklage erheben
(2) die/eine Anklage fallenlassen
(3) Kritik  ben + scharfe Kritik
scharfe Kritik  ben
For both natural language generation systems
and lexicography, such information is highly rel-
evant. Therefore, the extraction of contextual in-
formation (called ’context parameters’ in the fol-
lowing) should be integrated into the collocation
extraction process.
3 Extracting noun + verb collocations
from German
The standard architecture for collocation extrac-
tion systems contains three stages (cf. (Krenn,
2000)): a more or less detailed linguistic analysis
of the corpus text (preprocessing), an extraction
step and a statistic  ltering of the extracted word
combinations. We follow this architecture (see  g-
ure 1). However, our hypothesis differs from other
approaches. Collocations are often restricted with
respect to their morphosyntax. We test to what ex-
tent they can be identi ed via these restrictions.
3.1 Approach
In an experiment, we extracted relational word
combinations (verb + subject/object pairs) from
German newspaper texts.
The syntactic patterns for the extraction of these
combinations concentrate on verb- nal construc-
tions as in example (4) and verb second con-
structions with a modal verb in the left sentence
bracket according to the topological  eld theory
(see (W llstein-Leisten et al., 1997)) as in exam-
ple (5). The reason is that, in these constructions,
the particle forms one word with the verb (see ex-
ample (6)), as opposed to usual verb second con-
structions (see example (7)). Thus, we need not
recombine verb + particle groups that appear sep-
aratedly.
(4) ... wenn Wien einen Antrag auf Vollmitglied-
schaft stellt.
(’if Vienna an application for full member-
ship puts’)
(if Vienna applies for full membership)
(5) ... kann Wien einen Antrag auf Vollmitglied-
schaft stellen.
(’might Vienna an application for full mem-
bership put.’)
(Vienna might apply for full membership.)
(6) ..., da er ein Schild auf stellt.
(’that he a sign upputs’)
(that he puts up a sign)
(7) Er stellt ein Schild auf.
(’He puts a sign up.’)
(He puts up a sign.)
Preprocessing
As data, we used a collection of 300 million
words from German newspaper texts dating from
1987 to 1993. The corpus is tokenized and PoS-
tagged by the Treetagger (Schmid, 1994), then
chunk annotated by YAC (Kermes, 2003). The
chunker YAC determines phrase boundaries and
heads, and disambiguates agreement information
as far as possible. It is based on the corpus query
language cqp (Christ et al., 1999)3, which can in
turn be used to query the chunk annotations.
Data Extraction
The syntactic patterns used to extract verb +
subject/object combinations are based on PoS tags
and chunk information. These patterns are repre-
sented using cqp macros (see  gure 2). The cqp
syntax largely overlaps with regular expressions.
3http://www.ims.uni-
stuttgart.de/projekte/CorpusWorkbench/
43
cqp
postprocessing
macros
preprocessing collocation identification
interpretationanalysis
(STTS tagset)(newspaper texts)
PoS tagging
tokenizing,
information
agreement
morphology
TreeTagger
phrase boundaries,
extraction
lexicons
stat. filtering NLP resources
corpus
data
lemma and
IMSLex
wrt. morphosyntax
pairs and their features
extraction of noun + verb database
morphology
semantic classification
agreement
(annotation of
partial parsing
YAC
Figure 1: Tool architecture
( 1) MACRO n_vfin(0)
( 2) (
( 3) [pos = "(KOUS|VMFIN)"]
( 4) []*
( 5) a26 npa27
( 6) [!pp
( 7) & _.np_f not contains "ne"
( 8) & _.np_f not contains "pron"
( 9) & _.np_f not contains "meas"
(10) & _.np_h != "@card@"]+
(11) a26 /npa27
(12)[pos != "($.|KOUS|VMFIN)"]*
(13)[pos = "V.*"]+
(14)[pos = "($.|KON)"]
(15))
(16);
Figure 2: sample macro
Line (1) of  gure 2 contains the name of the
macro and the number of its parameters. In line
(3), a word PoS tagged KOUS (subordinating con-
junction) or VMFIN ( nite modal verb) is re-
quested, followed by an arbitrary number (’*’) of
words without any restrictions (line (4)). Line (5)
indicates the start of a nominal phrase (np), line
(11) its end. The elements within this np (one or
more words, as indicated by ’+’) must not be part
of a prepositional phrase (pp) to avoid the extrac-
tion of pp + verb (line (6), see example (8)). In ad-
dition, the np must be neither a named entity (ne,
see line (7)) nor a pronoun (pron, line (8)) nor an
np of measure (meas, line (9), see example (9)),
nor must its head be a cardinal number (card, line
(10), see example (10)). An arbitrary number of
words may follow the np (punctuation marks (PoS
tagged $.), subordinating conjunctions and  nite
modal verbs excluded). At least one verb is re-
quired (line (13), all PoS tags for verbs start with
a capital ’V’4). Line (14) indicates the end of the
subclause or sentence.
(8) ... kann [zur Verf gung]a28a6a28 gestellt werden.
(9) ... weil davon j hrlich [3,5 Tonnen]a29 a28a31a30a33a32a35a34a2a36
eingef hrt werden.
(10) ... obwohl er [1989]a29 a28a38a37a9a34a14a39a4a40 noch dort
arbeitete.
By applying the macro to the corpus, all se-
quences of words matching the pattern are ex-
tracted.
From these sequences, the following informa-
tion is made explicit (cf. (Heid and Ritz, 2005)):
a41 lemma of the noun (potential base)
a41 lemma of the verb (potential collocate)
a41 number of the noun (singular, plural)
a41 case of the noun
4The search condition is underspeci ed with respect to
the  niteness and the role of the verb (auxiliary, modal or full
verb). Thus, line (13) matches verbal complexes. It also cov-
ers cases where full verbs are accidentally PoS tagged modal
or auxiliary verbs.
44
a42 determination of the noun (de nite, inde -
nite, null, demonstrative, quanti er)
a42 modi cation of the noun (adjective, cardinal
number, genitive np, compound noun etc.)
a42 negation (yes/no)
a42 auxiliaries and modal verbs
a42 original phrase from the corpus
For each instance found, the lemmas of noun
and verb along with all the context parameters
mentioned above are stored as feature value pairs
in a relational data base. The database can be
queried via SQL. See  gure 3 for a sample query
asking for distinct lemma pairs, ordered by fre-
quency (in descending order), and  gures 5 and 4
for more speci c queries and some of their results.
SELECT COUNT(*) AS f,
n_lemma, v_lemma
FROM comfea1
GROUP BY n_lemma, v_lemma
ORDER BY f DESC;
Figure 3: sample query
Filtering
The instances extracted in the previous step are
grouped according to noun and verb lemmas, i.e.
instances of the same lemma pair form one group.
Within these groups, a relative frequency distribu-
tion is computed for each of the features. For que-
riability reasons, the results of this postprocessing
are also stored in the database, as shown in  gure
1. A word combination is chosen as a collocation
candidate if a preference (speci ed by a threshold
of e.g. 60% of the occurrences) for a certain fea-
ture value (singular / plural, presence / absence of
a determiner, de nite / inde nite / demonstrative
/ possessive / quantifying determiner, presence of
modifying elements) is discovered.
3.2 Results
From 300 million words, we extracted more than
1.3 million noun + verb combinations, the in-
stances of 726,488 different lemma pairs. 10,934
of these lemma pairs appeared with a minimum
SELECT COUNT(*) AS f,
n_lemma, v_lemma
FROM comfea1
WHERE neg = ’+’
GROUP BY n_lemma, v_lemma
ORDER BY f DESC;
f | n_lemma | v_lemma
1152| Rede | sein
748 | Angabe | machen
322 | Einigung | erzielen
228 | Chance | haben
217 | Forderung | erf llen
188 | Problem | l sen
151 | Rolle | spielen
131 | Auskunft | geben
127 | Stellungnahme | abgeben
120 | Alternative | geben
110 | Interesse | haben
110 | Angabe | best tigen
102 | Geld | haben
Figure 4: sample query: word combinations from
negated phrases
frequency of 10. Sample results are shown in  g-
ure 65.
We evaluated collocation candidates with a fre-
quency of at least 100. Within the 323 most fre-
quent collocation candidates, we found 213 collo-
cations (including 11 idioms). This corresponds to
a precision of 66% (see table 26). As a compari-
son, a window-based study was carried out on the
same (PoS-tagged) data. In this study, the window
was de ned in a way that up to two tokens (exclud-
ing sentence boundaries and  nite full verbs) were
allowed to appear between a noun (PoS tagged
NN) and a  nite full verb (PoS tagged VVFIN).
Log-likelihood7 was used as an association mea-
sure. The precision of this approach is 41%8.
5Abbreviations in  gure 6:
c - rated as a collocation in evaluation
i - rated as an idiom in evaluation.
For chosing collocation candidates, a threshold of 60% is
used. However, additional preferences are displayed for val-
ues greater than 50%.
6Abbreviations in table 2:
log-l - window-based approach using log-likelihood
feat - pattern-based approach using morphosyntactic features
7www.collocations.de
8Note that partial matches, such as Verf gung + stellen
45
SELECT COUNT(*) AS f,
n_lemma, v_lemma
FROM comfea1
WHERE cas = ’Akk’
GROUP BY n_lemma, v_lemma
ORDER BY f DESC;
f | n_lemma | v_lemma
507 | Beitrag | leisten
237 | Antrag | stellen
173 | Eindruck | erwecken
173 | Weg |  nden
167 | Umsatz | steigern
145 | Hut | nehmen
140 | Bericht | vorlegen
135 | Betrieb | aufnehmen
121 | Sprung | schaffen
120 | Ausschlag | geben
116 | Mut | haben
111 | Sitz | haben
106 | Weg | ebnen
105 | Zuschlag | erhalten
104 | Platz |  nden
100 | Anspruch | haben
94 | Tod | feststellen
93 | Zusammenhang | geben
90 | Vertrag | unterzeichnen
90 | Riegel | vorschieben
Figure 5: sample query: verb + accusative object
However, the evaluation criteria from de ni-
tions (a) and (b) remain vague or even contradic-
tory for some of the results. First, there is the
problem of semantic equivalence: does the com-
bination express more than its elements (consider
example (11))? Secondly, de nitions (a) and (b)
may judge the same example differently: Anteil
nehmen (example (12)) is usually agreed upon to
be a support verb construction, but the distinction
of the noun Anteil as the base (making the main
contribution to the meaning) is questionable. On
the other hand, its unpredictable syntactic proper-
ties (e.g. null determiner) and semantics (partial
loss of meaning of the collocate nehmen) make it
clear that this combination has to be listed in a lex-
icon.
(without the corresponding preposition), have been treated as
correct matches in 72 cases.
log-l feat
collocation candidates 700 323
collocations (manually veri ed) 290 213
precision 41% 66%
Table 2: evaluation results
For evaluation purposes, combinations judged
collocations by either (or both) of the de nitions
were marked as correct matches. In cases like ex-
ample (11), combinations were marked as correct
matches if no alternative collocate existed for de-
scribing the denoted situation or event.
(11) Chance + haben (’to have a/the chance’)
(12) Anteil + nehmen (’to commiserate’)
4 Conclusion and Outlook
We presented a system for collocation extraction
that takes into account the behaviour or use of
collocations in context. Pro ting from linguis-
tic information (PoS tagging, chunking), the tool
reaches a precision of 66% on the top 323 candi-
dates by frequency. On the same data, a window-
based approach relying only on PoS information
reached a precision of 41%.
As the extracted word combinations as well as
their context parameters (including the original ev-
idence from the corpus) are stored in a database,
the tool also supports explorative research in lexi-
cography.
However, there are some enhancements worth
doing: Especially when dealing with low frequen-
cies, relative frequencies lack reliability. There-
fore, we suggest computing a con dence interval
as proposed in (Evert, 2004b; Heid and Ritz, 2005;
Ritz, 2005).
As indicated in  gure 1, several postprocessing
steps can be added to the system, e.g. enabling a
sorting of collocation candidates with compound
nouns by the morphological heads of their base.
In order to get more data, the extraction from
verb  rst and verb second constructions is also
possible. To complete the tool, extraction patterns
for collocations of different syntactic relations (cf.
table 1) could be designed.
46
n_lemma | v_lemma | total | restrictions
- | Polizei | mitteilen | 5689| sg(100%), det(99.49%), def(99.49%), modif(53.84%)
c | Rede | sein | 2144| sg(99.81%), det(99.16%), quant(53.59%)
- | Sprecher | mitteilen | 1401| sg(94.15%), det(99.21%), indef(93.36%), modif(68.31%)
c | Fall | sein | 1233| sg(99.03%), det(97.32%), def(96.03%)
- | Kantonspolizei | mitteilen | 1094| sg(99.82%), det(59.60%), def(59.60%), modif(100%)
- | Beh rde | mitteilen | 952 | pl(82.14%), det(99.89%), def(99.37%), modif(63.13%)
c | Stellung | nehmen | 831 | sg(99.88%), no_det(88.21%)
c | Angabe | machen | 802 | pl(98.50%), det(93.64%), quant(93.27%)
- | Polizeisprecher | mitteilen | 737 | sg(90.91%), det(92.67%), indef(90.77%), modif(100%)
i | Rolle | spielen | 724 | sg(98.62%), det(97.93%), indef(65.61%)
c | Problem | l sen | 690 | det(89.13%), def(64.06%), modif(51.88%)
- | Zeitung | berichten | 670 | sg(94.78%), det(94.78%), def(91.79%), modif(83.28%)
- | Nachrichtenagentur| melden | 667 | sg(98.95%), det(98.20%), def(97.90%), modif(100%)
c | Rechnung | tragen | 661 | sg(100%), no_det(98.94%)
- | Unternehmen | mitteilen | 655 | sg(98.16%), det(98.32%), def(98.32%), modif(66.26%)
c | Chance | haben | 614 | sg(85.67%), det(86.31%)
c | Beitrag | leisten | 575 | sg(96.70%), det(95.65%), indef(52.70%), modif(62.54%)
- | Polizei | berichten | 564 | sg(100%), det(98.40%), def(98.40%)
c | Einigung | erzielen | 551 | sg(99.82%), det(92.20%), quant(58.26%)
- | Sprecher | sagen | 508 | sg(76.38%), det(97.83%), indef(76.18%), modif(56.69%)
c | Arbeit | aufnehmen | 492 | sg(98.37%), det (98.37%), poss(76.62%)
c | Ziel | erreichen | 476 | sg(78.78%), det(96.22%)
- | Nachrichtenagentur| berichten | 454 | sg(100%), det(98.90%), def(98.90%), modif(100%)
c | Druck | aus ben | 451 | sg(100%), no_det(88.70%), modif(74.72%)
c | Erfolg | haben | 438 | sg(99.32%), no_det(78.54%)
- | Frau | sein | 425 | sg(54.59%), det(50.12%), modif(61.41%)
- | Land | verlassen | 421 | sg(99.76%), det(98.57%), def(86.70%)
c | Frage | stellen | 419 | sg(71.60%), det(79.00%), def(68.26%)
Figure 6: sample results
47

References
Sabine Bartsch. 2004. Structural and functional prop-
erties of collocations in English. A corpus study
of lexical and pragmatic constraints on lexical co-
occurrence Narr, T bingen.
Oliver Christ, Bruno M. Schulze, Anja Hofmann and
Esther K nig. 1999. The IMS Corpus Workbench:
Corpus Query Processor (CQP). User’s manual. In-
stitut f r maschinelle Sprachverarbeitung, Univer-
sit t Stuttgart.
Jonathan Crowther, Sheila Dignen and Diana Lea.
2002. Oxford Collocations Dictionary for students
of English. Oxford University Press.
Stefan Evert. 2004. The Statistics of Word Cooc-
curences - Word Pairs and Collocations. PhD thesis.
Institut f r Maschinelle Sprachverarbeitung (IMS),
Universit t Stuttgart.
Stefan Evert, Ulrich Heid and Kristina Spranger. 2004.
Identifying morphosyntactic preferences in colloca-
tions. In M. T. Lino, M. F. Xavier, F. Ferreira, R.
Costa and R. Silva (eds.): Proceedings of the 4th In-
ternational Conference on Language Resources and
Evaluation (LREC 2004), p. 907 - 910. Lisbon, Por-
tugal.
Stefan Evert. 2004. The statistical analysis of mor-
phosyntactic distributions. In M. T. Lino et al.
(eds.): Proceedings of the 4th International Confer-
ence on Language Resources and Evaluation (LREC
2004), p. 1539 - 1542. Lisbon, Portugal.
John R. Firth. 1968. F.R. Palmer (ed.): Selected Papers
of J.R. Firth 1952-59. Longman, London.
Franz Josef Hausmann. 1979. Un dictionnaire de col-
locations est-il possible? Travaux de Linguistique
et de Litterature XVII(1), p. 187-195. Centre de
philologie et de littØrature romanes de l’universitØ
de Strasbourg.
Franz Josef Hausmann. 1989. Le dictionnaire de col-
locations. In F. J. Hausmann et al. (eds.): W rter-
b cher, Dictionaries, Dictionnaires, p. 1010 - 1019.
De Gruyter, Berlin.
Franz Josef Hausmann. 2003. Was sind eigentlich
Kollokationen? In K. Steyer (ed.): Wortverbindun-
gen - mehr oder weniger fest. Jahrbuch des Instituts
f r Deutsche Sprache 2003:309  334. De Gruyter,
Berlin.
Ulrich Heid and Rufus H. Gouws. 2005. A model
for a multifunctional electronic dictionary of col-
locations. Institut f r Maschinelle Sprachverar-
beitung (IMS), Universit t Stuttgart, submitted to
EURALEX 2006.
Ulrich Heid and Julia Ritz. 2005. Extracting colloca-
tions and their contents from corpora. F. Kiefer et
al. (eds.): Papers in Computational Lexicography:
Complex 2005. Hungarian Academy of Sciences,
Budapest.
Hannah Kermes. 2003. Off-line (and On-line) Text
Analysis for Computational Lexicography. Ar-
beitspapiere des Instituts f r Maschinelle Sprachver-
arbeitung (AIMS):9(3). Stuttgart.
Brigitte Krenn. 2000. The Usual Suspects: Data Ori-
ented Models for the Identi cation and Representa-
tion of Lexical Collocations. PhD thesis. DFKI und
Universit t des Saarlandes, Saarbr cken.
Brigitte Krenn. 2000. Collocation Mining: Ex-
ploiting Corpora for Collocation Identi cation and
Representation. W. Z hlke and E. G. Schukat-
Talamazzini (eds.): Proceedings of KONVENS 2000,
p. 209-214. Ilmenau, Deutschland.
Julia Ritz. 2005. Entwicklung eines Systems zur
Extraktion von Kollokationen mittels morphosyn-
taktischer Features. Diploma Thesis. Institut f r
Maschinelle Sprachverarbeitung (IMS), Universit t
Stuttgart.
Helmut Schmid. 1994. Probabilistic part-of-speech
tagging using decision trees. D. Jones and H.
Somers (eds.): Proceedings of the International
Conference on New Methods in Language Process-
ing (NeMLaP). Manchester, U.K.
Frank Smadja. 1993. Retrieving Collocations from
Text: Xtract. Computational Linguistics (19), p.
143-177. Manchester, U.K.
AgnŁs Tutin. 2004. Pour une modØlisation dynamique
des collocations dans les textes. G. Williams and
S. Vessier (eds.): Proceedings of the Eleventh EU-
RALEX International Congress. Lorient, France.
Angelika W llstein-Leisten, Axel Heilmann, Peter
Stepan and Sten Vikner. 1997. Deutsche Satzstruk-
tur. Stauffenburg, T bingen.
