Proceedings of the Third ACL-SIGSEM Workshop on Prepositions, pages 17–22,
Trento, Italy, April 2006. ©2006 Association for Computational Linguistics
A Quantitative Approach to Preposition-Pronoun Contraction in Polish
Beata Trawiński
University of Tübingen
SFB 441
Nauklerstraße 35
D-72074 Tübingen
trawinski@sfs.uni-tuebingen.de
Abstract
This paper presents the current results of an ongoing research project on the corpus distribution of prepositions and pronouns within Polish preposition-pronoun contractions. The goal of the project is to provide a quantitative description of Polish preposition-pronoun contractions that takes into consideration the morphosyntactic properties of their components. It is expected that the results will provide a basis for a revision of the traditionally assumed inflectional paradigms of Polish pronouns and, thus, for a possible remodeling of these paradigms. The results of corpus-based investigations of the distribution of prepositions within preposition-pronoun contractions can be used for grammar-theoretical and lexicographic purposes.
1 Introduction
As Świdziński and Derwojedowa (2004) and Trawiński (2005) have observed, preposition-pronoun contraction (PPC) in Polish (cf. (1)) is a highly idiosyncratic phenomenon.
(1) a. na niego ‘on him’ → nań ‘on_him’
    b. w niego ‘in him’ → weń ‘in_him’
On the one hand, not every pronoun can occur in a PPC; on the other hand, the set of prepositions that can contract with pronouns comprises only a very limited number of elements.1
The distribution of pronouns and prepositions within Polish PPCs has not yet been discussed in detail. There are, however, several traditional approaches to Polish third person personal pronouns (TPPPs) that provide some relevant information.2 In the following, we present the approach to TPPPs of Saloni (1981), which has been adopted in our research project.

1 For a discussion of the prosodic, morphosyntactic and semantic properties of Polish PPCs, see Trawiński (2005).
According to Saloni (1981), the inventory of Polish TPPPs comprises masculine human, masculine animate, masculine inanimate, feminine, and neuter pronouns, inflecting for case (nominative, genitive, dative, accusative, instrumental and locative), number (singular and plural), postprepositionality (yes or no) and accentability (yes or no). The inflectional paradigms of TPPPs proposed by Saloni (1981), and adopted in most Polish grammars, indicate that only genitive and accusative masculine human, masculine animate and masculine inanimate singular TPPPs possess unaccented postprepositional realizations, i.e., are able to contract with prepositions.3 However, corpus evidence indicates that unaccented postprepositional pronouns, i.e., pronouns that can contract with prepositions, may have many further realizations.
Corpus data also provide interesting information about the distribution of prepositions within PPCs. With respect to the form of the prepositions they contain, only some of the PPCs found in the corpus correspond to dictionary data.
The goal of this research project is to characterize the corpus distribution of TPPPs and prepositions occurring within PPCs and to analyze the results quantitatively. While the first part of the project has already been completed, the second is still in progress. Section 2 presents the results of the corpus examination regarding the distribution of pronouns and prepositions within PPCs, Section 3 outlines the proposed quantitative analysis of the results presented in Section 2, and Section 4 sums up the discussion and outlines future goals.

2 Note that only third person personal pronouns can contract with prepositions in Polish.
3 Note that Doroszewski and Wieczorkiewicz (1972) even claim that unaccented postprepositional pronouns are possible only in the accusative.
2 Corpus Distribution of Pronouns and Prepositions within PPCs
For the corpus-based investigation of the distribution of pronouns and prepositions within Polish PPCs, the IPI PAN Corpus of Polish was used.4 Because of their very low frequency, the PPCs were searched for in the largest of the available IPI PAN subcorpora, i.e., the automatically annotated wstepny corpus (over 70 million segments).
PPCs had to be identified manually, since the wstepny corpus does not recognize them as consisting of multiple segments but treats them as unknown forms (tagged ign). Thus, as a first step, a search was performed for all unknown forms ending in -(e)ń.5 Next, a total of 1193 PPCs were manually extracted from 3308 result matches. Subsequently, an interpretation in terms of grammatical features was assigned to each contracted pronoun by identifying its antecedent; the antecedents were also identified manually. Finally, the set of acquired PPCs was verified by querying the corpus for all potential contractions of unaccented postprepositional pronouns with each individual Polish preposition.
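The first search step can be pictured as a simple filter over tagged tokens. The sketch below is a minimal illustration under stated assumptions, not the query actually run on the IPI PAN Corpus: it assumes the subcorpus has been exported as (form, tag) pairs in which unknown forms carry the tag ign, and the function name candidate_ppcs, the sample tokens and all tags other than ign are hypothetical. As described above, its output would still have to be checked manually.

```python
import unicodedata


def candidate_ppcs(tokens):
    """Return unknown forms (tag "ign") that end in -(e)ń.

    Every form ending in -eń also ends in -ń, so testing the final
    letter covers both variants of the pattern.
    """
    hits = []
    for form, tag in tokens:
        # NFC turns a decomposed "n" + combining acute accent into "ń".
        form = unicodedata.normalize("NFC", form)
        if tag == "ign" and form.lower().endswith("ń"):
            hits.append(form)
    return hits


# Hypothetical (form, tag) pairs; apart from "ign", the tags are illustrative.
sample = [
    ("patrzył", "praet:sg:m1:imperf"),
    ("nań", "ign"),
    ("przezeń", "ign"),
    ("koń", "subst:sg:nom:m2"),  # ends in -ń but is a recognized noun
    ("xyzeń", "ign"),            # invented noise form; manual checking would discard it
    ("blah", "ign"),             # unknown form without the -(e)ń ending
]

print(candidate_ppcs(sample))  # ['nań', 'przezeń', 'xyzeń']
```

The filter deliberately over-generates: any unknown form with the right ending is kept, which mirrors the ratio of 1193 PPCs to 3308 raw matches reported above.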
As a result, genitive and accusative masculine human plural, locative masculine inanimate singular, genitive and accusative masculine inanimate plural, genitive and accusative neuter singular, genitive, accusative and locative neuter plural, genitive and accusative feminine singular, and genitive, accusative and locative feminine plural pronominal forms were recorded within PPCs, in addition to the masculine human, masculine animate and masculine inanimate singular pronominal forms.

4 The IPI PAN Corpus is a large (over 300 million segments), morphosyntactically annotated corpus of Polish, developed at the Institute of Computer Science at the Polish Academy of Sciences (cf. Przepiórkowski, 2004). The corpus web page is located at http://korpus.pl. For quantitative information about the corpus, see Przepiórkowski (to appear).
5 Note that all TPPPs contracting with prepositions are realized by the syncretic form -(e)ń.
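To make the comparison with the traditional paradigms explicit, the recorded forms can be treated as cells of a case × gender × number grid. The following sketch encodes only what is stated above: the 60 cells of the grid as laid out in Figure 1, the six cells that the traditional paradigms of Saloni (1981) allow to contract, and the 15 additional cells listed as recorded in the corpus. The variable names are hypothetical, and the grid deliberately ignores the postprepositionality and accentability dimensions.

```python
from itertools import product

CASES = ["nom", "gen", "dat", "acc", "instr", "loc"]
GENDERS = ["m1", "m2", "m3", "neut", "fem"]
NUMBERS = ["sg", "pl"]

# All case/gender/number cells of a TPPP paradigm (the 60 rows of Figure 1).
paradigm = set(product(CASES, GENDERS, NUMBERS))

# Cells with unaccented postprepositional forms in the traditional
# paradigms: genitive and accusative masculine singular.
traditional = {(c, g, "sg") for c in ("gen", "acc") for g in ("m1", "m2", "m3")}

# Additional cells recorded in the corpus, transcribed from the list above.
corpus_additional = {
    ("gen", "m1", "pl"), ("acc", "m1", "pl"),
    ("loc", "m3", "sg"),
    ("gen", "m3", "pl"), ("acc", "m3", "pl"),
    ("gen", "neut", "sg"), ("acc", "neut", "sg"),
    ("gen", "neut", "pl"), ("acc", "neut", "pl"), ("loc", "neut", "pl"),
    ("gen", "fem", "sg"), ("acc", "fem", "sg"),
    ("gen", "fem", "pl"), ("acc", "fem", "pl"), ("loc", "fem", "pl"),
}

attested = traditional | corpus_additional
assert attested <= paradigm  # every recorded cell is a valid paradigm cell

new_cells = sorted(attested - traditional)
print(len(paradigm), len(traditional), len(new_cells))  # 60 6 15
```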
A further observation made on the basis of the corpus data is that the set of prepositions found in contractions with unaccented postprepositional pronouns is very limited, namely dla ‘for’, do ‘to’, na ‘on’, od ‘from’, po ‘after’, przez ‘by’, w ‘in’, za ‘behind’, z ‘with’, and przed ‘in front of’. No contractions containing other prepositions were found in the corpus. While the absence of contractions involving secondary prepositions, such as ponad ‘above’, poprzez ‘through’, między ‘between’, etc., corresponds to dictionary data, the absence of contractions with prepositions such as bez ‘without’, o ‘about’, nad ‘above’, or pod ‘under’ does not: such contractions are listed in Polish dictionaries such as Dubisz (2003) or Bańko (2000).6
Figure 1 presents an overview of the distribution of all unaccented postprepositional pronouns and prepositions within the PPCs found in the IPI PAN Corpus. For each pronoun form, it specifies the contexts in which the form occurs, i.e., its contractions with particular prepositions, and records the total number of occurrences of the form together with its percentage of the total frequency of all unaccented postprepositional forms. In addition, the figure gives the total number of occurrences of each contraction found in the corpus, as well as its percentage of the total frequency of all preposition-pronoun contractions in the corpus.7
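Assuming that each percentage in Figure 1 is simply a count divided by the grand total of 1193 PPCs, the Percentage row of the figure can be recomputed from its Total row. The sketch below does this for the ten contractions; the helper name percentages is hypothetical.

```python
# Contraction totals transcribed from the Total row of Figure 1.
TOTALS = {
    "dlań": 101, "doń": 219, "nań": 377, "weń": 101, "zeń": 93,
    "odeń": 23, "przezeń": 250, "poń": 1, "zań": 27, "przedeń": 1,
}


def percentages(counts):
    """Return each item's share of the grand total, in percent (2 decimals)."""
    grand_total = sum(counts.values())
    return {name: round(100 * n / grand_total, 2) for name, n in counts.items()}


shares = percentages(TOTALS)
print(sum(TOTALS.values()))  # 1193
print(shares["nań"], shares["przezeń"])  # 31.6 20.96
```

All ten recomputed values match the Percentage row of Figure 1 to two decimals.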
6 Note, however, that although contractions such as oń ‘for_TPPP’ or weń ‘in_TPPP’ are included in dictionaries of contemporary Polish, these expressions are not accepted by all native speakers of Polish.
7 The specifications m1, m2 and m3 refer to masculine human, masculine animate and masculine inanimate, respectively. The minus signs (–) mark forms that are excluded by the case government properties of the respective preposition.

Columns: dlań ‘for_TPPP’, doń ‘to_TPPP’, nań ‘on_TPPP’, weń ‘in_TPPP’, zeń ‘with_TPPP’ / ‘from_TPPP’, odeń ‘from_TPPP’, przezeń ‘by_TPPP’, poń ‘after_TPPP’, zań ‘behind_TPPP’, przedeń ‘in front of_TPPP’; Total and percentage (%) per pronoun form.

                    dlań     doń     nań     weń     zeń    odeń przezeń     poń     zań przedeń   Total       %
nom, m1, sg            –       –       –       –       –       –       –       –       –       –       0    0.00
gen, m1, sg           74      72       –       –      17      12       –       –       0       –     175   14.68
dat, m1, sg            –       –       –       –       –       –       –       0       –       –       0    0.00
acc, m1, sg            –       –     207      39       –       –     140       0       4       0     390   32.70
instr, m1, sg          –       –       –       –       0       –       –       –       0       0       0    0.00
loc, m1, sg            –       –       0       0       –       –       –       0       –       –       0    0.00
nom, m1, pl            –       –       –       –       –       –       –       –       –       –       0    0.00
gen, m1, pl            2       1       –       –       0       0       –       –       0       –       3    0.25
dat, m1, pl            –       –       –       –       –       –       –       0       –       –       0    0.00
acc, m1, pl            –       –       3       0       –       –       2       0       0       0       5    0.42
instr, m1, pl          –       –       –       –       0       –       –       –       0       0       0    0.00
loc, m1, pl            –       –       0       0       –       –       –       0       –       –       0    0.00
nom, m2, sg            –       –       –       –       –       –       –       –       –       –       0    0.00
gen, m2, sg            2       2       –       –       1       0       –       –       0       –       5    0.42
dat, m2, sg            –       –       –       –       –       –       –       0       –       –       0    0.00
acc, m2, sg            –       –      10       0       –       –       0       0       0       0      10    0.84
instr, m2, sg          –       –       –       –       0       –       –       –       0       0       0    0.00
loc, m2, sg            –       –       0       0       –       –       –       0       –       –       0    0.00
nom, m2, pl            –       –       –       –       –       –       –       –       –       –       0    0.00
gen, m2, pl            0       0       –       –       0       0       –       –       0       –       0    0.00
dat, m2, pl            –       –       –       –       –       –       –       0       –       –       0    0.00
acc, m2, pl            –       –       0       0       –       –       0       0       0       0       0    0.00
instr, m2, pl          –       –       –       –       0       –       –       –       0       0       0    0.00
loc, m2, pl            –       –       0       0       –       –       –       0       –       –       0    0.00
nom, m3, sg            –       –       –       –       –       –       –       –       –       –       0    0.00
gen, m3, sg           14     102       –       –      49       8       –       –       0       –     173   14.51
dat, m3, sg            –       –       –       –       –       –       –       0       –       –       0    0.00
acc, m3, sg            –       –     134      48       –       –      62       1      20       1     266   22.31
instr, m3, sg          –       –       –       –       0       –       –       –       0       0       0    0.00
loc, m3, sg            –       –       1       0       –       –       –       0       –       –       1    0.08
nom, m3, pl            –       –       –       –       –       –       –       –       –       –       0    0.00
gen, m3, pl            0       5       –       –       4       0       –       –       0       –       9    0.75
dat, m3, pl            –       –       –       –       –       –       1       0       –       –       1    0.08
acc, m3, pl            –       –       1       2       –       –       1       0       1       0       5    0.42
instr, m3, pl          –       –       –       –       0       –       –       –       0       0       0    0.00
loc, m3, pl            –       –       0       0       –       –       –       0       –       –       0    0.00
nom, neut, sg          –       –       –       –       –       –       –       –       –       –       0    0.00
gen, neut, sg          3      16       –       –      16       1       –       –       0       –      36    3.02
dat, neut, sg          –       –       –       –       –       –       –       0       –       –       0    0.00
acc, neut, sg          –       –      13       6       –       –      32       0       2       0      53    4.45
instr, neut, sg        –       –       –       –       0       –       –       –       0       0       0    0.00
loc, neut, sg          –       –       0       0       –       –       –       0       –       –       0    0.00
nom, neut, pl          –       –       –       –       –       –       –       –       –       –       0    0.00
gen, neut, pl          0       5       –       –       0       0       –       –       0       –       5    0.42
dat, neut, pl          –       –       –       –       –       –       –       0       –       –       0    0.00
acc, neut, pl          –       –       0       1       –       –       1       0       0       0       2    0.17
instr, neut, pl        –       –       –       –       0       –       –       –       0       0       0    0.00
loc, neut, pl          –       –       0       1       –       –       –       0       –       –       1    0.08
nom, fem, sg           –       –       –       –       –       –       –       –       –       –       0    0.00
gen, fem, sg           5      15       –       –       4       1       –       –       0       –      25    2.06
dat, fem, sg           –       –       –       –       –       –       –       0       –       –       0    0.00
acc, fem, sg           –       –       5       4       –       –      10       0       0       0      19    1.59
instr, fem, sg         –       –       –       –       0       –       –       –       0       0       0    0.00
loc, fem, sg           –       –       0       0       –       –       –       0       –       –       0    0.00
nom, fem, pl           –       –       –       –       –       –       –       –       –       –       0    0.00
gen, fem, pl           1       1       –       –       2       1       –       –       0       –       5    0.42
dat, fem, pl           –       –       –       –       –       –       –       0       –       –       0    0.00
acc, fem, pl           –       –       2       0       –       –       1       0       0       0       3    0.25
instr, fem, pl         –       –       –       –       0       –       –       –       0       0       0    0.00
loc, fem, pl           –       –       1       0       –       –       –       0       –       –       1    0.08
Total                101     219     377     101      93      23     250       1      27       1    1193
Percentage          8.47   18.36   31.60    8.47    7.80    1.93   20.96    0.08    2.26    0.08     100
Figure 1: The distribution of unaccented postprepositional pronouns and prepositions within the PPCs occurring in the IPI PAN Corpus

3 Quantitative Interpretation
To determine whether the distribution of the unaccented postprepositional pronouns and prepositions within the PPCs found in the IPI PAN Corpus may be considered linguistically significant and may consequently establish the basis for a revision of the traditionally assumed inflectional paradigms, a number of quantitative procedures must be performed.
First of all, it must be determined whether the frequency of each unaccented postprepositional pronoun form in the corpus is statistically significant. For this purpose, the distribution of all accented postprepositional pronouns must be compiled. On the basis of the total frequencies of accented and unaccented postprepositional pronouns, the statistical significance can then be calculated, for instance, using the χ² test. If the frequency of unaccented postprepositional pronouns in the corpus turns out to be statistically significant, the ratio of the total number of each accented postprepositional pronoun to the total number of its unaccented counterpart can be ascertained. These ratios can then be compared.8 If the ratios of accented postprepositional pronouns to those unaccented counterparts that are not included in the traditionally assumed inflectional paradigms correlate with the ratios of accented postprepositional pronouns to those unaccented counterparts that are included in these paradigms, the distribution of the unaccented postprepositional pronouns in the corpus may be considered linguistically important.
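The significance step can be illustrated with the feminine singular counts reported in Figures 1 and 2. The sketch below assumes a 2 × 2 contingency table that contrasts one pronoun form with all other forms, separately for unaccented (1193 in total) and accented (49622 in total) postprepositional pronouns, and computes Pearson's χ² without continuity correction. This table layout is our reconstruction rather than a procedure spelled out in the text, although it reproduces the χ² values reported later in this section for the genitive and accusative feminine singular forms. The function name chi_square_form is hypothetical.

```python
import math


def chi_square_form(unacc_form, unacc_total, acc_form, acc_total):
    """Pearson chi-square (df = 1, no continuity correction) for a 2x2 table.

    Rows: unaccented vs. accented postprepositional pronouns.
    Columns: the pronoun form under study vs. all other forms.
    Returns the statistic and its upper-tail p-value.
    """
    a, b = unacc_form, unacc_total - unacc_form
    c, d = acc_form, acc_total - acc_form
    n = a + b + c + d
    # Closed form of Pearson's statistic for a 2x2 contingency table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Overall totals of unaccented (Figure 1) and accented (Figure 2) pronouns.
UNACC_TOTAL, ACC_TOTAL = 1193, 49622

gen_chi2, gen_p = chi_square_form(25, UNACC_TOTAL, 5664, ACC_TOTAL)
acc_chi2, acc_p = chi_square_form(19, UNACC_TOTAL, 2820, ACC_TOTAL)
print(f"gen, fem, sg: chi2 = {gen_chi2:.2f}, p < 0.001: {gen_p < 0.001}")
print(f"acc, fem, sg: chi2 = {acc_chi2:.2f}, p < 0.001: {acc_p < 0.001}")
```

Run as is, the script prints χ² = 101.76 for the genitive and χ² = 36.95 for the accusative feminine singular, with p < 0.001 in both cases.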
In our ongoing study, the distribution of accented postprepositional pronouns combining with the prepositions dla ‘for’, do ‘to’, na ‘on’, w ‘in’, z ‘with’, od ‘from’, przez ‘by’, po ‘after’, za ‘behind’, and przed ‘in front of’ has been ascertained. These pronouns correspond to the unaccented counterparts occurring in the contractions dlań ‘for_TPPP’, doń ‘to_TPPP’, nań ‘on_TPPP’, weń ‘in_TPPP’, zeń ‘with_TPPP’ / ‘from_TPPP’, odeń ‘from_TPPP’, przezeń ‘by_TPPP’, poń ‘after_TPPP’, zań ‘behind_TPPP’, and przedeń ‘in front of_TPPP’, respectively. Note that interpretations must be assigned to the pronouns manually on the basis of their antecedents, as a large number of pronouns in the IPI PAN Corpus are resolved incorrectly. Figure 2 provides the current results.9

8 Alternatively, one can ascertain the percentage of each unaccented postprepositional pronoun among all occurrences of unaccented postprepositional pronouns and the percentage of each accented postprepositional pronoun among all occurrences of accented postprepositional pronouns, and then compare the results.
9 Note that in some cases it was impossible to assign an interpretation to a given pronoun; such cases are marked by a question mark (?) in Figure 2. In these cases, no antecedent could be identified, more than one antecedent candidate with different features came into question, or some features of the antecedent and the pronoun were inconsistent with one another. In the majority of cases, morphosyntactic features clashed with contextual, pragmatic or natural features.
Currently, only the distributional characterization of genitive and accusative feminine singular postprepositional pronouns is available for analysis. It has been ascertained that genitive unaccented postprepositional feminine pronouns are used significantly less frequently in the IPI PAN Corpus than genitive accented postprepositional feminine pronouns (χ² = 101.76, df = 1, p < 0.001), and that accusative unaccented postprepositional feminine pronouns are used significantly less frequently in the IPI PAN Corpus than accusative accented postprepositional feminine pronouns (χ² = 36.95, df = 1, p < 0.001). Genitive unaccented postprepositional feminine singular pronouns amounted to 2.06% of all unaccented postprepositional pronouns, while genitive accented postprepositional feminine singular pronouns amounted to 11.41% of all accented postprepositional pronouns. Accusative unaccented postprepositional feminine singular pronouns accounted for 1.59% of all unaccented postprepositional pronouns, while accusative accented postprepositional feminine singular pronouns accounted for 5.68% of all accented ones. The ratios of the totals of genitive and accusative accented postprepositional feminine singular pronouns to the totals of their unaccented counterparts are given in Figure 3. Additionally, Figure 3 provides the ratio of the total of all accented plural pronouns occurring in the contexts indicated in Figure 2 to the total of their unaccented counterparts. For final conclusions, however, the distribution patterns of the individual plural pronouns must be described.
                 Ratio
gen, fem, sg    226.56
acc, fem, sg    148.42
pl              759.60
Figure 3: Ratios of accented postprepositional pronouns to their unaccented counterparts
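The ratios in Figure 3 are plain quotients of accented to unaccented totals, so the two feminine singular values follow directly from the counts in Figures 1 and 2. The text does not specify how the ratios are to be compared; the sketch below assumes, as one possible choice, a Pearson χ² test of homogeneity on the table of accented and unaccented counts per form. The function names are hypothetical, and forms without unaccented occurrences are skipped because their ratio is undefined.

```python
# (accented total, unaccented total) per form, from Figures 2 and 1.
COUNTS = {
    "gen, fem, sg": (5664, 25),
    "acc, fem, sg": (2820, 19),
}


def ratios(counts):
    """Accented-to-unaccented ratio per form, rounded to 2 decimals.

    Forms without unaccented occurrences are skipped, since their ratio
    is undefined.
    """
    return {
        form: round(acc / unacc, 2)
        for form, (acc, unacc) in counts.items()
        if unacc > 0
    }


def homogeneity_chi2(counts):
    """Pearson chi-square for a k x 2 table (form x accented/unaccented).

    A value close to 0 means the forms have similar accented-to-unaccented
    ratios. Returns the statistic and its degrees of freedom (k - 1).
    """
    rows = [(acc, unacc) for acc, unacc in counts.values() if unacc > 0]
    acc_sum = sum(acc for acc, _ in rows)
    unacc_sum = sum(unacc for _, unacc in rows)
    n = acc_sum + unacc_sum
    chi2 = 0.0
    for acc, unacc in rows:
        row_total = acc + unacc
        for observed, column_total in ((acc, acc_sum), (unacc, unacc_sum)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(rows) - 1


print(ratios(COUNTS))  # {'gen, fem, sg': 226.56, 'acc, fem, sg': 148.42}
stat, df = homogeneity_chi2(COUNTS)
print(f"chi2 = {stat:.2f}, df = {df}")
```

Further forms can be added to COUNTS as their accented totals become available; the test then asks whether all listed ratios could plausibly share a common value.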
In the next step, the remaining accented postprepositional pronoun forms will be identified in the corpus and totaled.10 Then, the ratios of the totals of these pronouns to the totals of their unaccented forms will be calculated. Finally, all ratios will be compared. If there are significant differences between particular ratios, we will attempt to determine possible reasons for these differences (e.g., ungrammaticality, production errors, metadata, etc.) and draw conclusions accordingly. If there are no significant differences between the particular ratios, we will conclude that the distribution patterns of pronouns and prepositions within PPCs found in the corpus are also linguistically significant and that the traditionally assumed inflectional paradigms of TPPPs, as well as previous dictionary specifications of PPCs, may have to be revised.

10 Note that the total frequency of accented postprepositional forms whose unaccented counterparts have zero frequency will, in fact, not affect the analysis.

Columns: dla TPPP ‘for TPPP’, do TPPP ‘to TPPP’, na TPPP ‘on TPPP’, w TPPP ‘in TPPP’, z TPPP ‘with TPPP’ / ‘from TPPP’, od TPPP ‘from TPPP’, przez TPPP ‘by TPPP’, po TPPP ‘after TPPP’, za TPPP ‘behind TPPP’, przed TPPP ‘in front of TPPP’; Total and percentage (%) per pronoun form.

                     dla      do      na       w       z      od   przez      po      za   przed   Total       %
nom, m1, sg
gen, m1, sg         1141    1902
dat, m1, sg
acc, m1, sg                                                                          192
instr, m1, sg                                                                        699
loc, m1, sg
nom, m1, pl
gen, m1, pl         1207     987
dat, m1, pl
acc, m1, pl                                                                          126
instr, m1, pl                                                                        310
loc, m1, pl
nom, m2, sg
gen, m2, sg            8      24
dat, m2, sg
acc, m2, sg                                                                            1
instr, m2, sg                                                                         25
loc, m2, sg
nom, m2, pl
gen, m2, pl           14      12
dat, m2, pl
acc, m2, pl
instr, m2, pl                                                                          9
loc, m2, pl
nom, m3, sg
gen, m3, sg          128    1066
dat, m3, sg
acc, m3, sg                                                                           99
instr, m3, sg                                                                        183
loc, m3, sg
nom, m3, pl
gen, m3, pl          166     808
dat, m3, pl
acc, m3, pl                                                                           16
instr, m3, pl                                                                         75
loc, m3, pl
nom, neut, sg
gen, neut, sg         80     336
dat, neut, sg
acc, neut, sg                                                                         14
instr, neut, sg                                                                       41
loc, neut, sg
nom, neut, pl
gen, neut, pl        170     429
dat, neut, pl
acc, neut, pl                                                                          7
instr, neut, pl                                                                       29
loc, neut, pl
nom, fem, sg
gen, fem, sg         872    2619       0       0    1514     659       0       0       0       0    5664   11.41
dat, fem, sg
acc, fem, sg           0       0    1401     264       0       0     830      74     251       0    2820    5.68
instr, fem, sg                                                                       580
loc, fem, sg
nom, fem, pl
gen, fem, pl         319     914
dat, fem, pl
acc, fem, pl                                                                           9
instr, fem, pl                                                                       123
loc, fem, pl
?                    350                                                              26
Total               4455    9097    4853    4652   15143    2582    3661     591    2815    1773   49622
Percentage          8.98   18.33    9.78    9.37   30.52    5.20    7.38    1.19    5.67    3.57     100
Figure 2: The distribution of accented postprepositional pronouns in the IPI PAN Corpus
4 Summary and Outlook
In this paper, we have presented the current results of our ongoing corpus-based study on the distribution of prepositions and pronouns within Polish PPCs. At this point, we can conclude that, according to corpus evidence, more pronominal forms seem to be able to contract with prepositions than traditionally assumed. On the other hand, the corpus data attest fewer prepositions contracting with pronouns than Polish dictionaries list. To verify these results for the purpose of a possible revision of the traditionally assumed inflectional paradigms of TPPPs, as well as for lexicographic purposes, we have proposed a quantitative analysis based on calculating and comparing the ratios of the total frequency of all accented postprepositional forms to the total frequency of their unaccented counterparts. The analysis will be completed in the next project phase.
In future work, other corpora of Polish, such as the PWN Corpus of Polish11 or the PELCRA Corpus,12 will be examined with respect to the distribution of pronouns and prepositions within PPCs, and the results will be compared with those obtained using the IPI PAN Corpus.13 Furthermore, metadata will be analyzed with respect to the distribution of TPPPs. Finally, all results will be evaluated by human judges.

11 http://korpus.pwn.pl
12 http://korpus.ia.uni.lodz.pl
13 A preliminary list of PPCs occurring in the PWN Corpus was provided to us by Magdalena Derwojedowa (personal communication). According to this list, the following PPCs appear in the PWN Corpus: dlań ‘for_TPPP’, doń ‘to_TPPP’, nadeń ‘above_TPPP’, nań ‘on_TPPP’, odeń ‘from_TPPP’, oń ‘for_TPPP’, poń ‘after_TPPP’, przedeń ‘in front of_TPPP’, przezeń ‘by_TPPP’, weń ‘in_TPPP’, zeń ‘with_TPPP’ / ‘from_TPPP’. This set of PPCs does not fully correspond to the one found in the IPI PAN Corpus, so such a comparison seems worthwhile.
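A first, purely qualitative step of the planned comparison can be expressed as set operations over the attested contractions. The sketch below uses the ten contractions found in the IPI PAN Corpus and the preliminary PWN list from footnote 13; the variable names are hypothetical, and a full comparison would also need frequencies and grammatical interpretations.

```python
# Contractions attested in the IPI PAN Corpus (Section 2, Figure 1).
IPI_PAN = {"dlań", "doń", "nań", "weń", "zeń", "odeń", "przezeń", "poń", "zań", "przedeń"}

# Preliminary list for the PWN Corpus (footnote 13).
PWN = {"dlań", "doń", "nadeń", "nań", "odeń", "oń", "poń", "przedeń", "przezeń", "weń", "zeń"}

shared = IPI_PAN & PWN    # attested in both corpora
only_pwn = PWN - IPI_PAN  # attested only in the PWN list
only_ipi = IPI_PAN - PWN  # attested only in the IPI PAN Corpus

print(len(shared), sorted(only_pwn), sorted(only_ipi))  # 9 ['nadeń', 'oń'] ['zań']
```

Interestingly, both PWN-only forms (nadeń, oń) contain prepositions that the IPI PAN search did not find in contractions but that dictionaries list.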
Acknowledgments
We would like to thank Magdalena Derwojedowa, Elżbieta Hajnicz, Timm Lichte, Adam Przepiórkowski, Janina Radó, Zygmunt Saloni, Marek Świdziński and Marcin Woliński, as well as the reviewers of the Third ACL-SIGSEM Workshop on Prepositions held at EACL 2006 in Trento, for their helpful comments. We are also grateful to Janah Putnam for proofreading this paper.

References
Mirosław Bańko. 2000. Inny słownik języka polskiego [Different Polish Dictionary]. Wydawnictwo Naukowe PWN, Warszawa.
Witold Doroszewski and Bolesław Wieczorkiewicz. 1972. Gramatyka opisowa języka polskiego z ćwiczeniami [A Descriptive Grammar of Polish with Exercises], volume II: Fleksja. Składnia [Inflection. Syntax.]. Państwowe Zakłady Wydawnictw Szkolnych, Warszawa.
Stanisław Dubisz. 2003. Uniwersalny słownik języka polskiego [The Universal Polish Dictionary]. Wydawnictwo Naukowe PWN, Warszawa.
Adam Przepiórkowski. 2004. The IPI PAN Corpus. Preliminary Version. Institute of Computer Science PAS, Warsaw.
Adam Przepiórkowski. To appear. The Potential of the IPI PAN Corpus. Poznań Studies in Contemporary Linguistics, 41.
Zygmunt Saloni. 1981. Uwagi o opisie fleksyjnym tzw. zaimków rzeczownych [Some Remarks on the Inflexional Description of Polish Pronouns]. In Acta Universitatis Lodziensis, volume 2 of Folia Linguistica, pages 243–253. Uniwersytet Łódzki.
Marek Świdziński and Magdalena Derwojedowa. 2004. Idiosynkrazja na przecięciu idiosynkrazyj, czyli o poprzyimkowości i liczebnikach [Idiosyncrasy at the Interface of Idiosyncrasies. About Postprepositionality and Numerals]. In Andrzej Moroz and Marek Wiśniewski, editors, Studia z gramatyki i semantyki języka polskiego, pages 33–42. Wydawnictwo Uniwersytetu Mikołaja Kopernika, Toruń.
Beata Trawiński. 2005. Preposition-Pronoun Contraction in Polish. In Proceedings of the Second ACL-SIGSEM Workshop on The Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications, pages 20–29, University of Essex, Colchester, United Kingdom.
