Learning Part-of-Speech Guessing Rules from Lexicon: 
Extension to Non-Concatenative Operations* 
Andrei Mikheev 
HCRC Language Technology Group 
University of Edinburgh 
2 Buccleuch Place 
Edinburgh EH8 9LW, Scotland, UK 
Andrei.Mikheev@ed.ac.uk
Abstract 
One of the problems in part-of-speech
tagging of real-world texts is that of
words unknown to the lexicon. In
(Mikheev, 1996), a technique for fully
unsupervised statistical acquisition of
rules which guess possible parts-of-speech
for unknown words was proposed.
One of the over-simplifications assumed
by this learning technique was the acquisition
of morphological rules which obey
only simple concatenative regularities of
the main word with an affix. In this paper
we extend this technique to the non-concatenative
cases of suffixation and assess
the gain in performance.
1 Introduction 
Part-of-speech (pos) taggers are programs which
assign a single pos-tag to a word-token, provided
that it is known what parts-of-speech this word
can take on in principle. In order to do that, taggers
are supplied with a lexicon that lists possible
pos-tags for words which were seen at the training
phase. Naturally, when tagging real-world texts,
one can expect to encounter words which were
not seen at the training phase and hence not included
in the lexicon. This is where word-pos
guessers take their place: they employ the analysis
of word features, e.g. word leading and trailing
characters, to figure out its possible pos categories.
Currently, most taggers are supplied with
a word-guessing component for dealing with unknown
words. The most popular guessing strategy
is so-called "ending guessing", when a possible
set of pos-tags for a word is guessed solely on the
basis of its trailing characters. An example of such
a guesser is the guesser supplied with the Xerox tagger
(Kupiec, 1992). A similar approach was taken
in (Weischedel et al., 1993) where an unknown
word was guessed given the probabilities for an
unknown word to be of a particular pos, its capitalisation
feature and its ending. In (Brill, 1995)
a system of rules which uses both ending-guessing
and more morphologically motivated rules is described.
The best of these methods were reported to
achieve 82-85% tagging accuracy on unknown
words, e.g. (Brill, 1995; Weischedel et al., 1993).

* Some of the research reported here was funded as
part of EPSRC project IED4/1/5808 "Integrated Language
Database".
In (Mikheev, 1996) a cascading word-pos
guesser is described. It first applies morphological
prefix and suffix guessing rules and then
ending-guessing rules. This guesser is reported to
achieve higher guessing accuracy than quoted before:
on average it was about 8-9% better
than that of the Xerox guesser and about 6-7% better
than that of Brill's guesser, reaching 87-92%
tagging accuracy on unknown words.
There are two kinds of word-guessing rules employed
by the cascading guesser: morphological
rules and ending-guessing rules. Morphological
word-guessing rules describe how one word can be
guessed given that another word is known. In English,
as in many other languages, morphological
word formation is realised by affixation: prefixation
and suffixation, so there are two kinds of morphological
rules: suffix rules (A^s), which
are applied to the tail of a word, and prefix rules
(A^p), which are applied to the beginning
of a word. For example, the prefix rule:
A^p : [un (VBD VBN) (JJ)]
says that if segmenting the prefix "un" from an
unknown word results in a word which is found
in the lexicon as a past verb and participle (VBD
VBN), we conclude that the unknown word is an
adjective (JJ). This rule works, for instance, for
the word pair [developed → undeveloped]. An example of a
suffix rule is:
A^s : [ed (NN VB) (JJ VBD VBN)]
This rule says that if by stripping the suffix "ed"
from an unknown word we produce a word with
the pos-class noun/verb (NN VB), the unknown
word is of the class adjective/past-verb/participle
(JJ VBD VBN). This rule works, for instance, for
word pairs [book → booked], [water → watered], etc.
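The application of a prefix or suffix rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in the paper: the toy lexicon, the function names `apply_prefix_rule` and `apply_suffix_rule`, and the representation of pos-classes as frozensets are all hypothetical choices made for the example.

```python
from typing import Dict, FrozenSet, Optional

PosClass = FrozenSet[str]

# Toy lexicon (assumed for illustration): word -> pos-class.
LEXICON: Dict[str, PosClass] = {
    "developed": frozenset({"VBD", "VBN"}),
    "book": frozenset({"NN", "VB"}),
    "water": frozenset({"NN", "VB"}),
}


def apply_prefix_rule(word: str, prefix: str, i_class: PosClass,
                      r_class: PosClass,
                      lexicon: Dict[str, PosClass]) -> Optional[PosClass]:
    """Strip `prefix`; if the rest is listed with class I, guess class R."""
    if len(word) <= len(prefix) or not word.startswith(prefix):
        return None  # the rule is not applicable to this word
    main_form = word[len(prefix):]
    return r_class if lexicon.get(main_form) == i_class else None


def apply_suffix_rule(word: str, suffix: str, i_class: PosClass,
                      r_class: PosClass,
                      lexicon: Dict[str, PosClass]) -> Optional[PosClass]:
    """Strip `suffix`; if the rest is listed with class I, guess class R."""
    if len(word) <= len(suffix) or not word.endswith(suffix):
        return None  # the rule is not applicable to this word
    main_form = word[:len(word) - len(suffix)]
    return r_class if lexicon.get(main_form) == i_class else None
```

With the two rules quoted above, "undeveloped" and "booked" receive a guess, while a word such as "unbook" does not, because "book" is not listed with the class (VBD VBN).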
Unlike morphological guessing rules, ending-guessing
rules do not require the main form of an
unknown word to be listed in the lexicon. These
rules guess a pos-class for a word just on the basis
of its ending characters and without looking up
its stem in the lexicon. For example, an ending-guessing
rule
A^e : [ing - (JJ NN VBG)]
says that if a word ends with "ing" it can
be an adjective, a noun or a gerund. Unlike
a morphological rule, this rule does not ask to
check whether the substring preceding the "ing"-ending
is a word with a particular pos-tag.
Not surprisingly, morphological guessing rules
are more accurate than ending-guessing rules but
their lexical coverage is more restricted, i.e. they
are able to cover fewer unknown words. Since they
are more accurate, in the cascading guesser they
were applied before the ending-guessing rules and
improved the precision of the guessing by about
5%. This, in turn, resulted in about 2% higher
accuracy of tagging on unknown words.
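The cascading order described above (more accurate morphological rules first, ending-guessing rules as a fallback) can be sketched as follows. This is a hedged illustration rather than the guesser itself: the name `cascading_guess`, the rule tuples and the returned source label are assumptions made for the example, and prefix rules are left out for brevity.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

PosClass = FrozenSet[str]
SuffixRule = Tuple[str, PosClass, PosClass]  # (suffix S, I-class, R-class)
EndingRule = Tuple[str, PosClass]            # (ending, R-class)


def cascading_guess(word: str,
                    suffix_rules: List[SuffixRule],
                    ending_rules: List[EndingRule],
                    lexicon: Dict[str, PosClass]
                    ) -> Optional[Tuple[PosClass, str]]:
    """Return (guessed pos-class, rule type) or None if nothing applies."""
    # Stage 1: morphological rules, which require the main form
    # to be listed in the lexicon with the expected I-class.
    for suffix, i_class, r_class in suffix_rules:
        if len(word) > len(suffix) and word.endswith(suffix):
            main_form = word[:len(word) - len(suffix)]
            if lexicon.get(main_form) == i_class:
                return r_class, "morphological"
    # Stage 2: ending-guessing rules, which look only at the
    # trailing characters and never consult the lexicon.
    for ending, r_class in ending_rules:
        if len(word) > len(ending) and word.endswith(ending):
            return r_class, "ending"
    return None
```

Words for which the function returns `None` would be left to whatever default strategy the tagger uses for unguessed words.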
Although in general the performance of the cascading
guesser was found to be only 6% worse
than a general-language lexicon lookup, one of the
over-simplifications assumed at the extraction of
the morphological rules was that they obey only
simple concatenative regularities:
book → book+ed; take → take+n; play → play+ing.
No attempts were made to model non-concatenative
cases which are quite common in
English, as for instance:
try → tries; reduce → reducing; advise → advisable.
So we thought that the incorporation of a set of
guessing rules which can capture morphological
word dependencies with letter alterations should
extend the lexical coverage of the morphological
rules and hence might contribute to the overall
guessing accuracy.
In the rest of the paper we first briefly
outline the unsupervised statistical learning technique
proposed in (Mikheev, 1996), then we propose
a modification which allows for the incorporation
of the learning of non-concatenative morphological
rules, and finally we evaluate and
assess the contribution of the non-concatenative
suffix morphological rules to the overall tagging
accuracy on unknown words using the cascading
guesser.
2 The Learning Paradigm 
The major topic in the development of word-pos
guessers is the strategy which is to be
used for the acquisition of the guessing rules.
Brill (Brill, 1995) outlines a transformation-based
learner which learns guessing rules from a pre-tagged
training corpus. A statistical-based suffix
learner is presented in (Schmid, 1994). From a
pre-tagged training corpus it constructs the suffix
tree where every suffix is associated with its
information measure.
The learning technique employed in the induction
of the rules of the cascading guesser
(Mikheev, 1996) does not require specially prepared
training data and employs fully unsupervised
statistical learning from the lexicon supplied
with the tagger and word-frequencies obtained
from a raw corpus. The learning is implemented
as a two-staged process with feedback. First, setting
certain parameters, a set of guessing rules is
acquired, then it is evaluated and the results of the
evaluation are used for re-acquisition of a better
tuned rule-set. As has already been said, this
learning technique proved to be very successful,
but did not attempt the acquisition of word-guessing
rules which do not obey simple concatenations
of a main word with some affix. Here we
present an extension to accommodate such cases.
2.1 Rule Extraction Phase 
In the initial learning technique (Mikheev, 1996),
which accounted only for simple concatenative
regularities, a guessing rule was seen as a triple:
A = (S, I, R) where
S is the affix itself;
I is the pos-class of words which should be
looked up in the lexicon as main forms;
R is the pos-class which is assigned to unknown
words if the rule is satisfied.
Here we extend this structure to handle cases of
mutation in the last n letters of the main word
(words of I-class), as, for instance, in the case of
try → tries, when the letter "y" is changed to "i" before
the suffix. To accommodate such alterations
we included an additional mutation element (M)
into the rule structure. This element keeps the
segment to be added to the main word. So the
application of a guessing rule can be described as:
unknown-word - S + M : I → R,
i.e. from an unknown word we strip the affix S,
add the mutative segment M, look up the produced
string in the lexicon and if it is of class
I we conclude that the unknown word is of class
R. For example, the suffix rule A^s:
[S=ied I=(NN VB) R=(JJ VBD VBN) M=y]
or in short [ied (NN VB) (JJ VBD VBN) y]
says that if there is an unknown word which ends
with "ied", we should strip this ending and append
to the remaining part the string "y". If
we then find this word in the lexicon as (NN VB)
(noun/verb), we conclude that the guessed word is
of category (JJ VBD VBN) (adjective, past verb or
participle). This rule, for example, will work for
word pairs like specify - specified or deny - denied.
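The application scheme "unknown-word - S + M : I → R" can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `apply_mutative_rule` and the toy lexicon entries are hypothetical, and an empty string for M stands for a rule without alteration.

```python
from typing import Dict, FrozenSet, Optional

PosClass = FrozenSet[str]


def apply_mutative_rule(word: str, s: str, i_class: PosClass,
                        r_class: PosClass, m: str,
                        lexicon: Dict[str, PosClass]) -> Optional[PosClass]:
    """Strip affix S, append mutative segment M, look the result up.

    Returns the R-class if the produced string is listed in the
    lexicon with exactly the I-class, otherwise None.
    """
    if len(word) <= len(s) or not word.endswith(s):
        return None  # the rule is not applicable to this word
    main_form = word[:len(word) - len(s)] + m
    return r_class if lexicon.get(main_form) == i_class else None


# Toy lexicon, assumed for illustration only.
NV = frozenset({"NN", "VB"})
JVV = frozenset({"JJ", "VBD", "VBN"})
toy_lexicon = {"specify": NV, "deny": NV, "book": NV}

guess = apply_mutative_rule("specified", "ied", NV, JVV, "y", toy_lexicon)
```

A simple concatenative rule is then the special case with an empty M, e.g. `apply_mutative_rule("booked", "ed", NV, JVV, "", toy_lexicon)`.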
Next, we modified the V operator which was
used for the extraction of morphological guessing
rules. We augmented this operator with the index
n which specifies the length of the mutative ending
of the main word. Thus when the index n is
0 the result of the application of the V_0 operator
will be a morphological rule without alterations.
The V_1 operator will extract the rules with
alterations in the last letter of the main word, as
in the example above. The V operator is applied
to a pair of words from the lexicon. First it segments
the last n characters of the shorter word
and stores this in the M element of the rule. Then
it tries to segment an affix by subtracting the
shorter word without the mutative ending from
the longer word. If the subtraction results in a
non-empty string it creates a morphological rule
by storing the pos-class of the shorter word as the
I-class, the pos-class of the longer word as the R-class
and the segmented affix itself. For example:
[booked (JJ VBD VBN)] V_0 [book (NN VB)] →
A^s : [ed (NN VB) (JJ VBD VBN) ""]
[advisable (JJ VBD VBN)] V_1 [advise (NN VB)] →
A^s : [able (NN VB) (JJ VBD VBN) "e"]
The V operator is applied to all possible
lexicon-entry pairs and if a rule produced by such
an application has already been extracted from
another pair, its frequency count (f) is incremented.
Thus sets of morphological guessing rules
together with their calculated frequencies are produced.
Next, from these sets of guessing rules
we need to cut out infrequent rules which might
bias the further learning process. To do that we
eliminate all the rules with the frequency f less
than a certain threshold θ_f¹. Such filtering reduces
the rule-sets more than tenfold and does not leave
clearly coincidental cases among the rules.
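The extraction step can be sketched for suffix rules as follows. This is an illustration under stated assumptions, not the original implementation: the names `extract_rule` and `extract_rules`, the tuple layout (S, I, R, M) and the toy lexicons are hypothetical, and the quadratic loop over all entry pairs is only practical for small examples.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, FrozenSet, Optional, Tuple

PosClass = FrozenSet[str]
Rule = Tuple[str, PosClass, PosClass, str]  # (S, I, R, M)


def extract_rule(long_word: str, long_class: PosClass,
                 short_word: str, short_class: PosClass,
                 n: int) -> Optional[Rule]:
    """Sketch of the V_n operator for one pair of lexicon entries."""
    if len(short_word) <= n or len(long_word) <= len(short_word):
        return None
    # Segment the last n characters of the shorter word as M.
    split = len(short_word) - n
    base, m = short_word[:split], short_word[split:]
    # Subtract the shorter word (without M) from the longer word.
    if not long_word.startswith(base):
        return None
    affix = long_word[len(base):]
    if not affix:
        return None
    return (affix, short_class, long_class, m)


def extract_rules(lexicon: Dict[str, PosClass], n: int,
                  min_freq: int) -> Dict[Rule, int]:
    """Apply V_n to all entry pairs and drop rules with f < min_freq."""
    counts: Counter = Counter()
    for (w1, c1), (w2, c2) in permutations(lexicon.items(), 2):
        rule = extract_rule(w1, c1, w2, c2, n)
        if rule is not None:
            counts[rule] += 1
    return {rule: f for rule, f in counts.items() if f >= min_freq}


# Toy lexicons, assumed for illustration only.
NV = frozenset({"NN", "VB"})
JVV = frozenset({"JJ", "VBD", "VBN"})
plain = {"book": NV, "booked": JVV, "water": NV, "watered": JVV}
altered = {"deny": NV, "denied": JVV, "specify": NV, "specified": JVV}

rules_v0 = extract_rules(plain, n=0, min_freq=2)
rules_v1 = extract_rules(altered, n=1, min_freq=2)
```

On these toy lexicons each call keeps a single rule seen twice: the plain "ed" rule for n=0 and the "ied" rule with M="y" for n=1.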
2.2 Rule Scoring Phase 
Of course, not all acquired rules are equally good
as plausible guesses about word-classes. So, for
every acquired rule we need to estimate whether
it is an effective rule which is worth retaining in
the final rule-set. To perform such estimation
we take one-by-one each rule from the rule-sets
produced at the rule extraction phase, take each
word-token from the corpus and guess its pos-set
using the rule if the rule is applicable to the word.
For example, if a guessing rule strips a particular
suffix and a current word from the corpus does not
have such a suffix, we classify this word and rule
as incompatible and the rule as not applicable to
that word. If the rule is applicable to the word we
perform lookup in the lexicon and then compare
the result of the guess with the information listed
in the lexicon. If the guessed pos-set is the same
as the pos-set stated in the lexicon, we count it as
a success, otherwise it is a failure. Then for each rule
we calculate its score as explained in (Mikheev,
1996) using the scoring function as follows:
$$ score_i = \hat{p}_i - 1.65 \cdot \frac{\sqrt{\hat{p}_i (1 - \hat{p}_i) / n_i}}{1 + \log(|S_i|)} $$
where $\hat{p}$ is the proportion of all positive outcomes
(x) of the rule application to the total number
of words compatible with the rule (n), and $|S|$
is the length of the affix. We also smooth $\hat{p}$ so as
not to have zeros in positive or negative outcome
probabilities: $\hat{p} = (x + 0.5) / (n + 1)$.

¹ Usually we set this threshold quite low: 2-4.
Setting the threshold θ_s at a certain level lets
only the rules whose score is higher than the
threshold be included into the final rule-sets.
The method for setting up the threshold is based
on empirical evaluations of the rule-sets and is described
in Section 2.3.
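The scoring function can be sketched numerically as below. The sketch assumes one particular reading of the formula: a smoothed success proportion (x + 0.5) / (n + 1), from which a confidence penalty 1.65 * sqrt(p(1 - p) / n), divided by 1 + log(|S|), is subtracted. The natural logarithm and the function name `rule_score` are assumptions made for the example.

```python
import math


def rule_score(x: int, n: int, affix_len: int) -> float:
    """Score of a rule with x successes out of n compatible words.

    Assumed reading of the scoring function:
        p     = (x + 0.5) / (n + 1)          (smoothed proportion)
        score = p - 1.65 * sqrt(p * (1 - p) / n) / (1 + log(affix_len))
    The natural logarithm is an assumption of this sketch.
    """
    if n <= 0 or affix_len <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0, affix_len > 0 and 0 <= x <= n")
    p = (x + 0.5) / (n + 1)
    penalty = 1.65 * math.sqrt(p * (1 - p) / n)
    return p - penalty / (1 + math.log(affix_len))


# Same success rate, different amounts of evidence and affix lengths.
few = rule_score(9, 10, 2)
many = rule_score(90, 100, 2)
longer_affix = rule_score(90, 100, 4)
```

Under this reading a rule backed by more evidence, or built on a longer affix, scores higher at the same success rate, which is the behaviour the threshold θ_s relies on.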
2.3 Setting the Threshold 
The task of assigning a set of pos-tags to a par- 
ticular word is actually quite similar to the task 
of document categorisation where a document 
should be assigned with a set of descriptors which 
represent its contents. The performance of such 
assignment can be measured in: 
recall - the percentage of pos-tags which the
guesser assigned correctly to a word;
precision - the percentage of pos-tags the
guesser assigned correctly over the total number
of pos-tags it assigned to the word;
coverage - the proportion of words which the
guesser was able to classify, but not necessarily
correctly.
There are two types of test-data in use at this 
stage. First, we measure the performance of a 
guessing rule-set against the actual lexicon: ev- 
ery word from the lexicon, except for closed-class 
words and words shorter than five characters, is 
guessed by the rule-sets and the results are com- 
pared with the information the word has in the 
lexicon. In the second experiment we measure 
the performance of the guessing rule-sets against 
the training corpus. For every word we mea- 
sure its metrics exactly as in the previous exper- 
iment. Then we multiply these measures by the 
corpus frequency of this particular word and average
them. Thus the most frequent words have
the greatest influence on the final measures.
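The three measures and their frequency-weighted variant can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: recall is taken as the share of a word's lexicon tags that were guessed, precision and recall are averaged over the covered words only, and the name `evaluate` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

PosClass = FrozenSet[str]


def evaluate(guesses: Dict[str, PosClass],
             lexicon: Dict[str, PosClass],
             freq: Optional[Dict[str, int]] = None
             ) -> Tuple[float, float, float]:
    """Return (precision, recall, coverage) of a set of guesses.

    Without `freq` every word counts once (lexicon experiment);
    with `freq` every word is weighted by its corpus frequency.
    """
    total = covered = prec_sum = rec_sum = 0.0
    for word, truth in lexicon.items():
        weight = 1.0 if freq is None else float(freq.get(word, 0))
        total += weight
        guess = guesses.get(word)
        if not guess:
            continue  # the guesser could not classify this word
        covered += weight
        correct = len(guess & truth)
        prec_sum += weight * correct / len(guess)
        rec_sum += weight * correct / len(truth)
    if total == 0 or covered == 0:
        return 0.0, 0.0, 0.0
    return prec_sum / covered, rec_sum / covered, covered / total


# Toy data, assumed for illustration only.
toy_lexicon = {
    "booked": frozenset({"JJ", "VBD", "VBN"}),
    "tables": frozenset({"NNS", "VBZ"}),
    "quickly": frozenset({"RB"}),
}
toy_guesses = {
    "booked": frozenset({"JJ", "VBD", "VBN"}),
    "tables": frozenset({"NNS"}),
}
toy_freq = {"booked": 1, "tables": 3, "quickly": 0}

by_lexicon = evaluate(toy_guesses, toy_lexicon)
by_corpus = evaluate(toy_guesses, toy_lexicon, toy_freq)
```

In the weighted variant the frequent word "tables" dominates the averages, which mirrors the remark that the most frequent words have the greatest influence.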
To extract the best-scoring rule-sets for each acquired
set of rules we produce several final rule-sets
setting the threshold θ_s at different values.
For each produced rule-set we record the three
metrics (precision, recall and coverage) and choose
the sets with the best aggregate measures.
3 Learning Experiment 
One of the most important issues in the induction
of guessing rule-sets is the choice of the right data for
training. In our approach, guessing rules are extracted
from the lexicon and the actual corpus frequencies
of word-usage then allow for discrimination
between rules which are no longer productive
(but have left their imprint on the basic lexicon)
and rules that are productive in real-life texts.
Thus the major factor in the learning process is
the lexicon: it should be as general as possible
(list all possible pos-tags for a word) and as large as
possible, since guessing rules are meant to capture
general language regularities. The corresponding
corpus should include most of the words from the
lexicon and be large enough to obtain reliable estimates
of word-frequency distribution.

                     Lexicon                          Corpus
Guessing strategy    Precision  Recall    Coverage    Precision  Recall    Coverage
Suffix (S60)         0.920476   0.959087  0.373851    0.978246   0.973537  0.29785
Suffix with alt.
(A80)                0.964433   0.97194   0.193404    0.996292   0.991106  0.187478
S60+A80              0.925782   0.959568  0.4495      0.981375   0.977098  0.370538
A80+S60              0.928376   0.959457  0.4495      0.981844   0.977165  0.370538
Ending (E75)         0.666328   0.94023   0.97741     0.755653   0.951342  0.958852
S60+E75              0.728449   0.941157  0.9789471   0.798186   0.947714  0.961047
S60+A80+E75          0.739347   0.941548  0.979181    0.805789   0.948022  0.961047
A80+S60+E75          0.740538   0.941497  0.979181    0.805965   0.948051  0.961047

Table 1: Results of the cascading application of the rule-sets over the training lexicon and training
corpus. A80 - suffixes with alterations scored over 80 points, S60 - suffixes without alterations scored
over 60 points, E75 - ending-guessing rule-set scored over 75 points.
We performed a rule-induction experiment using
the lexicon and word-frequencies derived
from the Brown Corpus (Francis & Kucera, 1982).
There are a number of reasons for choosing the
Brown Corpus data for training. The most important
ones are that the Brown Corpus provides
a model of general multi-domain language use,
so general language regularities can be induced
from it, and second, many taggers come with data
trained on the Brown Corpus, which is useful for
comparison and evaluation. This, however, by no
means restricts the described technique to that or
any other tag-set, lexicon or corpus. Moreover,
despite the fact that the training is performed
on a particular lexicon and a particular corpus,
the obtained guessing rules are supposed to be domain
and corpus independent and the only training-dependent
feature is the tag-set in use.
Using the technique described above and the
lexicon derived from the Brown Corpus we extracted
prefix morphological rules (no alterations),
suffix morphological rules without alterations
and ending-guessing rules, exactly as was
done in (Mikheev, 1996). Then we extracted suffix
morphological rules with alterations in the last
letter (V_1), which was a new rule-set for the cascading
guesser. Quite interestingly, apart from the
expected suffix rules with alterations such as:
[S=ied I=(NN VB) R=(JJ VBD VBN) M=y]
which can handle pairs like deny → denied, this
rule-set was populated with "second-order" rules
which describe dependencies between secondary
forms of words. For instance, the rule
[S=ion I=(NNS VBZ) R=(NN) M=s]
says that if by deleting the suffix "ion" from a word
and adding "s" to the end of the result of this
deletion we produce a word which is listed in the
lexicon as a plural noun and third-person form of a verb
(NNS VBZ), the unknown word is a noun (NN).
This rule, for instance, is applicable to word pairs:
affects → affection, asserts → assertion, etc.
Table 1 presents some results of a comparative
study of the cascading application of the new rule-set
against the standard rule-sets of the cascading
guesser. The first part of Table 1 shows the best
obtained scores for the standard suffix rules (S)
and suffix rules with alterations in the last letter
(A). When we applied the two suffix rule-sets
cascadingly their joint lexical coverage increased
by about 7-8% (from 37% to 45% on the lexicon
and from 30% to 37% on the corpus) while precision
and recall remained at the same high level.
This was quite an encouraging result which
agreed with our prediction. Then we measured
whether suffix rules with alterations (A) add
any improvement if they are used in conjunction
with the ending-guessing rules. As in the previous
experiment we measured the precision, recall
and coverage both on the lexicon and on the corpus.
The second part of Table 1 shows that simple
concatenative suffix rules (S60) improved the
precision of the guessing when they were applied
before the ending-guessing rules (E75) by about
5%. Then we cascadingly applied the suffix rules
with alterations (A80) which caused further improvement
in precision by about 1%.
After obtaining the optimal rule-sets we performed
the same experiments on a word-sample
which was not included into the training lexicon
and corpus. We gathered about three thousand
words from the lexicon developed for the Wall
Street Journal corpus² and collected frequencies
of these words in this corpus. At this test-sample
evaluation we obtained similar metrics apart from
the coverage which dropped by about 7% for both
kinds of suffix rules. This, actually, did not come
as a surprise, since many main forms required by
the suffix rules were missing in the lexicon.

Lexicon  Guessing strategy     Total   Unkn.   Total    Unkn.    Total    Unkn.
                               words   words   mistag.  mistag.  Score    Score
Full     standard: P+S+E       5,970   347     292      33       95.1%    90.5%
Full     with new: P+A+S+E     5,970   347     292      33       95.1%    90.5%
Small    standard: P+S+E       5,970   2,215   332      309      94.44%   86.05%
Small    with new: P+A+S+E     5,970   2,215   311      288      94.79%   87.00%

Table 2: Results of tagging a text using the standard Prefix+Suffix+Ending cascading guesser and the
guesser with the additional rule-set of suffixes-with-Alterations. For each of these cascading guessers
two tagging experiments were performed: the tagger was equipped with the full Brown Corpus lexicon
and with the small lexicon of closed-class and short words (5,465 entries).
4 Evaluation 
The direct performance measures of the rule-sets 
gave us the grounds for the comparison and se- 
lection of the best performing guessing rule-sets. 
The task of unknown word guessing is, however, a 
subtask of the overall part-of-speech tagging pro- 
cess. Thus we are mostly interested in how the 
advantage of one rule-set over another will affect 
the tagging performance. So, we performed an independent
evaluation of the impact of the word
guessing sets on tagging accuracy. In this evaluation
we used the cascading application of prefix
rules, suffix rules and ending-guessing rules as described
in (Mikheev, 1996). We measured whether
the addition of the suffix rules with alterations
increases the accuracy of tagging in comparison
with the standard rule-sets. In this experiment we
used a tagger which was a C++ re-implementation
of the LISP-implemented HMM Xerox tagger described
in (Kupiec, 1992) trained on the Brown
Corpus. For words which failed to be guessed by
the guessing rules we applied the standard method
of classifying them as common nouns (NN) if they
are not capitalised inside a sentence and proper
nouns (NP) otherwise.
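The default strategy for unguessed words can be sketched as below. The sketch assumes one reading of the rule: a word counts as a proper noun only when it is capitalised in a non-initial position of the sentence, since sentence-initial capitalisation carries no information. The function name `fallback_tag` and its parameters are hypothetical.

```python
def fallback_tag(token: str, sentence_initial: bool) -> str:
    """Default tag for a word that no guessing rule could classify.

    Assumed reading: NP only for words capitalised inside a
    sentence; everything else, including capitalised
    sentence-initial words, is treated as a common noun (NN).
    """
    capitalised = token[:1].isupper()
    if capitalised and not sentence_initial:
        return "NP"
    return "NN"
```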
In the evaluation of tagging accuracy on unknown
words we paid attention to two metrics.
First we measure the accuracy of tagging solely
on unknown words:
$$ UnknownScore = \frac{CorrectlyTaggedUnknownWords}{TotalUnknownWords} $$
This metric gives us the exact measure of how
the tagger has done when equipped with different
guessing rule-sets. In this case, however, we do
not account for the known words which were mis-tagged
because of the unknown ones. To put a
perspective on that aspect we measure the overall
tagging performance:
$$ TotalScore = \frac{CorrectlyTaggedWords}{TotalWords} $$

² These words were not listed in the training lexicon.
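Both scores follow directly from the word and mis-tagging counts, as the short sketch below shows. The function name `tagging_scores` is hypothetical; the numbers in the usage lines are the small-lexicon counts reported in Table 2 for the standard rule-sets.

```python
from typing import Tuple


def tagging_scores(total_words: int, unknown_words: int,
                   total_mistagged: int,
                   unknown_mistagged: int) -> Tuple[float, float]:
    """Return (TotalScore, UnknownScore) as fractions in [0, 1]."""
    total_score = (total_words - total_mistagged) / total_words
    unknown_score = (unknown_words - unknown_mistagged) / unknown_words
    return total_score, unknown_score


# Small lexicon, standard P+S+E rule-sets (counts from Table 2).
total_score, unknown_score = tagging_scores(5970, 2215, 332, 309)
total_pct = round(100 * total_score, 2)      # 94.44
unknown_pct = round(100 * unknown_score, 2)  # 86.05
```

The same computation with 311 and 288 mis-tagged words gives the 94.79% and 87.00% reported for the guesser with the additional rule-set.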
To perform such an evaluation we tagged several
texts of different origins, except ones from the
Brown Corpus. These texts were not seen at the
training phase, which means that neither the tagger
nor the guesser had been trained on these texts
and they naturally had words unknown to the lexicon.
For each text we performed two tagging experiments.
In the first experiment we tagged the
text with the full-fledged Brown Corpus lexicon
and hence had only those unknown words which
naturally occur in this text. In the second experiment
we tagged the same text with the lexicon
which contained only closed-class³ and short⁴
words. This small lexicon contained only 5,456
entries out of 53,015 entries of the original Brown
Corpus lexicon. All other words were considered
as unknown and had to be guessed by the guesser.
In both experiments we measured tagging accuracy
when tagging with the guesser equipped with
the standard Prefix+Suffix+Ending rule-sets and
with the additional rule-set of suffixes with alterations
in the last letter.
Table 2 presents some results of a typical example
of such experiments. There we tagged a
text of 5,970 words. This text was found to
have 347 words unknown to the Brown Corpus lexicon
and, as can be seen, the additional rule-set
did not cause any improvement to the tagging
accuracy. Then we tagged the same text using
the small lexicon. Out of 5,970 words of the text,
2,215 were unknown to the small lexicon. Here
we noticed that the additional rule-set improved
the tagging accuracy on unknown words by about
1%: there were 21 more word-tokens tagged correctly
because of the additional rule-set. Among
these words were: "classified", "applied", "tries",
"tried", "merging", "subjective", etc.

³ Articles, prepositions, conjunctions, etc.
⁴ Shorter than 5 characters.
5 Discussion and Conclusion 
The target of the research reported in this paper
was to incorporate the learning of morphological
word-pos guessing rules which do not obey
simple concatenations of main words with affixes
into the learning paradigm proposed in (Mikheev,
1996). To do that we extended the data structures
and the algorithms for the guessing-rule application
to handle the mutations in the last n
letters of the main words. Thus simple concatenative
rules naturally became a subset of the mutative
rules: they can be seen as mutative rules
with the zero mutation, i.e. when the M element
of the rule is empty. Simple concatenative rules,
however, are not necessarily regular morphological
rules and quite often they capture other non-linear
morphological dependencies. For instance, consonant
doubling is naturally captured by the affixes
themselves and obeys simple concatenations,
as, for example, in the suffix rule A^s:
[S=ging I=(NN VB) R=(JJ NN VBG) M=""]
This rule, for example, will work for word pairs
like tag - tagging or dig - digging. Note that here
we don't specify the prerequisites for the stem-word
to have one syllable and end with the same
consonant as in the beginning of the affix. Our
task here is not to provide a precise morphological
description of English but rather to support
computationally effective pos-guessing by employing
some morphological information. So, instead
of using a proper morphological processor,
we adopted an engineering approach which is argued
for in (Mikheev & Liubushkina, 1995). There
is, of course, nothing wrong with morphological
processors per se, but it is hardly feasible to retrain
them fully automatically for new tag-sets
or to induce new rules. Our shallow technique,
on the contrary, allows us to induce such rules completely
automatically and ensure that these rules
will have enough discriminative features for robust
guessing. In fact, we abandoned the notion of
morpheme and are dealing with word segments regardless
of whether they are "proper" morphemes
or not. So, for example, in the rule above "ging"
is considered as a suffix which in principle is not
right: the suffix is "ing" and "g" is the doubled
consonant. Clearly, such nuances are impossible
to learn automatically without specially prepared
training data, which is denied by the technique
in use. On the other hand it is not clear that
this fine-grained information will contribute to the
task of morphological guessing. The simplicity of
the proposed shallow morphology, however, ensures
fully automatic acquisition of such rules and
the empirical evaluation presented in section 2.3
confirmed that they are just right for the task:
precision and recall of such rules were measured
in the range of 96-99%.
The other aim of the research reported here
was to assess whether non-concatenative morphological
rules will improve the overall performance
of the cascading guesser. As was measured in
(Mikheev, 1996), simple concatenative prefix and
suffix morphological rules improved the overall
precision of the cascading guesser by about 5%,
which resulted in 2% higher accuracy of tagging
on unknown words. The additional rule-set of suffix
rules with one-letter mutation caused some
further improvement. The precision of the guessing
increased by about 1% and the tagging accuracy
on a very large set of unknown words increased
by about 1%. In conclusion we can say
that although the ending-guessing rules, which
are much simpler than morphological rules, can
handle words with affixes longer than two characters
almost equally well, in the framework of pos-tagging
even a fraction of a percent is an important
improvement. Therefore the contribution of the
morphological rules is valuable and necessary for
the robust pos-tagging of real-world texts.
References 
E. Brill 1995. Transformation-based error-driven
learning and Natural Language processing: a
case study in part-of-speech tagging. In Computational
Linguistics 21(4)
W. Francis and H. Kucera 1982. Frequency
Analysis of English Usage. Houghton Mifflin,
Boston.
J. Kupiec 1992. Robust Part-of-Speech Tagging
Using a Hidden Markov Model. In Computer
Speech and Language
A. Mikheev and L. Liubushkina 1995. Russian
morphology: An engineering approach. In Natural
Language Engineering, 1(3)
A. Mikheev 1996. Unsupervised Learning of
Word-Category Guessing Rules. In Proceedings
of the 34th Annual Meeting of the Association
for Computational Linguistics (ACL-96), Santa
Cruz, USA.
H. Schmid 1994. Part of Speech Tagging with
Neural Networks. In Proceedings of the 15th International
Conference on Computational Linguistics
(COLING-94), Kyoto, Japan.
R. Weischedel, M. Meteer, R. Schwartz,
L. Ramshaw and J. Palmucci 1993. Coping
with ambiguity and unknown words through
probabilistic models. In Computational Linguistics,
vol 19/2
