Implementing Voting Constraints with Finite State 
Transducers 
Kemal Oflazer and Gökhan Tür
Department of Computer Engineering and Information Science
Bilkent University, Bilkent, Ankara, TR-06533, Turkey
{ko, tur}@cs.bilkent.edu.tr
Abstract. We describe a constraint-based morphological disambiguation system in which
individual constraint rules vote on matching morphological parses, followed by its
implementation using finite state transducers. Voting constraint rules have a number of
desirable properties: the outcome of the disambiguation is independent of the order of
application of the local contextual constraint rules, so the rule developer is relieved
from worrying about conflicting rule sequencing. The approach can also combine statistically
and manually obtained constraints, and incorporate negative constraints that rule
out certain patterns. The transducer implementation has a number of desirable properties
compared to other finite state tagging and light parsing approaches implemented
with automata intersection. The most important of these is that, since constraints do
not remove parses, there is no risk of an overzealous constraint "killing a sentence" by
removing all parses of a token during intersection. After a description of our approach,
we present preliminary results from tagging the Wall Street Journal Corpus with this
approach. With about 400 statistically derived constraints and about 570 manual constraints,
we attain an accuracy of 97.82% on the training corpus and 97.29% on the
test corpus. We then describe a finite state implementation of our approach and discuss
various related issues.
1 Introduction 
We describe a finite state implementation of a constraint-based morphological disambiguation
system in which individual constraints vote on matching morphological parses, and disambiguation
of all tokens in a sentence is performed at the end, by selecting the parses that collectively
make up the highest voted combination. The approach depends on assigning votes to constraints
via statistical and/or manual means, and then letting constraint rules cast votes on
matching parses of a given lexical item. The outcome of matching a constraint rule against the
set of morphological parses is not reflected immediately: only after all applicable rules
have been applied to a sentence are all tokens disambiguated in parallel. Thus, the outcome of the
rule applications is independent of the order of rule applications.
Constraint-based morphological disambiguation systems (e.g. [6, 7, 15]) typically look at a
context of several sequential tokens, each annotated with its possible morphological interpretations
(or tags), and, in a reductionistic way, remove parses that are considered to be impossible
in the given context. Since constraint rule application is ordered, parses removed by one rule
may not be used or referred to in subsequent rule applications. Addition of a new rule requires
that its place in the sequence be carefully determined to avoid any undesirable interactions.
Automata intersection based approaches run the risk of deleting all parses of a sentence, and
have also been observed to end up with large intersected machines. Our approach eliminates the
ordering problem, since parse removals are not committed during application, but only after all
rules are processed. Figure 1 highlights the voting constraints paradigm.
[Figure: tokens W1, W2, ..., Wn, each with its candidate parses/tags, and voting rules R1, R2, ..., Rm casting votes on the parses.]
Figure 1. Voting Constraint Rules
In the following sections we describe voting constraint rules and then present some preliminary
results from tagging English. We then present the implementation using finite state
transducers and discuss various issues involved.
2 Voting Constraints 
Voting constraints operate on sentences where each token has been assigned all possible tags
by a lexicon or by a morphological analyzer. We represent, using a directed acyclic graph
(DAG), a sentence consisting of $n$ tokens $w_1, w_2, \ldots, w_n$, each with morphological parses/tags
$t_{i,1}, t_{i,2}, \ldots, t_{i,a_i}$, $a_i$ being the number of ambiguous parses for token $i$. The nodes in the DAG
represent token boundaries, and arcs are labeled with triplets of the sort $L = (w_i, t_{i,j}, v_{i,j})$, where
$v_{i,j}$ (initially 0) is the total vote associated with tag $t_{i,j}$ of $w_i$. For instance, the sentence "I can
can the can." would initially be represented by the graph shown in Figure 2, where bold arcs
denote the contextually correct tags.
[Figure: the DAG for "I can can the can."; each arc carries a label such as (can, MD, 0).]
Figure 2. Representing sentences with a directed acyclic graph
We describe each constraint on the ambiguous interpretations of tokens using rules with two
components, $R = (C_1, C_2, \ldots, C_n; V)$, where the $C_i$'s are, in general, feature constraints on a
sequence of ambiguous parses, and $V$ is an integer denoting the vote assigned to the rule. For
English, the features that we use are TAG and LEX, but it is certainly possible to extend the
set of features used, by including features such as initial letter capitalization, any derivational
information, etc.
The following examples illustrate some rules: 
1. ([TAG=MD], [TAG=VB]; 100) and ([TAG=MD], [TAG=RB], [TAG=VB]; 100) are two constraints
with a high vote to promote a modal followed by a verb, possibly with an intervening adverb.
2. ([TAG=DT,LEX=that], [TAG=NNS]; -100) demotes a singular determiner reading of "that"
before a plural noun.
3. ([TAG=DT,LEX=each], [TAG=JJ,LEX=other]; 100) is a rule with a high vote that captures a
collocation [10].
The constraints apply to a sentence in the following manner. Assume, for a moment, that all possible
paths from the start node to the end node of a sentence DAG are explicitly enumerated. For
each path, we apply each constraint to all possible sequences of token parses. For instance,
let $R = (C_1, C_2, \ldots, C_m; V)$ be a constraint and let $L_i, L_{i+1}, \ldots, L_{i+m-1}$ be some sequence of
labels labeling sequential arcs of a path. We say $R$ matches this sequence of parses if the tag and
token components of $L_j$, $i \le j \le i+m-1$, are subsumed by the corresponding constraint $C_{j-i+1}$.
When such a rule matches a sequence of parses, the votes of all parses in that sequence are
incremented by $V$. Once all constraints are applied to all possible sequences in all paths, we
select the path(s) with the maximum total tallied vote for the parses on it. If there are multiple
paths with the same maximum vote, the tokens whose parses are different in these paths are
assumed to be left ambiguous.
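The procedure just described can be sketched directly in Python. This is only a minimal brute-force illustration, not the authors' implementation: the toy tag sets, the second and third rules, and the names `subsumes`, `path_vote` and `best_paths` are assumptions made for the example. Initial lexical votes are taken to be 0, as in the DAG description above.

```python
from itertools import product

# Toy sentence: each token with its candidate tags (illustrative, not from a real lexicon).
sentence = [
    ("I", ["PRP"]),
    ("can", ["MD", "VB", "NN"]),
    ("can", ["MD", "VB", "NN"]),
    ("the", ["DT"]),
    ("can", ["MD", "VB", "NN"]),
    (".", ["."]),
]

# A rule is (constraints, vote); each constraint lists the required feature values.
rules = [
    ([{"TAG": "MD"}, {"TAG": "VB"}], 100),   # rule 1 from the text
    ([{"TAG": "DT"}, {"TAG": "NN"}], 100),   # hypothetical positive rule
    ([{"TAG": "MD"}, {"TAG": "MD"}], -100),  # hypothetical negative rule
]

def subsumes(constraint, word, tag):
    """True if the parse (word, tag) satisfies every feature in the constraint."""
    features = {"TAG": tag, "LEX": word}
    return all(features[name] == value for name, value in constraint.items())

def path_vote(words, tags, rules):
    """Total vote of one path: every matching rule adds V to each parse it covers."""
    votes = [0] * len(words)
    for constraints, vote in rules:
        m = len(constraints)
        for i in range(len(words) - m + 1):
            if all(subsumes(c, words[i + j], tags[i + j])
                   for j, c in enumerate(constraints)):
                for j in range(m):
                    votes[i + j] += vote
    return sum(votes)

def best_paths(sentence, rules):
    """Enumerate every path explicitly and keep the one(s) with the maximum vote."""
    words = [word for word, _ in sentence]
    scored = [(path_vote(words, tags, rules), tags)
              for tags in product(*(tags for _, tags in sentence))]
    top = max(score for score, _ in scored)
    return top, [tags for score, tags in scored if score == top]

top, paths = best_paths(sentence, rules)
```

The enumeration visits every combination of tags, so its cost is the product of the per-token ambiguities; the sketch is only meant to make the matching and tallying steps concrete.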
Given that in English each token has, on average, more than one tag, the number of paths grows
quickly with sentence length, and the procedural description above is, in general, very inefficient.
A quite efficient procedure for implementing this operation, based on Church's windowing idea [2],
has been described by Tür and Oflazer [12]. Also, Oflazer and Tür [8] present an application of
essentially the same approach (augmented with some additional statistical help) to morphological
disambiguation of Turkish.
3 Preliminary Results from Tagging English 
We have experimented with this approach using the Wall Street Journal Corpus from the Penn
Treebank CD. We used two classes of constraints: one class derived from the training corpus (a
set of 5000 sentences, about 109,000 tokens in total, from the WSJ Corpus) and a second set
of hand-crafted constraints mainly incorporating negative constraints (demoting impossible or
unlikely situations) or lexicalized positive constraints. These were constructed by observing the
failures of the statistical constraints on the training corpus and fixing them accordingly. A test
corpus of 500 sentences (about 11,500 tokens in total) was set aside for testing.
For the statistical constraints, we extracted tag k-grams from the tagged training corpus for
k = 2, 3, 4, and 5. For each tag k-gram, we computed a vote which is essentially very similar to
the weights used by Tzoukermann et al. [14], except that we do not use their notion of genotypes
in exactly the same way. Given a tag k-gram $t_1, t_2, \ldots, t_k$, let

$$n = \mathrm{count}\bigl(t_1 \in \mathrm{Tags}(w_i),\; t_2 \in \mathrm{Tags}(w_{i+1}),\; \ldots,\; t_k \in \mathrm{Tags}(w_{i+k-1})\bigr)$$

for all possible $i$'s in the training corpus, be the number of places where the tag sequence
can possibly occur. Here $\mathrm{Tags}(w_i)$ is the set of tags associated with the token $w_i$. Let $f$ be
the number of times the tag sequence $t_1, t_2, \ldots, t_k$ actually occurs in the tagged text, that is,
$f = \mathrm{count}(t_1, t_2, \ldots, t_k)$. We smooth $f/n$ by defining $p = \frac{f + 0.5}{n + 1}$ so that neither $p$ nor $1 - p$ is
zero. The uncertainty of $p$ is given by $\sqrt{p(1 - p)/n}$ [14]. We then compute the vote for this
k-gram as

$$\mathrm{Vote}(t_1, t_2, \ldots, t_k) = \left(p - \sqrt{p(1 - p)/n}\right) \cdot 100.$$
This formulation thus gives high votes to k-grams which are selected most of the time they
are "selectable." And, among the k-grams which are equally good (same $f/n$), those with a
higher $n$ (hence less uncertainty) are given higher votes. The votes for negative and positive
hand-crafted constraints are selected to override any vote the statistical constraints may have.
The initial lexical votes for the parse $t_{i,j}$ of token $w_i$ are obtained from the training corpus in
the usual way, i.e., as $\mathrm{count}(w_i, t_{i,j})/\mathrm{count}(w_i)$, normalized to between 0 and 100.
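As a worked illustration of the two vote computations above, the following sketch evaluates the k-gram vote from the counts $f$ and $n$, and derives lexical votes from per-tag counts. The function names and the example counts are assumptions for illustration, and "normalized to between 0 and 100" is read here simply as the relative frequency scaled by 100.

```python
import math

def kgram_vote(f, n):
    """Vote for a tag k-gram seen f times out of n possible places (n > 0)."""
    p = (f + 0.5) / (n + 1)                    # smoothed f/n, never exactly 0 or 1
    uncertainty = math.sqrt(p * (1 - p) / n)   # uncertainty of p
    return (p - uncertainty) * 100

def lexical_votes(tag_counts):
    """Relative frequency of each tag of one token, scaled to the range 0..100."""
    total = sum(tag_counts.values())
    return {tag: 100 * count / total for tag, count in tag_counts.items()}

# Two k-grams with the same f/n = 0.9: the one with more evidence gets the higher vote.
low_evidence = kgram_vote(90, 100)
high_evidence = kgram_vote(900, 1000)

# Hypothetical counts for one token with three tags.
said_votes = lexical_votes({"VBD": 98, "VBN": 1, "JJ": 1})
```

The comparison between `low_evidence` and `high_evidence` shows the effect described in the text: subtracting the uncertainty term penalizes k-grams whose ratio rests on fewer observations.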
After extracting the k-grams as described above for k = 2, 3, 4 and 5, we ordered each
group by decreasing votes and did an initial set of experiments with these, to select a small
group of constraints performing satisfactorily. Table 1 presents, for reference, the number of
distinct k-grams extracted and how they performed when they alone were used as constraints.

  k    No. of k-grams    Train. Set Accuracy (%)    Test Set Accuracy (%)
  2          867                 97.78                     95.70
  3         8315                 97.99                     96.87
  4        27871                 98.88                     96.56
  5        54780                 99.61                     95.84

Table 1. Performance with 2-, 3-, 4- and 5-gram voting constraints

After this experimentation, we selected the first 200 (with highest votes) of the bi-gram and
the first 200 of the 3-gram constraints as the set of statistical constraints; inclusion of 4- and
5-grams with highest votes did not have any meaningful impact on the results. It should be
noted that the constraints obtained this way are purely constraints on tag sequences and do
not use any lexical or genotype information. The initial lexical votes were obtained from the
training corpus, as also described above.¹ We started tagging the training set with this set of
constraints and, by observing the errors made and introducing hand-crafted rules, arrived at a total
of about 970 constraints. Most of the hand-crafted constraints were negative constraints (with
large negative votes) to rule out certain tag sequences. Table 2 presents a set of tagging results
from this experimentation. Although the results are quite preliminary, we feel that the results
in the last row of Table 2 are quite satisfactory and warrant further extensive investigation.
4 Implementing Voting Constraints with Finite State Transducers 
The approach described above can also be implemented by finite state transducers. For this,
we view the parses of the tokens making up a sentence as an acyclic finite state recognizer (or an
identity transducer [4]). The states mark word boundaries, the transitions carry labels
of the sort $L = (w_i, t_{i,j}, v_{i,j})$, and the rightmost node denotes the final state.
This approach is very different from that of Roche and Schabes [9], who use transducers to
implement Brill's transformation-based tagging approach [1]. It shares certain concepts with
Tzoukermann and Radev's use of weighted finite state transducers for tagging [13], in that both
approaches combine statistical and hand-crafted linguistic information, but employ finite state
devices in very different ways.
The basic idea behind using finite state transducers is that the voting constraint rules can be
represented as transducers which increment the votes of the matching input sequence segments
by an appropriate amount, but ignore, and pass through unchanged, segments they are not
sensitive to. When an identity finite state transducer corresponding to an input sentence is
composed with a constraint transducer, the output is a slightly modified version of the sentence
transducer, with possibly additional transitions and states, where the votes of some of the
transition labels have been appropriately incremented. When the sentence transducer is
composed with all the constraint transducers in sequence, all possible votes are cast and the
final sentence transducer reflects all the votes. The parses on the path with the highest total
vote, from the start to any of the final states, can then be selected. The key point here is that,
due to the nature of the composition operator, the constraint transducers can, if necessary, be
composed off-line first, giving a single constraint transducer, which can then be composed with
every sentence transducer once.

  Constraint Set    Train. Set Accuracy (%)    Test Set Accuracy (%)
  1                        95.37                     94.13
  1+2                      96.37                     95.38
  1+3                      96.18                     94.99
  1+2+3                    96.65                     95.80
  1+4                      97.13                     96.48
  1+2+4                    97.74                     97.08
  1+3+4                    97.41                     96.77
  1+2+3+4                  97.82                     97.29

  (1) Lexical Votes Only  (2) 200 2-grams  (3) 200 3-grams  (4) 570 Manual Constraints

Table 2. Results from tagging with both statistically and manually derived voting constraint rules

1 Thus the ambiguities of the tokens were limited to the ones found in the training corpus.
Using a finite state framework provides, by its nature, some additional descriptive advantages
in describing rules. For instance, one can use rules involving the Kleene star, so that a single
rule such as ([TAG=MD], [TAG=RB]*, [TAG=VB]; 100) can deal with any number of intervening
adverbials.²
4.1 The Transducer Architecture 
We use the Xerox Finite State Tools to implement our approach. The finite state transducer
system consists of the following components, depicted in Figure 3.
The lexicon transducer. The lexicon transducer implements [ L [ " " ]+ ]*,³ where the
transducer L maps a token to all its possible tags/parses, also inserting the relevant lexical
votes for each parse. In our current implementation for English, the transducer L is the union
of a set of transducers of the sort:
2 Note that in this case the vote will be added to all matching parses; thus, depending on how many
sequential parses match the starred constraint, the total vote contribution of the rule will differ. This
may actually be desirable to promote larger votes for longer matches.
3 We use the Xerox regular expression language (see http://www.xrce.xerox.com/research/mltt/fst/home.html)
to describe our regular expressions.
Figure 3. The Architecture of Voting Constraint Transducers
[ [ s a i d ] .x. [ "(" "VBD/" s a i d "<" "+" 9 8 ">" ")" ] ]
[ [ s a i d ] .x. [ "(" "VBN/" s a i d "<" "+" 1 ">" ")" ] ]
[ [ s a i d ] .x. [ "(" "JJ/" s a i d "<" "+" 1 ">" ")" ] ]
So a "lookdown" of the token said yields, on the lower side of the transducer, the output
(VBD/said<+98>) (VBN/said<+1>) (JJ/said<+1>). Thus when a sentence transducer (representing
just the lexical items) is composed with the lexicon transducer, as depicted at the top
of Figure 3, one gets a transducer with the lexical ambiguities and the appropriate votes inserted,
which can then be composed with the constraint transducers.
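The effect of the lexicon transducer can be mimicked with an ordinary lookup table. The sketch below is not a finite state implementation: a Python dictionary stands in for the transducer L, the names `LEXICON`, `lookdown` and `annotate` are assumptions, and the votes are the ones quoted in this paper's examples.

```python
from math import prod

# Token -> list of (tag, lexical vote); a stand-in for the lexicon transducer L.
LEXICON = {
    "said": [("VBD", 98), ("VBN", 1), ("JJ", 1)],
    "I": [("PRP", 100)],
    "can": [("MD", 97), ("VB", 1), ("NN", 1), ("VBP", 1)],
    "the": [("DT", 100)],
    ".": [(".", 100)],
}

def lookdown(token):
    """Return all parses of a token in the (TAG/word<+vote>) notation."""
    return " ".join(f"({tag}/{token}<{vote:+d}>)" for tag, vote in LEXICON[token])

def annotate(tokens):
    """One list of alternative parses per token, i.e. the arcs between word boundaries."""
    return [[f"({tag}/{tok}<{vote:+d}>)" for tag, vote in LEXICON[tok]]
            for tok in tokens]

lattice = annotate("I can can the can .".split())
readings = prod(len(alternatives) for alternatives in lattice)
```

Each inner list of `lattice` corresponds to the set of arcs between two adjacent word boundaries, and `readings` is the number of distinct paths through the annotated sentence.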
Voting Constraint Transducers. Each voting constraint rule is represented by a transducer
that checks if the constraints imposed by that rule are satisfied on the input and, if so, appropriately
increments the votes at the relevant input positions. In order to describe the transducer
architecture more clearly, let us concentrate on a specific example rule:
([TAG=MD], [TAG=VB]; 100)
Let us assume that the input to the transducer is represented as a sequence of triplets of
the sort (tag word vote).⁴ The transducer corresponding to the regular expression below will
increment the vote fields of a sequence of any two triplets by 100, provided the first one has
tag MD and the second one has tag VB.
[ "(" TAGS WORD VOTES ")" ]*                                            (1)
.o.
[ "(" "MD" WORD VOTES ")" "(" "VB" WORD VOTES ")" @-> "{" ... "}" ]     (2)
.o.
[                                                                       (3)
  [ "(" TAGS WORD VOTES ")" ] |
  [ "{" [ "(" TAGS WORD [ ADD100 ] ")"
          "(" TAGS WORD [ ADD100 ] ")" ]
    "}" ]
]*
.o.
[ "{" -> [], "}" -> [] ];                                               (4)
This transducer is the composition of four transducers (separated by the composition operator
.o.). The top transducer (1) constrains the input to valid triplets.⁵ The second transducer (2)
brackets with { and } any sequence of such triplets matching the given rule constraints, using
the longest match bracket operator [5].⁶ Thus any sequence of two triplets in the input
where the first has tag MD and the second has tag VB is bracketed by this transducer. The
third transducer (3) either passes through the unbracketed sections of the input (as indicated
by the first part of the disjunct), or increments by 100 the vote fields of the triplets within the
brackets { and }. ADD100 is a transducer that "adds" 100 to the vote field of the matching
triplet. It is the 99-fold composition of an ADD1 transducer with itself. The ADD1 transducer
adds one to a (signed) number at its upper side input.⁷ When compiled, this constraint rule
becomes a transducer with 75 states and 1,197 arcs.

4 Please note that this is a slightly different order than described earlier. In practice, this order was
found to generate smaller transducers during compositions.
5 Here WORD denotes a regular expression which describes an arbitrary sequence of English characters.
TAGS denotes a regular expression which is the union of all (possibly multi-character) tag symbols.
VOTES denotes a regular expression of the sort "<" [ "+" | "-" ] DIGITS+ ">", with DIGITS being the
union of all decimal digit symbols.
6 Note that this simple version does not deal with rules whose constraints may overlap (e.g.
([TAG=NN], [TAG=NN]; 100)).
The transducers for all constraints are obtained in a similar way and composed off-line,
giving one big transducer which can do the appropriate vote updates in the appropriate places. In
practice, the final voting constraint transducer may be big, so instead one can leave it as a
cascade of a small number of transducers.
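The behaviour of a single rule transducer, and of a cascade of them, can be imitated on one path written as a string of triplets. The sketch below uses Python's `re` module in place of transducer composition, so it is an analogy rather than the Xerox implementation: it works on a single path instead of a whole sentence transducer, it uses the (TAG/word<+vote>) notation, and the names `bump`, `apply_rule` and `apply_cascade` as well as the second rule are assumptions. Like the simple version described above, it does not handle overlapping matches, since `re.sub` replaces non-overlapping occurrences from left to right.

```python
import re

# One triplet: "(" TAG "/" WORD "<" signed-vote ">" ")"
TRIPLET = re.compile(r"\(([^/()]+)/([^<>()]+)<([+-]\d+)>\)")

def bump(segment, amount):
    """Add `amount` to the vote field of every triplet in `segment` (the ADD100 role)."""
    def rewrite(match):
        tag, word, vote = match.group(1), match.group(2), int(match.group(3))
        return f"({tag}/{word}<{vote + amount:+d}>)"
    return TRIPLET.sub(rewrite, segment)

def apply_rule(path, tags, amount):
    """Find each run of adjacent triplets carrying the given tags and bump their votes."""
    pattern = "".join(rf"\({re.escape(tag)}/[^<>()]+<[+-]\d+>\)" for tag in tags)
    return re.sub(pattern, lambda match: bump(match.group(0), amount), path)

def apply_cascade(path, rules):
    """Apply the rules one after the other, as a cascade of constraint transducers would."""
    for tags, amount in rules:
        path = apply_rule(path, tags, amount)
    return path

path = "(PRP/I<+100>)(MD/can<+97>)(VB/can<+1>)(DT/the<+100>)(NN/can<+1>)"
one_rule = apply_rule(path, ["MD", "VB"], 100)
cascaded = apply_cascade(path, [(["MD", "VB"], 100), (["DT", "NN"], 100)])
```

Segments that no rule is sensitive to, such as the first triplet, pass through unchanged, which is the property the transducer formulation relies on.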
4.2 Operational Aspects 
A sentence such as "I can can the can." is represented as the transducer corresponding to
the regular expression
[ <BS> I can can the can . <ES> ]⁸
When this transducer is composed with the lexicon transducer, the resulting transducer corresponds
to the following regular expression:
[(<BS>/<BS><+100>)]
[(PRP/I<+100>)]
[(MD/can<+97>) | (VB/can<+1>) | (NN/can<+1>) | (VBP/can<+1>)]
[(MD/can<+97>) | (VB/can<+1>) | (NN/can<+1>) | (VBP/can<+1>)]
[(DT/the<+100>)]
[(MD/can<+97>) | (VB/can<+1>) | (NN/can<+1>) | (VBP/can<+1>)]
[(./.<+100>)]
[(<ES>/<ES><+100>)]
which allows for 64 possible "readings." After this transducer is composed with the voting
constraint transducer(s), one gets a transducer which still has 64 readings, but now the labels
reflect the votes from any matching constraints. A simple DAG longest path algorithm (e.g. [3]) on
the DAG of the resulting transducer gives the highest voted path as
(<BS>/<BS><+100>)
(PRP/I<+100>)(MD/can<+194>)(VB/can<+98>)(DT/the<+197>)(NN/can<+97>)(./.<+100>)
(<ES>/<ES><+100>)
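The final selection step is a longest path computation on a DAG. The following is a minimal sketch under stated assumptions: states are numbered in topological order (every arc goes from a lower to a higher state), there is one state per word boundary, and the function name `highest_voted_path` is hypothetical. The votes on the selected arcs are the ones shown above; the votes on the competing arcs are made up for illustration, and the `<BS>` and `<ES>` markers are omitted.

```python
def highest_voted_path(num_states, arcs):
    """arcs: (src, dst, label, vote) tuples with src < dst; state 0 is the start state
    and state num_states - 1 the final state."""
    best = [float("-inf")] * num_states   # best total vote reaching each state
    back = [None] * num_states            # (previous state, arc label) on that best path
    best[0] = 0
    # Visiting arcs by increasing source state guarantees best[src] is already final.
    for src, dst, label, vote in sorted(arcs, key=lambda arc: arc[0]):
        if best[src] + vote > best[dst]:
            best[dst] = best[src] + vote
            back[dst] = (src, label)
    labels = []
    state = num_states - 1
    while back[state] is not None:
        state, label = back[state]
        labels.append(label)
    labels.reverse()
    return best[num_states - 1], labels

arcs = [
    (0, 1, "PRP/I", 100),
    (1, 2, "MD/can", 194), (1, 2, "VB/can", 1), (1, 2, "NN/can", 1), (1, 2, "VBP/can", 1),
    (2, 3, "MD/can", 97), (2, 3, "VB/can", 98), (2, 3, "NN/can", 1), (2, 3, "VBP/can", 1),
    (3, 4, "DT/the", 197),
    (4, 5, "MD/can", -3), (4, 5, "VB/can", 1), (4, 5, "NN/can", 97), (4, 5, "VBP/can", 1),
    (5, 6, "./.", 100),
]
total, path = highest_voted_path(7, arcs)
```

The computation is linear in the number of arcs once they are ordered by source state, so it avoids enumerating the 64 readings explicitly. In an actual composed transducer the votes on an arc may depend on the neighbouring parses, which is why the composition can introduce additional states and transitions; the one-state-per-boundary graph here is a simplification.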
5 Implementation 
We have developed two PERL-based rule compilers that compile lexicon files and constraints
into scripts, which are then compiled into transducers by the Xerox finite state tools. In this
section we provide some information about the transducers obtained from the WSJ Corpus
experiments.
7 This is a slightly modified version of the transducer described at
http://www.rxrc.xerox.com/research/mltt/fst/fsexamples.html, dealing with signed numbers. The ADD1
transducer can be composed with itself off-line any number of times to get a transducer adding any number.
8 For better readability, the obligatory spaces between word symbols will not be shown from now on.
The lexicon transducer, compiled from about 16,000 unique lexical tokens from the training
set, had 37,208 states and 52,912 arcs. The three sets of constraints for 2-grams, 3-grams and
hand-crafted constraints (sets 2, 3 and 4 in Table 2, respectively) were compiled separately
into three constraint transducers with 19,954 states and 296,545 arcs; 56,910 states and 685,365
arcs; and 334,215 states and 2,651,550 arcs, respectively. It is certainly possible to combine these
transducers by composition at compile time. If size becomes a problem, one can have smaller
transducers, which are sequentially composed with the sentence transducer at tag time. For
instance, when the hand-crafted constraints are split into three groups of about 200 each, the
three resulting transducers have 63,865 states and 467,966 arcs; 44,831 states and 306,257 arcs;
and 33,862 states and 233,401 arcs, respectively, the collective size of which is less than the size of
the fully composed one. We have not really optimized the hand-crafted constraints for finite state
compilation, but it is certainly possible to reduce the number of such constraints by utilizing
operators such as the Kleene star, complementation, etc.
Another observation during constraint compilation is that, as constraints are being compiled,
the sizes of the intermediate compositions do not grow explosively. Thus the problem alluded to by
Tapanainen in a similar approach [11] does not seem to occur here, since an intersection is not
being computed. The results that we have provided earlier are from a C implementation. The
tagging speed with the finite state transducers in the current environment is not very high,
since for each sentence the transducers have to be loaded from the disk. But with a suitable
application interface to the lower level functions of the Xerox finite state environment, the
tagging speed can be improved significantly.
The system deals with unknown words in a rather trivial way, by attaching any meaningful 
open class word tags to unknown words and later picking the one(s) selected by the voting 
process. 
6 Conclusions 
We have presented an approach to constraint-based tagging and an implementation of the 
approach based on finite state transducers. The approach can combine both statistically and 
manually derived constraints, and relieves the developer from worrying about conflicting rule 
application sequencing. Preliminary results from tagging the Wall Street Journal Corpus are 
quite promising. We would like to further evaluate our approach using 10-fold cross validation 
on the WSJ corpus and later on the Brown Corpus. We also would like to utilize the full 
expressive power of the regular expression operations to compact our constraint rule base. 
7 Acknowledgments 
Most of this research was conducted while the first author was visiting Xerox Research Centre
Europe, Grenoble, France, July 1997 to Sept 1997. He gratefully thanks Lauri Karttunen and
XRCE for providing this opportunity. Part of the work of the second author was conducted
while he was visiting Johns Hopkins University, Center for Language and Speech Processing,
under a NATO A2 Visiting Graduate Student Scheme, administered by TUBITAK, the Turkish
National Science Foundation, during Fall 1997. Gracious support by Johns Hopkins University
and TUBITAK is acknowledged. This research was also supported in part by a NATO Science
for Stability Project Grant, TU-LANGUAGE. We also thank Zelal Güngördü for providing
comments on a draft of this paper.

References 
1. Eric Brill. Transformation-based error-driven learning and natural language processing: A case
study in part-of-speech tagging. Computational Linguistics, 21(4):543-566, December 1995.
2. Kenneth W. Church. A stochastic parts program and a noun phrase parser for unrestricted text.
In Proceedings of the Second Conference on Applied Natural Language Processing, Austin, Texas,
1988.
3. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The
MIT Press and McGraw-Hill, 1991.
4. Ronald M. Kaplan and Martin Kay. Regular models of phonological rule systems. Computational
Linguistics, 20(3):331-378, September 1994.
5. Lauri Karttunen. Directed replacement. In Proceedings of the 34th Annual Meeting of the Association
for Computational Linguistics, pages 108-115, 1996.
6. Kimmo Koskenniemi. Finite-state parsing and disambiguation. In Proceedings of COLING-90,
volume 2, pages 229-232, 1990.
7. Kemal Oflazer and İlker Kuruöz. Tagging and morphological disambiguation of Turkish text.
In Proceedings of the 4th Applied Natural Language Processing Conference, pages 144-149. ACL,
October 1994.
8. Kemal Oflazer and Gökhan Tür. Morphological disambiguation by voting constraints. In Proceedings
of ACL'97/EACL'97, The 35th Annual Meeting of the Association for Computational
Linguistics, 1997.
9. Emmanuel Roche and Yves Schabes. Deterministic part-of-speech tagging with finite-state transducers.
Computational Linguistics, 21(2):227-253, June 1995.
10. Beatrice Santorini. Part-of-speech tagging guidelines. Available from http://www.ldc.upenn.edu,
1995.
11. Pasi Tapanainen. Applying a finite-state intersection grammar. In Emmanuel Roche and Yves
Schabes, editors, Finite State Language Processing, chapter 10. The MIT Press, 1997.
12. Gökhan Tür and Kemal Oflazer. Tagging English by path voting constraints. To appear in
Proceedings of COLING-ACL'98, Montreal, Canada, August 1998.
13. Evelyne Tzoukermann and Dragomir R. Radev. Use of weighted finite state transducers in part of
speech tagging. Available from http://xxx.lanl.gov/ps/cmp-lg/9710001, 1997.
14. Evelyne Tzoukermann, Dragomir R. Radev, and William A. Gale. Combining linguistic knowledge
and statistical learning in French part-of-speech tagging. In Proceedings of the ACL SIGDAT
Workshop From Texts to Tags: Issues in Multilingual Language Analysis, pages 51-57, 1995.
15. Atro Voutilainen. Morphological disambiguation. In Fred Karlsson, Atro Voutilainen, Juha
Heikkilä, and Arto Anttila, editors, Constraint Grammar - A Language-Independent System for
Parsing Unrestricted Text, chapter 5. Mouton de Gruyter, 1995.
