From Machine Translation to Computer Assisted Translation using
Finite-State Models
Jorge Civera, Elsa Cubel, Antonio L. Lagarda, David Picó,
Jorge González, Enrique Vidal, Francisco Casacuberta
Instituto Tecnológico de Informática
Dpto. de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia
E-46071 Valencia, Spain
jorcisai@iti.upv.es
Juan M. Vilar, Sergio Barrachina
Dpto. de Lenguajes y Sistemas Informáticos, Universidad Jaime I,
E-12071 Castellón de la Plana, Spain
jvilar@lsi.uji.es
Abstract
State-of-the-art machine translation techniques are
still far from producing high-quality translations.
This drawback leads us to introduce an alternative
approach to the translation problem that brings
human expertise into the machine translation scenario.
In this framework, known as Computer Assisted
Translation (CAT), human translators interact with a
translation system that acts as an assistance tool and
dynamically offers a list of translations that best
complete the part of the sentence already translated.
In this paper, finite-state transducers are presented
as a candidate technology for the CAT paradigm. The
appropriateness of this technique is evaluated on a
printer manual corpus, and results from preliminary
experiments indicate that human translators could
reduce the amount of work needed for the same task
to less than 25%.
1 Introduction
State-of-the-art machine translation techniques are
still far from producing high quality translations.
This drawback leads us to introduce an alternative
approach to the translation problem that brings
human expertise into the machine translation scenario.
Langlais et al. (2000) proposed this idea, which
can be illustrated as follows. Initially, the human
translator is provided with a possible translation
for the sentence to be translated. Unfortunately, in
most cases this translation is not perfect, so
the translator amends it and asks for a translation
of the part of the sentence still to be translated
(completion). This interaction is repeated as
many times as needed until the final translation is
achieved.
The scenario described in the previous paragraph
can be seen as an iterative refinement of
the translations offered by the translation system,
which, even without reaching the desired quality, help
the translator to increase his/her productivity. At
present, this lack of translation excellence is common
to all machine translation systems.
Therefore, the human-machine synergy represented
by the CAT paradigm seems more promising
than fully automatic translation in the near future.
The CAT paradigm has two important aspects:
the models need to provide adequate completions,
and they have to do so efficiently in order to
meet usability constraints. To fulfill these two
requirements, Stochastic Finite-State Transducers
(SFST) have been selected, since they have proved
in the past to be able to provide adequate translations
(Vidal, 1997; Knight and Al-Onaizan, 1998;
Amengual et al., 2000; Casacuberta et al., 2001;
Bangalore and Ricardi, 2001). In addition, efficient
parsing algorithms can be easily adapted in order to
provide completions.
The rest of the paper is structured as follows.
The following section introduces the general setting
for machine translation and finite state models. In
section 3, the search procedure for an interactive
translation is presented. Experimental results are
presented in section 4. Finally, some conclusions
and future work are explained in section 5.
2 Machine translation with finite-state
transducers
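Given a source sentence $\mathbf{s}$, the goal of MT is to find
a target sentence $\hat{\mathbf{t}}$ that maximizes:

$$\hat{\mathbf{t}} = \operatorname*{argmax}_{\mathbf{t}} \Pr(\mathbf{t} \mid \mathbf{s}) = \operatorname*{argmax}_{\mathbf{t}} \Pr(\mathbf{t}, \mathbf{s}) \qquad (1)$$

The joint distribution $\Pr(\mathbf{t}, \mathbf{s})$ can be modeled
by a Stochastic Finite-State Transducer $\mathcal{T}$ (Picó and
Casacuberta, 2001):

$$\hat{\mathbf{t}} = \operatorname*{argmax}_{\mathbf{t}} \Pr(\mathbf{t}, \mathbf{s}) \approx \operatorname*{argmax}_{\mathbf{t}} \Pr\nolimits_{\mathcal{T}}(\mathbf{t}, \mathbf{s}) \qquad (2)$$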
A Stochastic Finite-State Transducer (SFST)
is a finite-state network whose transitions are
labeled by three items:
1. a source symbol (a word from the source
language vocabulary);
2. a target string (a sequence of words from the
target language vocabulary) and
3. a transition probability.
SFSTs have been successfully applied to
many translation tasks (Vidal, 1997; Amengual et
al., 2000; Casacuberta et al., 2001). Furthermore,
there exist efficient search algorithms such as Viterbi
(Viterbi, 1967) for the best path and the Recursive
Enumeration Algorithm (REA) (Jiménez and
Marzal, 1999) for the n-best paths.
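As an illustration of the definition above, the following minimal sketch stores an SFST as a transition table and finds the most probable path for a source sentence with Viterbi decoding. The toy transducer, its probabilities and the names `TOY_SFST`, `TOY_FINAL` and `viterbi_translate` are assumptions made for this example only; they are not taken from the system described in this paper.

```python
import math

# Hypothetical toy SFST: state -> list of
# (source word, target string, transition probability, next state).
TOY_SFST = {
    0: [("cargue", ("load",), 1.0, 1)],
    1: [("el", ("the",), 0.6, 2), ("el", (), 0.4, 2)],
    2: [("papel", ("paper",), 0.7, 3), ("papel", ("stock",), 0.3, 3)],
}
TOY_FINAL = {3: 1.0}  # probability of stopping in each final state


def viterbi_translate(source, transitions=TOY_SFST, final_prob=TOY_FINAL, start=0):
    """Return (probability, target words) of the best path that reads `source`,
    or None if no path of the transducer accepts the sentence."""
    # best[state] = (log-probability, target words emitted so far)
    best = {start: (0.0, ())}
    for word in source:
        reached = {}
        for state, (logp, out) in best.items():
            for src, tgt, prob, dest in transitions.get(state, []):
                if src != word:
                    continue
                cand = (logp + math.log(prob), out + tgt)
                # Keep only the most probable way of reaching each state.
                if dest not in reached or cand[0] > reached[dest][0]:
                    reached[dest] = cand
        best = reached
    finals = [(logp + math.log(final_prob[state]), out)
              for state, (logp, out) in best.items() if state in final_prob]
    if not finals:
        return None
    logp, out = max(finals)
    return math.exp(logp), list(out)


result = viterbi_translate(["cargue", "el", "papel"])
```

Working with log-probabilities avoids numerical underflow on long sentences; an n-best search would keep several candidates per state instead of one.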
One possible way of inferring SFSTs is the
Grammatical Inference and Alignments for Trans-
ducer Inference (GIATI) technique (the previous
name of this technique was MGTI - Morphic-
Generator Transducer Inference) (Casacuberta et
al., 2004). Given a finite sample of string pairs, it
works in three steps:
1. Building training strings. Each training pair
is transformed into a single string from an extended
alphabet to obtain a new sample of
strings. The "extended alphabet" contains
words or substrings from source and target sentences
coming from training pairs.
2. Inferring a (stochastic) regular grammar.
Typically, a smoothed n-gram model is inferred from
the sample of strings obtained in the previous
step.
3. Transforming the inferred regular grammar
into a transducer. The symbols associated
to the grammar rules are transformed into
source/target symbols by applying an ade-
quate transformation, thereby transforming the
grammar inferred in the previous step into a
transducer.
The transformation of a parallel corpus into
a corpus of single sentences is performed with the
help of statistical alignments: each word is joined
with its translation in the output sentence, creating
an “extended word”. This joining is done taking
care not to invert the order of the output words. The
third step is trivial with this arrangement. In our
experiments, the alignments are obtained using the
GIZA software (Och and Ney, 2000; Al-Onaizan et
al., 1999), which implements IBM statistical mod-
els (Brown et al., 1993).
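The construction of extended words can be sketched as follows. This is only one plausible reading of the joining rule, stated here as an assumption: each target word is attached to its aligned source word, unless that would invert the target order, in which case it is attached to the latest source position already used. The function name `build_extended_string` and the example sentence pair are hypothetical.

```python
def build_extended_string(source, target, alignment):
    """Turn an aligned sentence pair into a list of extended words.

    source, target: lists of words.
    alignment[j]: index of the source word aligned with target word j,
                  or None if target word j is unaligned.
    Returns a list of (source word, tuple of target words) pairs.
    """
    attached = [[] for _ in source]
    last = 0  # rightmost source position that already carries target words
    for j, tgt in enumerate(target):
        i = alignment[j]
        if i is None:
            i = last  # assumption: unaligned words stay at the current position
        # Never move left of `last`, so the target word order is preserved.
        last = max(last, i)
        attached[last].append(tgt)
    return [(src, tuple(tgts)) for src, tgts in zip(source, attached)]


extended = build_extended_string(
    ["una", "habitación", "doble"],
    ["a", "double", "room"],
    [0, 2, 1],  # "a"-"una", "double"-"doble", "room"-"habitación"
)
```

In this example "room" is aligned with "habitación", but attaching it there would place it before "double"; it is therefore delayed and joined to "doble" together with "double", while "habitación" carries an empty target string.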
3 Interactive search
The concept of interactive search is closely related
to the CAT paradigm. This paradigm introduces a
new factor $\mathbf{t}_p$ into the general machine translation
equation (Equation 1). $\mathbf{t}_p$ represents a prefix in the
target language obtained as a result of the interaction
between the human translator and the machine
translation system.
As a side effect of this reformulation, the
optimization defined in Equation 3 is performed over
the set of target suffixes rather than the set of
complete target sentences. Hence, the goal of CAT in
the finite-state transducer framework is to find a
prediction of the best suffix $\hat{\mathbf{t}}_s$, given a source sentence
$\mathbf{s}$, a prefix of the target sentence $\mathbf{t}_p$ and an SFST $\mathcal{T}$:

$$\hat{\mathbf{t}}_s = \operatorname*{argmax}_{\mathbf{t}_s} \Pr(\mathbf{t}_s \mid \mathbf{s}, \mathbf{t}_p) = \operatorname*{argmax}_{\mathbf{t}_s} \Pr(\mathbf{t}_p \mathbf{t}_s, \mathbf{s}) \approx \operatorname*{argmax}_{\mathbf{t}_s} \Pr\nolimits_{\mathcal{T}}(\mathbf{t}_p \mathbf{t}_s, \mathbf{s}) \qquad (3)$$
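The step from the conditional to the joint probability in Equation 3 can be made explicit. Writing $\mathbf{t}_p$ for the target prefix and $\mathbf{t}_s$ for the suffix, and assuming that the probability of the prefix is obtained by summing over all of its possible continuations $\mathbf{t}'_s$, the denominator below does not depend on $\mathbf{t}_s$ and can be dropped from the maximization:

```latex
\hat{\mathbf{t}}_s
  = \operatorname*{argmax}_{\mathbf{t}_s} \Pr(\mathbf{t}_s \mid \mathbf{s}, \mathbf{t}_p)
  = \operatorname*{argmax}_{\mathbf{t}_s}
      \frac{\Pr(\mathbf{t}_p \mathbf{t}_s, \mathbf{s})}
           {\sum_{\mathbf{t}'_s} \Pr(\mathbf{t}_p \mathbf{t}'_s, \mathbf{s})}
  = \operatorname*{argmax}_{\mathbf{t}_s} \Pr(\mathbf{t}_p \mathbf{t}_s, \mathbf{s})
```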
A transducer can be understood as a weighted
graph in which every path is a possible source-target
sentence pair represented in a compact manner.
Given a source sentence $\mathbf{s}$ to be translated, this
sentence is initially employed to define a set of paths in
the transducer whose sequence of source symbols
is compatible with the source sentence. Equation 3
simply defines the most probable path (target suffix
$\hat{\mathbf{t}}_s$) among those that are compatible, having $\mathbf{t}_p$ as a
target prefix.
[Figure 1 not reproduced: a word graph with states 0 to 11 (state 11 final) whose edges carry target words and probabilities, for example "load" (0.28), "the" (0.246914) and "paper" (0.4).]
Figure 1: Resultant word graph given the source sentence "cargue el papel"
The search for this path (the one for which the
product of the probabilities associated with its edges
is maximum) is performed by Viterbi decoding over
the set of paths that are compatible with the source
sentence. The concatenation of the target symbols
of this best path yields the target sentence
(translation).
The solution to the search problem has been
devised in two phases. The first one copes with the
extraction of a word graph $W$ from an SFST $\mathcal{T}$ given
a source sentence $\mathbf{s}$. A word graph represents the
set of paths whose sequence of source symbols is
compatible with the source sentence $\mathbf{s}$.
The second phase involves the search for the
best translation over the word graph $W$. To be
more precise, in the present work the concept of
best translation has been extended to a set of best
translations (n-best translations). This search can be
carried out efficiently taking into account not only
the a posteriori probability of a given translation $\hat{\mathbf{t}}$,
but also the minimum edit cost with respect to the
target prefix. The way in which this latter criterion
is integrated in the search process is explained in
section 3.2.
3.1 Word-graph derivation
A word graph represents the set of all possible
translations for a given source sentence $\mathbf{s}$ that are
embedded in the SFST $\mathcal{T}$. The derivation of the word
graph is performed by intersecting the SFST $\mathcal{T}$
with the source sentence $\mathbf{s}$, which defines a subgraph of
$\mathcal{T}$ whose paths are compatible with the source
sentence.
Interactive search can be simplified significantly
by using this representation of the set of
target sentences, since the inclusion of edit cost
operations in the search procedure introduces
some peculiarities that can be handled efficiently
in the word graph. An example of a word
graph is shown in Figure 1.
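A minimal sketch of this intersection is given below, assuming a transducer stored as a transition table and ignoring smoothing and backoff states. Word graph nodes are (state, source position) pairs, and edges that cannot reach a final node are pruned. The toy transducer and the names `TOY_SFST` and `derive_word_graph` are illustrative assumptions.

```python
# Hypothetical toy SFST: state -> list of
# (source word, target string, transition probability, next state).
TOY_SFST = {
    0: [("cargue", ("load",), 0.7, 1), ("cargue", (), 0.3, 4)],
    1: [("el", ("the",), 0.6, 2), ("el", (), 0.4, 2)],
    2: [("papel", ("paper",), 0.7, 3), ("papel", ("stock",), 0.3, 3)],
    4: [("la", ("the",), 1.0, 2)],
}


def derive_word_graph(source, transitions, final_states, start=0):
    """Intersect a transducer with `source`.

    Returns (edges, final_nodes); each edge is
    (from_node, target string, probability, to_node) and each node is a
    (transducer state, number of source words read) pair.
    """
    edges = []
    frontier = {(start, 0)}
    for pos, word in enumerate(source):
        reached = set()
        for state, _ in sorted(frontier):
            for src, tgt, prob, dest in transitions.get(state, []):
                if src == word:
                    edges.append(((state, pos), tgt, prob, (dest, pos + 1)))
                    reached.add((dest, pos + 1))
        frontier = reached
    finals = {node for node in frontier if node[0] in final_states}
    # Backward pass: keep only edges from which a final node is reachable.
    # Edges were appended by increasing position, so one reversed sweep suffices.
    alive = set(finals)
    kept = []
    for edge in reversed(edges):
        if edge[3] in alive:
            alive.add(edge[0])
            kept.append(edge)
    kept.reverse()
    return kept, finals


word_graph, final_nodes = derive_word_graph(["cargue", "el", "papel"], TOY_SFST, {3})
```

Here the transition from state 0 to state 4 is read for "cargue" but then discarded, because state 4 has no transition for "el" and so leads to no complete path.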
3.2 Search for n-best translations given a
prefix of the target sentence
This type of search lies at
the core of CAT. In this paradigm, given a source
sentence $\mathbf{s}$, the human translator is provided with a
list of n translations, also called n-best translations.
Then, the human translator accepts a
prefix of one of these n-best translations as correct,
appending some corrections to the selected prefix.
This new prefix of the target sentence $\mathbf{t}_p$, together
with the source sentence $\mathbf{s}$, generates a new set
of best translations that is again modified by
the human translator. This process is repeated as
many times as necessary to achieve the desired
final translation.
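The interaction loop just described can be simulated with an idealized user who always accepts the longest correct prefix offered and then types the next character of the reference. The sketch below counts the keystrokes of such a session. The `suggest` callback stands in for the n-best search and is purely hypothetical, as are `TOY_CANDIDATES` and `toy_suggest`; a real system would derive its suggestions from the word graph.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def simulate_session(source, reference, suggest):
    """Count the keystrokes a simulated user needs to obtain `reference`."""
    prefix, keystrokes = "", 0
    while True:
        hypotheses = suggest(source, prefix)
        if reference in hypotheses or prefix == reference:
            return keystrokes + 1  # final acceptance keystroke
        # Accept the longest correct prefix among the suggestions ...
        best = max((common_prefix_len(h, reference) for h in hypotheses), default=0)
        best = max(best, len(prefix))
        # ... and type the next character of the reference.
        prefix = reference[:best + 1]
        keystrokes += 1


# Hypothetical stand-in for the n-best search: a fixed candidate list.
TOY_CANDIDATES = [
    "Instalación de los controladores de impresión.",
    "Instalación de controladores de impresora.",
]


def toy_suggest(source, prefix, n=1):
    matches = [c for c in TOY_CANDIDATES if c.startswith(prefix)]
    return matches[:n] or [prefix]


reference = "Instalación de controladores de impresora."
keystrokes = simulate_session("Installing the Printer Drivers.", reference, toy_suggest)
ksr = keystrokes / len(reference)
```

The prefix grows by at least one character per iteration, so the loop always terminates. Dividing the keystroke count by the number of reference characters gives a keystroke ratio for the simulated session.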
Ideally, the task would be to find the target
suffix $\mathbf{t}_s$ that maximizes the a posteriori probability
given a prefix $\mathbf{t}_p$ of the target sentence and the
input sentence. In practice, however, it may happen
that $\mathbf{t}_p$ is not present in the word graph $W$. The
solution is to use not $\mathbf{t}_p$ but a prefix $\mathbf{t}'_p$ that
minimizes the edit distance to $\mathbf{t}_p$ and is compatible
with $W$. Therefore, the score of a target translation
$\mathbf{t} = \mathbf{t}_p \mathbf{t}_s$ is characterized by two functions: the
edit cost between the target prefix $\mathbf{t}_p$ and the optimal
prefix $\mathbf{t}'_p$ found in the word graph $W$, and the a
posteriori probability of $\mathbf{t}_s$, $\Pr(\mathbf{t}_s \mid \mathbf{t}'_p)$. In order
to favor those translations that are closer to the
user's preferences, the list of n-best translations
is ranked using two criteria: first, the minimum
edit cost and then the a posteriori probability.
The algorithm proposed to solve this search
problem is an adapted version of the Recursive
Enumeration Algorithm (REA) described in (Jiménez
and Marzal, 1999) that integrates the minimum edit
cost algorithm in the search procedure to deal with
words introduced by the user that are not present
in the word graph. This algorithm consists of two
parts:
• Forward search that calculates the 1-best path
from the initial state to every state in the
word graph $W$. Paths in the word graph are
weighted not only by their a posteriori
probability, but also by their edit cost with respect
to the target sentence prefix.
For this purpose, fictitious edges have been
inserted into the word graph to represent edit
operations such as insertion, substitution and
deletion. These edit operations have been
included in the word graph in the following way:
– Insertion: An insertion edge has been
"inserted" as a loop for each state in the
word graph, with unit cost.
– Deletion: A deletion edge is "added"
for each arc in the word graph, having
the same source and target states as its
sibling arc, with unit cost.
– Substitution: Each arc in the word graph
is treated as a substitution edge whose edit
cost is proportional to the Levenshtein
distance between the symbol associated with
this arc and the word prefix employed to
traverse this arc during the search. This
substitution cost is zero when the word
prefix matches the symbol in the word
graph arc.
• Backward search that enumerates candidates
for the $k$-best path along the $(k-1)$-best path.
This recursive algorithm defines the next best
path that arrives at a given state $q$ as the next
best path that reaches $q'$ plus the arc leaving
from $q'$ to $q$. If this next best path arriving at
state $q'$ has not been calculated yet, then the
next best path procedure is called recursively
until a 1-best path is found or no best paths are
found.
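The forward search with edit operations can be sketched as a shortest-path search over (word graph node, number of prefix words consumed) pairs, with costs compared first by edit cost and then by probability. This is a simplified illustration and not the adapted REA itself: it returns only the 1-best completion, uses single-word arc labels, and replaces the Levenshtein-proportional substitution cost by a 0/1 cost. The names `best_completion` and `TOY_GRAPH` are assumptions.

```python
import heapq
import math


def best_completion(graph, start, finals, prefix):
    """graph: node -> list of (target word, probability, next node).
    prefix: list of words validated or typed by the user.
    Returns (edit cost, path probability, suffix words) or None."""
    n = len(prefix)
    # Heap entries: (edit cost, -log prob, tie-breaker, node, consumed, suffix).
    heap = [(0, 0.0, 0, start, 0, ())]
    seen = set()
    counter = 1
    while heap:
        edits, nlp, _, node, i, suffix = heapq.heappop(heap)
        if (node, i) in seen:
            continue
        seen.add((node, i))
        if node in finals and i == n:
            return edits, math.exp(-nlp), list(suffix)
        moves = []
        if i < n:
            # Insertion: loop on the node, consuming a user word not in the graph.
            moves.append((edits + 1, nlp, node, i + 1, suffix))
        for word, prob, dest in graph.get(node, []):
            cost = nlp - math.log(prob)
            if i == n:
                # Prefix exhausted: arcs are free and build the suffix.
                moves.append((edits, cost, dest, i, suffix + (word,)))
            else:
                sub = 0 if word == prefix[i] else 1
                # Substitution (cost 0 on a match) and deletion of the arc word.
                moves.append((edits + sub, cost, dest, i + 1, suffix))
                moves.append((edits + 1, cost, dest, i, suffix))
        for e, c, d, j, s in moves:
            if (d, j) not in seen:
                heapq.heappush(heap, (e, c, counter, d, j, s))
                counter += 1
    return None


# Hypothetical word graph loosely inspired by "cargue el papel".
TOY_GRAPH = {
    0: [("load", 1.0, 1)],
    1: [("the", 0.6, 2), ("paper", 0.4, 3)],
    2: [("paper", 0.7, 3), ("stock", 0.3, 3)],
    3: [(".", 1.0, 4)],
}

exact = best_completion(TOY_GRAPH, 0, {4}, ["load", "the"])
repaired = best_completion(TOY_GRAPH, 0, {4}, ["insert", "the"])
```

In the second call the user word "insert" does not occur in the graph, so it is matched against "load" at the cost of one substitution, and the completion proposed after "the" is the same as in the first call.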
To reduce the computational cost of the
search, the beam-search technique (Ney et al., 1992)
has been implemented. During the word graph
construction, two beam coefficients were employed
to penalize those edges leading to backoff states
with respect to those arriving at normal states. Finally,
a third beam coefficient controls how far, in terms of
the number of edit operations, a hypothesis is allowed
to deviate from the target prefix.
4 Experimental results
4.1 Corpus features
The corpus employed to perform experiments was
the Xerox corpus (SchlumbergerSema S.A et al.,
2001). It involves the translation of technical
Xerox manuals from English to Spanish, French and
German, and vice versa. Some statistics about the
data used for training and test purposes are shown
in Table 1.
4.2 Sample session
A TT2 interactive prototype, which uses the
search techniques presented in the previous
sections, has been implemented. The user is able to
customize this prototype in different ways:
number of suggested translations, length in number of
words of these suggestions, etc. In the example
below, the number of suggestions is five and the length
of these suggestions is unbounded.
Example 1 This example shows the functionality
and the interaction between the TT2 prototype and
a translator through a translation instance from
English to Spanish for a given sentence drawn from
the Xerox corpus. For better understanding of this
example the reference target sentence is given
below:
Reference target sentence: Instalación
de controladores de impresora y
archivos PPD.
Source sentence: Installing the Printer Drivers and
PPDs.
Hypothesis 0.0: Instalación del los controladores
de impresión y archivos PPD adaptados.
Hypothesis 0.1: Instalación del los controladores
de impresión y ver los archivos PPD.
Table 1: Features of the Xerox corpus: training, vocabulary and test sizes measured in thousands of words.
SIM: currently used "reversible" preprocessing.
RAW: original corpus without preprocessing.
PERPLEXITY: measure of how well a language model describes the test set.

                          EN / ES              EN / DE              EN / FR
                        RAW       SIM        RAW       SIM        RAW       SIM
TRAINING              600/700   600/700    600/500   500/600    600/700   500/400
VOCABULARY            26 / 30   8 / 11     25 / 27   8 / 10     25 / 37   8 / 19
TEST                  8 / 9     8 / 10     9 / 10    11 / 12    11 / 10   12 / 12
PERPLEXITY (3-gram)   107/60    48/33      93/169    51/87      193/135   73/52

Hypothesis 0.2: Instalación de la los controladores
de impresión y archivos PPD adaptados.
Hypothesis 0.3: Instalación de la los controladores
de impresión y ver los archivos PPD.
Hypothesis 0.4: Instalación de la esta los
controladores de impresión y ver los archivos PPD.
User interaction 0: Hypothesis 0.2 is selected and
the cursor is positioned at the beginning of the
word "los". Then, the translator would type
the character "c", that is, the next character in
the reference target sentence.
Prefix 0: Instalación de c
Hypothesis 1.0: Instalación de c los controladores
de impresión y archivos PPD adaptados.
Hypothesis 1.1: Instalación de c los controladores
de impresión y ver los archivos PPD.
Hypothesis 1.2: Instalación de c esta los
controladores de impresión y archivos PPD adaptados.
Hypothesis 1.3: Instalación de c esta los
controladores de impresión y ver los archivos PPD.
Hypothesis 1.4: Instalación de controladores de
impresión y fax y en archivos PPD adaptados.
User interaction 1: Hypothesis 1.4 is selected and
the cursor is positioned between the characters
"s" and "i" of the word "impresión". Then, the
translator would type the next character in the
reference target sentence: "o".
Prefix 1: Instalación de controladores de impreso
Hypothesis 2.0: Instalación de controladores de
impresora y archivos PPD adaptados.
Hypothesis 2.1: Instalación de controladores de
impresora y ver los archivos PPD.
Hypothesis 2.2: Instalación de controladores de
impresora/fax y ver los archivos PPD.
Hypothesis 2.3: Instalación de controladores de
impresora/fax y archivos PPD adaptados.
Hypothesis 2.4: Instalación de controladores de
impresora y fax de CentreWare y ver los
archivos PPD.
User interaction 2: Hypothesis 2.0 is selected and
the cursor is positioned at the end of the word
"PPD". The translator would just need to add
the character ".".
Prefix 2: Instalación de controladores de
impresora y archivos PPD.
Hypothesis 3.0: Instalación de controladores de
impresora y archivos PPD.
Hypothesis 3.1: Instalación de controladores de
impresora y archivos PPD.:
Hypothesis 3.2: Instalación de controladores de
impresora y archivos PPD..
Hypothesis 3.3: Instalación de controladores de
impresora y archivos PPD...
Hypothesis 3.4: Instalación de controladores de
impresora y archivos PPD.:.
User interaction 3: Hypothesis 3.0 is selected
and the user accepts the target sentence.
Final hypothesis: Instalación de controladores de
impresora y archivos PPD.
4.3 Translation quality evaluation
The assessment of the techniques presented in
section 3 has been carried out using four measures:
1. Translation Word Error Rate (TWER): the
minimum number of word substitution, deletion
and insertion operations needed to convert the
target sentence provided by the transducer into
the reference translation, that is, the word-level
edit distance.
2. Character Error Rate (CER): edit distance in
terms of characters between the target sentence
provided by the transducer and the reference
translation.
3. Key-Stroke Ratio (KSR): number of keystrokes
that are necessary to achieve the
reference translation, plus the acceptance
keystroke, divided by the number of running
characters.
4. BiLingual Evaluation Understudy (BLEU)
(Papineni et al., 2002): essentially a function
of the k-substrings that appear both in the
hypothesized target sentence and in the reference
target sentence.
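The two edit-distance measures can be illustrated with a short sketch. The same Levenshtein routine is applied to word lists for the word error rate and to character strings for the character error rate. Normalizing by the length of the reference and reporting a percentage is an assumption of this example, since the exact normalization is not stated above; the function names are likewise illustrative.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # delete h from the hypothesis
                           cur[j - 1] + 1,            # insert r into the hypothesis
                           prev[j - 1] + (h != r)))   # substitute, free on a match
        prev = cur
    return prev[-1]


def error_rate(hyp, ref):
    """Edit distance as a percentage of the reference length (assumed normalization)."""
    return 100.0 * edit_distance(hyp, ref) / len(ref)


hypothesis = "Instalación del los controladores de impresión y archivos PPD adaptados."
reference = "Instalación de controladores de impresora y archivos PPD."

word_error_rate = error_rate(hypothesis.split(), reference.split())
char_error_rate = error_rate(hypothesis, reference)
```

Only one row of the dynamic-programming table is kept at a time, so memory grows with the reference length rather than with the product of both lengths.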
These experiments were performed with 3-gram
transducers based on the GIATI technique. The
leftmost column of each table shows the language pair
employed in each experiment: English (En), Spanish
(Es), French (Fr) and German (De). The two main
central columns compare the results obtained with
1-best translation to those obtained with 5-best
translations. When using 5-best translations, the
target sentence among these five that minimizes the
corresponding error measure is selected. The results
are shown in Table 2.
The best results were obtained for the
English-Spanish language pair, for which the human
translator would need to type less than 25% of
the characters of the reference sentences (KSR of
23.4 and 24.1 with 5-best translations in Table 2).
In other words, this would result in a theoretical
fourfold increase in the productivity of human
translators. In fact, preliminary subjective
evaluations have received positive feedback from
professional translators when testing the prototype.
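The factor of 4 follows from a simple calculation, under the assumption that the translator's effort is proportional to the number of keystrokes. If the key-stroke ratio is $r$, the theoretical gain in productivity is $1/r$; the value 0.234 is the En-Es KSR with 5-best translations in Table 2:

```latex
\text{gain} = \frac{1}{r}, \qquad
\frac{1}{0.25} = 4, \qquad
\frac{1}{0.234} \approx 4.27
```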
Table 2: Results for the Xerox Corpus comparing
1-best to 5-best translations

          3-gram (1-best)         3-gram (5-best)
RAW       KSR    CER    TWER      KSR    CER    TWER
En-Es     26.0   29.1   42.3      23.4   24.4   37.2
Es-En     27.4   33.1   50.1      24.1   24.9   42.7
En-Fr     53.7   55.4   77.5      49.3   48.7   70.5
Fr-En     54.0   55.6   74.2      49.9   49.4   68.8
En-De     59.4   61.2   82.4      54.0   54.7   76.6
De-En     52.6   60.3   77.9      48.0   53.4   72.7

Furthermore, in all cases there is a clear and
significant improvement in error measures when
moving from 1-best to 5-best translations. This gain in
translation quality diminishes roughly logarithmically as
the number of best translations increases. However,
the number of hypotheses should be limited by the
user's ability to skim through the candidate
translations and decide which one to select.
Table 3 presents the results obtained on a
simplified version of the corpus. This simplification
consists of tokenization, case normalization and
the substitution of numbers, printer codes, etc. by
their corresponding category labels.
Table 3: Results for the Xerox Corpus comparing
1-best to 5-best translations

          3-gram (1-best)         3-gram (5-best)
SIM       WER    CER    BLEU      WER    CER    BLEU
En-Es     31.8   24.7   0.67      26.8   20.3   0.71
Es-En     34.3   27.8   0.62      27.0   20.4   0.69
En-Fr     64.2   48.8   0.43      57.2   42.8   0.45
Fr-En     59.2   48.5   0.42      53.6   42.5   0.45
En-De     72.1   55.3   0.32      65.8   49.1   0.35
De-En     64.7   53.9   0.36      58.4   47.7   0.39

Language pairs such as English and French
present somewhat higher error rates, as is also the
case for English and German, reflecting the
complexity of the task faced in these experiments.
5 Conclusions and future work
Finite-state transducers have been successfully
applied to CAT. These models can be learnt from
parallel corpora. The concept of interactive search
has been introduced in this paper along with some
efficient techniques (word graph derivation and
n-best search) that solve the parsing problem given a
prefix of the target sentence under real-time constraints.
The results show that the 5-best approach
clearly improves the quality of the translations with
respect to the 1-best approximation.
The promising results achieved in the first
experiments open a new field in machine translation
still to be explored, in which human expertise
is combined with machine translation techniques
to increase productivity without sacrificing
high-quality translation.
Finally, the introduction of morpho-syntactic
information or bilingual categories in finite-state
transducers is a topic that leaves an open door to
future research, as are improvements in
the search algorithms to reduce the computational
cost of finding a path in the word graph with the
minimum edit cost.
Acknowledgments
The authors would like to thank all the researchers
involved in the TT2 project who have contributed to
the development of the methodologies presented in
this paper.
This work has been supported by the Euro-
pean Union under the IST Programme (IST-2001-
32091).

References
Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz J. Och, David Purdy, Noah Smith, and David Yarowsky. 1999. Statistical machine translation: Final report. Workshop on language engineering, Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD, USA.
Juan C. Amengual, José M. Benedí, Asunción Castano, Antonio Castellanos, Víctor M. Jiménez, David Llorens, Andrés Marzal, Moisés Pastor, Federico Prat, Enrique Vidal, and Juan M. Vilar. 2000. The EuTrans-I speech translation system. Machine Translation, 15:75–103.
S. Bangalore and G. Ricardi. 2001. A finite-state approach to machine translation. In Second Meeting of the North American Chapter of the Association for Computational Linguistics.
Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–312.
Francisco Casacuberta, David Llorens, Carlos Martínez, Sirko Molau, Francisco Nevado, Hermann Ney, Moisés Pastor, David Picó, Alberto Sanchis, Enrique Vidal, and Juan M. Vilar. 2001. Speech-to-speech translation based on finite-state transducers. In International Conference on Acoustic, Speech and Signal Processing, volume 1. IEEE Press, April.
Francisco Casacuberta, Hermann Ney, Franz J. Och, Enrique Vidal, Juan M. Vilar, Sergio Barrachina, Ismael García-Varea, David Llorens, Carlos Martínez, Sirko Molau, Francisco Nevado, Moisés Pastor, David Picó, and Alberto Sanchís. 2004. Some approaches to statistical and finite-state speech-to-speech translation. Computer Speech and Language, 18:25–47.
Víctor M. Jiménez and Andrés Marzal. 1999. Computing the k shortest paths: a new algorithm and an experimental comparison. In J. S. Vitter and C. D. Zaroliagis, editors, Algorithm Engineering, volume 1668 of Lecture Notes in Computer Science, pages 15–29, London, July. Springer-Verlag.
Kevin Knight and Yaser Al-Onaizan. 1998. Translation with finite-state devices. In E. Hovy D. Farwell, L. Gerber, editor, Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, volume 1529, pages 421–437, Langhorne, PA, USA, October. AMTA’98.
Philippe Langlais, George Foster, and Guy Lapalme. 2000. Unit completion for a computer-aided translation typing system. Machine Translation, 15(4):267–294.
Hermann Ney, Dieter Mergel, Andreas Noll, and Annedore Paeseler. 1992. Data driven organization for continuous speech recognition. In IEEE Transactions on Signal Processing, volume 40, pages 272–281.
Franz J. Och and Hermann Ney. 2000. Improved statistical alignment models. In ACL00, pages 440–447, Hong Kong, China, October.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia.
David Picó and Francisco Casacuberta. 2001. Some statistical-estimation methods for stochastic finite-state transducers. Machine Learning, 44:121–142, July-August.
SchlumbergerSema S.A, Instituto Tecnológico de Informática, Rheinisch Westfälische Technische Hochschule Aachen Lehrstuhl für Informatik VI, Recherche Appliquée en Linguistique Informatique Laboratory University of Montreal, Celer Soluciones, Société Gamma, and Xerox Research Centre Europe. 2001. TT2. TransType2 - computer assisted translation. Project technical annex.
Enrique Vidal. 1997. Finite-state speech-to-speech translation. In Int. Conf. on Acoustics Speech and Signal Processing (ICASSP-97), proc., Vol.1, pages 111–114, Munich.
Andrew Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13:260–269.
