Using Categories in the EUTRANS System 
J. C. Amengual 1 J. M. Benedí 2
A. Castellanos 1 D. Llorens 1
E. Vidal 2
(1) Unidad Predepartamental de Informática
Campus Penyeta Roja
Universitat Jaume I
12071 Castellón de la Plana (Spain)
Abstract 
The EUTRANS project aims at developing
Machine Translation systems for limited
domain applications. These systems accept
speech and text input, and are trained
using an example-based approach.
The translation model used in
this project is the Subsequential Trans- 
ducer, which is easily integrable in con- 
ventional speech recognition systems. In 
addition, Subsequential Transducers can 
be automatically learned from corpora. 
This paper describes the use of categories 
for improving the EUTRANS translation 
systems. Experimental results with the 
task defined in the project show that this 
approach reduces the number of exam- 
ples required for achieving good models. 
1 Introduction 
The EUTRANS project¹ (Amengual et al., 1996a),
funded by the European Union, aims at developing
Machine Translation systems for limited domain
applications. These systems accept speech and text
input, and are trained using an example-based
approach. The translation model used in this project
is the Subsequential Transducer (SST), which is easily
integrable in conventional speech recognition systems
by using it both as language and translation model
(Jiménez et al., 1995). In addition, SSTs can be
automatically learned from sentence-aligned bilingual
corpora (Oncina et al., 1993).
This paper describes the use of categories both
in the training and translation processes for
improving the EUTRANS translation systems. The
approach presented here improves on that in (Vilar
et al., 1995): the integration of categories within
the systems is simpler, and it allows for categories
grouping units larger than a word. Experimental
results with the Traveler Task, defined in (Amengual
et al., 1996b), show that this method reduces
the number of examples required for achieving
good models.

¹ Example-Based Understanding and Translation
Systems (EUTRANS). Information Technology, Long
Term Research Domain, Open Scheme, Project Number
20268.

F. Casacuberta 2 A. Castaño 1
A. Marzal 1 F. Prat 1
J. M. Vilar 1
(2) Depto. de Sistemas Informáticos y
Computación
Universidad Politécnica de Valencia
46071 Valencia (Spain)
The rest of the paper is structured as follows. 
In section 2 some basic concepts and the notation 
are introduced. The technique used for integrat- 
ing categories in the system is detailed in section 3. 
Section 4 presents the speech translation system. 
Both speech and text input experiments are de- 
scribed in section 5. Finally, section 6 presents 
some conclusions and new directions. 
2 Basic Concepts and Notation
Given an alphabet $X$, $X^*$ is the free monoid of
strings over $X$. The symbol $\lambda$ represents the
empty string, first letters ($a$, $b$, $c$, ...) represent
individual symbols of the alphabets and
last letters ($z$, $y$, $x$, ...) represent strings of
the free monoids. We refer to the individual
elements of the strings by means of subindices, as
in $x = a_1 \ldots a_n$. Given two strings $x, y \in X^*$, $xy$
denotes the concatenation of $x$ and $y$.
2.1 Subsequential Transducers 
A Subsequential Transducer (Berstel, 1979) is a 
deterministic finite state network that accepts sen- 
tences from a given input language and produces 
associated sentences of an output language. A 
SST is composed of states and arcs. Each arc
connects two states and is associated with an
input symbol and an output substring (which may
be empty). The translation of an input sentence is
obtained starting from the initial state, follow- 
ing the path corresponding to its symbols through 
the network, and concatenating the corresponding 
output substrings. 
Formally, a SST is a tuple $\tau = (X, Y, Q, q_0, E, \sigma)$,
where $X$ and $Y$ are the input and output
alphabets, $Q$ is a finite set of states, $q_0 \in Q$ is
the initial state, $E \subseteq Q \times X \times Y^* \times Q$ is a set
of arcs satisfying the determinism condition, and
$\sigma : Q \to Y^*$ is a state emission function². Those
states for which $\sigma$ is defined are usually called final
states. The determinism condition means that, if
$(p, a, y, q)$ and $(p, a, y', q')$ belong to $E$, then $y = y'$
and $q = q'$. Given a string $x = a_1 \ldots a_n \in X^*$, a
sequence $(q_0, a_1, y_1, q_1), \ldots, (q_{n-1}, a_n, y_n, q_n)$ is a
valid path if $(q_{i-1}, a_i, y_i, q_i)$ belongs to $E$ for every
$i$ in $1, \ldots, n$, and $q_n$ is a final state. In case there
exists such a valid path for $x$, the translation of
$x$ by $\tau$ is $y_1 \ldots y_n \sigma(q_n)$. Otherwise, the
translation is undefined. Note that due to the condition
of determinism, there can be no more than one
valid path, and hence at most one translation, for
a given input string. Therefore, $\tau$ defines a
function between an input language, $L_I \subseteq X^*$, and
an output language, $L_O \subseteq Y^*$. Both $L_I$ and $L_O$
are regular languages and their corresponding
automata are easily obtainable from the SST. In
particular, an automaton for $L_I$ can be obtained by
eliminating the output of the arcs and states, and
taking the final state set of the automaton
to be the same as in the SST. A state is useless if
it is not contained in any valid path. Useless states
can be eliminated from a SST without changing
the function it defines.
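The definition above can be illustrated with a minimal Python sketch. The dictionary-based representation, the function name `translate` and the toy transducer are assumptions made for illustration only; they are not part of the EUTRANS software.

```python
from typing import Dict, List, Optional, Tuple

# Arcs are stored as (state, input symbol) -> (output substring, next state).
# Keying on (state, symbol) enforces the determinism condition by construction.
Arcs = Dict[Tuple[int, str], Tuple[str, int]]


def translate(arcs: Arcs, sigma: Dict[int, str], q0: int,
              words: List[str]) -> Optional[str]:
    """Return the translation of `words`, or None when it is undefined."""
    state, pieces = q0, []
    for w in words:
        if (state, w) not in arcs:
            return None                 # no valid path for this input
        y, state = arcs[(state, w)]
        pieces.append(y)
    if state not in sigma:
        return None                     # the path does not end in a final state
    pieces.append(sigma[state])         # state emission sigma(q_n)
    return " ".join(p for p in pieces if p)


# Hypothetical toy transducer: the noun is only translated once the adjective
# is known, so the arc for "habitación" emits the empty string.
ARCS: Arcs = {
    (0, "una"): ("a", 1),
    (1, "habitación"): ("", 2),
    (2, "doble"): ("double room", 3),
    (2, "individual"): ("single room", 3),
}
SIGMA = {3: ""}                         # state 3 is the only final state

print(translate(ARCS, SIGMA, 0, "una habitación doble".split()))
```

The delayed output on the arc for "habitación" shows why arcs carry output substrings rather than single symbols: the transducer can wait until enough input has been seen.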
In section 3, we will relax the model. Instead
of imposing the determinism condition, we will
only enforce the existence of at most one valid
path in the transducer for each input string
(non-ambiguity). We will call them Unambiguous SSTs
(USSTs). Standard algorithms for finding the
path corresponding to a string in an unambiguous
finite state automaton (see for instance (Hopcroft
and Ullman, 1979)) can be used for finding the
translation in a USST. When the problem is the
search for the best path in the expanded model
during speech translation (see section 4), the use
of the Viterbi algorithm (Forney, 1973) guarantees
that the most likely path will be found.
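As a rough illustration of translation with a non-deterministic but unambiguous transducer, the sketch below keeps every partial path alive while reading the input. It is one simple way of doing the search, not necessarily the algorithm used in the project; the names and the toy arcs are hypothetical.

```python
from collections import defaultdict


def translate_usst(arcs, sigma, q0, words):
    """arcs: iterable of (p, a, y, q); sigma: dict mapping final states to outputs."""
    by_key = defaultdict(list)
    for p, a, y, q in arcs:
        by_key[(p, a)].append((y, q))

    hyps = [(q0, ())]                       # (current state, output pieces so far)
    for w in words:
        hyps = [(q, out + (y,))
                for state, out in hyps
                for y, q in by_key.get((state, w), [])]

    finals = [out + (sigma[s],) for s, out in hyps if s in sigma]
    if not finals:
        return None                         # translation undefined
    if len(finals) > 1:
        raise ValueError("transducer is ambiguous for this input")
    return " ".join(p for p in finals[0] if p)


# Hypothetical toy USST: two arcs leave state 0 on the same symbol, but every
# complete input string still has at most one valid path.
TOY_ARCS = [(0, "a", "x", 1), (0, "a", "y", 2), (1, "b", "", 3), (2, "c", "", 3)]
TOY_SIGMA = {3: ""}

print(translate_usst(TOY_ARCS, TOY_SIGMA, 0, ["a", "c"]))
```

When the transducer has no useless states, non-ambiguity implies that each state is reached by at most one partial path for a given prefix, so the list of hypotheses never grows beyond the number of states.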
2.2 Inference of Subsequential 
Transducers 
The use of SSTs to model limited domain translation
tasks has the distinctive advantage of allowing
an automatic and efficient learning of the
translation models from sets of examples. An inference
algorithm known as OSTIA (Onward Subsequential
Transducer Inference Algorithm) yields
a SST that correctly models the
translation of a given task, if the training set is
representative (in a formal sense) of the task (Oncina
et al., 1993). Nevertheless, although the
SSTs learned by OSTIA are usually good translation
models, they are often poor input language
models. In practice, they translate correct input
sentences very accurately, but they also accept and
translate incorrect sentences, producing meaningless
results. This yields undesirable effects in the case
of noisy input, such as that obtained by OCR or
speech recognition.

² In this paper, the term function refers to partial
functions. We will use $f(x) = \emptyset$ to denote that the
function $f$ is undefined for $x$.
To overcome this problem, the algorithm
OSTIA-DR (Oncina and Varó, 1996) uses finite
state domain (input language) and range (output
language) models, which make it possible to learn SSTs
that only accept input sentences and only produce
output sentences compatible with those language
models. OSTIA-DR can make use of any kind
of finite state model. In particular, models can
be n-testable automata, which are equivalent to
n-grams (Vidal et al., 1995) and can also be
automatically learned from examples.
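To give an idea of what such a domain model looks like, here is a minimal sketch of a 2-testable (bigram) acceptor learned from examples. It only illustrates the kind of model mentioned above for n = 2; the function names are hypothetical and this is not the code used with OSTIA-DR.

```python
def learn_2testable(sentences):
    """Collect allowed first words, last words and adjacent word pairs."""
    starts, ends, pairs = set(), set(), set()
    for s in sentences:
        if not s:
            continue
        starts.add(s[0])
        ends.add(s[-1])
        pairs.update(zip(s, s[1:]))
    return starts, ends, pairs


def accepts(model, s):
    """A sentence is accepted if its start, end and every bigram were observed."""
    starts, ends, pairs = model
    return (bool(s) and s[0] in starts and s[-1] in ends
            and all(p in pairs for p in zip(s, s[1:])))


MODEL = learn_2testable([
    "la llave por favor".split(),
    "la cuenta por favor".split(),
    "la llave".split(),
])

print(accepts(MODEL, "la cuenta".split()))   # rejected: "cuenta" never ends a sentence
print(accepts(MODEL, "llave la".split()))    # rejected: wrong start and unseen bigram
```

A model of this kind generalises beyond the training sentences while still rejecting word sequences that contain unseen local patterns, which is the property a domain model needs in order to filter noisy input.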
3 Introducing Word Categories in 
the Learning and Translation 
Processes 
An approach for using categories together with 
SSTs was presented in (Vilar et al., 1995), proving 
it to be useful in reducing the number of examples 
required for learning. However, the approach pre- 
sented there was not easily integrable in a speech 
recognition system and did not provide for the 
case in which the categories included units larger 
than a word. 
For the EUTRANS project, the approach was 
changed so that a single USST would comprise 
all the information for the translation, including 
elementary transducers for the categories. These 
steps were followed: 
• CATEGORY IDENTIFICATION. Seven categories were
used in EUTRANS: masculine names, feminine
names, surnames, dates, hours, room numbers,
and general numbers. These categories were
chosen in keeping with the example-based nature
of the project. In particular, the categories
chosen do not need very specific rules for
recognising them, the translation rules they
follow are quite simple, and the amount of
special linguistic knowledge introduced was
very low.
• CORPUS CATEGORIZATION. Once the categories
were defined, simple scripts substituted
the words in the categories by adequate labels,
so that the pair (déme la llave de la
habitación ciento veintitrés - give me the key
to room one two three) became (déme la llave
de la habitación $ROOM - give me the key
to room $ROOM), where $ROOM is the category
label for room numbers.
• INITIAL MODEL LEARNING. The categorised
corpus was used for training a model, the initial
SST.
• CATEGORY MODELLING. For each category,
a simple SST was built: its category
SST (cSST).
• CATEGORY EXPANSION. The arcs in the initial
SST corresponding to the different categories
were expanded using their cSSTs.

[Figure 1, flow diagram.
LEARNING PROCESS: original sample (Déme la llave de la habitación ciento veintitrés / Give me the key to room number one two three) → CATEGORIZER → categorized sample (Déme la llave de la habitación $ROOM / Give me the key to room number $ROOM) → OSTIA-DR → EXPANDER.
TRANSLATION PROCESS: input sentence (Déme la llave de la habitación quinientos setenta y ocho) → expanded USST → Give me the key to room number $ROOM $ROOM=[five seven eight] → POSTPROCESSOR → translation (Give me the key to room number five seven eight).]
Figure 1: General schema of the treatment of categories in the learning and translation processes.
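The corpus categorization step can be sketched as follows. The paper only says that "simple scripts" were used, so the word lists, the function name `categorize` and the rule of collapsing a run of category words into one label are all assumptions made for illustration.

```python
def categorize(tokens, category_words, label):
    """Replace each maximal run of words from `category_words` by `label`."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in category_words:
            while i < len(tokens) and tokens[i] in category_words:
                i += 1                    # skip the whole run
            out.append(label)
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical, deliberately tiny word lists for room numbers.
ES_ROOM_WORDS = {"ciento", "veintitrés", "trescientos", "diez"}
EN_ROOM_WORDS = {"one", "two", "three", "oh"}

spanish = "déme la llave de la habitación ciento veintitrés".split()
english = "give me the key to room one two three".split()

print(" ".join(categorize(spanish, ES_ROOM_WORDS, "$ROOM")))
print(" ".join(categorize(english, EN_ROOM_WORDS, "$ROOM")))
```

Because a run of several words collapses into a single label, this kind of substitution naturally handles categories whose members are longer than one word.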
A general view of the process can be seen in Fig- 
ure 1. The left part represents the elements in- 
volved in the learning of the expanded USST, ex- 
emplified with a single training pair. The right 
part of the diagram gives a schematic representa- 
tion of the use of this transducer. 
The category expansion step is a bit more com- 
plex than just substituting each category-labeled 
arc by the corresponding cSST. The main problems
are: (1) how to insert the output of the
cSST within the output of the initial transducer; 
(2) how to deal with more than one final state in 
the cSST; (3) how to deal with cycles in the cSST 
involving its initial state. 
The problem with the output has certain
subtleties, since the translation of a category label
can appear before or after the label has been seen
in the input. For example, consider the transducer
in Figure 2(a) and a Spanish sentence categorised
as me voy a $HOUR, which corresponds to the
categorised English one I am leaving at $HOUR.
Once me voy a is seen, the continuation can only
be $HOUR, so the initial SST, before seeing this
category label in the input, has already produced
the whole output (including $HOUR). Taking this
into account, we decided to keep the output of the
initial SST and to include there the information
necessary for removing the category labels. To do
this, the label for the category was considered as
a variable that acts as a placeholder in the output
sentence and whose contents are fixed by an
assignment appearing elsewhere within that
sentence. In our example, the expected output for
me voy a las tres y media could be I am leaving
at $HOUR $HOUR = [half past three]. This
assumes that each category appears at most once
within each sentence.
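A postprocessor that resolves these placeholders can be sketched in a few lines of Python. The regular expression and the function name are assumptions; the only thing taken from the text is the output format, a label such as $HOUR plus an assignment of the form $HOUR = [...] somewhere in the same sentence.

```python
import re

# Matches assignments such as "$HOUR = [half past three]" or "$ROOM=[five seven eight]".
ASSIGNMENT = re.compile(r"(\$\w+)\s*=\s*\[([^\]]*)\]")


def postprocess(raw):
    """Remove the assignments and substitute their contents for the labels."""
    values = {label: content.strip() for label, content in ASSIGNMENT.findall(raw)}
    text = ASSIGNMENT.sub("", raw)
    for label, content in values.items():
        text = text.replace(label, content)
    return " ".join(text.split())           # normalise whitespace


print(postprocess("I am leaving at $HOUR $HOUR = [half past three]"))
print(postprocess("$ROOM=[five seven eight] Give me the key to room number $ROOM"))
```

The second call shows that the assignment may come before the placeholder; the sketch relies on the stated assumption that each category appears at most once per sentence.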
The expanded model is obtained by an itera- 
tive procedure which starts with the initial SST. 
Each time the procedure finds an arc whose in- 
put symbol is a category label, it expands this arc 
by the adequate cSST producing a new model. 
This expansion can introduce non-determinism, so 
these new models are now USSTs. When every arc 
of this kind has been expanded, we have the ex- 
panded USST. The expansion of each arc follows 
these steps: 
• Eliminate the arc. 
• Create a copy of the cSST corresponding to 
the category label. 
• Add new arcs linking the new cSST with the
USST. These arcs have to ensure that the
output produced in the cSST is enclosed
between c=[ and ], c being the category label.
• Eliminate useless states. 
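Formally, we have an USST $\tau = (X, Y, Q, q_0, E, \sigma)$,
a cSST $\tau_c = (X, Y, Q_c, q_{0c}, E_c, \sigma_c)$,
where we assume that $\sigma_c(q_{0c}) = \emptyset$, and an arc
$(p, c, z, q) \in E$. We will produce a new USST
$\tau' = (X, Y, Q \cup Q'_c, q_0, (E - \{(p, c, z, q)\}) \cup E'_c, \sigma')$.
The new elements are:
• The set $Q'_c$ is disjoint with $Q$ and there exists
a bijection $\phi : Q_c \to Q'_c$.
• The new set of arcs is:
$$
\begin{aligned}
E'_c ={} & \{(\phi(r), a, y, \phi(s)) \mid (r, a, y, s) \in E_c\} \\
 & \cup \{(p, a, z\,\mathtt{c=[}\,y, \phi(s)) \mid (q_{0c}, a, y, s) \in E_c\} \\
 & \cup \{(\phi(r), a, y\,\sigma_c(s)\,\mathtt{]}, q) \mid (r, a, y, s) \in E_c \wedge \sigma_c(s) \neq \emptyset\} \\
 & \cup \{(p, a, z\,\mathtt{c=[}\,y\,\sigma_c(s)\,\mathtt{]}, q) \mid (q_{0c}, a, y, s) \in E_c \wedge \sigma_c(s) \neq \emptyset\}
\end{aligned}
$$
Note that this solves the problems deriving
from the cSST having multiple final states or
cycles involving the initial state. The price to
pay is the introduction of non-determinism in
the model.
• The new state emission function is:
$$
\sigma'(s) =
\begin{cases}
\sigma(s) & \text{if } s \in Q \\
\emptyset & \text{if } s \in Q'_c
\end{cases}
$$
Finally, the useless states that may appear during
this construction are removed.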
A simple example of the effects of this procedure
can be seen in Figure 2. Drawing (a) depicts
the initial SST, (b) is a cSST for the hours between 
one and three (in o'clock and half past forms), and 
the expanded USST is in (c). 
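The expansion of a single category arc can be sketched in Python as follows. This is a minimal illustration of the four kinds of new arcs (internal copies, entry arcs opening c=[, exit arcs closing ], and arcs that do both), followed by the removal of useless states. The representation (arcs as 4-tuples, copied states as pairs), the function names and the small hour transducer are assumptions, not the project's implementation.

```python
def join(*pieces):
    """Concatenate output substrings, skipping empty ones."""
    return " ".join(x for x in pieces if x)


def expand_arc(E, arc, Ec, sigma_c, q0c):
    """Replace the category arc (p, c, z, q) of E by a copy of the cSST (Ec, sigma_c)."""
    p, c, z, q = arc
    phi = lambda s: (arc, s)                    # fresh states, disjoint from Q
    new_arcs = set(E) - {arc}
    for r, a, y, s in Ec:
        new_arcs.add((phi(r), a, y, phi(s)))    # internal copy of the cSST arc
        if r == q0c:                            # entry arc: open "c=["
            new_arcs.add((p, a, join(z, c + "=[", y), phi(s)))
        if s in sigma_c:                        # exit arc: emit sigma_c(s), close "]"
            closing = join(y, sigma_c[s], "]")
            new_arcs.add((phi(r), a, closing, q))
            if r == q0c:                        # entry and exit in a single arc
                new_arcs.add((p, a, join(z, c + "=[", closing), q))
    return new_arcs                             # sigma is unchanged; copies are not final


def trim(E, sigma, q0):
    """Keep only arcs whose states lie on some path from q0 to a final state."""
    def closure(seeds, step):
        seen, todo = set(seeds), list(seeds)
        while todo:
            u = todo.pop()
            for v in step(u):
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
        return seen
    reachable = closure([q0], lambda u: [t for s, _, _, t in E if s == u])
    productive = closure(list(sigma), lambda u: [s for s, _, _, t in E if t == u])
    useful = reachable & productive
    return {e for e in E if e[0] in useful and e[3] in useful}


# Initial SST for "me voy a $HOUR" (hypothetical state numbering): the whole
# output, including the label, is produced before $HOUR is read.
E0 = {(0, "me", "", 1), (1, "voy", "", 2),
      (2, "a", "I am leaving at $HOUR", 3), (3, "$HOUR", "", 4)}
SIGMA = {4: ""}
HOUR_ARC = (3, "$HOUR", "", 4)

# Hypothetical cSST accepting "la una" and "la una y media"; it has two final states.
EC = {(0, "la", "", 1), (1, "una", "", 2),
      (2, "y", "", 3), (3, "media", "half past one", 4)}
SIGMA_C = {2: "one o'clock", 4: ""}

EXPANDED = trim(expand_arc(E0, HOUR_ARC, EC, SIGMA_C, 0), SIGMA, 0)
for e in sorted(EXPANDED, key=repr):
    print(e)
```

In the result, the copy of state 1 has two arcs labelled "una": one continues inside the copied cSST and one leaves it with the output "one o'clock ]". This is the non-determinism mentioned above, and it is what allows the cSST to have several final states.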
4 Overview of the Speech
Translation System
A possible scheme for speech translation consists
of translating the output of a conventional
Continuous Speech Recognition (CSR) front-end. This
implies that some restrictions present in the
translation and the output language, which could
enhance the acoustic search, are not taken into
account. In this sense, it is preferable to integrate
the translation model within a conventional CSR
system to carry out a simultaneous search for the
recognised sentence and its corresponding
translation. This integration can be done by using a
SST as language and translation model, since it
has included in the learning process the
restrictions introduced by the translation and the output
language. Experimental results show that better
performance is achieved (Jiménez et al., 1994;
Jiménez et al., 1995).
Thus, our system can be seen as the result of in- 
tegrating a series of finite state models at different 
levels: 
• ACOUSTIC LEVEL. Individual phones are
represented by means of Hidden Markov Models
(HMMs).
• LEXICAL LEVEL. Individual words are
represented by means of finite state automata with
arcs labeled by phones.
• SYNTACTIC AND TRANSLATION LEVEL. The
syntactic constraints and translation rules are
represented by an USST.

[Figure 2, three state diagrams: (a) Initial SST. (b) A cSST for the category $HOUR. (c) Expanded USST.]
Figure 2: An example of the expansion procedure.
In our case, the integration means the substitution
of the arcs of the USST by the automata describing
the input language words, followed by the
substitution of the arcs in this expanded automaton by
the corresponding HMMs. In this way, a conventional
Viterbi search (Forney, 1973) for the most
likely path in the resulting network, given the
input acoustic observations, can be performed, and
both the recognised sentence and its translation
are found by following the optimal path.
5 Experiments 
5.1 The Traveler Task 
The Traveler Task (Amengual et al., 1996b) was 
defined within the EUTRANS project (Amengual 
et al., 1996a). It is more realistic than the one
in (Castellanos et al., 1994), but, unlike other cor- 
pora such as the Hansards (Brown et al., 1990), it 
is not unrestricted. 
The general framework established for the Trav- 
eler Task aims at covering usual sentences that can 
be needed in typical scenarios by a traveler visiting 
a foreign country whose language he/she does not 
speak. This framework includes a great variety 
of different translation scenarios, and is thus
appropriate for progressive experimentation with
increasing levels of complexity. In a first phase,
the scenario has been limited to some human-to- 
human communication situations in the reception 
of a hotel: 
• Asking for rooms, wake-up calls, keys, the 
bill, a taxi and moving the luggage. 
• Asking about rooms (availability, features, 
price). 
• Having a look at rooms, complaining about 
and changing them. 
• Notifying a previous reservation. 
• Signing the registration form. 
• Asking and complaining about the bill. 
• Notifying the departure. 
• Other common expressions. 
The Traveler Task text corpora are sets of pairs,
each pair consisting of a sentence in the input
language and its corresponding translation in the
output language. They were automatically built by
using a set of Stochastic, Syntax-directed
Translation Schemata (Gonzalez and Thomason, 1978)
with the help of a data generation tool, specially
developed for the EUTRANS project. This software
allows the use of several syntactic extensions
to these schema specifications in order to express
optional rules, permutation of phrases, concordance
(of gender, number and case), etc. The use
of automatic corpora generation was convenient
due to the time constraints of the first phase of the
EUTRANS project, and for cost-effectiveness.
Moreover, the complexity of the task can be controlled.

Table 1: Some examples of sentence pairs from the Traveler Task.
Spanish: Por favor, ¿quieren pedirnos un taxi para la habitación trescientos diez?
English: Will you ask for a taxi for room number three one oh for us, please?
Spanish: Desearía reservar una habitación tranquila con teléfono y televisión hasta pasado mañana.
German: Ich möchte ein ruhiges Zimmer mit Telefon und Fernseher bis übermorgen reservieren.
Spanish: ¿Me pueden dar las llaves de la habitación, por favor?
Italian: Mi potreste dare le chiavi della stanza, per favore?

Table 2: Main features of the Spanish to English, Spanish to German and Spanish to Italian text
corpora.
                          Spanish to English   Spanish to German   Spanish to Italian
                          Spanish   English    Spanish   German    Spanish   Italian
Vocabulary size               689       514        691      566        687       585
Average sentence length       9.5       9.8        8.9      8.2       12.7      11.8
Test set perplexity          13.8       7.0       13.2      9.0       13.6      10.6
The languages considered were Spanish as input 
and English, German and Italian as output, giving 
a total of three independent corpora of 500,000 
pairs each. Some examples of sentence pairs are 
shown in Table 1. Some features of the corpora
can be seen in Table 2. For each language, the test
set perplexity has been computed by training a
trigram model (with simple flat smoothing) using
a set of 20,000 random sentences and computing
the probabilities yielded by this model for a set of
10,000 independent random sentences. The lower
perplexity of the output languages derives from 
a design decision: multiple variants of the input 
sentences were introduced to account for different 
ways of expressing the same idea, but they were 
given the same translation. 
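The perplexity computation can be sketched as below. The text does not define "simple flat smoothing", so this sketch assumes one plausible reading: the maximum-likelihood trigram estimate is interpolated with a uniform distribution over the vocabulary, with a hypothetical weight `eps`. The function names and the toy data are also assumptions.

```python
import math
from collections import Counter


def ngrams(sentence):
    """Yield the trigrams of a sentence padded with start and end markers."""
    toks = ["<s>", "<s>"] + list(sentence) + ["</s>"]
    return zip(toks, toks[1:], toks[2:])


def train_trigrams(sentences):
    tri, ctx, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        vocab.update(s)
        for u, v, w in ngrams(s):
            tri[(u, v, w)] += 1
            ctx[(u, v)] += 1
    return tri, ctx, vocab


def prob(model, u, v, w, eps=0.01):
    """Assumed smoothing: mix the relative frequency with a uniform distribution."""
    tri, ctx, vocab = model
    uniform = 1.0 / len(vocab)
    if ctx[(u, v)] == 0:
        return uniform                       # unseen context: fall back to uniform
    return (1 - eps) * tri[(u, v, w)] / ctx[(u, v)] + eps * uniform


def perplexity(model, sentences):
    log_sum, n = 0.0, 0
    for s in sentences:
        for u, v, w in ngrams(s):
            log_sum += math.log(prob(model, u, v, w))
            n += 1
    return math.exp(-log_sum / n)


TRAIN = [["déme", "la", "llave"]]
MODEL = train_trigrams(TRAIN)
print(perplexity(MODEL, TRAIN), perplexity(MODEL, [["la", "llave"]]))
```

Words that never occurred in training also receive the uniform share here, so the distribution is not strictly normalised for out-of-vocabulary input; a careful implementation would reserve explicit mass for unknown words.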
Finally, a multispeaker speech corpus for the 
task was acquired. It consists of 2,000 utterances 
in Spanish. Details can be found in (Amengual et 
al., 1997a). 
5.2 Text Input Experiments 
Our approach was tested with the three text
corpora. Each one was divided into training and test
sets, with 490,000 and 10,000 pairs, respectively.
A sequence of models was trained with increasing
subsets of the training set. Each model was tested
using only those sentences in the test set that were
not seen in training. This has been done because
a model trained with OSTIA-DR is guaranteed
to reproduce exactly those sentences it has seen
during learning. The performance was evaluated
in terms of Word Error Rate (WER), which is
the percentage of output words that have to be
inserted, deleted or substituted for them to exactly
match the corresponding expected translations.
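A minimal sketch of a WER computation follows. It uses the usual word-level edit distance and, as an assumption, normalises by the total number of words in the expected translations; the text does not state which normalisation was used.

```python
def edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete h from the hypothesis
                           cur[j - 1] + 1,           # insert r into the hypothesis
                           prev[j - 1] + (h != r)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def wer(pairs):
    """pairs: iterable of (produced translation, expected translation) strings."""
    pairs = list(pairs)
    errors = sum(edit_distance(h.split(), r.split()) for h, r in pairs)
    words = sum(len(r.split()) for _, r in pairs)
    return 100.0 * errors / words


EXAMPLE = [("give me the key to room one two three",
            "give me the key to room number one two three")]
print(wer(EXAMPLE))
```

In the example one word has to be inserted to match an expected translation of ten words.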
The results for the three corpora can be seen
in Table 3. The columns labeled "Different"
and "Categ." refer to the number of different
sentences in the training set and the number of
different sentences after categorization. Graphical
representations of the same results are in Figures 3,
4 and 5. As expected, the use of lexical categories
had a major impact on the learning algorithm.
The differences in WER attributable to the use of
lexical categories can be as high as about 40%
in the early stages of the learning process and
decrease when the number of examples grows. The
large increase in performance is a natural
consequence of the fact that the categories help in
reducing the total variability that can be found in
the corpora (although sentences do exhibit a great
deal of variability, the underlying syntactic
structure is actually much less diverse). They also have
the advantage of allowing an easier extension of
the vocabulary of the task without having a
negative effect on the performance of the models so
obtained (Vilar et al., 1995).
Table 3: Text input results: Translation word error rates (WER) and sizes of the transducers for different
number of training pairs.

          Training pairs              Without categories          With categories
Generated  Different   Categ.      WER   States     Arcs      WER   States     Arcs
   10,000      6,791    5,964    60.72    3,210   10,427    30.51    4,500   32,599
   20,000     12,218    9,981    54.86    4,119   15,243    22.46    4,700   35,585
   40,000     21,664   16,207    47.92    5,254   22,001    13.70    4,551   34,879
   80,000     38,438   25,665    38.39    6,494   31,017     7.74    4,256   37,673
  160,000     67,492   39,747    26.00    6,516   36,293     3.71    4,053   34,045
  320,000    119,048   60,401    17.38    6,249   41,675     1.42    4,009   33,643
  490,000    168,629   77,499    13.33    5,993   47,151     0.74    3,854   29,394
(a) Spanish to English corpus.

          Training pairs              Without categories          With categories
Generated  Different   Categ.      WER   States     Arcs      WER   States     Arcs
   10,000      6,679    5,746    66.17    3,642   11,410    35.21    5,256   76,582
   20,000     11,897    9,535    58.45    4,892   16,956    23.41    8,305  148,881
   40,000     21,094   15,425    53.87    6,486   25,358    16.06   11,948  245,293
   80,000     37,452   24,580    48.74    8,611   37,938     9.85   12,530  255,294
  160,000     66,071   38,656    42.06   11,223   56,432     5.17   11,724  227,667
  320,000    115,853   59,510    33.93   14,772   82,434     2.55    9,919  174,208
  490,000    163,505   77,053    29.86   16,914  101,338     1.23   10,055  178,312
(b) Spanish to German corpus.

          Training pairs              Without categories          With categories
Generated  Different   Categ.      WER   States     Arcs      WER   States     Arcs
   10,000      6,698    5,795    58.29    2,857    9,650    29.86    3,094   30,010
   20,000     12,165    9,716    52.96    3,774   14,176    22.29    3,581   38,370
   40,000     21,670   15,741    47.39    4,629   19,864    14.30    4,151   52,482
   80,000     38,408   25,119    36.40    5,403   26,989     7.66    4,599   61,575
  160,000     67,355   39,281    26.98    5,598   32,588     4.68    5,109   76,007
  320,000    118,257   60,286    20.72    5,827   40,754     3.06    6,143  100,099
  490,000    166,897   77,877    17.60    6,399   49,430     2.54    7,467  123,900
(c) Spanish to Italian corpus.

[Figure 3, plot: translation WER (vertical axis, 0% to 70%) against training set size (horizontal axis), with one curve "Without categories" and one curve "With categories".]
Figure 3: Evolution of translation WER with the size of the training set: Spanish to English text corpus.
The sizes in the horizontal axis refer to the first three columns in Table 3(a).

5.3 Speech Input Experiments

Table 4: Speech input results: Translation word error rates (WER) and real time factor (RTF) for
the best Spanish to English transducer.

Number of HMM Gaussians   Beam Width     WER    RTF
                  1,663          300    2.3%    5.9
                  1,663          150    6.4%    2.2
                  5,590          300    1.9%   11.3
                  5,590          150    6.3%    5.6

A set of Spanish to English speaker independent
translation experiments were performed integrating
in our speech input system (as described in
section 4) the following models:
• ACOUSTIC LEVEL. The phones were represented
by context-independent continuous-density
HMMs. Each HMM consisted of six
states following a left-to-right topology with
loops and skips. The emission distribution of
each state was modeled by a mixture of Gaussians.
Actually, there were only three emission
distributions per HMM since the states
were tied in pairs (the first with the second,
the third with the fourth, and the fifth with
the sixth). Details about the corpus used in
training these models and its parametrization
can be found in (Amengual et al., 1997a).
• LEXICAL LEVEL. Spanish Phonetics allows
the representation of each word as a sequence
of phones that can be derived from standard
rules. This sequence can be represented by a
simple chain. There were a total of 31 phones,
including stressed and unstressed vowels plus
two types of silence.
• SYNTACTIC AND TRANSLATION LEVEL. We
used the best of the transducers obtained in
the Spanish to English text experiments. It
was enriched with probabilities estimated by
parsing the same training data with the final
model and using relative frequencies of use as
probability estimates.
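The relative-frequency estimation mentioned in the last item can be sketched as follows. For brevity the sketch assumes a deterministic transducer, whereas the expanded model is an USST; the data structures and names are hypothetical. Each training input is parsed, the arcs on its path are counted, and every count is divided by the number of times the source state was left (by an arc or by ending the sentence there).

```python
from collections import Counter

# Hypothetical toy transducer: (state, word) -> (output, next state).
ARCS = {
    (0, "una"): ("a", 1),
    (1, "habitación"): ("", 2),
    (2, "doble"): ("double room", 3),
    (2, "individual"): ("single room", 3),
}
FINALS = {3}


def estimate(arcs, finals, q0, corpus):
    """Return (arc probabilities, stop probabilities) from relative frequencies."""
    arc_count, stop_count, state_count = Counter(), Counter(), Counter()
    for words in corpus:
        state, path = q0, []
        for w in words:
            if (state, w) not in arcs:
                path = None                 # sentence not accepted: ignore it
                break
            path.append((state, w))
            state = arcs[(state, w)][1]
        if path is None or state not in finals:
            continue
        for key in path:
            arc_count[key] += 1
            state_count[key[0]] += 1
        stop_count[state] += 1              # the sentence ended in this state
        state_count[state] += 1
    arc_prob = {k: c / state_count[k[0]] for k, c in arc_count.items()}
    stop_prob = {s: c / state_count[s] for s, c in stop_count.items()}
    return arc_prob, stop_prob


CORPUS = (["una habitación doble".split()] * 3
          + ["una habitación individual".split()])
ARC_PROB, STOP_PROB = estimate(ARCS, FINALS, 0, CORPUS)
print(ARC_PROB[(2, "doble")], ARC_PROB[(2, "individual")])
```

With three "doble" sentences and one "individual" sentence, the two arcs leaving state 2 receive probabilities 0.75 and 0.25.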
The Viterbi search for the most likely path was
sped up by using beam search at two levels:
independent beam widths were used in the states
of the SST (empirically fixed to 300) and in the
states of the HMMs. Other details of the
experiments can be found in (Amengual et al., 1997a).
Table 4 shows that good translation results (a 
WER of 6.4%) can be achieved with a Real Time 
Factor (RTF) of just 2.2. It is worth noting that
these results were obtained on an HP-9735 workstation
without resorting to any type of specialised
hardware or signal processing device. When trans- 
lation accuracy is the main concern, a more de- 
tailed acoustic model and a wider beam in the 
search can be used to achieve a WER of 1.9%, 
but with a RTF of 11.3. 
6 Conclusions 
In the EUTRANS project, Subsequential Transducers
are used as the basis of translation systems
that accept speech and text input. They can be
automatically learned from corpora of examples.
This learning process can be improved by means
of categories using the approach detailed in this
paper.

[Figure 4, plot: translation WER (vertical axis, 0% to 70%) against training set size (horizontal axis), with one curve "Without categories" and one curve "With categories".]
Figure 4: Evolution of translation WER with the size of the training set: Spanish to German text
corpus. The sizes in the horizontal axis refer to the first three columns in Table 3(b).
Experimental results show that this approach 
reduces the number of examples required for 
achieving good models, with good translation
results in acceptable times without using specialised
hardware. 
Our current work concentrates on further reducing
the number of examples necessary for training
the translation models in order to cope with spon- 
taneous instead of synthetic sentences. For this, 
new approaches are being explored, like reordering 
the words in the translations, the use of new in- 
ference algorithms, and automatic categorization. 
Results obtained with a different enhancement 
of our text input system, the inclusion of error 
correcting techniques, can be found in (Amengual 
et al., 1997b). 

References 
Juan-Carlos Amengual, José-Miguel Benedí,
Francisco Casacuberta, Asunción Castaño,
Antonio Castellanos, Víctor M. Jiménez, David
Llorens, Andrés Marzal, Federico Prat, Héctor
Rulot, Enrique Vidal, Juan Miguel Vilar,
Cristina Delogu, Andrea Di Carlo, Hermann
Ney, Stephan Vogel, José Manuel Espejo, and
Josep Ramón Freixenet. 1996a. EUTRANS:
Example-based understanding and translation
systems: First-phase project overview. Technical
Report D4, Part 1, EUTRANS (IT-LTR-OS-20268).
(Restricted).
Juan-Carlos Amengual, José-Miguel Benedí,
Asunción Castaño, Andrés Marzal, Federico
Prat, Enrique Vidal, Juan Miguel Vilar,
Cristina Delogu, Andrea Di Carlo, Hermann
Ney, and Stephan Vogel. 1996b. Definition of
a machine translation task and generation of
corpora. Technical Report D1, EUTRANS
(IT-LTR-OS-20268). (Restricted).
J. C. Amengual, J. M. Benedí, K. Beulen,
F. Casacuberta, A. Castaño, A. Castellanos,
V. M. Jiménez, D. Llorens, A. Marzal,
H. Ney, F. Prat, E. Vidal, and J. M. Vilar.
1997a. Speech translation based on automatically
trainable finite-state models. To appear
in Proceedings of EUROSPEECH'97.
Juan C. Amengual, José M. Benedí, Francisco
Casacuberta, Asunción Castaño, Antonio
Castellanos, David Llorens, Andrés Marzal,
Federico Prat, Enrique Vidal, and Juan M. Vilar.
1997b. Error correcting parsing for text-to-text
machine translation using finite state models.
To appear in Proceedings of TMI 97, July.
J. Berstel. 1979. Transductions and Context-Free
Languages. Teubner.
Peter F. Brown, John Cocke, Stephen A. 
Della Pietra, Vincent J. Della Pietra, Freder- 
ick Jelinek, John D. Lafferty, Robert L. Mercer, 
and Paul S. Roossin. 1990. A statistical ap- 
proach to machine translation. Computational 
Linguistics, 16(2):79-85, June. 
A. Castellanos, I. Galiano, and E. Vidal. 1994.
Application of OSTIA to machine translation
tasks. In Rafael C. Carrasco and José Oncina,
editors, Grammatical Inference and Applications,
volume 862 of Lecture Notes in Computer
Science, pages 93-105. Springer-Verlag,
September.
G. D. Forney, Jr. 1973. The Viterbi algorithm. 
Proceedings of the IEEE, 61(3):268-278, March. 
R. C. Gonzalez and M. G. Thomason. 1978. Syn- 
tactic Pattern Recognition: An Introduction. 
Addison-Wesley, Reading, Massachusetts. 
J. E. Hopcroft and J. D. Ullman. 1979. Introduction
to Automata Theory, Languages and Computation.
Addison-Wesley, Reading, Mass.,
USA.
V. M. Jiménez, E. Vidal, J. Oncina, A. Castellanos,
H. Rulot, and J. A. Sánchez.
1994. Spoken-language machine translation
in limited-domain tasks. In H. Niemann,
R. de Mori, and G. Hanrieder, editors,
Proceedings of the CRIM/FORWISS Workshop on
Progress and Prospects of Speech Research and
Technology, pages 262-265. Infix.
V. M. Jiménez, A. Castellanos, and E. Vidal.
1995. Some results with a trainable speech
translation and understanding system. In
Proceedings of the ICASSP-95, Detroit, MI (USA).
José Oncina and Miguel Angel Varó. 1996. Using
domain information during the learning of
a subsequential transducer. In Laurent Miclet
and Colin de la Higuera, editors, Grammatical
Inference: Learning Syntax from Sentences,
volume 1147 of Lecture Notes in Computer
Science, pages 301-312. Springer-Verlag.
José Oncina, Pedro García, and Enrique Vidal.
1993. Learning subsequential transducers for
pattern recognition interpretation tasks. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 15(5):448-458, May.
Enrique Vidal, Francisco Casacuberta, and Pedro
García. 1995. Grammatical inference and
automatic speech recognition. In A. J. Rubio
and J. M. López, editors, Speech Recognition
and Coding, New Advances and Trends,
NATO Advanced Study Institute, pages 174-191.
Springer-Verlag, Berlin.
Juan Miguel Vilar, Andrés Marzal, and Enrique
Vidal. 1995. Learning language translation in
limited domains using finite-state models: some
extensions and improvements. In 4th European
Conference on Speech Communication and
Technology, Madrid (Spain), September. ESCA.
