BEDE: A MICROPROCESSOR-BASED MACHINE TRANSLATION SYSTEM
H.L. Somers and R.L. Johnson
Centre for Computational Linguistics, University of Manchester
Institute of Science and Technology 
The proposed paper describes an on-going research project
being carried out by staff and students at the Centre
for Computational Linguistics to design a limited-syntax
controlled-vocabulary machine translation system of sophisticated
design to run on a microprocessor.
1. Background
Bede is essentially a research project: we are not
immediately concerned with commercial applications, though
such are clearly possible if the research proves fruitful.
Work on Bede at this stage, though, is primarily experimental.
The aim at the moment is to investigate the extent to which
a microprocessor-based M.T. system of advanced design is
possible, and the limitations that have to be imposed in order
to achieve a working system. This paper describes the overall
system design specification to which we are currently working.
In the basic design of the system we attempt to incorporate
as many as possible of the features of large-scale M.T. systems
that have proved to be desirable or effective. Thus, Bede
is multilingual by design (i.e. not based on language pairs)
(cf. King, 1981:12); algorithms and linguistic data are
strictly separated (cf. Johnson, 1979:140); and the system is
designed in more or less independent modules (cf. Vauquois,
1975:33).
The microprocessor environment means that criteria of
size are important: data structures, both dynamic (created
and manipulated during the translation process) and static
(dictionaries and linguistic rule packages), are constrained
to be as economical in terms of storage space and access
procedures as possible. Limitations on in-core and peripheral
storage are important considerations in the system design.
In large general-purpose M.T. systems, it is necessary
to assume that failure to translate given input correctly is
generally not due to incorrectly formed input, but to insufficiently
elaborated translation algorithms. This is particularly
due to two problems: the lexical problem of choice of
appropriate translation equivalents, and the strategic problem
of effective analysis of the wide range of syntactic patterns
found in natural language. The reduction of these problems via
the notions of controlled vocabulary and restricted syntax
seems particularly appropriate in the microprocessor environment,
since the alternative of making a system infinitely
extendable is probably not feasible. Both notions have been
tried with bigger systems, resulting both in better results
from the M.T. system itself, and in increased legibility from
a human point of view of source texts (cf. Ducrot, 1972; Elliston,
1978; Lawson, 1979:81-2; Somers and McNaught, 1980:49).
Given these constraints, it seems feasible to achieve
translation via an "interlingua" (cf. Veillon, 1969; Hutchins,
1978:131), in which the canonical structures from the source
language are mapped directly onto those of the target language(s),
avoiding any language-pair oriented "transfer" stage.
Translation thus takes place in two phases: analysis of source
text and synthesis of target text.
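The practical appeal of avoiding language-pair transfer can be seen from a simple module count. This is a standard back-of-envelope argument, not a figure from the Bede design itself; the symbol n (number of languages handled) is an assumption introduced here for illustration.

```latex
% Pairwise transfer: one transfer module per ordered language pair.
% Interlingua: one analysis module and one synthesis module per language.
\[
  M_{\text{transfer}} = n(n-1), \qquad M_{\text{interlingua}} = 2n
\]
% Worked example for n = 4 languages:
\[
  M_{\text{transfer}} = 4 \cdot 3 = 12, \qquad M_{\text{interlingua}} = 2 \cdot 4 = 8
\]
```

The two counts coincide at n = 3, and the interlingua design needs fewer modules for every n above that.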
2. Brief description
A description of the system forms the second half of 
the proposed paper. For the sake of clarity and brevity in 
this summary, we refer to the attached schematic representation
of the translation process in Bede. In the full version of
this paper, each step is to be outlined in rather more detail.
The analyser uses a chart-like structure (cf. Kaplan,
1973) to produce the interface trees of the abstract interlingual
representation. These trees serve as input to synthesis,
where they are rearranged into valid surface structures
for the target language.
The source text is translated sentence by sentence (or
equivalent). Text is first subjected to a two-stage morphological
analysis. In the first stage the text is compared word
by word with a stop-list of frequently occurring words (mostly
function words); words not found in the stop-list undergo
morphological analysis, again on a word by word basis. Morphological
rules form a finite-state grammar of affix-stripping
rules ("A rules") and the output is a chart with labelled
arcs indicating the lexical item and the possible interpretation
of stripped affixes, as confirmed by dictionary look-up. The
morphological analysis phase also creates a temporary "sentence
dictionary" consisting of copies of the dictionary entries
for (only) those lexical items found in the current
translation unit.
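The two-stage morphological analysis described above can be sketched as follows. This is a minimal illustration, not the Bede implementation (which is written in Pascal/M): the stop-list, the dictionary entries, the suffix rules, and the names `STOP_LIST`, `DICTIONARY`, `A_RULES` and `analyse` are all hypothetical.

```python
# Hypothetical data: none of these entries or rules come from the Bede paper.
STOP_LIST = {"the": "DET", "is": "AUX", "a": "DET"}
DICTIONARY = {"walk": {"cat": "V"}, "valve": {"cat": "N"}, "open": {"cat": "V"}}
# Toy affix-stripping ("A") rules: (suffix to strip, interpretation label).
A_RULES = [("", "BASE"), ("s", "PL_OR_3SG"), ("ed", "PAST"), ("ing", "PROG")]


def analyse(words):
    """Return (chart, sentence_dictionary) for one translation unit."""
    chart = []                # arcs: (start, end, lexical item, interpretation)
    sentence_dictionary = {}  # copies of entries for items found in this unit
    for i, word in enumerate(words):
        w = word.lower()
        # Stage 1: compare the word with the stop-list.
        if w in STOP_LIST:
            chart.append((i, i + 1, w, STOP_LIST[w]))
            continue
        # Stage 2: strip affixes; keep only stems confirmed by the dictionary.
        for suffix, label in A_RULES:
            if not w.endswith(suffix):
                continue
            stem = w[: len(w) - len(suffix)]
            if stem in DICTIONARY:
                chart.append((i, i + 1, stem, label))
                sentence_dictionary[stem] = dict(DICTIONARY[stem])
    return chart, sentence_dictionary


chart, sent_dict = analyse(["The", "valves", "opened"])
```

An ambiguous word would simply contribute several arcs over the same span, which is what makes a chart a convenient output structure for this phase.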
The chart then undergoes a two-stage syntactico-semantic
analysis. In the first stage, context-sensitive phrase-structure
rules ("E rules") work towards creating a single
arc spanning the entire translation unit; arcs are labelled
with appropriate syntactic class and syntactico-semantic
feature information and a trace of the lower arcs which have
been subsumed. In the second stage, the tree structure implied
by the labels and traces on these arcs is disjoined from the
graph and undergoes general tree-to-tree transduction rules
("T rules"), resulting in a single tree structure representing
the canonical form of the translation unit. With source-language
lexical items replaced by unique multilingual-dictionary
addresses, this is the interlingua which is passed for synthesis
into the target language(s).
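The first stage of this analysis, in which arcs are combined until one spans the whole unit and the implied tree is then detached from the chart, might look roughly like the sketch below. It is a simplification under stated assumptions: the rules here are context-free and binary, whereas the paper specifies context-sensitive rules, and the rule table and the names `Arc`, `RULES`, `parse` and `tree` are invented for illustration.

```python
from collections import namedtuple

# An arc spans chart positions start..end, carries a syntactic label, and keeps
# a trace of the lower arcs it subsumes (empty for lexical arcs).
Arc = namedtuple("Arc", "start end label trace")

# Hypothetical binary rules: (left label, right label) -> label of the new arc.
RULES = {("DET", "N"): "NP", ("NP", "V"): "S"}


def parse(lexical_arcs):
    """Add arcs bottom-up until no rule applies; return arcs spanning the unit."""
    arcs = list(lexical_arcs)
    n = max(a.end for a in arcs)
    added = True
    while added:
        added = False
        for left in list(arcs):
            for right in list(arcs):
                label = RULES.get((left.label, right.label))
                if left.end == right.start and label:
                    new = Arc(left.start, right.end, label, (left, right))
                    if new not in arcs:
                        arcs.append(new)
                        added = True
    return [a for a in arcs if a.start == 0 and a.end == n]


def tree(arc):
    """Detach the tree implied by an arc's label and trace from the chart."""
    if not arc.trace:
        return arc.label
    return (arc.label, [tree(child) for child in arc.trace])


lexical = [Arc(0, 1, "DET", ()), Arc(1, 2, "N", ()), Arc(2, 3, "V", ())]
spanning = parse(lexical)
```

Because each arc records the arcs it subsumes, the tree can be read off the spanning arc alone; the rest of the chart is no longer needed once `tree` has run.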
Synthesis consists of a combination of T rules which
assign new order and structure to the interlingua, and of
context-sensitive rules which can be used to assign mainly
syntactic feature labels to leaves ("L rules"), for example for
the purpose of assigning number and gender concord (etc.). The
resulting list of labelled leaves (the superior branches are
no longer needed) is passed to morphological synthesis, where
a finite-state grammar of morphographemic and affixation
rules produces the target string.
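As an illustration of the last two synthesis steps, the sketch below applies one toy L rule (a determiner takes gender and number from the following noun) and then a toy morphological synthesis step. The French forms, the feature names, and the functions `apply_concord` and `realise` are assumptions made for this example only; they are not taken from the Bede rule packages.

```python
# Hypothetical leaves for one noun phrase, as they might arrive from the T rules.
leaves = [
    {"lex": "DEF", "cat": "DET"},
    {"lex": "vanne", "cat": "N", "gender": "f", "number": "pl"},
]

# Toy table of French definite article forms, keyed by (item, gender, number).
DET_FORMS = {
    ("DEF", "m", "sg"): "le",
    ("DEF", "f", "sg"): "la",
    ("DEF", "m", "pl"): "les",
    ("DEF", "f", "pl"): "les",
}


def apply_concord(leaves):
    """Toy L rule: a determiner copies gender and number from the next noun."""
    out = [dict(leaf) for leaf in leaves]
    for i, leaf in enumerate(out):
        if leaf["cat"] == "DET":
            for later in out[i + 1:]:
                if later["cat"] == "N":
                    leaf["gender"] = later["gender"]
                    leaf["number"] = later["number"]
                    break
    return out


def realise(leaf):
    """Toy morphological synthesis: table look-up for determiners, -s plurals."""
    if leaf["cat"] == "DET":
        return DET_FORMS[(leaf["lex"], leaf["gender"], leaf["number"])]
    if leaf["cat"] == "N" and leaf["number"] == "pl":
        return leaf["lex"] + "s"
    return leaf["lex"]


target = " ".join(realise(leaf) for leaf in apply_concord(leaves))
```

Note that only the flat list of labelled leaves is consulted here, in line with the observation that the superior branches are no longer needed at this point.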
As can be seen, the system is strictly modular, and at 
each interface only a small part of the data structures used 
by the donating module is required by the receiving module. 
The "unwanted" data structures are written to peripheral store 
to enable recovery of partial structures in the case of fail- 
ure or mistranslation, though automatic back-tracking to 
previous modules by the system as such is not envisaged as a 
major component.
The "static" data used by the system consist of the
different sets of linguistic rule packages, plus the dictionary.
The system essentially has one large multilingual dictionary
from which numerous software packages generate various
subdictionaries as required either in the translation process
itself, or for lexicographers working on the system. Alphabetical
or other structured language-specific listings can be
produced, while of course dictionary updating and editing
packages are also provided.
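The idea of deriving subdictionaries and listings from a single multilingual dictionary can be sketched as below. The entry layout (a numeric address mapped to per-language forms), the sample entries, and the function names are hypothetical; the paper does not specify the dictionary format.

```python
# Hypothetical multilingual dictionary: address -> per-language forms.
MULTILINGUAL = {
    101: {"en": "valve", "fr": "vanne", "de": "Ventil"},
    102: {"en": "pump", "fr": "pompe", "de": "Pumpe"},
    103: {"en": "filter", "fr": "filtre"},
}


def subdictionary(lang):
    """Language-specific subdictionary: word form -> multilingual address."""
    return {
        forms[lang]: address
        for address, forms in MULTILINGUAL.items()
        if lang in forms
    }


def alphabetical_listing(lang):
    """Alphabetical listing of one language, e.g. for lexicographers."""
    return sorted(subdictionary(lang).items(), key=lambda pair: pair[0].lower())


german = subdictionary("de")
french_listing = alphabetical_listing("fr")
```

Keeping one master dictionary and generating the views on demand means an update made through the editing packages only has to be made in one place.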
3. Implementation details
The system will run on any microprocessor system which
runs under the CP/M operating system, and at C.C.L. it is implemented
on the Intertec Superbrain with twin 5¼" double density
floppy disk drives, giving a total of 320k bytes of on-line
store. Programs are written in Pascal/M (Sorcim, 1979), a
Pascal dialect closely resembling UCSD Pascal.
Key to the scheme (see overleaf): 
(data structure)
<dictionary / grammar>
enters
creates
uses
is linked to
Schematic representation of the translation process in Bede

References 

Ducrot, J.M. (1972) - Research for an automatic translation
system for the diffusion of scientific and technical
textile documentation in English-speaking countries:
final report. Boulogne-Billancourt: Institut Textile
de France.

Elliston, J.S.G. (1978) - Computer aided translation: a business
viewpoint. In Snell, B.M. (ed.) - Translating and
the computer: Proceedings of a Seminar, London, 14th
November, 1978. Amsterdam (1979): North-Holland. 149-158.

Hutchins, W.J. (1978) - Machine translation and machine aided
translation. Journal of Documentation 34, 119-159.

Johnson, R.L. (1979) - Contemporary perspectives in machine
translation. In Hanon, S. and Pedersen, V.H. (eds.) -
Human translation, machine translation: Papers from the
10th Annual Conference on Computational Linguistics in
Odense, Denmark, 22-23 November, 1979 (Noter og Kommentarer
39). Odense (1980): Romansk Institut, Odense Universitet.
133-147.

Kaplan, R.M. (1973) - A general syntactic processor. In Rustin,
R. (ed.) - Natural Language Processing (Courant Computer
Symposium 10). New York: Algorithmics Press. 193-241.

King, M. (1981) - EUROTRA - a European system for machine 
translation. Lebende Sprachen 26, 12-14. 

Lawson, V. (1979) - Tigers and polar bears, or: translating 
and the computer. The Incorporated Linguist 18, 81-85. 

Somers, H.L. and McNaught, J. (1980) - The translator as a
computer user. The Incorporated Linguist 19, 49-53.

Sorcim (1979) - Pascal/M user's reference manual. Walnut
Creek, CA: Digital Marketing.

Vauquois, B. (1975) - La traduction automatique à Grenoble
(Document de Linguistique Quantitative 24). Paris:
Dunod.

Veillon, G. (1969) - Description du langage pivot du système
de traduction automatique du CETA. T.A. Informations 1, 8-17.
