ABSTF(ACT 
Shake-and-Bake Machine Translation 
Traduceidn autom£tiea mediante refrito 
,John L. BEAVEN " 
Universidad de Cambridge 
Este ~trtfculo presenta un nuevo planteamiento 
para la traduccidn autom£tica (TA), tlamado 
Shake-and-Bake (refrito), que aprovecha re- 
cientes adelantos en lingiifstica computacional en 
cuanto a la aparicidn de teorfas de gram£ticas le- 
xicallstas bmsadas en unificacidn. Se propone que 
resuelve algunaz de las dificultades existentes en 
mdtodos basados en interlingua yen transferen- 
Cla, 
En un sistema de TA basado en transferencia~ 
este componente es especffico al par de lenguas 
entre Ins que se traduce,y por 1o tanto es nece- 
sario escribirlo ciudando de garantizar su compa- 
tibilidad con los componentes monolinl~fies. "En 
general, el mddulo de transferencia pue~le incluir 
varios centenares de reglas, y escrib\]r estas es el 
aspecto m£s costoso de\[ disefio de un sistema se- 
mejante. E1 resultado no es muy port£til, ya que 
los cambios que se realicen en los componentes 
monolingiies se reflejar~n en las reglas del trans- 
ferencia. 
Por otra parte, los m6todos de inter\]ingua ptan- 
tean lo que Landsbergen llama el problema de los subconjuntos. 
Si la mterlingua es lo suficiente- 
mente poderosa como para representar todos los 
significados de las expresiones en los idiomas en 
euestidn, habr£ vaxias formulas -posiblemente un 
ndmero infinito de ellas- equivalentes a la que pro- 
duce el anaJizador. No se puede entonces garanti- 
zar que la formula produc\]da pot el analizador de 
la Lengua Fuente (LF) se encuentre bajo la cober- 
tura del generador de la Lengua Destino (LD), a 
no ser que podamos realizar inferencias 16gicas en 
la interlingua lo cual resulta de una complejidad exceslva. 
Shake-and-Bake ofrece una mayor modularidad 
de los componentes LF y LD, que pueden es- 
eribirse con gran independencia los unos de los 
otros, utilizando consideraciones puramente mo- 
nollngfies. Estos componentes se relacionan me- 
diante un 16xico bilingiie. 
El formalismo utilizado es una variante de la 
gram£tica categorial de unillcaci6n o UCG (\[Cal- 
der et al. 88\]), que representa objetos llngfifsticos 
como conjuntos de pares de rasgos y valores, lla- 
mados signos. Los valores de estos rasgos pue- 
den ser atdmicos, variables, o a su vez conjuntos 
de pares de rasgos y variables. Se pueden repre- 
*Gracias a Enrique Torrejdn por ayud~zme con los 
t~rminos t~cnicos 
sentar como matrices utilizando la notacidn de 
PATR-1I (\[Shieber 86\]), combin£ndose mediante 
la operacidn de unificacidn. Los rasgos utilizados 
son OKTOGRAFfA, CA'/" (sintLxis en gramgtica ca- 
tegorial), OItDEN (la direecidn de la "barra", que 
especifica el orden lineal), RASGOS (un conjunto 
de rasgos sint~.cticos), CASOS (un mecanismo de 
asignacidn de casos afiadido a la UCG tradicio- 
hal), y SF~M, una sem£ntica basada en unificacidn, 
con un tratamiento de roles neodavidsoniano. 
Suponiendo que tenemos entradas ldxicas sufi- 
cientemente rlcas, lo dnico que se necesita es una 
correspondencia entre 6stas, que se obtiene del 
ldxicoq)ilingiie, junto con una serie de restriccio- 
nes para cada correspondencia. E1 sistema consta 
de tres comp.onentes: dos ldxicos LF y LD, y un 
ldxico bilinglie. 
Brevemente, el mdtodo Shake-and-Bake para 
la TA consiste en analizar la expresidn de la LF, 
utillzando la gram£tica tie 6sta. Una vez eom- 
pleto el an£1isis, se skolemizan las variables de los 
indices sem£nticos, y se ignora el £rbol sint£ctico 
de la expresidn (ya que cumplid su labor de deter- 
minar las unificaciones en la sem£ntica), lo que 
roduce una bolsa de entradas ldxicas y frasa- 
s de la LF, cuyas variables sem£nticas han re- 
sultado instanciadas como resultado del an£1isis. 
Lue~o se consultan estas entradas en el 16xico bi- 
lingue, sustituydndose por sus equivalentes en la 
LD, y respetando las unificaciones flue iinponen 
Ins correspondencias bilingiies. Finalmente, la ge- 
neracidn se realiza a l)art\]r de la bolsa de si~nos 
de la LD, que tienen-sus indices sem£nticos'\]ns- 
tanciados como resultado de todo este proceso. 
E1 principal algoritmo (3ue se presenta para la 
generacidn es una sencilla variante del conocido 
mdtodo CKY para el anLlisis, cn el que se per- 
mite que la gram£tica de la LD instancie el orden 
lineal. 
Para ilustrar los principios de este mdtodo, se 
escribi5 un pequeiio sisteina de TA bidireecio- 
hal entre castellano e inglds, y se presentan al- 
gunas de la.s entradas ldxacas que proponen solu- 
clones a algunos problemas interesantes de tra- 
duccidn. El componente castellano y el inglds 
fueron disefiados con consideraciones puramente 
monollngiies, y los tratamientos de las gram£ticaz 
son pues b~tante diferentes. 
Acrgs DE COLING-92, NAm'US, 23-28 ao~'r 1992 6 0 2 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
Shake-and-Bake Machine Translation 
John L. BEAVEN * 
Computer Laboratory 
University of Cambridge 
New Museums Site 
Pembroke Street 
Cambridge CB2 3QG 
UK 
E-roaCh John .Beaven©cil_. cam. ac .uk 
Abstract 
A novel approach to Machine Translation (MT), 
called Shake-and-Bake, is presented, which ex- 
ploits recent advances iLL Computational Linguis- 
tics in terms of tile increased spread of lexicMist 
unification-based grammar theories. It is argued 
that it overcomes some difficulties encountered by 
transfer and interfingual methods. 
It offers a greater modularity of the monolingual 
components, which can be written with indepen- 
dence of each other, using purely monofinguM 
considerations. These are put into correspon- 
dence by means of a bilingual lexicon. 
The Shake-and-Bake approach for MT consists 
of parsing the Source Language in any usual way, 
then looking up the words in the bilinguM lexi- 
con, and finally generating from tile set of transla- 
tions of these words, but allowing the Target Lan- 
guage grammar to instantiate tile relative word 
ordering, taking advantage of the fact that the 
parse produces lexical and phrasal signs which are 
highly constrained (specifically in the semantics). 
TILe main algorithm presented for generation is 
a variation on the well-known CKY one used for 
parsing. 
A toy bidirectional MT system was written to 
translate between Spanish and Enghsh, and some 
of the entries are shown. 
1 Motivation 
'l/he research reported here was motivated by the 
desire to exploit recent trends in Computational 
*The work reported here was carried out at the Univer- 
sity of Edinburgh under the support of a studentship from 
the Science and Engineering Research Council. Thanks to 
Ann Copestake, Mark Ilepple, Antonio Sanfilippo, Arturo 
Trujillo, Pete Whitelock and the anonymous reviewers for 
their colnlltents, Any error8 relnaJn lay own. 
Linguistics, such as tile appearance of lexical- 
ist unification-ba~sed grammar formalisms for the 
purposes of machine translation, in an attempt 
to overcoine what are perceived to be some of tile 
major shortcomings of transfer and inter\]ingual 
at~proaches. 
With a transfer-based MT system, the transfer 
component is very imcch language-pair specific, 
and must be written bearing very closely in mind 
both monofingual components in order to ensure 
compatibifity. Depending on how much work is 
clone by the analysis and generation components, 
the tasks carried out by the transfer element may 
vary, but iLL gener',d this module is very idiosyn- 
cratic and will involve several hundred transfer 
rules. Writing these transfer rules is the most 
time-consmning aspect of the design of a transfer- 
based system, as it nlust be consistent with hoth 
nlonolingual grammars. The process is therefore 
error-prone, and the result is not very portable, 
since the consequences of making changes to the 
monolingual components may be far-reaching as 
far as the transfer rules are concerned. 
One of the mMu difficulties with interlingual ap- 
proaches is what Laudsbergen \[Landsbergen 87\] 
refers to as the subset problem. If the system is 
to be robust, it is essential to guarantee that any 
interlingual formula derived from ally Source Lan- 
guage (SL) expression is amenable to generation 
into tile Target Language (TL). If the interlingua 
is powerful enough to represent all the meanings 
in all tile languages involved, there will be several t 
probably iatlnitely many) formulae in that inter 
ingua which are logically equivalent to the one 
produced by the analyser. It cannot then be guar- 
anteed that this fornmla comes under the cover- 
age or the TL generator, unless we can draw log- 
ical inferences in the interfingua. The complexity 
of this task may be eompntationMly daunting, 
since suh-problems of this (such as satistiability 
and non-tautology) are known to be Nl'-complete 
(\[Garey and Johnson 1979\]). 
The approach presented here bears some similar- 
ity with that of \[Alshawi et al 91\], which uses 
AcrEs DE COLING-92, NAIVI'~. 23-28 Aovr 1992 6 0 3 Paoc. ov COLING-92. NAntES, AUG. 23-28, 1992 
the algorithm of \[Shieber et al. 90\] for generation 
from quasi-logical forms. On the other hand, gen- 
eration here takes place from a set of TL lexical 
items, with instantiated semantics, which makes 
the task easier. 
This approach was tested with independently- 
written grammars for small yet linguistically in- 
teresting fragments of Spanish and English, which 
are used both for parsing and generation. These 
are put into correspondence by means of a bilin- 
gual lexicon containing the kind of information 
one might expect to find in an ordinary bilingual 
dictionary. 
2 The grammar formalism 
A version of Unification Categorial Grammar 
(UCG) (\[Calder et al. 88\]) is used. Like many 
other current grammatical formalisms (\[Shieber 
86\], \[Pollard and Sag 87\], \[Uszkoreit 86D, it rep- 
resents linguistic objects by sets of feature (or 
attribute)-value pairs, called signs. The values 
of these signs may be atomic, variables or fur- 
ther sets feature-value pairs. They can therefore 
be represented as directed acyclic graphs or as 
attribute-value matrices using the PATR-II nota- 
tion of \[Shieber 86\]. The notion of unification is 
then used to combine these. 
The main features used in the signs are OR- 
THOGRAPHY, CAT (the categorial grammar syn- 
tax), OItDER (the directionality of the "slash", 
which specifies linear ordering), FEATS (a set 
of syntactic features), CASES (a case-assignment 
mechanism built on top of standard UCG), and 
SEM, a unification-based semantics with a neo- 
Davidsonian treatment of roles (\[Parsons 80, 
Dowty 89\]). The semantics of an expression is 
of the form I:P, where lis a variable for the se- 
mantic index of the whole expression, and P is a 
conjunction of propositions in which that index 
appears. In addition, features called ARGO, ARG\] 
and so on provide useful "handles" for allowing 
the bilingual lexicon to access the semantic in- 
dices, but they are not strictly necessary for the 
grammars 
The signs presented are only shorthand abbrevi- 
ations of the full ones used, and the interested 
reader is referred to \[Beaven 92\] for a more com- 
plete view. The PATR-II notation will be used, 
with the Prolog convention that names starting 
with upper case stand for variables. In addi- 
tion, for the sake of clarity and brevity, the non- 
essential features will be omitted, as will be their 
names when these are are obvious. 
The grammar rules used subsume both functional 
application and composition, but for the exam- 
ples given here, only functional application will 
be necessary. 
An important feature of this approach is that this 
will make it possible to have an MT system in 
which no meaningful elements in the translation 
relation are introduced syncategorematically (in 
the form of transfer rules or operations with inter- 
lingual representations). In particular, assuming 
we have very rich lexicai entries (which contain 
information about various dimensions of the lan- 
guage, such as orthography, syntax and seman- 
tics), all that is needed is a correspondence be- 
tween the lexieai entries, supplied by a bilingual 
lexicon, together with a set of constraints for each 
correspondence. 
The design of such a translation system will there- 
fore involve three components: two monolingual 
lexicons for the languages concerned, and a bilin- 
gual lexicon. The Spanish and English com- 
ponents were designed using purely monolingual 
considerations, and as a consequences the treat- 
ments of English and Spanish grammars are quite 
different. 
The basics of the grammar will be explained 
by presenting the monolingual lexical entries re- 
quired for the Spanish sentence Maria visit6 
Madrid, which corresponds to the English Mary 
visited Madrid. More linguistically interesting 
sentences will be offered at a later stage. 
2.1 The Spanish Grammar 
The Spanish grammar is somewhat an unconven- 
tional version of UCG, in that VPs are treated 
as sentences (S), and NPs as sentence modifiers 
(S/S in the eategorial notation). The reasons for 
this decision have to do with accounting for sub- 
ject pro-drop, and are discussed in \[Whitehick 88\] 
and \[\[leaven 92\]. A ease-assignment mechanism 
is added to standard UCG. Amongst other uses, 
it provides a coverage of clitic placement. 
NPs are sentence modifiers. The following one, 
for instance, looks for a sentence with semantics 
11: Seml, and returns another sentence, in which 
the semantics have been modified to state that 
F3 (an index standing for Maria), plays a certain 
(unspecified) role in the semantics of 11. The op- 
eration U stands for set union, and "all the propo- 
sitions in the semantics are interpreted here as 
being conjoined. 
(1) 
(c RTItO 'M a(ia' \] 
AT s/ s I1 : Seml 
SEM II: (\[role(I1,_R1,F3),'~ U Seml 
IS A !,name(F3,maria)J " 
no0 F3 
Since intransitive verbs axe sentences, a transitive 
verb must be a sentence looking for its object NP 
(now S/S), which makes sure that this object gets 
identified with index Y (which fills tile patient 
AcrEs DE COLING-92, NANTES, 23-28 not~r 1992 6 0 4 Paoc, OF COLING-92, NANTES, AUG. 23-28, 1992 
role). This is carried out by the case-assignment 
mechanism, not shown here. The following en- 
try for tim transitive verb will be derived from 
the base form and abstract tense morphemes (see 
below). 
(2) 
ORTHO visit6 
fvisitar(E), ) 
(rolc(E,pat,Y) J J 
\[Sem 
SEM Sem 
ARGO E 
ARG1 X 
ARG2 Y 
The third NP used just parallels tim first one: 
sentence: 
(5) 
ORTHO Maria visit6 Madrid 
CAT S 
SEM E : 
ARGO E 
AUG1 F3 
ARG2 L3 
visitar(E), 
role(E,agt,F3), 
name(F3,maria), 
role(E,pat,L3), 
name(L3,madrid) 
Since Spanish word order is relatively free (and 
in particular since the OVS ordering is possible), 
the verb does not put tight constraints on the 
directionality of the NPs. The case-assignment 
mechanism, which identifies the indices of the 
NPs, can be used to interact with the ORDER fea- 
ture if this is desired. In tim above example, the 
only thing that prevents the assignment of agent 
role to Madrid and patient role to Maria are con- 
straints on the semantic types of the arguments 
of the verb. 
(3) 
A RTIIO 'Mairid' \] 
CAT s/ s 13 : Sem3 
s~M I3: ({role0a,_m,L3), \] U Sere3 \[name(L3,madrid)J 
RG0 L3 
Signs (2) and (3) combine by means of function 
application to produce the following sentence: 
(4) 
ORTUO visit6 Madrid 
CAT s 
\[visitar(E), 
~ro|e(E,agt,Xl), 
SEM E : /role(E,pat,L3~, 
I.name(L3,madrid) 
ARGO E 
ARol X3 
ARG2 L3 
It does not subcategorize for anything, but it may 
be modiIied by the NP (3) to give the following 
2.2 The English Grammar 
The English grammar is virtually taken "off the 
shelf" and closely resembles that of \[Calder et al. 
88\], with only the addition of a case-assignment 
mechanism (not shown here). A simple NP is as 
follows: 
(6) 
Ii RTtl O 'Mary' 1 CAT nil 
SEM G3: { ...... (G3 ..... y)} 
R(~0 G3 J 
A transitive verb subcategorizes for its object and 
its subject NPs. Again, the following one is de- 
rived from that of the base form and abstract 
inflectional morphemes: 
(7) 
ORTHO visited 
~A~ s / /X2:Sem4/ / 
/fvisiting(E2), /~ 
SEM E2: |(role(E2,pat,Y2) J 
\u Seml o Sem2 / 
ARGO E2 
ARG1 X2 
ARG~ Y2 
ACqES DE COLING-92, NANTES, 23-28 AO~I" 1992 6 0 5 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
The remaining NP is: the semantic indices of the two monolinguM signs. 
(s) 
li RTH O 'Madrid' \] 
CAT np 
s~ v3: {n~me(V3,m~y)} 
R°0 F3 J 
2.3 Structure of the bilingual lexicon 
Tile bilingual lexicon merely puts into correspom 
dence pairs of monolingual lexieM entries. In 
other words, each entry in the bilingual lexicon 
will contain a pair of pointers to monolingual em 
tries in each of the languages translated. These 
monolingual entries are very rich signs, and the 
bilingual entries may add constraints for their 
monolingual signs to be in the translation rela- 
tion. For instance, if a word has more than one 
translation depending on how various semantic 
featnres become instantiated, the bilinguM lexi- 
cal entries may express these restrictions. 
The bilingual lexicon writer needs to be aware of 
what the monolingual lexicons look like, in order 
to encode the restrictions that the bilingual sign 
imposes on the monollngual entries. As long as 
some broad conventions are followed, this task be- 
comes very straightforward. Most bilingual cor- 
respondences are very simple, and merely require 
some semantic indices in the monolingual signs to 
be unified. Provided these indices are made eas- 
ily available in predictable places of tile monolin- 
gum signs, the task of writing the corresponding 
lexical entries is very simple. When some seman- 
tic constraints need to be put on these indices, 
again it is a straightforward task. It is only on 
the occasions when syntactic constraints have to 
be included that the monollnguM signs need to 
be examined more closely, in order to determine 
how that syntactic information is encoded. 
This results in a great modularity in the sys- 
tem. Any monolingual component may easily be 
changed, without affecting to any significant ex- 
tent the bilingual lexicon, and certainly not tile 
monollngual components for any other language. 
At the same time, the simplicity of the bilingual 
component makes it practicable to write multi- 
language systems, since all the hard work goes 
into the monolingual lexicons which may be re- 
used for many language pairs, and the language- 
pair-specific information is concisely kept in the bilingual lexicon. 
The following examples represent entries in the 
bilingual lexicon. Such an entry consists of point- 
ers to monolingual signs (for instance, (9) pnts 
signs (1) and (6)into correspondence), together 
with constraints about the semantic indices con- 
talned in these signs. Thus example (9) identifies 
(9) 
(10) 
\[AnG0 E 
SPANISH \[~ \[AItG1 X\] L An°2 
ENGI~ISlI \[~ IhnG1 
LARG2 
(The above is not exactly the entry as it appears 
ill the billngnai lexicon, since correspondences be- 
tween morphemes are used, but it clarifies the 
exposition). 
(11) 
SPANISII ~ \[;EM \[AnGO F3 
Ill this very simple example, there was a one- 
to-one correspondence between monolinguM en- 
tries. More generally, tile hilinguM lexicon will 
encode correspondences between sets of nlono- 
lingual entries, with appropriate constraints on 
them (which allows us to enter idioms in the 
bilingual lexicon). Most of the time these will 
be singletons, but they may occasiomflly contain 
several elements or indeed one of them may be 
empty (if a word in one language corresponds to 
the empty string in the other, as will sometimes 
occur with function words). 
3 Shake-and-Bake 
A new algorithm for generation, developcd by 
Pete Whitelock and Mike Reape, and known as 
Shake-and-Bake is presented (see \[Whitelock 
92\] for further discussion). It can be outlined as 
follows: first of all the SL expression is parsed 
using the SL (monolingual) grammar. After the 
parse is complete the variables in the semantic 
indices are Skolemised, and lexical entries are 
looked up in the bilingual lexicon and replaced 
with their TL equivalents. Generation then takes 
place starting from the bag of TL lexical entries, 
which have their semantic indices instantiated as 
a result of the parsing and look-up process. 
Two well-known parsing algorithms (shift-reduce 
and CKY) have been adapted to do this kind of 
ACTES DE COLING-92, NANTEs, 23-28 AO~" 1992 6 0 6 PROC. OF COLING-92, NAtCrEs, AUG. 23-28. 1992 
generation instead. Generation in tiffs context 
can be seen ms a variation of parsing, in which we 
let ttm syntactic constraints instantiate the word 
order rather than letting the word order drive the 
parsing process. 
The CKY parsing algorithm may be eharacterised 
a~ follows: it uses a chart or table where all 
well-formed substrings (WFSs) that are found are 
recorded, together with their position (i.e. the 
words that they span in the string). The ta- 
ble is initialised with tile n words of the input 
string. The algorithm builds parses by tinding 
the shorter WFSs before the longer ones. For all 
integers j between 2 attd n, it records all WFSs 
of length j by looking for two adjacent strings of 
length k and j - k recorded on the table. If they 
may combine hy means of a grammar rule, the 
result is recorded on the table. 
The algorithm may be modified for generating 
strings from a bag of lexical entries. The table 
here no longer records the position of WFSs, but 
just the WFSs with the set of entries from the 
bag that they are made from. It is initialised 
by recording first all the well-formed strings of 
length 1 (the lexieal entries). Then, for all inte- gers j from 2 to n (the cardinality of the bag), 
it looks for two disjoint WFSs of length k and 
j - k recorded in the table. If they combine by 
means of an (unordered) grammar rule, the re- 
sulting string (with orthography specified hy the 
direction of the combination) is recorded on the 
table, together with the set of entries it involves 
(the union of the sets of the two components): 
Starting from the bag of TL signs above, this al- 
gorithm would first put the verb and object to- 
gether into a component, and then combine the 
result of that with the subject of the sentence. 
Linear ordering is determined hy tile TL gram- 
mar and the fact that the semantic indices a~'e 
instantiated by the tinm generation takes place. 
4 Morphology and Further ex- 
amples 
Finally we shall see how Shake-and-Bake handles 
more interesting examples, in particular those in- 
volving argument switching and head switching. 
Entries for verbs such as the ones shown above 
are derived from the base forms and single mor- 
phemes. \]~br instance, visitedis derived from mor- 
phemes for visit, 3sg and past. A similar thing is 
clone for Spanish, and the bilingual lexicon ac- 
tually puts into correspondence the hase forms 
and the separate morphemes. Correspondences 
between morphemes will be used from here on. 
4.1 Argument switching 
Argument switching, such as John likes Mary, 
which translates into Spanish as Maria gusta a 
Juan (literally Mary pleases John can be covered 
in a very simple manner. The monolingual verbs 
closely resemble (2) and (7). 
Their essential features are just: 
(12) 
I :)RTttO like \] 
flike(E1), \]\[ 
SEM El: ~role(Ei,experi ....... X1),~/ 
\[,role(E 1 ,stimulus,Y 1) J/ 
ARC,(} \[;1 J { xnG1 X1 
LARG2 Y1 
(13) 
IAttTIIO gust- \[gustar(E2), \]\] 
SEM E2: ~role(E2,stimuh,s,X2), 7| 
\[role(E2,experiencer,Y2) J | 
nG 0 E2 / 1 
\]AUG l X2 
\[AU¢;2 Y2 
The hilingual entry merely needs to cross-identify 
the semantic indices: 
(14) 
qI'ANISH 
ENGLISh \[~ 
i \[ARGO i\] \] 
\[ARG2 
ARGO E 
EM {ARG1 
LARG2 
4.2 Head switching 
A harder example is when the head word in one 
language corresponds to a non-head in the other, 
such as Mary swam across the river, which trans- 
lates as Mama cruz6 el rio nadando (literally 
Mary crossed the river swimming). 
This can be solved by putting into correspon- 
dence across with the stem cruz- as a possi- 
ble translation pair, together with the base form 
swim with nadando. Tile morphemes for 3so and 
Acids DE COLING-92, NANTES, 23-28 Aour 1992 6 0 7 PROC. OF COL1NG-92, NANTES, AUG. 23-28, 1992 
past are also put into correspondence. 
(15) 
ORTHO across 
CAT np 
SEM Crossed:Sere \] 
CAT S/ "CAT s " 
SEM El:Sem2 
ARGO E1 
ARG1 Crosser 
Seml LJ Sere2 U 
SEM El: {role(El,across,Cr .... d)} 
ARGO El 
ARG I Crosser 
ARG~ Crossed 
(16) 
\[~ RTHO cruz- 1 
AT s/NPfcruzar(E2), )| 
n2: ~role(S2,agt,Crosser), H \[sEM 
(role(E2,pat,Crossed) J J 
The bilingual entry that puts these two together 
is: 
(17) 
\]PANISH 
ENGLISH \[~ 
iRG0 E 1 \] RG 1 Crosser 
RG2 CrossedJ | 
IARG 1 Crosser 
\[ARG2 CrossedJ J 
A similar pair of monolingual entries, together 
with the bilingual entry to put them into corre- 
spondence, is needed for swim-nadando. 
(is) 
ORTHO swim 
,sAT NP:X \] CAT s/ (swim(E3) 1/ 
EM \[\] E3 lrole(E3,agt,X)~/ 
J 
ISEM \[\] 
ARGO E3 
ARGI X 
(19) 
ORTHO nadando 
SEM E4:Se 
SEM E4:({nadar(E4)} U Sere 1 
ARGO E4 
ARG 1 X 
(20) 
ARGO E SPANISH \[~ ~ARG\] 
|ARGO ENGLISH \[~ \[ARG1 
The important aspects of these signs is that the 
bilingual element correctly identifies the indices of 
the lexical entries, and the Shake-and-Bake gen- 
eration takes care of the rest. 
5 Conclusion 
I hope to have shown how lexically-driven Ma- 
chine Translation makes it possible to write 
modern, unification-based monolingual gram- 
mars with great independence from each other, 
and to put them into correspondence by means 
of a bilingual lexicon of a similar degree of com- 
plexity as one might expect to find in a commonly 
available bilingual dictionary, which could make 
it easier to automate its construction. 
These points were demonstrated by constructing 
two monolingual Unification Categorial Gram- 
mars for small fragments of Spanish and English, 
which nevertheless included some lingnistically 
interesting phenoinena. They were written inde- 
pendently, and with purely monolingual consid- 
erations in mind, which led to some noticeable 
differences in the grammar design. The monolim 
gual components were put into correspondence by 
means of a bilingual lexicon, and algorithms for 
parsing, doing bilingual lookup and generation 
were suggested, which together constitute what 
has been named Shake-and-Bake Translation. 
While the process of Shake and Bake generation 
itself is NP-complete, it is likely that average case 
complexity may be reasonable (\[Brew 92\]). In 
this sense, Shake and Bake may address issues 
raised by the Landsbergen's subset problem, since 
inference in an interlingua may not even be de- 
cidable. 

References 

\[Alshawi et al. 91\] Alshawi, It., Carter, D., Ray- 
ner, M., and Gambgck, B. Translation by 
Quasi Logical Form Transfer. In Proceedings 
of the 29th Annual Meeting of the Association 
of Computational Linguistics, pages 161-168, 
Berkeley, 1991. 

\[Beaven 92\] Beaven, J.L. Lexicalist Unification- 
Based Machine Translation, PhD Thesis, Uni- 
versity of Edinburgh, 1992. 

\[Brew 92\] Brew, C. Letting the cat out of tim 
bag: generation for Shake-and-Bake MT. Pro- 
ceedings of the I4th International Conference 
on Computational Linguistics (COLING 92), 
Nantes, 1992. 

\[Calder et at. 88\] Calder, J., Klein, E. and Zee- 
vat, lI. Unification Categorial Grammar- 
A Concise, Extendable Grammar for Natu- 
ral Language Processing. In P~vceedings of the 
12th International Conference on Computa- 
tional Linguistics (COLING 88), pages 83-86, 
Budapest, 1988. 

\[Dowty 89\] Dowry, D. On the Semantic Content 
of Notion "Thematic Role". In Chierchia, G., 
Partee, B. and Turner, R. (eds.) Property The- 
ory, Type Theory and Natural Language Se- 
mantics. Dordreeht: D. Reidel, 1989. 

\[Garey and Johnson 79\] Garey, M.J., and John- 
son, D.S. Computers and Intractability: A 
Guide to the Theory of NP-Completeness. 
W.tI. Freeman & Co, New York, 1979. 

\[Landsbergen 87\] Laaldsbergen, J. 
Montague Grammar and Machine Translation. 
In Whiteloek, P. J., Wood, M. M., Somers, H., 
Bennett, P., and Johnson, R. (eds.) Linguistic 
Theory and Computer Applications. Academic 
Press, 1987. 

\[Parsons 80\] Parsons, T. Modifiers and Quantifi- 
ers in Natural Language. Canadian Journal of 
Philosophy, supplementary Volume VI, pages 
29-60, 1980. 

\[Pollard and Sag 87\] Pollard, C. and Sag, I.A. 
Information-Based Syntax and Semantics - 
Volume 1: Fundamentals. Lecture Notes Num- 
ber 13. Center for the Study of Language and 
Information, Stanford University, 1987. 

\[Shieber 86\] Shieber, S. An Introduction to 
Unification-based Approaches to Grammar. 
Lecture Notes Number 4. Center for the Study 
of Language and Information, Stanford Univer- 
sity, 1986. 

\[Sbieber et al. 90\] Shieber, S., van Noord, G, 
Pereira, F.C.N, and Moore, R.C. Semantic- 
Head-Driven Generation. Computational Lin- 
guistics, Volume 16, number 1, pages 30-42, 
1990. 

\[Uszkoreit 86\] Uszkoreit, H. Categorial Unifica- 
tion Grammars. Proceedings of the 11th In- 
ternational Conference on Computational Lin- 
guistics (COLING 86), pages 187-194, Boun, 
• 1986. 

\[Whitelock 88\] Whitelock, P. A Feature-based 
Categorial Morpho-Syntax for Japanese. DAI 
research paper no 324, Dept. of Artificial Intel- 
ligence, Univ. of Edinburgh. Also in P~ohre, C., 
and Reyle, U. (eds.) Natural Language Parsing 
and Linguistic Theories. D. Reidel, Dordrecht, 
1988. 

\[Whitelock 92\] Whitelock, P. Shake and Bake 
Translation. In Proceedings of the 14th Mterna- 
tional Conference on Computational Linguis- 
tics (COLING 92), Nantes, 1992. 
