IDIOMS IN THE ROSETTA MACHINE TRANSLATION SYSTEM
André Schenk
Philips Research Laboratories 
Eindhoven, The Netherlands 
Abstract 
This paper discusses one of the problems of machine
translation, namely the translation of idioms. The paper
describes a solution to this problem within the theoretical
framework of the Rosetta machine translation system.
Rosetta is an experimental translation system which uses an
intermediate language and translates between Dutch, English
and, in the future, Spanish.
1 Introduction
Idioms have been and still are a basic theoretical stumbling
block in most linguistic theories. For the purposes of
machine translation or, in general, natural language
processing, it is necessary to be able to deal with idioms
because there are so many of them in every language and
because they are an essential part of it.
Idioms occur in sentences as a number of words, possibly
scattered over the sentence and possibly with some inflected
elements; this number of words has to be interpreted as
having one primitive meaning. For example, in (1) "made",
"peace" and "with" have to be interpreted idiomatically.
Note that words that are part of the idiom are underlined.
(1) He has made his peace with his neighbour
The classic example is (2):
(2) Pete kicked the bucket
Literally this sentence means that Pete hit a specific
vessel with his foot. In the idiomatic reading the
interpretation is that Pete died. It is impossible to infer this
idiomatic meaning directly from the primitives "Pete",
"kick", "the" and "bucket" and from the way they are
combined.
Idioms can undergo syntactic transformations, but sometimes
they are reluctant to do so. The passive sentence (3) has
lost its idiomatic reading, while in the passive sentence
(4) the idiomatic reading has been retained.
(3) The bucket was kicked by Pete 
(4) Mary's heart was broken by Pete 
Other examples are (5-12). In the idiomatic reading in (5)
clefting with the object as focus is not allowed, while it
is allowed in (6) if "Mary" is stressed. Clefting with the
subject as focus in both (7) and (8) is permitted. In (9)
the PP "at whose door" and in (10) the NP "whose heart" can
be subject to wh-movement. In (11) the NP "Mary's heart"
can be topicalized (if "Mary" is stressed), but in (12) the
NP "the bucket" cannot undergo this transformation without
losing the idiomatic reading. Thus idioms behave
syntactically like non-idiomatic structures, although sometimes
they are restricted.
(5) It was the bucket that Pete kicked
(6) It was Mary's heart that Pete broke
(7) It was Pete that kicked the bucket
(8) It was Pete that broke Mary's heart
(9) At whose door did Pete lay his failure
(10) Whose heart did Pete say that Mary broke
(11) Mary's heart Pete broke
(12) The bucket Pete kicked
Idioms can take free arguments or can have elements, like
possessive pronouns, which have to be bound by arguments.
In sentences (13-15) "Mary" is a complement to the
idiomatic verb, and realizes different grammatical functions in
the sentence (i.e. indirect object, possessive NP and to-PP
object respectively). In (16) the pronoun "his" has to be
bound by the subject "Pete".
(13) Pete gave Mary the finger
(14) Pete broke Mary's heart
(15) Pete laid down the law to Mary
(16) Pete lost his temper
Linguistic theories on idioms should be able to account for
the problems outlined above. The proposals made are usually
fragmentary, in the sense that they are concerned with only
part of the problem, for instance Fraser (1970), who only
deals with the possible application of transformations to
idioms, or they are a relatively minor part of a larger
theory, for example Chomsky (1981), who gives a very
general and principled account of idioms, but cannot cope
with all the data. More elaborate studies on idioms are
usually not directly relevant to machine translation, for
instance Boisset (1978), who treats idioms from a more
pragmatic point of view. To illustrate, it could be argued
that Chomsky (1981) can cope with sentences such as (2) and
(15), but not with (13), (14) and (16); Pesetsky (1985) can
deal with (2) or (13-16), but not with a sentence like:
(17) Pete laid his failure at Mary's door
Chomsky (1981, p. 146, note 94) claims that "we may think
of an idiom rule for an idiom with a verbal head as a rule
adding the string aVc to the phrase marker of each terminal
string abc, where b is the idiom, now understanding a
phrase marker to be a set of strings" and that idioms
"appear either in D-structure or S-structure or LF-form."
Furthermore "at D-structure, idioms can be distinguished as
subject or not subject to Move alpha".
Thus here it is possible to reanalyse a string abc into aVc
as for example for sentence (2) in figure (18), where the
reanalysis is indicated by a double tree and where a is
"Pete", b is "kick the bucket" and c is empty:
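(18) [Double tree for sentence (2): S dominates NP "Pete" and a VP containing "kick"; in the second tree the idiom is reanalysed as V.]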
It seems that on this approach elements of idioms must be
adjacent at a certain level (D-structure, S-structure or
LF-form), which is the case for sentence (2). However, in
sentence (14) the parts of the idiom "break" and "heart"
are not adjacent at any level, since the free argument
"Mary" is situated between the idiom parts, and in (16)
"lose" and "temper" are not adjacent at any level either.
Hence this theory is not able to deal with every type of
idiom.
According to Pesetsky (1985) in a configuration such as 
figure (19) B and E may undergo a rule of idiosyncratic 
interpretation, if E is the head of C. 
(19)     A
        / \
       B   C
          / \
         D   E
For sentence (14) in which "heart" is the head of the NP
dominating "Mary's heart", the rule of idiosyncratic
interpretation is allowed, resulting in:
(20) [S [NP Pete] [VP [V break] [NP [NP Mary's] [N heart]]]]
     (dotted lines connect "break" and "heart" to a single V)
In the above tree, the effect of the rule of idiosyncratic 
interpretation is indicated by the dotted lines; the effect 
is that the idiom parts are mapped onto one meaning. 
As suggested by Pesetsky, this would also account for
sentence (13) if we follow Kayne (1982) in his analysis of
double object constructions. Kayne claims that "NP the 
finger" forms a constituent with "the finger" as its head, 
so the rule of idiosyncratic interpretation is allowed. 
Sentences (17) and (21-22) are problematic even under this 
analysis: 
(21) Pete rammed his lack of money down Mary's throat 
(22) Pete gave Mary credit for her work 
Figure (23) gives a representation of sentence (21) in 
which "his lack of money" and "Mary" are free arguments: 
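(23) [Tree for sentence (21): S dominates "Pete", "his lack of money" and a constituent consisting of P "down" and an NP, which in turn consists of the NP "Mary's" and the N "throat".]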
Since "throat" and "down" are heads of their constituents, 
one might suggest a successive application of the rule of 
idiosyncratic interpretation, but it is not clear how such 
a rule should operate and since every constituent has a 
head and syntactic categories are no barrier to rule 
application, the domain in which this rule is permitted is 
unlimited. 
It seems that Chomsky (1981) and Pesetsky (1985) are not 
capable of dealing with the counter examples given here. 
The treatment of idioms presented in this paper can cope 
with these phenomena because it is based on the assumption
that elements of idioms neither have to be adjacent at the 
level of interpretation nor do they have to be in the 
specific configuration proposed by Pesetsky. 
In the field of computational linguistics not much
attention has been paid to idioms. Some examples are Rothkegel
(1973) and Wehrli (1984). However, in their proposals 
idioms are treated in the lexicon or morphology and there 
is no apparent way to account for the scattering of 
elements of idioms in sentences. 
The organisation of the rest of the paper is as follows: in
section (2) an outline of the theoretical framework of the
Rosetta machine translation system will be given; section
(3) discusses idioms within this framework; section (4)
discusses some of the typical problems mentioned in the
introduction.
2 Outline of Isomorphic M-Grammars 
The Rosetta system is based on the "isomorphic grammar"
approach to machine translation. In this approach a
sentence s' is considered a possible translation of a sentence
s if s and s' not only have the same meaning but
also have similar derivational histories, which implies
that their meanings are derived in the same way from the
same basic meanings. This approach requires that
"isomorphic grammars" are written for the languages under
consideration.
The term "possible translation" should be interpreted as 
"possible in a particular context". The discussion in this 
paper will be restricted to the translation of isolated 
sentences on the basis of linguistic knowledge only. 
In the following sections the notions M-grammars, the
variant of Montague grammar used in the Rosetta system, and
isomorphic grammars will be introduced. For a more detailed
discussion of isomorphic M-grammars the reader is referred
to Landsbergen (1982, 1984). In section (2.3) an example of
an M-grammar will be given.
2.1 M-Grammars
The grammars used in the system, called M-grammars, can be 
seen as a computationally viable variant of Montague 
Grammar which is in accordance with the transformational 
extensions proposed by Partee (1973). This implies that the 
syntactic rules operate on syntactic trees rather than on 
strings. Restrictions have been imposed on the grammars in 
such a way that effective parsing procedures are possible. 
An M-grammar consists of (i) a syntactic, (ii) a morphol- 
ogical and (iii) a semantic component. 
(i) The syntactic component of an M-grammar defines a set 
of "S-trees". 
An "S-tree" is a labelled ordered tree. The labels of the 
nodes consist of a syntactic category and a list of 
attribute-value pairs. The branches are labelled with the 
names of syntactic relations, such as subject, head, 
object, etc. 
An M-grammar defines a set of S-trees by specifying a set
of basic S-trees and a set of syntactic rules called
"M-Rules".
An "M-Rule" defines a partial function from tuples of 
S-trees to S-trees. 
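The definitions above can be made concrete with a small sketch. It is only an illustration, not the Rosetta implementation: the class name STree, the attribute "word" and the rule function rule_np_the are assumptions introduced here. An S-tree is modelled as a category, a dictionary of attribute-value pairs and an ordered list of relation/subtree pairs; an M-Rule is modelled as a function that returns None when it is not applicable, which reflects that it is a partial function.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class STree:
    """A labelled ordered tree: category, attribute-value pairs, labelled branches."""
    category: str
    attrs: dict = field(default_factory=dict)
    # Ordered (relation name, subtree) pairs, e.g. ("head", STree(...)).
    children: list = field(default_factory=list)


def rule_np_the(t1: STree) -> Optional[STree]:
    """Hypothetical M-Rule: builds NP[det/ART(the), head/t1] from a NOUN.

    Returns None when t1 is not a NOUN, which models partiality.
    """
    if t1.category != "NOUN":
        return None
    article = STree("ART", {"word": "the"})
    return STree("NP", {}, [("det", article), ("head", t1)])


def leaves(tree: STree) -> list:
    """Collect the words at the terminal nodes, left to right."""
    if not tree.children:
        return [tree.attrs["word"]]
    return [word for _, child in tree.children for word in leaves(child)]


np = rule_np_the(STree("NOUN", {"word": "cat"}))
```

Applying the rule to a VERB yields None, whereas the NOUN "cat" yields an NP whose terminal nodes read "the cat".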
Starting from basic expressions, an expression can be
formed by applying syntactic rules. The result of this is a
surface tree, in which the labels of the terminal nodes
correspond to words. This process of making an expression
is represented in an M-grammar by a "syntactic derivation
tree", in which the basic expressions are labels of the
terminal nodes and the names of the rules that are
applicable are labels of the non-terminal nodes. In the
example below (Fig. (25)), rule R1 makes the NP "the cat"
from the basic expression "cat" and rule R2 makes the
S-tree for the sentence (24) on the basis of the NP and the
basic expression "walk" (the constructions to the left of
the dotted lines are abbreviations of what the result of
the application of the rule looks like).
(24) the cat is walking
(25) SENTENCE(the cat is walking) ...... R2
                                        /  \
     NP(the cat) .................... R1    walk
                                      |
                                     cat
(ii) The morphological component relates terminal S-trees
to strings. This component will be ignored in the rest of
the discussion.
In this way the syntactic and the morphological component
define sentences.
(iii) The semantic component. M-grammars obey the
compositionality principle, i.e. every syntactic rule and every
basic S-tree gets a model-theoretical interpretation. For
translation purposes only the names of meanings and the
names of meaning rules are relevant, as will be shown later.
The model-theoretical interpretation of the basic S-trees
and the syntactic rules is represented in a "semantic
derivation tree", which has the same geometry as the
syntactic derivation tree, but is labelled with names of
meanings of rules and basic expressions. An example is
given below in (27).
Before giving an example of an M-grammar in section (2.3),
isomorphic M-grammars will be discussed.
2.2 Isomorphic M-Grammars 
To establish the possible translation relation the grammars
must be attuned to each other as follows:
- For each basic expression of a grammar G of a language L
there is at least one basic expression of a grammar G' of a
language L' with the same meaning.
- For each syntactic rule of G there is at least one
syntactic rule of G' corresponding to the same meaning
operation. Syntactically these rules may differ
considerably.
Two sentences are defined to be (possible) translations of
each other if they have derivation trees with the same
geometry, in which the corresponding nodes are labelled
with names of corresponding rules and basic expressions. If
this is the case then the derivation trees are isomorphic
and the two sentences have the same semantic derivation
tree.
Grammars that correspond to each other in the way described
above will be called "isomorphic grammars" if the
corresponding rules satisfy certain conditions on application,
such that for each well-formed syntactic derivation tree in
one language there is at least one corresponding
well-formed syntactic derivation tree in the other language. A
syntactic derivation tree is well-formed if it defines a
sentence, i.e. if the rules are applicable.
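The translation relation defined above can be sketched as a check on derivation trees. This is a minimal illustration under stated assumptions: derivation trees are written as nested tuples, the tables ENGLISH and DUTCH and the meaning names M1, M2, B1 and B2 are chosen here for the example, and the conditions on rule applicability are left out. Two trees count as possible translations if relabelling them with names of meanings gives the same semantic derivation tree.

```python
def to_semantic(tree, meaning_of):
    """Relabel a syntactic derivation tree with names of meanings.

    A tree is either the name of a basic expression (str) or a tuple
    (rule_name, subtree, ...). `meaning_of` maps rule names to names of
    meaning rules and basic expressions to names of basic meanings.
    """
    if isinstance(tree, str):
        return meaning_of[tree]
    rule, *args = tree
    return (meaning_of[rule], *[to_semantic(arg, meaning_of) for arg in args])


def possible_translations(tree_a, meanings_a, tree_b, meanings_b):
    """True if both trees map onto the same semantic derivation tree."""
    return to_semantic(tree_a, meanings_a) == to_semantic(tree_b, meanings_b)


# Assumed correspondence tables for a two-rule fragment.
ENGLISH = {"R1": "M1", "R2": "M2", "cat": "B1", "walk": "B2"}
DUTCH = {"R'1": "M1", "R'2": "M2", "kat": "B1", "lopen": "B2"}

english_tree = ("R2", ("R1", "cat"), "walk")   # the cat is walking
dutch_tree = ("R'2", ("R'1", "kat"), "lopen")  # de kat loopt
```

Because the comparison is made on the relabelled trees, both the geometry and the correspondence of rules and basic expressions are checked in one step.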
The following is an illustration of these principles. The
left part of figure (27) shows the derivation tree of
sentence (26) which is the Dutch translation of sentence
(24). Rule R'1 builds the NP "de kat" from the basic
expression "kat" and rule R'2 constructs the expression "de
kat loopt" from the NP and the basic expression "lopen".
There is a correspondence between both the basic
expressions and the syntactic rules of the two grammars. Each
rule of the syntactic derivation tree is mapped onto a
corresponding rule of the semantic derivation tree and each
basic expression is mapped onto the corresponding basic
meaning.
(26) de kat loopt
(27) [Three derivation trees with the same geometry, side by side.
     Dutch syntactic derivation tree: R'2 (de kat loopt) over R'1 (de kat) and "lopen"; R'1 over "kat".
     Semantic derivation tree: the corresponding meaning rules over the basic meanings, with B1 for "kat" and "cat".
     English syntactic derivation tree: R2 (the cat is walking) over R1 (the cat) and "walk"; R1 over "cat".]
The Rosetta machine translation system is based on the
isomorphic grammars approach. The semantic derivation trees
are used as the interlingua. The analysis component
translates sentences into semantic derivation trees; the
generation component translates semantic derivation trees into
target language sentences. In this paper the translation
relation will be discussed generatively only.
2.3 An Example of an M-Grammar
In this section an example will be given of an M-grammar
that generates sentence (28):
(28) Pete lends the girl a book
Only those M-Rules that are relevant to the discussion in 
the following sections will be dealt with. Note that the 
rules given here are in an informal notation.
The M-grammar needed for this example:
(i) basic S-trees:
VERB(lend)
(in this informal notation the syntactic information in the
basic S-trees, given in the form of attribute-value pairs,
has been omitted)
NOUN(Pete)
NOUN(girl)
NOUN(book)
VAR(x1), VAR(x2), ...
(VARs are syntactic variables corresponding to logical
variables)
(ii) M-Rules:
Some notational conventions:
- t1, t2, etc. are S-trees,
- mu's indicate arbitrary strings of relation/S-tree pairs,
- square brackets indicate nesting,
- in an expression of the form det/ART(the) det is the
relation, ART the category and "the" a literal.
So an expression like CL[subj/NP, head/VERB, mu1] stands
for:
            CL
     subj /    \ head   ...
        NP      VERB
R1: if t1 is of category VERB and
       t2 is of category VAR with index i and
       t3 is of category VAR with index j and
       t4 is of category VAR with index k
    then: CL[subj/t2, head/t1, iobj/t3, obj/t4]
The rule operates on a ditransitive verb and three
variables and makes a clause in which the variables are the
subject, indirect object and direct object respectively.
R2: if t1 is of category NOUN
    then: NP[head/t1]
R3: if t1 is of category NOUN
    then: NP[det/ART(the), head/t1]
R4: if t1 is of category NOUN
    then: NP[det/ART(a), head/t1]
R5,i: if t1 is of category NP and
         t2 is of the form CL[subj/VAR(xi), mu1]
      then: CL[subj/t1, mu1]
This is a rule scheme with an instance for every variable
index i. The rule substitutes an NP for the subject
variable. The same holds for rules R6,j and R7,k, in which
the NP's are substituted for the indirect and direct object
respectively.
R6,j: if t1 is of category NP and
         t2 is of the form CL[mu1, iobj/VAR(xj), mu2]
      then: CL[mu1, iobj/t1, mu2]
R7,k: if t1 is of category NP and
         t2 is of the form CL[mu1, obj/VAR(xk)]
      then: CL[mu1, obj/t1]
R8: if t1 has the form CL[subj/NP, head/VERB, mu1]
    then: SENTENCE[subj/NP, head/VERB, mu1]
Apart from changing the category, this rule assigns the 
tense to the verb and specifies the form in accordance with 
the number and person of the subject, which is not 
indicated here (the correct form is spelled out in the
morphological component).
In this example the rules operate as follows:
- rule R1 is applied to "lend", VAR(xi), VAR(xj) and VAR(xk)
as indicated,
- rule R2 applied to "Pete" gives NP(Pete),
- R3 applied to "girl" renders NP(the girl),
- R4 applied to "book" gives NP(a book),
- rule R5,i applied to "lend" and NP(Pete) renders CL(Pete
lend xj xk),
- application of R6,j to "lend" and NP(the girl) renders
CL(Pete lend the girl xk),
- application of R7,k to "lend" and NP(a book) results in
CL(Pete lend the girl a book),
- application of R8 gives SENTENCE(Pete lends the girl a
book).
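The derivation steps listed above can be replayed in a short sketch. It is a simplification under assumptions made here: S-trees are plain dictionaries, attribute-value pairs are omitted, one function make_np stands in for R2, R3 and R4, one function substitute stands in for the schemes R5,i, R6,j and R7,k, and r8 hard-codes the third person singular "-s" instead of leaving the form to a morphological component.

```python
def leaf(cat, word):
    return {"cat": cat, "word": word, "children": []}


def node(cat, *children):
    # children are (relation, subtree) pairs in surface order
    return {"cat": cat, "children": list(children)}


def r1(verb, xi, xj, xk):
    """R1: ditransitive verb plus three variables gives a clause."""
    if verb["cat"] != "VERB" or any(x["cat"] != "VAR" for x in (xi, xj, xk)):
        return None
    return node("CL", ("subj", xi), ("head", verb), ("iobj", xj), ("obj", xk))


def make_np(noun, article=None):
    """R2 (no article), R3 ("the") and R4 ("a") in one function."""
    if noun["cat"] != "NOUN":
        return None
    children = [("head", noun)]
    if article is not None:
        children.insert(0, ("det", leaf("ART", article)))
    return node("NP", *children)


def substitute(np, clause, relation, index):
    """R5,i / R6,j / R7,k: replace VAR(index) under `relation` by an NP."""
    if np["cat"] != "NP" or clause["cat"] != "CL":
        return None
    hit, children = False, []
    for rel, child in clause["children"]:
        if rel == relation and child["cat"] == "VAR" and child["word"] == index:
            children.append((rel, np))
            hit = True
        else:
            children.append((rel, child))
    return node("CL", *children) if hit else None


def r8(clause):
    """R8: CL becomes SENTENCE; the verb form is simplified to word + "s"."""
    if clause["cat"] != "CL":
        return None
    children = [(rel, dict(child, word=child["word"] + "s") if rel == "head" else child)
                for rel, child in clause["children"]]
    return node("SENTENCE", *children)


def words(tree):
    if not tree["children"]:
        return [tree["word"]]
    return [w for _, child in tree["children"] for w in words(child)]


cl = r1(leaf("VERB", "lend"), leaf("VAR", "x1"), leaf("VAR", "x2"), leaf("VAR", "x3"))
cl = substitute(make_np(leaf("NOUN", "Pete")), cl, "subj", "x1")
cl = substitute(make_np(leaf("NOUN", "girl"), "the"), cl, "iobj", "x2")
cl = substitute(make_np(leaf("NOUN", "book"), "a"), cl, "obj", "x3")
sentence = r8(cl)
```

Each rule returns None when its condition is not met, so an ill-formed derivation fails at the first inapplicable rule.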
The derivation tree for this example is represented in
(29):
(29) R8[R7,k[R4[book], R6,j[R3[girl], R5,i[R2[Pete], R1[lend, xi, xj, xk]]]]]
3 Idioms and Isomorphic M-Grammars
Traditionally, in Montague semantics, as for instance in
the PTQ paper (Montague, 1973), a basic expression has a
primitive meaning. However, the semantic concept basic
expression does not always coincide with what one would
call a syntactic primitive. This is the case, for instance,
with idioms. For example the idiom "kick the bucket" has
the primitive meaning "die", but the syntactic primitives
are "kick", "the" and "bucket".
For reasons given in the introduction it is impossible to
treat idioms as strings (i.e. syntactic primitives). The
possibility of applying syntactic transformations to
(elements of) idioms, which are also applicable to
non-idiomatic constructs, suggests that idioms should be treated as
having complex constituent structures, which are similar to
non-idiomatic constituent structures. The possibility of
having free arguments, which are realized by various
grammatical functions, suggests that parts of idioms do not
have to be adjacent at any level of the syntactic process.
The complex idiomatic constituent structure should
accommodate this.
In Rosetta, before idioms were introduced, basic
expressions were terminal S-trees, i.e. terminal nodes.
Idioms can be treated as basic S-trees that have an
internal structure. This type of expression is an example
of what will be called a "complex basic expression" (CBE).
A CBE is a basic expression from a semantic point of view,
i.e. it corresponds to a basic meaning, and a complex
expression from a syntactic point of view, i.e. it is a
non-terminal S-tree. For example, the basic S-tree for
"kick the bucket" looks like the following:
(30)        VERB
          /  |   \
       VAR  VERB   NP
       V1  "kick"  det /  \ head
                   ART    NOUN
                  "the"  "bucket"
By extending the notion of basic expression in this way the
attuning of grammars (as defined in section (2.2)) is
easier to achieve: corresponding basic expressions may be
CBE's. For example the Dutch verb "doodgaan" may correspond
to the English idiom "kick the bucket". Special measures
are necessary to guarantee that the rules obey the
conditions on application (cf. section (2.2)).
Basic expressions are listed in the basic lexicon of a 
grammar. A CBE is represented as a canonical surface tree 
structure in the lexicon. A canonical surface tree struc- 
ture is the default tree structure for a certain sentence, 
phrase, etc., i.e. the structure to which no syntactic 
transformations have applied. For example: if there is a 
passive transformation, the canonical structure is in the 
active form. Figure (32) shows the lexicon representation 
of the idiom: 
(31) x1 lend x2 a hand
(32)              VERB
        subj /  head |  iobj |   \ obj
          VAR     VERB     VAR     NP
          V1     "lend"    V2   det /  \ head
                                 ART    NOUN
                                 "a"   "hand"
The VAR nodes are not specified (i.e. not referring to an
actual VAR) in the dictionary. These variables will be
replaced by syntactic variables, when the CBE is inserted
into the syntactic tree. Apart from the category VERB and
the usual attribute-value pairs, the top node contains a
set of attribute-value pairs that indicates which
transformations are possible.
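A lexicon entry of this kind can be sketched as a small record. The sketch makes several assumptions that are not part of the paper: the canonical structure is flattened to a tuple of words and slots, the class name CBE, the meaning names and the transformation names are invented here, and the permitted transformations are filled in only from the judgements on sentences (3), (4), (10) and (11) in the introduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CBE:
    meaning: str          # name of the basic meaning (assumed name)
    template: tuple       # canonical order; "V1", "V2" are unspecified VAR slots
    allowed: frozenset    # transformations permitted at the top node


LEXICON = {
    "kick the bucket": CBE(
        meaning="B_die",
        template=("V1", "kick", "the", "bucket"),
        allowed=frozenset(),  # e.g. passive loses the idiomatic reading, cf. (3)
    ),
    "break heart": CBE(
        meaning="B_break_heart",
        template=("V1", "break", "V2", "'s", "heart"),
        allowed=frozenset({"passive", "wh-movement", "topicalization"}),
    ),
}


def insert(cbe, variables):
    """Replace the unspecified VAR slots by actual syntactic variables."""
    return tuple(variables.get(item, item) for item in cbe.template)


def may_apply(cbe, transformation):
    """Consult the attribute set at the top node of the CBE."""
    return transformation in cbe.allowed


inserted = insert(LEXICON["kick the bucket"], {"V1": "x1"})
```

Keeping the permitted transformations on the entry itself means that the general rules of the grammar need no idiom-specific cases; they only consult the top node.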
3.1 Treatment of Complex Basic Expressions 
In this section an extension of the M-grammar of section
(2.3) will be given that can deal with an interesting class
of Complex Basic Expressions and two M-grammars will be
related to each other according to the isomorphy approach.
Some other reasons for having complex basic expressions
will be given.
3.1.1 An Example of an M-grammar for Complex Basic
Expressions
In this section an M-grammar will be presented that
generates the idiomatic sentence:
(33) Pete lends the girl a hand
The grammar of section (2.3) is extended in the following
way:
(i) basic S-trees
VERB(V1 lend V2 a hand)
(ii) M-Rules:
R9: if t1 is of the form VERB[subj/V1,
       head/VERB, iobj/V2] and
       t2 is of category VAR with index i and
       t3 is of category VAR with index j
    then: CL[subj/t2, head/VERB, iobj/t3]
This rule expects a complex, transitive verb and two
variables; it constructs a clause in which the variables
are the subject and the indirect object.
For this example the rules operate as follows:
- R9 renders CL(xi lend xj a hand),
- R2 and R3 as in section (2.3),
- R5,i renders CL(Pete lend xj a hand),
- R6,j gives CL(Pete lend the girl a hand),
- rule R8 results in SENTENCE(Pete lends the girl a hand).
The derivation tree for this sentence is represented in the
left part of figure (36).
The result of application of rules R9, R2, R3, R5,i and
R6,j is represented as a tree structure in figure (34).
(34) CL[subj/NP[head/NOUN("Pete")],
        head/VERB("lend"),
        iobj/NP[det/ART("the"), head/NOUN("girl")],
        obj/NP[det/ART("a"), head/NOUN("hand")]]
This construct is similar to the construct made after
applying rules R1 to R7,k in the example of section (2.3).
One of the basic expressions differs. So the structures can
be idiomatic or non-idiomatic and other rules of the
M-grammar (e.g. wh-movement or passivisation) are
applicable to both these structures, unless, as in the case of
certain idioms, they are prohibited as indicated at the top
node.
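That the same rules serve both kinds of structure can be illustrated with a deliberately flat sketch. The representation is an assumption made for brevity: a clause is a dictionary from relations to strings, the CBE is reduced to its verbal head and its fixed object, and r8 again simplifies inflection to adding "-s". Only the clause-forming rule differs (r1 for the ordinary verb, r9 for the CBE); substitution and r8 are shared.

```python
def r1(verb, xi, xj, xk):
    """Clause from a ditransitive verb and three variables."""
    return {"cat": "CL", "subj": xi, "head": verb, "iobj": xj, "obj": xk}


def r9(cbe, xi, xj):
    """Clause from a complex basic expression and two variables."""
    return {"cat": "CL", "subj": xi, "head": cbe["head"], "iobj": xj, "obj": cbe["obj"]}


def substitute(np, clause, relation, var):
    """Shared argument substitution; None if `var` is not at `relation`."""
    if clause[relation] != var:
        return None
    return {**clause, relation: np}


def r8(clause):
    if clause["cat"] != "CL":
        return None
    return {**clause, "cat": "SENTENCE", "head": clause["head"] + "s"}


def surface(tree):
    return " ".join(tree[rel] for rel in ("subj", "head", "iobj", "obj"))


def finish(clause, fillers):
    """Apply the substitutions in order, then R8."""
    for relation, var, np in fillers:
        clause = substitute(np, clause, relation, var)
    return r8(clause)


LEND_A_HAND = {"head": "lend", "obj": "a hand"}  # stands in for VERB(V1 lend V2 a hand)

idiomatic = finish(r9(LEND_A_HAND, "x1", "x2"),
                   [("subj", "x1", "Pete"), ("iobj", "x2", "the girl")])
literal = finish(r1("lend", "x1", "x2", "x3"),
                 [("subj", "x1", "Pete"), ("iobj", "x2", "the girl"), ("obj", "x3", "a book")])
```

The object position of the idiomatic clause holds no variable, so an object substitution is simply not applicable there.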
3.1.2 Complex Basic Expressions and Isomorphic Grammars
Assume we have an M-grammar that generates the Dutch
sentence (35) which is a translation of (33). It is then
possible to let the English M-grammar given above for (33)
correspond to this grammar in the following way:
(35) Pete helpt het meisje
(36) [Three derivation trees with the same geometry: the English syntactic derivation tree (left), the semantic derivation tree (middle) and the Dutch syntactic derivation tree (right). The basic expressions "Pete" and "V1 lend V2 a hand" on the English side and "Pete" and "help" on the Dutch side are mapped onto the basic meanings B1 and B2.]
Here "Pete" in both languages corresponds to the basic
meaning B1, "V1 lend V2 a hand" and "help" to B2, rule R9
and its Dutch counterpart correspond to meaning rule M1,
R2 and R'2 to M2, R5 and its Dutch counterpart to M4, etc.
In this way it is possible to establish a correspondence
between complex basic expressions in one language and basic
expressions that are not complex in another. In a similar
fashion it is possible to establish a relation between
complex basic expressions in one language and complex basic
expressions in another. Note that in this way it is not
necessary to incorporate a so-called structural transfer in
the machine translation system for the translation of
CBE's.
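The absence of structural transfer can be seen in a sketch in which translation is nothing more than relabelling a derivation tree twice. The tables below are assumptions for illustration: only the pairs Pete/B1, "V1 lend V2 a hand"/B2, R9/M1, R2/M2 and R5/M4 come from the text, while the remaining meaning names, the Dutch rule names (NL_...) and the Dutch basic expressions "helpen" and "meisje" are invented here.

```python
def relabel(tree, table):
    """Map every label of a derivation tree through `table`.

    A tree is a basic-expression name (str) or a tuple (rule, subtree, ...).
    """
    if isinstance(tree, str):
        return table[tree]
    label, *args = tree
    return (table[label], *[relabel(arg, table) for arg in args])


# English labels to names of meanings (analysis direction).
EN_TO_MEANING = {
    "Pete": "B1", "V1 lend V2 a hand": "B2", "girl": "B3",
    "x1": "x1", "x2": "x2",
    "R9": "M1", "R2": "M2", "R3": "M3", "R5": "M4", "R6": "M5", "R8": "M6",
}

# Names of meanings to hypothetical Dutch labels (generation direction).
MEANING_TO_NL = {
    "B1": "Pete", "B2": "helpen", "B3": "meisje",
    "x1": "x1", "x2": "x2",
    "M1": "NL_clause", "M2": "NL_proper_np", "M3": "NL_definite_np",
    "M4": "NL_subject", "M5": "NL_object", "M6": "NL_sentence",
}

english = ("R8",
           ("R6", ("R3", "girl"),
            ("R5", ("R2", "Pete"),
             ("R9", "V1 lend V2 a hand", "x1", "x2"))))

semantic = relabel(english, EN_TO_MEANING)   # the interlingua
dutch = relabel(semantic, MEANING_TO_NL)
```

The complex basic expression and the simple Dutch verb occupy the same terminal node of their derivation trees, so the geometry never changes between analysis and generation.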
3.2 Other Reasons for Having Complex Basic Expressions
Expressions that are not idiomatic, but that consist of
more than one word can be handled by means of a complex
basic expression in order to retain the isomorphy. This is
the case if the expression (i) corresponds to an idiom or
(ii) corresponds to a word in another language. Examples
are the following:
(i) In Dutch (37) is not an idiom in the sense defined
above (i.e. the meaning of the expression "kwaad worden"
can be composed in a natural way from "kwaad" and
"worden"), but has an idiomatic equivalent in English (38).
(37) kwaad worden (Eng. "become angry")
(38) lose one's temper
If "kwaad worden" has to correspond to "lose one's temper",
then in a technical sense, in Dutch, "kwaad worden" can be
treated in the same way as an idiom.
(ii) The Italian word (39), which translates into English as
(40), and the Spanish word (41), which translates into
English as (42), are words that correspond to complex
expressions in English (and Dutch). From a translational
point of view cases like "get up early" can be treated in
the same way as idioms.
(39) adagiare
(40) lay down with care
(41) madrugar
(42) get up early
4 Some Typical Problems
In this section some of the problems mentioned in the
introduction will be briefly discussed.
4.1 Argument Variables Embedded in a Complex Basic
Expression
In sentence (43) there are two arguments "Pete" and "Mary"
and the idiom "x1 break x2's heart". The subject ("Pete")
is treated in the same way as in the previous examples. The
argument substitution rule replaces the variable by the
NP "Pete", giving the structure in (44), in which,
eventually, the NP "Mary" substitutes for the argument variable
x1. Special M-Rules will have to be added to an M-grammar
to achieve this kind of substitution. "Normal" argument
substitution rules substitute for the variables in their
canonical positions, i.e. as a subject or (indirect) object
directly under the clause node or as an object to a
preposition in a prepositional object.
(43) Pete broke Mary's heart
(44) CL[NP[head/NOUN("Pete")], VERB("break"),
        NP[det/VAR(x1), head/NOUN("heart")]]
The argument substitution rule for this type of construct
looks like the following:
R10,h: if t1 is of category NP and
          t2 is of the form CL[mu1,
          NP[det/VAR(xh), mu2], mu3]
       then: CL[mu1, NP[det/t1, mu2], mu3]
In this rule t1 is assigned genitive case.
Rule R10,h applied to NP(Mary) results in CL(Pete break
Mary's heart).
In this way it is also possible to deal with the constructs
mentioned in the introduction, as for example "x1 ram x2
down x3's throat".
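A rule in the spirit of R10,h can be sketched as a recursive search for a variable in determiner position. The representation is an assumption: a clause is a list of (relation, value) pairs, an embedded constituent is a nested list, a variable is the pair ("VAR", index), the relation names inside the prepositional constituent are invented, and genitive case is simplified to appending "'s".

```python
def genitive(np):
    # Simplification: no treatment of plurals or pronouns.
    return np + "'s"


def _subst(np, items, index):
    """Return (new items, whether a det/VAR(index) was replaced)."""
    out, applied = [], False
    for rel, value in items:
        if rel == "det" and value == ("VAR", index):
            out.append((rel, genitive(np)))
            applied = True
        elif isinstance(value, list):            # embedded constituent
            new_value, hit = _subst(np, value, index)
            out.append((rel, new_value))
            applied = applied or hit
        else:
            out.append((rel, value))
    return out, applied


def r10(np, clause, index):
    """Substitute `np` in genitive case for VAR(index); None if not applicable."""
    new_clause, applied = _subst(np, clause, index)
    return new_clause if applied else None


def surface(items):
    words = []
    for _, value in items:
        if isinstance(value, list):
            words += surface(value)
        elif isinstance(value, tuple):           # an unsubstituted variable
            words.append(value[1])
        else:
            words.append(value)
    return words


heart = [("subj", "Pete"), ("head", "break"),
         ("obj", [("det", ("VAR", "x1")), ("head", "heart")])]
throat = [("subj", "Pete"), ("head", "ram"), ("obj", "his lack of money"),
          ("prepobj", [("head", "down"),
                       ("obj", [("det", ("VAR", "x3")), ("head", "throat")])])]
```

Because the search descends into embedded constituents, the same function covers the determiner inside the object NP of "break ... heart" and the one inside the prepositional constituent of "ram ... down ... throat".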
4.2 Variables Bound by Arguments
Sentence (45) contains a possessive pronoun "his", that
refers to the subject "the boy". In the lexicon the basic
expression is represented as in (46).
(45) the boy lost his temper
(46) VERB[VAR(V1), VERB("lose"),
          NP[det/POSSPRO(V1), head/NOUN("temper")]]
The M-Rule that inserts the CBE makes all possible forms of 
the possessive pronoun (his, her, their, etc.). The substi- 
tution rule for the subject decides upon the form of the 
possessive pronoun. 
In (47) the possessive pronoun "her" is bound to the object 
"the woman". The treatment here is similar to the one 
above. The argument to which the pronoun has to be bound is 
indicated in the lexicon. 
(47) the man gave the woman her freedom 
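The division of labour described in this section, in which the insertion rule produces all pronoun forms and the substitution rule selects one, can be sketched as generate-and-filter. The feature inventory (number and gender only), the table FORMS and the function names are assumptions made for the example; the sketch covers the subject-bound case of (45) only.

```python
# Assumed agreement features for three pronoun forms.
FORMS = {"his": ("sg", "masc"), "her": ("sg", "fem"), "their": ("pl", None)}


def insert_cbe(verb, noun):
    """One candidate clause per pronoun form; the subject is still a variable."""
    return [{"subj": "x1", "head": verb, "poss": form, "noun": noun}
            for form in FORMS]


def substitute_subject(candidates, np, number, gender):
    """Keep only the candidates whose pronoun agrees with the subject NP."""
    agreeing = [c for c in candidates if FORMS[c["poss"]] == (number, gender)]
    return [{**c, "subj": np} for c in agreeing]


def surface(clause):
    return " ".join(clause[key] for key in ("subj", "head", "poss", "noun"))


candidates = insert_cbe("lose", "temper")
result = substitute_subject(candidates, "the boy", "sg", "masc")
```

For an object-bound pronoun as in (47), the same filter would be attached to the substitution rule of the argument that the lexicon entry names as the binder.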
5 Conclusion 
The method described in this paper for the treatment of 
idioms can deal with the problems traditionally related to 
expressions of this type. Structural transfer is not 
necessary, since idioms are mapped onto basic meanings. The 
grammar can operate on idiom structures in the same way as 
it operates on non-idiomatic structures, while, in the case
of certain idioms, restrictions on operations are
specified. A test implementation in the Rosetta machine
translation system has shown that this approach is promising.
Acknowledgements 
The author would like to thank all members of the Rosetta
team, particularly Jan Landsbergen and Jan Odijk, for their
helpful comments on earlier versions of this paper. This
work was supported in part by a grant from the Nederlandse
Herstructureringsmaatschappij (NEHEM).
Notes
(1) Different native speakers of a language may vary in
their judgements about the possible transformations an
idiom may undergo. Though this poses a problem, it will be
ignored for the present.
(2) This paper deals only with idioms with a verb as head.
Idioms of the type "spic and span" and "at any rate" are
"fixed", i.e. they cannot undergo any syntactic
transformations. They are therefore less interesting from a
theoretical point of view. In Rosetta fixed idioms will be
treated as one word in the morphological component.
(3) Basic S-trees are similar to the Montague grammar
concept of basic expressions. The term basic expression
will be used frequently to indicate both.

References 

Boisset, J. (1978), Idioms as Linguistic Convention (with 
illustrations from French and English), Doct. Diss. 
(University Microfilms International, 1981). 

Chomsky, N. (1981), Lectures on Government and Binding
(Foris Publications, Dordrecht).

Fraser, B. (1970), Idioms Within a Transformational
Grammar, Foundations of Language 6, pp. 22-43.

Kayne, R. (1982), Unambiguous Paths, in May, R., and J.
Koster, eds., Levels of Syntactic Representation
(Foris, Dordrecht), pp. 143-183.

Landsbergen, J. (1982), Machine Translation Based on
Logically Isomorphic Montague Grammars, in Horecky,
J. (ed.), COLING 82 (North-Holland), pp. 175-182.

Landsbergen, J. (1984), Isomorphic Grammars and Their Use 
in the Rosetta Translation System, Paper presented at 
the Tutorial on Machine Translation Lugano, to 
appear in: King, M., ed., Machine Translation: The 
State of the Art (Edinburgh University Press)

Montague, R. (1973), The Proper Treatment of Quantification
in Modern English, in: Montague (1974), pp. 247-270. 

Montague, R. (1974), Formal Philosophy: Selected Papers of 
Richard Montague, ed. by Richmond Thomason (Yale 
University Press, New Haven). 

Partee, B.H. (1973), Some Transformational Extensions of
Montague Grammar, Journal of Philosophical Logic 2,
pp. 509-534. Reprinted in Partee, B.H. (ed.), Montague
Grammar (Academic Press, New York, 1976), pp. 51-76.

Pesetsky, D. (1985), Morphology and Logical Form, 
Linguistic Inquiry, Vol. 16, No. 2, Spring, pp. 
193-246. 

Rothkegel, A. (1973), Idioms in Automatic Language 
Processing, in: Zampolli, A., and N. Calzolari 
(eds.), Computational and Mathematical Linguistics. 
Proceedings of the International Conference on 
Computational Linguistics (Leo S.Olschki Editore, 
Firenze), pp. 713-728. 

Wehrli, E. (1984), A Government-Binding Parser for French,
Working Papers No. 48, Institut pour les Etudes
Sémantiques et Cognitives (Université de Genève).
