A Constructivist Approach to Machine Translation 
Michael Carl 
Institut für Angewandte Informationsforschung,
Martin-Luther-Straße 14,
66111 Saarbrücken, Germany,
carl@iai.uni-sb.de
Introduction 
Constructivist cognitive theories conceptualize memory 
as a dynamic process which is directly linked to percep- 
tion, memories and conclusion/induction (cf. \[Sch91\]). 
From this point of view, memory serves to establish 
structures that are relevant to the cognitive system in
the present context of action. The function of memory 
is thus to participate in coherent behavior which makes 
survival of the acting cognitive system easier (or possible).
Memories are similar to perceptions: they are
perceptions without an object. Perception on the other 
hand is an activity (and not a passive process) that is 
driven by the memory. 
In some situations, for example, we perceive solid bodies
where -- in terms of modern physics -- there are no bodies at
all, but processes of energy exchange; from psychoacoustics it
is known that we can hear sounds that physically do not exist.
Conversely, psychological experiments confirm that perception
only takes place if an interpretation can be assigned to the
perceived phenomenon [Foe86], [Foe73]. In successive steps of
abstraction, perception destroys parts of the information which
cannot be expressed (or are of no importance) in the agent's
model of the situation. This destruction will increase in 
proportion to the extent new situations cannot be han- 
dled. Confirmation of old solutions, then, will take place 
at the expense of new experiences. 
In this paper, I present a new approach to Machine 
Translation (MT) that -- similar to memory -- does
not simply 'remember' known solutions of former prob- 
lems but creates new solutions for new problems. Like 
perception, it 'perceives' a problem (i.e. a text or a sentence)
to the degree it can handle it. Both memory and MT are successful
because they generate useful (e.g. consistent) behavior for an
agent in a changing environment.
MT has existed since the very beginning of computer 
science and there are a number of different systems in 
research and on the market. However, compared with
many other domains, the problems of MT are due to two
peculiarities of natural language: compositionality and
hierarchical structure. Hierarchical structuring is well
known in natural language processing (and MT) and ac- 
counts for the fact that words can be recursively grouped 
into constituents. 
Compositionality in MT is a more basic phenomenon and
refers to the observation that words or groups of words
are translated identically in different contexts. A source
text and its translation can thus be considered to consist
of composed units that are independently translated.
The choice of these units, however, is much discussed in
the MT literature. If these units are too small, the system
may become unreliable because it may be impossible to
produce a correct target language text. If these units are
too large, the system may become too inflexible.
In this paper I propose a new MT system architecture
which can accommodate a number of different translation
units and is thus appropriate for a number of different
user needs. In constant interaction with the
MT system, a user can tune 'her' system in order to obtain
the results she would have expected. The architecture
is similar to Example Based Machine Translation
(EBMT; see footnote 1) and owes a lot to Case Based Reasoning (CBR).
In the next section I will give a brief introduction to 
CBR. In order to apply the CBR paradigm to MT, I 
shall outline the need for a decomposition component 
and the way decomposition is usually treated in MT. 
Next, I will give a definition of compositionality. I will
then show that the similarity metric -- as it is used 
in many CBR systems -- is not an appropriate means 
in MT either for the decomposition of the source text 
or for retrieval. Similarity, instead, can be based on 
abstraction: the more abstraction is performed, the more 
dissimilar two items are. In order for an MT system to
react creatively to unknown input, a certain degree of
abstraction is required.
These considerations imply a number of constraints on 
the design of MT systems and offer two degrees of free- 
dom that cannot analytically be determined. The choice 
of the translation units, their language corresponden- 
cies and the degree of creativity (abstraction) to which 
a MT system may recombine these units is a matter of 
user needs and depends on the goal of the application. 
Thus, in a very limited domain a MT system is likely to 
be different from an all-purpose MT system. The impli- 
cations of these insights shall be discussed and related 
to the parameters in a MT system design. 
The remainder of the paper is dedicated to an implemen- 
tation of the outlined architecture which has the capacity 
to accommodate a number of different user requirements.
I shall call this approach Constructivist Machine
1 The paradigm of Example Based Machine Translation
was introduced only recently by a number of different authors
(e.g. [SN90], [Bro96], [CC96]).
Carl 247 A Constructivist Approach to Machine Translation 
Michael Carl (1998) A Constructivist Approach to Machine Translation. In D.M.W. Powers (ed.) NeMLaP3/CoNLL98: New
Methods in Language Processing and Computational Natural Language Learning, ACL, pp 247-256.
[Figure 1: the example sentence pair treated as a single chunk: The proposal will not now be implemented ↔ Les propositions ne seront pas mises en application maintenant]
[Figure 2: the same sentence pair decomposed into two chunks]
Translation because, similar to memory, the success of an
MT outcome -- in this scenario -- is solely validated by
and dynamically tuned to the needs of the acting agent
for whom the translation is being computed.
CBR-Systems 
CBR systems are problem-solving systems that heavily
rely on old experiences. For this reason they can be seen 
as a generalization of MT systems, too. CBR systems 
solve new problems by adapting (modifying) solutions to 
old problems that are stored in a case base. 
According to Richter [Ric95], four knowledge containers
are available in CBR systems:
• The case base stores former problems together with their
solutions (cases).
• A similarity metric makes it possible to retrieve cases
from the case base that are similar to new problems.
• An adaptation mechanism modifies the solutions of
the retrieved cases according to the requirements of
the problem.
• The case vocabulary contains the language in which
cases are written.
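As a rough illustration of how the four containers interact, the following Python sketch wires a case base, a similarity metric and an adaptation step into one retrieval loop. The names `Case`, `token_overlap` and `solve`, the word-overlap similarity and the toy case base are all assumptions made for this illustration; they are not taken from [Ric95].

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Case:
    problem: str    # a former problem, written in the case vocabulary
    solution: str   # its known solution


def token_overlap(a: str, b: str) -> float:
    """Toy similarity metric (assumption): Jaccard overlap of the word sets."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


def solve(problem: str, case_base: List[Case],
          similarity: Callable[[str, str], float],
          adapt: Callable[[str, Case], str]) -> str:
    """Retrieve the most similar case and adapt its solution."""
    best = max(case_base, key=lambda c: similarity(problem, c.problem))
    return adapt(problem, best)


# Hypothetical case base; the adaptation simply returns the retrieved
# solution unchanged, i.e. it encodes no adaptation knowledge at all.
CASES = [Case("good morning", "bonjour"), Case("good night", "bonne nuit")]
result = solve("good night everyone", CASES, token_overlap,
               lambda problem, case: case.solution)
```

With an identity adaptation the sketch behaves like a pure retrieval system; any domain knowledge would have to be supplied through the `adapt` callback.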
Retrieval of the appropriate case(s) is based on a similarity
metric which is designed such that the retrieved cases
represent an instantiation of the problem concept. The
problem and the retrieved case(s) are similar because
they are different instantiations of the same concept.
In the adaptation step, the solution that is part of the 
retrieved case is modified where it differs from the problem
solution. Adaptation is one of the most difficult
parts in CBR systems because it depends on domain 
specific knowledge and on the outcome of the other CBR 
components. 
In both retrieval and adaptation, the original problem is
thus destroyed in order to compute a solution: while classifying
a problem according to the available conceptual
classes in the case base, some parts of the problem are
expected to be more important than other parts. In order
to classify the problem appropriately, retrieval in CBR
-- like perception -- thus evaluates parts of the information
differently. The retrieved (set of) case(s) deviates from
the original problem in a number of properties that
seem minor to the goal of the agent. In adaptation,
the retrieved solutions are further modified in order to
generate a coherent outcome of the system.
However, in order to apply the CBR paradigm successfully
to MT, a decomposition component is required that
divides a sentence into a sequence of chunks. In the next 
section I shall explain the need for such a component 
and outline its importance in MT. 
Decomposition in MT 
In Machine Translation (MT) compositionality is cru- 
cial to attain a reasonable coverage. One of the main 
problems in MT is transfer strategy: how and when to 
translate a unit of the source language into a unit of the 
target language. 
The underlying hypothesis of all MT systems is that 
units derived from the source language string can be 
mapped onto units from which the target language string 
can be computed. I use the following English-French 
translation example (see footnote 2):
1 The proposal will not now be implemented ↔
  Les propositions ne seront pas mises en application maintenant
Several possibilities of decomposition shall be considered 
in this section. The corresponding adaptation knowledge 
will be discussed. 
If sentence (1) consists of only one chunk, i.e. decomposition
takes place at the sentence level only as in figure 1,
no adaptation is required. This is typical for translation
memories (TM) (e.g. TRADOS [Hey96], TRANSIT),
which have only a case base and a retrieval component (see footnote 3).
TMs are likely to have quite long cases stored in the case
base, but have a well-informed distance metric to return
similar cases from the case base. However, due to the lack of
adaptation, the less the retrieved cases match the sentence,
the more incomplete and incorrect the translation
is. A major shortcoming is that coverage does not
grow to the same extent as the case base.
2 The example is taken from the Hansards corpus and is
discussed in [BCDP+90]. For the purpose of illustration I will
consider the translation correct and desirable.
3 Some TMs, however, offer as an extra an interactive
MT system or a batch MT system that is based on a different
(e.g. 'traditional') paradigm.
[Figure 3: the example sentence pair decomposed into three chunks: The proposal / will not now be / implemented]
[Figure 4: the example sentence pair decomposed into fine-grained units]
In figure 2, the sentence is decomposed into two chunks.
Notice that the English subject (the first chunk) is in the
singular, while its French translation is in the plural. Because
subject and predicate agree in number and person,
the (auxiliary) verbs in the second chunk have to
take the same features as the first chunk. The adaptation
mechanism therefore has to be able to reconstruct
these agreement requirements.
If the sentence is divided into the three chunks /The proposal/,
/will not now be/ and /implemented/ as in figure 3,
adaptation turns out to be much more complicated. It
has to reconstruct agreement between the first and the
second chunk (and, in the French translation, the third
chunk too). Further, adaptation must take into account
the discontinuity of the second chunk. Thus, although
now and maintenant are translations of each other, they
are separated by the interposed chunk 3 on the French
side. The adaptation mechanism has to be able to integrate
a continuous chunk into a discontinuous one when
translating from English to French and the inverse when
translating from French to English.
Example based Machine Translation (EBMT) corresponds
to a decomposition granularity as shown in figures
2 and 3. Some systems (cf. [SN90], [CC96])
localize major constituents in the problem sentence by 
means of linguistic analysis. Adaptation in these sys- 
tems is essentially a matter of replacing items (words, 
sequences of words or constituents) in the target lan- 
guage structure. Linguistic analysis serves to better de- 
termine the location and the appropriateness of potential 
items to be replaced in the target language. 
In the Pangloss EBMT (\[Bro96\], \[NBD94\]) cases are se- 
lected from the case base that contain the problem case 
(or parts of it) as a substring. By means of a thesaurus 
and a bi-lingual lexicon the translation of the problem 
case (or its respective part) is extracted from the retrieved
cases. Adaptation of the target language chunks
is left to a statistical language model outside the Pan- 
gloss EBMT system. 
Many traditional MT systems have an atomic case base 
and have no information about the similarity between a 
sentence and some cases i.e. only exact matching cases 
are retrieved from the case base. According to figure 4, 
the decomposition of sentences is very fine-grained and the main translation
is carried out by the 'adaptation' mechanism. All three
generations of traditional MT systems (cf. [WK95]) can
be described by this schema. The direct approach seeks
to map lexical items of the source language onto lexical
items of the target language and then tries to rearrange
the target text. The interlingual approach (cf. [Dor93])
tries to calculate a language independent meaning rep- 
resentation from which the target text is generated. The 
transfer approach (cf. \[Str96\]) is situated in between the 
two: abstractions of the source language string are com- 
puted and then transferred (mapped) into target units 
from which the target language string is computed. Traditional
MT systems do not systematically make use of
large chunks that could facilitate the adaptation mechanism.
They thus fail to account for what computers can
most easily do: memorization and retrieval. 
The main difference, however, between traditional and
more recent approaches to MT lies in the fact that the
latter systems perform monolingual analysis and generation
while in the former systems the analysis (decomposition)
of the source language is not independent from the
regeneration possibilities in the target language. 
Compositionality and MT 
When decomposing a sentence in a particular way, the
sentence is classified with respect to the context (see footnote 4). In the
French sentence Le boucher sale la tranche the word sale
can be classified as a verb (English: to salt) or as an
adjective (English: dirty), la can be an article (English:
the) or a pronoun (English: she/her) and tranche can be
a verb (English: to chop) or a noun (English: slice).
4 This view is of course not new. E.g. in [BDW96] morphological
analysis (i.e. morphological classification) is defined as a decomposition task.
2 French: (Le boucher) sale (la tranche)
  English: The butcher salts the slice
3 French: (Le boucher sale) (la) tranche
  English: The dirty butcher chops her
If the sentence is decomposed according to (2), the phrase
Le boucher is classified as the subject of the sentence and
la tranche as the object. In (3) Le boucher sale is the subject
while la is the object. Parsing, for instance, classifies
the components of a sentence and their relationship by
giving it a structure according to a grammar.
However, I will not use a grammar as a basis for de- 
composition but base the decomposition on examples. 
This, I shall show, leads to greater flexibility and better 
maintainability of the system. 
In MT different decompositions become particularly cru- 
cial if they lead to different translations i.e if the source 
language and the target language express ambiguities in 
a different way, which is quite often the case. A source 
sentence S is compositionally translatable into a target
sentence T if
• it is decomposable into a set of chunks.
A sentence S is decomposable if it can be divided
into a set of chunks $c_1 \ldots c_n$ where the intersection of
the chunks' concepts equals the concept of the case:
$C(S) \equiv \bigcap_{i=1}^{n} C(c_i)$
• the case base covers the chunks.
A case base CB covers a set of chunks $c \in S$ iff there
exists for each c at least one solution case $s \in CB$
where both c and s are instantiations of the same concept:
$\forall c \in S \; \exists s \in CB : C(c) \equiv C(s)$
• the retrieved solutions are adaptable.
A set of solutions $s_1 \ldots s_n$ is adaptable if it can be
composed into one target sentence T and the intersection
of the solutions' concepts equals the concept of
the result: $\bigcap_{i=1}^{n} C(s_i) \equiv C(T)$
We thus obtain a chain of conceptual equivalences during
all processing steps as shown in equation 1.
$C(S) \equiv \bigcap_{i=1}^{n} C(c_i) \equiv \bigcap_{i=1}^{n} C(s_i) \equiv C(T)$   (1)
A source sentence S is decomposed into a set of chunks
c for each of which a solution s is retrieved from the case
base. The set of solutions is then composed into a target
sentence T.
Similarity and MT 
In CBR systems, the similarity metric is a means for clas- 
sifying the problem according to the empirical data in 
the case base. In this section I shall investigate whether 
a similarity metric is appropriate for decomposition and 
classification in MT. 
Similarity metrics in CBR are often based on nearest
neighbor (NN) algorithms (e.g. [WD95]). NN algorithms
make use of a continuous variable w which associates
a real value with the attributes $a_i$ of a problem
a. The similarity between a problem a and a case b is
inversely proportional to their distance D (see footnote 5):
$D(a,b) = \sum_i w_i \, d(a_i, b_i)$
The nearest neighbor in the hyperspace is then retrieved
as the most similar known instantiation to the problem.
5 In a symbolic task, the distance d usually is $d(a_i, b_i) = 0$
if $a_i = b_i$, else 1.
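The nearest-neighbor retrieval described above can be sketched in a few lines of Python. The sketch assumes the common weighted-sum form of the distance and the 0/1 symbolic attribute distance; the function names and the three-attribute toy cases (category, number, case) are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def symbolic_d(x, y) -> float:
    """Per-attribute distance for symbolic values: 0 if equal, else 1."""
    return 0.0 if x == y else 1.0


def distance(a: Sequence, b: Sequence, w: Sequence[float]) -> float:
    """Weighted sum of per-attribute distances between problem a and case b."""
    if not (len(a) == len(b) == len(w)):
        raise ValueError("a, b and w must have the same length")
    return sum(wi * symbolic_d(ai, bi) for wi, ai, bi in zip(w, a, b))


def nearest(problem: Sequence, cases: Sequence[Sequence], w: Sequence[float]):
    """Return the case with the smallest distance to the problem."""
    return min(cases, key=lambda case: distance(problem, case, w))


# Hypothetical attributes: (category, number, case); category weighs double.
weights = (2.0, 1.0, 1.0)
problem = ("noun", "sing", "n")
cases = [("noun", "plu", "n"), ("verb", "sing", "n")]
best = nearest(problem, cases, weights)
```

Changing the weights changes which case counts as nearest, which is exactly the tuning knob the text refers to when it says the weight w may be altered dynamically.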
Other approaches use symbolic distance metrics. One
such approach ([Pla95]) uses anti-unification (see footnote 6) that yields
for each case in the case base the intersection (i.e. what
is common) with the problem. By means of a predefined
subsumption ordering these intersections are ordered.
The cases that are most similar to a problem are
those that yield the most specific anti-unification results.
Other approaches (e.g. \[Hut97\]) calculate for each case 
the minimal number of changes that would have to be 
done to transform it into the problem. The weighted 
sum of changes then indicates the distance between the 
case and the problem. 
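A change-counting metric of this kind can be illustrated with a weighted edit distance over word tokens. The sketch below is the standard Levenshtein recurrence with configurable costs; it is an assumption about what such a metric might look like, not a reproduction of the method in [Hut97], and the function name and default weights are hypothetical.

```python
from typing import Sequence


def weighted_edit_distance(case: Sequence[str], problem: Sequence[str],
                           w_sub: float = 1.0, w_ins: float = 1.0,
                           w_del: float = 1.0) -> float:
    """Minimal weighted cost of turning `case` into `problem`."""
    # prev[j] = cost of turning the empty prefix of `case` into problem[:j]
    prev = [j * w_ins for j in range(len(problem) + 1)]
    for i, c in enumerate(case, start=1):
        curr = [i * w_del]
        for j, p in enumerate(problem, start=1):
            curr.append(min(
                prev[j] + w_del,                           # delete c
                curr[j - 1] + w_ins,                       # insert p
                prev[j - 1] + (0.0 if c == p else w_sub),  # keep or substitute
            ))
        prev = curr
    return prev[-1]


d = weighted_edit_distance("starke Erkältung".split(),
                           "starker Raucher".split())
```

On surface forms, the two phrases differ in both words, so the distance is two substitutions; a metric working on lemmas would count only one.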
While these metrics may be dynamically adapted in 
changing environments (the case base and/or the weight 
w may be dynamically altered), there are at least two
problems when applied to MT. 
One problem occurs when decomposing a text of arbi- 
trary length as was shown above. It is not at all evident 
how to decompose a text into units such that the adap- 
tation capacity of the system is respected. Of course, 
the smaller the units are, the higher the probability of 
retrieval success will be. But we may not necessarily be 
sure whether the retrieved solutions lead to a composi- 
tionally correct translation because it may be impossible 
for the adaptation mechanism to appropriately recom- 
pose them. 
Apart from the discontinuity of chunks as shown in figure
3, there are other phenomena of lexical cooccurrence
that need to be treated by an MT system. For instance,
if we know that German stark translates into English
strong and German Band translates into English volume,
we are likely to translate German starker Band
into English strong volume by simply concatenating the 
translated units. This might not always work well be- 
cause here thick volume would be a better translation. 
The sequence starker Band should hence be seen as one 
undividable unit. Suitable decomposition of the problem 
is thus a prerequisite for valuable retrieval of cases. 
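The effect of treating starker Band as one unit can be shown with a greedy longest-match lookup. This is only a sketch: the lexicon entries are toy data keyed on surface forms, and the function name `translate_longest_match` is hypothetical.

```python
from typing import Dict, List

# Toy lexicons (assumption): the second one stores the sequence as one unit.
WORD_LEXICON: Dict[str, str] = {"starker": "strong", "Band": "volume"}
UNIT_LEXICON: Dict[str, str] = {**WORD_LEXICON, "starker Band": "thick volume"}


def translate_longest_match(words: List[str],
                            lexicon: Dict[str, str]) -> List[str]:
    """Greedy left-to-right lookup that prefers the longest known unit."""
    out: List[str] = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):   # try the longest span first
            unit = " ".join(words[i:j])
            if unit in lexicon:
                out.append(lexicon[unit])
                i = j
                break
        else:                                # unknown word: copy it through
            out.append(words[i])
            i += 1
    return out


concatenated = translate_longest_match(["starker", "Band"], WORD_LEXICON)
as_unit = translate_longest_match(["starker", "Band"], UNIT_LEXICON)
```

The word-by-word lexicon yields the concatenation strong volume, while the lexicon that stores the sequence as a single entry yields thick volume.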
A related problem is due to the way in which the solu- 
tion of a case is related to the matching part. Partial 
similarity of a phrase and a case does not necessarily 
allow one to conclude that there are comparable similarities
between their translations. For instance, if we
know that German starke Erkältung translates into English
bad cold and German Raucher translates into English
smoker, we are likely to translate the unknown German
phrase starker Raucher into English bad smoker because
the first word of the unknown phrase is just another inflected
form of the first word in the known phrase. We
thus substitute the second word in the translation of 
6 While unification yields the least upper bound (lub), anti-unification
yields the greatest lower bound (glb) with respect
to a subsumption ordering.
the known phrase (cold) by the translation of the second
word of the unknown phrase (smoker) to obtain the
result. However, this might not always be a good solution
because heavy smoker (and not bad smoker) would
be the appropriate translation. Of course, one can argue
that stark has different readings according to the context
in which it appears so that starker Raucher and starke
Erkältung are not similar at all. However, the knowledge
concerning the appropriateness of words which can
be replaced needs to be coded in some way (see footnote 7). In the
proposed architecture exceptions are stored in the case
base and knowledge about replaceable words is extracted
(induced) from the case base. This makes possible a dynamic
graduation between regularities, subregularities and
exceptions as it occurs in natural languages.
Mere similarity of an input text and some cases in the
case base therefore does not lead to a satisfactory solution
because it tells us neither how to decompose a
text nor which parts in the retrieved cases are suitable 
for substitution. Further, from a logical point of view, 
similarity seems a useless notion because, as Goodman 
\[Goo72\] states, it cannot be measured in terms of, or 
equated with the possession of common characteristics: 
Where the number of things in the universe is n, each two
things have in common exactly $2^{n-2}$ properties out of the
total of $2^n - 1$ properties; each thing has $2^{n-2}$ properties
that the other does not, and there are $2^{n-2} - 1$ properties
that neither has. [pp. 443-444]
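Goodman's counts can be checked by brute force for a small universe. The sketch below assumes that a property is identified with its extension, i.e. a non-empty set of things; the function name `goodman_counts` is hypothetical.

```python
from itertools import combinations


def goodman_counts(n: int):
    """Count properties shared and unshared by two things in a universe of n.

    A property is identified with a non-empty set of things (assumption).
    Returns (total, shared by both, held by the first only, held by neither).
    """
    if n < 2:
        raise ValueError("need at least two things")
    things = range(n)
    a, b = 0, 1                                   # any two distinct things
    properties = [set(c) for r in range(1, n + 1)
                  for c in combinations(things, r)]
    both = sum(1 for p in properties if a in p and b in p)
    only_a = sum(1 for p in properties if a in p and b not in p)
    neither = sum(1 for p in properties if a not in p and b not in p)
    return len(properties), both, only_a, neither


counts = goodman_counts(4)   # (15, 4, 4, 3)
```

For n = 4 this gives 15 properties in total, 4 shared, 4 held by one thing but not the other, and 3 held by neither, in line with the quoted formulas. The counts are the same for any pair of things, which is why shared properties alone cannot rank similarity.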
The point here is that if two things have some proper- 
ties in common this is saying nothing more than that 
they have these properties in common i.e. that they are 
equal with respect to the common properties. However, 
which of the shared properties are more salient is analytically
intractable: it remains a matter of who makes
the comparison and when. For instance, in the examples
above, one might find weighty tome to be a better translation
for starker Band than thick volume. On the other
hand, German starker Punkt compositionally translates
into English strong point. It is a decision of the present 
system architecture not to code these decisions into the 
program or into a grammar, but to leave it in the struc- 
ture of the case base. 
Abstraction as Similarity 
Instead of having a similarity metric to classify a sentence
(or parts of it), decomposition is used as a method
for classification. In order to determine the similarity
of a complex sentence and some cases in the case base,
abstraction by means of decomposition and reduction
seems an appropriate means (see footnote 8).
7 I agree with the comment of an anonymous reviewer that
a more thorough linguistic analysis may well yield a better
performance for direct use of similarity metrics, subverting
the need for post hoc adaptations.
The problem is: how can you know when you have done sufficient
linguistic analysis without reference to the data?
8 A similar idea can be found in [CC96]. However, in
their approach sentences undergo a (rule-driven) syntactic
analysis.
Abstractions are induced from the input sentence based 
on cases in the case base. The less an input sentence is 
known to the system, the more abstractions are needed 
for the sentence to be matched onto the case base. However,
the more abstractions are performed, the greater
will be the dissimilarity between the sentence and the
matching case(s). Accordingly, it is stressed in [BW96]
that the significance of similarity between a problem and
a set of cases is more important the less abstract the
cases are.
In the proposed system architecture a sentence is decom- 
posed into a set of chunks according to the available cases 
in the case base. Chunks which share all their proper- 
ties with a case are reduced and the sequence of reduced 
and unreduced chunks (i.e. the abstraction of the origi- 
nal sentence) is, again, decomposed and matched against 
the case base until no more decomposition and reduction 
is possible. In a number of steps, a sentence of length m
is thus classified according to the available cases in the
case base into at most 2m chunks.
A sentence is regenerated from an abstraction by specify- 
ing the reduced chunks and their subsequent refinement. 
This is repeated until the produced sequence contains no 
more reduced chunks. 
Abstraction (i.e. decomposition and reduction) and gen- 
eration (i.e. specification and refinement) are possible if 
the following criteria hold for the matching chunks in 
the abstraction process and the reduced chunks in the 
generation process: 
• Chunks are independent with respect to some 'fixed'
features. The fixed values of one chunk do not affect
the fixed values of another chunk.
• Chunks are adaptable with respect to some 'variable'
features. The set of variable features for each chunk
reflects its inter-chunk dependencies.
In order to translate the French sentence Le boucher sale
la tranche into English The butcher salts the slice according
to the classification 2 above, we need the case base to
contain the two concrete cases 4 and 5 and the abstract
case 6:
4 le boucher ↔ the butcher
5 la tranche ↔ the slice
6 X sale Y ↔ X salt Y
Based on the examples, the French sentence is decomposed
into the three chunks c1: le boucher, c2: sale and
c3: la tranche. By reducing the (matching) chunks c1
and c3 the abstraction c1 sale c3 then matches case 6.
The adaptation mechanism subsequently re-specifies and 
refines the solutions sl and s3 of the reduced chunks cl 
and c3. Specification consists in replacing sl in the ab- 
straction by the butcher and s3 by the slice. Refinement 
adapts the variable features such that the main verb salt 
agrees in number and person with the chunk inserted in 
position X. Note that the required adaptation complex- 
ity corresponds to figure 2 above. 
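The decomposition, reduction, specification and refinement steps of this example can be traced with a minimal Python sketch. It is a toy under stated assumptions, not the CBAG implementation: the dictionaries `CONCRETE` and `ABSTRACT`, the number feature attached to each concrete case, and the rule that agreement is realised by appending "s" are all hypothetical simplifications.

```python
from typing import Optional, Tuple

# Concrete cases 4 and 5: source chunk -> (target chunk, number feature).
# The number feature is an assumption so that refinement has something to use.
CONCRETE = {
    "le boucher": ("the butcher", "sing"),
    "la tranche": ("the slice", "sing"),
}
# Abstract case 6: X sale Y <-> X salt Y (the verb is stored uninflected).
ABSTRACT = {("X", "sale", "Y"): ("X", "salt", "Y")}


def decompose(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Split into (c1, c2, c3) such that c1 and c3 are known concrete cases."""
    words = sentence.lower().split()
    for i in range(1, len(words)):
        for j in range(i + 1, len(words)):
            c1 = " ".join(words[:i])
            c2 = " ".join(words[i:j])
            c3 = " ".join(words[j:])
            if c1 in CONCRETE and c3 in CONCRETE:
                return c1, c2, c3
    return None


def translate(sentence: str) -> Optional[str]:
    chunks = decompose(sentence)
    if chunks is None:
        return None
    c1, c2, c3 = chunks
    target = ABSTRACT.get(("X", c2, "Y"))     # reduction: c1 -> X, c3 -> Y
    if target is None:
        return None
    (s1, number), (s3, _) = CONCRETE[c1], CONCRETE[c3]
    # Refinement (toy rule): the verb agrees with the chunk in position X.
    verb = target[1] + ("s" if number == "sing" else "")
    return " ".join([s1, verb, s3])           # specification of X and Y


output = translate("Le boucher sale la tranche")
```

The sketch returns None for any sentence whose middle chunk has no matching abstract case, which mirrors the requirement that the case base must cover the chunks.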
In this translation the number of decompositions is 3. 
Another chunking is possible if the case base allows it. 
For the case base 7 below, the number of decompositions 
equals 1: the granularity of decomposition thus relies on
the structure of the case base. 
7 le boucher sale la tranche ↔
  the butcher salts the slice
Note that by means of case 4, 5 and 7, the abstract case 
6 can be induced. 
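One simple way to picture this induction is to replace the known sub-chunks of a full case by shared variables on both sides. The sketch below is an assumption about how such an induction could work, not the procedure used in the system; the function name `induce_abstract` is hypothetical, matching is done on plain substrings, and the verb keeps its inflected form (salts) because no morphological analysis is applied.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def induce_abstract(full: Pair, parts: Iterable[Pair]) -> Pair:
    """Replace known sub-chunks of a full case by shared variables X, Y, Z."""
    src, tgt = full
    variables = iter("XYZ")
    for part_src, part_tgt in parts:
        # Naive substring test; a real system would match on chunk boundaries.
        if part_src in src and part_tgt in tgt:
            var = next(variables)
            src = src.replace(part_src, var, 1)
            tgt = tgt.replace(part_tgt, var, 1)
    return src, tgt


case4 = ("le boucher", "the butcher")
case5 = ("la tranche", "the slice")
case7 = ("le boucher sale la tranche", "the butcher salts the slice")
induced = induce_abstract(case7, [case4, case5])
```

The result pairs X sale Y with X salts Y; turning salts into the uninflected salt of case 6 would additionally require the lemma information stored with the cases.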
Undesirable abstraction is possible and is on the one 
hand an expression of the creativity of the system but on 
the other hand avoidable by adding further cases to the 
case base. Thus, abstraction 10 can be generated based 
on the cases 8 and 9. As outlined in the previous section, 
abstraction 10 has the potential to (wrongly) translate 
starker Punkt into heavy point. However, by adding case 
11 to the case base this is no longer possible. 
8 Raucher ↔ smoker
9 starker Raucher ↔ heavy smoker
10 starker X ↔ heavy X
11 starker Punkt ↔ strong point
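How an added concrete case blocks an undesirable abstraction can be sketched with a lookup that prefers exact cases over abstract patterns. The lookup order, the function name `translate` and the extra case Punkt ↔ point are assumptions made for the illustration; only cases 8 to 11 come from the text.

```python
from typing import Dict, Optional

CONCRETE: Dict[str, str] = {
    "Raucher": "smoker",                # case 8
    "starker Raucher": "heavy smoker",  # case 9
    "Punkt": "point",                   # assumed extra case, not in the text
}
ABSTRACT: Dict[str, str] = {"starker X": "heavy X"}   # abstraction 10


def translate(phrase: str, concrete: Dict[str, str],
              abstract: Dict[str, str]) -> Optional[str]:
    """Prefer an exact concrete case; fall back to an abstract pattern."""
    if phrase in concrete:
        return concrete[phrase]
    for src_pat, tgt_pat in abstract.items():
        prefix = src_pat.replace("X", "")             # e.g. "starker "
        if phrase.startswith(prefix):
            filler = phrase[len(prefix):]
            if filler in concrete:
                return tgt_pat.replace("X", concrete[filler])
    return None


before = translate("starker Punkt", CONCRETE, ABSTRACT)
with_case_11 = {**CONCRETE, "starker Punkt": "strong point"}   # add case 11
after = translate("starker Punkt", with_case_11, ABSTRACT)
```

Before case 11 is added the abstraction produces heavy point; afterwards the exact case wins and the result is strong point, while starker Raucher is unaffected.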
Freedom in MT system design 
In the equivalence (1) -- here reproduced as (2) -- the
decomposition granularity, the structure of the case base
and the adaptation mechanism depend on each other.
$C(S) \equiv \bigcap_{i=1}^{n} C(c_i) \equiv \bigcap_{i=1}^{n} C(s_i) \equiv C(T)$   (2)
To preserve the conceptual equivalence between a source 
sentence S and its translation T, all three components 
need to be synchronized: a certain type of decomposition 
requires an adequate case base which covers the decom- 
posed chunks and an appropriate adaptation mechanism 
which is able to re-combine the retrieved solutions into a 
target translation. However, the above equivalence contains
two degrees of freedom: one degree is related to the
number n of chunks, the other is due to the definition of
the conceptual equivalence C. Neither can be determined
analytically because they are closely related to the requirements
of a user and his expectations with regard to
an MT outcome.
• The coverage of the case base increases as the
cases get shorter. The coverage of the
system depends on the one hand on the coverage of the
case base and on the other hand on the level of abstraction
on which the chunks are matched. The coverage
of the system thus increases with finer decomposition
granularity and a high degree of abstraction.
• The reliability of the results is likely to increase as
the chunks get longer and the system
turns into a mere retrieval system of known solutions.
Reliability thus increases with coarse decomposition
granularity and a low degree of abstraction.
• The creativity of the system combines both coverage
and reliability. A high degree of creativity can thus
be reached with coarse decomposition granularity and
a high degree of abstraction.
Figure 5 shows possible realisations for MT systems. 
The horizontal axis represents the decomposition granu- 
larity; the vertical axis represents the degree of abstrac- 
tion. 
As the granularity of the decomposition becomes coarser,
the system loses coverage but the translation result will
become more reliable. The adaptation mechanism can
be very simple. Conversely, finer granularity implies better
coverage but requires a more complex adaptation
mechanism. Orthogonal to decomposition granularity
is the degree of abstraction that a system performs. The
more abstractions are performed, the less reliable will be
the outcome (see footnote 9).
Creativity is necessary for a MT system unless the do- 
main is restricted such that retrieval of already known 
translations is sufficient for the coverage. To attain a 
certain degree of creativity, the system needs to dispose 
of an appropriate degree of abstraction capacity joined 
with an appropriate decomposition granularity. 
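[Figure 5: possible realisations of MT systems. Horizontal axis: granularity of (de)composition n, from fine to coarse. Vertical axis: degree of abstraction C, from low to high. Coverage lies at fine granularity and high abstraction, creativity at coarse granularity and high abstraction, reliability at coarse granularity and low abstraction.]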
The more the system design moves into the upper left 
area (high degree of abstraction and fine decomposition 
granularity) in figure 5, the more the coverage of the 
system will increase. The more the design moves into 
the lower right area (low degree of abstraction and coarse 
decomposition granularity) the more the reliability of the
system increases. 
To reach reliability for unknown arbitrary texts, recent
approaches to NLP prefer shallow analyses that generate
flat representations (i.e. low degree of abstraction). Because
the number of possible wrong assignments in an
analysis tree grows exponentially with its depth, flat
representations offer fewer possibilities to relate constituents
and hence offer fewer possibilities to produce
wrong analyses.
To attain a certain degree of creativity, an appropriate
degree of abstraction and an appropriate degree of de- 
composition is required. What degree of creativity a user 
desires essentially depends on the variety of text types to 
be translated (i.e. the required coverage of the system) 
9 This is supported by the findings in [BDW96]: the more
abstraction is performed by a system (degree of eagerness),
the worse the generalization performance will be.
Figure 6
English Phrase Descriptor of the sentence: The big man eats a green apple
WNR 1  WD The    LMA the    CAT art
WNR 2  WD big    LMA big    CAT adj   DEG base
WNR 3  WD man    LMA man    three readings:
       CAT verb, VTP fin, TNS pres
       CAT verb, VTP infin
       CAT noun, NUM sing, CAS n;a
WNR 4  WD eats   LMA eats   CAT verb, VTP fin, TNS pres
WNR 5  WD a      LMA a      CAT art
WNR 6  WD green  LMA green  two readings:
       CAT adj, DEG base
       CAT noun, CAS n;a
WNR 7  WD apple  LMA apple  CAT noun, NUM sing, CAS n;a
German Phrase Descriptor of the sentence: Der grosse Mann isst einen grünen Apfel
WNR 1  WD Der     LMA d_art; d_rel   CAT art; rel   five readings:
       NUM sg, GEN f, CAS d;g
       NUM sg, GEN m, CAS n
       NUM plu, GEN --, CAS g
       NUM sg, GEN m, CAS n
       NUM --, GEN f, CAS g;d
WNR 2  WD grosse  LMA gross  CAT adj, NUM e*, GEN e*, CAS e*, DEG base
WNR 3  WD Mann    LMA mann   CAT noun, NUM sg, GEN m, CAS n;d;a
WNR 4  WD isst    LMA essen  CAT verb, VTP fin, TNS pres, NUM sg
WNR 5  WD einen   LMA ein    CAT art, NUM sg, GEN m, CAS a
WNR 6  WD grünen  LMA grün   two readings:
       CAT adj, NUM en*, GEN en*, CAS en*, DEG base
       CAT verb, VTP fin, TNS pres, NUM plu
WNR 7  WD Apfel   LMA apfel  CAT noun, NUM sg, GEN m, CAS n;d;a
e* and en* denote the endings (e and en) of a German adjective. They can be multiplied into a matrix of AVM containing
information on GEN, CAS, NUM and the determination class.
LMA: lemma (basic word form without inflectional information)
CAT: part-of-speech (syntactic category) (adj; adv; art; noun; punct; rel; verb)
NUM: number (sing; plu)
VTP: verb type (fin; infin)
TNS: tense (pres; past)
CAS: case (n; g; d; a)
DEG: degree of adjectives (base; comp; sup)
WNR: word number
and the expected reliability of the results. In a constant 
feed-back process, a user thus has the possibility of de- 
signing the MT system according to his requirements. 
In the remainder of this paper I give a short overview
of the CBAG¹⁰ system. It can be used as a stand-alone
MT system or can be integrated with the rule-based
machine translation system CAT2 \[Str96\]. Here, only its
basic functioning is considered. CBAG consists
of three modules: the Case Based Compilation module
(CBC), the Case Based Analysis module (CBA) and the
Case Based Generation module (CBG).
Case Structure in CBAG 
Instead of simply storing surface strings in the case base,
morphological analysis and lemmatization are carried out
and their results are added to the cases¹¹.
¹⁰ CBAG stands for Case Based Analysis and Generation.
¹¹ We use MPRO (cf. \[Maa96\]) for morphological analysis
and lemmatization. MPRO is a very powerful tool, which
generates more than 95% correct analyses for arbitrary German
and English texts.
Lemmatization yields, for a surface string, a basic word
form (lemma) that abstracts away from the inflectional
information contained in the surface form. Inflectional
information such as person, number and tense for
verbs or case and number for nouns is determined by the
use of the word and is independent of the lemma. For instance,
man and men are different instances of the same
lemma (man) that differ only with respect to number
(singular vs. plural).
The part of speech takes an intermediate position between
lexical and grammatical information. On the one
hand, the part of speech is linked to a lemma; on the
other hand, each part of speech has typical inflectional
patterns (e.g. the examples above).
The features of a word are stored in the form of sets of
attribute/value pairs (AVM). We will refer to the set of
AVM that belong to one word as a word descriptor WD.
Note that the content of each AVM is such that a single
surface string can be regenerated from it. Morphological
analysis and lemmatization are thus reversible: a surface
string can be transformed into a WD (a set of AVM)
and a surface form can be generated from an AVM.
A phrase descriptor PD is a sequence of word descriptors
WD_1 ... WD_n. A case CASE is a pair of a source
phrase descriptor PD_source and a target phrase descriptor
PD_target that are considered to be translations of
each other. A case base CB is a set of cases.
A PD can be represented as an M × N matrix where
the columns describe the words of a phrase and the rows
describe a sequence of attribute values. Figure 6¹²
reproduces the CASE of the following translation example:
12 The big man eats a green apple :
Der grosse Mann isst einen grünen Apfel
Some words are ambiguous. For instance, the surface
string man can be analyzed as a verb or as a noun, as
shown in the table in figure 6. The noun can be in the
accusative or the nominative case; the verb can be the
infinitive form or the finite present form. WD_man thus has
four interpretations, which are merged here
into three AVM.
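The definitions above can be illustrated with a small Python sketch. It is only an illustration: the class names, the field names and the helper count_interpretations are assumptions made here and are not part of CBAG, which is implemented in C. An AVM is modelled as a plain dictionary, and alternatives inside one value are separated by ';' as in figure 6.

```python
from dataclasses import dataclass

# An AVM is modelled here as a plain dict of attribute -> value (assumption).
AVM = dict[str, str]


@dataclass
class WordDescriptor:
    """WD: the set of AVM (readings) that belong to one surface word."""
    surface: str
    readings: list[AVM]


@dataclass
class Case:
    """CASE: a source PD and a target PD considered translations of each other."""
    source: list[WordDescriptor]  # PD_source
    target: list[WordDescriptor]  # PD_target


# A case base CB is a collection of cases.
case_base: list[Case] = []

# The ambiguous word "man" from figure 6, stored as three AVM.
wd_man = WordDescriptor("man", [
    {"LMA": "man", "CAT": "verb", "VTP": "fin", "TNS": "pres", "WNR": "3"},
    {"LMA": "man", "CAT": "verb", "VTP": "infin", "WNR": "3"},
    {"LMA": "man", "CAT": "noun", "NUM": "sing", "CAS": "n;a", "WNR": "3"},
])


def count_interpretations(wd: WordDescriptor) -> int:
    """Count readings after expanding ';'-separated alternatives such as 'n;a'."""
    total = 0
    for avm in wd.readings:
        n = 1
        for value in avm.values():
            n *= len(value.split(";"))
        total += n
    return total
```

With this representation, the noun reading with CAS n;a stands for two interpretations, so the three AVM of wd_man expand to the four interpretations mentioned in the text.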
Decomposition in CBAG 
Decomposition is example-driven and divides a PD into
a set of chunks. I distinguish between two ways of de- 
composition: 
Horizontal decomposition divides the PD matrix 
into a set of 'fixed' (lexical) features and into a set of 
'variable' (grammatical) features. This division accounts 
for the distinction between lexical and grammatical in- 
formation inherent in every sentence. 
Agreement within a noun phrase is a grammatical phe- 
nomenon. The corresponding features are thus part of 
the set of variable features. In German, for instance, 
the determiner, the adjective and the noun in a noun 
phrase have to agree in number, gender and case, while 
the actual lexical fillers of this syntactic schema may 
vary. 
The set of variable features comprises all features that
can be altered by a different context (e.g. number and
case for nouns, gender, tense, person for verbs, etc.).
The set of fixed features comprises the lemma and the 
part of speech. 
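A minimal sketch of horizontal decomposition, assuming the dictionary representation of an AVM and the attribute names of figure 6. The function name split_horizontally is hypothetical. Only the choice of LMA and CAT as fixed features is taken from the text.

```python
# Lemma and part of speech are the fixed (lexical) features named in the text.
FIXED_ATTRIBUTES = {"LMA", "CAT"}


def split_horizontally(avm):
    """Return the (fixed, variable) feature dictionaries of one AVM."""
    fixed = {k: v for k, v in avm.items() if k in FIXED_ATTRIBUTES}
    variable = {k: v for k, v in avm.items() if k not in FIXED_ATTRIBUTES}
    return fixed, variable


# "man" and "men" differ only in their variable features.
man = {"LMA": "man", "CAT": "noun", "NUM": "sing", "CAS": "n;a"}
men = {"LMA": "man", "CAT": "noun", "NUM": "plu", "CAS": "n;a"}

fixed_man, variable_man = split_horizontally(man)
fixed_men, variable_men = split_horizontally(men)
```

Both words yield the same fixed part, which therefore needs to be stored only once; the difference between them lies entirely in the variable part.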
Vertical decomposition divides the PD matrix into 
a sequence of chunks. Vertical decomposition accounts 
for the compositionality of languages. In many contexts,
for example, the English noun phrase the man would
be translated into German as der Mann. Most sentences
can be considered as being composed of a sequence of
chunks that, to a certain extent, can be translated independently.
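Vertical decomposition can be sketched as follows. The matching strategy is not specified at this point in the text, so the greedy longest-match search below is an assumption, as are the function name, the use of lemma tuples as keys and the case indices 0 and 1.

```python
def decompose_vertically(lemmas, case_base):
    """Split a lemma sequence into chunks, preferring the longest stored case.

    case_base maps a tuple of source lemmas to a case index. A word that
    starts no stored case becomes a single-word chunk with index None.
    """
    chunks = []
    i = 0
    while i < len(lemmas):
        # Try the longest candidate starting at position i first, then shrink.
        for j in range(len(lemmas), i, -1):
            key = tuple(lemmas[i:j])
            if key in case_base:
                chunks.append((key, case_base[key]))
                i = j
                break
        else:
            # No stored case starts here: keep the word as its own chunk.
            chunks.append(((lemmas[i],), None))
            i += 1
    return chunks


case_base = {("the", "big", "man"): 0, ("a", "green", "apple"): 1}
sentence = ["the", "big", "man", "eat", "a", "green", "apple"]
chunks = decompose_vertically(sentence, case_base)
```

For this input the sketch yields three chunks: the two noun phrases, each paired with the index of its case, and the verb as a chunk without a matching case.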
The reasoning behind the double decompositions is as 
follows: 
¹² For reasons of space, not all features are given in the
matrix.
• reduce the size of the case base: Each set of values
needs to be stored only once. Thus man and men
are instances of the same lemma, which only needs to
be stored once¹³. Vertical decomposition reduces the
size of the case base more dramatically: if sentences 
are considered to consist of words which are grouped 
into compositionally translatable chunks, only those 
groups of words must be stored as cases in the case 
base which do not allow for compositional translation. 
• reduce retrieval time: With a smaller case base the 
retrieval time of cases should also decrease. 
• increase coverage of the system: Cases that can be 
recomposed to complex solutions can occur in different 
contexts. These cases thus cover several problems. 
• make case abstraction possible: While fixed features
are specific to a chunk, variable features are typical
of it. Case abstraction consists in abstracting away
from the specificities of a chunk while keeping track of
the variable (thus typical) features.
Abstraction in CBAG 
Case abstraction is crucial for attaining broader coverage
of the system and for accounting for interdependencies
between chunks. In the translation phase, abstractions are
computed from the input PD_source in order to match
abstractions in the case base.
In the compilation phase of the case base, abstract cases
are computed from the cases that are in the case base.
Those chunks that match a case are reduced to a chunk
descriptor CD, which consists of a set of variable features
and a chunk index. To reduce a chunk, the head
information is extracted from it, where head information
is made up of those features that are necessary and
sufficient to express inter-chunk dependencies such as
agreement. The index of the chunk is stored with the
head information in the CD. A sequence of reduced
and unreduced CDs (an abstract case) may, again, be
decomposed and matched against the case base.
In the following example, the case base contains two
cases:
13 the big man ↔ der grosse Mann
14 a green apple ↔ ein grüner Apfel
As shown in figure 7, the English sentence The big
man eats a green apple is decomposed into the three
chunks /The big man/, /eats/ and /a green apple/. By
abstraction, the sequence CD1 CD2 CD3 is generated,
where CD1 and CD3 represent reduced chunks that include
information on the type of constituent (such as
gender, number etc.) and the index of the matching
case. If a WD cannot be integrated into a chunk -- as
is the case for eat in the example -- it is integrated as
an unreduced chunk into the abstract case.
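The reduction step can be sketched in Python as follows. This is a hedged illustration, not the KURD-based implementation: the dictionary layout of a CD, the function names, the choice of the last word as head and the choice of NUM and CAS as head information are all assumptions. The example numbers 13 and 14 are reused as case indices for readability.

```python
def reduce_chunk(avms, case_index, head_position, head_attributes=("NUM", "CAS")):
    """Reduce a matched chunk to a chunk descriptor (CD).

    The CD keeps the head's variable features and the index of the matching
    case. Head position and head attributes are assumptions of this sketch.
    """
    head = avms[head_position]
    features = {a: head[a] for a in head_attributes if a in head}
    return {"reduced": True, "index": case_index, "head": features}


def unreduced(avm):
    """Wrap a word that matched no case as an unreduced chunk."""
    return {"reduced": False, "words": [avm]}


the_big_man = [
    {"LMA": "the", "CAT": "art"},
    {"LMA": "big", "CAT": "adj", "DEG": "base"},
    {"LMA": "man", "CAT": "noun", "NUM": "sing", "CAS": "n;a"},
]
eats = {"LMA": "eat", "CAT": "verb", "VTP": "fin", "TNS": "pres"}
a_green_apple = [
    {"LMA": "a", "CAT": "art"},
    {"LMA": "green", "CAT": "adj", "DEG": "base"},
    {"LMA": "apple", "CAT": "noun", "NUM": "sing", "CAS": "n;a"},
]

# The abstract case CD1 CD2 CD3 for "The big man eats a green apple".
abstract_case = [
    reduce_chunk(the_big_man, 13, head_position=2),
    unreduced(eats),
    reduce_chunk(a_green_apple, 14, head_position=2),
]
```

The resulting sequence contains two reduced chunks and one unreduced chunk, mirroring CD1, CD2 and CD3 in the text.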
Abstract cases are generated in a precompilation step as 
shown in figure 9. A new case base CB is incrementally 
created starting from an ordered set of examples. Based 
¹³ This is even more important for highly inflecting languages
and paradigms, for example for Romance verbs,
which can have up to 40 different surface forms depending
on person, number, modality, aspect and tense.
Figure 7
Decomposition and reduction: The English PD the big man eats a green apple is decomposed and reduced into the
abstraction CD1 CD2 CD3.
CD1 <- WD_the WD_big WD_man
CD2 <- WD_eat
CD3 <- WD_a WD_green WD_apple
Figure 8
Specification and refinement: The abstraction CD1 CD2 CD3 is specified and refined into the German PD Der
grosse Mann isst einen grünen Apfel.
CD1 CD2 CD3
Algorithm to induce abstract cases. 
Sort examples E by number of words 
CB <- empty 
for all examples E /* shortest first */ 
begin 
V<- TRUE 
while V = TRUE 
begin 
A <- reduce(decomp(E,CB)) 
add E to CB 
if valid(A) 
then E <- A 
else V <- FALSE 
end 
end 
on partial matches of the example E and the case base
CB, first an abstraction A is calculated. Then the original
concrete example is added to the case base. This order
is important because the abstractions should not rely on
the same example.
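The loop of figure 9 can be written as a short Python sketch. The function induce_case_base follows the figure; decomp, reduce and valid are passed in by the caller, and their signatures are assumptions. The three toy functions are hypothetical, monolingual stand-ins that exist only to make the sketch runnable. In particular, toy_decomp returns None when nothing matches, so that the inner loop terminates.

```python
def induce_case_base(examples, decomp, reduce, valid):
    """Build a case base from examples, shortest first (cf. figure 9)."""
    case_base = []
    for example in sorted(examples, key=len):
        current = example
        while True:
            # The abstraction is computed before `current` is stored, so it
            # cannot be derived from the example itself.
            abstraction = reduce(decomp(current, case_base))
            case_base.append(current)
            if not valid(abstraction):
                break
            current = abstraction
    return case_base


def toy_decomp(example, case_base):
    """Replace the first strictly shorter stored case found in `example`."""
    for index, case in enumerate(case_base):
        n = len(case)
        if n >= len(example):
            continue
        for i in range(len(example) - n + 1):
            if example[i:i + n] == case:
                return example[:i] + (("CD", index),) + example[i + n:]
    return None  # nothing matched


def toy_reduce(decomposed):
    """Identity: the toy decomposition already yields the reduced form."""
    return decomposed


def toy_valid(abstraction):
    """Accept any abstraction that was actually produced."""
    return abstraction is not None


examples = [("the", "man", "sleeps"), ("the", "man")]
cb = induce_case_base(examples, toy_decomp, toy_reduce, toy_valid)
```

In this toy run the shorter example is stored first, the longer one is then abstracted against it, and the abstraction is stored as a further case.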
Abstractions are calculated in two steps: decomposition 
decomp and reduction reduce. An abstraction is valid 
if: 
1. it contains at least one unreduced chunk
2. it contains at least one reduced chunk on both the source
and the target language side
3. its decomposition is based on the same cases on both language
sides
4. it is validated by some other cases
Requirements 1 and 2 express well-formedness conditions
on abstractions. A case that consists only of unreduced
CDs is not an abstract case because it contains
no reductions. An abstraction that consists only of reduced
chunks is not stored in the case base because it
will never be matched by other cases. Requirement
3 accounts for the fact that abstractions are generalizations
of regularities contained in the concrete case. A
source and a target sentence that are decomposed in the
same way are likely to be regular with respect to that decomposition.
Requirement 4 prevents exceptions from serving as a
basis for abstractions, as is the case in the heavy smoker
example 9. Finer-grained criteria on this matter
can be found in \[CC96\].
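Requirements 1 to 3 can be checked mechanically, as the following sketch suggests. It assumes that each language side of an abstraction is a list of chunk descriptors with a boolean 'reduced' flag and, for reduced chunks, a case 'index'; these names are hypothetical. Requirement 1 is read here as applying to the abstraction as a whole, which is one possible interpretation. Requirement 4 needs access to other cases and is left out.

```python
def is_well_formed(source_cds, target_cds):
    """Requirements 1 and 2: some chunk is unreduced, both sides have a reduced one."""
    all_cds = source_cds + target_cds
    has_unreduced = any(not cd["reduced"] for cd in all_cds)
    reduced_on_both = (any(cd["reduced"] for cd in source_cds)
                       and any(cd["reduced"] for cd in target_cds))
    return has_unreduced and reduced_on_both


def same_cases(source_cds, target_cds):
    """Requirement 3: both sides were decomposed with the same cases."""
    def indices(cds):
        return sorted(cd["index"] for cd in cds if cd["reduced"])
    return indices(source_cds) == indices(target_cds)


r13 = {"reduced": True, "index": 13}
r14 = {"reduced": True, "index": 14}
verb = {"reduced": False}

source_side = [r13, verb, r14]
target_side = [r13, verb, r14]
ok = is_well_formed(source_side, target_side) and same_cases(source_side, target_side)
```

An abstraction made only of reduced chunks, or only of unreduced ones, fails the first check; two sides that refer to different cases fail the second.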
Generation 
Generation performs the reverse task of abstraction. To
generate a target language PD from an abstraction, the
abstract case is first specified and then refined into a
sequence of lower-level CDs. This is repeated until the
sequence contains no more reduced CDs, i.e. until it is the
concrete PD_target. Specification expands the index of a chunk
descriptor by retrieving from the case base the solutions
(i.e. translations) of the corresponding cases. Refinement
substitutes the variable features in the specified CDs
according to the head information.
In figure 8 the chunk descriptor CD1 is specified
into the sequence WD_Der WD_gross WD_Mann. The
chunk descriptor CD3 is specified into the sequence
WD_ein WD_grün WD_Apfel according to the cases in the
case base given in the last section. While refining CD1,
the corresponding WDs are transformed into nominative
singular because the first chunk is the subject of the
phrase. CD3 is refined into accusative singular because
this chunk is the (accusative) object of the phrase.
The sequence of WDs can then be sent to a morphological
generation module, which calculates the appropriate
surface strings.
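Specification and refinement can be sketched as follows for a single level of reduction. The data layout, the function names and the way head information is passed in are assumptions; in particular, the sketch takes the target-side head features (nominative for the subject, accusative for the object) as given and does not compute them. Nested abstractions would require repeating the step until no reduced CD is left.

```python
def specify(cd, case_base):
    """Replace a reduced CD by copies of the target WDs stored for its case."""
    return [dict(wd) for wd in case_base[cd["index"]]]


def refine(wds, head):
    """Overwrite the variable features a WD carries with the head values."""
    return [{**wd, **{k: v for k, v in head.items() if k in wd}} for wd in wds]


def generate(abstract_case, case_base):
    """Turn a target-side abstract case into a flat sequence of WDs."""
    target = []
    for cd in abstract_case:
        if cd["reduced"]:
            target.extend(refine(specify(cd, case_base), cd["head"]))
        else:
            target.extend(cd["words"])
    return target


# Target sides of the two cases, keyed by case index (values are illustrative).
case_base = {
    13: [{"LMA": "d_art", "CAT": "art", "NUM": "sg", "GEN": "m", "CAS": "n"},
         {"LMA": "gross", "CAT": "adj", "DEG": "base"},
         {"LMA": "mann", "CAT": "noun", "NUM": "sg", "GEN": "m", "CAS": "n;d;a"}],
    14: [{"LMA": "ein", "CAT": "art", "NUM": "sg", "GEN": "m", "CAS": "n"},
         {"LMA": "grün", "CAT": "adj", "DEG": "base"},
         {"LMA": "apfel", "CAT": "noun", "NUM": "sg", "GEN": "m", "CAS": "n;d;a"}],
}
essen = {"LMA": "essen", "CAT": "verb", "VTP": "fin", "TNS": "pres", "NUM": "sg"}
abstract_case = [
    {"reduced": True, "index": 13, "head": {"NUM": "sg", "CAS": "n"}},
    {"reduced": False, "words": [essen]},
    {"reduced": True, "index": 14, "head": {"NUM": "sg", "CAS": "a"}},
]
target_pd = generate(abstract_case, case_base)
```

The resulting WD sequence carries nominative singular in the first chunk and accusative singular in the last one, and could then be handed to a morphological generator to produce the surface strings.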
Implementation 
The system as described is implemented in C; it is compiled
with GNU C and runs on Sun machines. It consists of several
programs that are connected via (Unix) pipes. KURD
(cf. \[CSW98\] in this volume), a constraint-based
formalism for shallow parsing, is used for chunk reduction
and chunk refinement. Another program is used as a
database system for quick retrieval and decomposition
of attribute value matrices as described above.

References
Peter F. Brown, J. Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, F. Jelinek, Robert L. Mercer, and P. S. Roossin. A statistical approach to machine translation. Computational Linguistics, 1990.
Antal van den Bosch, Walter Daelemans, and Ton Weijters. Morphological analysis as classification: an inductive-learning approach. NeMLaP, 1996.
Ralf D. Brown. Example-based machine translation in the Pangloss system. COLING 1996.
R. Bergmann and W. Wilke. On the role of abstractions in case-based reasoning. European Conference on Case-Based Reasoning. 1996.
Brona Collins and Padraig Cunningham. Adaptation guided retrieval in EBMT: A case-based approach to machine translation. Advances in CBR, LNAI, 1996.
Michael Carl and Antje Schmidt-Wigger. Shallow postmorphological processing with KURD. Proceedings of NeMLaP CoNLL. 1998.
Bonnie Jean Dorr. Machine translation: a view from the lexicon. MIT Press, 1993.
Heinz von Foerster, ed., Environmental design research. Dowden, Hutchinson and Ross, 1973.
Heinz von Foerster. Das Konstruieren einer Wirklichkeit. In Paul Watzlawick, ed., Die erfundene Wirklichkeit, 1986.
Nelson Goodman. Seven strictures on similarity. In Problems and Projects, 1972.
Matthias Heyn. Integrating machine translation into translation memory systems. EAMT, 1996.
Alan Hutchinson. Metrics on terms and clauses. ECML 1997.
Heinz-Dieter Maas. MPRO - Ein System zur Analyse und Synthese deutscher Wörter. In Roland Hausser, ed., Linguistische Verifikation, Sprache und Information. 1996.
Sergei Nirenburg, S. Beale, and C. Domashnev. A Full-text experiment in example-based machine translation. International conference on new methods in language processing. 1994.
E. Plaza. Cases as terms: a feature term approach to the structured representation of cases. First international conference ICCBR 1995.
Michael M. Richter. The knowledge contained in similarity measures. 1995.
Siegfried J. Schmidt. Gedächtnis: Probleme und Perspektiven der interdisziplinären Gedächtnisforschung. 1991.
S. Sato and M. Nagao. Towards memory based translation. COLING 1990.
Oliver Streiter. Linguistic modeling for multilingual machine translation. Informatik. 1996.
D. Wettschereck and T. G. Dietterich. An experimental comparison of the nearest-neighbor and nearest-hyperrectangle algorithms. Machine Learning, 1995.
Peter Whitelock and Kieran Kilby. Linguistic and computational system design. Computational Linguistics, 1995.
