TGE: Tlinks Generation Environment. 
Alicia Ageno, Francesc Ribas ~ , German Rigau:, I loracio Rodrfguez, Anna S~nniotou. 
Dep~tament de l~leugnatgcs i Sistcmes hfform:'ttics. Universitat Politb~cnica de Cat~dunya. 
Pan (',argallo 5, 08028 Barcelona. Spain. horacio@Isi.upc.es 
Abstract 
This paper describes the enhancements made, within a 
unification framework, based on typed feature 
structures, in order to support linking of lexical 
entries to their translation equivalents. To help this 
task we have developed an interactive environment: 
TGE. Several experiments, corresponding to rather 
"closed" semantic domains, have been developed in 
order to generate lexical cross-relations between 
English and Spanish. 
Keywords 
Lexicons, electronic dictionaries, machine translation. 
1 Introduction 
Recently, several approaches have been made to extend 
lexical unification-based formalisms to deal with 
multilinguistie phenomena in order to be used in 
applications such ,as Machine Translation \[7\]. 
Within Acquilex IP Project, a unification framework 
based on typed feature structures \[4\] was ddveloped, the 
LKB (Lexical Knowledge Base), in order to represent 
conceptual units corresponding to lexieal senses, lexical 
and phrasal rules, multilingual rclalionships, elc. 
This paper describes the enhancements made, to the LKB 
system \[6\], in order to support linking of lexical entries to 
their translation equivalents. The organisation of the paper 
is as follows: Section 2 presents the motivations and 
formalisation of tlinks (for "translation links"). Section 3 
deals with TGE (Tlinks Generation Environment), the 
way we propose to help in constructing lexical linkages 
semi-automatically from LKB data and bilingual 
dictionaries \[13\], \[8\], loaded in the LDR (Lexical Data 
Base) environment \[5\]. Section 4 shows the use of'l'GE 
1 This researcher has been snported by a grant o.f the 
Departament d'Ensenyament (~\]'Generalitat de Catahmya. 91- 
DOCG-1491. 
2 This researcher has been snported by o grant o/" the 
Ministerio tie Educacidn y Ciencia. 92.BOF. 16392. 
3 AcquilexH EC Esprit project BRA 7315. 
within SEISD 4 \[1\] (Sistema de extracci6n de 
Informacidn Sem~ntica de Diccionarios). In section 5 
some experimental results ,are presented. Finally in section 
6 we present our conclusions ,and furtl~er lines of research. 
2 Tlinks 
The initial assumption was that the basic units for 
defining lexical translation equivalence should be the 
lexical entries in the monolingual LKBs, which should, in 
general, correspond to word senses in the dictionary. 
Although in the simplest cases we c,'m consider the lexical 
entries themselves as translation equiv,-dent, in general 
more complex cases occur corresponding to lexical gaps, 
differences in morphologic or lexical features, specificity, 
etc. \[11\]. 
We will therefore represent the relationships between 
words in terms of tlinks. The tlink mechanism is general 
enough to allow the monolingual information to be 
augmented with translation specific information, in a 
variety of ways. We will first describe the tlink 
mechanism in the LKB and then outline how some of 
these more complex equivalences can be represented. 
The LKB formalism uses a typed feature structure (FS) 
system for representing lexical knowledge. So we can 
define tlinks in terms of relations between FSs. Lexical 
(or phrasal) transformatious in both source and target 
languages ~ are a desirable capability so that we can state 
that a tlink is essentially a relationship between two rules 
(of the sort already defined in the LKB) where the rule 
inputs have been instantiated by the representations of the 
word senses to be linked. 
As shown in fig 1, furniture can be encoded as 
translation equivalent to the plural muebles by specifying 
that the named rule plural has to be applied to the base 
sense in Spanish. As any other LKB object a tlink can be 
represented as a feature structure, as shown in fig 2. The 
type system mechanism, in the LKB, allows further 
4SEISD is an interactive environment built within Acquilex 
project in order to help in constructing the LKB entries from 
the LDB sources. 
5 in fact tlinks are undirected relations. 
324 
identity 
<fs0:l> <fs0:0> 
furniture furniture 
tlink plural 
<fsl:0> <fsl:l> 
muebles mueble 
Figt~re 1: A tlink #etween furnit,tre anti muebles. 
fl 
~ tlink (top) 
< fsO > = rule 
<fsl>=rule 
< fs0:0 : sere : id > = < fsl : 0 : sem : id >. 
simple-flink (tlink) partial-tlink (simple-tlink) 
<fs0:0>=<fs0:l > <fsl :0:rqs > = < fs0:O:rqs>. 
< fsl:0> =< fsl :1 >. 
,op \] 
I rule <0> = sign 
<1> = sign. 
,,, 
plmksal41ink (tlink) 
< fsI > = grammar-rule. 
Figttre 2." partial view of our tlink type hierarchy. 
refinEmEnt and differentiatioq of tlink classes ill several 
ways. A simple-tlink is applicable whenever two 
lexical entries which denote single place predicates (nouns, 
etc.) arEstraighfforwardly an equivalent translation, without 
any previous transformation. Thus, assuming that the 
LDOCE \[9\] sense absinth L 0 1 is translatio,l equivalent 
to the VOX \[12\] sense absenta X I 1, we will have tile 
next tlink: 
simple tlink 
<fs0:1 >==absinth L_0_I 
<fst : 1 >~absenta X I 1. 
The "syntactically sug~tred" version, which appears in 
tlink files, is: 
absinth L 0 1 / absenta X t 1 : 
simple-tlink. 
A partial tlink is applicable when we want to transfer 
the quidia structure from one sense to anc)ther, and a 
phrasal tlink is necessary when we need to describe a 
single word equivalent translation to a phrase \[I0\]. 
3 TGE: Tlinks Generation Environment 
The establisment of tlinks can be obviously perlbrmed 
manually, but the multiplicity of possible cases and the 
existence of several Knowledge St)nrces (such as bilingual 
dictionaries, monolingt,al LDBs, or a mtddlingual \[.KB) 
allows and motivates the (parlizd) antomalizalion of the 
process. To help in perl'orn~ing such a task wc have 
developed an interactive environment: T(;I.\]. 
TGE has been implemented using a Production Rules 
approach. This approach was already used within the 
SEISD enviromnent ,'rod was mainly motivated by the 
need of providing a flexible and open way of defining tlink 
formation mechanisms. The core of TGE is PRE 
(production rules Environment), a rnle-orieuted general 
purpose interpreter \[2\]. PPd~ follows the philosophy of 
most Production Rules Systems \[3\] but is deeply 
adapted to Natural Language applications. PRE offers a 
powerful (according to both expressiveness and 
performance) rule applicatio,I mecljanism and provides the 
possibilities of defining higher level mechanisms, such as 
rulescts (allowing inheritance capabilities) ,and of choice 
,'unong control strategies, either usEr-definEd or provided by 
the system. Consider the following example: 
(rule rule-l-all 
ruleset all 
control forever 
priority l 
(translation-in "trans-records (?translation *rest)) 
-> 
(modify 1 "trans-records (*rest)) 
(create a'anslation 
^trans-psorts nil 
^trans-record ?translation 
^tlink-type nil ^checked nil)) 
In this rule the pattern-condition is the occurrence of an 
object named translation-in ill the Working Memory. 
This object must ill tUrn contain a ^trans-records 
attribute whose value will be matched against file pattern 
(?translation *rest). If the matching succeeds then 
.:/2.5 
translation will be unified with the first element of tile 
list and rest with the remainder elements. The action part 
of the rule consists of two actions. The t'ormer is the 
modification of translation-in, popping its first 
element, and the latter performs the creation of another 
object, named translation. Rule-l-all rule is applied 
until all the objects n,-uned "translation-in" have emptied 
the list contained in their slot "trans-records. 
4 Using TGE for generating Tlinks 
The TGE may be considered a toolbox attd thus, it 
doesn't impose a fixed methodological strategy. Whatever 
methodology we follow, several decisions must be taken: 
the kind of control we need, the rulesets to be designed, 
"the rules belonging to each ruleset, the relative priority 
assigned to each rule ,and so on. 
As regards the control strategy, one of tile following four 
alternatives may be chosen lbr each source entry (see \[2\] of 
\[10\] for futher details): 
• All, which executes all tile rulesets. From tile 
proposed Oinks, finally the user chooses the correct ones. 
• Collect, which executes the rulesets one at a time 
and provides the results to the user (for selection of the 
correct ones) every time a ruleset succeeds. 
• One-by-one, which orderly executes the rulesets 
,and stops as soon as one of them succeeds. 
• Select, which only executes the rulcsets that the user 
chooses. 
An initial set of modules was designed according to tile 
typology of tlinks presented so far. It included four sorts 
of Oinks that showed distinct conceptual correspondences 
between both languages. A more in-depth study of 
English-Spanish mismatcfies \[11\] might lead to an 
enrichment of the typology, and co,lsequeully to a need Ibr 
extending the extant modules. 
Up tO now seven modules, each one iln\[}lumeulcd ,'is a 
ruleset, have been developed. Fach of them generates one 
out of/he three kinds of tlinks stated above. Each module 
follows a different strategy to guess a possible tlink, 
taking account of the lhree accessible knowledge sources. 
• Simple Tlink Module, this is tile case when there 
exists a direct translation of the source entry in the 
bilinguN dictionary. Ex,'unple: 
absenta x i 1 -->absenta LKB source entry 
absenta --> absinth bilingual dictionary 
absinth --> absinth L_0 I l.KBtargctcntry 
absenta x_i_I / absinth I. 0_1: 
SIMPt.E-qq-INK. 
"absenta" is translated in the bilingual dictionary by 
"absinth", ABSINTH L_0 1 is a valid lexical entry of the 
target lexicon, and therefore a SIMPLE-TLINK connecting 
both entries is created. 
• Orthographic Tlink Module, this case occurs 
when in both languages the same word with exactly the 
same spelling is used. Therelbre, no bilingual dictionary is 
needed. 
• Compound Tlink Module, this is the case when 
the corresponding entry in the target lexicon is a 
compound one, being the target lexio'd entry made up of 
the concatenation of the two English words that appear in 
the bilingual entry. 
• Phrasal Noun Tlink Module, this case takes 
place whenever the translation is the concatenation of two 
other nouns; for example, the Spanish nouns for trees 
often correspond.to two nouns in English, (like limonero - 
lemon tree, melocotonero - peach tree, etc.). More 
complex cases can be recovered by using different grammar 
rules (also implemented within the LKB formalism). 
• Parent Tlink Module, this is the case of those 
very specific terms in the source lexicon which are not 
treated in the bilingual dictionary, but whose hypcronyms 
in the taxonomy have a clear translation that can generate 
a partial tlink. 
• Grandparent Tlink Module, this is a very similar 
case to the previous one, in which the source word's 
grandpm-ent is used to produce the partial tlink. 
• General Tlink Module, this is the case when the 
translation appearing in the bilingual dictionary is 
composed of more than one word. Normally these 
explanations are made up as definitions in the form of a 
genus, plus some modifiers. A tlink connecting the source 
entry and the genus appearing in the definition must be 
crcaied. 
We will illustrate the tlink generation process with an 
example of an entry for which a number of different dinks 
have been generated, namely batido X I 5. In figure 3 
where batido_XI5 appears with the tlink options, we 
had selected file option all and subsequently, all tile 
possible tlinks have been suggested by the system. 
Ilowever the TGE allows for other selection criteria. As 
we can see iu figure 3, five tlinks are suggested by tile 
system for this p~ticular example: 
I) The first option is not a correct one. Among tile 
various translations given for the source LKB entry 
batidoX_15 the adjective shot appears. Because another 
syntactic realisation of shot is that of a noun denoting a 
drinkable thing therefore it is included ill the t,'uget subset. 
326 
r ~ File Edit Find Windows Packages fools Preferences Ldb Lk.b pro 
• i ........... 
"llinks selection mode fiLL for gflTIDO_H_l_5 
SIlOf_L_l_l 3 SII'4PLE-EL INK SIMPLE 
b41LK_SIIFIKE .... ~IN1PL.E-/LINK COMPOUND 
HILK_L_I_I +,,. PIIRRSFIL-TLINK PIIRtlSflL-NOUN 
StIR KE_L_2_3 PIIRTI flL-TI. INK GENEIIflL 
MILK_L_I_I PflltTIflL-TLINK GENEIIItL 
I n te~p~-mt i rig OME-OY-OHE 
Soot Ir~ OME-BV-OME 
I n t,~r'la¢'~ t ing SELECT 
So~" t I ~;; SELECT 
I n tew'p~l. |rig SIMPLE-TLI HK 
Sc,~ t I rw'J S,ilPLE-TL I ttK 
I n t ei+'l:x,-e '1. Ing COMPOUMO-TL I blK 
Sc~-t I ng COrIPOUMO-TL I PiK 
I nte~pc'et lag PHRflSFIL-UERB-TL UtK 
Sorting PHRRSRL-UERB-TL I ilk 
I nte~-pc-et lag PtlRRSFtL-tIOUPI-TL I ttK 
Tgl'~e hlerarch U 
RULE PERSOH 
LEXICRL-RULE GrlFltlMRR-RULE 1 2 3 
t\] 
Hultiple selection 
k; with <shift> or {i> 
; V0x Entry betide 
betide {-~kl 1 
acepci~l: I '** ~dJ. ** {teJ boo de seda} Qua r~esul ta 
con vi~ou dil;tinto~. 
ocepci6~:2 +* odj. *<' (coraillo} ttutJ ondodo tJ 
tr'l I I~aOo.- 
ocapci&:3 ** m. ++ Idano de ~ ~m hocan he,tie* tj 
b I zcochos. 
oc~p¢i&~:4 ** m. *+ CIor~w, ~Inow o humvo~ 
botido~. 
ocepcl6¢~:5 ** m. ** Oebldo qua se haee bail eyed 
htloda, lethe u otcos in~-~lient~. 
ocepci&l:5 ** m. *+ Reel on de batle. 
ocepci&l:7 ** m. ** gn Io dcmzo, loire en el qua 
los pies ~a entrechocon. 
ocepci~:8 ** m. +* Ouot. ** E~pecio de oral con 
olglmo5 #~toneio~ color~tes, U que ~e bate co~ 
chocolate. 
ocepci&l:9 ** m. *+ U~r~ez, ** tlulozo b~lido co* 
queue 0 ~\[S, 
Figure 3: OHion.r for creatiorl of tlinks . 
2) The second is a simplc-tlink type linking 
batido X I 5 with the target I.KB entry 
milk_shake L O_& In this case we lwve an example of 
the application of the compou nd-tli nk- nil eset. 
3) The third is a phrasal-tlink type, linking 
batido Xl_5 with the target LKB entries milk L I 1 and 
shekeL 2 3 composed by the + sign. This is an example 
of the application of the phrasal-noun-tlit~k-ruleset. 
4) Both the fourth and fifth :ue partial-tlinlc-types, 
linking batido_Xl5 with the target I.KB enlries 
shake L 2 3 and milk_L11 respectively. This is an 
exmnple of the application of the general-Ilink-rulesct. 
5 Results 
Several experiments correspondi ng to rather "closed" and 
narrow semm~tic domains have been performed. We discuss 
next thosc con'esl~mding It} "drinks" \[I0\]. 
The Spanish taxouomy of (lrink-n(mns, cxu'acted from 
VOX dictionary, consists of 235 noun senses atnd h:ls 5 
levels. The English taxonomy of drit|k-nouns, exlracted 
from LDOCE, consists at' 192 notre senses. Some of the 
obtained results are the following: 
• While translating f,'om Slmnish to l~lI,jlish, 223 out of 
235 drink-nouns have been linked by means of different 
aud often more than one tlinks (95 %). However, only 52 
English nouns have been linked with Spanish nouns 
(27%). Out of these 223 drink-nouns mentioned above, 
210 have been linked by using (mainly) the bilingu:d 
dictionary as a translation resource while the rest, 13 of 
them, have been linked by means of the orthographic-tlink 
ruleset and consequently, the gap of the bilingual 
dictionary has finally been bridged, for in both hntguages 
the stone word with exactly the same spelling is used. For 
example, chartreuse X l 1 aud chartrettse_.L_l O, 
sherry Xll mid sherry L 0_0, etc. 
• 74 out of 235 source I+KB entries for drink-nouns are 
also bilingual entries (31,5%). Consequently, 161 source 
I+KB entries have no corresponding bilingual entries 
(68,5%). This big gap in the bilingual dictionary is due to 
the \[act that the one nsed, VOX/llarrap's, is a very basic 
one, and as such it only contains 32,463 senses, lu 
contrast the VOX monolingual Spanish dictionary covers 
a total of 143,700 senses. 
• 30 out of tilt translations of tilt 74 source I+KI\] entries 
which were found in the bilingual dictionary are also tm'get 
I+KB entries. Consequently, the translations of 44 
bilingual entries have uo corresponding target I+KB 
entries. 
• 13 out of 16l source LKB entries arc also target LKB 
327 
entries (8 %). 
• For most entries, more than one tlink type has been 
extracted. The total number of tlinks which have been 
generated and selected for the t,~xonomy of bebida X I_3 
(drink) with the explained software is 372 tlinks. Next we 
show the different tlinks generated by each ruleset and the 
amount of lexical entries of each language involved. 
(:,) (t,) (c) 
simple-tlinks (14,5%) 55 
by simple-tlink-ruleset 41 26 31 
by compound-tlink-ruleset 1 1 1 
by orthographic- tlink-mleset 13 13 13 
phrasal-tlinks (0.5 %) 2 
by phrasal-noun-flink-ruleset 2 1 3 
partial-tlinks (85 %) 320 
by parent-tlink-ruleset 268 149 15 
by grandparent-tlink-rule set 44 30 10 
by general-tlink-ruleset 8 7 6 
(a) Total Number of Tlinks 
(b) Spanish entries 
(e) English entries. 
6 Conclusions 
In this paper we have presented TGE. an enviromnent 
designed and built in order to help in the recovery of cross- 
linguistic relations. We have reported ,and described results 
of an experiment for automatically extracting the relations 
of equivalence for Spanish and English drink-nouns by 
using the TGE software. The resulting process is semi- 
automatic, whilst the tlink generation is performed 
automatically, the selection of the desired tlinks is done 
manually. 
All the tlink-rulesets l~ave worked satisf,qclorily, 
therefore resulting in a considerable part of the subsets 
being linked (95% of the source lexicon), tlowever flmse 
PRE tlink-rulesets have only been tested over limited 
subsets of specific semantic fields. Its actual potenti:d will 
be proven in a later stage, once its application to larger 
and less restricted sets of word senses takes place, 
including also categories which ,are not considered to be 
nouns. 
7, 
\[1\] 
References 
Ageno A., Castell6n I., Martf M.A., Ribas F., 
Rigau G., Rodrfguez t1., Taul~. M., Verdejo F., 
SEISD: An enviromnent for extraction of Semantic 
In formation from on-line dictionaries. Proceedings of 
3th Conference on Applied Natural Language 
Processing. Trento. Italy. 1992. 
\[21 Ageno A., Ribas F., Rigau G., Rodrfguez II., 
Verdejo F., TGE: Tlink Generation Environment. 
Esprit BRA-7315 Acquilex II Working Paper. 1993. 
\[3\] Brownston L., Farrell R., Kant, E., Martin N., 
Programming Expert Systems in OPS5. Addison- 
Wesley. 1986. 
\[4\] Carpenter B, ~ Logic of T _vped Feature Stractures, 
Cambridge University Press, C,-unbridge, England, 
1992. 
\[5\] Carroll J. Lexical Data Base System. User Manual. 
Computer Laboratory. University of Cambridge. 
1990. 
\[6\] Copestake-A., The Acquilex LKB: represention 
issues in semi-atomatic acquisition of large lexicons. 
Proceedings of 3Oa Conference on Applied Natural 
Language Processing. Trento. Italy. 1992. 
\[7\] Copesmke, A., Jones B., Sanfilippo A., Rodriguez 
H., Vossen P., Multilingual Lexical Representation. 
Esprit BRA-3030 Acquilex Working Paper n%8. 
1992. 
\[8\] IIastings A., Rigau G., Soler C., Tuells A. Loading 
a bilingual dictionary into the LDB. Esprit BRA- 
7315 Acquilex II Working Paper. 1993. 
\[9\] Procter, P. et al. (eds). l,ongman Dictionary 9f 
Conlcr0porary Eno~!ish. Longman, IIarlow and 
London. 1987. 
\[10\] Smniotou, Anna, Performance of cross-linguistic 
equivalence relations: A lexicon-based approach. 
Msc. Dissertation. UMIST. 1993. 
\[11\] Soler, C., Dealing with Spanish-EnglistL/ English- 
Spanish mismatches. Esprit BRA 7315 Acquilex II 
Working Paper. 1993. 
\[12\] Diccionario General llustrado de la l,en~ua Esoafiola 
.Y_Q~. Ed. Biblograf S.A. Barcelona, 1987. 
\[131 VOX l larrap's Diccionario csencial lnt,.16s-gspafiol. 
\]E/~pa~ol-lmglds. Segunda Edicidn. Biblograf S.A. 
Barcelona, 1992. 
328 
