GTT : A GENERAL TRANSDUCER FOR TEACHING COMPUTATIONAL LINGUISTICS
P. Shann J.L. Cochard 
Dalle Molle Institute for Semantic and Cognitive Studies 
University of Geneva 
Switzerland 
ABSTRACT 
The GTT-system is a tree-to-tree transducer
developed for teaching purposes in machine
translation. The transducer is a specialized
production system giving linguists the tools for
expressing information in a syntax that is close to
theoretical linguistics. Major emphasis was placed on
developing a system that is user friendly, uniform 
and legible. This paper describes the linguistic 
data structure, the rule formalism and the control 
facilities that the linguist is provided with. 
1. INTRODUCTION 
The GTT-system (Geneva Teaching Transducer)1 
is a general tree-to-tree transducer developed as
a tool for training linguists in machine transla- 
tion and computational linguistics. The transducer 
is a specialized production system tailored to the 
requirements of computational linguists, providing
them with a means of expressing information in a 
format close to the linguistic theory they are 
familiar with. 
GTT has been developed for teaching purposes
and cannot be considered as a system for large
scale development. A first version has been
implemented in standard Pascal and is currently running
on a Univac 1100/61 and a VAX-780 under UNIX. At
present it is being used by a team of linguists
for experimental development of an MT system for a
special purpose language (Buchmann et al., 1984),
and to train students in computational linguistics.
2. THE UNIFORMITY AND SIMPLICITY OF THE SYSTEM 
As a tool for training computational linguists,
major emphasis was placed on developing a system 
that is user friendly, uniform, and which provides 
a legible syntax. 
One of the important requirements in machine
translation is the separation of linguistic data
and algorithms (Vauquois, 1975). The linguist
should have the means to express his knowledge
declaratively without being obliged to mix
computational algorithms and linguistic data.
Production systems (Rosner, 1983) seem particularly
suited to meet such requirements (Johnson, 1982);
the production set that expresses the object-level
knowledge is clearly separated from the control
part that drives the application of the
productions. Colmerauer's Q-system is the classic
example of such a uniform production system used for
machine translation (Colmerauer, 1970; Chevalier,
1978: TAUM-METEO). The linguistic knowledge is
expressed declaratively using the same data
structure during the whole translation process as well as
the same type of production rules for dictionary
entries, morphology, analysis, transfer and
generation. The disadvantages of the Q-system are its
quite unnatural rule syntax for non-programmers
and its lack of a flexible control mechanism for the
user (Vauquois, 1978).
1 This project is sponsored by the Swiss government.
In the design of our system the basic uniform 
scheme of Q-systems has been followed, but the
rule syntax, the linguistic data structure and the 
control facilities have been modernized according 
to recent developments in machine translation 
(Vauquois, 1978; Boitet, 1977; Johnson, 1980;
Slocum, 1982). These three points are developed
in the next section.
3. DESCRIPTION OF THE SYSTEM
3.1 Overview 
The general framework is a production system
where linguistic object knowledge is expressed in
a rule-based declarative way. The system takes the
dictionaries and the grammars as data and compiles
them; the interpreter then uses the compiled data to
process the input text. The decoder transforms the
result into a form that is readable for the user.
3.2 Data structure 
The data structure of the system is based on 
a chart (Varile, 1983). One of the main advantages 
of using a chart is that the data structure does
not change throughout the whole process of trans- 
lation (Vauquois, 1978). 
In the Q-system all linguistic data on the 
arcs is represented by bracketed strings causing 
an unclean mixture of constituent structure and 
other linguistic attributes such as grammatical 
and semantic labels, etc. With this representation 
type checking is not possible. Vauquois proposes 
two changes : 
1) Tree structures with complex labels on the nodes
in order to allow interaction between different
linguistic levels such as syntax or semantics, etc.
2) A dissociation of the geometry from a particular
linguistic level. With these modifications a single
tree structure with complex labels increases the
power of representation in that several levels of
interpretation can be processed simultaneously
(Vauquois, 1978; Boitet, 1977).
In our system each arc of the chart carries a
tree geometry and each node of the tree has a
complex labelling consisting of a possible string and
the linguistic attributes. Through the separation
of geometry and attributes, the linguist can deal
with two distinct objects: tree structures and
complex labels on the nodes of the trees.
[ string='linguist', cat=noun, gender=... ]
Figure 1. Tree with complex labelling
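The data structure described above can be pictured with a minimal Python sketch. It is not the Pascal implementation of GTT; the names Node, Arc and words_to_chart are hypothetical and chosen only for illustration. Each chart arc spans two vertices and carries one tree, and each tree node keeps its geometry (the children) apart from its complex label (an optional string plus attributes).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A tree node: geometry (children) kept apart from its complex label."""
    string: Optional[str] = None                       # the 'possible string'
    attrs: dict[str, frozenset[str]] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


@dataclass
class Arc:
    """A chart arc from vertex `start` to vertex `end`, carrying one tree."""
    start: int
    end: int
    tree: Node


def words_to_chart(words):
    """Build the initial chart: one arc per word, each with a one-node tree."""
    return [Arc(i, i + 1, Node(string=w)) for i, w in enumerate(words)]


chart = words_to_chart(["the", "linguist"])
# Attach a linguistic attribute to the label of the second word.
chart[1].tree.attrs["cat"] = frozenset({"noun"})
```

Because the attributes live in the label and not in the shape of the tree, the same chart can be handed from one grammar to the next without conversion.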
The range or kind of linguistic attributes 
possible is not predefined by the system. The lin- 
guist has to define the types he wants to use in 
a declaration part. 
e.g.:  category = verb, noun, np, pp.
       semantic-features = human, animate.
       gender = masc, fem, neut.
An important aspect of type declaration is the
control it offers. The system provides strong
syntactic and semantic type checking, thereby
constraining the application range in order to avoid
inappropriate transductions. The current implementation
allows the use of sets and subsets in the type
definition. Further extensions are planned.
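As an illustration of what such declaration-based checking amounts to, the following Python sketch validates an assignment against the declared value sets. It is only an assumption about the general idea, not the GTT compiler; the names DECLARATIONS and check_assignment are hypothetical.

```python
# Declared attribute types, copied from the declaration example above.
DECLARATIONS = {
    "category": {"verb", "noun", "np", "pp"},
    "semantic-features": {"human", "animate"},
    "gender": {"masc", "fem", "neut"},
}


def check_assignment(attribute, values):
    """Return the value set if it is a subset of the declared type."""
    if attribute not in DECLARATIONS:
        raise TypeError(f"undeclared attribute: {attribute}")
    unknown = set(values) - DECLARATIONS[attribute]
    if unknown:
        raise TypeError(f"{sorted(unknown)} not declared for {attribute}")
    return frozenset(values)


gender = check_assignment("gender", ["masc", "fem"])   # a legal subset
```

A rule that assigns an undeclared attribute or an undeclared value is rejected before it can be applied to any data.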
Given that in this system the tree geometry
is not bound to a specific linguistic level, the
linguist has the freedom to decide which information
will be represented by the geometry and which will 
be treated as attributes on the nodes. This repre- 
sentation tool is thus fairly general and allows 
the testing of different theories and strategies 
in MT or computational linguistics. 
3.3 The rule syntax
The basic tool to express object-knowledge is
a set of production rules which are similar in form
to context-free phrase structure rules, well
known to linguists from formal grammar. In order to
have the same rule type for all operations in a 
translation system the power of the rules must be 
of type 0 in the Chomsky classification, including 
string handling facilities. 
The rules exhibit two important additions to
context-free phrase structure rules:
- arbitrary structures can be matched on the
left-hand side or built on the right-hand side, giving
the power of unrestricted rules or
transformational grammar;
- arbitrary conditions on the application of the
rule can be added, giving the power of a context
sensitive grammar.
The power of unrestricted rewriting rules makes
the transducer a versatile instrument for
expressing any rule-governed aspect of language, whether
this be morphology, syntax or semantics. The fact
that the statements are basically phrase structure 
rules makes this language particularly congenial 
to linguists and hence well-suited for teaching 
purposes. 
The format of rules is determined by the
separation of tree structure and attributes on the
nodes. Each rule has three parts: geometry,
conditions and assignments, e.g.:
RULE1
a + b => c(a,b)                          (geometry)
IF cat(a) = [det] and cat(b) = [noun]    (conditions)
THEN cat(c) := [np];                     (assignment)
The geometry has the standard left-hand side,
production symbol (=>), and right-hand side of a
production rule. a, b, c are variables describing the
nodes of the tree structure. The '+' indicates the
sequence in the chart, e.g. a+b :
    o-----a-----o-----b-----o
Tree configurations are indicated by bracketing,
c(a,b) corresponds to :
      c
     / \
    a   b
Conditions and assignments affect only the objects
on the nodes.
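The three parts of RULE1 can be traced in a small Python sketch. This is a hand-written rendering of that single rule under simplifying assumptions (arcs as (start, end, tree) tuples, nodes as dictionaries); make_node and rule1 are hypothetical names, and the real system compiles rules instead of hard-coding them.

```python
def make_node(string=None, children=(), **attrs):
    """Build a node: geometry in 'children', complex label in 'attrs'."""
    return {"string": string,
            "children": list(children),
            "attrs": {k: frozenset(v) for k, v in attrs.items()}}


def rule1(chart):
    """a + b => c(a,b) IF cat(a)=[det] and cat(b)=[noun] THEN cat(c):=[np]."""
    new_arcs = []
    for (s1, e1, a) in chart:
        for (s2, e2, b) in chart:
            if e1 != s2:                           # geometry: b follows a
                continue
            if a["attrs"].get("cat") != {"det"}:   # condition on a
                continue
            if b["attrs"].get("cat") != {"noun"}:  # condition on b
                continue
            c = make_node(children=[a, b], cat={"np"})   # assignment
            new_arcs.append((s1, e2, c))           # new arc spans a and b
    return new_arcs


chart = [(0, 1, make_node("the", cat={"det"})),
         (1, 2, make_node("man", cat={"noun"}))]
result = rule1(chart)
```

The geometry decides which arcs are candidates and what shape is built, while conditions and assignments only read and write the labels, which mirrors the separation stated in the text.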
3.4 Control structure 
The linguist has two tools for controlling the
application of the rewriting rules :
1) The rules can be grouped into packets (grammars)
which are executed in sequence.
2) Within a given grammar the rule-application can
be controlled by means of parameters set by the
linguist. According to the linguistic operation
envisaged, the parameters can be set to a combination
of serial or parallel and one-pass or iterate.
In all, 4 different combinations are possible :
    parallel and one-pass
    parallel and iterate
    serial and one-pass
    serial and iterate
In the parallel mode the rules within a
grammar are considered as being unordered from a
logical point of view. Different rules can be applied
to the same piece of data and produce alternatives
in the chart. The chart is updated at the end of
every application-cycle. In the serial mode the
rules are considered as being ordered in a
sequence. Only one rule can be fired for a particular
piece of data, but the following rules can match
the result produced by a preceding rule. The chart
is updated after every rule that fired. The
parameters one-pass and iterate control the number of
cycles. Either the interpreter goes through a
cycle only once, or it iterates the cycles as long as
any rule of the grammar can fire.
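The four control modes can be approximated by the following Python sketch. It is an interpretation of the description above, not the GTT interpreter: arcs are reduced to (start, end, label) tuples, a rule is assumed to return pairs of consumed arcs and a produced arc, and "can still fire" is approximated by "a cycle added at least one new arc". The names run_grammar and relabel are hypothetical.

```python
def run_grammar(chart, rules, serial=False, iterate=False):
    """Apply one grammar (a packet of rules) to a chart of arcs."""
    chart = set(chart)
    while True:
        size_before = len(chart)
        if serial:
            used = set()                    # data that already fired a rule
            for rule in rules:              # rules are tried in order
                matches = [(c, p) for c, p in rule(frozenset(chart))
                           if not (c & used)]
                for consumed, produced in matches:
                    used |= consumed
                    chart.add(produced)     # chart updated after each rule
        else:
            snapshot = frozenset(chart)     # every rule sees the same data
            produced = {p for rule in rules for _, p in rule(snapshot)}
            chart |= produced               # chart updated at end of cycle
        if not iterate or len(chart) == size_before:
            return chart


def relabel(old, new):
    """Hypothetical rule factory: an arc labelled `old` yields one labelled `new`."""
    def rule(chart):
        return [(frozenset({arc}), (arc[0], arc[1], new))
                for arc in chart if arc[2] == old]
    return rule


start = {(0, 1, "drinks")}
alternatives = [relabel("drinks", "noun"), relabel("drinks", "verb")]
feeding = [relabel("drinks", "noun"), relabel("noun", "np")]

parallel_once = run_grammar(start, alternatives)            # noun and verb
serial_once = run_grammar(start, alternatives, serial=True)  # noun only
```

With the `feeding` rules, serial one-pass already reaches the "np" arc because the second rule sees the output of the first, whereas parallel mode needs `iterate=True` to get there.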
The four combinations allow different uses
according to the linguistic task to be performed,
e.g.:
Parallel and iterate applies the rules
non-deterministically to compute all possibilities, which
gives the system the power of a Turing Machine
(this is the only control mode for the Q-system).
Parallel and one-pass is the typical combination
for dictionaries that contain alternatives. Two
different rules can apply to the same piece of
data. The example below (fig. 2) uses this
combination in the first GRAMMAR 'vocabulary'.
Serial and one-pass allows rule ordering. A
possible application of this combination is a
preference mechanism via the explicit rule ordering
using the longest-match-first technique. The
GRAMMAR 'preference' in the example below (fig. 2)
makes use of that by progressive weakening of the
selectional restrictions of the verb 'drink'.
Rule 24 fires without semantic restrictions and
rule 25 accepts sentences where the optional
argument is missing.
The example should be sufficiently
self-explanatory. It begins with the declaration of the
attributes and contains three grammars. The result
is shown for two sentences (fig. 3). To demonstrate
which rule in the preference grammar has fired,
each rule produces a different top label:
rule 21 = PH1, rule 22 = PH2, etc.
Figure 2. Example of a grammar file. 
DECLARE cat = det, noun, verb, val_node, np, ph1, ph2, ph3, ph4, ph5;
        number = sg, pl; marker = human, liquid, notdrinkable, physobj, abstr;
        valency = v1, v2, v3; argument = arg1, arg2, arg3;
GRAMMAR vocabulary PARALLEL ONEPASS
RULE 1 a => a IF string(a) = 'the'
       THEN cat(a) := [det];
RULE 2 a => a IF string(a) = 'man'
       THEN cat(a) := [noun]; number(a) := [sg];
            marker(a) := [human];
RULE 3 a => a IF string(a) = 'beer'
       THEN cat(a) := [noun]; number(a) := [sg]; marker(a) := [liquid];
RULE 4 a => a IF string(a) = 'car'
       THEN cat(a) := [noun]; number(a) := [sg];
            marker(a) := [physobj];
RULE 5 a => a IF string(a) = 'gazoline'
       THEN cat(a) := [noun]; number(a) := [sg]; marker(a) := [notdrinkable];
RULE 6 a => a IF string(a) = 'drinks'
       THEN cat(a) := [noun]; number(a) := [pl]; marker(a) := [liquid];
RULE 7 a => a(b,c) IF string(a) = 'drinks'
       THEN cat(a) := [verb]; valency(a) := [v2]; cat(b) := [val_node]; cat(c) := [val_node];
            argument(b) := [arg1]; marker(b) := [human]; argument(c) := [arg2]; marker(c) := [liquid];
GRAMMAR nounphrase SERIAL ONEPASS
RULE 11 a + b => c(a,b) IF cat(a) = [det] and cat(b) = [noun]
       THEN cat(c) := [np]; marker(c) := marker(b);
GRAMMAR preference SERIAL ONEPASS
RULE 21 a + b(#1,c,#2,d,#3) + e => x(b,a,e) IF cat(a)=[np] and cat(b)=[verb] and cat(e)=[np]
       and valency(b)=[v2] and argument(c)=[arg1] and marker(c)=marker(a)
       and argument(d)=[arg2] and marker(d)=marker(e)
       THEN cat(x) := [ph1];
RULE 22 a + b(#1,c,#2) + e => x(b,a,e) IF cat(a)=[np] and cat(b)=[verb] and cat(e)=[np]
       and valency(b)=[v2] and argument(c)=[arg1] and marker(c)=marker(a)
       THEN cat(x) := [ph2];
RULE 23 a + b(#1,c,#2) + e => x(b,a,e) IF cat(a)=[np] and cat(b)=[verb] and cat(e)=[np]
       and valency(b)=[v2] and argument(c)=[arg2] and marker(c)=marker(e)
       THEN cat(x) := [ph3];
RULE 24 a + b + e => x(b,a,e) IF cat(a)=[np] and cat(b)=[verb] and cat(e)=[np]
       and valency(b)=[v2] THEN cat(x) := [ph4];
RULE 25 a + b => x(b,a) IF cat(a)=[np] and cat(b)=[verb]
       and valency(b)=[v2] THEN cat(x) := [ph5];
ENDFILE
Figure 3. Output of the above grammar file.
Input sentence :
(1) The man drinks the beer.
Result :
PH1 CAT=[PH1]
!
!-'DRINKS' CAT=[VERB] VALENCY=[V2]
!  !-VAL_NODE CAT=[VAL_NODE] MARKER=[HUMAN] ARGUMENT=[ARG1]
!  !-VAL_NODE CAT=[VAL_NODE] MARKER=[LIQUID] ARGUMENT=[ARG2]
!
!-NP CAT=[NP] MARKER=[HUMAN]
!  !-'THE' CAT=[DET]
!  !-'MAN' CAT=[NOUN] NUMBER=[SG] MARKER=[HUMAN]
!
!-NP CAT=[NP] MARKER=[LIQUID]
   !-'THE' CAT=[DET]
   !-'BEER' CAT=[NOUN] NUMBER=[SG] MARKER=[LIQUID]
Input sentence :
(2) The man drinks the gazoline.
Result :
PH2 CAT=[PH2]
!
!-'DRINKS' CAT=[VERB] VALENCY=[V2]
!  !-VAL_NODE CAT=[VAL_NODE] MARKER=[HUMAN] ARGUMENT=[ARG1]
!  !-VAL_NODE CAT=[VAL_NODE] MARKER=[LIQUID] ARGUMENT=[ARG2]
!
!-NP CAT=[NP] MARKER=[HUMAN]
!  !-'THE' CAT=[DET]
!  !-'MAN' CAT=[NOUN] NUMBER=[SG] MARKER=[HUMAN]
!
!-NP CAT=[NP] MARKER=[NOTDRINKABLE]
   !-'THE' CAT=[DET]
   !-'GAZOLINE' CAT=[NOUN] NUMBER=[SG] MARKER=[NOTDRINKABLE]
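The different top labels of the two results follow from the rule ordering in the preference grammar of figure 2. The Python sketch below replays only that ordering logic on the semantic markers; it is a simplification that assumes the noun phrases and the verb frame are already built, and the names DRINK and classify are hypothetical.

```python
# Valency frame of 'drinks' as assigned by rule 7 of figure 2.
DRINK = {"arg1": "human", "arg2": "liquid"}


def classify(subject_marker, verb, object_marker=None):
    """Try the constraints from strictest to weakest; first success wins."""
    has_obj = object_marker is not None
    subj_ok = verb["arg1"] == subject_marker
    obj_ok = has_obj and verb["arg2"] == object_marker
    ordered_rules = [
        ("ph1", has_obj and subj_ok and obj_ok),  # rule 21: both restrictions
        ("ph2", has_obj and subj_ok),             # rule 22: subject only
        ("ph3", obj_ok),                          # rule 23: object only
        ("ph4", has_obj),                         # rule 24: no restrictions
        ("ph5", True),                            # rule 25: object missing
    ]
    for label, fires in ordered_rules:
        if fires:
            return label


beer = classify("human", DRINK, "liquid")            # sentence (1)
gazoline = classify("human", DRINK, "notdrinkable")  # sentence (2)
```

Sentence (1) satisfies both selectional restrictions and is labelled by the strictest rule, while sentence (2) fails on the object and falls through to the next weaker rule.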
4. FACILITIES FOR THE USER 
The system interacts with the user in its two
main programs, the compiler and the
interpreter. The following example (fig. 4) shows
how the error messages of the compiler are printed
in the compilation listing. Each star with a number
points to the approximate position of the error
and a message explains the possible errors. The
compiler tries to correct the error and in the
worst case ignores the portion of the text
following the error.
GRAMMAR errortest PARALEL ITERATE
*0 pos. 0 : -ES- /SERIAL/ or /PARALLEL/ expected
RULE 1 a+b => c(a,b)
IF STRING(a)='blable' AND cat(b)=[nom THEN cat(d) := [nom]; *0 *1 *2
pos. 0 : -ES- /,/ expected   pos. 1 : -ES- /]/ expected
pos. 2 : -SEM- id. not defined in the geometry (right side)
RULE 2 a(a) => c(a,b)
*0 pos. 0 : -SEM- id. already used for left part
IF cat(a)=[det] THEN categ(b) := [noun]; *0 *1
pos. 0 : -SEM- id. does not represent a set   pos. 1 : -SEM- id. does not represent an element
Figure 4. Compilation listing with error message. 
The interpreter has a parameter that allows the
sequence of rules that fired to be traced. The
trace in figure 5 below corresponds to the execution
of example (1) in figure 3.
G-code interpreter [...] Feb-14-84
application of rule 1
application of rule 1
application of rule 2
application of rule 3
application of rule 6
application of rule 7
VOCABULARY executed
application of rule 11
application of rule 11
NOUNPHRASE executed
application of rule 21
PREFERENCE executed
interpretation time : [...] sec. CPU   3.583 sec. user
Figure 5. Trace of execution. 
5. CONCLUSION 
The transducer is implemented in a modular
style to allow easy changes to or addition of
components as the need arises. This provides the
possibility of experimentation and of further
development in various directions:
- integration of a lexical database with special
editing facilities for lexicographers;
- development of special interpreters for
transfer or scoring mechanisms for heuristics;
- refinement of linguistically motivated type
checking.
In this paper we have mainly concentrated on
syntactic applications to illustrate the use of the
transducer. However, as we hope to have shown, the
formalism of the system is general enough to allow
interesting applications in various domains of
linguistics such as morphology, valency matching and
preference mechanisms (Wilks, 1973).
ACKNOWLEDGEMENTS
Special thanks should go to Roderick Johnson of
CCL, UMIST, who contributed a great deal to the
original design of the system presented here, and
who, through frequent fruitful discussion, has
continued to stimulate and influence later
developments, as well as to Dominique Petitpierre and
Lindsay Hammond who programmed the initial
implementation. We would also like to thank all
members of ISSCO who have participated in the work,
particularly B. Buchmann and S. Warwick.
REFERENCES
Buchmann, B., Shann, P., Warwick, S. (1984).
Design of a Machine Translation System for a
Sublanguage. Proceedings, COLING'84.
Chevalier, M., Dansereau, J., Poulin, G. (1978).
TAUM-METEO : description du système. T.A.U.M.,
Groupe de recherche en traduction automatique,
Université de Montréal, janvier 1978.
Colmerauer, A. (1970). Les systèmes-Q ou un
formalisme pour analyser et synthétiser des phrases
sur ordinateur. Université de Montréal.
Johnson, R.L. (1982). Parsing - an MT Perspective.
In: K. Sparck Jones and Y. Wilks (eds.),
Automatic Natural Language Parsing, Memorandum 10,
Cognitive Studies Centre, University of Essex.
Rosner, M. (1983). Production Systems. In:
M. King (ed.), Parsing Natural Language,
Academic Press, London.
Slocum, J. and Bennett, W.S. (1982). The LRC
Machine Translation System: An Application of
State-of-the-Art Text and Natural Language
Processing Techniques to the Translation of
Technical Manuals. Working paper LRC-82-1,
Linguistics Research Center, University of
Texas at Austin.
Vauquois, B. (1975). La traduction automatique
à Grenoble. Documents de Linguistique
Quantitative, 24. Dunod, Paris.
Vauquois, B. (1978). L'évolution des logiciels et
des modèles linguistiques pour la traduction
automatisée. T.A. Informations, 19.
Varile, G.B. (1983). Charts: A Data Structure for
Parsing. In: M. King (ed.), Parsing Natural
Language, Academic Press, London.
Wilks, Y. (1973). An Artificial Intelligence
Approach to Machine Translation. In: R.C. Schank
and K.M. Colby (eds.), Computer Models of
Thought and Language, W.H. Freeman, San
Francisco, pp. 114-151.
