NL Domain Explanations in Knowledge Based MAT 
Galia Angelova, Kalina Bontcheva 1 
Bulgarian Academy of Sciences, Linguistic Modelling Laboratory 
Acad. G. Bonchev Str. 25A, 1113 Sofia, Bulgaria, {galja,kalina}@bgcict.acad.bg
Abstract 
This paper discusses an innovative approach to
knowledge based Machine Aided Translation (MAT)
where the translator is supported by a user-friendly
environment providing linguistic and domain
knowledge explanations. Our project aims at the
integration of a Knowledge Base (KB) in a MAT
system and studies the integration principles as well
as the internal interface between language and
knowledge. The paper presents some related work, reports
the solutions applied in our project and tries to
generalize our evaluation of the selected MAT approach.
1. Introduction 
The notion of MAT comprises approaches where - in 
contrast to MT - the human user keeps the initiative 
in translation. MAT ranges between intelligent text 
editors and workbenches aiming at user modelling 
and partial MT. A principal problem, however, is the 
support of domain knowledge since it affects the qua- 
lity of the translated text. Moreover, the time spent 
for domain familiarization is estimated as 30-40% of 
the total translation time (KiWi90). 
This paper discusses an innovative approach to
knowledge based MAT: a KB is systematically
integrated in a framework providing linguistic as well
as domain knowledge support. Linguistic support is
assured by relevant resources: grammatical and lexical
data. Domain explanations are generated from a KB
of Conceptual Graphs (CGs) (Sow84). The system
interface offers a standard dialog: while translating,
the user highlights words/texts, chooses queries from
menus and receives NL answers from which new
requests can be started. The results reported here were
achieved in the joint German-Bulgarian project
DB-MAT 2 (vHa91, vHAn94).
Depending on the viewpoint, DB-MAT can be compared
to various approaches and/or systems: (i) encoding
of term meanings - to lexicons and termbanks
(see 2.1) and knowledge based termbanks (see 2.2);
(ii) generation of explanations allowing follow-up
questions and clarifications - e.g. IDAS (see 2.3);
(iii) NL generation from CGs (Bon96, AnBo96).
Below we present related approaches and the DB-MAT
paradigm, and state our opinion about the costs and
benefits of knowledge based MAT.
1 Current: University of Sheffield, Department of
Computer Science, Regent Court, 211 Portobello Str.,
Sheffield S1 4DP, e-mail K.Bontcheva@dcs.shef.ac.uk
2 Funded by Volkswagen Foundation (Germany) in
1993-1995. See also www.informatik.uni-hamburg.de/
Arbeitsbereiche/NATS/projects/db-mat.html
2. Related Work 
Some approaches are now discussed with comments
on the rationale of a knowledge based MAT design.
2.1. Lexicons and Termbanks 
In (machine-readable) terminological lexicons, domain
knowledge is contained in text definitions. A conceptual
hierarchy is sometimes sketched by pointers like
"see.." in the definition: the super- and sister-concepts
are related to the lexical entry, i.e. to the denoted
concept. An intuitive unification of the lexical
units and their implicit knowledge items is assumed.
Concept-oriented termbanks support a hierarchical
skeleton of underlying concepts and thus the knowledge
can be treated formally during its construction,
use and updating. Text definitions, however, are written
manually; the progress is that the lexical entries
and conceptual elements are encoded independently. In
monolingual termbanks, the term is a concept label
as well as a lexicon item (e.g. CoAm93). In cases of
bilingualism (e.g. Fis93) every NL has its own conceptual
structure and translation equivalents are mappings
of one conceptual structure onto another.
An evaluation for "domain knowledge in translation"
reveals the following about knowledge content, organization and usage:
(1) conceptual knowledge is encoded in text
definitions, written manually in a multilingual
environment. Domain facts are not included in any
definition although they are very important for the
understanding of technical texts; (2) knowledge is
artificially segmented into text fragments, organized
alphabetically around lexical entries. The time-consuming
search for semantically related terms is to be
performed by the reader; (3) the user gets the whole
bulk of information without any opportunity to filter
for relevant aspects. The "inheritance of features"
along the hierarchy is to be made by the reader.
2.2. Terminological Knowledge Bases 
The term meanings are encoded in formal languages
instead of text definitions. Briefly we mention: (1)
COGNITERM (MSBE92): the term meaning is represented
in a frame-like structure which is accessible
by names of concepts or their characteristics. For a
new NL, another KB is built up "based on the translation
equivalents provided for concepts in the source
language KB" (SkMe90); (2) Translators' Workbench
(TWB): the meanings of each term are described
by CGs. A concept is related to several terms via
synonyms and foreign NL equivalents (HoAh92).
The available examples present disconnected, though
formal definitions of meanings. However, there is (1)
no coherent, homogeneous knowledge source for a
systematic conceptual evaluation; (2) no access by a
context sensitive user interface; (3) no theoretically
sound solution for multilingual systems.
2.3. Generation of Explanations 
There are similarities between DB-MAT and other
NL generation systems, e.g. IDAS which produces
technical documentation from a domain KB and
linguistic and contextual models (RMLe95). In a
sense IDAS builds an on-line user interface to KBs
and provides system answers by NL generation. The
system generates hypertext nodes (both text and
links) with relevant follow-up questions. The
following particularities, however, display the
differences between the systems: (1) IDAS is a full-scale
application, its KL-ONE like KB contains more
domain information and a proper system evaluation
can be performed; (2) the hypertext links act as
visual hints for the available relevant information,
while in DB-MAT the user has to "guess" what is available; (3)
the DB-MAT KB is meant to be arbitrary, i.e. we
investigated the integration of an arbitrary domain KB into
applications in the humanities; in contrast, the
IDAS KB contains a fixed number of (task-adequate)
"conceptual relations" and supports fixed query types;
(4) the systems certainly have different interface
designs oriented towards different goals and user types.
3. Benefits of the Knowledge Based MAT 
The optimal separation of domain knowledge as an
independent source facilitates its structuring and processing
and makes its theoretical foundation sound.
KBs seem difficult to acquire (compared to informal
textual lexicons), but this is not true with respect to
formal and heterogeneous lexicon structures. Moreover,
formal descriptions are built up increasingly
both for research and industrial applications, e.g.
formal specifications are developed by various acquisition
tools. Thus DB-MAT aims at the reuse of KBs
in a MAT system.
Keeping knowledge in a separate structure enables its
processing with formal operations. In particular, formal
consistency and semantic coherence can be best
achieved in a well-defined representation language. In
DB-MAT the NL explanation semantics is kept as
CGs as long as possible: we tailor the explanation to
the users' expectations by the formal operations
projection and join. Inheritance provides the adequate
level of detail in the generated answer
(AnBo96).
In multilingual MAT, the CG type hierarchy proved
to be particularly useful, e.g. in the case of
terminological gaps. For missing translation
equivalents, the type hierarchy provides NL 
explanations about the "relative position" of the 
highlighted term. The attributes of the node are 
verbalized in the source language to facilitate 
paraphrases in the target one (WiAn93). 
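The "relative position" of a term can be illustrated with a minimal Python sketch (the DB-MAT implementation itself is in Prolog). The hierarchy below is a toy fragment; the concept labels SOLID FRAGMENT and PHYSICAL OBJECT, and the function name relative_position, are assumptions made only for this illustration.

```python
# Toy type hierarchy: child -> parent. Labels are illustrative only.
SUPERTYPE = {
    "OIL FRAGMENT": "PARTICLE",
    "SOLID FRAGMENT": "PARTICLE",
    "PARTICLE": "PHYSICAL OBJECT",
}


def relative_position(concept):
    """Return the supertype, sister concepts and subtypes of a concept."""
    parent = SUPERTYPE.get(concept)
    sisters = []
    if parent is not None:
        sisters = sorted(c for c, p in SUPERTYPE.items()
                         if p == parent and c != concept)
    subtypes = sorted(c for c, p in SUPERTYPE.items() if p == concept)
    return {"supertype": parent, "sisters": sisters, "subtypes": subtypes}


position = relative_position("OIL FRAGMENT")
```

A generator could verbalize such a record in the source language (e.g. naming the supertype and the sister concepts), which gives the translator material for a paraphrase when no target-language equivalent exists.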
4. DB-MAT 
DB-MAT is a knowledge-based TWB providing linguistic
as well as domain knowledge support. The
system has a user-friendly interface, with a main window
separated into two scrollable regions for the source
and target texts. The translator selects the explanation
language (German, Bulgarian) and the level of detail
of the generated explanations (Less, More) with
radio buttons. DB-MAT provides figures as well, to
facilitate the user's understanding of the domain.
Currently all figures are associated with lexicon entries.
4.1. Main menu 
Apart from File and Edit with their standard functionality,
the main menu contains three task-specific items:
- Under Note the user selects flags (<Check later>,
<Gap>, etc.) to be inserted in the text as reminders;
- Information provides monolingual support and
access to available figures. Grammatical data from
the lexicon is shown to the user. Under the submenu
Explanations, a NL explanation is generated for
terms while for non-terms a textual definition is
given instead (the user should always get something
without bothering where the answer comes from);
- Multilingual offers bilingual data. Under Translations
the lexicon correspondences are presented.
The other subitems are Idioms and Examples.
4.2. The Lexicon 
DB-MAT uses one lexicon, i.e. general vocabulary
and terms are distinguished by checking whether text
definitions or KB-pointers are available. There is one
entry per meaning. Special links, contained in and
consists of (Fig. 1, "crossref" of Ids #35, 29, 40),
acquired semi-automatically, provide precise mappings
of the chosen text segments onto lexicon items.
The lexicon contains (BoEu95): (1) morphological
data organized in morpho-groups (part of speech,
inflection class, verb types, etc.); (2) syntactic information
- syntax groups used by the NL generator only
and some text strings (e.g. list of collocations); (3)
synonyms (Ids #29, 40), antonyms, abbreviations;
(4) text definitions for general vocabulary (e.g. Id
#17); (5) references for bilingual correspondences.
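The term/non-term distinction can be sketched in Python as follows. This is only an illustration loosely modelled on the entries of Fig. 1, not the Prolog lexicon format of DB-MAT: the class LexEntry, its field names, the lemma spellings and the KB label assigned to entry 40 are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexEntry:
    lex_id: int
    lemma: str
    synonyms: List[int] = field(default_factory=list)  # ids of synonym entries
    text_definition: Optional[str] = None  # present for general vocabulary
    kb_label: Optional[str] = None         # present for terms

    def is_term(self):
        # An entry counts as a term when it points into the KB.
        return self.kb_label is not None


# One entry per meaning; ids follow Fig. 1, contents are illustrative.
LEXICON = {
    29: LexEntry(29, "Ölpartikel", synonyms=[40], kb_label="OIL FRAGMENT"),
    40: LexEntry(40, "Ölphase", synonyms=[29], kb_label="OIL FRAGMENT"),
    17: LexEntry(17, "Strom",
                 text_definition="große Menge von etwas Flüssigem"),
}


def explanation_source(lex_id):
    """Decide whether an explanation is generated from the KB or read
    from the stored text definition."""
    return "kb" if LEXICON[lex_id].is_term() else "text"
```

Under these assumptions, a request on entry 29 would be routed to the KB, while a request on entry 17 would return the stored text definition.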
4.3. The KB and the Query Mapper (QM) 
The KB consists of concepts, a type hierarchy and
conceptual graphs. Each graph is either a semantic
definition of a term or contains certain factual
knowledge. The QM, our "what to say" component,
extracts as temporary graphs (by CG projection)
knowledge fragments to be verbalized. There is no
fixed predefined schema mapping a user request to
some knowledge fragments. Given a highlighted term
(i.e. its KB concept), and the user request for domain
knowledge, the QM searches the KB on the fly and
extracts all relevant facts according to the conceptual
relations. Depending on the level of detail, all
attributes and characteristics are inherited from a more
generic node.
[Figure 1: processing of a request. Panels: the German source text with the user query menu (Types of..., Characteristics, Examples); lexicon entries as Prolog facts (lex_entry_g, lex_text_g, lex_morpho_g, lex_kb); Conceptual Graphs in internal Prolog representation; the extracted temporary graphs; the Generator; the generated explanation.]
Extracted temporary graphs:
[OIL FRAGMENT: {*}] -> (CHAR) -> [DENSITY: {*}].
[WASTE WATER] -> (CONTAIN) -> [OIL FRAGMENT: {*}] -
    -> (ATTR) -> [FLOATING]
    -> (ATTR) -> [ROUGHLY DISPERSED]
    -> (ATTR) -> [LIGHTER THAN WATER].
Generated explanation:
Ölphasen (Ölpartikel (1)) gehören zu Partikeln (2). Die (3) Ölphasen sind gekennzeichnet [...] (4).
Die ausschwimmenden (5) und grobdispersen (6) Ölphasen, welche leichter als Wasser sind (7),
sind enthalten in Abwasser (8).
Figure 1. Generation of explanations: 1 - a synonym from the lexicon; 2 - the supertype from the KB
type hierarchy; 3 - definite article because [OIL FRAGMENT] is already present in the context; 4 - the
characteristic [DENSITY]; 5,6 - the attributes [FLOATING] and [ROUGHLY DISPERSED]; 7 - a
relative clause for [LIGHTER THAN WATER] since it cannot be verbalised as an adjective; 8 - the second
graph with focused [OIL FRAGMENT], in passive voice because the graph is traversed against the arcs.
For each query type, the QM maintains a list of relevant
conceptual relations. So far, the QM has a fixed
scope of extraction: for most of the conceptual relations
it is "one step around" the selected concept. Nested
graphs (e.g. situations) are extracted as unbreakable
knowledge fragments due to their specific
meanings. The explanation semantics is under certain
control: the QM does not allow trivial answers like
"Oil separator is a non-animated physical object" etc.
A detailed discussion is given in (AnBo96).
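The "one step around" extraction can be approximated by the following Python sketch. It replaces CG projection with a plain filter over relation triples, so it is a simplification rather than the QM algorithm. The triples restate the graphs of Fig. 1; the query type names and the table RELEVANT are hypothetical.

```python
# (source concept, relation, target concept), restating the graphs of Fig. 1.
KB = [
    ("OIL FRAGMENT", "CHAR", "DENSITY"),
    ("WASTE WATER", "CONTAIN", "OIL FRAGMENT"),
    ("OIL FRAGMENT", "ATTR", "FLOATING"),
    ("OIL FRAGMENT", "ATTR", "ROUGHLY DISPERSED"),
    ("OIL FRAGMENT", "ATTR", "LIGHTER THAN WATER"),
]

# Hypothetical table: relevant conceptual relations per query type.
RELEVANT = {
    "characteristics": {"CHAR", "ATTR"},
    "location": {"CONTAIN"},
}


def extract(concept, query_type):
    """Collect every triple that touches the concept directly (one step
    around it) through a relation relevant for the query type."""
    relations = RELEVANT[query_type]
    return [t for t in KB
            if t[1] in relations and concept in (t[0], t[2])]


fragment = extract("OIL FRAGMENT", "location")
```

Because the filter looks at both ends of each triple, facts in which the highlighted concept is the target of the arc (such as the CONTAIN graph) are found as well.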
4.4. The Generator (EGEN) 
The generation algorithms are strongly influenced by 
some features of the CGs and their well-defined 
operations. An important asset of the CGs proved to 
be their non-hierarchical structure, allowing for the 
generation to start from any KB node without any 
graph transformations. Thus EGEN may select the 
subject and the main predicate of each sentence from 
a linguistic perspective rather than being influenced 
by the structuring of the underlying semantics (as 
with the frequently used tree-like notations). 
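A minimal Python sketch (not the Prolog implementation of EGEN) may illustrate this freedom: the same relation is verbalized in active or passive voice depending on which of its two concepts is in focus, i.e. whether the arc is traversed along or against its direction. The English templates in VERBS and the function name verbalize are assumptions made for illustration.

```python
# Hypothetical templates: relation -> (active form, passive form).
VERBS = {"CONTAIN": ("contains", "is contained in")}


def verbalize(triple, focus):
    """Make the focused concept the subject of the sentence."""
    source, relation, target = triple
    active, passive = VERBS[relation]
    if focus == source:
        # Traversal along the arc: active voice.
        return f"{source.capitalize()} {active} {target.lower()}."
    if focus == target:
        # Traversal against the arc: passive voice.
        return f"{target.capitalize()} {passive} {source.lower()}."
    raise ValueError("focus must be one of the two concepts of the triple")


graph = ("WASTE WATER", "CONTAIN", "OIL FRAGMENT")
sentence = verbalize(graph, "OIL FRAGMENT")
```

No transformation of the graph is needed in either case; only the starting node of the traversal changes.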
EGEN's input is: (1) the relevant knowledge pool;
(2) the explanation language; (3) the highlighted
concept(s) (the corresponding term(s) will become the
global focus of the generated explanation); (4) the
query type (necessary for the selection of an appropriate
text-organisation schema); (5) an iterative call
flag indicating a request for further clarification.
In order to produce a coherent explanation, EGEN or- 
ders the CGs by applying a suitable text organisation 
schema (AnBo96) - definition, similarity or difference 
(similar to those in McKe85). Afterwards the genera- 
tor breaks some CGs into smaller graphs or joins 
similar ones into a single graph to ensure that each 
CG is expressible in a single sentence. Finally 
EGEN verbalizes the CGs by applying the utterance 
path approach - the algorithm searches for a cyclic 
path which visits each node and relation at least once. 
If a node is visited several times then grammar rules 
determine when and how it is verbalized. As proposed 
in (Sow84), concepts are mapped into nouns, verbs,
adjectives and adverbs, while conceptual relations are
mapped into "functional words" or syntactic elements.
Sowa's algorithm is extended (AnBo96)
to: (1) process extended referents (e.g. measures,
conjunctive and disjunctive sets, etc.); (2) group
relevant features together (e.g. first utter all
"dimension" attributes, then all "weight" attributes,
etc. instead of mixing them up); (3) introduce
relative clauses and conjunctions; (4) generate a
sentence tree allowing for future transformations. The
APSG grammar used by EGEN is implemented in
Prolog.
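Extension (2) can be pictured with a short Python sketch. It only shows the grouping step, under the assumption that each attribute already carries a label for its kind; the sample attributes and the function name group_features are invented for illustration and do not come from the DB-MAT KB.

```python
# (attribute kind, phrase) pairs in the order a graph traversal met them.
# The sample values are invented.
features = [
    ("dimension", "2 m long"),
    ("weight", "40 kg"),
    ("dimension", "1 m wide"),
]


def group_features(pairs):
    """Group phrases by attribute kind, keeping first-seen order of the
    kinds and the original order inside each group."""
    grouped = {}
    for kind, phrase in pairs:
        grouped.setdefault(kind, []).append(phrase)
    return grouped


grouped = group_features(features)
```

A generator working on the grouped result would utter both "dimension" phrases before the "weight" phrase instead of interleaving them.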
Additionally, EGEN keeps all uttered concepts in a 
stack and later refers to them using a definite article 
or a pronoun. This stack is cleared at the end of the
explanation, unless there is an iterative request.
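The bookkeeping of uttered concepts can be sketched in Python as follows. The sketch covers only the definite-article case (pronouns are left out), uses English noun phrases, and the names DiscourseContext, refer and end_explanation are assumptions, not identifiers from EGEN.

```python
class DiscourseContext:
    """Remembers which concepts have already been uttered."""

    def __init__(self):
        self.uttered = []  # used as a stack of mentioned noun phrases

    def refer(self, noun_phrase):
        # Later mentions get a definite article; the first one does not.
        if noun_phrase in self.uttered:
            return f"the {noun_phrase}"
        self.uttered.append(noun_phrase)
        return noun_phrase

    def end_explanation(self, iterative_request=False):
        # The context survives only when the user asks a follow-up.
        if not iterative_request:
            self.uttered.clear()


context = DiscourseContext()
first = context.refer("oil fragments")
second = context.refer("oil fragments")
```

Keeping the stack across an iterative request lets a follow-up explanation continue to refer to concepts introduced earlier with a definite article.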
A request for definition ("What is?") of "Ölphasen" is
given in Fig. 1. Some relevant lexicon entries are
presented. The QM extracts the supertype and the
conceptual relations ATTR, CHAR and RESULT
(AnBo96). The extracted temporary graphs are shown
in linear notation. They contain all occurrences of the
"highlighted" concept and the necessary conceptual
relations. The QM has applied the type contraction
operation in order to "simplify" the graphs. Thus
there are complex concepts like [LIGHTER THAN
WATER] which have corresponding type definitions.
5. Costs of the knowledge based MAT 
It is difficult to acquire the interrelated lexicon/KB,
although once the KB is acquired, the reuse effect
will decrease the costs of adding a new NL to the
system. In DB-MAT we used special lexicon
acquisition tools and we plan to develop tools with
a restricted NL interface for future KB acquisition. Our
estimation is that DB-MAT resources are not more
complicated than the lexicons in sophisticated MT
systems, e.g. the KBMT lexicon and ontology
(GoNi91). However, the proper use of AI-methods
requires additional study, design efforts and evaluation
experiments oriented towards knowledge based NLP.
6. Implementation and Conclusion 
The DB-MAT demo is implemented in LPA MAC
Prolog32. Special lexicon acquisition tools were developed.
The German lexicon contains about 900
entries. The KB (about 300 concept nodes and 30
conceptual relations) was manually acquired from a
textbook and encyclopedias on admixture separation.
The lexicon covers a demo text but any MacProlog
readable file demonstrates the DB-MAT features if it
contains the basic terminology (enabling requests for
domain explanations).
DB-MAT studies one of the possible applications of
KB-methods to computational terminology and translation
aid tools. Further research is aimed at: (1)
building a larger KB; (2) development of a general
methodology relating the terminology to the corresponding
conceptual knowledge; (3) experiments with
the role of negation; (4) improvement of the
generation to ensure more elaborate and coherent
output combining textual and graphical information.
Acknowledgments: We are grateful to the DB-MAT
project leader Prof. Dr. Walther von Hahn
(Hamburg University) for his support during all
project stages. DB-MAT would not have been
possible without his efforts concerning the general
design and the German language data. We also thank
all those people who contributed to the project and/or
to the quality of our papers and presentations.
References:
[AnBo96] Angelova, G. and K. Bontcheva. DB-MAT:
Knowledge Acquisition, Processing and NL
Generation using CGs. To appear in Proc. of ICCS-96,
Sydney, Australia, August 1996 (Lecture Notes AI).
[BoEu95] Boynov, N. and L. Euler. The Structure
of the Lexicon and its Support in DB-MAT: Report
1/95, Project DB-MAT, Univ. Hamburg, 5/1995.
[Bon96] K. Bontcheva. Generation of Multilingual
Explanations from CGs. To appear in Mitkov,
Nikolov (eds.), 'Recent Advances in NLP', Current
Issues in Linguistic Theory 136, Benjamins Press.
[CoAm93] Condamines, A. and P. Amsili. Terminology
Between Language and Knowledge: an example
of terminological knowledge base. In [Schm93].
[GoNi91] Goodman, K. and S. Nirenburg (eds). A
Case study to KBMT. Morgan Kaufmann Pub. 1991.
[Fis93] D. Fischer. Consistency Rules and Triggers
for Multilingual Terminology. In [Schm93].
[HoAh92] Hook, S. and K. Ahmad. Conceptual
Graphs and Term Elaboration: Explicating
(Terminological) Knowledge. Tech. report, ESPRIT II
No. 2315: TWB Project, University of Surrey, 1992.
[KiWi90] Kieselbach, C., H. Winschiers. Studie zur
Anforderungsspezifikation einer computergestuetzten
Uebersetzerumgebung. Studienarb., Univ. Hamburg.
[McKe85] K. McKeown. Text Generation: Using
Discourse Strategies and Focus Constraints to
Generate NL Text. Cambridge Univ. Press, 1985.
[MSBE92] Meyer, I., D. Skuce, L. Bowker, K. Eck.
Towards a new generation of term. resources: an
experiment in building a TKB. COLING-92, 956-960.
[RMLe95] E. Reiter, C. Mellish and J. Levine.
Automatic Generation of Technical Documentation.
Applied AI, Vol. 9, No. 3, 1995, pp. 259-287.
[Schm93] K. Schmitz (Ed.), Terminology and
Knowledge Engineering, Proc. 3rd Int. Congress,
Cologne, Germany, August 1993.
[SkMe90] Skuce, D. and I. Meyer. Concept Analysis
and Terminology: A Knowledge-Based
Approach to Documentation. COLING-90, 56-58.
[Sow84] J. Sowa. Concept. Structures: Information
Processing in Mind and Machine. Add.Wesley, 1984.
[vHa91] W. von Hahn. Innovative Concepts for
MAT. Proceedings VAKKI, Vaasa 1992, pp. 13-25.
[vHAn94] v. Hahn, Walther and G. Angelova.
Providing Factual Information in MAT. In Proc. Int.
Conf. MT: Ten Years On, Cranfield, UK, Nov. 1994.
[WiAn93] Winschiers, H. and G. Angelova. Solving
Translation Problems of Terms and Collocations
Using a Knowledge Base. Techn. report 3/93, Project
DB-MAT, University of Hamburg, December 1993.
