ON KNOWLEDGE-BASED MACHINE TRANSLATION 
Sergei Nirenburg*, Victor Raskin** and Allen Tucker*
ABSTRACT 
This paper describes the design of the knowledge representation medium
used for representing concepts and assertions, respectively, in a subworld
chosen for a knowledge-based machine translation system. This design is
used in the TRANSLATOR machine translation project. The knowledge
representation language, or interlingua, has two components, DIL and
TIL. DIL stands for 'dictionary of interlingua' and describes the semantics
of a subworld. TIL stands for 'text of interlingua' and is responsible for
producing an interlingua text, which represents the meaning of an input
text in the terms of the interlingua. We maintain that involved analysis of
various types of linguistic and encyclopaedic meaning is necessary for the
task of automatic translation. The mechanisms for extracting, manipulating
and reproducing the meaning of texts will be reported in detail elsewhere.
The linguistic (including the syntactic) knowledge about source
and target languages is used by the mechanisms that translate texts into
and from the interlingua. Since interlingua is an artificial language, we
can (and do, through TIL) control the syntax and semantics of the allowed
interlingua elements. The interlingua suggested for TRANSLATOR has a
broader coverage than other knowledge representation schemata for
natural language. It involves the knowledge about discourse, speech acts,
focus, time, space and other facets of the overall meaning of texts.
1. Delimiting the Problem.
TRANSLATOR explores the knowledge-based approach to machine
translation. The basic translation strategy is to extract meaning from the
input text in the source language, SL, represent this meaning in a language-
independent semantic representation and then render this meaning in a
target language, TL. The knowledge representation language used in such
a set-up is called, for historical reasons, interlingua (henceforth, IL).
TRANSLATOR's ultimate aim is achieving good quality automatic
translation in a non-trivial subworld and its corresponding sublanguage.
The philosophy of TRANSLATOR aims at the independence of the process of
translation from human intervention in the form of the traditional pre-
and/or post-editing. Interaction during the process of translation can be
accommodated by this philosophy, but only as a temporary measure.
Interactive modules will be plugged into the system pending the develop-
ment of automatic modules for performing the various tasks as well as
more powerful inference engines and representation schemata. This is a
device that facilitates early testing of a system even before all the modules
are actually built. Another advantage of this strategy is that the system
becomes 'dynamic', in the sense that its knowledge is growing with use.
This strategy is an extension of one of the approaches discussed, for
example, in Carbonell and Tomita (1985) since it implies knowledge
acquisition during the exploitation stage and also involves a broader class
of texts as its input. Johnson and Whitelock (1985) are also proponents of
the interactive approach, but their motivation is different, in that they
perceive the human to be an integral part of their system even in its final
incarnation. In any case, interactivity is not the central design feature of
TRANSLATOR.
Before proceeding to describe the knowledge clusters in TRANSLATOR
we would like to comment very briefly on a number of methodological
points concerning MT research. It seems that some of the opinions
more or less commonly held by some members of the MT community
may need rethinking. In what follows we list some of these opinions,
together with our comments. A more detailed treatment of these topics
will be given elsewhere.
This paper is based upon work supported by the National Science Foundation
under Grant DCR-8407114. 
* Colgate University 
** Purdue University 
Opinion. It is unnecessary to extract the full meaning from the SL text in
order to achieve adequate MT.
Comment. An MT system can do well without (involved) semantics in
many cases, but has to use meaning in the rest (or rely on human inter-
vention). Machines, unlike humans, cannot on demand produce interpre-
tations of text at an arbitrary depth sufficient for understanding. There-
fore, if one aims at fully automatic translation, one has to prepare the
system for the treatment of even very semantically involved text. One can,
of course, think of designing a system that can decide how deeply each
sentence can be analyzed semantically in an attempt to minimize semantic
analysis. We maintain that the decision making involved is as complex as
the initial problem of deep semantic analysis.
Opinion. It is not necessary to finish processing the input sentence
before starting the translation. Indeed, people very often do this (consider
interpreters) with very good results.
Comment. This opinion is based on introspection. The real thought
processes that go on in the translators' or the interpreters' heads are not
known. The (quite considerable) knowledge that the translators have
about the subject of the text (speech) and about the speech situation itself
prompts them to preempt the text by following their expectations concern-
ing the most probable set of meanings for the text and deciding before the
final corroboration arrives. Even if in a majority of cases this strategy
works (as it is supposed to, because otherwise humans, being intelligent
creatures as they are, would not have had the above expectations in the
first place!), there is nothing unusual in making an error of judgement.
Those of us who worked as translators surely remember multiple
instances of this kind. Of course, this discussion is relative to the quality
of product desired in the translation.
Opinion. Approaches to MT based on AI do not pay sufficient attention to
the syntactic analysis of SL, while syntactic information is important for
MT.
Comment. Syntactic structure of input conveys meaning; this meaning is
extracted by the semantic analyzer with the help of syntactic knowledge.
All clues are indeed used. No results of syntactic analysis are stored
because they are not needed. Any approach that attempts to relate directly
various syntactic structure trees between SL and TL strikes us as quite
unpromising. It is only some early AI-oriented MT systems that were
vulnerable to this criticism.
Opinion. IL-based approaches lead to an overkill because no peculiari-
ties of SL (and of the relationship between, or contrastive knowledge of,
SL and TL) can be used in translation. Some languages have quite a lot in
common in their syntax and meaning distribution. It is wasteful not to use
this additional information in translation.
Comment. While such insights can sometimes be detected and used,
most of them come from human intuition, and cannot be taken advantage
of in an MT system, which can hardly be considered a model of human
performance. It is also totally wrong to imply, in our opinion, that
discovery and implementation of those pieces of contrastive knowledge can
be simpler than or, in fact, distinct from involved semantic analysis.
Opinion. With IL, the process of translation becomes one of interpreta-
tion. The structure of the SL text, when used in addition to IL in MT,
governs the choice of one of the paraphrases. Moreover, again, IL is an
overkill, because the paraphrases are not needed and add an element of
ambiguity.
Comment. Human translators always have a few practically equally
acceptable paraphrases for virtually every SL sentence. The degree of
meaning similarity among the acceptable paraphrases is determined by
external parameters. The translation is executed according to the human
translator's intuitive understanding of these parameters. Only in IL
approaches can one control the required degree of similarity among the
acceptable paraphrases as translations of an SL sentence.
Opinion. Generation of TL is a relatively simple problem for which very
little or no knowledge other than lexical or syntactic is needed.
Comment. Generation requires non-trivial decision making, for
instance, in the light of the discussion in the previous paragraph, or, for
that matter, as regards the computational stylistics, which will have to be a 
part of the choice-making mechanisms in building TL texts. 
2. Configuration of TRANSLATOR 
The background of the TRANSLATOR MT project at Colgate is 
presented in Tucker and Nirenburg (1984). This paper focuses on the 
static knowledge clusters of TRANSLATOR. The latter are identified as fol-
lows:
• IL dictionary
• SL - IL dictionary
• IL - TL dictionary
• SL grammar and syntactic dictionary
• SL - IL translator
• IL grammar
• TL grammar and syntactic dictionary
• IL - TL translator
There are also dynamic knowledge clusters in TRANSLATOR: the
parser and the generator modules as well as the inferencing mechanism 
(known as the Inspector) used to derive additional knowledge from IL 
representations when troubleshooting becomes necessary. 
In this paper we will describe the structure of the IL dictionary and 
the IL grammar, the central components of the system. These two struc- 
tures are actually knowledge representation languages. IL dictionary is 
written in a language for describing the types of concepts that can appear
in the subworld of translation. IL grammar is written in a language for 
representing the assertions about tokens of those types that actually appear 
in texts. We will call these languages DIL (for Dictionary Interlingua) 
and TIL (Text Interlingua), respectively. The distinction between DIL and 
TIL is similar, for instance, to that between the description and the asser- 
tion languages in KL-ONE (cf., e.g., Brachman and Schmolze, 1985).
After discussing these languages we will briefly discuss the structure 
of knowledge about SL (the SL grammar and the SL - IL dictionary), 
enough only to help us through an illustration of how the IL dictionary 
and grammar are used. 
3. The IL Dictionary. 
The IL dictionary serves as the database where TRANSLATOR stores 
its knowledge about the subworld of translation. It is purely semantic,
conceptual. The IL dictionary is a source of information for representing 
the meanings of SL texts. In it one does not find any information pertain- 
ing to any particular SL or TL. Thus, it is pure coincidence that most of 
the entry heads in this dictionary, as well as most of the members of the 
property sets (cf. below) look like English words. This choice was made 
with the dictionary writers in mind. The other possibility would have 
been to assign non-suggestive identifiers to entries and values in the IL 
dictionaries. This would have slowed down the process of dictionary com-
pilation. The dictionary writers must do their best not to mix the seman-
tics of an IL dictionary entry with that of an English word whose graphi- 
cal form coincides with that of the IL dictionary entry head. 
There are two kinds of entities in DIL: concepts and properties. 
Concepts are IL 'nouns' (objects) and IL 'verbs' (events). IL 'adjectives', 
'adverbs' and 'numerals' are represented by properties. These are organ- 
ized as sets of property values indexed both by the name of the property 
set (e.g., 'color', 'time' or 'attitude') and by the individual values, to 
facilitate retrieval. Property values are applicable to specific concept 
types. Their tokens do not appear on their own in IL texts, but only as 
fillers of slots in the frames for concept tokens. Thus, for example, 'red' 
will be a potential filler for the 'color' property of a token of every physi- 
cal object. An explanation of the relationship between IL word types and 
tokens follows. 
The IL dictionary is organized as a set of entries (concept nodes) 
interconnected through a number of link types (properties). However, the 
structural backbone of the dictionary is the familiar isa hierarchy with 
property inheritance. Note that most of the time the translation system will 
be working with terminal nodes in this hierarchy. But the nonterminal 
nodes play a special role in it. By representing sets of entries, thereby 
providing a link among a number of (related) concepts, they serve as the 
basis for a variety of inference-making procedures. Even more impor-
tantly, these 'nonterminal entries' constitute, together with the sets of
various property values, the schema of the dictionary, the set of terms that
are used to describe the semantics of the rest of the dictionary entries.
Just as all other nodes in the hierarchy, nonterminal nodes 
represent dictionary entries, which means that they can also have tokens. 
This device comes in handy when, on analyzing a segment of input, we con-
clude that a certain slot filler is unavailable in the text. At the same time,
if we know the identities of other slot fillers in the frame, we can come to 
certain conclusions about the nature of an absentee. For instance, if the 
Agent slot of a certain mental process is not filled, we, by consulting the 
'agent-of' slot of the nonterminal node 'mental-process', can infer (or, 
rather, abduce) that, whatever it is, it must be a 'creature'. This 
knowledge helps in finding referents for anaphoric phenomena. 
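The inference described above can be illustrated with a short Python sketch.
It is not part of TRANSLATOR: the dictionary layout, the function name
expected_filler_type and the terminal concept 'deduce' placed under
'mental-process' are assumptions made for the example, and the constraint is
read from the 'agent' slot as it appears in the frames of section 3.1.

```python
# Hypothetical, simplified IL dictionary: each entry maps slot names to the
# concept type expected as a filler. The dict layout is an assumption.
DICTIONARY = {
    "all": {},
    "event": {"isa": "all", "patient": "object"},
    "process": {"isa": "event", "agent": "creature", "object": "object"},
    "mental-process": {"isa": "process", "agent": "creature"},
    "deduce": {"isa": "mental-process"},  # illustrative terminal node
}


def expected_filler_type(concept, slot):
    """Walk up the isa hierarchy until some node constrains the slot."""
    while concept is not None:
        entry = DICTIONARY[concept]
        if slot in entry:
            return entry[slot]
        concept = entry.get("isa")
    return None  # no node on the path says anything about this slot


# A token whose agent was not found in the text.
token = {"id": "deduce3", "is-token-of": "deduce", "agent": None}
if token["agent"] is None:
    # Whatever the missing agent is, it must be of this type.
    print(expected_filler_type(token["is-token-of"], "agent"))
```

The returned type only narrows the search for a referent; choosing the actual
antecedent would still require the inference machinery discussed in the text.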
The dictionary entries represent IL concept and property types; IL 
texts consist of IL concept tokens (as well as IL clause and sentence
tokens). Every token of an IL concept stands in the is-token-of relationship 
to its corresponding type. Structurally both IL concept types and IL con- 
cept tokens are represented as frames. The frame for a type and the 
frame for a corresponding token are not identical in structure, though the 
intersection of their slot names is obviously non-zero. One must note, 
however, that even in this case the semantics of the slots in the dictionary 
frames is different from that of the corresponding slots in the text frame. 
Some of the slot names in the type frames refer to the paradigmatic 
relationships of this concept type with other concept types. These are the 
type parameters of an IL dictionary entry. The rest of the information in 
an entry describes syntagmatic relationships that tokens of this particular 
type have with tokens of other types in an IL text. These are called token
parameters. Among the type parameters one finds the pointers in the isa
hierarchy, relationships like part-of, belongs-to, etc.
The token-parameter slots in the dictionary entries contain either 
default values for the properties (the 'no-value' value is among the possi- 
ble default choices) or acceptable ranges of values, for the purpose of vali- 
dity testing. IL concept tokens, which are components of IL text, not its 
dictionary, have their slots occupied by actual values of properties; if 
information about a property is not forthcoming, then the default value (if 
any) is inherited from the corresponding type representations. 
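A minimal Python sketch of this default mechanism follows. It is an
illustration under stated assumptions, not the project's implementation: the
table TYPE_DEFAULTS, the function slot_value and the token 'computer-user7'
are hypothetical, while the default values for 'person' are those given in
section 3.3.

```python
NO_VALUE = "no-value"  # the 'no-value' default mentioned in the text

# Hypothetical type table: parent pointer plus token-parameter defaults.
TYPE_DEFAULTS = {
    "creature": {"isa": None, "defaults": {}},
    "person": {"isa": "creature",
               "defaults": {"power": 50, "speed": 50, "mass": 55}},
    "computer-user": {"isa": "person", "defaults": {}},
}


def slot_value(token, slot):
    """Return the token's own value, else the nearest inherited default."""
    if slot in token:
        return token[slot]
    concept = token["is-token-of"]
    while concept is not None:
        entry = TYPE_DEFAULTS[concept]
        if slot in entry["defaults"]:
            return entry["defaults"][slot]
        concept = entry["isa"]
    return NO_VALUE


# The text supplied a mass for this token but nothing about its power.
user7 = {"id": "computer-user7", "is-token-of": "computer-user", "mass": 70}
print(slot_value(user7, "mass"), slot_value(user7, "power"))
```

Here a value found in the text overrides the default, and a property for
which no type on the path offers a default falls back to 'no-value'.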
In what follows we will describe DIL, the IL dictionary language. 
We will do this by presenting the top levels of the isa hierarchy of 
concepts in our world and listing the frames for high-level nodes. Next, 
we'll present examples of IL dictionary frames, including one complete 
path in the isa hierarchy, from the root to a terminal node. 
The actual contents of the tree are, as we already said, idiosyncratic: 
it may be overdeveloped in some of its branches and underdeveloped in 
many others. This state of affairs corresponds to the strategy of working 
within a subworld. 
3.1. Frames. 
all ::= ('all'
('id' string) 
('properties' properties) 
('subworld' subworld*)) 
This is the root of the isa hierarchy. The three slots present here mean 
that every node in the tree has an id; every node features some properties 
(which exactly, will be shown in lower-level nodes); and every node 
represents a concept that belongs to one or more subworlds. 
event ::= ('event'
('isa' all) 
('patient' object)) 
At this level we meet the 'isa' slot for the first time. This is the pointer to
a node's parent in the hierarchy. Events divide into processes and states. 
The only overtly mentioned property common to all events is the concep- 
tual case of 'patient' (this reflects our opinion that in the sentence (1) 
John is not an agent, but rather a patient). Note that 'patient' in DIL sub- 
sumes the semantics of 'beneficiary'. 
(1) John is asleep. 
process ::= ('process'
('isa' event)
('is' process-sequence)
('part-of' process*)
('agent' creature)
('object' object)
('instrument' object)
('source' object)
('destination' object)
('preconditions' state*)
('effects' state*))
In addition to the conceptual case slots, the process frame contains infor-
mation about preconditions and effects. These are states that must typi-
cally hold before and after the process takes place, respectively. A pro-
cess can also be a part of other processes. Thus, for instance, move is a
part of travel and, at the same time, of fetch or insert. The 'is' slot of a
process frame contains either the constant primitive, if the process is not
further analyzable in DIL, or the description of the sequence of processes
which comprise the given process. The process-sequence is a list of pro-
cess names connected by the operators sequential, choice and shuffle. In
other words, a process may be a sequence of subprocesses (sequential), a
choice among several subprocesses (choice), a temporally unordered
sequence of subprocesses (shuffle) or any recursive combination of the
above. This treatment of processes is inspired by Nirenburg et al., 1985.
For the purposes of machine translation it seems unnecessary to introduce
a more involved temporal logic into consideration for the 'is' slot.
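One way to read the three operators is sketched below in Python. This is an
illustration only: the nested-tuple encoding and the function traces are
assumptions, and shuffle is simplified to "the subprocesses in any order",
without interleaving the internal steps of different subprocesses.

```python
from itertools import permutations, product


def traces(expr):
    """Return every linear ordering of primitive process names that a
    process-sequence expression allows, as a list of tuples."""
    if isinstance(expr, str):            # a primitive process name
        return [(expr,)]
    op, *parts = expr
    expanded = [traces(part) for part in parts]
    if op == "choice":                   # exactly one subprocess is taken
        return [t for alternatives in expanded for t in alternatives]
    if op == "sequential":               # subprocesses in the given order
        orders = [expanded]
    elif op == "shuffle":                # subprocesses in any order
        orders = permutations(expanded)
    else:
        raise ValueError(f"unknown operator: {op}")
    result = []
    for order in orders:
        for combo in product(*order):
            result.append(tuple(name for part in combo for name in part))
    return result


# Hypothetical decomposition, used only to exercise the operators.
example = ("sequential", "move", ("choice", "take", "find"))
print(traces(example))
```

Because the function recurses on every operand, arbitrary nestings of the
three operators are handled, which matches the "any recursive combination"
allowed for the 'is' slot.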
physical-process ::= ('physical-process'
('isa' process))
mental-process ::= ('mental-process'
('isa' process)
('is' primitive)
('agent' creature)
('object' object | event))
Only creatures can be fillers for the 'agent' slot. Mental processes classify
into reaction processes (cf. the English 'please' or 'like'), cognition
processes ('deduce') and perception processes ('see'). Objects of mental
processes can be either objects, as in (2), or events, as in (3).
(2) I know John.
(3) I know that John has traveled to Tibet.
speech-process ::= ('speech-process' 
('isa' process) 
('is' primitive) 
('agent' person) 
('patient' person* | organization*)
('object' event) 
('source' 'agent') 
('destination' 'patient')) 
Speech processes are primitives. The speech processes recognized by DIL
include assertions (that further subdivide into definitions, opinions, facts,
promises, etc.) and requests (questions or commands). The 'agent' slot
filler has the semantics of the speaker. The 'patient' is the hearer. Note
that there is a possibility for the hearer to be a group or an organization, 
as in (4). 
(4) I promised the band to let them have a ten-minute break every hour. 
The 'agent' is the 'source' and the 'patient' is the 'destination' of a 
speech process. 
state ::= ('state'
('isa' event) 
('part-of' state*)) 
The actant in states, which is the patient rather than the actor, is inherited 
from the event frame. 
object ::= ('object'
('isa' all)
('part-of' object*)
('consists-of' object*)
('belongs-to' creature | organization)
('object-of' (Mental-Process Speech-Process))
('patient-of' event)
('instrument-of' event)
('source-of' event)
('destination-of' event))
The '...-of' slots are used for consistency checks. 
3.2. Properties.
Property values are primitive concepts of IL used as values for slots
in concept frames. We give here just an illustration of these. Many more
exist and will be used in the implementation.
size-set ::= nil | infinitesimal | ... | huge
color-set ::= nil | black | ... | white
shape-set ::= nil | flat | square | spherical ...
material-set ::= nil | (gold (specific-gravity 81) (unit-value 228)) | ...
subworld-set ::= nil | computer-world | business-world | everyday-world
boolean-set ::= nil | yes | no
texture-set ::= nil | smooth | ... | rough
properties ::= ('properties'
donne' 
('size' size-set)
('color' color-set) 
('shape' shape-set) 
('texture' texture-set) 
('belongs-to' creature | organization)
('part-of' object | event)
('consists-of' object | event)
('power' real) 
('speed' real) 
('mass' real) 
('edibility' boolean-set) 
('made-of' material-set) 
...) 
3.3. From the Root to a Leaf. 
A path of concept representations from the root to a leaf node is presented
below. 
all-> object-> pobject-> +alive-> creature-> person-> 
computer-user 
For the frames for 'all' and 'object', see above.
pobject ::= ('pobject'
('isa' object)
('object-of' (+ (Take Put)))
('size' size-set)
('shape' shape-set)
('color' color-set)
('mass' integer))
The '+' sign in slots means all inherited information plus the contents of
the current slot.
+alive ::= ('+alive'
('isa' pobject) 
('edibility' boolean-set)) 
creature ::= ('creature'
('isa' +alive)
('agent-of' (Eat Ingest Drink Move Attack))
('consists-of' (Head Body))
('object-of' (+ (Attack)))
('power' real)
('speed' real))
person ::= ('person'
('isa' creature)
('agent-of' (+ (Take Put Find Speech-process Mental-Process)))
('source-of' Speech-process)
('destination-of' Speech-process)
('consists-of' (+ (Hand Foot ...)))
('power' 50)
('speed' 50)
('mass' 55))
computer-user ::= ('computer-user'
('isa' person)
('agent-of' (+ (Operate)))
('subworld' computer-world))
The complete frame of the leaf of this path, 'computer-user', including all
inherited slots and default values, is listed below. In reality frames like this
do not exist, because the tokens of this type do not contain all the possible
slot fillers.
(computer-user
('isa' person)
('agent-of' (Operate Take Put Find Speech-process Mental-Process
Eat Ingest Drink Move Attack))
('object-of' (Find Mental-process Speech-process Attack Take Put))
('destination-of' Speech-process)
('source-of' Speech-process)
('consists-of' (Hand Leg Head Body))
('power' 50)
('speed' 50)
('mass' 55)
('subworld' computer-world))
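How the '+' marker could produce the agent-of list of this complete frame is
sketched below in Python. The sketch is an illustration, not the project's
code: the FRAMES table and the function resolve are assumptions, the nodes
above 'creature' are omitted because they carry no agent-of information, and
treating a slot without '+' as overriding its parent is likewise an assumed
reading.

```python
# Only the agent-of slots of the three lowest nodes on the path are encoded.
# A ("+", [...]) value stands for the paper's (+ (...)) notation.
FRAMES = {
    "creature": {"isa": None,
                 "agent-of": ["Eat", "Ingest", "Drink", "Move", "Attack"]},
    "person": {"isa": "creature",
               "agent-of": ("+", ["Take", "Put", "Find",
                                  "Speech-process", "Mental-Process"])},
    "computer-user": {"isa": "person", "agent-of": ("+", ["Operate"])},
}


def resolve(concept, slot):
    """Return the full filler list for a slot, expanding '+' along isa."""
    if concept is None:
        return []
    frame = FRAMES[concept]
    value = frame.get(slot)
    if value is None:                        # slot absent: plain inheritance
        return resolve(frame["isa"], slot)
    if isinstance(value, tuple) and value[0] == "+":
        return value[1] + resolve(frame["isa"], slot)
    return list(value)                       # assumed: local value overrides


print(resolve("computer-user", "agent-of"))
```

Under these assumptions the result lists the local fillers first and the
inherited ones after them, in the same order as the agent-of slot of the
complete frame.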
4. The Interlingua Grammar. 
In the previous section we dealt mostly with IL lexicon. This section 
is devoted to the syntax of IL text. Unlike a natural language text, an IL 
text is not linear. It is a (potentially) complex network of IL sentences, 
interconnected by IL discourse markers. An IL sentence consists of a
main clause and a number of subordinate clauses, possibly interconnected 
through discourse markers, with the speech act and focus information 
added. IL clauses are the place where the tokens of events are put into the
modal and spatio-temporal context. IL events are processes and states. It 
is in representations of the latter that tokens of IL 'verbs' and 'nouns' 
(retrieved from the dictionary and augmented by various property values 
identified during SL text analysis) meet for the first time. 
The above consideration led us to declare the language of the gram- 
mar a separate representation language, TIL. There are important differ- 
ences between TIL and DIL. At the same time there are regular 
correspondences. The values of the properties in entity tokens typically 
correspond to the data types listed as fillers for the corresponding slots in 
the IL dictionary. Thus, for instance, the color property slot in the IL dic-
tionary frame for 'flower' can be occupied by a list (white yellow blue red
purple pink ...), the one for 'snow' will presumably contain only (white).
At the same time, 'rose11' will have the value 'red' as the contents of its
'color' slot. This underscores the difference in the semantics of similarly
named slots in DIL and TIL.
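The correspondence between a range in the dictionary and a single value in a
token can be sketched as a validity test in Python. The sketch is only an
illustration: the tables ISA and COLOR_RANGE, the functions allowed_colors
and color_is_valid, the token 'snow3' and the assumption that 'rose' is a
child of 'flower' are not taken from the TRANSLATOR dictionary.

```python
# Assumed fragment of the isa hierarchy and of the DIL color ranges.
ISA = {"rose": "flower", "flower": None, "snow": None}
COLOR_RANGE = {
    "flower": {"white", "yellow", "blue", "red", "purple", "pink"},
    "snow": {"white"},
}


def allowed_colors(concept):
    """Find the nearest color range on the path from a concept to the root."""
    while concept is not None:
        if concept in COLOR_RANGE:
            return COLOR_RANGE[concept]
        concept = ISA[concept]
    return None  # no node constrains the color


def color_is_valid(token):
    """A token's color must lie within the range given for its type."""
    allowed = allowed_colors(token["is-token-of"])
    return allowed is None or token["color"] in allowed


rose11 = {"id": "rose11", "is-token-of": "rose", "color": "red"}
snow3 = {"id": "snow3", "is-token-of": "snow", "color": "red"}
print(color_is_valid(rose11), color_is_valid(snow3))
```

The type slot holds a set of admissible values and the token slot holds one
actual value, which is the difference in slot semantics noted above.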
4.1. Text. 
text ::= nil |
sentence |
(discourse-structure-type text text+)
The above means that an IL text is either an empty string, a single 
sentence, or a number of sentences interconnected through discourse 
structure markers. 
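The recursive definition of an IL text can be checked mechanically. The
Python sketch below is an illustration under assumptions: a sentence is
modelled as a dict tagged 'sentence-token', nil as None, and a composite text
as a tuple whose first element is one of the discourse structure types of
section 4.12; the function name is_text is hypothetical.

```python
# Discourse structure types as listed in section 4.12.
DISCOURSE_TYPES = {"none", "temp", "equiv", "+expan", "-expan",
                   "condi", "+simil", "-simil", "choice"}


def is_text(x):
    """Recognize: text ::= nil | sentence | (type text text+)."""
    if x is None:                                   # nil
        return True
    if isinstance(x, dict):                         # a single sentence
        return x.get("kind") == "sentence-token"
    if isinstance(x, tuple) and len(x) >= 3 and x[0] in DISCOURSE_TYPES:
        # one text followed by one or more further texts
        return all(is_text(part) for part in x[1:])
    return False


s1 = {"kind": "sentence-token", "id": "s1"}
s2 = {"kind": "sentence-token", "id": "s2"}
s3 = {"kind": "sentence-token", "id": "s3"}
nested = ("condi", s1, ("temp", s2, s3))
print(is_text(nested))
```

Nesting a composite text inside another one, as in the last example, is what
makes an IL text a network rather than a linear string of sentences.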
4.2. Sentence. 
sentence ::= ('sentence-token'
('main-clause' clause)
('clauses' clause*)
('id' string)
('subworld' subworld) 
('modality' modality) 
('focus' focus) 
('speech-act' speech-act)) 
Every sentence is declared to contain a speech act. Thus, we pro- 
pose to represent (5) as (6), provided we can infer the identities of the 
speaker and the hearer, as well as the identity of the process: 
(5) I'd rather not do it. 
(6) Boss ordered Employee X not to agree to the terms of Sales Offer Y. 
Both direct and indirect speech acts are represented with the help of 
speech process tokens. With direct speech acts, the information to be put 
into the sentence frame is present in the text, while with indirect speech 
acts it has to be inferred. 
Thematic information about the sentence is restricted to the values 
of the focus slot in the sentence frame. This slot contains pointers to the 
entities that constitute the 'given' and the 'new' in this particular sen-
tence. Such an entity can be a concept, a property of a concept, or an entire
clause (cf. 4.11). The value of the modality slot for the IL sentence is
chosen from the set of modalities (cf. 4.10). The subworld slot is a 
marker that shows that the sentence belongs to a 'semantic field' related to 
computers. In TRANSLATOR this is the designated topic for translation. In 
broader environments the subworld information will be helpful to prune 
unneeded inference paths. 
The fact that we allow only one clause to occupy the 'main-clause' 
slot of a sentence means that IL sentences cannot be compound (i.e., con-
sist of a number of sentences connected through commas and coordinate
conjunctions like the English 'and', 'or' or 'but'). The fact that other
clauses can be present means that an IL sentence can be complex. Sentences
that are compound in SL are translated as texts in IL, the representations of
the immediate constituents of the compound SL sentences being IL sentences.
Appropriate discourse structure markers are used to represent the mean- 
ing carried by the conjunction. 
4.3. Clause.
('clause-token'
('id' string)
('discourse-structure' discourse-structure)
('focus' focus)
('modality' modality)
('time' time)
('space' space)
('event' event)
('quantifier' quantifier2)
('subworld' subworld))
The major difference between the interlingua clauses and events is 
that clauses contain information that actually appears in the input text 
(augmented by anaphora resolution), while events can be either contained 
in the input or inferred from it. 
A clause may be connected discourse-wise not only with another 
clause but also with an object or an event, as well as with a sentence, a 
paragraph or even a whole text; also note that discourse structure assigns 
the given clause as one of the two arguments in the discourse structure;
one clause can be an argument in more than one discourse-structure
expression. 
4.4. Process. 
('physical-process-token'
('id' string) 
('is-token-of' string) 
('agent' object-token) 
('object' object-token) 
('patient' object-token) 
('instrument' object-token) 
('source' object-token) 
('destination' object-token) 
('negation' negation) 
('quantifier' quantifier2) 
('phase' phase-set) 
('manner' manner-set) 
('space' space) 
('time' time) 
('subworld' subworld))
An actual process token is represented as follows: 
(move-token 
('id' move21) 
('is-token-of' move)
('is' primitive)
('agent' person12)
('object' person12)
('source' (in house2))
('destination' (in house3))
('negation' nil)
('quantifier' nil) 
('phase' static) 
('manner' easily) 
('part-of' travel5) 
('time' (before 1700)) 
('subworld' everyday-world)) 
4.5. State. 
('state-token' 
('id' string) 
('is-token-of' string) 
('negation' negation) 
('quantifier' quantifier2) 
('patient' object-token)
('phase' phase-set) 
('part-of' state-token*) 
('space' space) 
('time' time) 
('subworld' subworld))
Events in IL have a property of 'phase': they are either 'static', 
'beginning' or 'end'. This device is needed to represent changes of state. 
Changes of state are sometimes represented as a separate class of 
processes. The solution in IL may be more economical. 
4.6. Object. 
A typical frame for an object token in TIL is as follows. The 
'string' in the 'is-token-of' slot stands for the name of the corresponding
object type.
('object-token'
('id' string)
('is-token-of' string)
('subworld' subworld)
('negation' negation)
('quantifier' quantifier1))
An example object token follows: 
('person-token' 
('id' person23) 
('is-token-of' person) 
('subworld' everyday-world) 
('negation' no) 
('quantifier' any) 
('power' 50) 
('speed' 50) 
('mass' 55)) 
Note the difference from DIL object frames: there are no '...-of' slots here,
and there is more emphasis on syntagmatic relationships and default overriding.
4.7. Time. 
time ::= nil | absolute-time | relative-time
absolute-time ::= ('time'
('quantifier' quantifier2)
('point' integer) |
('interval-begin' integer)
('interval-end' integer))
relative-time ::= ('time'
(temporal-operator event)
('quantifier' quantifier2))
temporal-operator ::= simultaneous | before | during | around | always | none
Relative time markers will predominantly appear in texts. 
4.8. Space. 
space ::= nil | absolute-space | relative-space
absolute-space ::= ('space'
('quantifier' quantifier2)
('coordinate1' real)
('coordinate2' real)
('coordinate3' real))
relative-space ::= ('space'
(spatial-operator object)
('quantifier' quantifier2))
spatial-operator ::= left-of | equal | between | in | above |
near | none
As in the case of time, relative (topological) space specifications will
predominate in texts. 
4.9. Slot Operators. 
quantifier1 ::= nil | all | any | most | many | some | few | 1 | 2 | ...
quantifier2 ::= nil | hardly | half | almost | completely
4.10. Modality. 
modality ::= ('modality' modality-set)
modality-set ::= real | desirable | undesirable | conditional |
possible | impossible | necessary | nil
4.11. Thematic Information 
focus ::= ('given'
('object' obj) |
('event' event) |
('clause' clause) |
('quantifier' event-quantifier | quantifier))
('new'
('object' obj) |
('event' event) |
('clause' clause) |
('quantifier' event-quantifier | quantifier))
The thematic information, together with the discourse structure and 
speech act information, explicitly represents the rhetorical force of SL 
texts. The lack of this type of knowledge led many MT researchers to 
declare that SL traces are necessary in the internal representation. The 
above information may prove sufficient for abandoning that requirement.
4.12. Discourse Structure. 
discourse-structure ::= (discourse-structure-type
(clause1 clause-n | sentence | text) |
(clause-n | sentence | text clause1)*)
discourse-structure-type ::= none | temp | equiv | +expan | -expan |
condi | +simil | -simil | choice
For a more detailed description of the discourse cohesion markers in
TRANSLATOR see Tucker et al., 1986. 
A clause may be connected discourse-wise not only with another 
clause but also with a sentence, a paragraph or even a whole text; also 
note that discourse structure assigns the given clause as one of the two 
arguments in the discourse structure; one clause can be an argument in 
more than one discourse-structure expression. 
4.13. Speech Act. 
speech-act ::= ('speech-act'
('type' speech-process)
('direct?' yes | no)
('speaker' object) 
('hearer' object+) 
('time' time) 
('space' space)) 
Every IL sentence features a speech act, irrespective of whether it 
was overtly mentioned in the SL text. If it was, it is represented through a 
token of a speech process. Otherwise, it is inferred. The time and space of
the speech act can be quite different from that of the proposition which is 
the information transferred through this speech act. 
4.14. Other Slots and Slot Fillers. 
negation ::= boolean-set
referent-set ::= nil | above | below | object-token
manner-set ::= nil | difficulty | attitude
difficulty ::= nil | easily | ... | difficultly
attitude ::= nil | caring | ... | nonchalantly
phase-set ::= nil | static | beginning | end
5. Conclusion 
This paper suggested an approach to conceptual representation of a 
text in a natural language for the purposes of translation. An important 
distinction has been maintained between the representation of descriptions 
and assertions. We even suggested two different representation languages, 
DIL and TIL for the two tasks. 
The next task in the project is to actually implement the procedures 
for analysis, inference making and synthesis. One crucial prerequisite for 
that is the compilation of a substantial knowledge base (IL dictionary) for 
the subworld of computers. Now that the structure of IL has been speci- 
fied, we can actually do it. Strategies and aids for uniform and computer- 
aided knowledge acquisition are being developed. 
Acknowledgement. The authors wish to thank Irene Nirenburg for reading, 
discussing and criticizing the numerous successive versions of the 
manuscript. Needless to say, it's we who are to blame for the remaining 
errors. 
References. 

Brachman, R. and J. Schmolze 1985. An overview of the KL-ONE
knowledge representation system. Cognitive Science, vol. 9, issue 2.
Carbonell, J. and M. Tomita 1985. New approaches to machine transla- 
tion. In: S. Nirenburg (ed.). 

Johnson, R. and P. Whitelock 1985. Machine translation as an expert 
task. In: S. Nirenburg (ed.). 

Nirenburg, S. (ed.), Proceedings of the Conference on Theoretical and 
Methodological Issues in Machine Translation. Hamilton, NY, August 
1985. 

Nirenburg, S. and J. Brolio (in preparation). A parsing strategy for 
knowledge-based machine translation. 

Nirenburg, S., V. Raskin and A. Tucker, Interlingua design in TRANSLA- 
TOR. In: S. Nirenburg (ed.) 

Tucker, A., S. Nirenburg and V. Raskin, Discourse, cohesion and 
semantics of expository text (this volume). COLING 1986
