Interaction of Knowledge Sources in a 
Portable Natural Language Interface 
Carole D. Hafner 
Computer Science Department 
General Motors Research Laboratories 
Warren, MI 48090 
Abstract 
This paper describes a general approach to the 
design of natural language interfaces that has 
evolved during the development of DATALOG, an Eng- 
lish database query system based on Cascaded ATN 
grammar. By providing separate representation 
schemes for linguistic knowledge, general world 
knowledge, and application domain knowledge, 
DATALOG achieves a high degree of portability and 
extendability. 
1. Introduction 
An area of continuing interest and challenge in 
computational linguistics is the development of 
techniques for building portable natural language 
(NL) interfaces (See, for example, \[9,3,12\]). The 
investigation of this problem has led to several 
NL systemS, including TEAM \[7\], IRUS \[i\], and 
INTELLECT \[10\], which separate domain-dependent 
information from other, more general capabilities, 
and thus have the ability to be transported from 
one application to another. However, it is impor- 
tant to realize that the domain-independent por- 
tions of such systems constrain both the form and 
the content of the domain-dependent portions. 
Thus, in order to understand a system's capabili- 
ties, one must have a clear picture of the struc- 
ture of interaction among these modules. 
This paper describes a general approach to the 
design of NL interfaces, focusing on the structure 
of interaction among the components of a portable 
NL system. The approach has evolved during the 
development of DATALOG (for "database dialogue") 
an esperimental system that accepts a wide variety 
of English queries and commands and retreives the 
answer from the user's database. If no items sat- 
isfy the user's request, DATALOG gives an informa- 
tive response explaining what part of the query 
could not be satisfied. (Generation of responses 
in DATALOG is described in another report \[6\].) 
Although DATALOG is primarily a testbed for 
research, it has been applied to several demon- 
stration databases ~nd one "real" database con- 
taining descriptions and rental information for 
more than 500 computer hardware units. 
The portability of DATALOG is based on the 
independent specification of three kinds of knowl- 
edge that such a system must have: a linguistic 
grammar of English; a general semantic model of 
database objects and relationships; and a domain 
model representing the particular concepts of the 
application domain. After giving a brief overview 
of the architecture of DATALOG, the remainder of 
the paper will focus on the interactions among the 
components of the system, first describing the 
interaction between syntax and semantics, and then 
the interaction between general knowledge and 
domain knowledge. 
2. Overview of DATALOG Architecture 
The architecture of DATALOG is based on Cas- 
caded ATN grammar, a general approach to the 
design of language processors which is an exten- 
sion of Augmented Transition Network grammar \[13\]. 
The Cascaded ATN approach to NL processing was 
first developed in the RUS parser \[2\] and was for- 
mally characterized by Woods \[14\]. Figure 1 shows 
the architecture of a Cascaded ATN for NL process- 
ing: the syntactic-and semantic components are 
implemented as separate processes which operate in 
parallel, communicating information back and 
forth. This communication (represented by the 
"interface" portions of the diagram) allows a lin- 
guistic ATN grammar to interact with a semantic 
processor, creating a conceptual representation of 
the input in a step-by-step manner and rejecting 
semantically incorrect analyses at an early stage. 
DATALOG extends the architecture shown in Fig- 
ure 1 in the direction of increased portability, 
by dividing semantics into two parts (see Figure 
2). A "general" semantic processor based on the 
relational model of data \[5\] interprets a wide 
variety of information requests applied to 
input 
ATN 
GRAMMAR 
interface 
) 
combined syntactic/ 
semantic analysis 
interface 
SEMANTICS 
Figure i. Cascaded Architecture for 
Natural Language Processing 
57 
ATN 
input combined syntactic/ 
semantic analysis 
interface 
1 
Figure 2. Architecture Of DATALOG 
abstract database objects. This level of knowl- 
edge is equivalent to what Hendrix has labelled 
"pragmatic grammar" \[9\]. Domain knowledge is rep- 
resented in a semantic network, which encodes the 
conceptual structure of the user's database. 
These two levels of knowledge representation ar~ 
linked together, as described in Section 4 below. 
The output of the cascaded ATN grammar is a 
combined linguistic and conceptual representation 
of the query (see Figure 3), which includes a 
"SEMANTICS" component along with the usual lin- 
guistic constituents in the interpretation of each 
phrase. 
3. Interaction of Syntax and Semantics 
The DATALOG interface between syntax and seman- 
tics is a simplification of the RUS approach, 
which has been described in detail elsewhere \[ii\]. 
The linguistic portion of the interface is imple- 
Pushing for Noun Phrase. 
ASSIGN Actions : 
employee 
employee 
employee 
employee 
Popping Noun Phrase: 
(NP 
(DET (the)) 
mented by adding a new arc action called "ASSIGN" 
to the ATN model of grammar. ASSIGN communicates 
partial linguistic analyses to a semantic inter- 
preter, which incrementally creates a conceptual 
representation of the input. If an assignment is 
nonsensical or incompatible with previous assign- 
ments, the semantic interpreter can reject the 
assignment, causing the parser to back up and try 
another path through the grammar. 
In DATALOG, ASSIGN is a function of three argu- 
ments: the BEAD of the current clause or phrase, 
the CONSTITUENT which is being added to the inter- 
pretation of the phrase, and the SYNTACTIC SLOT 
which the constituent occupies. As a simplified 
example, an ATN gram, mr might process noun phrases 
by "collecting" determiners, numbers, superlatives 
and other pre-modifiers in registers until the 
head noun is found. Then the head is assigned to 
the NPHEAD slot; the pre-modifiers are assigned 
(in reverse order) to the NPPREMOD slot; superla- 
tives are assigned to the SUPER slot; and numbers 
are assigned to the NUMBER slot. Finally, the 
determiners are assigned to the DETERMINER slot. 
If all of these assignments are acceptable to the 
s~m~ntic interpreter, an interpretation is con- 
structed for the "base noun phrase", and the par- 
ser can then begin to process the noun phrase 
post-modifiers. Figure 3 illustrates the inter- 
pretation of "the tallest female employee", 
according to this scheme. A more detailed 
description of how DATALOG constructs interpreta- 
tions is contained in another report \[8\]. 
During parsing, semantic information is col- 
lected in "semantic" registers, which are inacces- 
sible (by convention) to the grammar. This con- 
vention ensures the generality of the grammar; 
although the linguistic component (through the 
assignment mechanism) controls the information 
that is passed to the semantic interpreter, the 
only information that flows back to the grazm~ar is 
CONSTITUENT SYNTACTIC SLOT 
employee NPHEAD 
(AMOD female) NPPREMOD 
(ADJp SUPER 
(ADV most) 
(ADJ tall)) 
(the) DET 
(PREMODS ((ADJP (ADV most) (ADJ tall)) (AMOD female)) 
(HEAD employee) 
(SEMANTICS 
(ENTITY 
(Q nil) 
(KIND employee) 
(RESTRICTIONS ( 
((ATT sex) (RELOP ISA) (VALUE female)) 
((ATT height) (RANKOP MOST) (CUTOFF i)) ))))) 
Figure 3. Interpretation of "the tallest female employee". 
58 
the acceptance or rejection of each assignment. 
When the grammar builds a constituent structure 
for a phrase or clause, it includes an extra con- 
stituent called "SEMANTICS", which it takes from a 
semantic register. However, generality of the 
grammar is maintained by forbidding the gra~mmar to 
examine the contents of the SEMANTICS constituent. 
4. Interaction of General and Application 
Semantics 
The semantic interpreter is divided into two 
levels: a "lower-level" semantic network repre- 
senting the objects and relationships in the 
application domain; and a "higher-level" network 
representing general knowledge about database 
structures, data analysis, and information 
requests. Each node of the domain network, in 
addition to its links with other domain concepts, 
has a "hook" attaching it to the higher-level con- 
cept of which it is an instance. Semantic proce- 
dures are also attached to the higher-level con- 
cepts; in this way, domain concepts are indirectly 
linked to the semantic procedores that are used to 
interpret them. 
Figure ¢ illustrates the relationship between 
the general concepts of DATALOG and the domain 
semantic network of a personnel application. 
Domain concepts such as "female" and "dollar" are 
attached to general concepts such as /SUBCLASS/ 
and /UNIT/. (The higher-level concepts are delim- 
ited by slash "/" characters.) When a phrase such 
as "40000 dollars" is analyzed, the semantic 
procedures for the general concept ,'b~::T/ are 
invoked to interpret it. 
The general concepts also organized ~nto a net- 
work, which supports inheritance of s~msntic pro- 
cedures. For example, two of the general concepts 
in DATALOG are /ATTR/, which can represent any 
attribute in the database, and /NUMATTR/, which 
represents numeric attributes such as "salary" and 
"age". Since /ATTR/ is the parent of /NUMATTR/ in 
the general concept network, its semantic proce- 
dures are automatically invoked when required dur- 
ing interpretation of a phrase whose head is a 
numeric attribute. This occurs whenever no 
/NUMATTR/ procedure exists for a given syntactic 
slot; thus, sub-concepts can be defined by specif- 
ying only those cases where their interpretations 
differ from the parent. 
Figure 5 shows the same diagram as Figure 4, 
with concepts from the computer hardware database 
substituted for personnel concepts. This illus- 
trates how the semantic procedures that inter- 
preted personnel queries can be easily transported 
to a different domain. 
5. Conclusions 
The general approach we have taken to defining 
the inter-component interactions in DATALOG has 
led to a high degree of extendability. We have 
been able to add new sub-networks to the grammar 
without making any changes in the semantic inter- 
preter, producing correct interpretations (and 
correct answers from the database) on the first 
try. We have also been able to implement new gen- 
eral semantic processes without modifying the 
grammar, taking advantage of the "conceptual fac- 
toring" \[14\] which is one of the benefits of the 
Cascaded ATN approach. 
The use of a two-level semantic model is an 
experimental approach that further adds to the 
portability of a Cascaded ATN grammar. By repre- 
senting application concepts in an "epistemologi- 
cal" s~m~ntic network with a restricted set of 
primitive links (see Brao~un \[4\]), the task of 
building a new application of DATALOG is reduced 
to defining the nodes and connectivity of this 
network and the synonyms for the concepts repre- 
Which female Ph.D.s earn more than 40000 dollars 
female ' male Ph.D. masters earn 
i, Sex \] I-degree i I I salary i 
I  ploy-i 
Figure 4. Interaction of Domain and General Knowledge 
'59 
sented by the nodes. Martin et. al. \[12\] define a 
transportable NL interface as one that can acquire 
a new domain model by interacting with a human 
database expert. Although DATALOG does not yet 
have such a capability, the two-level semantic 
model provides a foundation for it. 
DATALOG is still under active development, and 
current research activities are focused on two 
problem areas: extending the two-level semantic 
model to handle more complex databases, and inte- 
grating a pragmatic component for handling ana- 
phora and other dialogue-level phenomena into the 
Cascaded ATN grammar. 
1. 
6. References 
Bates, M. and Bobrow, R. J., "Information 
Retrieval Using a Transportable Natural Lan- 
guage Interface." In Research and Development 
in Information Retrieval: Proc. Sixth Annual 
International ACM SIGIR Conf., Bathesda MD, 
pp. 81-86 (1983). 
2. Bobrow, R. "The RUS System." In "Research in 
Natural Language Understanding," BBN Report 
No. 3878. Cambridge, MA: Bolt Beranek and 
Newman Inc. (1978). 
3. Bobrow, R. and Webber, B. L., "Knowledge Rep- 
resentation for Syntactic/Semantic Process- 
ing." In Proc. of the First Annual National 
Conf. o.nn Artificial Intelligence, Palo Alto 
CA, pp. 316-323 (1980). 
4. Brachman, R. 3., "On the Epistemological Sta- 
tus of Semantic Networks." In Associative Net- 
works: Representation and Use of Knowledge by 
Computers, pp. 3-50. Edited by N. Y. Findler, 
New York NY (1979). 
5. Codd, E. F. "A Relational Model of Data for 
Large Shared Data Banks." Communications of 
th_.~e ACM, Vol. 13, No. 6, pp.377-387 (1970). 
6. Godden, K. S., "Categorizing Natural Language 
Queries for Int~lllgent Responses." Research 
Publication 4839, General Motors Research Lab- 
oratories, Warren MI (1984). 
7. Grosz, B. J., "TEAM: A Transportable Natural 
Language Interface System." In Proc. Conf. on 
Applied Natural Language Processing, Santa 
Monica CA, pp. 39-45 (1983). 
8. Hafner, C. D. and Godden, K. S., "Design of 
Natural Language Interfaces: A Case Study." 
Research Publication 4587, General Motors 
Research Laboratories, Warren MI (1984). 
9. Hendrix, G. G. and Lewis, W. H., "Transporta- 
ble Natural Language Interfaces to Data." 
Proc. 19th Annual Meeting of theAssoc, fo__~r 
Computational Linguistics, 5tanford CA, pp. 
159-165 (1981). 
10. INTELLECT Query System User's Guide, 2nd. Edi- 
tion. Newton Centre, MA: Artificial Intelli- 
gence Corp. (1980). 
11. Mark, W. S. and Barton, G. E., "The RUSGRAMMAR 
Parsing System." Research Publication 
GMR-3243. Warren, MI: General Motors Research 
Laboratories (1980). 
12. Martin, P., Appelt, D., and Pereira, F., 
"Transportability and Generality in a Natural- 
Language Interface System." In Proc. Eight 
International Joint Conf. on Artificial Intel- 
ligence, Karlsruhe, West Germany (1983). 
13. Woods, W. "Transition Network Grammars for 
Natural Language Analysis." Cowmunications of 
the ACM, Vol. 13, No. 10, pp. 591-606 (1970). 
14. WOodS, W., "Cascaded ATN Gra/~nars." American 
Journal of Computational Linguistics," Vol. 6, 
No. 1, pp. 1-12 (1980). 
Which IBM terminals weigh more than 70 pounds 
val o val_of verb o ' unit of 
/ 
Figure 5. Figure 4 Transported to a New Domain 
60 
