Design Tool Combining Keyword Analyzer and Case-based Parser for 
Developing Natural Language Database Interfaces 
Hideo Shimazu Seigo Arita Yosuke Takashima 
C&C Information Technology Research Laboratories 
NEC Corporation 
4-1-1 Miyazaki, Miyamae-ku Kawa.saki, Japan, 216 
shimazu%joke.cl.nec.co.jp@uunet.uu.net 
ABSTRACT 
We have designed and experimentally implemented a 
tool for developing a natural language systems that 
can accept extra-grammatical expressions, keyword 
sequences, and linguistic fragments, as well as ordi 
nary natural language queries. The key to this tool's 
efficiency is its effective use of a simple keyword an- 
alyzer in combination with a conventional case-based 
parser. TILe keyword analyzer performs a majority of 
those queries which are simple data retrievals. Since 
it uses only keywords in any qnery, this analyzer is 
robust with regard to extra-grammatical expressions. 
Since little labor is required of the application de- 
signer in using the keyword analyzer portion of the 
tool, and since the case-based parser processes only 
those queries which the keyword analyzer fails to in- 
terpret, total labor required of the designer is less 
than that for a tool which employs a conventional 
case-based parser alone. 
1 Introduction 
As the number of commercial on-line databases in- 
creases, so does user need for pragmatic natural lan- 
guage (NL) interface for communicating with tbose 
databases. Case-based parsing is an effective ap- 
proach to constructing NL interfaces to databases \[1\] 
\[5\] \[7\] \[11\]. A standard case-based parser consists ba- 
sically of a pattern marcher and a case base which 
stores a large number of linguistic pa~tern-concept 
pairs. In response to a new input query, the pat- 
tern matcher searches the case base for any matching 
linguistic patterns. If one is found, its concept portion 
is output as a semantic representation of the given in- 
put query. Though case-based parsing makes it easy 
to construct domain dependent NL interfaces, it has 
several serious drawbacks: 
• The application designer who uses it must define 
all possible linguistic patterns. 
• The application designer must also define a con- 
cept portion to correspond to each defined lin- 
guistic pattern. 
• Since such pattern-concept definitions will be 
highly dependent oo tile nature of the specific 
application, they must bc newly defined for each 
target system. 
In this paper, we propose a novel NL interface model, 
CAPIT (Cooperative Analyzer and Parser as Inter- 
face "Fool). It is a self-contained NL interface build- 
ing tool for relational-like databases, and it integrates 
NL processing mechanisms with the mechanism used 
for the incremental acquisition of knowledge needed 
in that NL processing. CAPIT combines a sim- 
ple keyword analyzer, KBP(Keyword-Based Parsing 
module), with a case-based parser, CBP(Case-Based 
Parsing module). KBP extracts only keywords from 
an input sentence, and constructs a meaning for the 
sentence from them. Since NL queries to on-line 
databases tend to be simple and straightforward, 
KIqP can interpret a majority of those queries. How- 
ever, because it constructs the meaning only from 
the keywords, KBP sometimes fails to interpret them. 
The ease-based parser (CBP) is a supplemental mod- 
ule to KBP. CBP is a conventional case-based parser. 
It consists of a pattern matcher and a case base. Lin- 
guistic pattern-concept pairs are stored in the case 
base. CBP must process only those queries which 
KBP fails to interpret correctly. Since an applica- 
tion designer do not have to define all the possible 
linguistic patterns, his/her labor required to define 
linguistic pattern-concept pairs is less than that for a 
tool which employs a conventional case-based parser 
alone. 
AcrEs DE COLING-92, NAtCrES, 23-28 Aot';r 1992 7 3 5 PROC. OF COLING-92, NAh'rEs, AUG. 23-28. 1992 
Input Sentence (Corpus) 
! 
Step-1 \[ Add Semantic 
i Categery, Pattern, Case-Based and Mapping 
Parser Definitions 
(CBP) 
Step-2 steP-3\[ u ..... he~ 
Partially Matched 
Fl\]lly Matche~ \[ Application 
Keyword Desiqner 
Analyzer ( KBP ) 
\]Pattern Definition\] 
~Correct ? ~ Interviewer 1-- 
_ ~ I (Pp~l 1 
Step-5 ~ 
Figured: CAPIT Flow 
We analyzed KBP's interpretation failures, and cat- 
egorized the types of KBP's interpretation failures. 
We regard defining pattern-concept pairs for CBP 
as repairs of KBP's interpretation failures. We de- 
fined four repair types which are corresponding to 
KBP's typical interpretation failures. When all appli- 
cation designer encounters KBP's interpretation fail- 
ure, he/she analyzes it, then selects the best and eas- 
iest repair type. Such a repair task is accomplished 
interactively between the application designer and the 
Pattern Definition Interviewer module (PDI). 
2 CAPIT Flow 
We have been collecting Japanese corpora which un- 
trained users typed from computer terminals in order 
to access on-line databases. We found that the large 
part of the corpora arc "Pass me salt" like simple 
data retrievals front databases. Many sentences have 
simple grammatical or extra-grammatical structures. 
Complex linguistic patterns are very rare. One ex- 
treme example is just a sequence of keywords like, 
"Dynamic Memory author", instead of asking "Who 
is the author of the book titled Dynamic Memory?". 
We hypothesized that the processing mechanism for 
such simple expressions is different front a process- 
ing mechanism for grammatical expressions, The two 
parsing module structure of CAPIT reflects this hy- 
pothesis. 
Figure-1 describes the flow of CAPIT. First, the ap- 
plication designer who develops a NL interface us- 
ing CAPIT collects the corpora of users' queries in 
the target domain. A query of tile collected cor- 
pora is given to CAP1T one by one. The case-based 
parser (CBP) tries to interpret the sentence (Step- 
1). If CBP finds a fully matched linguistic pattern in 
its case base, the corresponding concept is output as 
the meaning for the input sentence (Step-2). If CBP 
can not find any matching pattern, ttle NL query is 
passed to the keyword-bascd parsing module (KBP). 
If CBP finds a pattern which matches with a part of 
tile query in its case base, CBP replaces the matched 
part of the NL query with ttle corresponding concept, 
then passes the modified NL query to KBP (Step-3). 
KBP extracts only keywords from the query, and con- 
structs its meaning (Step-4). KBP always constructs 
the meaning for a given sentence. 
The meaning generated by CBP and/or KBP, is 
ACRES DE COLING-92, NAhq'ES, 23-28 hOOI 1992 7 3 6 PRoc. OF COLING-92, NANTES, AUG. 23-28, 1992 
h "book","title","named .... published" 
fleld-name Index field-name ir~dex 
Title Author Publisher Date 
Dynamic Memory Schank Cambridge 1983 U. Pr 
Society of Mind O S&S 1985 
I \[ield-v~lue index 
\["the fathe, r of AI","Minsky"\[ 
Table-1 : A Database Example 
Price page 
$15 240 
$20 339 
shown to the application designer. Tile application 
designer judges whether or not the interpretation is 
correct (Step-5). If it is correct, the examination us- 
ing tbis NL query finishes, mid the next NL query is 
taken from the corpora for the next examination. If 
it is not correct, the Pattern Definition Interviewer 
module (PI)I) is activated. PDI asks the applica- 
tion designer for the correct interpretation of the NL 
query. He/she defines linguistic patterns and/or se- 
mantic concepts and/or the mappings between lin- 
guistic patterns and semantic concepts for the NL 
query (Step-6). The new definition is stored in KBP's 
knowledge base mid/or CBP's case base. Next time 
CAPIT encounters the same query or similar queries 
to tile query, it succeeds in interpreting the queries 
correctly. 
After numbers of such examinations, CBP's case base 
becomes rich, and tile NL interface application can be 
released. 
3 KBP Mechanism 
This section describes the KBP mechanism, using 
a simple example. Table-1 shows a simple CAPIT 
target database example. Linguistic patterns are at- 
tached as indices whicb refer to specific fields and the 
values of specific fields of records in tile table. For 
example, the indices to the "Title" field are "book", 
"title", "book name", "named", etc. We call an index 
to a field name field-name index. An index attached 
to the value of a field of a record is called field-value 
index. For example, "the father of AI" is a field-value 
index to "Minsky" which is the value of tile "Au- 
thor" field in a specific record. Values of each field of 
each record is itself a field-value index. For example, 
"1983" is a field-value index to the value of "Date" 
field in a record. Field-name indices and field-value 
indices are stored in KBP's knowledge base. 
KBP always regards the meaning for a given NL 
query a~s an imperative, "Select records in s table 
which satisfy specific conditions, and return the value 
of the requested fields from the selected records". Tile 
imperative is represented in SQL: 
SELECT field-k, field-l, ... 
FROM target table 
WHERE field-i = value-i, 
field-j = value-j ....... ; 
The KBP algorithm to generate the SQL expression 
from a NL query is as follows: 
l. KBP extracts only field-name indices and field- 
value indices from a given NL query. The rest of 
tile NL query arc abandoncd. 
2. When a field-name index is extracted, its refer- 
ring field name is kept a.s a SELECT-clause ele- 
nlent. 
3. When a field-value index is extracted, its refer- 
ring field value and the field name of the field 
value are kept as a WlIERE-clause element, in 
tile form of (field name = field value). 
4. After all extracted indices are processed, all 
SELECT-clause elements and WHERE-clause 
elements are merged. Then, they are assigned 
into a SELECT-FROM-WlIERE structure. 
Next, we explain this algorithm, using a NL query 
example. 
AcrEs DE COLING-92, NAMES, 23-28 AoOr 1992 7 3 7 PRec. or COL1NG-92. NArcrEs. And. 23-28. 1992 
SI: "Show me the books published by S&S". 
KBP extracts only "book", "published" and "S&S" 
from $1. "Book" is a field-name index to tile "Title" 
field. "Published" is a field-name index to the "Pub- 
lisher" field. Since "S&S" is a field-value index to the 
value of the "Publisher" field, the WHERE-clause cle- 
ment, (Publisher = S&S) is kept. From these indices, 
the following SQL command is generated: 
SELECT Title, Publisher 
FROM Table- 1 
WHERE Publisher = S&S; 
The SQL command is evaluated, and its answer is re- 
turned. The answer is "Society of Mind" and "S&S". 
They are the reply to the above query. 
The actual KBP has several heuristic rules to se- 
lect SELECT-clause elements and WHERE-clause el- 
ements. For example, the right answer to $1 is just 
"Society of Mind". "S&S" must not be produced. 
With the actual KBP, a heuristic rule suppresses the 
production of "S&S" in the above example. 
Though the actual KBP is more complex than this 
simple explanation, it is still very simple \[2\]. Since 
KBP constructs a query meaning from only keywords 
in a NL query, it can treat extra-grammatical expres- 
sions, keyword sequences and linguistic fragments, in 
the same way as treating ordinary natural language 
queries. For example, even the following strange 
queries on Tabled are acceptable by KBP; "Publish- 
ers?", "Dynamic Memory author", "When the book 
named Society of Mind appear?", "Society of Mind, 
how much", etc. 
4 The Role of CBP 
4.1 The Situations KBP Fails to In- 
terpret 
KBP can perform a majority of those queries which 
are simple data retrievals. So, in what kind of situa- 
tions does KBP fail to interpret? CBP processes only 
those queries which KBP fails to interpret. The ap- 
plication designer must define pattern-concept pairs 
which CBP uses to interpret such queries. Therefore, 
we have to know the limitations of KBP's interpre- 
tation capability. The followings are KBP's typical 
failure cases. 
Failure-1 Cases an application designer forgot to 
define necessary pattcrns as indices: 
If a necessary linguistic pattern is not defined as ei- 
ther field-name index or field-value index, KBP can 
not interpret concerning NL queries correctly. 
Failure-2 Cases a NL query includes idiomatic ex- 
pressions or spatial expressions: 
KBP can not generate correct meanings, if idiomatic 
expressions like "greater than 10ft', or spatial expres- 
sions like "the switch between A and B" are included 
in a NL query. 
Failure-3 Cases the meaning for a NL query is not 
represented in tile form of SELECT-FROM-WHERE: 
KBP assumes that any NL query is translated into a 
SELECT-FROM-WHERE structure. If a NL query 
has a different SQL structure, like SELECT-FROM- 
GROUP BY-tIAVING, KBP can not generate a cor- 
rect meaning. For example, a NL query like "Select 
author and its amount which is bigger than 1000" are 
represented with the SELECT-FROM-GROUP BY- 
I1AVING structure. 
Failure-4 Cases the meaning for a NL query can 
not be represented in SQL language: 
If a NL query is a meta-level question for the target 
database, like "What kind of information can I get 
from this?", KBP can not interpret it. 
Failure-5 Cases KBP generates many candidate in- 
terpretations of a NL query: 
Since KBP generates tile meaning for a NL query us- 
ing onty keywords in the query, it sometimes gener- 
ates not only a correct meaning but also wrong mean- 
ings. \['or examptc, KBP generates several different 
meanings from the following query; "Show me the 
publisher of the book titled L.A.". 
In order to avoid these KBP's failures, when KBP en- 
counters these failures, the application designer must 
repair the failures, by enriching and modifying either 
KBP's knowledge base and/or CBP's case base. Such 
a failure-repair mechanism is analogous to those of 
case-based reasoning \[6\] \[8\]. 
4.2 Repairs of KBP's Failures 
There are four repair types of the KBP's failures. 
Three of the four are realized by defining a new 
linguistic pattern-concept pairs in CBP's case base. 
Failure-5 is solved by either of the four types. 
Repair-1 To define a linguistic pattern as either a 
field-name index or a field-value index: 
Ac'I'~ DE COLING-92, NANTEs, 23-28 AoIYr 1992 7 3 8 Prtoc. OF COLING-92, NANTES, AUG. 23-28, 1992 
Figure-2: I,inguistic Pattern-SQL Pair in CBP for 
Repair-3 
This is corresponding to Failure-l, and is the easiest 
of the four repmr types. 
Repair-2 To define a pattern-concept pair, where 
the concept part is represented as SELECT-clause el- 
ements and/or WHEH.E-clause elements: 
This is corresponding to Fuihtre-2. This is usefill to 
define idiomatic expressions or spatial expressions. 
Suppose that KBP could not interpret a NL query 
which included an expression, "price is more than 
$100, and less than $200". The aPl)lieation designer 
judges that the part of the query mnst be defined as 
a pattern-concept pair. Then, he/she defines a new 
pattern-concept pair: 
\[Definition- 1\] 
If a pattern sequence is: 
\[ "fiekl-nanm(Field), 1 {Field i~typc-of numerical}, ~ 
more than, number(N1), le~s thmt, number(N2)" 1, 
do the followings: 
(1) to kee l) a field name, "Field", ,as a SELECT- 
clause element, and 
(2) to keep an expression, " Fiekl > N1, Field < N2", 
as a WHERE-clause element. 
This definition means selecting records whose "Field" 
has the value more than N1 and less than N2, and 
returning the value of "Field" of the .selected records. 
Repair-3 '1"o define a pattern-concept pair, where 
the concept part is represented as an SQL expression 
which is not SELECT-FROM-WHERE: 
This is corresponding to Failure-3. The application 
IA terliu starting with a capital letter is a variable. 
2An expression tlurrounded by a pair of brace ({ ta*d )) is a 
constraint to be satisfied. It ia a meta~level description, al~d is 
not regalx|ed as a Imrt of pattern aequellce. 
17::::27: 
Figure 3: Linguistic Pattern-Semantic Concept Pair 
in CBI' for \]b~pair-4 
designer nmst enumeratively detine a new SQL struc- 
ture corresponding to a given linguistic pattern (See 
Figure-2). 
Repair-4 'fb define a pattern-concept pair, where 
the concept is represented im u senlantic concept 
which is a recta-level expression for the target 
database and can not be detined as an SQI, form: 
This is corresponding to Failure-4. CAPIT provides a 
frame-like tanguage to deline semantic concepts. The 
application designer detincs a new scm~mtic eonccl)t 
using the language, lie/she also defines a reply gem 
eration procedure. The procedure is called when the 
corresponding linguistic pattern is matched with an 
input qucry (See Figure-3). 
Repair-4 is tile most dilficult of all repair types for 
an apl)tieation designer. In Repair-d, he/she must 
dctine not only a new semantic concept, but al.qo 
the definitions of slots in the semantic cnncept, the 
procedures which fill the slots, the relations between 
the new semantic concept with existing other sentan- 
tic coucepts~ various constraiuts anlong concepts, etc. 
lIowever, relnember that he/she must carry out such 
eoml)licated tasks to all possible linguistic patterns in 
his/her target domain, if he/she uses the case-based 
parsing approach alone. 
5 Dialogue Example between 
PDI and an Application De- 
signer 
PDI (Pattern Definition interviewer) is CAPIT's 
interface to all application designer. A dialogue be- 
tween PDI and an application designer progresses as 
follows: 
1. PDI shows the application designer a NL query 
which both KBP and CBP have failed to inter- 
ACRES DE COLING-92, NAbrI'ES, 23-28 AO(.rr 1992 7 3 9 PROC. OF COL1NG-92, NAN'rES, AUG. 23-28, 1992 
Lir~uistic Pattern 
I why omissible (does) * exist I 
field-name index 
name function 
vcr-function-table 
Figure 4: The Repair in the Sample Dialogue 
pret. And, it asks him/her to define the correct 
interpretation to process the input NL query. 
2. The application designer analyzes tile reason 
why KBP failed to interpret the NL query. 
3. Tile application designer selects a repair type 
of the failure, and performs the repair. The 
definition is stored in either KBP's knowledge 
base or CBP's case base. Here, he/she can gen- 
eralize/modify the linguistic pattern, using lin- 
guistic pattern generalization/modification oper- 
ators \[10\]. 
4. PDI retries interpreting the NL query again, and 
asks the application designer whether or not the 
new interpretation is correct. If it is correct, the 
definition process of the NL query ends. If it is 
not correct, go back to 1. 
Next, we show a typical sample dialogue between 
PD1 and an application designer. The situation is 
that the application designer is developing a guid- 
ance system which can understand various natural 
language queries on a specific commercial VCR. The 
guidance system has an internal database containing 
data about the functions and the elements of tile spe- 
cific VCR. Each of them is represented its features 
in a record of the vet-function-table (Figure-4). The 
dialogue is an example of Failure-2 and Repair-2. In 
this example, KBP and CBP are cooperatively gen- 
erating the meaning for a given sentence. 
Suppose, CAPIT is trying to interpret a new input 
sentence, 
$2: "Why does PAUSE exist?" 
Since CBP finds no matching pattern, $2 is sent 
to KBP. KBP extracts keywords from the sentence. 
Then, KBP generates its meaning. The KBP's inter- 
pretation and its generating meaning is shown to the 
application designer. He/she rejects them. He/she 
defines a new linguistic pattern which matches with 
the part of $2, 
"why omissible(does) * exist?" 
as a field-name index to the "function" field of the 
target database (See Figure 4). Here, "omissible" is 
a linguistic pattern modification operator \[10\], and 
the special symbol, "*", ill a linguistic pattern, is 
a CAPIT's pattern definition notation, which means 
that it matches with any sequence of words. This 
definition means that the reason why a specific el- 
ement exists is described in the "function" field of 
its corresponding record. Aftcr tire designer defines 
tile repair of KBP's failure, PDI tries to interpret the 
same sentence again. This time, since CHP matches 
"why omissible(does) * exist" with a part of the $2 
sentence, CBP replaces tile matched part of tile $2 
sentence with its corresponding concept, that is the 
"function" field. As a result, the input sentence is 
transformed into, 
$2': "field-name(function) PAUSE ?". 
The transformed input sentence is passed to KBP. 
KBP extracts keywords from the input sentence. 
The extracted keywords are field-name(fimetion) and 
field-value(PAUSE). KBP generates a new SQL ex- 
pression, which is different from the previous one. 
The application designer judges if the new interpre- 
tation is right. 
\[PDI\] Next Sentence is: "Why does PAUSE exist?" 
\[CBP\]: Unmatched! 
\[KBP\]: Extract Keywords: 
"PAUSE" is field-value index of "name". 
\[KBP\]: Meaning: 
(SELECT * FROM vcr-function-table WHERE name = 
PAUSE) 
\[PDI\]: ANSWER: 
Its NAME is PAUSE. Its TYPE is SWITCII, ... 
\[PDI\]: CORRECT? - > no. 
\[PDI\]: Please define the correct interpretation. - > 
define-field-name-index ( 
\[why, omissible(does), *, exist\], 
field-name(function)). 
\[PD1\] Retry Sentence: "Wily does PAUSE exist? " 
\[CBP\]: Replaced to: 
\[field-name(function), PAUSE\] 
\[KBP\]: Extract Keywords: 
"PAUSE" is field-value index of "name". 
ACRES DE COLING-92, NANTES, 23-28 Ao(rr 1992 7 4 0 PROC. OF COLING-92, NANTES, AUO. 23-28, 1992 
\[KBP\]: Meaning: 
(SELECT flmction FROM vcr-function-table WIIERE 
name = PAUSE) 
\[PDI\]: ANSWER: 
Its FUNCTION is ... 
\[PDI\]: CORRE(?YF? - > yes. 
6 In Conclusion 
The proliferation of commercial on-line databases bas 
increased to demand for natural language interfaces 
that can be used by untrained people. Real world 
queries include not only fully grammatical expres- 
sions but also such abbreviated expressions as a se- 
quence of keywords, etc \[9\] \[3\]. U .... will not use a 
NL interface unless it can also interpret such queries, 
and CAPIT has that capability 
Speed is another important issue. Telephone charge 
and database access charge are based on time of use, 
and users require speed. Users will not use a NL in- 
terface unless its response time is fast enough. NI, 
interfaces designed with CAPIT are extremely fast. 
Users' queries are responded within a second. 
Ease of development and maintenance is also impor 
tant. CAPIT is a eombiuation of a keyword analyzer 
and a case-based parser. Since little labor is required 
of the application designer in using the keyword an- 
alyzer portion of the tool, and since the case-based 
parser processes only those queries whicb the keyword 
analyzer fails to interpret, total labor required of the 
designer is less than that for a tool which employs a 
conventional case-based parser alone. With CAPIT, 
it is possible to design an entirely new NL interface 
within a matter of weeks. 
guage", Technical Report CMU-CS-84-107, 
Dept. of Computer Science, CMU, 1984. 
\[4\] Cox, C.A., "ALANA Augmentable LANguage 
Analyzer", l~ep. UCB/CSD 86/283, 1986. 
\[5\] Hendrix, G.G., Saeerdoti, E.D., Sagalowicz, D., 
anti Slocum, J., "Developing a Natural Language 
Interface to Complex Data", In ACM Trans. on 
Database Systems, 1978. 
\[6\] Kolodner, J., "Retrieval and organizational 
strategies in conceptual memory: A computer 
model", ltillsdale, NJ.: Lawrence Erlbanm As- 
sociates, 1984. 
\[7\] Martin, C.E., "Cease-based Parsing", In 1n- 
side Case-based Reasoning edited by R. Schank 
and C. Riesbeck, Lawrence Erlbaum Associates, 
Ilillsdale, N J, 1989. 
\[8\] 1)~iesbeck, C.K., Schank, R.C., "Inside Case- 
based f~easoning", Lawrence Erlbaum Asso- 
ciates, |tillsdale, N J, 1989. 
\[9\] Sbneiderman, B., "Designing the User Inter- 
face", Addison-Wesley Pub., 1987. 
\[10\] Sbimazu, H. and Takashilna, Y., "Acquiring 
Knowledge for Natural Language Interpretation 
Based On Corpus Analysis", Proc. of IJCAI'91 
Natural Language Learning Workshop, 1991. 
\[11\] Wilensky, IL et. al., "UC - A Progress Report", 
Rep. UCB/CSD 87/303, 1986. 
References 
\[1\] Arens, Y., "CLUSTERS: An Approach to 
Contextual Language Understanding', Rep. 
UCB/CSD 86/293, Ph.D. Thesis, 1986. 
\[2\] Arita, S., Shimazu, \]\[1., Takashima, Y., "Sim- 
ple + Robust -- Pragmatic: A Natural Lan- 
guage Query Processing Model for Card-type 
Databases", Proc. of the 13th Annual Confer- 
ence of the Cognitive Science Society, 1992. 
\[3\] Carbonell, J.G., and Hayes, P.J., "Recov- 
ery strategies for parsing extragrammtical lan- 
AcrEs DE COLING-92, NANTES, 23-28 AOt~7' 1992 7 4 1 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
