A Case Study of Natural Language Customisation: 
The Practical Effects of World Knowledge 
Marilyn A. Walker Andrew L. Nelson Phil Stenton
lyn@linc.cis.upenn.edu aln@hplb.hpl.hp.com sps@hplb.hpl.hp.com
University of Pennsylvania 
Computer Science Dept. 
200 S. 33rd St. 
Philadelphia, PA 19104 
Hewlett Packard Laboratories 
Personal Systems Lab 
Filton Rd., Stoke Gifford 
Bristol, BS12 6QZ, U.K. 
Abstract 
This paper proposes a methodology for the customisa-
tion of natural language interfaces to information re-
trieval applications. We report a field study in which 
we tested this methodology by customising a com- 
mercially available natural language system to a large 
database of sales and marketing information. We note 
that it was difficult to tailor the common sense reason- 
ing capabilities of the particular system we used to our 
application. This study validates aspects of the sug- 
gested methodology as well as providing insights that 
should inform the design of natural language systems
for this class of applications. 
1 Introduction 
It is commonly accepted that we understand discourse 
so well because we know so much\[5\]. Hobbs identifies
two central research problems in understanding how 
people interpret discourse. We must characterise: (1) 
the knowledge that people have, and (2) the processes 
they use to deploy that knowledge. This includes speci- 
fying and constraining the inferential and retrieval pro- 
cesses that operate on what is known\[7\]. This problem 
is of practical interest for the design of various types 
of natural language interfaces (NLI's) that make use of 
different knowledge sources. 
The knowledge used by an NLI is often split into 
two types. DOMAIN-INDEPENDENT knowledge consists 
of grammatical rules and lexical definitions. It also in- 
cludes knowledge used for common-sense reasoning\[6\]. 
DOMAIN-DEPENDENT knowledge centres on modeling 
processes unique to the application task, or the partic- 
ular relations in the application database. The process 
of customising an NLI consists in adding the domain-
dependent knowledge about a particular application to
the domain-independent knowledge that comes with
the NLI\[4\]. Very little has been written about how
this customisation is done.
This paper results from a particular customisation 
effort in which we took a commercially available NLI 
and attempted to customise it to a large sales and mar- 
keting information database installed at a customer 
site. The application was information retrieval for de-
cision support. We suggest a particular method to
be used in the customisation process and evaluate the 
success of this method. We note a number of prob- 
lems with using the domain independent knowledge 
provided with the NLI for our particular application. 
We also note cases where the inferential processes sup-
ported by the NLI do not appear to be appropriately
constrained. The application of this method leads to 
some general results about the process of customisa-
tion, as well as some specific insights regarding this
type of application and the evaluation of an NLI. Sec- 
tion 2 describes the particular NLI and the application. 
Sections 3, 4, 5, 6 and 7 describe the methodology that 
we applied in our customisation effort. Section 8 de- 
scribes the results of testing the customisation. Fi- 
nally, Section 9 provides suggestions for customisers or 
designers of NLI's. 
2 NLI and Sales Application 
The database was a large and complex on-line sales 
database, containing information about orders, deliv- 
eries, brands, customer preferences, sales territories, 
promotions and competitors. There were 20-30 differ- 
ent types of records with over 200 views ranging over 
data summaries of 2-3 years. 
Our user group consisted of 50 managers, composed 
of accounts, brands, commercial and marketing man- 
agers, each with different data requirements. They 
fit the user profile recommended for NLI's\[8\]. They 
were relatively infrequent computer users, who were 
experts in the domain with at least one year's experi- 
ence. None knew anything about database languages. 
Some of them had used a previously installed NLI, In-
tellect, as well as a menu-based interface that accessed
the same set of data¹. They required ad hoc access to
information that was difficult to support with standard
reports.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 820 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
The NLI we worked with was considered state of the
art. It appeared to use a pipeline architecture
consisting of morphological analysis, parser, semantic
interpretation, and database query translator. The
semantic representation language was a hybrid of a
semantic network and first order predicate logic, which
supported time dependent facts, quantified statements,
tense information and general sets\[3\]. In addition, this
NLI included a Generator that produced English from
the semantic representation language, and a Deductive
System that reasoned about statements in the
representation language using forward and backward
chaining, and which handled quantification, time
dependent facts and truth maintenance. Among the
knowledge sources that came with the NLI were a
Dictionary of 10000 initial English words, and a set of
Concepts that provided internal notions of predicates,
and set and membership hierarchies.
The semantic representation, concepts, and dictionary
modules supported both intensional and extensional
representation of data. In addition, users could
add both new concepts and inference rules to the system
with simple declarative sentences.
3 Method 
Information sources for the customisation included:
the customisation manual, database schema, NL transcripts
of users accessing the data in the database using
the previous NLI, Intellect, and a test suite of English
sentences\[2\].
Our customisation method had four parts:
1. NL transcript analysis
2. Mapping NL terms onto an Entity-Relation (E-R)
diagram
3. Constructing the customisation files
4. Generating a test suite and testing the customisation
We restricted our efforts to implementing and testing 
coverage of a sub-part of the domain identified as im- 
portant through analysis of the NL transcripts, namely 
the deliveries subdomain². The important concepts are
listed below and highlighted in Figure 1.
• The Product Hierarchy: Markets, Sectors,
Brands, etc.
• The Customer Hierarchy: Corporations, Trading
Companies, Concerns
• The Time Hierarchy: Years, Book Months, Book
Weeks
• Deliveries: of Products to Customers over Time
¹ In \[9\] we compare the menu system to Intellect.
² The customising team comprised two computational
linguists, a computer scientist and two psychologists.
[Figure 1: diagram not recoverable from the extracted text. Legible labels: PRODUCT HIERARCHY, COMPANY, CONCERN, BOOK MONTH, BOOK WEEK, UNIT TYPE, DELIVERY.]
Figure 1: Simplified model of the Sales domain
The following sections will discuss each aspect of the
customisation procedure and the issues raised by each
step of the method.
4 Analysing the transcripts 
The NL transcripts consisted of every interaction with
the previous NLI, Intellect, over a period of a year,
by our user group, accessing our target database. A
detailed account of the transcript analysis can be found
in \[9\]. Here we focus on how the results affected the
rest of the procedure.
The transcripts showed that the most important set
of user queries were those about deliveries of the different
levels of the product hierarchy to the different levels
of the customer hierarchy. The transcripts also showed
that over 30% of user errors were synonym errors or
resulted from the use of terms to refer to concepts that
were calculable from information in the database. We
collected a list of all the unknown word errors from
the Intellect installation. For example using the term
wholesalers resulted in an Intellect error, but it refers
to a subset of trading companies with trade category of
WSL. We didn't feel that the syntax of the transcripts
was important since it reflected a degree of accommodation
to Intellect, but the Intellect lexicon and the
unknown word errors gave us a good basis for the required
lexical and conceptual coverage. In the absence
of such information, a method to acquire it, such as
Wizard of Oz studies, would be necessary\[10, 1\].
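The idea of a user term that is calculable from database information can be illustrated with a minimal sketch. The row layout, the field names, the `RTL` code and the function `resolve_term` are assumptions made for illustration only; just the term wholesalers and the trade category WSL come from the transcripts discussed above.

```python
# Hypothetical rows of the trading-company relation. Only the WSL code and
# the term "wholesalers" come from the text; everything else is invented.
TRADING_COMPANIES = [
    {"name": "Foodmart", "trade_category": "WSL"},
    {"name": "TinyGourmet", "trade_category": "RTL"},
]

# Each derived user term maps to a predicate over a trading-company row.
DERIVED_TERMS = {
    "wholesalers": lambda row: row["trade_category"] == "WSL",
}


def resolve_term(term, rows):
    """Return the names matching a derived term, or None if it is unknown."""
    predicate = DERIVED_TERMS.get(term)
    if predicate is None:
        return None  # would surface as an unknown word error
    return [row["name"] for row in rows if predicate(row)]


print(resolve_term("wholesalers", TRADING_COMPANIES))
```

An unknown term returns `None` rather than an empty list, so that "no such concept" can be told apart from "no matching rows".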
5 Mapping NL terms onto an 
E-R diagram 
The steps we applied in this part of the proposed
method are: (1) take the E-R diagram provided by the
database designer at the customer site as a conceptual
representation of the domain, (2) associate each lexical
item from the transcript analysis with either an entity
or a relation, (3) refine and expand the E-R diagram
as necessary.
We started with a list of lexical items e.g. markets,
sectors, brands, deliver, pack size, date, corporate,
trading concern, customer location, that were part of
the Intellect lexicon or had appeared in the transcripts
as unknown words. By placing these lexical items on
the E-R diagram we were able to sketch out the mapping
between user terms and database concepts before
committing anything to the customisation files³. However,
we found mapping vocabulary onto the E-R diagram
to be more difficult than we had anticipated.
First, a number of words were ambiguous in that
they could go in two different places on the E-R diagram,
and thus apparently refer to multiple concepts in
the domain. This was most clearly demonstrated with
certain generic terms such as customer. Customer can
be used to refer to a relation at any level of the customer
hierarchy, the concern, the trading company or
the corporation. It can also be associated with the attribute
of the customer reference number which is a key
value in the 'Concern' database relation.
Second, some words were based on relationships between
two entities, so they could have gone in two
places. For instance market share is calculated from
information associated with both the market entity and
with the trade sector entity. Similarly, the term delivery
refers to a relation between any level of the product
hierarchy and any level of the customer hierarchy. Yet
there was no entity that corresponded to a delivery,
even though it was one of the main concepts in the
domain.
In both of these cases we created new entities to refer 
to concepts such as delivery and market share. We were 
then able to indicate links between these concepts and 
other related concepts in the domain and could anno-
tate these concepts with the relevant vocabulary items.
In some cases it was difficult to determine whether a 
term should be a new entity. For instance the term 
wholesalers refers to members of the trading company 
entity with a particular value in the trade category at- 
tribute. However since trade category is not used in 
any other relation, it doesn't have a separate entity 
of its own. In this case we left wholesaler as a term 
associated with trading company. 
Third, in our lexicon there were operators or predicators
such as less than, greater than, equal to, at least,
change, decrease, latest estimate, over time, chart,
graph, pie, during, without, across, display, earliest,
available. These operators were domain independent
operators; some of them were synonyms for functions
that the system did support. Since these seem to be
concepts perhaps related to the task, but not specific
to the domain, for convenience we created a pseudo entity
on the E-R diagram having to do with output and
display concepts such as graphing, ranking, displaying
information as a percentage etc.
³ In a perfect world, the NLI would target an E-R diagram and
the mapping from the E-R diagram to the database would be an
independent aspect of the semantic modelling of the domain.
Finally, there were also terms for which there was no 
database information such as ingredients and journey, 
ambiguous terms such as take, get, accept, use, as well 
as terms that were about the database itself, such as 
database, information. For other terms such as earliest 
or available it was difficult to determine what domain 
concepts they should be associated with. 
However, the benefits of this method were that once
we had made the extensions to the E-R diagram, then
all synonyms were clearly associated with the entities
they referred to, words that could ambiguously refer to
multiple concepts were obvious, and words for which a
calculation had to be specified were apparent. We were
also able to identify which concepts users had tried to
access which were not present in the domain. Once
this was done the customisation files were built incrementally
over the restricted domain.
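The bookkeeping described in this section can be sketched as a small index from vocabulary items to E-R entities. This is only an illustration under our own assumptions: the annotation table is a made-up fragment, and the names `ER_ANNOTATIONS`, `UNSUPPORTED`, `index_terms` and `classify` are hypothetical rather than part of any NLI.

```python
from collections import defaultdict

# Hypothetical fragment of an annotated E-R diagram: entity -> user vocabulary.
ER_ANNOTATIONS = {
    "Concern": ["customer", "concern"],
    "Trading Company": ["customer", "trading company", "wholesalers"],
    "Corporation": ["customer", "corporation", "corporate"],
    "Delivery": ["delivery", "deliver"],
    "Market Share": ["market share"],
}

# Terms users tried that have no counterpart in the database.
UNSUPPORTED = {"ingredients", "journey"}


def index_terms(annotations):
    """Invert the annotation table into term -> set of entities."""
    index = defaultdict(set)
    for entity, terms in annotations.items():
        for term in terms:
            index[term].add(entity)
    return index


def classify(term, index):
    """Label a term as unsupported, unknown, ambiguous or unique."""
    if term in UNSUPPORTED:
        return "unsupported"
    entities = index.get(term, set())
    if not entities:
        return "unknown"
    return "ambiguous" if len(entities) > 1 else "unique"


TERM_INDEX = index_terms(ER_ANNOTATIONS)
for word in ["customer", "delivery", "journey", "chocolate"]:
    print(word, "->", classify(word, TERM_INDEX))
```

Terms that come back as ambiguous are the ones that need a decision before anything is committed to the customisation files.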
6 Constructing the customisa- 
tion files 
The input to this part of the process was the annotated
E-R diagram as well as the test suite. We chose not to
use the menu system customisation tool that was part
of the NLI⁴. We preferred to use an interface in which
declarative forms are specified in a file.
As we developed the customisation file incrementally 
over the domain, we ensured that all the synonyms for 
a concept were specified, and thoroughly tested the
system with each addition. This section discusses con- 
structing the customisation file. In section 7, we dis- 
cuss the test suite itself. The results are discussed in 
section 8. 
6.1 Grammatical and Conceptual In- 
formation 
The customiser's job is to link domain dependent
knowledge about the application to domain independent
knowledge about language and the world. Constructing
a customisation file consisted of specifying
a number of forms that would allow the NLI to produce
a mapping between English words, database relations,
attributes and values, and concepts used in
common sense reasoning by the deductive component
of the NLI.
⁴ The menu system was very large and unwieldy with many
levels, too many choices at each level, and a lack of clarity about
the ramifications of the choices.
A database relation, such as 'Deliveries', could have
nouns or verbs associated with it, e.g. delivery or deliver.
In the case of verbs, mappings are specified to
indicate which attributes correspond to each argument
slot of the verb.
In either case, both relation and attribute mappings
give one an opportunity to state that the relation or
the attribute is a particular type of entity. This type
information means that each concept has type preferences
associated with its arguments. The NLI provided
types such as person, organisation, location,
manufactured object, category, transaction, date
or time duration. The specification of these types
supplies background information to support various inferential
processes. There are three types of inference
that will concern us here:
• Coercion 
• Generalisation and Specification 
• Ambiguity resolution
COERCIONS depend on the type information associated
with the arguments to verbs. For example, consider
a verb like supply with arguments supplier and
suppliee. Let's say that suppliers are specified to be
of type concern, and suppliees are of type project.
Then the query Who supplied London? violates a type
preference specified in the customisation file, namely
that suppliee is a project. A coercion inference can
coerce London, a city, to project, by using the inference
path \[project located location in city\]. Then
the question can be understood to mean who supplies
projects which are in London?\[3\].
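A coercion of this kind can be sketched as a lookup over type preferences and inference paths. This is our own illustration, not the NLI's mechanism: the tables `ENTITY_TYPES`, `ARG_PREFERENCES` and `COERCION_PATHS`, the entity `Acme` and the function `interpret_argument` are all assumed names, and the inference path is reduced to a single linking phrase.

```python
# Hypothetical type facts for known values.
ENTITY_TYPES = {"London": "city", "Acme": "project"}

# Type preferences for the argument slots of a verb, as in the supply example.
ARG_PREFERENCES = {"supply": {"supplier": "concern", "suppliee": "project"}}

# Inference paths: (type found, type wanted) -> linking relation.
COERCION_PATHS = {("city", "project"): "in"}


def interpret_argument(verb, slot, value):
    """Return a reading of value that satisfies the slot's type preference."""
    wanted = ARG_PREFERENCES[verb][slot]
    found = ENTITY_TYPES[value]
    if found == wanted:
        return value  # no coercion needed
    relation = COERCION_PATHS.get((found, wanted))
    if relation is None:
        raise ValueError(f"cannot coerce {value} ({found}) to {wanted}")
    return f"{wanted}s which are {relation} {value}"


print("who supplies", interpret_argument("supply", "suppliee", "London") + "?")
```

The failure branch is kept explicit because a coercion with no available path is exactly where a system would otherwise fall back on an unwarranted default.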
GENERALISATION inferences can support the inference
that Life is a kind of Cheese given other facts
such as Life is in sector Full Fat Soft and Full Fat Soft
is a kind of Cheese. A similar inference is supported
by the type organisation; if X works for organisation
Y, and Y is a suborganisation of organisation Z, then
the NLI is supposed to be able to infer that X works
for Z.
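Generalisation of this kind amounts to following hierarchy links transitively. The sketch below is a simplification under our own assumptions: each item has a single parent, "is in sector" and "is a kind of" are collapsed into one link table, and the names `PARENT`, `generalisations` and `is_kind_of` are hypothetical.

```python
# Hypothetical single-parent hierarchy links, taken from the example facts.
PARENT = {
    "Life": "Full Fat Soft",
    "Full Fat Soft": "Cheese",
}


def generalisations(item, parent=PARENT):
    """Return every ancestor of item, nearest first."""
    ancestors = []
    # The second condition guards against cycles in a badly formed table.
    while item in parent and parent[item] not in ancestors:
        item = parent[item]
        ancestors.append(item)
    return ancestors


def is_kind_of(item, ancestor, parent=PARENT):
    """True if ancestor is reachable from item through the hierarchy links."""
    return ancestor in generalisations(item, parent)


print(generalisations("Life"))
print(is_kind_of("Life", "Cheese"))
```

The same traversal would cover the organisation case, with suborganisation links in place of product hierarchy links.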
AMBIGUITY resolution consists of filling in underspecified
relations. A common case of unspecified relations
are those that hold between the nouns of noun
noun compounds (n-n-relations). For example a motorola
processor is a processor with motorola as the
manufacturer. A department manager is a manager of
a department. The specification of conceptual types in
the customisation file is intended to support the inference
of these unspecified n-n-relations. For example,
the NLI first interprets these with a generic have relation
and then attempts to use the conceptual types
to infer what relation the user must have intended.
Similarly from the knowledge that an attribute is a
location, the NLI can infer that it can be used as an
answer to a question about where something is.
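The resolution of n-n-relations from conceptual types can be sketched as a table lookup with a generic fallback. The type assignments, the rule table and the relation labels below are assumptions chosen to match the two compounds in the text; `resolve_compound` is a hypothetical name, and the fallback to a generic have relation is our simplified reading of the behaviour seen in the paraphrases of Section 8.2.

```python
# Hypothetical conceptual types for some nouns.
NOUN_TYPES = {
    "motorola": "organisation",
    "processor": "manufactured object",
    "department": "organisation",
    "manager": "person",
}

# (modifier type, head type) -> relation holding between the two nouns.
NN_RULES = {
    ("organisation", "manufactured object"): "manufacturer",
    ("organisation", "person"): "manager of",
}


def resolve_compound(modifier, head):
    """Pick a relation for a noun noun compound, defaulting to 'have'."""
    key = (NOUN_TYPES.get(modifier), NOUN_TYPES.get(head))
    return NN_RULES.get(key, "have")


print(resolve_compound("motorola", "processor"))
print(resolve_compound("report", "groups"))
```

When neither noun has a known type the lookup silently returns the default, which is the kind of non-fruitful assumption a customiser would want reported rather than hidden.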
6.2 Difficulties 
A minor difficulty in developing the customisation file
was that we identified lexical items for which there was
no information in the database. In this case we used a
facility of the NLI by which we could associate helpful
error messages with the use of particular lexical items.
In cases where the concept could be calculated from
other database information, we were able to use the
NLI to extend the database schema and specify the
calculations that were needed in order to support users'
access to these concepts.
The more major difficulty was to determine which of
the concepts that the NLI knew about was the type
to use for a specific domain lexical item. For example,
in specifying the 'Markets' database relation, target
phrases might be the chocolate market, the market
chocolate, sales of chocolates, how much chocolate
or kinds of chocolate. One of the types available was
category, which seems to be the way the key marketname
is used in the phrase the chocolate market⁵. However,
another option was to create an attribute mapping
for marketname. Attribute mappings can specify
that an attribute type is one of a different set of types
such as a unique identifier, a name, a pay, the
employer, or a superorganisation. And some of
these have subtypes, e.g. name can be of type proper,
classifier, common, model or patternnumber. So perhaps
if one wants to say sales of chocolates then marketname
should be a common name. A solution would
be to say marketname belongs to a number of these
types, possibly at the expense of overgenerating. In
the case of this particular NLI, attempting to do this
generated warnings.
7 Generating the test suite 
The test suite of sentences was constructed by selecting
sentences that cover the requirements identified by our
transcript analysis from the published test suite \[2\].
We then substituted concepts to reflect our subdomain
of sales. Sentences were generalised across hierarchies
in the domain and with respect to various words for
relations in a hierarchy (e.g. are in, belong to, contain,
have, are part of, are kind of).
As soon as we began testing our first customisation
file mappings, it was immediately obvious that this test
suite was inappropriate for use in early customisation.
This was because it was partitioned with respect to
syntactic form and not with respect to the boundaries
of customisation sub-domains. This is a common feature
of most test suites. It also contained some queries
which had too much syntactic complexity to be of use
in identifying separable problems in the customisation
file.
⁵ The documentation on a category says that objects "fall
into" categories. If C is a category you can ask, "who fall
into C?" It is not clear as to whether this meant that
'Markets' was a category.
We therefore created a smaller set of deliveries test 
queries that used only the more simple syntactic forms 
and which was organised with incremental domain cov- 
erage. This was ideal for iterative development of the 
customisation, and enabled us to concentrate on get-
ting the basic coverage working first. Later in the cus-
tomisation we used the more complete syntax-based
test suite to get a more complete picture of the lim- 
itations of the resulting system with respect to user 
requirements. We will discuss a possible remedy to 
the situation of having two distinct test suites in the 
conclusion.
8 Testing the customisation 
Some of the coverage limitations were specific to this
NLI, but there are some general lessons to be learned. 
Many of the pernicious problems had to do with the 
NLI's ambitious use of common-sense knowledge. This 
section briefly discusses some of the limitations in syn- 
tactic coverage that we detected. The remainder of the 
discussion focusses on the NLI's use of common sense
reasoning. 
8.1 Testing syntactic coverage 
While the syntactic coverage of the NLI appeared to 
be better than the Intellect system, we were able to
identify some coverage limitations of the system.
NUMERIC QUANTITIES like the number of cases delivered
and number of tonnes delivered were difficult
to handle. We managed to engineer coverage for How
many queries concerning the number of cases of products,
but were unable to get any coverage for How much
queries concerning number of tonnes.
COORDINATION worked for some cases and not for
others with no clear dividing line. Switching the order
of noun conjuncts, e.g. in List the market and sector of
Lile, could change whether or not the system was able
to provide a reasonable answer. Similarly NEGATION
worked in some cases and not in others that were minimally
different. It appeared that the verb and some
of its arguments could be negated, What was not delivered
to Lee's?, while others could not, What was not
delivered in January?.
DISCOURSE related functionality, such as interpreting
pronouns and the use of ellipsis was also variable at
best, with further refinements to previous queries such
as and their sales not properly interpreted.
8.2 The effects of world knowledge 
A number of problems concerned the set of predefined
concepts that came with the NLI, and that were
used in the customisation file as types for each lexical
item and its arguments. These seemed to be domain
independent concepts, but to our surprise we discovered
that this representation of common-sense knowledge
incorporated a particular model of the world. For
instance, a lot of support was provided for the concepts
of time and time duration, but time was fixed to the
calendar year. Our domain had its own notion of time
in terms of bookweeks and bookmonths in which weeks
did not run from Sunday to Sunday and months could
consist of either 4 or 5 weeks. The English expression
weekly deliveries was based on this and managers'
commissions were calculated over these time durations.
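The mismatch with a calendar-year model of time can be made concrete with a sketch of a book calendar. The text states only that book months contain either 4 or 5 book weeks; the repeating 4-4-5 pattern, the 52-week year and the names `WEEKS_PER_BOOK_MONTH` and `book_month_of` are assumptions made for illustration.

```python
from itertools import accumulate

# Assumed layout: twelve book months in a repeating 4-4-5 week pattern,
# giving 52 book weeks. The real pattern is not given in the text.
WEEKS_PER_BOOK_MONTH = [4, 4, 5] * 4


def book_month_of(book_week, weeks_per_month=WEEKS_PER_BOOK_MONTH):
    """Map a 1-based book week number to its 1-based book month number."""
    if not 1 <= book_week <= sum(weeks_per_month):
        raise ValueError(f"book week {book_week} is out of range")
    # accumulate() yields the last book week of each successive book month.
    for month, last_week in enumerate(accumulate(weeks_per_month), start=1):
        if book_week <= last_week:
            return month


print(book_month_of(17), book_month_of(52))
```

Nothing in this mapping refers to calendar months or to a Sunday week boundary, which is why a time concept fixed to the calendar year could not simply be reused.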
There were a number of cases where domain dependent
knowledge was embedded in the presumably domain
independent conceptual and dictionary structure
of the NLI. For instance how much was hard-wired to
return an answer in dollars. The point is not that it
didn't respond in pounds sterling, but rather that our
users wanted amounts such as cases, tonnes, and case
equivalents in response to questions such as How much
caviar was delivered to TinyGourmet?
Another feature of world knowledge which made
customisation difficult was the fact that predefined
concepts comprise a set of built-in definitions for certain
words. These definitions were part of the core lexicon
of 10,000 words provided with the system, but
the customiser is not given a list of what these words
are⁶. This causes mysterious conflicts to arise with
domain-specific definitions. For instance, we had to
first discover by careful sleuthing that the system had
its own definitions of consumer, customer, warehouse,
sale, and configuration, and then purge these definitions.
It was not possible to determine the effects of
these purges in terms of other concepts in the system.
In particular, there were concepts that were not easy
to remove by purging lexical definitions such as the
concept of TIME mentioned above. The ambiguity of
predefined concepts also arose for certain verbs. For
example, the verb to have was pre-defined with special
properties, but no explicit definition was made available
to the customiser. It was impossible to determine
the effects of using it, and yet it seemed unwise to purge
it.
Our application had a great need for GENERALISA- 
TION type inferences due to the product, customer and 
time hierarchies (see Figure 1). The most common verb
was deliver and this could refer to deliveries of any 
level in the product hierarchy to any level in the cus- 
tomer hierarchy. We spent a great deal of time trying 
to get this to work properly and were not able to. In 
the examples below (Q) is the original query, (P) is
the paraphrase provided by the system and (R) is the
system's response. In example 1 the customer Lee is
at the level of corporation and the query is properly
interpreted, resulting in a table of product, customer,
delivery date, etc.
⁶ Presumably because this is considered proprietary knowledge.
(1) Q: What are the sales of Krunchy in Lee?
P: List Lee's Krunchy sales. 
However, in 2 the customer Foodmart is at the level
of trading company and a query with the identical syn-
tactic form is interpreted completely differently.
(2) Q: What are the sales of Krunchy in Foodmart?
P: List the Krunchy in Foodmart sales.
R: There aren't any brands named Foodmart.
Other problems were not so clearly problems with
common sense knowledge but rather with inappropriately
constrained inferential powers. Some of these
were best identified by examining the paraphrases that
the generator produced of the semantic interpretation
of a user query or statement. By the paraphrase provided
in (3)P, it appears that the n-n-relation in report
groups has been interpreted as have.
(3) Q: What do you know about report groups?
P: What do you know about groups that have REPORT?
R: The database contains no information about
which groups have customers.
Then another default inference is made, which consists
of assuming that an unknown proper noun is of
type customer. This results in the response given in
(3)R. Of course to the user, this response seems to indicate
that the system has not at all understood his
query.
Another example of a non-fruitful assumption of a
have relation for a non-specified n-n-relation can be
seen in (4)R below. The NLI first expands the proper
name BSL to traded unit type BSL, then apparently
treats this as a noun noun compound with an unspecified
n-n-relation. This relation is then filled in with
the have relation which appears in (4)R.
(4) Q: Show the total sales of bsl, bsj and bsr to Lee's
PLC.
P: List the total sale of traded unit type BSL, the
total sale of traded unit type BSJ and the total
sales of traded unit type BSR to Lee's PLC.
R: Traded unit types don't have traded unit types.
Consumer unit types have traded unit types.
In example (5), the NLI appears to make an unwarranted
inference that the number 17 must refer to 17
dollars. It also fills in a null noun, taking the sentence
to actually mean how much N was krunchy sold .... It
replaces this null noun with the type traded unit type
which is given as a default for how much queries.
(5) Q: how much was krunchy sold between week 17
in 1988 and week 52 in 1988?
P: How much traded unit type that was sold to
krunchy costs between 17 and 52 dollars?
R: The database contains no information about
how expensive traded unit types are.
It seems that the semantic information that the system
has, such as knowing that krunchy is a brand and
that sales are of a product to a customer, should let
it overcome the slightly nonstandard syntax how much
was krunchy sold. However it apparently pays more
attention to that aspect of the syntax here, while ignoring
the fact that 17 is specified to be a designator
of a book week.
9 Conclusions 
The NL transcript analysis proved useful to identify
the target coverage and to focus our experiment on a
priority part of the domain. In most cases transcript
information will not be available and so interview data
or experimental Wizard-of-Oz data\[10\] will have to be
generated to make explicit the users' models of the domain.
The E-R model of the domain was very useful for
carrying out an incremental development of the customisation
file. It lets the customiser know where the
reasonable domain boundaries lie, in order that sub-parts
of the customisation can sensibly be developed
and tested in isolation. In addition the customisation
was simplified by having the entities and attributes of
the E-R model labelled with the domain vocabulary
in advance. Thus the process of associating synonyms
with appropriate customisation file relations and attributes
was straightforward.
The main limitation of the approach seems to be that
E-R diagrams are too limited to capture the use of the
vocabulary in the domain. We used an E-R diagram
because it was the conceptual representation available
for the domain and because it is the most prevalent
semantic modeling tool used in database design. However,
it does not in fact allow one to represent the
information that one would like to represent for the
purposes of linking NL concepts and lexical items to
the domain. The only semantic information associated
with relations that is represented in an E-R diagram
is whether they are many-to-one or one-to-one. The
attributes of the entity that participate in the relation
are not indicated specifically. The representation
should be much richer, possibly incorporating semantic
concepts such as whether a relation is transitive, or
other concepts such as that an attribute represents a
part of a whole. Of course this is part of what the NLI
was attempting to provide with its concept hierarchy
and dictionary of 10000 initial words.
But it seemed that one of the main difficulties with 
the NLI was in fact exactly in attempting to provide a 
richer semantic model with common sense information 
to support inference. This is commonly believed to be 
helpful for the portability of an NL system across a num-
ber of domains. We found it a hindrance more than a
help. Some predefined concepts had to be purged from 
the lexicon. Some definitions were difficult to delete or 
work around e.g. time definitions. The problems we 
encountered made us wonder whether there is any gen- 
eral world knowledge or whether it is always flavoured 
by the perspective of the knowledge base designers and
the domains they had in mind. 
The process was not helped by the black box nature 
of the NL system. The general problem with black box 
systems is that it is difficult for a customiser to get an
internal model of the system. It would help a great deal
if the world model was made available directly to the 
customiser, the semantics of each concept was clearly 
defined, and a way to modify rather than purge certain 
parts of the conceptual structure was made available. 
The customiser should not be left to learn by example.
During customisation of the NL system we found our 
user requirements test suite difficult to use for debug-
ging purposes. The test suite had to be modified to re-
flect concepts in the database rather than syntax. This
is because customisations must be done incrementally 
and tested at each phase. A solution to this problem is 
first to ensure that the test suite has a number of sen- 
tences which test only a single syntactic construction. 
Second, store the test suite components in a database. 
Each component would be retrievable through the se-
mantic class it belonged to (i.e. Temporal Expression or
Complex NP). In addition each component would be 
retrievable through the concepts of the E-R diagram 
that it accessed. Then it should be possible to gen- 
erate test suites that are usable by developers for the 
purpose of testing customisation files. Simple queries
of the test suite database about a particular concept
would generate appropriate test sentences whose se-
mantic categories and category fillers were limited to
that concept. 
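The proposed test suite database can be sketched with an in-memory list standing in for the database. This is a minimal illustration under our own assumptions: the record layout, the class labels other than Temporal Expression, the concept tags and the names `SuiteSentence`, `SUITE` and `select` are hypothetical, and most of the sentences are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteSentence:
    """One test suite component, indexed two ways."""
    text: str
    semantic_class: str   # e.g. Temporal Expression, Complex NP
    concepts: frozenset   # E-R concepts the sentence accesses


# Illustrative entries; the labels and concept tags are assumptions.
SUITE = [
    SuiteSentence("What was delivered to Lee's?", "Wh-question",
                  frozenset({"Delivery", "Customer"})),
    SuiteSentence("What was not delivered to Lee's?", "Negation",
                  frozenset({"Delivery", "Customer"})),
    SuiteSentence("List the market and sector of Lile.", "Coordination",
                  frozenset({"Market", "Sector", "Brand"})),
    SuiteSentence("How many cases of Krunchy were delivered in week 17?",
                  "Temporal Expression",
                  frozenset({"Delivery", "Brand", "Book Week"})),
]


def select(suite, semantic_class=None, allowed_concepts=None):
    """Return sentence texts of the given class that access only allowed concepts."""
    selected = []
    for sentence in suite:
        if semantic_class is not None and sentence.semantic_class != semantic_class:
            continue
        if (allowed_concepts is not None
                and not sentence.concepts <= frozenset(allowed_concepts)):
            continue
        selected.append(sentence.text)
    return selected


# Sentences usable while only deliveries to customers have been customised.
print(select(SUITE, allowed_concepts={"Delivery", "Customer"}))
```

Filtering on a subset of concepts, rather than on any single concept, is what keeps a generated test suite inside the part of the domain that has already been customised.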
References 
\[1\] Nils Dahlbäck and Arne Jönsson. Empirical studies
of discourse representations for natural language
interfaces. In Proc. 4th Conference of the
European Chapter of the ACL, Association of
Computational Linguistics, pages 291-298, 1989.
\[2\] Daniel Flickinger, John Nerbonne, Ivan Sag, and
Thomas Wasow. Towards evaluation of nlp systems,
1987. Presented to the 25th Annual Meeting
of the Association for Computational Linguistics.
\[3\] Jerrold M. Ginsparg. A robust portable natural
language data base interface. In Proc. 1st Applied
ACL, Association of Computational Linguistics,
Santa Monica, Ca., pages 25-30, 1983.
\[4\] Barbara J. Grosz. Team: A transportable natural
language interface system. In Proc. 1st Applied
ACL, Association of Computational Linguistics,
Santa Monica, Ca., 1983.
\[5\] Jerry R. Hobbs. The logical notation: Ontological
promiscuity, chapter 2 of discourse and inference.
Technical report, SRI International, 333
Ravenswood Ave., Menlo Park, Ca 94025, 1985.
\[6\] Jerry R. Hobbs, William Croft, Todd Davies, Douglas
Edwards, and Kenneth Laws. The tacitus
commonsense knowledge base. Technical report,
SRI International, 333 Ravenswood Ave., Menlo
Park, Ca 94025, 1987.
\[7\] Aravind K. Joshi and Scott Weinstein. Control of
inference: Role of some aspects of discourse structure
- centering. In Proc. International Joint Conference
on Artificial Intelligence, pages 385-387, 1981.
\[8\] S.R. Petrick. On natural language based computer
systems. IBM Journal of Research and Development,
pages 314-325, July, 1976.
\[9\] Marilyn Walker and Steve Whittaker. When natural
language is better than menus: A field study.
Technical Report HPL-BRC-TR-89-020, HP Laboratories,
Bristol, England, 1989.
\[10\] Steve Whittaker and Phil Stenton. User studies
and the design of natural language systems. In
Proc. 4th Conference of the European Chapter of
the ACL, Association of Computational Linguistics,
pages 116-123, 1989.
