<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2122"> <Title>A Case Study of Natural Language Customisation: The Practical Effects of World Knowledge</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Method </SectionTitle> <Paragraph position="0"> Information sources for the customisation included: the customisation manual, the database schema, NL transcripts of users accessing the data in the database using the previous NLI, Intellect, and a test suite of English sentences [2].</Paragraph> <Paragraph position="1"> Our customisation method had four parts: 1. NL transcript analysis; 2. Mapping NL terms onto an Entity-Relation (E-R) diagram; 3. Constructing the customisation files; 4. Generating a test suite and testing the customisation. We restricted our efforts to implementing and testing coverage of a sub-part of the domain identified as important through analysis of the NL transcripts, namely the deliveries subdomain. The important concepts are listed below and highlighted in Figure 1.</Paragraph> <Paragraph position="3"/> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> (Figure 1 concept labels, partly garbled in extraction: CONCERN, BOOKWEEK, DELIVERY) </SectionTitle> <Paragraph position="0"> The following sections will discuss each aspect of the customisation procedure and the issues raised by each step of the method.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Analysing the transcripts </SectionTitle> <Paragraph position="0"> The NL transcripts consisted of every interaction with the previous NLI, Intellect, over a period of a year, by our user group, accessing our target database. A detailed account of the transcript analysis can be found in [9].
Here we focus on how the results affected the rest of the procedure.</Paragraph> <Paragraph position="1"> The transcripts showed that the most important set of user queries were those about deliveries of the different levels of the product hierarchy to the different levels of the customer hierarchy. The transcripts also showed that over 30% of user errors were synonym errors or resulted from the use of terms to refer to concepts that were calculable from information in the database. We collected a list of all the unknown word errors from the Intellect installation. For example, using the term wholesalers resulted in an Intellect error, but it refers to a subset of trading companies with trade category of WSL. We didn't feel that the syntax of the transcripts was important since it reflected a degree of accommodation to Intellect, but the Intellect lexicon and the unknown word errors gave us a good basis for the required lexical and conceptual coverage. In the absence of such information, a method to acquire it, such as Wizard of Oz studies, would be necessary [10, 1].</Paragraph> </Section> ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 / PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Mapping NL terms onto an E-R diagram </SectionTitle> <Paragraph position="0"> The steps we applied in this part of the proposed method are: (1) take the E-R diagram provided by the database designer at the customer site as a conceptual representation of the domain, (2) associate each lexical item from the transcript analysis with either an entity or a relation, (3) refine and expand the E-R diagram as necessary.</Paragraph> <Paragraph position="1"> We started with a list of lexical items, e.g. markets, sectors, brands, deliver, pack size, date, corporate, trading concern, customer location, that were part of the Intellect lexicon or had appeared in the transcripts as unknown words.
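The error analysis just described can be sketched in code. This is a hypothetical reconstruction, not the authors' tooling: the synonym table, calculable-term set and error log below are invented examples, except for "wholesalers", which the text above defines as a computable subset of trading companies.

```python
# Hypothetical sketch of the transcript error analysis: each logged Intellect
# failure is classified as a synonym error (unknown word with a known database
# synonym), a calculable-concept error (a term, like "wholesalers", denoting a
# computable subset of the data), or a genuinely missing concept.

SYNONYMS = {"clients": "customer", "shops": "outlet"}    # assumed entries
CALCULABLE = {"wholesalers", "market share"}             # "wholesalers" is from the text

def classify_errors(unknown_terms):
    """Tally unknown-word errors by the remedy each would need."""
    counts = {"synonym": 0, "calculable": 0, "missing": 0}
    for term in unknown_terms:
        if term in SYNONYMS:
            counts["synonym"] += 1
        elif term in CALCULABLE:
            counts["calculable"] += 1
        else:
            counts["missing"] += 1   # no database information at all
    return counts

log = ["clients", "wholesalers", "ingredients", "shops", "market share"]
print(classify_errors(log))
```

Such a tally makes the 30%-synonym-error observation reproducible from the raw logs rather than a one-off manual count.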
By placing these lexical items on the E-R diagram we were able to sketch out the mapping between user terms and database concepts before committing anything to the customisation files. However, we found mapping vocabulary onto the E-R diagram to be more difficult than we had anticipated.</Paragraph> <Paragraph position="2"> First, a number of words were ambiguous in that they could go in two different places on the E-R diagram, and thus apparently refer to multiple concepts in the domain. This was most clearly demonstrated with certain generic terms such as customer. Customer can be used to refer to a relation at any level of the customer hierarchy: the concern, the trading company or the corporation. It can also be associated with the attribute of the customer reference number, which is a key value in the 'Concern' database relation.</Paragraph> <Paragraph position="3"> Second, some words were based on relationships between two entities, so they could have gone in two places. For instance market share is calculated from information associated with both the market entity and with the trade sector entity. Similarly, the term delivery refers to a relation between any level of the product hierarchy and any level of the customer hierarchy. Yet there was no entity that corresponded to a delivery, even though it was one of the main concepts in the domain.</Paragraph> <Paragraph position="5"> In both of these cases we created new entities to refer to concepts such as delivery and market share. We were then able to indicate links between these concepts and other related concepts in the domain and could annotate these concepts with the relevant vocabulary items. In some cases it was difficult to determine whether a term should be a new entity. For instance the term wholesalers refers to members of the trading company entity with a particular value in the trade category attribute.
However, since trade category is not used in any other relation, it doesn't have a separate entity of its own. In this case we left wholesaler as a term associated with trading company.</Paragraph> <Paragraph position="6"> Third, in our lexicon there were operators or predicators such as less than, greater than, equal to, at least, change, decrease, latest estimate, over time, chart, graph, pie, during, without, across, display, earliest, available. These were domain independent operators; some of them were synonyms for functions that the system did support. Since these seem to be concepts related to the task, but not specific to the domain, for convenience we created a pseudo-entity on the E-R diagram having to do with output and display concepts such as graphing, ranking, displaying information as a percentage, etc. (In a perfect world, the NLI would target an E-R diagram, and the mapping from the E-R diagram to the database would be an independent aspect of the semantic modelling of the domain.)</Paragraph> <Paragraph position="7"> Finally, there were also terms for which there was no database information, such as ingredients and journey, ambiguous terms such as take, get, accept, use, as well as terms that were about the database itself, such as database, information. For other terms such as earliest or available it was difficult to determine what domain concepts they should be associated with.</Paragraph> <Paragraph position="8"> However, the benefits of this method were that once we had made the extensions to the E-R diagram, all synonyms were clearly associated with the entities they referred to, words that could ambiguously refer to multiple concepts were obvious, and words for which a calculation had to be specified were apparent. We were also able to identify which concepts users had tried to access which were not present in the domain.
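The annotated E-R diagram that delivers these benefits can be represented as a simple data structure. The sketch below is illustrative, not the authors' representation; the entity names and the customer/wholesaler annotations come from the examples in this section, while everything else is assumed.

```python
# A minimal sketch of the annotated E-R diagram described above: each entity
# carries its vocabulary, and terms needing a calculation (like "wholesaler")
# record the defining condition instead of a plain synonym. The term "customer"
# deliberately appears on three entities, reflecting its ambiguity in the text.

ER = {
    "concern": {"synonyms": ["concern", "customer"], "calculated_terms": {}},
    "corporation": {"synonyms": ["corporation", "customer"], "calculated_terms": {}},
    "trading_company": {
        "synonyms": ["trading company", "customer"],
        "calculated_terms": {"wholesaler": ("trade_category", "WSL")},
    },
    "delivery": {"synonyms": ["delivery", "deliveries"], "calculated_terms": {}},
}

def ambiguous_terms(er):
    """Terms annotated on more than one entity refer to multiple concepts."""
    seen = {}
    for entity, info in er.items():
        for term in info["synonyms"]:
            seen.setdefault(term, []).append(entity)
    return {t: es for t, es in seen.items() if len(es) > 1}
```

On this structure, `ambiguous_terms` surfaces exactly the multi-concept words the text says became "obvious" once the diagram was annotated.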
Once this was done the customisation files were built incrementally over the restricted domain.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Constructing the customisation files </SectionTitle> <Paragraph position="0"> The input to this part of the process was the annotated E-R diagram as well as the test suite. We chose not to use the menu system customisation tool that was part of the NLI (the menu system was very large and unwieldy, with many levels, too many choices at each level, and a lack of clarity about the ramifications of the choices). We preferred to use an interface in which declarative forms are specified in a file.</Paragraph> <Paragraph position="1"> As we developed the customisation file incrementally over the domain, we ensured that all the synonyms for a concept were specified, and thoroughly tested the system with each addition. This section discusses constructing the customisation file. In section 7, we discuss the test suite itself. The results are discussed in section 8.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.1 Grammatical and Conceptual Information </SectionTitle> <Paragraph position="0"> The customiser's job is to link domain dependent knowledge about the application to domain independent knowledge about language and the world. Constructing a customisation file consisted of specifying a number of forms that would allow the NLI to produce a mapping between English words, database relations, attributes and values, and concepts used in common sense reasoning by the deductive component of the NLI.</Paragraph> <Paragraph position="2"> A database relation, such as 'Deliveries', could have nouns or verbs associated with it, e.g. delivery or deliver.
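The paper does not reproduce the NLI's actual form syntax, so the following is an invented declarative equivalent of one such form: it links the 'Deliveries' relation (named in the text) to its noun and verb vocabulary, with each verb argument slot mapped to a database attribute. The slot and attribute names are assumptions.

```python
# Hypothetical declarative customisation form for the 'Deliveries' relation:
# one form per database relation, listing the words that name it and, for
# verbs, which database attribute fills each argument slot.

DELIVERIES_FORM = {
    "relation": "Deliveries",
    "nouns": ["delivery"],
    "verbs": {
        "deliver": {                 # argument slot -> database attribute
            "agent": "supplier",     # attribute names are assumptions
            "object": "product",
            "recipient": "customer",
        }
    },
}

def attribute_for(form, verb, slot):
    """Look up which attribute a verb's argument slot maps onto."""
    return form["verbs"][verb][slot]
```

A file of such forms is what "declarative forms specified in a file" amounts to: the customiser edits data, not code.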
In the case of verbs, mappings are specified to indicate which attributes correspond to each argument slot of the verb.</Paragraph> <Paragraph position="3"> In either case, both relation and attribute mappings give one an opportunity to state that the relation or the attribute is a particular type of entity. This type information means that each concept has type preferences associated with its arguments. The NLI provided types such as person, organisation, location, manufactured object, category, transaction, date or time duration. The specification of these types supplies background information to support various inferential processes. There are three types of inference that will concern us here. COERCION inferences are based on the type preferences associated with the arguments to verbs. For example, consider a verb like supply with arguments supplier and suppliee. Let's say that suppliers are specified to be of type concern, and suppliees are of type project.</Paragraph> <Paragraph position="4"> Then the query Who supplied London? violates a type preference specified in the customisation file, namely that the suppliee is a project. A coercion inference can coerce London, a city, to project, by using the inference path [project located-in city]. Then the question can be understood to mean who supplies projects which are in London? [3].</Paragraph> <Paragraph position="5"> GENERALISATION inferences can support the inference that Life is a kind of Cheese given other facts such as Life is in sector Full Fat Soft and Full Fat Soft is a kind of Cheese. A similar inference is supported by the type organisation; if X works for organisation Y, and Y is a suborganisation of organisation Z, then the NLI is supposed to be able to infer that X works for Z.</Paragraph> <Paragraph position="6"> AMBIGUITY resolution consists of filling in underspecified relations. A common case of underspecified relations are those that hold between the nouns of noun-noun compounds (n-n-relations).
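The COERCION inference above can be sketched as a search over the type network. This is not the NLI's implementation, only a toy reconstruction of the idea: when an argument violates a verb's type preference, look for a chain of conceptual relations connecting the preferred type to the given one. The only relation fact used is the paper's own [project located-in city].

```python
# Sketch of coercion inference: breadth-first search for a relation path from
# the verb's preferred argument type to the type actually supplied. For
# "Who supplied London?", suppliee should be a project and London is a city,
# so the path (project located-in city) licenses "projects in London".

from collections import deque

RELATIONS = {("project", "located-in", "city")}   # (head, relation, tail)

def coercion_path(wanted, given, relations):
    """Return a chain of relations coercing `given` to `wanted`, or None."""
    frontier = deque([(wanted, [])])
    seen = {wanted}
    while frontier:
        node, path = frontier.popleft()
        if node == given:
            return path
        for h, r, t in relations:
            if h == node and t not in seen:
                seen.add(t)
                frontier.append((t, path + [(h, r, t)]))
    return None
```

If no path exists, the type violation stands and the query is rejected rather than coerced.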
For example a motorola processor is a processor with motorola as the manufacturer. A department manager is a manager of a department. The specification of conceptual types in the customisation file is intended to support the inference of these unspecified n-n-relations. For example, the NLI first interprets these with a generic have relation and then attempts to use the conceptual types to infer what relation the user must have intended. Similarly, from the knowledge that an attribute is a location, the NLI can infer that it can be used as an answer to a question about where something is.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.2 Difficulties </SectionTitle> <Paragraph position="0"> A minor difficulty in developing the customisation file was that we identified lexical items for which there was no information in the database. In this case we used a facility of the NLI by which we could associate helpful error messages with the use of particular lexical items.</Paragraph> <Paragraph position="1"> In cases where the concept could be calculated from other database information, we were able to use the NLI to extend the database schema and specify the calculations that were needed in order to support users' access to these concepts.</Paragraph> <Paragraph position="2"> The more major difficulty was to determine which of the concepts that the NLI knew about was the type to use for a specific domain lexical item. For example, in specifying the 'Markets' database relation, target phrases might be the chocolate market, the market chocolate, sales of chocolates, how much chocolate or kinds of chocolate. One of the types available was category, which seems to be the way the key marketname is used in the phrase the chocolate market. (The documentation on a category says that objects "fall into" categories; if C is a category you can ask "who fell into C?". It is not clear whether this meant that 'Markets' was a category.) However, another option was to create an attribute mapping for marketname. Attribute mappings can specify that an attribute type is one of a different set of types such as a unique identifier, a name, a pay, the employer, or a superorganisation. And some of these have subtypes, e.g. name can be of type proper, classifier, common, model or patternnumber. So perhaps if one wants to say sales of chocolates then marketname should be a common name. A solution would be to say marketname belongs to a number of these types, possibly at the expense of overgenerating. In the case of this particular NLI, attempting to do this generated warnings.</Paragraph> </Section> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> 7 Generating the test suite </SectionTitle> <Paragraph position="0"> The test suite of sentences was constructed by selecting sentences that cover the requirements identified by our transcript analysis from the published test suite [2].</Paragraph> <Paragraph position="1"> We then substituted concepts to reflect our subdomain of sales. Sentences were generalised across hierarchies in the domain and with respect to various words for relations in a hierarchy (e.g. are in, belong to, contain, have, are part of, are kind of).</Paragraph> <Paragraph position="2"> As soon as we began testing our first customisation file mappings, it was immediately obvious that this test suite was inappropriate for use in early customisation. This was because it was partitioned with respect to syntactic form and not with respect to the boundaries of customisation sub-domains.
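The substitution-and-generalisation step described in section 7 can be sketched as template instantiation. This is an invented sketch, not the authors' procedure: the template string and hierarchy pairs are assumptions, while the relation wordings are the ones listed in the text.

```python
# Hypothetical sketch of test-suite generation: a simple syntactic template is
# instantiated with concept pairs from a hierarchy and with the alternative
# wordings for hierarchy relations, yielding a suite organised by sub-domain
# rather than by syntactic form.

from itertools import product

TEMPLATE = "Which {lower} {rel} the {upper}?"
PRODUCT_HIERARCHY = [("brands", "market"), ("sectors", "market")]  # assumed pairs
RELATION_WORDS = ["are in", "belong to", "are part of"]            # from the text

def generate_suite():
    return [
        TEMPLATE.format(lower=lo, rel=rel, upper=up)
        for (lo, up), rel in product(PRODUCT_HIERARCHY, RELATION_WORDS)
    ]

suite = generate_suite()
```

Because the suite grows as templates × concepts × relation wordings, even a handful of each exercises many lexical variants of the same underlying query.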
This is a common feature of most test suites. It also contained some queries which had too much syntactic complexity to be of use in identifying separable problems in the customisation file.</Paragraph> <Paragraph position="5"> We therefore created a smaller set of deliveries test queries that used only the more simple syntactic forms and which was organised with incremental domain coverage. This was ideal for iterative development of the customisation, and enabled us to concentrate on getting the basic coverage working first. Later in the customisation we used the more complete syntax-based test suite to get a more complete picture of the limitations of the resulting system with respect to user requirements. We will discuss a possible remedy to the situation of having two distinct test suites in the conclusion.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> 8 Testing the customisation </SectionTitle> <Paragraph position="0"> Some of the coverage limitations were specific to this NLI, but there are some general lessons to be learned.</Paragraph> <Paragraph position="1"> Many of the pernicious problems had to do with the NLI's ambitious use of common-sense knowledge. This section briefly discusses some of the limitations in syntactic coverage that we detected. The remainder of the discussion focusses on the NLI's use of common sense reasoning.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 8.1 Testing syntactic coverage </SectionTitle> <Paragraph position="0"> While the syntactic coverage of the NLI appeared to be better than the Intellect system, we were able to identify some coverage limitations of the system.</Paragraph> <Paragraph position="1"> NUMERIC QUANTITIES like the number of 'cases' delivered and number of tonnes delivered were difficult to handle.
We managed to engineer coverage for How many queries concerning the number of cases of products, but were unable to get any coverage for How much queries concerning number of tonnes.</Paragraph> <Paragraph position="2"> COORDINATION worked for some cases and not for others with no clear dividing line. Switching the order of noun conjuncts, e.g. in List the market and sector of Life, could change whether or not the system was able to provide a reasonable answer. Similarly NEGATION worked in some cases and not in others that were minimally different. It appeared that the verb and some of its arguments could be negated, What was not delivered to Lee's?, while others could not, What was not delivered in January?.</Paragraph> <Paragraph position="3"> DISCOURSE related functionality, such as interpreting pronouns and the use of ellipsis, was also variable at best, with further refinements to previous queries such as and their sales not properly interpreted.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 8.2 The effects of world knowledge </SectionTitle> <Paragraph position="0"> A number of problems concerned the set of predefined concepts that came with the NLI, and that were used in the customisation file as types for each lexical item and its arguments. These seemed to be domain independent concepts, but to our surprise we discovered that this representation of common-sense knowledge incorporated a particular model of the world. For instance, a lot of support was provided for the concepts of time and time duration, but time was fixed to the calendar year. Our domain had its own notion of time in terms of bookweeks and bookmonths in which weeks did not run from Sunday to Sunday and months could consist of either 4 or 5 weeks.
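A book calendar of the kind described can be sketched as follows. The paper says only that bookmonths contain 4 or 5 bookweeks; the 4-4-5 quarter pattern used here is an assumption for illustration, not the company's actual scheme.

```python
# Sketch of a domain-specific book calendar, assuming a conventional 4-4-5
# quarter pattern (the paper does not give the real scheme): 12 book months
# of 4 or 5 book weeks, totalling 52 weeks, detached from calendar months.

MONTH_LENGTHS = [4, 4, 5] * 4   # book weeks per book month; 12 months, 52 weeks

def bookmonth_of(week):
    """Map a book week (1-52) to its book month (1-12) under the assumed pattern."""
    if not 1 <= week <= 52:
        raise ValueError("book week out of range")
    total = 0
    for month, length in enumerate(MONTH_LENGTHS, start=1):
        total += length
        if week <= total:
            return month
    raise AssertionError("unreachable")
```

An NLI whose notion of time is fixed to the calendar year has no hook for such a table, which is exactly the mismatch the text reports.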
The English expression weekly deliveries was based on this, and managers' commissions were calculated over these time durations.</Paragraph> <Paragraph position="1"> There were a number of cases where domain dependent knowledge was embedded in the presumably domain independent conceptual and dictionary structure of the NLI. For instance how much was hard-wired to return an answer in dollars. The point is not that it didn't respond in pounds sterling, but rather that our users wanted amounts such as cases, tonnes, and case equivalents in response to questions such as How much caviar was delivered to TinyGourmet? Another feature of world knowledge which made customisation difficult was the fact that predefined concepts comprise a set of built-in definitions for certain words. These definitions were part of the core lexicon of 10,000 words provided with the system, but the customiser is not given a list of what these words are. This causes mysterious conflicts to arise with domain-specific definitions. For instance, we had to first discover by careful sleuthing that the system had its own definitions of consumer, customer, warehouse, sale, and configuration, and then purge these definitions. It was not possible to determine the effects of these purges in terms of other concepts in the system.</Paragraph> <Paragraph position="2"> In particular, there were concepts that were not easy to remove by purging lexical definitions, such as the concept of TIME mentioned above. The ambiguity of predefined concepts also arose for certain verbs. For example, the verb to have was pre-defined with special properties, but no explicit definition was made available to the customiser. It was impossible to determine the effects of using it, and yet it seemed unwise to purge it.</Paragraph> <Paragraph position="3"> Our application had a great need for GENERALISATION type inferences due to the product, customer and time hierarchies (see figure 1).
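The generalisation inference these hierarchies demand can be sketched as a transitive walk up parent links, using the paper's own example facts from section 6.1: Life is in sector Full Fat Soft, and Full Fat Soft is a kind of Cheese, so Life should count as a Cheese. The dictionary encoding is mine; the facts are the paper's.

```python
# Sketch of GENERALISATION inference over a concept hierarchy: membership in a
# kind is decided by walking parent links transitively. The fact table holds
# only the two links given in the paper's Life/Cheese example.

PARENT = {"Life": "Full Fat Soft", "Full Fat Soft": "Cheese"}  # child -> parent

def is_a(concept, kind):
    """True if `kind` is reachable from `concept` by following parent links."""
    while concept is not None:
        if concept == kind:
            return True
        concept = PARENT.get(concept)
    return False
```

The deliver examples that follow amount to asking the system to apply exactly this kind of transitive lookup over the customer hierarchy, which it did not do reliably.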
The most common verb was deliver, and this could refer to deliveries of any level in the product hierarchy to any level in the customer hierarchy. We spent a great deal of time trying to get this to work properly and were not able to. In the examples below (Q) is the original query, (P) is the paraphrase provided by the system and (R) is the system's response. (The word list mentioned above is presumably not provided because it is considered proprietary knowledge.) In example (1) the customer Lee is at the level of corporation and the query is properly interpreted, resulting in a table of product, customer, delivery date, etc.</Paragraph> <Paragraph position="4"> (1) Q: What are the sales of Krunchy in Lee? P: List Lee's Krunchy sales.</Paragraph> <Paragraph position="5"> However, in (2) the customer Foodmart is at the level of trading company and a query with the identical syntactic form is interpreted completely differently.</Paragraph> <Paragraph position="6"> (2) Q: What are the sales of Krunchy in Foodmart? P: List the Krunchy in Foodmart sales.</Paragraph> <Paragraph position="7"> R: There aren't any brands named Foodmart.</Paragraph> <Paragraph position="8"> Other problems were not so clearly problems with common sense knowledge but rather with inappropriately constrained inferential powers. Some of these were best identified by examining the paraphrases that the generator produced of the semantic interpretation of a user query or statement. By the paraphrase provided in (3)P, it appears that the n-n-relation in report groups has been interpreted as have.</Paragraph> <Paragraph position="9"> (3) Q: What do you know about report groups? P: What do you know about groups that have REPORT?
R: The database contains no information about which groups have customers.</Paragraph> <Paragraph position="10"> Then another default inference is made, which consists of assuming that an unknown proper noun is of type customer. This results in the response given in (3)R. Of course to the user, this response seems to indicate that the system has not at all understood his query.</Paragraph> <Paragraph position="11"> Another example of a non-fruitful assumption of a have relation for a non-specified n-n-relation can be seen in (4)R below. The NLI first expands the proper name BSL to traded unit type BSL, then apparently treats this as a noun noun compound with an unspecified n-n-relation. This relation is then filled in with the have relation which appears in (4)R.</Paragraph> <Paragraph position="12"> (4) Q: Show the total sales of bsl, bsj and bsr to Lee's PLC.</Paragraph> <Paragraph position="13"> P: List the total sale of traded unit type BSL, the total sale of traded unit type BSJ and the total sales of traded unit type BSR to Lee's PLC.</Paragraph> <Paragraph position="14"> R: Traded unit types don't have traded unit types.</Paragraph> <Paragraph position="15"> Consumer unit types have traded unit types.</Paragraph> <Paragraph position="16"> In example (5), the NLI appears to make an unwarranted inference that the number 17 must refer to 17 dollars. It also fills in a null noun, taking the sentence to actually mean how much N was krunchy sold.... It replaces this null noun with the type traded unit type, which is given as a default for how much queries.</Paragraph> <Paragraph position="17"> (5) Q: how much was krunchy sold between week 17 in 1988 and week 52 in 1988? P: How much traded unit type that was sold to krunchy costs between 17 and 52 dollars?
R: The database contains no information about how expensive traded unit types are.</Paragraph> <Paragraph position="18"> It seems that the semantic information that the system has, such as knowing that krunchy is a brand and that sales are of a product to a customer, should let it overcome the slightly nonstandard syntax how much was krunchy sold. However it apparently pays more attention to that aspect of the syntax here, while ignoring the fact that 17 is specified to be a designator of a book week.</Paragraph> </Section> </Section> </Paper>