File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/92/c92-2122_concl.xml
Size: 4,817 bytes
Last Modified: 2025-10-06 13:56:50
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2122"> <Title>A Case Study of Natural Language Customisation: The Practical Effects of World Knowledge</Title> <Section position="11" start_page="0" end_page="0" type="concl"> <SectionTitle> 9 Conclusions </SectionTitle> <Paragraph position="0"> The NL transcript analysis proved useful for identifying the target coverage and for focusing our experiment on a priority part of the domain. In most cases transcript information will not be available, and so interview data or experimental Wizard-of-Oz data [10] will have to be generated to make explicit the users' models of the domain. The E-R model of the domain was very useful for carrying out an incremental development of the customisation file. It lets the customiser know where the reasonable domain boundaries lie, so that subparts of the customisation can sensibly be developed and tested in isolation. In addition, the customisation was simplified by having the entities and attributes of the E-R model labelled with the domain vocabulary in advance. Thus the process of associating synonyms with the appropriate customisation file relations and attributes was straightforward.</Paragraph> <Paragraph position="1"> The main limitation of the approach seems to be that E-R diagrams are too limited to capture the use of the vocabulary in the domain. We used an E-R diagram because it was the conceptual representation available for the domain and because it is the most prevalent semantic modeling tool used in database design. However, it does not in fact allow one to represent the information that one would like to represent for the purposes of linking NL concepts and lexical items to the domain. The only semantic information associated with relations that is represented in an E-R diagram is whether they are many-to-one or one-to-one. The attributes of the entity that participate in the relation are not indicated specifically.
[Proc. of COLING-92, Nantes, Aug. 23-28, 1992, p. 825] The representation should be much richer, possibly incorporating semantic concepts such as whether a relation is transitive, or other concepts such as that an attribute represents a part of a whole. Of course, this is part of what the NLI was attempting to provide with its concept hierarchy and dictionary of 10000 initial words.</Paragraph> <Paragraph position="2"> But it seemed that one of the main difficulties with the NLI lay precisely in attempting to provide a richer semantic model with common sense information to support inference. This is commonly believed to be helpful for the portability of an NL system across a number of domains. We found it more a hindrance than a help. Some predefined concepts had to be purged from the lexicon. Some definitions, e.g. time definitions, were difficult to delete or work around. The problems we encountered made us wonder whether there is any general world knowledge, or whether it is always flavoured by the perspective of the knowledge base designers and the domains they had in mind.</Paragraph> <Paragraph position="3"> The process was not helped by the black box nature of the NL system. The general problem with black box systems is that it is difficult for a customiser to form an internal model of the system. It would help a great deal if the world model were made available directly to the customiser, the semantics of each concept were clearly defined, and a way to modify rather than purge certain parts of the conceptual structure were made available.</Paragraph> <Paragraph position="4"> The customiser should not be left to learn by example.</Paragraph> <Paragraph position="5"> During customisation of the NL system we found our user requirements test suite difficult to use for debugging purposes. The test suite had to be modified to reflect concepts in the database rather than syntax.
This is because customisations must be done incrementally and tested at each phase. A solution to this problem is, first, to ensure that the test suite has a number of sentences that each test only a single syntactic construction. Second, the test suite components should be stored in a database.</Paragraph> <Paragraph position="6"> Each component would be retrievable through the semantic class it belonged to (e.g. Temporal Expression or Complex NP). In addition, each component would be retrievable through the concepts of the E-R diagram that it accessed. It should then be possible to generate test suites that are usable by developers for the purpose of testing customisation files. Simple queries of the test suite database about a particular concept would generate appropriate test sentences whose semantic categories and category fillers were limited to that concept.</Paragraph> </Section></Paper>