<?xml version="1.0" standalone="yes"?> <Paper uid="C90-1011"> <Title>Knowledge Acquisition Environment Integrating Natural Language and Logic&quot;. Proceedings IJCAI</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> CONCEPT ANALYSIS AND TERMINOLOGY: A KNOWLEDGE-BASED APPROACH TO DOCUMENTATION </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> The central concern of terminology, a component of the general documentation process, is concept analysis, an activity which is becoming recognized as fundamental as term banks evolve into knowledge bases. We propose that concept analysis can be facilitated by knowledge engineering technology, and describe a generic knowledge acquisition tool called CODE (Conceptually Oriented Design Environment) that has been successfully used in two terminology applications: 1) a bilingual vocabulary project with the Terminology Directorate of the Secretary of State of Canada, and 2) a software documentation project with Bell Northern Research.</Paragraph> <Paragraph position="1"> We conclude with some implications of computer-assisted concept analysis for terminology.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. 
TERMINOLOGY AND CONCEPT ANALYSIS </SectionTitle> <Paragraph position="0"> Terminology, the discipline concerned with the formation, description and naming of concepts in specialized fields of knowledge, is a key component of the general documentation process: it is normally preceded by knowledge acquisition (usually not formalized), and followed by document preparation.</Paragraph> <Paragraph position="1"> While it is still very common for one person to be responsible for all stages of the documentation process, terminological activities are increasingly becoming a distinct specialization, due to 1) the exponential growth of technical concepts, and consequent interest in terminology banks, electronic dictionaries, and various other computer aids for terminology; 2) the growing need for efficient transfer of highly specialized knowledge across national and linguistic boundaries, and associated demand for regulating the terminology of specialized domains; 3) the increasing recognition by corporations that high-quality documentation, which presupposes high-quality terminology, is an important factor in the success of a product.</Paragraph> <Paragraph position="2"> Concept analysis involves 1) the description of concepts through an enumeration of their characteristics, or properties, and 2) the description of relations that hold within systems of concepts. It is generally agreed, and particularly stressed by the Vienna School of Terminology (Wüster 1985), that concept analysis is the central concern of terminology, essential to delimiting and partitioning nomenclatures, constructing definitions, distinguishing quasi-synonyms, dealing with neology, carrying out multilingual terminological analysis, and communicating with subject-field experts. Despite its importance, however, concept analysis is still done in an ad hoc fashion: to date, no developed methodology exists. 
Only rarely does one find graphical or structured textual presentations of concept systems in terminological publications: rather, one normally detects only traces of conceptual structures in the definitions of certain terms, &quot;somewhat like a puzzle that no one can put together because there are pieces missing, and there is no picture of the whole that can serve as a guide&quot; (translated from Kukulska-Hulme and Knowles 1989:382).</Paragraph> <Paragraph position="3"> Apart from the lack of established methodology, a number of factors contribute to the difficulty of formalized concept analysis: 1) the terminologist is often not an expert in his subject fields, and thus faces all the knowledge elicitation and representation problems that characterize the knowledge engineering process (Skuce et al 1989); 2) since any partitioning of reality is arbitrary to some degree, concept relations often occur in complex &quot;layerings&quot; (Sowa 1984:349); 3) consistency and conceptual clarity are difficult to maintain in fields that are large, multidisciplinary, or rapidly evolving (Meyer and Skuce 1990).</Paragraph> <Paragraph position="4"> We believe that these problems cannot be solved adequately using &quot;paper-and-pencil&quot; or &quot;do-it-all-in-my-head&quot; methods. The need for computer assistance is becoming all the more crucial as term banks evolve into multifunctional knowledge bases (Budin et al.), with various applications becoming dependent on them, for example management information, training, expert systems and machine translation.</Paragraph> <Paragraph position="5"> With the increasing focus on the knowledge component of terminological research comes a need for sophisticated documentation workstations that include a knowledge support tool.</Paragraph> <Paragraph position="6"> 2. 
CODE: A KNOWLEDGE SUPPORT ENVIRONMENT CODE (for Conceptually Oriented Design Environment, Skuce et al 1989, Skuce 1989a, 1990) is a generic knowledge acquisition environment, written in Smalltalk, that runs on a UNIX, Macintosh or 386 machine. The system has been developed at the Artificial Intelligence Laboratory of the University of Ottawa, Canada, and a prototype has been tested in two terminology applications (described below). CODE's associated methodology (Skuce 1989b) integrates knowledge representation ideas from artificial intelligence, and includes a logical and a natural language analysis component. It was also influenced by experience with major expert system tools like KEE and ART. CODE may be thought of as a &quot;spreadsheet for ideas&quot;, the intended user being any person faced with the task of systematically organizing expert knowledge. This knowledge, whether obtained verbally or textually, is rarely presented as precisely as terminologists would like: conceptual and terminological confusion are the rule rather than the exception. CODE employs a flexible knowledge representation which permits considerable variety in style and degree of formality. It includes mechanisms for catching many conceptual and terminological errors and guidance towards correcting them.</Paragraph> <Paragraph position="7"> CODE is organized around the two fundamental notions of concept and property. Concepts can be of two types: class concepts and instance concepts. For example, 'university' is a class concept with instances such as 'University of Ottawa'. A property is a unit of information that characterizes a concept, corresponding roughly to a succinct declarative sentence. CODE organizes knowledge into units called conceptual descriptors (CDs), which are analogous to frames in artificial intelligence or objects in object-oriented programming. 
CDs can be arranged in inheritance hierarchies, so that more specific concepts may inherit properties from more general ones. Inheritance is controlled by a system of &quot;flags&quot;, which define the inheritance behaviour as a function of the kind of property and the kind of inheritance link.</Paragraph> <Paragraph position="8"> CODE offers the following useful features for terminology: 1. Detection of inconsistencies. A well-developed system for controlling inheritance of properties helps the terminologist maintain conceptual clarity and consistency. For example, the logical behaviour of properties can be flagged as &quot;necessary&quot;, &quot;sufficient&quot;, &quot;optional&quot; or &quot;typical&quot;; the modifiability of properties (in subconcepts) can be flagged as &quot;not permissible&quot;, &quot;free&quot;, or &quot;if logically consistent&quot;; etc. When a change is made to a property at a high conceptual level, one is queried as to whether the change also applies to subconcepts.</Paragraph> <Paragraph position="9"> Similarly, when a concept is moved from one branch of the network to another, one is queried about the properties that will be affected. These and other mechanisms for checking inconsistencies allow the terminologist to do &quot;what-if&quot; experiments and obtain quick feedback about the desirability of changes.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Flexible means of specifying </SectionTitle> <Paragraph position="0"> relations and properties. CODE is not tied to any particular theory of concepts: the user can specify any properties and relations he wishes. As well as hierarchical relations (e.g. generic-specific, part-whole), the terminologist may also specify any number of user-defined associative relations (in the general sense of non-hierarchical).</Paragraph> <Paragraph position="1"> 3. Graphical and textual representation. 
The knowledge base can be visualized either by a graphical display, in the form of a directed graph, or by textual units, called CD Views. Any changes made on the graph are updated automatically in the corresponding CD Views, and vice versa. The graphical display is highly developed, offering features for managing large graphs, viewing multiple graphs (essential for multilingual terminology), indicating concepts and relations of special interest, and displaying hierarchical and associative relations. 4. Representation of multiple partitioning of reality. A subject field can often be partitioned in several ways, depending on which properties of concepts are emphasized. Since terminologists frequently need to take such multiple partitions into account, CODE offers two features of interest: 1) multiple inheritance is permitted, and certain properties can be blocked if necessary; 2) concepts can be assigned various keys, so that one can focus on only certain concepts within the knowledge base, or work with all conceptual partitions simultaneously.</Paragraph> <Paragraph position="2"> 5. Hypertext-like browsing capability.</Paragraph> <Paragraph position="3"> CODE's browsing facility, the Property Browser, allows the terminologist to &quot;navigate&quot; easily between concepts, between properties, and between concepts and properties. A multiple windowing capability allows simultaneous viewing of any number of graphs, CD Views, and Property Browsers.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. APPLICATIONS OF CODE </SectionTitle> <Paragraph position="0"> A. Bilingual terminology. During the fall of 1989, CODE was tested by Meyer in the Terminology Directorate of the Department of the Secretary of State of Canada. The Terminology Directorate practises terminology as a discipline in its own right. Its efforts are largely geared towards translation needs. 
Knowledge acquisition is a vital part of the terminology work at the Secretary of State: most of it is done from documents, although subject-field experts are frequently consulted as well. The amount of knowledge acquisition depends on the type of project: it is most important for thematic research of the vocabulary type, i.e. research aiming at a complete coverage of a specialized field, leading to a published work that includes definitions, and not simply bilingual lists of equivalents.</Paragraph> <Paragraph position="1"> CODE was used in a vocabulary project for typesetting. The system served two purposes: 1) to formally represent knowledge that had already been acquired in the field, and that was reflected to some degree in a previous vocabulary - it was found that the formal representation led to improvements on the previous definitions; 2) to systematize knowledge on emerging concepts in the field, particularly regarding the role of computerization.</Paragraph> <Paragraph position="2"> B. Software documentation. Documentation is an essential aspect of the software production process, but unfortunately it is often not treated with sufficient care. Part of the problem is that careful conceptual analysis and terminological control are often not part of the design and development phases that precede documentation. Indeed, one of the goals in designing CODE was to help software engineers organize knowledge for themselves and for documentation. Ideally, knowledge (and hence terminology) should be systematized, edited, verified, maintained, and then distributed to those who need it throughout the whole software cycle. 
Typically, however, this is left to the documentalists, who, like the terminologists described above, must try to piece together a consistent description of the system after the fact.</Paragraph> <Paragraph position="3"> An experiment in this application of CODE was completed in the fall of 1989 at Bell Northern Research, where Skuce spent some 60 days working closely with the designers of a new design environment for communications systems. The conceptual structure and terminology of this system were worked out in many long knowledge acquisition sessions. The resulting knowledge base is now being used to drive documentation production, on-line help, and subsequent design extensions.</Paragraph> </Section> </Paper>