<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1083">
  <Title>A Methodology for Terminology-based Knowledge Acquisition and Integration</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> In this paper we propose an integrated knowledge management system in which terminology-based knowledge acquisition, knowledge integration, and XML-based knowledge retrieval are combined using tag information and ontology management tools. The main objective of the system is to facilitate knowledge acquisition through query answering against XML-based documents in the domain of molecular biology.</Paragraph>
    <Paragraph position="1"> Our system integrates automatic term recognition, term variation management, context-based automatic term clustering, ontology-based inference, and intelligent tag information retrieval. Tag-based retrieval is implemented through interval operations, which prove to be a powerful means for text mining and knowledge acquisition. The aim is to provide efficient access to heterogeneous biological textual data and databases, enabling users to integrate a wide range of textual and non-textual resources effortlessly.</Paragraph>
    <Paragraph position="2"> Introduction With the growing importance of electronic communication and data sharing over the Internet, there is an increasing number of publicly accessible knowledge sources, both in the form of documents and factual databases. These knowledge sources (KSs) are intrinsically heterogeneous and dynamic. They are heterogeneous since they are autonomously developed and maintained by independent organizations for different purposes. They are dynamic since information is constantly being revised, added, and removed. This heterogeneous and dynamic nature of KSs poses challenges for systems that help users to locate and integrate knowledge relevant to their needs.</Paragraph>
    <Paragraph position="3"> Knowledge encoded in textual documents is organised around sets of specialised (technical) terms (e.g. names of proteins, genes, acids).</Paragraph>
    <Paragraph position="4"> Therefore, knowledge acquisition relies heavily on the recognition of terms. However, the main problems that make term recognition difficult are the lack of clear naming conventions and terminology variation (cf. Jacquemin and Tzoukermann (1999)), especially in the domain of molecular biology. Therefore, we need a scheme to integrate terminology management as a key prerequisite for knowledge acquisition and integration.</Paragraph>
    <Paragraph position="5"> However, automatic term extraction is not the ultimate goal in itself, since the large number of new terms calls for a systematic way to access and retrieve the knowledge represented through them. Therefore, the extracted terms need to be placed in an appropriate framework by discovering relations between them and by establishing links between the terms and different factual databases.</Paragraph>
    <Paragraph position="6"> Several approaches have been proposed to address this problem. MeSH Term in MEDLINE (2002) and Gene Ontology (2002) provide a top-down controlled ontology framework, which aims to describe and constrain the terminology in the domain of molecular biology. On the other hand, automatic term acquisition approaches have been developed to support a dynamic, corpus-driven knowledge acquisition methodology (Mima et al., 1999; 2001a).</Paragraph>
    <Paragraph position="7"> Different approaches to linking relevant resources have also been suggested. The Semantic Web framework (Berners-Lee (1998)) aims to link relevant Web resources in a bottom-up manner using the Resource Description Framework (RDF) (Brickley and Guha, 2000) and an ontology. However, although the Semantic Web framework is powerful for expressing the content of resources so that they can be retrieved semantically, it still expects some manual description using the RDF/ontology. Since it provides no solution to the well-known difficulties of manual ontology development, such as ontology conflicts/mismatches (Visser et al., 1997), automated ontology management is required for efficient and consistent knowledge acquisition and integration. TAMBIS (Baker et al., 1998) tried to provide a filter from biological information services by building a homogenising layer on top of the different sources using the classical mediator/wrapper architecture. It was intended to provide source transparency using a mapping from terms placed in a conceptual knowledge base of molecular biology onto terms in external sources.</Paragraph>
    <Paragraph position="8"> In this paper we introduce TIMS, an integrated knowledge management system in the domain of molecular biology, where terminology-based knowledge acquisition (KA), knowledge integration (KI), and XML-based knowledge retrieval are combined using tag information and ontology management tools. As in the Semantic Web, the management of knowledge resources is based on XML, RDF, and ontology-based inference. However, our aim is to facilitate the KA and KI tasks not only by using manually defined resource descriptions, but also by exploiting NLP techniques such as automatic term recognition (ATR) and automatic term clustering (ATC), which are used for automatic and systematic ontology population.</Paragraph>
    <Paragraph position="9"> The paper is organised as follows: in section 1 we present the overall TIMS architecture and briefly describe the components incorporated in the system, while section 2 gives the details of the proposed method for KA and KI. In the last section we present results, evaluation and discussion.</Paragraph>
  </Section>
</Paper>