File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/w02-1408_abstr.xml

Size: 3,852 bytes

Last Modified: 2025-10-06 13:42:41

<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1408">
  <Title>Automatic Discovery of Term Similarities Using Pattern Mining</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Term recognition and clustering are key topics in automatic knowledge acquisition and text mining. In this paper we present a novel approach to the automatic discovery of term similarities, which serves as a basis for both classification and clustering of domain-specific concepts represented by terms. The method is based on automatic extraction of significant patterns in which terms tend to appear. The approach is domain independent: it needs no manual description of domain-specific features and it is based on knowledge-poor processing of specific term features. However, automatically collected patterns are domain specific and identify significant contexts in which terms are used. Besides features that represent contextual patterns, we use lexical and functional similarities between terms to define a combined similarity measure. The approach has been tested and evaluated in the domain of molecular biology, and preliminary results are presented.</Paragraph>
    <Paragraph position="1"> Introduction In a knowledge-intensive discipline such as molecular biology, the vast and constantly increasing amount of information demands innovative techniques to gather and systematically structure knowledge, which is usually available only from text/document resources. In order to discover new knowledge, one has to identify the main concepts, which are linguistically represented by domain-specific terms (Maynard and Ananiadou, 2000).</Paragraph>
    <Paragraph position="2"> The number of new terms that represent newly created concepts is constantly increasing. Since existing term dictionaries usually do not meet the needs of specialists, automatic term extraction tools are indispensable for efficient term discovery and dynamic update of term dictionaries.</Paragraph>
    <Paragraph position="3"> However, automatic term recognition (ATR) is not the ultimate aim: terms recognised should be related to existing knowledge and/or to each other.</Paragraph>
    <Paragraph position="4"> This means that terms should be classified or clustered so that semantically similar terms are grouped together. Classification and/or clustering of terms are indispensable for improving information extraction, knowledge acquisition, and document categorisation. Classification can also be used for efficient term management and for populating and updating existing ontologies in a consistent manner. Both classification and clustering methods are built on top of a specific similarity measure.</Paragraph>
    <Paragraph position="5"> The notion of term similarity has been defined and considered in different ways: terms can have functional and/or structural similarities, and they can also be correlated by different relationships (Grefenstette, 1994; Maynard and Ananiadou, 2000). In this paper we suggest a novel, domain-independent method for the automatic discovery of term similarities, which can serve as a basis for both classification and clustering of terms. The method is mainly based on the automatic discovery of significant term features through pattern mining.</Paragraph>
    <Paragraph position="6"> Automatically collected patterns are domain dependent and they identify significant contexts in which terms tend to appear. In addition, the measure combines lexical and syntactical similarities between terms.</Paragraph>
    <Paragraph position="7"> The paper is organised as follows. Section 1 gives an overview of term management approaches.</Paragraph>
    <Paragraph position="8"> Section 2 introduces the term similarity measure and Section 3 presents experiments and results.</Paragraph>
  </Section>
</Paper>