<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1813"> <Title>Determining the Specificity of Terms based on Information Theoretic Measures</Title> <Section position="2" start_page="0" end_page="2" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The specificity of a term represents the quantity of domain-specific information the term contains.</Paragraph> <Paragraph position="1"> If a term carries a large quantity of domain-specific information, its specificity is high. The specificity of a term X is quantified as a positive real number, as in equation (1).</Paragraph> <Paragraph position="3"> Specificity is a kind of necessary condition for a term hierarchy, i.e., if a term X is an ancestor of another term in a hierarchy, the specificity of X is lower than that of its descendant. Thus this condition can be applied to the automatic construction or evaluation of term hierarchies. Specificity can also be applied to automatic term recognition. Many domain-specific terms are multiword terms. When domain-specific concepts are represented as multiword terms, the terms fall into two categories based on the composition of their unit words. In the first category, new terms are created by adding modifiers to existing terms. For example, &quot;insulin-dependent diabetes mellitus&quot; was created by adding the modifier &quot;insulin-dependent&quot; to its hypernym &quot;diabetes mellitus&quot;, as in Table 1. In English, specific-level terms are very commonly compounds of the generic-level term and some modifier (Croft, 2004). In this case, compositional information is important for capturing the meaning of the terms. In the second category, new terms are independent of existing terms. For example, &quot;wolfram syndrome&quot; is semantically related to its ancestor terms, as in Table 1. However, it shares no common words with its ancestor terms.</Paragraph> <Paragraph position="4"> In this case, contextual information is important for capturing the meaning of the terms.</Paragraph> <Paragraph position="5"> thesaurus. 
Node numbers represent the hierarchical structure of terms. Contextual information has mainly been used to represent the meaning of terms in previous work. Grefenstette (1994), Pereira (1993), and Sanderson (1999) used contextual information to find hyponymy relations between terms. Caraballo (1999) also used contextual information to determine the specificity of nouns. In contrast, the compositional information of terms has not been commonly discussed. We propose new specificity measuring methods based on both compositional and contextual information. The methods are formulated as information-theory-like measures. This paper is organized as follows: new specificity measuring methods are introduced in section 2, the experiments and evaluation of the methods are discussed in section 3, and conclusions are drawn in section 4.</Paragraph> </Section> </Paper>