<?xml version="1.0" standalone="yes"?> <Paper uid="P04-3015"> <Title>Hierarchy Extraction based on Inclusion of Appearance</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Hierarchical relations between words are useful language resources. Hierarchical semantic lexical databases such as WordNet (Miller et al., 1990) and the EDR electronic dictionary (1995) are used in NLP research worldwide to help understand word meanings. In current thesauri organized as hierarchies, words are categorized manually and classified top-down based on human intuition. This is a good way to build a lexical database for users with a specific purpose.</Paragraph> <Paragraph position="1"> However, word hierarchies based on human intuition tend to vary greatly from one lexicographer to another. In addition, different users may need hierarchical relations derived from different data.</Paragraph> <Paragraph position="2"> Accordingly, we attempt to extract hierarchical relations between words automatically, using statistical methods. Previous research has proposed extracting such relations from definition sentences in dictionaries (Tsurumaru et al., 1986; Shoutsu et al., 2003) or from corpora using patterns such as &quot;a part of&quot;, &quot;is-a&quot;, or &quot;and&quot; (Berland and Charniak, 1999; Caraballo, 1999). Another method uses dependency relations between words taken from a corpus (Matsumoto et al., 1996). In contrast, we propose a method based on the inclusion relation between the appearance patterns of words in corpora.</Paragraph> <Paragraph position="3"> In this paper, to verify the suitability of our method, we attempt to extract hierarchies of Japanese abstract nouns that co-occur with adjectives. We select two similarity measures to estimate the inclusion relation between word appearance patterns.
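As a rough illustration of what it means for one appearance pattern to include another, the following Python sketch computes the two similarity measures this paper uses, the complementary similarity measure (CSM) and the overlap coefficient, on binary vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`contingency`, `csm`, `overlap`) and the toy vectors are hypothetical, and the CSM formula (ad - bc) / sqrt((a + c)(b + d)) is the form commonly attributed to Hagita and Sawaki (1995), assumed here rather than taken from this section.

```python
from math import sqrt


def contingency(f, t):
    """Return the cell counts (a, b, c, d) for two equal-length 0/1 vectors."""
    if len(f) != len(t):
        raise ValueError("vectors must have the same length")
    a = sum(x * y for x, y in zip(f, t))              # 1 in both f and t
    b = sum(x * (1 - y) for x, y in zip(f, t))        # 1 in f only
    c = sum((1 - x) * y for x, y in zip(f, t))        # 1 in t only
    d = sum((1 - x) * (1 - y) for x, y in zip(f, t))  # 0 in both
    return a, b, c, d


def csm(f, t):
    """Complementary similarity measure of f against t (asymmetric).

    Assumed form: (a*d - b*c) / sqrt((a + c) * (b + d)).
    """
    a, b, c, d = contingency(f, t)
    denom = sqrt((a + c) * (b + d))
    if denom == 0:
        return 0.0  # t is all 1s or all 0s; treated as 0 here by assumption
    return (a * d - b * c) / denom


def overlap(f, t):
    """Overlap coefficient: shared 1s divided by the smaller count of 1s."""
    a, b, c, _ = contingency(f, t)
    smaller = min(a + b, a + c)
    if smaller == 0:
        return 0.0  # one vector has no 1s; treated as 0 here by assumption
    return a / smaller


if __name__ == "__main__":
    # Hypothetical appearance patterns of two nouns over six adjectives
    # (1 = the noun co-occurs with that adjective); not data from the paper.
    broad = [1, 1, 1, 1, 0, 1]
    narrow = [1, 1, 0, 0, 0, 0]
    print(overlap(broad, narrow))        # 1.0 (narrow's 1s are a subset of broad's)
    print(round(csm(broad, narrow), 3))  # 0.707
    print(round(csm(narrow, broad), 3))  # 0.894
```

The overlap coefficient is symmetric and reaches 1.0 whenever one pattern's 1s are contained in the other's, while CSM gives different scores depending on argument order, which is why it is a candidate for estimating one-to-many relations. How the scores are thresholded or turned into hierarchy links is not specified here and is not modeled in this sketch.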
One is the complementary similarity measure, which was developed for recognizing degraded machine-printed text (Hagita and Sawaki, 1995). This measure can be used to estimate one-to-many relations, such as superordinate-subordinate relations, from appearance patterns (Yamamoto and Umemura, 2002).</Paragraph> <Paragraph position="4"> The second measure is the overlap coefficient, which quantifies the overlap between two binary vectors as the number of positions where both are 1 divided by the smaller of their two counts of 1s.</Paragraph> <Paragraph position="5"> Using each measure, we extract hierarchies from a corpus and then compare them with the EDR electronic dictionary.</Paragraph> </Section> </Paper>