<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2163"> <Title>Sense Classification of Verbal Polysemy Based on Bilingual Class/Class Association*</Title> <Section position="2" start_page="0" end_page="968" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In corpus-based NLP, the acquisition of lexical knowledge has become one of the major research topics.</Paragraph> <Paragraph position="1"> Among the research topics in this field, acquisition from parallel corpora is particularly attractive (e.g. Dagan et al. (1991)), because parallel sentences are useful for resolving both syntactic and lexical ambiguities in the monolingual sentences. Especially when the two languages differ in syntactic structure and word meaning (as English and Japanese do), this approach has proved highly effective for disambiguation (Matsumoto et al., 1993; Utsuro et al., 1993). Utsuro et al. (1993) proposed a method for acquiring surface case frames of Japanese verbs from Japanese-English parallel corpora. In this method, the translated English verbs and case labels are used to classify the senses of Japanese polysemous verbs. Clues to sense classification are found using the English verbs and case labels, as well as the sense distribution of the Japanese case element nouns.</Paragraph> <Paragraph position="2"> *The author would like to thank Prof. Yuji MATSUMOTO for his valuable comments on this research. This work is partly supported by Grants from the Ministry of Education, Science, and Culture, Japan, #07780326.</Paragraph> <Paragraph position="3"> Then, a human instructor judges whether the clues are correct. A major disadvantage of this method is that its use of the English information and of the sense distribution of Japanese case element nouns is restricted: only the surface forms of English verbs and case labels are used, and the sense distribution of the English verbs is not used at all. 
In addition, the threshold for distinguishing senses in the sense distribution of Japanese case element nouns is predetermined at a fixed level of a Japanese thesaurus. As a result, the human instructor is frequently asked to judge the correctness of clues.</Paragraph> <Paragraph position="4"> In the statistical analysis of natural language data, it is common to use measures of lexical association, such as the information-theoretic measure of mutual information, to extract useful relationships between words (e.g. Church and Hanks (1990)).</Paragraph> <Paragraph position="5"> Lexical association has its limits, however, since often either the data are insufficient to provide reliable word/word correspondences, or the task requires more abstraction than word/word correspondences permit. Thus, Resnik (1992) proposed a useful measure of word/class association by generalizing the information-theoretic measure of word/word association. The proposed measure addresses the limitations of lexical association by facilitating the statistical discovery of facts involving word classes rather than individual words.</Paragraph> <Paragraph position="6"> We find Resnik's (1992) measure of word/class association quite attractive, since it makes it possible to discover a meaningful sense cluster at an arbitrary level of the thesaurus. We therefore expect that the restrictions of the previous method of Utsuro et al. (1993) can be overcome by employing the idea underlying this measure. In this paper, we describe how this idea can be applied to the sense classification of Japanese verbal polysemy in case frame acquisition from Japanese-English parallel corpora. First, the sense distributions of English predicates and of Japanese case element nouns are represented using a monolingual English thesaurus and a monolingual Japanese thesaurus, respectively (sections 2 and 3). 
Then, a measure of the association between classes of English predicates and classes of Japanese case element nouns, i.e., a measure of bilingual class/class association, is introduced and extended into a measure of bilingual class/frame association (section 4).</Paragraph> <Paragraph position="7"> Using these measures, sense clusters are discovered in the sense distributions of English predicates and Japanese case element nouns. Finally, examples of a Japanese polysemous verb collected from Japanese-English parallel corpora are divided into disjoint clusters according to the discovered sense clusters (section 5). The results of a small experiment are presented and the proposed measure is evaluated (section 6).</Paragraph> </Section> </Paper>