File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-2163_metho.xml
Size: 13,466 bytes
Last Modified: 2025-10-06 14:14:20
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2163"> <Title>Sense Classification of Verbal Polysemy Based on Bilingual Class/Class Association</Title> <Section position="3" start_page="968" end_page="968" type="metho"> <SectionTitle> 2 Bilingual Surface Case Structure </SectionTitle> <Paragraph position="0"> In the framework of verbal case frame acquisition from parallel corpora, bilingually matched surface case structures (Matsumoto et al., 1993) are collected, and surface case frames of Japanese verbs are acquired from the collection. In this paper, each bilingually matched surface case structure is called a bilingual surface case structure and is represented as a feature structure:</Paragraph> <Paragraph position="2"> $v_J$ indicates the verb in the Japanese sentence, $p_1, \ldots, p_n$ denote the Japanese case markers, and $n_{J1}, \ldots, n_{Jn}$ denote the Japanese case element nouns. When a Japanese noun $n_{Ji}$ has several senses, it may appear in several leaf classes in the Japanese thesaurus. Thus, $SEM_{Ji}$ is represented as a set of those classes and is referred to as a semantic label. $SEM_E$ is the semantic label of the corresponding English predicate, i.e., a set of classes in the English thesaurus: $$SEM_E = \{c_{E1}, \ldots, c_{Ek}\}, \quad SEM_{Ji} = \{c_{J1}, \ldots, c_{Jl}\}$$ where $c_{E1}, \ldots, c_{Ek}$ and $c_{J1}, \ldots, c_{Jl}$ indicate classes in the English and Japanese thesauri, respectively. By structurally matching the Japanese-English parallel sentences in Example 1, the following bilingual surface case structure is obtained:</Paragraph> <Paragraph position="4"> We use Roget's Thesaurus (Roget, 1911) as the English thesaurus and 'Bunrui Goi Hyou' (BGH) (NLRI, 1993) as the Japanese thesaurus. In Roget's Thesaurus, the verb &quot;hang&quot; has four senses. 
In BGH, the nouns &quot;watashi&quot; and &quot;uwagi&quot; each have only one sense, while &quot;kagi&quot; has four senses.</Paragraph> </Section> <Section position="4" start_page="968" end_page="968" type="metho"> <SectionTitle> 3 Monolingual Thesaurus </SectionTitle> <Paragraph position="0"> A thesaurus is regarded as a tree in which each node represents a class. We introduce $\preceq$ as the superordinate-subordinate relation of classes. In general, $c_1 \preceq c_2$ means that $c_1$ is subordinate to $c_2$. We define $\preceq$ so that a semantic label $SEM = \{c_1, \ldots, c_n\}$ is subordinate to each class $c_i$: $$\forall c \in SEM, \quad SEM \preceq c$$ When searching for classes which give the maximum association score (Section 5), this definition makes it possible to calculate the association score for all the senses in a semantic label and to find the senses which give the maximum association score.</Paragraph> <Paragraph position="1"> BGH has a six-layered abstraction hierarchy: more than 60,000 Japanese words are assigned at the leaves, and its nominal part contains about 45,000 words, with the classes abstract-relations, agents-of-human-activities, human-activities, products, and natural-objects-and-natural-phenomena directly below the root node. Roget's Thesaurus has a seven-layered abstraction hierarchy, and over 100,000 words are allocated at the leaves; at the next level from the root node, it has six classes: abstract-relations, space, matter, intellect, volition, and affections. In Roget's Thesaurus, sense classification takes precedence over part-of-speech distinction. Thus, a noun and a verb which have similar senses are assigned similar classes in the thesaurus.</Paragraph> </Section> <Section position="5" start_page="968" end_page="970" type="metho"> <SectionTitle> 4 Class-based Association Score </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="968" end_page="969" type="sub_section"> <SectionTitle> 4.1 Word/Class Association Score </SectionTitle> <Paragraph position="0"> The measure of word/class association of Resnik (1992) can be illustrated by the problem of finding the prototypical object classes for verbs. Let $\mathcal{V}$ and $\mathcal{N}$ be the sets of all verbs and nouns, respectively. Given a verb $v \in \mathcal{V}$ and a noun class $c \subseteq \mathcal{N}$, the joint probability of $v$ and $c$ is estimated as</Paragraph> <Paragraph position="2"> The association score $A(v,c)$ of a verb $v$ and a noun class $c$ is defined as $$A(v,c) = \Pr(c \mid v)\, I(v;c) = \Pr(c \mid v) \log \frac{\Pr(v,c)}{\Pr(v)\Pr(c)}$$</Paragraph> <Paragraph position="3"> The association score takes the mutual information between the verb and a noun class and scales it according to the likelihood that a member of the class will actually appear as the object of the verb.</Paragraph> <Paragraph position="4"> The first term, the conditional probability, measures the generality of the association, while the second term, the mutual information, measures the co-occurrence of the association.</Paragraph> </Section> <Section position="2" start_page="969" end_page="969" type="sub_section"> <SectionTitle> 4.2 Bilingual Class/Class Association Score </SectionTitle> <Paragraph position="0"> We now apply the word/class association score to the task of measuring the association between classes of English predicates and classes of Japanese case element nouns in the collection of bilingual surface case structures. First, we assume that for any polysemous Japanese verb $v_J$, there exists a case marker $p$ which is most effective for sense classification of $v_J$. 
Given the collection of bilingual surface case structures for $v_J$, we introduce the bilingual class/class association score for measuring the association of a class $c_E$ of English predicates and a class $c_J$ of Japanese case element nouns for a case marker $p$.</Paragraph> <Paragraph position="1"> Let $Eg(v_J,p)$ be the set of bilingual surface case structures collected from the Japanese-English parallel corpora, each element of which has a Japanese verb $v_J$ and a Japanese case marker $p$. Among the elements of $Eg(v_J,p)$, let $Eg(v_J,p,c_E)$ be the set of those whose semantic label $SEM_E$ of the English predicate satisfies the class $c_E$, i.e., $SEM_E \preceq c_E$, and let $Eg(v_J,p/c_J)$ be the set of those whose semantic label $SEM_J$ of the Japanese case element noun for the case marker $p$ satisfies the class $c_J$, i.e., $SEM_J \preceq c_J$. Let $Eg(v_J,c_E,p/c_J)$ be the intersection of $Eg(v_J,p,c_E)$ and $Eg(v_J,p/c_J)$. Then, the conditional probabilities $\Pr(c_E \mid v_J,p)$, $\Pr(c_J \mid v_J,p)$, and $\Pr(c_E,c_J \mid v_J,p)$ are defined as the ratios of the numbers of elements of those sets: $$\Pr(c_E \mid v_J,p) = \frac{|Eg(v_J,p,c_E)|}{|Eg(v_J,p)|}, \quad \Pr(c_J \mid v_J,p) = \frac{|Eg(v_J,p/c_J)|}{|Eg(v_J,p)|}, \quad \Pr(c_E,c_J \mid v_J,p) = \frac{|Eg(v_J,c_E,p/c_J)|}{|Eg(v_J,p)|}$$</Paragraph> <Paragraph position="3"> Then, given $v_J$ and $p$, the association score $A(c_E,c_J \mid v_J,p)$ of $c_E$ and $c_J$ is defined as</Paragraph> <Paragraph position="5"> This definition differs slightly from that of the word/class association score in that it only needs the set $Eg(v_J,p)$ for a Japanese verb $v_J$ and a Japanese case marker $p$, not the whole Japanese-English parallel corpora. This is because our task is to discover strong associations between an English class and a Japanese class within $Eg(v_J,p)$, rather than within the whole Japanese-English parallel corpora. 
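The class/class score can be made concrete with a small Python sketch. It is a hypothetical illustration, not the authors' implementation: the two thesauri are toy trees stored as child-to-parent maps, each element of $Eg(v_J,p)$ is reduced to its pair of semantic labels ($SEM_E$, $SEM_J$), and, because the defining equation is missing here, the score is assumed to have the same form as the word/class score, with the joint probability $\Pr(c_E,c_J \mid v_J,p)$ as the first term (the footnote fragment in Section 4.3 refers to using this joint probability) multiplied by the mutual information of $c_E$ and $c_J$. All class names are invented.

```python
import math

# Hypothetical toy thesauri (child class -> parent class; None marks the root).
E_PARENT = {"E:root": None, "E:motion": "E:root", "E:suspend": "E:motion", "E:cost": "E:root"}
J_PARENT = {"J:root": None, "J:artifact": "J:root", "J:clothing": "J:artifact", "J:money": "J:root"}


def ancestors_or_self(c, parent):
    """Return c followed by every class above it, up to the root."""
    chain = []
    while c is not None:
        chain.append(c)
        c = parent[c]
    return chain


def satisfies(sem, c, parent):
    """SEM is subordinate to c if at least one sense in the label is c or lies below c."""
    return any(c in ancestors_or_self(s, parent) for s in sem)


def class_class_score(examples, c_e, c_j):
    """Sketch of A(c_E, c_J | v_J, p) over Eg(v_J, p), given as (SEM_E, SEM_J) pairs.

    Assumed form: Pr(c_E, c_J) * log(Pr(c_E, c_J) / (Pr(c_E) * Pr(c_J))),
    every probability conditioned on v_J and p and estimated as a set-size ratio.
    """
    n = len(examples)
    if n == 0:
        return 0.0
    in_e = [satisfies(sem_e, c_e, E_PARENT) for sem_e, _ in examples]  # membership in Eg(v_J, p, c_E)
    in_j = [satisfies(sem_j, c_j, J_PARENT) for _, sem_j in examples]  # membership in Eg(v_J, p/c_J)
    n_e, n_j = sum(in_e), sum(in_j)
    n_ej = sum(a and b for a, b in zip(in_e, in_j))  # size of the intersection
    if n_ej == 0:
        return 0.0  # no co-occurrence: treat the score as zero
    p_e, p_j, p_ej = n_e / n, n_j / n, n_ej / n
    return p_ej * math.log(p_ej / (p_e * p_j))


# Hypothetical Eg(v_J, p): the last noun is ambiguous, so its label holds two classes.
EXAMPLES = [
    ({"E:suspend"}, {"J:clothing"}),
    ({"E:suspend"}, {"J:clothing"}),
    ({"E:cost"}, {"J:money"}),
    ({"E:cost"}, {"J:money", "J:clothing"}),
]

print(round(class_class_score(EXAMPLES, "E:cost", "J:money"), 4))        # 0.3466
print(round(class_class_score(EXAMPLES, "E:suspend", "J:clothing"), 4))  # 0.1438
print(class_class_score(EXAMPLES, "E:root", "J:root"))                    # 0.0
```

In this toy data the ambiguous last example also falls under J:clothing, which raises $\Pr(c_J \mid v_J,p)$ for that class and lowers the mutual information of the (E:suspend, J:clothing) pair; the root pair scores zero because classes covering every example carry no information. The natural logarithm is an arbitrary choice: another base rescales every score by the same factor and does not change which pair ranks highest.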
In addition, as the first term for measuring the generality of the association, we use</Paragraph> <Paragraph position="7"/> </Section> <Section position="3" start_page="969" end_page="970" type="sub_section"> <SectionTitle> 4.3 Bilingual Class/Frame Association Score </SectionTitle> <Paragraph position="0"> In the previous section, we assumed that for any polysemous Japanese verb $v_J$, there exists a case marker $p$ which is most effective for sense classification of the verbal polysemy $v_J$. However, it can happen that a combination of more than one case marker characterizes a sense of the verbal polysemy $v_J$. Even if there exists exactly one case marker which is most effective for sense classification, that case marker must be selected automatically by some measure. For example, such a measure should automatically discover that, for sense classification of verbal polysemy, subject nouns are usually most effective for intransitive verbs, while object nouns are usually most effective for transitive verbs.</Paragraph> <Paragraph position="1"> This section generalizes the previous definition of the bilingual class/class association score and introduces the bilingual class/frame association score. In the new definition, we consider every possible set of pairs of a Japanese case marker $p$ and a Japanese noun class $c_J$, instead of predetermining the most effective case marker. The bilingual class/frame association score measures the association of an English class $c_E$ and a set of pairs of a Japanese case marker $p$ and a Japanese noun class $c_J$ marked by $p$. By searching for a large association score, it becomes possible to find any combination of case markers which characterizes a sense of the verbal polysemy $v_J$.</Paragraph> <Paragraph position="2"> First, we introduce a data structure which represents a set of pairs of a Japanese case marker $p$ and a Japanese noun class $c_J$ marked by $p$, and call it a Japanese case-class frame. 
A Japanese case-class frame can be represented as a feature structure:</Paragraph> <Paragraph position="4"> large in lower parts of the thesaurus, since we focus on examples which have a Japanese verb $v_J$ and a Japanese case marker $p$. When we used the average of $\Pr(c_J \mid v_J,p,c_E)$ and $\Pr(c_E \mid v_J,p/c_J)$ instead of $\Pr(c_E,c_J \mid v_J,p)$ in the experiment of Section 6, most discovered clusters consisted of only one example.</Paragraph> <Paragraph position="5"> Next, we introduce the subsumption relation $\preceq_f$ of a bilingual surface case structure $e$ and a Japanese case-class frame $f_J$: $e \preceq_f f_J$ iff for each case marker $p$ in $f_J$ and its noun class $c_J$, there exists the same case marker $p$ in $e$ and its semantic label $SEM_J$ is subordinate to $c_J$, i.e., $SEM_J \preceq c_J$. This definition can easily be extended into a subsumption relation of Japanese case-class frames. Let $Eg(v_J)$ be the set of bilingual surface case structures collected from the Japanese-English parallel corpora, each element of which has a Japanese verb $v_J$. Among the elements $e$ of $Eg(v_J)$, let $Eg(v_J,c_E)$ be the set of those whose semantic label $SEM_E$ of the English predicate satisfies the class $c_E$, i.e., $SEM_E \preceq c_E$, and let $Eg(v_J,f_J)$ be the set of those which satisfy the Japanese case-class frame $f_J$, i.e., $e \preceq_f f_J$. Let $Eg(v_J,c_E,f_J)$ be the intersection of $Eg(v_J,c_E)$ and $Eg(v_J,f_J)$. 
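The subsumption relation and the three example sets can likewise be sketched in Python. Everything here is a hypothetical assumption for illustration: a bilingual surface case structure is a dict with the English semantic label under "pred" and a map from Japanese case markers to semantic labels under "cases", a Japanese case-class frame is a dict from case markers to Japanese classes, and the toy thesauri are child-to-parent maps with invented class names. Examples are identified by their position in $Eg(v_J)$, so $Eg(v_J,c_E,f_J)$ is an ordinary set intersection.

```python
# Hypothetical toy thesauri (child class -> parent class; None marks the root).
E_PARENT = {"E:root": None, "E:suspend": "E:root", "E:float": "E:root"}
J_PARENT = {"J:root": None, "J:human": "J:root", "J:artifact": "J:root", "J:clothing": "J:artifact"}


def satisfies(sem, c, parent):
    """SEM is subordinate to c if some sense in SEM is c itself or lies below it."""
    for s in sem:
        while s is not None:
            if s == c:
                return True
            s = parent[s]
    return False


def subsumed(e, frame):
    """e is subsumed by frame iff every (marker p, class c_J) in frame has the same
    marker p in e, with a Japanese semantic label subordinate to c_J."""
    return all(p in e["cases"] and satisfies(e["cases"][p], c_j, J_PARENT)
               for p, c_j in frame.items())


def example_sets(examples, c_e, frame):
    """Return index sets for Eg(v_J, c_E), Eg(v_J, f_J) and their intersection Eg(v_J, c_E, f_J)."""
    eg_ce = {i for i, e in enumerate(examples) if satisfies(e["pred"], c_e, E_PARENT)}
    eg_f = {i for i, e in enumerate(examples) if subsumed(e, frame)}
    return eg_ce, eg_f, eg_ce.intersection(eg_f)


# Hypothetical Eg(v_J) using two case markers, "ga" (subject) and "wo" (object).
EXAMPLES = [
    {"pred": {"E:suspend"}, "cases": {"ga": {"J:human"}, "wo": {"J:clothing"}}},
    {"pred": {"E:suspend"}, "cases": {"wo": {"J:clothing"}}},
    {"pred": {"E:float"}, "cases": {"ga": {"J:artifact"}}},
]

print(example_sets(EXAMPLES, "E:suspend", {"wo": "J:artifact"}))                 # ({0, 1}, {0, 1}, {0, 1})
print(example_sets(EXAMPLES, "E:suspend", {"ga": "J:human", "wo": "J:artifact"}))  # ({0, 1}, {0}, {0})
```

Dividing each set size by $|Eg(v_J)|$ gives the conditional probabilities defined in the paragraph that follows. Under this reading, an empty frame subsumes every example, so its probability $\Pr(f_J \mid v_J)$ is 1.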
Then, the conditional probabilities $\Pr(c_E \mid v_J)$, $\Pr(f_J \mid v_J)$, and $\Pr(c_E,f_J \mid v_J)$ are defined as the ratios of the numbers of elements of those sets: $$\Pr(c_E \mid v_J) = \frac{|Eg(v_J,c_E)|}{|Eg(v_J)|}, \quad \Pr(f_J \mid v_J) = \frac{|Eg(v_J,f_J)|}{|Eg(v_J)|}, \quad \Pr(c_E,f_J \mid v_J) = \frac{|Eg(v_J,c_E,f_J)|}{|Eg(v_J)|}$$</Paragraph> <Paragraph position="7"> Then, given $v_J$, the association score $A(c_E,f_J \mid v_J)$ of $c_E$ and $f_J$ is defined as</Paragraph> <Paragraph position="9"> As in the case of the bilingual class/class association score, this definition only needs the set $Eg(v_J)$ for a Japanese verb $v_J$, not the whole Japanese-English parallel corpora.</Paragraph> </Section> </Section> <Section position="6" start_page="970" end_page="970" type="metho"> <SectionTitle> 5 Sense Classification of Verbal Polysemy </SectionTitle> <Paragraph position="0"> This section explains how to classify the elements of the set $Eg(v_J)$ of bilingual surface case structures according to the senses of the verbal polysemy $v_J$, using the bilingual class/frame association score defined in the previous section. In this classification process, we search for pairs of an English class $c_E$ and a Japanese case-class frame $f_J$ which give a large association score $A(c_E,f_J \mid v_J)$. Ideally, the discovered pairs of $c_E$ and $f_J$ divide the set $Eg(v_J)$ into disjoint subsets. The classification process proceeds according to the following steps: 1. First, the index $i$ and the set of examples $Eg$ are initialized as $i \leftarrow 1$ and $Eg \leftarrow Eg(v_J)$.</Paragraph> <Paragraph position="1"> 2. For the $i$-th iteration, let $c_E$ and $f_J$ be a pair of an English class and a Japanese case-class frame which satisfy the following constraint for all the pairs of $c_{Ej}$ and $f_{Jj}$ ($1 \le j \le i-1$): $c_E$ is neither subordinate nor superordinate to $c_{Ej}$ (i.e., $c_E \not\preceq c_{Ej}$ and $c_{Ej} \not\preceq c_E$), or $f_J$ is neither subordinate nor superordinate to $f_{Jj}$ (i.e., $f_J \not\preceq_f f_{Jj}$ and $f_{Jj} \not\preceq_f f_J$). 
Then, among those pairs of $c_E$ and $f_J$, search for the pair $c_{Ei}$ and $f_{Ji}$ which gives the maximum association score $\max_{c_E,f_J} A(c_E,f_J \mid v_J)$, and collect the elements of $Eg$ which satisfy the restrictions of $c_{Ei}$ and $f_{Ji}$ into the set $Eg(v_J,c_{Ei},f_{Ji})$.</Paragraph> <Paragraph position="2"> 3. Subtract the set $Eg(v_J,c_{Ei},f_{Ji})$ from $Eg$ as</Paragraph> <Paragraph position="4"> increment the index $i$ as $i \leftarrow i+1$ and go to step 2. Otherwise, set the number $k$ of subsets as $k \leftarrow i$ and terminate the classification process.</Paragraph> <Paragraph position="5"/> <Paragraph position="6"> As the result of this classification process, the set $Eg(v_J)$ is divided into disjoint subsets $Eg(v_J,c_{E1},f_{J1}), \ldots, Eg(v_J,c_{Ek},f_{Jk})$. For example, if a polysemous Japanese verb $v_J$ has both intransitive and transitive senses, pairs with the subject case such as $\langle c_{E1}, [subj: c_{J1}] \rangle, \ldots, \langle c_{Ek'}, [subj: c_{Jk'}] \rangle$ will be discovered for the intransitive senses, while pairs with the object case such as $\langle c_{E(k'+1)}, [obj: c_{J(k'+1)}] \rangle, \ldots, \langle c_{Ek}, [obj: c_{Jk}] \rangle$ will be discovered for the transitive senses.</Paragraph> <Paragraph position="7"> Given the set $Eg(v_J)$, the number of iterations of the association score calculation is $O(|Eg(v_J)|)$. Since the classification process can be regarded as sorting the calculated association scores, its computational complexity can be $O(|Eg(v_J)| \log |Eg(v_J)|)$ if efficient sorting algorithms such as quicksort are employed.</Paragraph> </Section> </Paper>