<?xml version="1.0" standalone="yes"?> <Paper uid="P91-1026"> <Title>AUTOMATIC NOUN CLASSIFICATION BY USING JAPANESE-ENGLISH WORD PAIRS*</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> AUTOMATIC NOUN CLASSIFICATION BY USING JAPANESE-ENGLISH WORD PAIRS* Naomi Inoue KDD R & D Laboratories </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> This paper describes a method of classifying semantically similar nouns. The approach is based on the &quot;distributional hypothesis&quot;. Our approach is characterized by distinguishing among senses of the same word in order to resolve the &quot;polysemy&quot; issue. The classification result demonstrates that our approach is successful.</Paragraph> </Section> <Section position="3" start_page="0" end_page="202" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Sets of semantically similar words are very useful in natural language processing.</Paragraph> <Paragraph position="1"> The general approach to classifying words is to use semantic categories, as in a thesaurus, where an &quot;is-a&quot; relation connects words to categories.</Paragraph> <Paragraph position="2"> However, acquiring the &quot;is-a&quot; connections by hand is not easy, and it is expensive.</Paragraph> <Paragraph position="3"> Approaches toward automatically classifying words by using existing dictionaries were therefore attempted [Chodorow][Tsurumaru][Nakamura]. These approaches are partially successful. 
However, there is a fatal problem in these approaches: existing dictionaries, particularly Japanese dictionaries, are not assembled on the basis of a semantic hierarchy.</Paragraph> <Paragraph position="4"> On the other hand, approaches toward automatically classifying words by using a large-scale corpus have also been attempted [Shirai][Hindle]. They seem to be based on the idea that semantically similar words appear in similar environments. This idea is derived from Harris's &quot;distributional hypothesis&quot; [Harris] in linguistics. Focusing on nouns, the idea claims that each noun is characterized by the verbs with which it occurs, and also that nouns are similar to the extent that they share verbs. These automatic classification approaches are also partially successful. However, Hindle says that there are a number of issues to be confronted. The most important issue is that of &quot;polysemy&quot;. In Hindle's experiment, two senses of &quot;table&quot;, that is to say &quot;table under which one can hide&quot; and &quot;table which can be computed or memorized&quot;, are conflated in the set of words similar to &quot;table&quot;. His result shows that senses of the word must be distinguished before classification.</Paragraph> <Paragraph position="5"> (1) I sit on the table.</Paragraph> <Paragraph position="6"> (2) I sit on the chair.</Paragraph> <Paragraph position="7"> (3) I fill in the table.</Paragraph> <Paragraph position="8"> (4) I fill in the list.</Paragraph> <Paragraph position="9"> For example, the above sentences may appear in the corpus. In sentences (1) and (2), &quot;table&quot; and &quot;chair&quot; share the same verb &quot;sit on&quot;. In sentences (3) and (4), &quot;table&quot; and &quot;list&quot; share the same verb &quot;fill in&quot;. However, &quot;table&quot; is used in two different senses. 
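The conflation can be sketched concretely. In this illustrative Python fragment (the co-occurrence sets are hypothetical, built only from sentences (1)-(4), not from any corpus), verbs are not sense-disambiguated, so &quot;table&quot; links &quot;chair&quot; and &quot;list&quot; into one group even though those two share nothing with each other:

```python
# Hypothetical co-occurrence sets built from sentences (1)-(4);
# the verbs are surface forms, not disambiguated senses.
cooc = {
    "table": {"sit on", "fill in"},
    "chair": {"sit on"},
    "list":  {"fill in"},
}

def shared(a, b):
    """Number of surface verbs two nouns share."""
    return len(cooc[a] & cooc[b])

# "table" shares a verb with both "chair" and "list", so all three
# nouns are pulled toward one cluster, although "chair" and "list"
# themselves share no verb at all.
assert shared("table", "chair") == 1
assert shared("table", "list") == 1
assert shared("chair", "list") == 0
```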
Unless they are distinguished before classification, &quot;table&quot;, &quot;chair&quot; and &quot;list&quot; may be put into the same category because &quot;chair&quot; and &quot;list&quot; share the same verbs which are associated with &quot;table&quot;. It is thus necessary to distinguish the senses of &quot;table&quot; before automatic classification. Moreover, when the corpus is not sufficiently large, this must be performed for verbs as well as nouns. In the following Japanese sentences, the Japanese verb &quot;~ &quot; is used in different senses.</Paragraph> <Paragraph position="10"> [Figure 1: an example of deep semantic relations and the Japanese-English word correspondence, for the sentence &quot;Please fill out the reply form and submit the summary&quot;.] One sense is &quot;to request information from someone&quot;. The other is &quot;to give attention in hearing&quot;. The Japanese words &quot;~ (name)&quot; and &quot;~ (music)&quot; share the same verb &quot;~ &quot;. Using the small corpus, &quot;~ (name)&quot; and &quot;~ (music)&quot; may be classified into the same category because they share the same verb, though not the same sense.</Paragraph> <Paragraph position="12"> This paper describes an approach to automatically classifying Japanese nouns.</Paragraph> <Paragraph position="13"> Our approach is characterized by distinguishing among senses of the same word by using Japanese-English word pairs extracted from a bilingual database. We suppose here that some senses of Japanese words are distinguished when Japanese sentences are translated into another language. For example, the following Japanese sentences (7) and (8) are translated into English sentences (9) and (10), respectively.</Paragraph> <Paragraph position="15"> (9) He sends a letter.</Paragraph> <Paragraph position="16"> (10) He publishes a book.</Paragraph> <Paragraph position="17"> The Japanese word &quot;~ &quot; has at least two senses. 
One is &quot;to cause to go or be taken to a place&quot; and the other is &quot;to have printed and put on sale&quot;. In the above example, the Japanese word &quot;~ &quot; corresponds to &quot;send&quot; from sentences (7) and (9). The Japanese word &quot;~ &quot; also corresponds to &quot;publish&quot; from sentences (8) and (10). That is to say, the Japanese word &quot;~ &quot; is translated into different English words according to the sense. This example shows that it may be possible to distinguish among senses of the same word by using words from another language. We used Japanese-English word pairs, for example &quot;~ -send&quot; and &quot;~ -publish&quot;, as senses of Japanese words.</Paragraph> <Paragraph position="18"> In this paper, these word pairs are acquired from ATR's large-scale database.</Paragraph> </Section> <Section position="4" start_page="202" end_page="202" type="metho"> <SectionTitle> 2. CONTENT OF THE DATABASE </SectionTitle> <Paragraph position="0"> ATR has constructed a large-scale database which is collected from simulated telephone and keyboard conversations [Ehara]. The sentences collected in Japanese are manually translated into English. We thus obtain a bilingual database. The database is called the ATR Dialogue Database (ADD).</Paragraph> <Paragraph position="1"> ATR aims to build ADD up to one million words covering two tasks. One task is dialogues between secretaries and participants of international conferences. The other is dialogues between travel agents and customers. 
Collected Japanese and English sentences are morphologically analyzed.</Paragraph> <Paragraph position="2"> Japanese sentences are also dependency analyzed and given deep semantic relations.</Paragraph> <Paragraph position="3"> We use 63 deep semantic cases [Inoue].</Paragraph> <Paragraph position="4"> Correspondences between Japanese and English are made at several linguistic levels, for example words, sentences and so on.</Paragraph> <Paragraph position="5"> Figure 1 shows an example of deep semantic relations and correspondences of Japanese and English words. The sentence is already morphologically analyzed. The solid lines show deep semantic relations. The Japanese nouns &quot;~ (reply form)&quot; and &quot;~ (summary)&quot; modify the Japanese verbs &quot;~ (fill out)&quot; and &quot;~ (submit)&quot;, respectively. The semantic relations are &quot;space at&quot; and &quot;object&quot;, which are almost equal to &quot;locative&quot; and &quot;objective&quot; of Fillmore's deep cases [Fillmore]. The dotted lines show the word correspondences between Japanese and English. The Japanese words correspond to the English words &quot;reply form&quot;, &quot;fill out&quot;, &quot;summary&quot; and &quot;submit&quot;, respectively. Here, the two verbs appear as conjugations of their dictionary forms. However, it is possible to extract semantic relations and word correspondences in dictionary form, because ADD includes the dictionary forms.</Paragraph> </Section> <Section position="5" start_page="202" end_page="206" type="metho"> <SectionTitle> 3. 
CLASSIFICATION OF NOUNS </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="202" end_page="202" type="sub_section"> <SectionTitle> 3.1 Using Data </SectionTitle> <Paragraph position="0"> We automatically extracted from ADD not only the deep semantic relations between Japanese nouns and verbs but also the English words which correspond to the Japanese words. We used the telephone dialogues between secretaries and participants because the scale of analyzed words was largest.</Paragraph> <Paragraph position="1"> Table 1 shows the current number of analyzed words.</Paragraph> <Paragraph position="2"> Figure 2 shows an example of the data extracted from ADD. Each field is delimited by the delimiter &quot;|&quot;. The first field is the dialogue identification number in which the semantic relation appears. The second and the third fields are the Japanese noun and its corresponding English word. The next two fields are the Japanese verb and its corresponding English word. The last field is the semantic relation between the noun and the verb. Moreover, we automatically acquired word pairs from the data shown in Figure 2. Different senses of nouns appear far less frequently than those of verbs because the database is restricted to a specific task. In this experiment, only word pairs of verbs are used. Figure 3 shows deep semantic relations between nouns and word pairs of verbs. The last field is the raw frequency of co-occurrence. We used the data shown in Figure 3 for noun classification.</Paragraph> <Paragraph position="4"> [Figure 2: an example of data extracted from ADD, e.g. &quot;180 | ~ | newspaper | ~ | see | space at&quot;.] The experiment is done for a sample of 138 nouns which are included in the 500 most frequent words. The 500 most frequent words cover 90% of the words accumulated in the telephone dialogues. 
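The record layout just described can be read with a few lines of code. This sketch is ours, not part of the paper's software; the field names and the romanized Japanese placeholders (&quot;shinbun&quot;, &quot;miru&quot;) are assumptions standing in for the unrecoverable Japanese fields:

```python
# One record in the Figure 2 layout: six "|"-delimited fields
# (dialogue id, Japanese noun, English noun, Japanese verb,
# English verb, deep semantic relation).
record = "180|shinbun|newspaper|miru|see|space at"

def parse_record(line):
    fields = line.split("|")
    return {
        "dialogue_id": fields[0],
        "noun_ja": fields[1],
        "noun_en": fields[2],
        "verb_ja": fields[3],
        "verb_en": fields[4],
        "relation": fields[5],
    }

rec = parse_record(record)
# A verb sense is then the Japanese-English word pair, e.g. "miru-see".
verb_sense = rec["verb_ja"] + "-" + rec["verb_en"]
```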
Those nouns each appear more than 9 times in ADD.</Paragraph> </Section> <Section position="2" start_page="202" end_page="203" type="sub_section"> <SectionTitle> 3.2 Semantic Distance of Nouns </SectionTitle> <Paragraph position="0"> Our classification approach is based on the &quot;distributional hypothesis&quot;. Under this semantic theory, nouns are similar to the extent that they share verb senses. The aim of this paper is to show the efficiency of using the word pair as the word sense. We therefore used the following expression (1), which was already defined by Shirai [Shirai], as the distance between two words.</Paragraph> <Paragraph position="2"> The second term of the expression shows the semantic similarity between two nouns, because it is the ratio of the verb senses with which both nouns (a and b) occur to all the verb senses with which each noun (a or b) occurs. The distance is normalized from 0.0 to 1.0. If one noun (a) shares all verb senses with the other noun (b) and the frequencies are also the same, the distance is 0.0. If one noun (a) shares no verb senses with the other noun (b), the distance is 1.0.</Paragraph> </Section> <Section position="3" start_page="203" end_page="203" type="sub_section"> <SectionTitle> 3.3 Classification Method </SectionTitle> <Paragraph position="0"> For the classification, we adopted cluster analysis, which is one of the approaches in multivariate analysis. Cluster analysis is generally used in various fields, for example biology, psychology, etc. Some hierarchical clustering methods, for example the nearest neighbor method, the centroid method, etc., have been studied. It has been proved that the centroid method can avoid the chain effect. The chain effect is an undesirable phenomenon in which the nearest unit is not always classified into a cluster and more distant units are chained into a cluster. The centroid method is a method in which the cluster is characterized by the centroid of the categorized units. 
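Expression (1) itself did not survive in this copy of the paper. The constraints stated for it (normalized to [0.0, 1.0]; 0.0 for identical verb-sense distributions with identical frequencies; 1.0 for disjoint ones) are consistent with a Dice-style form such as the following sketch. The exact shape of Shirai's expression may differ, so treat this as an assumption, not the paper's formula:

```python
def distance(fa, fb):
    """Sketch of a distance consistent with the description of
    expression (1): fa and fb map verb senses (Japanese-English
    word pairs) to co-occurrence frequencies. Returns 0.0 for
    identical distributions and 1.0 for disjoint ones."""
    shared = sum(min(fa[s], fb[s]) for s in fa if s in fb)
    total = sum(fa.values()) + sum(fb.values())
    return 1.0 - 2.0 * shared / total

# Toy frequency tables; the sense labels are hypothetical.
fa = {"kiku-ask": 3, "dasu-send": 2}
assert distance(fa, dict(fa)) == 0.0              # identical -> 0.0
assert distance(fa, {"dasu-publish": 4}) == 1.0   # disjoint -> 1.0
```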
In the following section, the result obtained by the centroid method is shown.</Paragraph> <Paragraph position="2"/> </Section> <Section position="4" start_page="203" end_page="204" type="sub_section"> <SectionTitle> 4.1 Clustering Result </SectionTitle> <Paragraph position="0"> All 138 nouns are hierarchically classified. However, only some subsets of the whole hierarchy are shown, as space is limited. In Figure 4, we can see that semantically similar nouns, which may be defined as &quot;things made from paper&quot;, are grouped together. The X-axis is the semantic distance defined before. Figure 5 shows another subset. All nouns in Figure 5, &quot;~ (decision)&quot;, &quot;~ (presentation)&quot;, &quot;~ (speech)&quot; and &quot;~ (talk)&quot;, have an active concept like verbs. The subsets of nouns shown in Figures 4 and 5 are fairly coherent. However, not all subsets of nouns are coherent. In Figure 6, &quot;~ (slide)&quot;, &quot;~ (draft)&quot;, &quot;~ (conference site)&quot;, &quot;~ (8th)&quot; and &quot;~ (station)&quot; are grouped together. The semantic distances are 0.67, 0.6, 0.7 and 0.8.</Paragraph> <Paragraph position="1"> The distance ordering is upset when &quot;~ (conference site)&quot; is attached to the cluster containing &quot;~ (slide)&quot; and &quot;~ (draft)&quot;. This is one characteristic of the centroid method.</Paragraph> <Paragraph position="2"> However, this seems to result in a semantically less similar cluster. The word pairs of the verbs, the deep semantic relations and the frequencies are shown in Table 2.</Paragraph> <Paragraph position="3"> After &quot;~ (slide)&quot; and &quot;~ (draft)&quot; are grouped into a cluster, the cluster and &quot;~ (conference site)&quot; share two word pairs, &quot;~ -use&quot; and &quot;~ -be&quot;. 
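The centroid method used above can be sketched as follows. This is a generic illustration with toy two-dimensional frequency vectors and Euclidean distance between centroids, not the paper's implementation:

```python
# Minimal sketch of centroid-method agglomerative clustering:
# a cluster is represented by the mean (centroid) of its members'
# verb-sense frequency vectors, and at each step the two clusters
# with the closest centroids are merged. Toy vectors, not ADD data.
def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

def cluster_step(clusters):
    """clusters: list of (member-name list, member-vector list).
    Merge the closest pair once and return the new list."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = dist(centroid(clusters[i][1]), centroid(clusters[j][1]))
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merged]

clusters = [(["slide"], [[3, 0]]), (["draft"], [[2, 1]]), (["station"], [[0, 5]])]
clusters = cluster_step(clusters)  # "slide" and "draft" merge first
```

Because merging moves the centroid, a later merge can occur at a smaller distance than an earlier one, which is the inversion ("upset" distance) observed in Figure 6.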
&quot;~ -be&quot; contributes more to attaching &quot;~ (conference site)&quot; to the cluster than &quot;~ -use&quot; does, because its frequency of co-occurrence is greater. In this sample, &quot;~ -be&quot; occurs with more nouns than &quot;~ -use&quot;. This shows that &quot;~ -be&quot; is less important in characterizing nouns, though its raw frequency of co-occurrence is greater. It is therefore necessary to develop a means of not relying on the raw frequency of co-occurrence, in order to make the clustering result more accurate. This is left to further study.</Paragraph> </Section> <Section position="5" start_page="204" end_page="206" type="sub_section"> <SectionTitle> 4.2 Estimation of the Result </SectionTitle> <Paragraph position="0"> All nouns are hierarchically classified, but some semantically separated clusters are acquired if a threshold is used.</Paragraph> <Paragraph position="1"> It is possible to compare the clusters derived from this experiment with the semantic categories which are used in our automatic interpreting telephony system. We used expression (2), which was defined by Goodman and Kruskal [Goodman], in order to objectively compare them.</Paragraph> <Paragraph position="3"> A : a set of clusters which are automatically obtained.</Paragraph> <Paragraph position="4"> B : a set of clusters which are used in our interpreting telephony system.</Paragraph> <Paragraph position="5"> p : the number of clusters of the set A q : the number of clusters of the set B nij : the number of nouns which are included in both the ith cluster of A and the jth cluster of B n.j : the number of nouns which are included in the jth cluster of B n : the number of nouns which are included in A or B They proposed that one set of clusters, called 'A', can be estimated to the extent that 'A' associates with the other set of clusters, called 'B'. In Figure 7, two results are shown. 
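Expression (2) also did not survive in this copy. With the symbols defined above, the standard form of Goodman and Kruskal's lambda would read as follows; this is offered as a plausible reading of expression (2), not a verbatim reconstruction:

```latex
% Goodman--Kruskal lambda, written with the symbols defined above.
% Assumption: the paper's expression (2) is the standard lambda.
\lambda = \frac{\sum_{i=1}^{p} \max_{j} n_{ij} \;-\; \max_{j} n_{\cdot j}}
               {n \;-\; \max_{j} n_{\cdot j}}
```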
One (solid line) is the result of using the word pair to distinguish among senses of the same verb. The other (dotted line) is the result of using the verb form itself. The X-axis is the number of classified nouns and the Y-axis is the value derived from the above expression. Figure 7 shows that it is better to use word pairs of verbs than not to use them when fewer than about 30 nouns are classified. However, both are almost the same when more than about 30 nouns are classified. The result proves that the distinction of senses of verbs is successful when only a few nouns are classified.</Paragraph> </Section> </Section> </Paper>