<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1062"> <Title>Semantic Analysis of Japanese Noun Phrases : A New Approach to Dictionary-Based Understanding</Title> <Section position="4" start_page="481" end_page="482" type="metho"> <SectionTitle> 3 Interpretation of N1 no N2 using a Dictionary </SectionTitle> <Paragraph position="0"> Semantic-role information of nouns in an ordinary dictionary can be utilized to solve the difficult problem in the semantic analysis of N1 [Footnote 1: Although our method handles Japanese noun phrases by using Japanese definition sentences, in this paper we use their English translations for the explanation. In some sense, the essential point of our method is language-independent.]</Paragraph> <Paragraph position="1"> no N2 phrases. In other words, we can say the problem disappears.</Paragraph> <Paragraph position="2"> For example, rugby no coach can be interpreted by the definition of coach as follows: the dictionary describes that the noun coach has a semantic role of sport, and the phrase rugby no coach specifies that the sport is rugby. That is, the interpretation of the phrase can be regarded as matching rugby in the phrase to some sport in the coach definition. Furthermore, based on this interpretation, we can paraphrase rugby no coach into a person who teaches technique in rugby, by replacing some sport in the definition with rugby.</Paragraph> <Paragraph position="3"> Kaze 'cold' no virus is also easily interpreted based on the definition of virus, linking kaze 'cold' to infectious disease.</Paragraph> <Paragraph position="4"> Such a dictionary-based method can handle the interpretation of most phrases where conventional classification-based analysis failed. As a result, we can arrange the diversity of N1 no N2 senses simply as in Table 1.</Paragraph> <Paragraph position="5"> The semantic-role relation is a relation in which N1 fills a semantic role of N2. 
When N2 is an action noun, an object-action relation is also regarded as a semantic-role relation.</Paragraph> <Paragraph position="6"> On the other hand, in the agent, possession and belonging relations, N1 and N2 have a weaker relationship. In theory, any action can be done by anyone (my study, his reading, etc.); anything can be possessed by anyone (my pen, his feeling, etc.); and anyone can belong to any organization (I belong to a university, he belongs to a community, etc.).</Paragraph> <Paragraph position="7"> The difference between the semantic-role relation and the agent, possession, belonging relations can correspond to the difference between the agent and the object of verbs. In general, the object has a stronger relationship with a verb than the agent, which leads to several asymmetrical linguistic phenomena.</Paragraph> <Paragraph position="8"> The time and place relations have a much clearer correspondence to optional cases for verbs. A modification relation is also parallel to modifiers for verbs. If a phrase has a modification relation, it can be paraphrased into N2 is N1, as gray no seihuku 'uniform' is paraphrased into seihuku 'uniform' is gray.</Paragraph> <Paragraph position="9"> The last relation, the complement relation, is the most difficult to interpret. The relation between N1 and N2 does not come from N1's semantic roles, yet it is not so weak as in the other relations. For example, kimono no jyosei 'lady' means a lady wearing a kimono, and nobel-sho 'Nobel prize' no kisetsu 'season' means a season when the Nobel prizes are awarded. 
Since automatic interpretation of the complement relation is much more difficult than that of the other relations, it is beyond the scope of this paper.</Paragraph> </Section> <Section position="5" start_page="482" end_page="485" type="metho"> <SectionTitle> 4 Analysis Method </SectionTitle> <Paragraph position="0"> Once we can arrange the diversity of N1 no N2 senses as in Table 1, their analysis becomes very simple, consisting of the following two modules: 1. Dictionary-based analysis (abbreviated to DBA hereafter) for semantic-role relations. 2. Semantic feature-based analysis (abbreviated to SBA hereafter) for some semantic-role relations and all other relations.</Paragraph> <Paragraph position="1"> After briefly introducing the resources employed, we explain the algorithms of the two analyses.</Paragraph> <Section position="1" start_page="482" end_page="483" type="sub_section"> <SectionTitle> 4.1 Resources </SectionTitle> <Paragraph position="0"> RSK (Reikai Shougaku Kokugojiten), a Japanese dictionary for children, is used to find semantic roles of nouns in DBA. The reason why we use a dictionary for children is that, generally speaking, definition sentences of such a dictionary are described by basic words, which helps the system find links between N1 and a semantic role of a head word.</Paragraph> <Paragraph position="1"> All definition sentences in RSK were analyzed by JUMAN, a Japanese morphological analyzer, and KNP, a Japanese syntactic and case analyzer (Kurohashi and Nagao, 1994; Kurohashi and Nagao, 1998). 
Then, a genus word for a head word, like a person for coach, was detected in the definition sentences by simple rules: in a Japanese definition sentence, the last word is a genus word in almost all cases; if there is a noun coordination at the end, all of those nouns are regarded as genus words.</Paragraph> <Paragraph position="2"> NTT Semantic Feature Dictionary, constructed by NTT CS Lab, consists of a semantic feature tree, whose 3,000 nodes are semantic features, and a nominal dictionary containing about 300,000 nouns, each of which is given one or more appropriate semantic features. Figure 1 shows the upper levels of the semantic feature tree.</Paragraph> <Paragraph position="3"> SBA uses the dictionary to specify conditions of rules. DBA also uses the dictionary to calculate the similarity between two words. Suppose the words X and Y have the semantic features SX and SY, respectively, their depths in the semantic tree are dX and dY, and the depth of their lowest (most specific) common node is dc; then the similarity between X and Y, sim(X, Y), is calculated as follows: sim(X, Y) = (dc x 2)/(dX + dY).</Paragraph> <Paragraph position="4"> If SX and SY are the same, the similarity is 1.0, the maximum score based on this criterion.</Paragraph> <Paragraph position="5"> NTT CS Lab also constructed a case frame dictionary for 6,000 verbs, using the semantic features described above. For example, a case frame of the verb kakou-suru (process) is as follows:</Paragraph> <Paragraph position="6"/> <Paragraph position="7"> where ga and wo are Japanese nominative and accusative case markers. The frame describes that the verb kakou-suru takes two cases: nouns with the AGENT semantic feature can fill the ga-case slot, and nouns with the CONCRETE semantic feature can fill the wo-case slot. 
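The similarity measure described above can be sketched as follows. This is a minimal illustration: the toy feature tree below is an assumption for the example, not the actual 3,000-node NTT Semantic Feature Dictionary, and the root is taken to have depth 1.

```python
# Sketch of the similarity measure sim(X, Y) = (dc * 2) / (dX + dY),
# where dX, dY are the depths of the two semantic features and dc is
# the depth of their lowest common node. The tiny tree here is
# illustrative only (child -> parent links; "NOUN" is the root).
PARENT = {
    "ABSTRACT": "NOUN",
    "METHOD": "ABSTRACT",
    "SPORT": "ABSTRACT",
}

def path_to_root(feature):
    """Return the list of nodes from the root down to `feature`."""
    path = [feature]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))

def sim(fx, fy):
    """Similarity of two semantic features: (dc * 2) / (dX + dY)."""
    px, py = path_to_root(fx), path_to_root(fy)
    # depth of the lowest common node = length of the shared prefix
    dc = sum(1 for a, b in zip(px, py) if a == b)
    return (dc * 2) / (len(px) + len(py))

print(sim("SPORT", "SPORT"))   # identical features give the maximum, 1.0
print(sim("SPORT", "METHOD"))  # features sharing only ABSTRACT score lower
```

With the real dictionary's much deeper tree, distantly related features such as SPORT and METHOD receive small scores like the 0.21 mentioned below, while identical features always score 1.0.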
KNP utilizes the case frame dictionary for the case analysis.</Paragraph> </Section> <Section position="2" start_page="483" end_page="484" type="sub_section"> <SectionTitle> 4.2 Algorithm </SectionTitle> <Paragraph position="0"> Given an input phrase N1 no N2, both DBA and SBA are applied to the input, and then the two analyses are integrated.</Paragraph> <Paragraph position="1"> Dictionary-based Analysis (DBA) tries to find a correspondence between N1 and a semantic role of N2 by utilizing RSK, through the following process: 1. Look up N2 in RSK and obtain the definition sentences of N2.</Paragraph> <Paragraph position="2"> 2. For each word w in the definition sentences other than the genus words, do the following steps: 2.1. When w is a noun which shows a semantic role explicitly, like kotogara 'thing', monogoto 'matter', nanika 'something', and N1 does not have a semantic feature of HUMAN or TIME, give 0.9 to their correspondence. [Footnote 2: For the present, parameters in the algorithm were given empirically, not optimized by a learning method.] 2.2. When w is another noun, calculate the similarity between N1 and w by using NTT Semantic Feature Dictionary (as described in Section 4.1.2), and give the similarity score to their correspondence. 2.3. When w is a verb which has a vacant case slot, and the semantic constraint for the slot meets the semantic feature of N1, give 0.5 to their correspondence.</Paragraph> <Paragraph position="4"> 3. If we could not find a correspondence with a score of 0.6 or more in step 2, look up the genus word in RSK, obtain its definition sentences, and repeat step 2. (The looking up of a genus word is done only once.) 
4. Finally, if the best correspondence score is 0.5 or more, DBA outputs the best correspondence, which can be a semantic-role relation of the input; if not, DBA outputs nothing.</Paragraph> <Paragraph position="5"> For example, the input rugby no coach is analyzed as follows (figures in parentheses indicate the similarity scores; the highest score gives the best correspondence): (1) rugby no coach; coach: a person who teaches technique (0.21) in some sport (1.0). Rugby, technique and sport have the semantic features SPORT, METHOD and SPORT, respectively, in NTT Semantic Feature Dictionary. The lowest common node between SPORT and METHOD is ABSTRACT, and based on these semantic features, the similarity between rugby and technique is calculated as 0.21. On the other hand, the similarity between rugby and sport is calculated as 1.0, since they have the same semantic feature. The case analysis finds that all case slots of teach are filled in the definition sentence. As a result, DBA outputs the correspondence between rugby and sport as a possible semantic-role relation of the input.</Paragraph> <Paragraph position="6"> On the other hand, bunshou 'writings' no tatsujin 'expert' is an example in which N1 corresponds to a vacant case slot of the predicate outstanding: (2) bunshou 'writings' no tatsujin 'expert'; expert: a person being outstanding (at the vacant slot, 0.50). Puroresu 'pro wrestling' no chukei 'relay' is an example in which looking up the genus word broadcast leads to the correct analysis: (3) puroresu 'pro wrestling' no chukei 'relay'; relay: a relay broadcast; broadcast: a radio (0.0) or television (0.0) presentation of news (0.48), entertainment (0.87), music (0.80) and others. Since diverse relations in N1 no N2 are handled by DBA, the remaining relations can be detected by simple rules checking the semantic features of N1 and/or N2.</Paragraph> <Paragraph position="7"> The following rules are applied one by one to the input phrase. 
Once the input phrase meets a condition, SBA outputs the relation in the rule, and the subsequent rules are not applied any more.</Paragraph> <Paragraph position="8"> 1. N1:HUMAN, N2:RELATIVE → semantic-role (relative), e.g. kare 'he' no oba 'aunt'. 2. N1:HUMAN, N2:PERSONAL_RELATION → semantic-role (personal relation), e.g. kare 'he' no tomodachi 'friend'. 3. N1:HUMAN, N2:HUMAN → modification (apposition), e.g. gakusei 'student' no kare 'he'. 4. N1:ORGANIZATION, N2:HUMAN → belonging, e.g. gakkou 'school' no sensei 'teacher'. 5. N1:AGENT, N2:EVENT → agent, e.g. senmonka 'expert' no chousa 'study'. 6. N1:MATERIAL, N2:CONCRETE → modification (material), e.g. ki 'wood' no hako 'box'. 7. N1:TIME, N2:* [3] → time, e.g. aki 'autumn' no hatake 'field'. 8. N1:COLOR, QUANTITY, or FIGURE, N2:* → modification, e.g. gray no seihuku 'uniform'. 9. N1:*, N2:QUANTITY → semantic-role (attribute), e.g. hei 'wall' no takasa 'height'. 10. N1:*, N2:POSITION → semantic-role (position), e.g. tsukue 'desk' no migi 'right'. 11. N1:AGENT, N2:* → possession, e.g. watashi 'I' no kuruma 'car'. 12. N1:PLACE or POSITION, N2:* → place, e.g. Kyoto no mise 'store'. Rules 1, 2, 9 and 10 are for certain semantic-role relations. We use these rules because these relations can be analyzed more accurately by using explicit semantic features, rather than based on a dictionary.</Paragraph> </Section> <Section position="3" start_page="484" end_page="485" type="sub_section"> <SectionTitle> 4.2.3 Integration of Two Analyses </SectionTitle> <Paragraph position="0"> Usually, either DBA or SBA outputs some relation. In rare cases, neither analysis outputs any relation, which means analysis failure. 
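The SBA rules above can be sketched as an ordered, first-match rule table. This is a minimal illustration: the feature names are plain strings standing in for NTT semantic feature categories, and the feature sets of the input nouns are assumed to be given (in the real system they come from the nominal dictionary).

```python
# Sketch of SBA as an ordered rule list: the first rule whose
# conditions are met determines the relation; later rules are skipped.
ANY = frozenset()  # an empty condition plays the role of "*" (any noun)

RULES = [
    (frozenset({"HUMAN"}), frozenset({"RELATIVE"}), "semantic-role (relative)"),
    (frozenset({"HUMAN"}), frozenset({"PERSONAL_RELATION"}), "semantic-role (personal relation)"),
    (frozenset({"HUMAN"}), frozenset({"HUMAN"}), "modification (apposition)"),
    (frozenset({"ORGANIZATION"}), frozenset({"HUMAN"}), "belonging"),
    (frozenset({"AGENT"}), frozenset({"EVENT"}), "agent"),
    (frozenset({"MATERIAL"}), frozenset({"CONCRETE"}), "modification (material)"),
    (frozenset({"TIME"}), ANY, "time"),
    (frozenset({"COLOR", "QUANTITY", "FIGURE"}), ANY, "modification"),
    (ANY, frozenset({"QUANTITY"}), "semantic-role (attribute)"),
    (ANY, frozenset({"POSITION"}), "semantic-role (position)"),
    (frozenset({"AGENT"}), ANY, "possession"),
    (frozenset({"PLACE", "POSITION"}), ANY, "place"),
]

def matches(cond, features):
    """An empty condition matches anything; otherwise any shared feature."""
    return not cond or bool(cond & features)

def sba(n1_features, n2_features):
    """Return the relation of the first matching rule, or None."""
    for c1, c2, relation in RULES:
        if matches(c1, n1_features) and matches(c2, n2_features):
            return relation  # subsequent rules are not applied
    return None

print(sba({"ORGANIZATION"}, {"HUMAN"}))  # gakkou 'school' no sensei: belonging
print(sba({"AGENT"}, {"CONCRETE"}))      # watashi 'I' no kuruma: possession
```

Note how the ordering carries the analysis: watashi no kuruma falls through the more specific rules and only reaches the possession rule (rule 11), exactly as the first-match discipline in the text requires.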
When both DBA and SBA output some relations, the results are integrated as follows (basically, if the output of one analysis is more reliable, the output of the other analysis is discarded): If a semantic-role relation is detected by SBA, discard the output from DBA.</Paragraph> <Paragraph position="1"> Else if a correspondence with a score of 0.95 or more is detected by DBA, discard the output from SBA.</Paragraph> <Paragraph position="2"> Else if some relation is detected by SBA, discard the output from DBA if the correspondence score is 0.8 or less.</Paragraph> <Paragraph position="3"> In the case of the following example, rojin 'old person' no shozo 'portrait', both analyses were accepted by the above criteria.</Paragraph> <Paragraph position="4"> [3] * matches any noun.</Paragraph> <Paragraph position="5"> (4) rojin 'old person' no shozo 'portrait'. DBA: portrait: a painting (0.17) or photograph (0.17) of a face (0.18) or figure (0.0) of a real person (0.84). SBA: N1:AGENT, N2:* → possession. DBA interpreted the phrase as a portrait on which an old person was painted; SBA detected the possession relation, which means an old person possesses a portrait. One of these interpretations would be preferred depending on context, but this is a perfect analysis expected for N1 no N2 analysis.</Paragraph> </Section> </Section> <Section position="6" start_page="485" end_page="486" type="metho"> <SectionTitle> 5 Experiment and Discussion </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="485" end_page="485" type="sub_section"> <SectionTitle> 5.1 Experimental Evaluation </SectionTitle> <Paragraph position="0"> We have collected 300 test N1 no N2 phrases from EDR dictionary (Japan Electronic Dictionary Research Institute Ltd., 1995), IPA dictionary (Information-Technology Promotion Agency, Japan, 1996), and the literature on N1 no N2 phrases, taking care that they had enough diversity in their relations. 
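The integration criteria of Section 4.2.3 above can be sketched as follows. The thresholds 0.95 and 0.8 come from the text; the tuple-and-string representation of the two outputs is an assumption for the example.

```python
# Sketch of integrating DBA and SBA outputs. A DBA output is modeled
# as (correspondence, score); an SBA output is a relation string.
def integrate(dba, sba):
    """Return the list of accepted analyses (empty list = failure)."""
    if dba is None and sba is None:
        return []  # neither analysis produced a relation
    if dba is None:
        return [sba]
    if sba is None:
        return [dba]
    # A semantic-role relation from SBA overrides DBA.
    if sba.startswith("semantic-role"):
        return [sba]
    # A very confident DBA correspondence overrides SBA.
    if dba[1] >= 0.95:
        return [dba]
    # Otherwise keep SBA, and drop DBA unless its score exceeds 0.8.
    if dba[1] <= 0.8:
        return [sba]
    return [dba, sba]  # both accepted, as in rojin no shozo (0.84)

print(integrate(("old person -> real person", 0.84), "possession"))
```

With a DBA score of 0.84 and a non-semantic-role SBA relation, both outputs survive, reproducing the rojin no shozo case where the two interpretations coexist.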
Then, we analyzed the test phrases by our system, and checked the analysis results by hand.</Paragraph> <Paragraph position="1"> Table 2 shows the reasonably good results of both DBA and SBA. The precision of DBA, the ratio of correct analyses to detected analyses, was 77% (=137/(137+19+21)); the recall of DBA, the ratio of correct analyses to potential semantic-role relations, was 78% (=137/(137+19+19)). The result of SBA is also good, except for the modification relation.</Paragraph> <Paragraph position="2"> Some phrases were given two or more relations. On average, 1.1 relations were given to one phrase. The ratio that at least one correct relation was detected was 81% (=242/300); the ratio that all possibly correct relations were detected and no incorrect relation was detected was 73% (=219/300).</Paragraph> </Section> <Section position="2" start_page="485" end_page="486" type="sub_section"> <SectionTitle> 5.2 Discussion of Correct Analysis </SectionTitle> <Paragraph position="0"> The success ratio above was reasonably good, but we would like to emphasize the many interesting and promising examples in the analysis results.</Paragraph> <Paragraph position="1"> (5) mado 'window' no curtain 'curtain'; curtain: a hanging cloth that can be drawn to cover a window (1.0) in a room (0.83), to divide a room (0.83), etc.</Paragraph> <Paragraph position="2"> (6) osetsuma 'living room' no curtain 'curtain'; curtain: a hanging cloth that can be drawn to cover a window (0.82) in a room (1.0), to divide a room (1.0), etc.</Paragraph> <Paragraph position="3"> (7) oya 'parent' no isan 'legacy'; legacy: property left on the death of the owner (0.84). Mado 'window' no curtain must embarrass conventional classification-based methods; it might be place, whole-part, purpose, or some other relation like being close. However, DBA can clearly explain the relation. Osetsuma 'living room' no curtain is another interestingly analyzed phrase. 
DBA not only interprets it in a simple sense, but also provides us with the more interesting information that the curtain might be being used as a partition in the living room.</Paragraph> <Paragraph position="4"> The analysis result of oya 'parent' no isan 'legacy' is also interesting. Again, not only the correct analysis, but also additional information was given by DBA. That is, the analysis result tells us that the parent died. Such information would facilitate intelligent performance in a dialogue system, as in the following exchange: User : I bought a brand-new car with the legacy from my parent.</Paragraph> <Paragraph position="5"> System : Oh, when did your parent die? I didn't know that.</Paragraph> <Paragraph position="6"> By examining these analysis results, we can conclude that the dictionary-based understanding approach can provide us with much richer information than the conventional classification-based approaches.</Paragraph> </Section> <Section position="3" start_page="486" end_page="486" type="sub_section"> <SectionTitle> 5.3 Discussion of Incorrect Analysis </SectionTitle> <Paragraph position="0"> It is possible to classify some of the causes of incorrect analyses arising from our method.</Paragraph> <Paragraph position="1"> One problem is that a definition sentence does not always describe the semantic roles well, as in the following example: (8) shiire 'stocking' no saikaku 'resourcefulness'; resourcefulness: the ability to use one's head (0.18) cleverly. Saikaku 'resourcefulness' can be the ability for some task, but the definition says nothing about that. On the other hand, the definition of sainou 'talent' is clearer about the semantic role, as shown below. Consequently, shiire 'stocking' no sainou 'talent' can be interpreted correctly by DBA.</Paragraph> <Paragraph position="2"> (9) shiire 'stocking' no sainou 'talent'; talent: power and skill, esp. to do something (0.90)</Paragraph> <Paragraph position="3"> This represents an elementary problem of our method. 
Out of 175 phrases which should be interpreted as a semantic-role relation based on the dictionary, 13 were not analyzed correctly because of this type of problem.</Paragraph> <Paragraph position="4"> However, such a problem can be solved by revising the definition sentences, of course in natural language. This is a humanly reasonable task, very different from the conventional approach, where the classification should be reconsidered or the classification rules should be modified.</Paragraph> <Paragraph position="5"> Another problem is that sometimes the similarity calculated with NTT Semantic Feature Dictionary is not high enough to establish a correspondence, as in the following example: (10) ume 'ume flowers' no meisho 'famous place'; famous place: a place being famous for scenery (0.20), etc.</Paragraph> <Paragraph position="6"> In some cases the structure of NTT Semantic Feature Dictionary is questionable; in some cases a definition sentence is too rigid; in other cases an input phrase is a bit metaphorical.</Paragraph> <Paragraph position="7"> As for SBA, most relations can be detected well by simple rules. However, it is not possible to detect a modification relation accurately only by using NTT Semantic Feature Dictionary, because modifier and non-modifier nouns are often mixed in the same semantic feature category.</Paragraph> <Paragraph position="8"> Some other proper resource should be incorporated; one possibility is to use the dictionary definition of N1.</Paragraph> </Section> </Section> <Section position="7" start_page="486" end_page="486" type="metho"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> From the viewpoint of semantic roles of nouns, several related studies have been conducted: the mental space theory discusses the functional behavior of nouns (Fauconnier, 1985); the generative lexicon theory accounts for the problem of creative word senses based on the qualia structure of a word (Pustejovsky, 1995); Dahl et al. 
(1987) and Macleod et al.</Paragraph> <Paragraph position="1"> (1997) discussed the treatment of nominalizations. Compared with these studies, the point of this paper is that an ordinary dictionary can be a useful resource for the semantic roles of nouns.</Paragraph> <Paragraph position="2"> Our approach using an ordinary dictionary is similar to the approach used to create MindNet (Richardson et al., 1998). However, the semantic analysis of noun phrases is a much more specialized and suitable application of utilizing dictionary entries.</Paragraph> </Section> </Paper>