<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1174"> <Title>Automatic Construction of Nominal Case Frames and its Application to Indirect Anaphora Resolution</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Semantic Feature Dictionary </SectionTitle> <Paragraph position="0"> First of all, we briefly introduce the NTT Semantic Feature Dictionary used in this paper.</Paragraph> <Paragraph position="1"> The NTT Semantic Feature Dictionary consists of a semantic feature tree, whose 3,000 nodes are semantic features, and a nominal dictionary containing about 300,000 nouns, each of which is given one or more appropriate semantic features. The main purpose of using this dictionary is to calculate the similarity between two words.</Paragraph> <Paragraph position="2"> Suppose the words x and y have semantic features sx and sy, respectively, that the depths of sx and sy in the semantic tree are dx and dy, and that the depth of their lowest (most specific) common node is dc. The similarity between x and y, sim(x,y), is then calculated as follows: sim(x,y) = (dc × 2)/(dx + dy).</Paragraph> <Paragraph position="3"> If sx and sy are the same, the similarity is 1.0, the maximum score under this criterion.</Paragraph> <Paragraph position="4"> We also use this dictionary to specify the semantic category of a word, such as human, time and place.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Semantic Analysis of Japanese Noun Phrases Nm no Nh </SectionTitle> <Paragraph position="0"> In many cases, the obligatory cases of nouns are described in ordinary dictionaries written for human readers.
For example, a Japanese dictionary for children, Reikai Shougaku Kokugojiten, or RSK (Tajika, 1997), defines the words coach and virus as follows (see footnote 1):
coach: a person who teaches technique in some sport
virus: a living thing even smaller than bacteria which causes infectious disease like influenza
Footnote 1: Although our method handles Japanese noun phrases by using Japanese definition sentences, in this paper we use their English translations for the explanation. In some sense, the essential point of our method is language-independent.</Paragraph> <Paragraph position="1"> Based on this observation, Kurohashi and Sakai (1999) proposed a method for the semantic analysis of &quot;Nm no Nh&quot; that consists of two modules: dictionary-based analysis (abbreviated to DBA hereafter) and semantic feature-based analysis (abbreviated to SBA hereafter). This section briefly introduces their method.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Dictionary-based analysis </SectionTitle> <Paragraph position="0"> Obligatory case information of nouns in an ordinary dictionary can be used to solve the difficult problem in the semantic analysis of &quot;Nm no Nh&quot; phrases. In other words, we can say the problem disappears.</Paragraph> <Paragraph position="1"> For example, &quot;rugby no coach&quot; can be interpreted using the definition of coach as follows: the dictionary indicates that the noun coach has an obligatory case sport, and the phrase &quot;rugby no coach&quot; specifies that the sport is rugby. That is, the interpretation of the phrase can be regarded as matching rugby in the phrase to some sport in the coach definition.
&quot;Kaze 'cold' no virus&quot; is also easily interpreted based on the definition of virus, linking kaze 'cold' to infectious disease.</Paragraph> <Paragraph position="2"> Dictionary-based analysis (DBA) tries to find a correspondence between Nm and an obligatory case of Nh by using RSK and the NTT Semantic Feature Dictionary, through the following process: 1. Look up Nh in RSK and obtain the definition sentences of Nh.</Paragraph> <Paragraph position="3"> 2. For each word w in the definition sentences other than the genus words, do the following steps: 2.1. When w is a noun which shows an obligatory case explicitly, like kotogara 'thing', monogoto 'matter', or nanika 'something', and Nm does not have a semantic feature of human or time, give 0.8 to their correspondence (see footnote 2).</Paragraph> <Paragraph position="4"> 2.2. When w is any other noun, calculate the similarity between Nm and w by using the NTT Semantic Feature Dictionary, and give the similarity score to their correspondence.</Paragraph> <Paragraph position="5"> 3. Finally, if the best correspondence score is 0.75 or more, DBA outputs the best correspondence, which can be an obligatory case of the input; if not, DBA outputs nothing.</Paragraph> <Paragraph position="6"> Footnote 2: For the present, parameters in the algorithm were given empirically, not optimized by a learning method.
Table 1: Examples of rules for semantic feature-based analysis.
1. Nm:human, Nh:relative → &lt;obligatory case(relative)&gt; e.g. kare 'he' no oba 'aunt'
2. Nm:human, Nh:human → &lt;modification(apposition)&gt; e.g. gakusei 'student' no kare 'he'
3. Nm:organization, Nh:human → &lt;belonging&gt; e.g. gakkou 'school' no seito 'student'
4. Nm:agent, Nh:event → &lt;agent&gt; e.g. watashi 'I' no chousa 'study'
5. Nm:material, Nh:concrete → &lt;modification(material)&gt; e.g. ki 'wood' no hako 'box'
6. Nm:time, Nh:* → &lt;time&gt; e.g. aki 'autumn' no hatake 'field'
7. Nm:color, quantity, or figure, Nh:* → &lt;modification&gt; e.g. gray no seihuku 'uniform'
8. Nm:*, Nh:quantity → &lt;obligatory case(attribute)&gt; e.g. hei 'wall' no takasa 'height'
9. Nm:*, Nh:position → &lt;obligatory case(position)&gt; e.g. tsukue 'desk' no migi 'right'
10. Nm:agent, Nh:* → &lt;possession&gt; e.g. watashi 'I' no kuruma 'car'
11. Nm:place or position, Nh:* → &lt;place&gt; e.g. Kyoto no mise 'store'
'*' matches any noun.</Paragraph> <Paragraph position="7"> In the case of the phrase &quot;rugby no coach&quot;, &quot;technique&quot; and &quot;sport&quot; in the definition sentences are checked: the similarity between &quot;technique&quot; and &quot;rugby&quot; is calculated to be 0.21, and the similarity between &quot;sport&quot; and &quot;rugby&quot; is calculated to be 1.0. Therefore, DBA outputs &quot;sport&quot;.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Semantic feature-based analysis </SectionTitle> <Paragraph position="0"> Since diverse relations in &quot;Nm no Nh&quot; are handled by DBA, the remaining relations can be detected by simple rules checking the semantic features of Nm and/or Nh.</Paragraph> <Paragraph position="1"> Table 1 shows examples of the rules. For example, rule 1 means that if Nm has the semantic feature human and Nh has the semantic feature relative, the &lt;obligatory case&gt; relation is assigned to the phrase. Rules 1, 2, 8 and 9 are for certain obligatory cases. We use these rules because these relations can be analyzed more accurately by using explicit semantic features, rather than based on a dictionary.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Integration of two analyses </SectionTitle> <Paragraph position="0"> Usually, either DBA or SBA outputs some relation. When both DBA and SBA output some relation, the results are integrated (basically, if the DBA correspondence score is higher than 0.8, the DBA result is selected; if not, the SBA result is selected).
In rare cases, neither analysis outputs any relation, which means analysis failure.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Automatic Construction of Nominal Case Frames </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Collection and analysis of Nm no Nh </SectionTitle> <Paragraph position="0"> Syntactically unambiguous noun phrases &quot;Nm no Nh&quot; are collected from the automatic parse results of large corpora, and they are analyzed using the method described in the previous section.
Table 2: Preliminary case frames for hisashi 'eaves/visor'.</Paragraph> <Paragraph position="1"> DBA result
1. a roof that stick out above the window of a house.</Paragraph> <Paragraph position="2"> [house] hall:2, balcony:1, building:1, ...
[window] window:2, ceiling:1, counter:1, ...
2. the fore piece of a cap.</Paragraph> <Paragraph position="3"> [cap] cap:8, helmet:1, ...
SBA result
&lt;place&gt; parking:3, store:3, shop:2, ...
&lt;mod.&gt; concrete:1, metal:1, silver:1, ...
No semantic analysis result
&lt;other&gt; part:1, light:1, phone:1, ...
Simply by collecting the analysis results for each head word Nh, we can obtain its preliminary case frames. Table 2 shows preliminary case frames for hisashi 'eaves/visor'. The upper part of the table shows the results by DBA. The line starting with &quot;[house]&quot; denotes a group of analysis results corresponding to the word &quot;house&quot; in the first definition sentence. For example, &quot;hall no hisashi&quot; occurs twice in the corpora, and both occurrences were analyzed by DBA as corresponding to &quot;house.&quot; The middle part of the table shows the results by SBA.
Noun phrases that have no semantic analysis result (analysis failure) are bundled and named &lt;other&gt;, as shown in the last part of the table.</Paragraph> <Paragraph position="4"> A case frame should be constructed for each meaning (definition) of Nh, and groups starting with &quot;[...]&quot; or &quot;&lt;...&gt;&quot; in Table 2 are possible case slots. The problem is how to arrange the analysis results of DBA and SBA and how to distinguish obligatory cases from the others. The following sections explain how to handle these problems.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Case slot clustering </SectionTitle> <Paragraph position="0"> One obligatory case might be split across several case slots in preliminary case frames, since the definition sentence is sometimes too specific or too detailed.</Paragraph> <Paragraph position="1"> For example, in the case of hisashi 'eaves/visor' in Table 2, [house], [window], and &lt;place&gt; have very similar examples that denote a building or part of a building. Therefore, case slots are merged if the similarity of two case slots is more than 0.5 (case slots in different definition sentences are never merged).
The similarity of two case slots is the average of the top 25% of the similarities of all possible pairs of examples.</Paragraph> <Paragraph position="2"> In the case of Table 2, the similarity between [house] and [window] is 0.80, and that between [house] and &lt;place&gt; is 0.67, so these three case slots are merged into one case slot.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Obligatory case selection </SectionTitle> <Paragraph position="0"> Preliminary case frames contain both obligatory cases and optional cases for the head word.</Paragraph> <Paragraph position="1"> Since we can expect that an obligatory case frequently co-occurs with the head word in the form of a noun phrase, we can take frequent case slots as obligatory cases of the head word.</Paragraph> <Paragraph position="2"> However, we have to be careful in setting the frequency thresholds, because case slots detected by DBA or &lt;obligatory case&gt; by SBA are more likely to be obligatory; on the other hand, case slots of &lt;modification&gt; or &lt;time&gt; should always be optional. Considering these tendencies, we set thresholds for obligatory cases as shown in Table 3.</Paragraph> <Paragraph position="3"> In the case of hisashi 'eaves/visor' in Table 2, the [house-window]-&lt;place&gt; slot and the [cap] slot are chosen as the obligatory cases.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Case frame construction for each meaning </SectionTitle> <Paragraph position="0"> Case slots that are derived from each definition sentence constitute a case frame. If a case slot of &lt;obligatory case&gt; by SBA or &lt;other&gt; is not merged into case slots in definition sentences, it can be considered to indicate a meaning of Nh which is not covered in the dictionary.
Therefore, such a case slot constitutes an independent case frame.</Paragraph> <Paragraph position="1"> On the other hand, when other case slots by SBA such as &lt;belonging&gt; and &lt;possessive&gt; remain, we have to treat them differently. They remain because they are not always described in the definition sentences, but their frequent occurrence indicates that they are obligatory cases. Therefore, we add these case slots to the case frames derived from definition sentences.</Paragraph> <Paragraph position="2"> Table 4 shows several examples of resultant case frames. Hyoujou 'expression' has a case frame containing two case slots. Hisashi 'eaves/visor' has two case frames according to the two definition sentences. In the case of hikidashi 'drawer', the first case frame corresponds to the definition given in the dictionary, and the second case frame was constructed from the &lt;other&gt; case slot, which is actually another sense of hikidashi, missed in the dictionary. In the case of coach, &lt;possessive&gt; is added to the case frame which was made from the definition, producing a reasonable case frame for the word.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.5 Point of nominal case frame construction </SectionTitle> <Paragraph position="0"> The point of our method is the integrated use of a dictionary and example phrases from large corpora. Although dictionary definition sentences are an informative resource for indicating the obligatory cases of nouns, it is difficult to do indirect anaphora resolution by using a dictionary as it is, because not all nouns in a definition sentence are obligatory cases, and only the frequency information of noun phrases tells us which is the obligatory case.
Furthermore, a definition is sometimes too specific or detailed, and the example phrases can adjust it properly, as in the example of hisashi in Table 2.</Paragraph> <Paragraph position="1"> On the other hand, a simple method that just collects and clusters &quot;Nm no Nh&quot; phrases (based on some similarity measure of nouns) cannot construct comprehensive nominal case frames, because of polysemy and multiple obligatory cases. We can see that dictionary definitions can guide the clustering properly even in such difficult cases.</Paragraph> <Paragraph position="2"> Table 4: Examples of resultant case frames.
hisashi:1 'eaves/visor' (the edges of a roof that stick out above the window of a house etc.)
  [house, window] parking, store, hall, ...
hisashi:2 'eaves/visor' (the fore piece of a cap.)
  [cap] cap, helmet, ...
hyoujou 'expression' (to express one's feelings on the face or by gestures.)
  [one] people, person, citizen, ...
  [feelings] relief, margin, ...
hikidashi:1 'drawer' (a boxlike container in a desk or a chest.)
  [desk, chest] desk, chest, dresser, ...
hikidashi:2 'drawer'
  &lt;other&gt; credit, fund, saving, ...
coach (a person who teaches technique in some sport.)
  [sport] baseball, swimming, ...
  &lt;belonging&gt; team, club, ...
kabushiki 'stock' (the total value of a company's shares.)
  [company] company, corporation, ...</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Indirect Anaphora Resolution </SectionTitle> <Paragraph position="0"> To examine the practical usefulness of the constructed nominal case frames, we built a preliminary system of indirect anaphora resolution based on the case frames.</Paragraph> <Paragraph position="1"> An input sentence is parsed using the Japanese parser, KNP (Kurohashi and Nagao, 1994). Then, from the beginning of the sentence, each noun x is analyzed.
When x has more than one case frame, the process of antecedent estimation (described below) is performed for each case frame, and the case frame with the highest similarity score (described below), together with its assignments of antecedents, is selected as the final result. For each case slot of the target case frame of x, its antecedent is estimated. Possible antecedents y in the target sentence and the previous two sentences are checked one by one, starting from the syntactically closest y. If the similarity of y to the case slot is equal to or greater than a threshold α (currently 0.95), y is assigned to the case slot.</Paragraph> <Paragraph position="2"> The similarity between y and a case slot is defined as the highest similarity between y and an example in the case slot.</Paragraph> <Paragraph position="3"> For instance, let us consider the sentence shown in Figure 1. The first noun, soccer, has no case frame, and is considered to have no obligatory case.</Paragraph> <Paragraph position="4"> For the second noun ticket, soccer, which is a nominal modifier of ticket, is examined first.</Paragraph> <Paragraph position="5"> The similarity between soccer and the examples of the case slot [theater, transport] exceeds the threshold α, and soccer is judged as the antecedent of ticket.
[Figure 1: example sentence with the case frames and dictionary definitions of its nouns. Case frames: ticket [theater, transport] stage, game, ... (antecedent: soccer); nedan [things] thing, ticket, ... (antecedent: ticket). Definitions: ticket: a printed piece of paper which shows that you have paid to enter a theater or use a transport; nedan: the amount of money for which things are sold or [truncated in source]]
Lastly, for nedan 'price', the possible antecedents are ticket and soccer. Ticket, which is the closest to nedan, is checked first. The similarity between ticket and the examples of the case slot [things] exceeds the threshold α, and ticket is judged as the antecedent of nedan.</Paragraph> </Section> </Paper>
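<!--
Editorial sketch, not part of the original paper. The antecedent estimation
described in Section 5 can be illustrated with a few lines of Python. This is
a minimal sketch under stated assumptions, not the authors' implementation:
the function names, the TOY_SIM table and its score are hypothetical, the
word similarity would in practice come from the NTT Semantic Feature
Dictionary, and the selection among several case frames of one noun is
left out.

```python
# Sketch of the antecedent estimation loop; not the authors' implementation.

ALPHA = 0.95  # threshold value reported in the text


def slot_similarity(candidate, examples, word_sim):
    """Highest similarity between the candidate and any example in the slot."""
    return max((word_sim(candidate, ex) for ex in examples), default=0.0)


def assign_antecedents(case_frame, candidates, word_sim, alpha=ALPHA):
    """Assign to each case slot the first candidate that meets the threshold.

    case_frame: dict mapping a slot name to its list of example nouns.
    candidates: possible antecedents, ordered from syntactically closest.
    """
    assignment = {}
    for slot, examples in case_frame.items():
        for y in candidates:
            if slot_similarity(y, examples, word_sim) >= alpha:
                assignment[slot] = y
                break  # stop at the closest candidate that qualifies
    return assignment


# Hypothetical similarity table standing in for the thesaurus-based measure.
# The score for (soccer, game) is an assumption; the text only says that the
# similarity exceeds the threshold.
TOY_SIM = {frozenset(("soccer", "game")): 1.0}


def toy_word_sim(a, b):
    """Toy word similarity: 1.0 for identical words, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


nedan_frame = {"things": ["thing", "ticket"]}
print(assign_antecedents(nedan_frame, ["ticket", "soccer"], toy_word_sim))
```

With the toy table above, the call prints {'things': 'ticket'}, which mirrors
the nedan 'price' example: ticket is the closest candidate and matches an
example of the [things] slot, so soccer is never examined for that slot.
-->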