<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1043"> <Title>Japanese Case Frame Construction by Coupling the Verb and its Closest Case Component</Title> <Section position="3" start_page="1" end_page="1" type="metho"> <SectionTitle> 2. VARIOUS METHODS FOR CASE FRAME CONSTRUCTION </SectionTitle> <Paragraph position="0"> We employ the following procedure of case frame construction from a raw corpus (Figure 1): 1. A large raw corpus is parsed by KNP [5], and reliable modifier-head relations are extracted from the parse results. We call these modifier-head relations examples. saurus. We call the output of this process example case frames, which is the final result of the system. We call words which compose case components case examples, and a group of case examples a case example group. (Footnote: In English, several unsupervised methods have been proposed [7, 1]. Our setting differs from those in that combinations of nouns and verbs must be collected in Japanese.) [Figure 1 labels: raw corpus; tagging or analysis + extraction of reliable relations; I. examples; II. co-occurrences; III. merged frame; example patterns; example case frames; thesaurus; by hand or learning; IV. semantic case frames]</Paragraph> <Paragraph position="2"> In Figure 1, nimotsu 'baggage', busshi 'supply', and keiken 'experience' are case examples, and {nimotsu 'baggage', busshi 'supply'} (of the wo case marker in the first example case frame of tsumu 'load, accumulate') is a case example group. A case component therefore consists of a case example and a case marker (CM).</Paragraph> <Paragraph position="3"> Let us now discuss several methods of case frame construction as shown in Figure 1.</Paragraph> <Paragraph position="4"> First, examples (I of Figure 1) can be used individually, but this method cannot solve the sparse data problem. 
For example, (1) kuruma ni nimotsu wo tsumu car dat-CM baggage acc-CM load (load baggage onto the car) (2) truck ni busshi wo tsumu truck dat-CM supply acc-CM load (load supply onto the truck) even if these two examples occur in a corpus, it cannot be judged whether the expression &quot;kuruma ni busshi wo tsumu&quot; (load supply onto the car) is allowed or not. Secondly, examples can be decomposed into binomial relations (II of Figure 1). These co-occurrences are utilized by statistical parsers, and can address the sparse data problem. In this case, however, verb sense ambiguity becomes a serious problem. For example, (3) kuruma ni nimotsu wo tsumu car dat-CM baggage acc-CM load (load baggage onto the car) (4) keiken wo tsumu experience acc-CM accumulate (accumulate experience) from these two examples, three co-occurrences (&quot;kuruma ni tsumu&quot;, &quot;nimotsu wo tsumu&quot;, and &quot;keiken wo tsumu&quot;) are extracted. They, however, allow the incorrect expression &quot;kuruma ni keiken wo tsumu&quot; (load experience onto the car, accumulate experience onto the car).</Paragraph> <Paragraph position="5"> Thirdly, examples can be simply merged into one frame (III of Figure 1). However, the information quantity of this is equivalent to that of the co-occurrences (II of Figure 1), so verb sense ambiguity becomes a problem as well. We distinguish examples by the verb and its closest case component. Our method can address the two problems above: verb sense ambiguity and sparse data.</Paragraph> <Paragraph position="6"> On the other hand, semantic markers can be used as case components instead of case examples. These we call semantic case frames (IV of Figure 1). Constructing semantic case frames by hand leads to the problem mentioned in Section 1. Utsuro et al. constructed semantic case frames from a corpus [8]. 
Their work differs from our approach in three main ways: they use an annotated corpus, depend deeply on a thesaurus, and did not resolve verb sense ambiguity.</Paragraph> </Section> <Section position="4" start_page="1" end_page="3" type="metho"> <SectionTitle> 3. COLLECTING EXAMPLES </SectionTitle> <Paragraph position="0"> This section explains how to collect the examples shown in Figure 1. In order to improve the quality of collected examples, reliable modifier-head relations are extracted from the parsed corpus.</Paragraph> <Section position="1" start_page="1" end_page="2" type="sub_section"> <SectionTitle> 3.1 Conditions of case components </SectionTitle> <Paragraph position="0"> When examples are collected, case markers, case examples, and case components must satisfy the following conditions. Conditions of case markers: Case components which have the following case markers (CMs) are collected: ga (nominative), wo (accusative), ni (dative), to (with, that), de (optional), kara (from), yori (from), he (to), and made (to). We also handle compound case markers such as ni-tsuite 'in terms of', wo-megutte 'concerning', and others.</Paragraph> <Paragraph position="1"> In addition to these cases, we introduce a time case marker.</Paragraph> <Paragraph position="2"> Case components which belong to the class <time> (see below) and contain a ni, kara, or made CM are merged into the time CM. This is because what matters is whether a verb relates deeply to time or not, rather than the distinction between surface CMs.</Paragraph> <Paragraph position="3"> Generalization of case examples: Case examples which have definite meanings are generalized. We introduce the following three classes, and use these classes instead of words as case examples.</Paragraph> <Paragraph position="4"> <time> * nouns which mean time e.g. asa 'morning', haru 'spring', rainen 'next year' * case examples which contain a unit of time e.g. 
1999nen 'year', 12gatsu 'month', 9ji 'o'clock' * words which are followed by the suffix mae 'before', tyu 'during', or go 'after' and do not have the semantic marker <place> on the thesaurus e.g. kaku mae 'before *** write', kaigi go 'after the meeting' <quantity> * numerals e.g. ichi 'one', ni 'two', juu 'ten' * numerals followed by a numeral classifier such as tsu, ko, and nin.</Paragraph> <Paragraph position="5"> They are expressed with pairs of the class <quantity> and a numeral classifier: <quantity>tsu, <quantity>ko, and <quantity>nin. e.g. 1tsu - <quantity>tsu</Paragraph> <Paragraph position="7"> function as quotations (&quot;*** koto wo&quot; 'that ***').</Paragraph> <Paragraph position="8"> e.g. kaku to 'that *** write', kaita koto wo 'that *** wrote' Exclusion of ambiguous case components: We do not use the following case components: * Since case components which contain topic markers (TMs) and clausal modifiers do not have surface case markers, we do not use them. For example, sono giin wa *** wo teian-shita.</Paragraph> <Paragraph position="9"> the assemblyman TM acc-CM proposed wa is a topic marker and giin wa 'assemblyman TM' depends on teian-shita 'proposed', but there is no case marker for giin 'assemblyman' in relation to teian-shita 'proposed'.</Paragraph> <Paragraph position="10"> *** wo teian-shiteiru giin ga *** acc-CM proposing assemblyman &quot;*** wo teian-shiteiru&quot; is a clausal modifier and teian-shiteiru 'proposing' depends on giin 'assemblyman', but there is no case marker for giin 'assemblyman' in relation to teian-shiteiru 'proposing'.</Paragraph> <Paragraph position="11"> * Case components which contain a ni or de case marker are sometimes used adverbially. Since they have the optional relation to their verbs, we do not use them. e.g. 
tame ni 'because of', mujouken ni 'unconditionally', ue de 'in addition to' For example, 30nichi ni souri daijin ga 30th on prime minister nom-CM sono 2nin ni those two people dat-CM syou wo okutta award acc-CM gave (On the 30th, the prime minister gave awards to those two people.) (Footnote: Most nouns must take a numeral classifier when they are quantified in Japanese. An English equivalent is 'piece'.) From this sentence, the following example is acquired.</Paragraph> <Paragraph position="13"/> </Section> <Section position="2" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 3.2 Conditions of verbs </SectionTitle> <Paragraph position="0"> We collect examples not only for verbs, but also for adjectives and noun+copulas. However, when a verb is followed by a causative auxiliary or a passive auxiliary, we do not collect examples, since the case pattern is changed.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.3 Extraction of reliable examples </SectionTitle> <Paragraph position="0"> When examples are extracted from automatically parsed results, the problem is that the parsed results inevitably contain errors. To reduce the influence of such errors, we discard modifier-head relations whose parse accuracies are low and use only reliable relations.</Paragraph> <Paragraph position="1"> KNP employs the following heuristic rules to determine the head of a modifier: HR1 KNP narrows the scope of a head by finding a clear boundary of clauses in a sentence. 
When there is only one candidate verb in the scope, KNP determines this verb as the head of the modifier.</Paragraph> <Paragraph position="2"> HR2 Among the candidate verbs, verbs which rarely take case components are excluded.</Paragraph> <Paragraph position="3"> HR3 KNP determines the head according to the preference: a modifier which is not followed by a comma depends on the nearest candidate, and a modifier with a comma depends on the second nearest candidate.</Paragraph> <Paragraph position="4"> Our approach trusts HR1 but not HR2 and HR3. That is, modifier-head relations which are decided in HR1 (there is only one candidate of the head in the scope) are extracted as examples, but relations which HR2 and HR3 are applied to are not extracted. The following examples illustrate the application of these rules.</Paragraph> <Paragraph position="5"> (5) kare wa kai-tai hon wo he TM want to buy book acc-CM takusan mitsuketa node, a lot found because Tokyo he okutta.</Paragraph> <Paragraph position="6"> Tokyo to sent (Because he found a lot of books which he wants to buy, he sent them to Tokyo.) In this example, an example which can be extracted without ambiguity is &quot;Tokyo he okutta&quot; 'sent φ to Tokyo' at the end of the sentence. In addition, since node 'because' is analyzed as a clear boundary of clauses, the head candidate of hon wo 'book acc-CM' is only mitsuketa 'find', and this is also extracted.</Paragraph> <Paragraph position="7"> Verbs excluded from head candidates by HR2 possibly become heads, so we do not use the examples which HR2 is applied to. For example, when there is a strong verb right (Footnote: In this paper, we use 'verb' instead of 'verb/adjective or noun+copula' for simplicity.)</Paragraph> <Paragraph position="8"> In this example, the correct head of mawari ga 'spread' is hayaku 'rapidly'. 
However, since hayaku 'rapidly' is excluded from the head candidates, the head of mawari ga 'spread' is analyzed incorrectly.</Paragraph> <Paragraph position="9"> We show an example of the process HR3: (7) kare ga shitsumon ni he nom-CM question dat-CM sentou wo kitte kotaeta.</Paragraph> <Paragraph position="10"> lead acc-CM take answered (He took the lead to answer the question.) In this example, head candidates of shitsumon ni 'question dat-CM' are kitte 'take' and kotaeta 'answered'. According to the preference &quot;modify the nearer head&quot;, KNP incorrectly decides the head is kitte 'take'. As in this example, when there are many head candidates, the decided head is not reliable, so we do not use examples in this case. We extracted reliable examples from the Kyoto University Corpus [6], which is a syntactically analyzed corpus, and evaluated their accuracy. The accuracy of all the case examples which have the target cases was 90.9%, and the accuracy of the reliable examples was 97.2%. Accordingly, this process is very effective.</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="5" type="metho"> <SectionTitle> 4. CONSTRUCTION OF EXAMPLE CASE FRAMES </SectionTitle> <Paragraph position="0"> As shown in Section 2, when examples whose verbs have different meanings are merged, a case frame which allows an incorrect expression is created. So, for verbs with different meanings, different case frames should be acquired.</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> In most cases, an important case component which decides </SectionTitle> <Paragraph position="0"> the sense of a verb is the closest one to the verb, that is, the verb sense ambiguity can be resolved by coupling the verb and its closest case component. Accordingly, we distinguish examples by the verb and its closest case component. 
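The coupling of a verb with its closest case component can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input layout (a verb plus a list of case example and case marker pairs in surface order), the rule that the last listed component is the closest one, and the name build_example_patterns are all assumptions made for illustration. The data are the tsumu examples from Section 2.

```python
from collections import defaultdict

# Each example is (verb, [(case_example, case_marker), ...]).
# Assumption: components are listed in surface order, so the last one
# is the component closest to the verb.
EXAMPLES = [
    ("tsumu", [("kuruma", "ni"), ("nimotsu", "wo")]),
    ("tsumu", [("truck", "ni"), ("busshi", "wo")]),
    ("tsumu", [("keiken", "wo")]),
    ("tsumu", [("kuruma", "ni"), ("nimotsu", "wo")]),
]


def build_example_patterns(examples):
    """Group examples by (verb, closest case example, closest case marker).

    Returns a dict mapping each key to {case_marker: {case_example: frequency}}.
    """
    patterns = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for verb, components in examples:
        if not components:
            continue  # no case component to couple the verb with
        closest_word, closest_cm = components[-1]
        key = (verb, closest_word, closest_cm)
        for word, cm in components:
            patterns[key][cm][word] += 1
    # Convert the nested defaultdicts into plain dicts for readability.
    return {
        key: {cm: dict(words) for cm, words in slots.items()}
        for key, slots in patterns.items()
    }


example_patterns = build_example_patterns(EXAMPLES)
for key, slots in sorted(example_patterns.items()):
    print(key, slots)
```

Because keiken wo tsumu is kept apart from nimotsu wo tsumu, the ni slot filled by kuruma never appears alongside keiken, which is the separation of verb senses that the coupling is meant to achieve.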
We call the case marker of the closest case component the closest case marker.</Paragraph> <Paragraph position="1"> The number of example patterns which one verb has is equal to that of the closest case components. That is, example patterns which have almost the same meaning are individually handled as follows: tween example patterns (Numerals in the lower right of examples represent their frequencies.) ample case frames consist of the example pattern clusters. The detail of the clustering is described in the following section.</Paragraph> </Section> <Section position="2" start_page="3" end_page="4" type="sub_section"> <SectionTitle> 4.1 Similarity between example patterns </SectionTitle> <Paragraph position="0"> The clustering of example patterns is performed by using the similarity between example patterns. This similarity is based on the similarities between case examples and the ratio of common cases. Figure 2 shows an example of calculating the similarity between example patterns.</Paragraph> <Paragraph position="2"> are the depths of x, y in the thesaurus, and the depth of their lowest (most specific) common node is L. If x and y are in the same node of the thesaurus, the similarity is 1.0, the maximum score based on this criterion.</Paragraph> <Paragraph position="4"> where the cases of example pattern F are defined in the same way. The square root in this equation decreases influences of the frequencies. The similarity between F is the product of the ratio of common cases and the similarities between case example groups of common cases of F</Paragraph> </Section> <Section position="3" start_page="4" end_page="5" type="sub_section"> <SectionTitle> 4.2 Selection of semantic markers of example patterns </SectionTitle> <Paragraph position="0"> The similarities between example patterns are deeply influenced by semantic markers of the closest case components. So, when the closest case components have semantic ambiguities, a problem arises. 
For example, when clustering example patterns of awaseru 'join, adjust', the pair of example patterns (te 'hand', kao 'face') is created with the common semantic marker <part of an animal>, and (te 'method', syouten 'focus') is created with the common semantic marker <logic,meaning>. From these two pairs, the pair (te 'hand', kao 'face', syouten 'focus') is created, though <part of an animal> is not similar to <logic,meaning> at all.</Paragraph> <Paragraph position="1"> recalculated.</Paragraph> <Paragraph position="2"> 3. These two processes are iterated while there are pairs of two example patterns, of which the similarity is higher than a threshold.</Paragraph> </Section> <Section position="4" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 4.3 Clustering procedure </SectionTitle> <Paragraph position="0"> The following is the clustering procedure: 1. Elimination of example patterns which occur infrequently: Target example patterns of the clustering are those whose closest case components occur more frequently than a threshold. We set this threshold to 5.</Paragraph> <Paragraph position="1"> terns which have the same closest CM are calculated, and semantic markers of closest case components are selected. These two processes are iterated as mentioned in 4.2.</Paragraph> <Paragraph position="2"> (b) Each example pattern pair whose similarity is higher than some threshold is merged.</Paragraph> <Paragraph position="3"> 3. Clustering of all the example patterns: The example patterns which are output by 2 are clustered. In this phase, it is not considered whether the closest CMs are the same or not. The following example patterns have almost the same meaning, but they are not merged by 2 because of the different closest CM. This clustering can merge these example patterns. CM. If the frequency of a CM is less than the threshold, it is discarded. 
For example, if the most frequent CM for a verb is wo, occurring 100 times, and the frequency of the ni CM for the verb is 16, the ni CM is discarded (since it is less than the threshold, 20).</Paragraph> <Paragraph position="4"> However, since we can say that all the verbs have ga (nominative) CMs, ga CMs are not discarded. Furthermore, if an example case frame does not have a ga CM, we supplement its ga case with the semantic marker <person>.</Paragraph> </Section> </Section> <Section position="6" start_page="5" end_page="5" type="metho"> <SectionTitle> 6. CONSTRUCTED CASE FRAME DICTIONARY </SectionTitle> <Paragraph position="0"> We applied the above procedure to the Mainichi Newspaper Corpus (9 years, 4,600,000 sentences). We set the threshold of the clustering to 0.80. The criterion for setting this threshold is that case frames which have different case patterns or different meanings should not be merged into one case frame.</Paragraph> <Paragraph position="1"> Table 1 shows examples of constructed example case frames.</Paragraph> <Paragraph position="2"> From the corpus, example case frames of 71,000 verbs are constructed; the average number of example case frames of a verb is 1.9; the average number of case slots of a verb is such as sanseida 'positiveness+copula (agree)', and compound case markers such as ni-tsuite 'in terms of' of tadasu 'examine' are acquired.</Paragraph> </Section> </Paper>