File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/03/w03-1811_metho.xml
Size: 16,505 bytes
Last Modified: 2025-10-06 14:08:41
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1811"> <Title>A Disambiguation Method for Japanese Compound Verbs</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Ambiguities of JCVs </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Types of Ambiguities </SectionTitle> <Paragraph position="0"> Kageyama (1993) has proposed that JCVs can be analyzed by the argument structure of each constituent and divided into two types: syntactic compounds and lexical compounds. Lexical compounds have semantic constraints and are limited to lexically specified combinations, whereas syntactic compounds are basically compositional and have no lexical idiosyncrasies.</Paragraph> <Paragraph position="1"> We do not differentiate these two types in advance, because our method may also be useful for identifying them. There are two types of ambiguity in JCVs: ambiguities within lexical compounds and ambiguities between lexical compounds and syntactic compounds.</Paragraph> <Paragraph position="2"> Lexical compounds containing an ambiguous V2 (as in example (1)) are examined in this study. Semantic constraints govern the pairs of verbs which may be compounded. The semantic features of V1 play a key role in identifying the meaning of V2. We focus on extracting commonalities of semantic features from V1 in order to disambiguate V2.</Paragraph> <Paragraph position="3"> On the other hand, some JCVs are ambiguous because they may be either syntactic or lexical compounds depending on context. Syntactic information is important in disambiguating this type of JCV. Example (2) shows that JCVs with the same morphology change their meanings depending on the context.</Paragraph> <Paragraph position="4"> (2) a. Basu wa basutei o hashiri-sugita.</Paragraph> <Paragraph position="5"> &quot;The bus ran past the bus stop.&quot; b. 
Kare wa shiai no tame ni hashiri-sugita.</Paragraph> <Paragraph position="6"> &quot;He ran too much because of the game.&quot; V2 sugiru in the lexical compound in (2a) expresses path of motion (&quot;go past&quot;), but in the syntactic compound in (2b) it denotes excessiveness (&quot;too much&quot;). As sugiru is most commonly used as a compositional V2 (&quot;too much&quot;), it is difficult to identify the difference between (2a) and (2b). However, since sentence (2a) includes a word indicating a place, such as basutei &quot;bus stop&quot;, we can distinguish the lexical compound (2a) from the syntactic compound (2b) by their co-occurring words. We identify the meaning of such JCVs using syntactic information gained from co-occurrence and verb complements.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Ambiguities of V2 </SectionTitle> <Paragraph position="0"> We classified ambiguities of V2 into three semantic clusters: aspectual, spatial and adverbial (Niimi 1987). An ambiguous V2 is defined as a word with multiple meanings that span several semantic clusters. This framework makes it easier to distinguish the meanings of an ambiguous V2.</Paragraph> <Paragraph position="1"> We listed all the ambiguous V2s, i.e. 
the following 20 words, based on a previous study (Himeno 2001).</Paragraph> <Paragraph position="2"> agaru &quot;go up&quot;, ageru &quot;lift&quot;, otosu &quot;drop&quot;, kakeru &quot;hang&quot;, kakaru &quot;hang onto&quot;, kaeru &quot;go back&quot;, kaesu &quot;send back&quot;, iru &quot;enter&quot;, komu &quot;insert&quot;, sugiru &quot;go past&quot;, tatsu &quot;stand&quot;, tateru &quot;make stand&quot;, tsuku &quot;be attached&quot;, tsukeru &quot;attach&quot;, dasu &quot;put out&quot;, kiru &quot;cut&quot;, toosu &quot;pierce&quot;, nuku &quot;pull out&quot;, tobasu &quot;scatter&quot;, wataru &quot;go across&quot;. For the first step of analysis, we extracted 10 ambiguous words at random: agaru &quot;go up&quot;, ageru &quot;lift&quot;, otosu &quot;drop&quot;, kakeru &quot;hang&quot;, kakaru &quot;hang onto&quot;, kaeru &quot;go back&quot;, kaesu &quot;send back&quot;, iru &quot;enter&quot;, komu &quot;insert&quot; and sugiru &quot;go past&quot;. Table 1 shows examples of ambiguities of V2 as JCVs.</Paragraph> <Paragraph position="3"> We are not concerned here with disambiguation of the meanings within a single cluster. For example, tabe-kakeru &quot;already begin to eat&quot; and hashiri-kakeru &quot;be about to run&quot; are both classified as members of the aspectual cluster. JCVs of the adverbial cluster which include naosu &quot;fix&quot; and au &quot;fit&quot; seem to be similar cases. Such differences are not analyzed in this study.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Criteria for Classification of Semantic Cluster </SectionTitle> <Paragraph position="0"> In order to classify JCVs into semantic clusters, we need to establish certain criteria. We define the syntactic roles and semantic relations between each constituent as criteria for classification into a semantic cluster. 
Each constituent of a JCV has syntactic roles such as dependency on a noun phrase and suffix usage. We refer to any JCV component verb that requires a complement as a main verb, and any suffixing component as a subsidiary verb.</Paragraph> <Paragraph position="1"> We also need to examine how the two verbs are related to each other within a JCV. The semantic relation classes are assigned to the JCV constituents based on Tagashira's (1986) study. As paraphrasing V2 facilitates understanding of these semantic relations, we investigate them by paraphrasing.</Paragraph> <Paragraph position="2"> The ambiguous JCVs were classified into three types based on the syntactic roles of V2: complementation, modification and directional motion. In complementation, V2 plays a complementary role to V1, and can be paraphrased using other aspectual words such as hajimeru &quot;start&quot; and owaru &quot;finish&quot;. In modification, V2 modifies V1, so we can paraphrase V2 with an adverb. Directional motion consists of two main verbs: V1 expresses the manner of motion, and V2 the direction.</Paragraph> <Paragraph position="3"> We describe the semantic relations of JCVs as &quot;SEM&quot;, the syntactic roles of V1 as &quot;SYN&quot; and the paraphrase as &quot;PAR&quot;, with examples as follows. The symbol '/' means &quot;or&quot;. The criteria for classification into each semantic cluster are given in (3).</Paragraph> <Paragraph position="4"> (3) Criteria for classification a. 
Aspectual cluster In order to build disambiguation rules which are applicable to novel JCVs, we need to examine and analyze the frequency, types of semantic features and co-occurring words of JCVs not in the dictionary.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Extraction of JCVs </SectionTitle> <Paragraph position="0"> We used data from the Mainichi Shinbun (Mainichi 1993) in order to examine JCVs not in the printed dictionary (Kindaichi 1999). The newspaper articles were tagged by the morphological analysis system Chasen (Matsumoto et al. 2000). All occurrences of &quot;Verb-Verb&quot; JCVs and non-compounded single verbs were extracted from the tagged articles. Table 2 shows the number of tokens extracted and the number of tokens after duplicates had been removed.</Paragraph> <Paragraph position="1"> The &quot;Verb-Verb&quot; form accounts for only 1.36% of the total tokens; however, this type accounts for 44.06% of types. In addition, 3525 words (accounting for 65.71% of all &quot;Verb-Verb&quot; tokens) are not registered in the dictionary. The result shows the rich variety of JCVs and the difficulty of processing JCVs using a static dictionary. Among the 3525 JCVs not in the dictionary, we found 829 types of ambiguous JCVs formed with the 10 ambiguous V2s mentioned in 2.2.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Semantic Features for Disambiguation of JCVs </SectionTitle> <Paragraph position="0"> Semantic features are necessary for representing the appropriate meanings of V1. We used Ruigo Shin Jiten (Oono and Hamanishi 1989) to label the semantic features. 
The framework of semantic features in Ruigo Shin Jiten provides enough accuracy to distinguish the meanings of V1.</Paragraph> <Paragraph position="1"> For instance, Ruigo Shin Jiten defines the semantic feature of musu &quot;steam&quot; as suiji &quot;kitchen work&quot; and that of nageru &quot;throw&quot; as dageki &quot;throw and hit&quot;. These features can identify the different semantic clusters of mushi-ageru &quot;finish steaming&quot; in the aspectual cluster and nage-ageru &quot;throw into the air&quot; in the spatial cluster. Ruigo Shin Jiten is organized in three levels and comprises 1000 categories. The labels from the second level, which include 60 categories for verbs, are used in assigning a semantic feature to V1. If it is difficult to identify the meaning of V1 using the label from the second level, the label from the third level is applied as the semantic feature.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Construction of Disambiguation Method </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Information for Disambiguation Rules </SectionTitle> <Paragraph position="0"> We analyzed the semantic features of all 328 V1s among the 829 target words. We examined ambiguities of V1 based on the co-occurring nouns in a sentence using the syntactic information in the IPAL verb dictionary (IPA 1987).</Paragraph> <Paragraph position="1"> In order to construct disambiguation rules, we took two steps. The first step was to disambiguate the meaning of V1. The second step was to clarify semantic and syntactic information for use in the disambiguation rules.</Paragraph> <Paragraph position="2"> As for the first step, we used the IPAL verb dictionary. 
The IPAL verb dictionary defines the meaning of verbs using valency patterns and assigns a semantic feature from Ruigo Shin Jiten to each entry.</Paragraph> <Paragraph position="3"> Initially, the co-occurring nouns and verb complements of JCVs were extracted from a sentence. To examine the correlation between co-occurring nouns and V1, we investigated the valency patterns of V1 in the IPAL dictionary. For example, in the case of a sentence like kare wa udon o uchi-ageta &quot;He completed making buckwheat noodles&quot;, we try to find the sub-entry of utsu &quot;hit&quot; that takes complements such as 'kare wa' and 'udon o' in the IPAL dictionary. When we can identify the sub-entry of utsu &quot;hit&quot; which fulfills this condition, a semantic label like seisan &quot;production&quot; is selected as the semantic feature for utsu &quot;hit&quot;.</Paragraph> <Paragraph position="4"> The second step is to classify JCVs into semantic clusters based on the criteria defined in 2.3, and to extract commonalities of the semantic features of V1 within the same semantic cluster. For example, JCVs such as yude-ageru &quot;finish boiling&quot;, mushi-ageru &quot;finish steaming&quot; and yaki-ageru &quot;finish baking&quot;, which are classified into the aspectual cluster, have a common semantic feature: suiji &quot;house keeping&quot;.</Paragraph> <Paragraph position="5"> Verb complements which are not related to V1 and their semantic features are used as syntactic information in disambiguating the meaning of V2.</Paragraph> <Paragraph position="6"> For instance, unazuku &quot;nod&quot; has a single meaning of &quot;agreement&quot;. When unazuku &quot;nod&quot; is combined with kakeru &quot;hang&quot; as V2, the resulting JCV unazuki-kakeru has two possible meanings. 
The first meaning is aspectual, as in kare wa sono kotoba ni unazuki-kaketa &quot;he was about to nod at what was being said.&quot; The second meaning is spatial, as in kare ni unazuki-kaketa &quot;I nodded at him&quot;. In this case, we need semantic features and syntactic information including verb complements.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Disambiguation Rules </SectionTitle> <Paragraph position="0"> In order to construct disambiguation rules, the JCVs were classified into two groups using the results of the analysis in section 4.1.</Paragraph> <Paragraph position="1"> The rules of the first group are based on the semantic features of V1. For example, utsu &quot;hit&quot; has two meanings, &quot;hit&quot; and &quot;make&quot;. The semantic feature of utsu &quot;hit&quot; in the first meaning is labeled as dageki &quot;hit and throw&quot; in a specific context such as kare wa bouru o uchi-ageta: &quot;he hit the ball up&quot;, and the JCV is classified in the spatial cluster. The second meaning is assigned suiji &quot;cooking&quot; as a semantic feature in a sentence such as kare wa udon o uchi-ageta: &quot;he finished making buckwheat noodles&quot;, and the JCV is categorized in the aspectual cluster.</Paragraph> <Paragraph position="2"> We built compounding rules for disambiguation utilizing the semantic features of Ruigo Shin Jiten.</Paragraph> <Paragraph position="3"> Each rule is composed of the semantic feature of V1, the V2 verb and the corresponding semantic cluster.</Paragraph> <Paragraph position="4"> Examples of these disambiguation rules are shown as follows.</Paragraph> <Paragraph position="5"> kare wa kanojo ni mukatte unazuita &quot;He nodded at her&quot; We extracted 829 JCVs from the newspaper articles as target words for the analysis. 
As a result of analyzing the 328 types of V1, 143 rules of semantic information and 35 rules of syntactic information were constructed for the disambiguation of JCVs.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Expanding of Disambiguation Rules </SectionTitle> <Paragraph position="0"> To expand the rules comprehensively in addition to the results in 4.1, we prepared a matrix based on our rules. Table 3 illustrates part of the matrix we used.</Paragraph> <Paragraph position="1"> The V2s are listed in the row headings and the semantic features of V1 are given in the columns. We checked whether a V1 having the semantic feature shown in the column can combine with the V2 in the row, and marked the ability with &quot;+&quot; and the inability with &quot;-&quot;. The rules were compiled from the matrix and reconstructed.</Paragraph> <Paragraph position="2"> As the reconstruction reduced the number of rules, 110 disambiguation rules were obtained.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Disambiguation Method </SectionTitle> <Paragraph position="0"> We propose a semantic analysis method for JCVs based on disambiguation rules. The following 6 steps comprise our method.</Paragraph> <Paragraph position="1"> (1) Input a sentence which includes JCVs (2) Tag each word in the input sentence using a morphological analysis system called Chasen (3) Extract the JCVs and their syntactic information from the sentence (4) Assign a semantic feature to V1 using co-occurring words, referring to the IPAL dictionary (5) Compare the semantic feature of V1 and syntactic information with the disambiguation rules (6) Output the semantic cluster obtained by application of the matching rule Through this procedure, we can handle novel JCVs not in the dictionary.</Paragraph> </Section> </Section> </Paper>
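Steps (5) and (6) of the method in 4.4 can be pictured as a simple rule lookup. The Python sketch below is only an illustration under stated assumptions: the names `Rule`, `RULES` and `disambiguate` are hypothetical, the two rules are taken from the uchi-ageru examples in 4.2 rather than from the actual rule set, and steps (1) to (4) (tagging with Chasen and assigning a semantic feature to V1 via the IPAL dictionary) are assumed to have been carried out beforehand. Syntactic rules based on verb complements are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape: V1 semantic feature + V2 verb -> semantic cluster."""
    v1_feature: str  # semantic feature of V1 (a Ruigo Shin Jiten label)
    v2: str          # V2 in dictionary form
    cluster: str     # "aspectual", "spatial" or "adverbial"


# Two illustrative rules based on the examples in 4.2 (assumption: the real
# rule set is larger and is not reproduced here).
RULES: List[Rule] = [
    Rule(v1_feature="dageki", v2="ageru", cluster="spatial"),   # bouru o uchi-ageta
    Rule(v1_feature="suiji", v2="ageru", cluster="aspectual"),  # udon o uchi-ageta
]


def disambiguate(v1_feature: str, v2: str,
                 rules: List[Rule] = RULES) -> Optional[str]:
    """Return the cluster of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.v1_feature == v1_feature and rule.v2 == v2:
            return rule.cluster
    return None


if __name__ == "__main__":
    # utsu "hit" labeled dageki (ball context) vs. suiji (noodle context)
    print(disambiguate("dageki", "ageru"))
    print(disambiguate("suiji", "ageru"))
```

Returning `None` when no rule matches keeps unseen combinations visible to the caller instead of forcing them into a cluster; how such cases should be handled is not specified in the text.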