<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1811"> <Title>A Disambiguation Method for Japanese Compound Verbs</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> The JCVs in the Shin Meikai Kokugo dictionary were selected for the evaluation of our rules, because the meanings of JCVs can be judged objectively from their definitions. Before the evaluation, we categorized the JCVs from the dictionary into four categories: idiomatic, fused, high frequency and exceptional.</Paragraph> <Paragraph position="1"> Idiomatic JCVs are those whose meaning cannot be construed from the meanings of the two verbs taken independently. The meanings of fused JCVs and high frequency JCVs can be inferred from their constituents. Fused JCVs are those which are used only in a specific context. High frequency JCVs can be divided semantically into two verbs, and the case particle 'te' or 'de' can be inserted between the two verbs in certain cases.</Paragraph> <Paragraph position="2"> Exceptional JCVs are those with certain V2s, such as hajimeru &quot;start&quot; and tsuzukeru &quot;continue&quot;, which can be processed easily using only the definition of V2.</Paragraph> <Paragraph position="3"> Since idiomatic and fused JCVs cannot be processed by our method, registering such JCVs in the dictionary is a reasonable approach for computer implementation. Exceptional JCVs may also be registered in the dictionary. All high frequency JCVs, however, can be treated with our method and are designated as target words for evaluation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Evaluation by using Japanese Dictionary </SectionTitle> <Paragraph position="0"> We extracted from the Japanese dictionary all JCVs that include any of the 10 ambiguous V2s. Our target words for evaluation were all the high frequency JCVs among them.
For a target word to be classified into a semantic cluster, its dictionary definition must include a word related to that cluster, such as owaru &quot;finish&quot; for the aspectual cluster, ue &quot;up&quot; for the spatial cluster and kurikaeshi &quot;again&quot; for the adverbial cluster.</Paragraph> <Paragraph position="1"> Table 4 shows the result of analyzing these JCVs. The idiomatic and fused categories and the names of the semantic clusters are abbreviated in Table 4; for example, the aspectual cluster is shown as &quot;ASPECT&quot;. Half of the JCVs in the dictionary are regarded as idiomatic or fused.</Paragraph> <Paragraph position="2"> We took the following steps for evaluation.</Paragraph> <Paragraph position="3"> (1) Extract the target JCVs for evaluation from the Japanese dictionary.</Paragraph> <Paragraph position="4"> (2) Classify the JCVs into semantic clusters by referring to their definitions.</Paragraph> <Paragraph position="5"> (3) Assign the semantic features of Ruigo Shin Jiten to the V1 of each JCV. Where syntactic information is needed, it can be extracted from the examples in the dictionary. (4) Prepare test sets, each including the target JCV, the semantic feature of V1 and the semantic cluster.</Paragraph> <Paragraph position="6"> (5) Compare the test sets with our rules.</Paragraph> <Paragraph position="7"> (6) Evaluate the accuracy of our rules.</Paragraph> <Paragraph position="8"> We evaluated 242 JCVs from the dictionary and obtained 211 correct results and 31 errors, which corresponds to an accuracy of 87.19% (211/242).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Discussion </SectionTitle> <Paragraph position="0"> The evaluation yielded 31 errors.</Paragraph> <Paragraph position="1"> These errors can be divided into three types: lack of rules, problems with semantic features, and exceptions.
The first type, lack of rules, occurs where a rule covering the semantic features of V1 and V2 has not yet been registered in our rules.</Paragraph> <Paragraph position="2"> The second problem occurs where semantic features cannot be assigned to V1 appropriately. For instance, koneru &quot;knead&quot; is assigned hendo &quot;fluctuation&quot; as the semantic feature of V1, but the verb denotes an action of making something. The difference between hendo &quot;fluctuation&quot; and seisan &quot;production&quot; is important in identifying the semantic cluster, because kone-ageru &quot;complete kneading&quot; is in the aspectual cluster, whereas maki-ageru &quot;roll up&quot;, whose V1 is assigned the semantic feature &quot;fluctuation&quot;, is in the spatial cluster. We consider that such verbs should be rearranged in an appropriate framework.</Paragraph> <Paragraph position="3"> The errors classified as exceptions are those where an unusual usage of V1 causes the wrong cluster to be selected by our rules. For example, moeru &quot;burn&quot; used in moe-agaru &quot;flare up&quot; is assigned bussho &quot;physical phenomena&quot; as its semantic feature. Moe-agaru &quot;flare up&quot; should be regarded as belonging to the spatial cluster, because its dictionary definition states that something burns with rising flames.</Paragraph> <Paragraph position="4"> However, a JCV with a V1 of bussho &quot;physical phenomena&quot; and a V2 of agaru &quot;go up&quot; is classified into the aspectual cluster by our rules, in the same way as waki-agaru &quot;boil up&quot; and atatame-ageru &quot;finish heating&quot;. Moe-agaru &quot;flare up&quot; should therefore be registered in the dictionary as an exception.</Paragraph> <Paragraph position="5"> The accuracy of our rules improves to nearly 99% with the addition of 10 rules for 19 verbs and the rearrangement of the semantic features of 8 verbs.</Paragraph> <Paragraph position="6"> This result confirms the advantage of our method for disambiguating JCVs.</Paragraph> </Section> </Section> </Paper>