<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1026"> <Title>Handling Japanese Homophone Errors in Revision Support System for Japanese Texts; REVISE</Title> <Section position="8" start_page="159" end_page="160" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> We confirmed the validity of this method with experiments on detecting and correcting homophone errors. We assumed that the input compound nouns were already segmented into component words and that their</Paragraph>
<Section position="1" start_page="159" end_page="159" type="sub_section"> <SectionTitle> 5.1 Experimental data </SectionTitle> <Paragraph position="0"> * Homophones used in experiments: Table 1 shows the 100 homophones (32 readings) that were used in the experiments.</Paragraph> <Paragraph position="1"> * Compound nouns evaluated: We prepared two kinds of data: compound nouns that included correct homophones (correct homophone data sets) and compound nouns that included wrong homophones (wrong homophone data sets). Table 2 outlines the sets of experimental data used.</Paragraph> <Paragraph position="2"> name | number | outline of data set
data set 1 | 461 | compound nouns extracted from newspaper articles
data set 2 | 53 | compound nouns extracted from high school textbooks
data set 3 | 1310 | compound nouns formed by substituting a correct homophone in data set 1 with a wrong homophone
data set 4 | 170 | compound nouns formed by substituting a correct homophone in data set 2 with a wrong homophone</Paragraph> </Section>
<Section position="2" start_page="159" end_page="159" type="sub_section"> <SectionTitle> 5.2 Description of semantic restriction </SectionTitle> <Paragraph position="0"> * The semantic category system: The semantic category system used in the experiments was constructed by referring to BUNRUI-GOI-HYO, edited by the National Language Research Institute (1964), and RUIGO-SHIN-JITEN, written by Ono and Hamanishi (1981), which are the best-known semantic category systems for the Japanese language. The semantic system has about 200 nodes and covers about 35,000 words.</Paragraph> <Paragraph position="1"> * The semantic restriction dictionary: Compound nouns including any of the homophones in Table 1 were collected from newspaper articles over a 90-day period, and the semantic restriction dictionary was built from the semantic restrictions between the homophones and the adjoining words in those compound nouns.</Paragraph> </Section>
<Section position="3" start_page="159" end_page="160" type="sub_section"> <SectionTitle> 5.3 Experimental results </SectionTitle> <Paragraph position="0"> Generally speaking, the performance of an error detection method can be measured by two indices: the detection rate, which is the percentage of errors that are correctly detected, and the misdetection rate, which is the percentage of correct words that are erroneously detected as errors.</Paragraph> <Paragraph position="1"> The detection rate is defined as: $\text{Detection rate} = \frac{\text{number of errors detected}}{\text{actual number of wrong compounds in the sample}}$.</Paragraph> <Paragraph position="2"> The misdetection rate is defined as: $\text{Misdetection rate} = \frac{\text{number of homophones misdetected}}{\text{actual number of correct compounds in the sample}}$. The experimental results are shown in Table 3. The detection rate is over 95%. This value is much higher than the 48.9% rate previously reported (Suzuki and Takeda, 1989). The misdetection rate, on the other hand, is less than 30%. In other words, the proposed method judged over 70% of the correct homophones in compound nouns to be correct. This means that the confirmation process can be significantly shortened, because fewer correct compounds are presented for confirmation. Moreover, in the correction process, the correct homophone was among the candidates for more than 80% of the detected errors. These results show that this method can detect and correct homophone errors in compound nouns successfully.</Paragraph> </Section>
<Section position="4" start_page="160" end_page="160" type="sub_section"> <SectionTitle> 5.4 Discussion </SectionTitle> <Paragraph position="0"> We analyzed the experimental results and determined that misdetection is caused by two factors: (a) an imperfect semantic restriction dictionary, and (b) semantic categories that belong to sets that can adjoin words having the same reading.</Paragraph> <Paragraph position="1"> The number of compound nouns used to make the semantic restriction dictionary was different for each word reading. When the number of compound nouns used to construct the dictionary is large enough, misdetection caused by factor (a) will be minimized. Factor (b) can be offset by optimizing the semantic category system to improve semantic discrimination. This problem will be researched in the future.</Paragraph> </Section> </Section> </Paper>