<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1024">
  <Title>Detection of Japanese Homophone Errors by a Decision List Including a Written Word as a Default Evidence</Title>
  <Section position="4" start_page="183" end_page="184" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> First, using the method described above, we compute the identifying strength of the written word for each of the 12 homophone sets shown in Table 1. The results are shown in Table 4. In this table, LR0 denotes the lowest rank of DL0, that is, the rank of the default evidence, and LR1 denotes the lowest rank of DL1, that is, the rank of the evidence of the written word. Since these evidences occupy the last position of their respective lists, LR0 and LR1 are also the sizes of DL0 and DL1.</Paragraph>
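    <Paragraph> The relation between LR0, LR1 and the list sizes can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a decision list is a Python list of (evidence, answer, strength) tuples sorted by descending identifying strength, and that DL1 is obtained by placing the written-word evidence at its rank in DL0 and discarding every evidence below it, since an evidence that matches every sentence makes all lower ranks unreachable. The names `build_dl1`, `DEFAULT` and `WRITTEN_WORD` and all strength values are illustrative assumptions, not taken from Table 4.

```python
from typing import List, Tuple

# Hypothetical representation of one rule: (evidence, answer, identifying strength).
Rule = Tuple[str, str, float]

DEFAULT = "DEFAULT"            # hypothetical marker for DL0's default evidence
WRITTEN_WORD = "WRITTEN_WORD"  # hypothetical marker for the written-word evidence


def build_dl1(dl0: List[Rule], written_strength: float) -> List[Rule]:
    """Place the written-word evidence at its rank in DL0 and drop all lower rules.

    The written-word evidence matches every sentence, so no rule ranked
    below it (including DL0's default evidence) could ever fire.
    """
    # dl0 is sorted by descending strength, so the filtered list stays sorted.
    kept = [r for r in dl0 if r[0] != DEFAULT and r[2] > written_strength]
    # The written word becomes the last rule; its answer is whatever word is written.
    return kept + [(WRITTEN_WORD, WRITTEN_WORD, written_strength)]


# Toy list with made-up strengths (not the values in Table 4).
dl0 = [
    ("ctx:A", "w1", 3.2),
    ("ctx:B", "w2", 2.5),
    ("ctx:C", "w1", 0.8),
    (DEFAULT, "w1", 0.1),
]
dl1 = build_dl1(dl0, written_strength=1.5)
lr0, lr1 = len(dl0), len(dl1)  # lowest rank == list size
print(lr0, lr1)  # 4 3
```

In this toy case the rule "ctx:C" ranks below the written word and is dropped together with the default evidence, which is why DL1 ends up shorter than DL0.</Paragraph>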
    <Paragraph position="1"> Second, we extract from a corpus the sentences that contain a word belonging to one of the 12 homophone sets. This corpus differs from the training corpus: it consists of one year's worth of Mainichi newspaper articles, whereas the training corpus consists of one year's worth of Nikkei newspaper articles. The extracted sentences serve as the test sentences of the experiment, and we assume that they contain no homophone errors.</Paragraph>
    <Paragraph position="2"> Finally, we randomly select 5% of the test sentences and artificially introduce homophone errors into them by replacing the written homophone word with another word from the same homophone set.</Paragraph>
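    <Paragraph> The error-injection step can be sketched as follows. This is a hedged illustration, assuming each test sentence is reduced to the homophone it contains (its context is omitted); `inject_errors` and the placeholder words `word_a` and `word_b` are hypothetical, and the fixed random seed is added only to make the sketch reproducible.

```python
import random
from typing import Dict, List, Tuple


def inject_errors(
    words: List[str],
    homophone_sets: List[List[str]],
    rate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[str], List[bool]]:
    """Replace the homophone in a randomly chosen `rate` fraction of sentences.

    Returns the (possibly corrupted) written words and a per-sentence error flag.
    """
    rng = random.Random(seed)
    # Map each word to the other members of its homophone set.
    alternatives: Dict[str, List[str]] = {}
    for hset in homophone_sets:
        for w in hset:
            alternatives[w] = [v for v in hset if v != w]

    n_errors = round(rate * len(words))
    chosen = set(rng.sample(range(len(words)), n_errors))
    written, is_error = [], []
    for i, w in enumerate(words):
        if i in chosen:
            # Swap in another member of the same homophone set.
            written.append(rng.choice(alternatives[w]))
            is_error.append(True)
        else:
            written.append(w)
            is_error.append(False)
    return written, is_error


words = ["word_a"] * 60 + ["word_b"] * 40  # 100 toy test sentences
written, flags = inject_errors(words, [["word_a", "word_b"]])
print(sum(flags))  # 5
```

With two-element sets, as in this paper, "another word from the same set" is simply the other member, so each injected error is deterministic once the sentence is chosen.</Paragraph>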
    <Paragraph position="7"> As a result, 5% of the test sentences contain a homophone error. We then detect homophone errors in these test sentences using DL0 and DL1 separately.</Paragraph>
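    <Paragraph> Detection with either list can be sketched as follows, assuming the usual decision-list procedure in which the highest-ranked evidence that matches the sentence decides the word, and an error is reported when that word differs from the written one. The feature strings, the `judge` and `detect_error` helpers, and the `DEFAULT` and `WRITTEN_WORD` markers are hypothetical.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str, float]  # (evidence, answer, identifying strength)
DEFAULT = "DEFAULT"            # hypothetical marker: always matches, fixed answer
WRITTEN_WORD = "WRITTEN_WORD"  # hypothetical marker: always matches, answers the written word


def judge(dl: List[Rule], features: Set[str], written: str) -> str:
    """Return the word chosen by the first (highest-ranked) matching evidence."""
    for evidence, answer, _strength in dl:
        if evidence == WRITTEN_WORD:
            return written
        if evidence == DEFAULT or evidence in features:
            return answer
    return written  # no evidence fired: fall back to the written word


def detect_error(dl: List[Rule], features: Set[str], written: str) -> bool:
    """Flag a homophone error when the decision list disagrees with the written word."""
    return judge(dl, features, written) != written


dl0 = [("ctx:A", "w1", 3.2), ("ctx:B", "w2", 2.5), (DEFAULT, "w1", 0.1)]
dl1 = [("ctx:A", "w1", 3.2), ("ctx:B", "w2", 2.5), (WRITTEN_WORD, WRITTEN_WORD, 1.5)]

# A correct sentence whose only feature matches no ranked evidence.
print(detect_error(dl0, {"ctx:Z"}, "w2"))  # True: DL0's default says w1, a false alarm
print(detect_error(dl1, {"ctx:Z"}, "w2"))  # False: DL1 keeps the written word
```

The two calls show where DL1 can gain precision in this toy setting: where DL0 falls through to its default evidence and flags a correct sentence, DL1 falls through to the written word and does not.</Paragraph>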
    <Paragraph position="8"> We repeated this experiment ten times and averaged the precision, recall, and F-measure over the runs. The results are shown in Table 5.</Paragraph>
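    <Paragraph> A minimal sketch of the evaluation, assuming a detection counts as a true positive only when the flagged sentence actually contains an injected error; `prf` and `mean_prf` are hypothetical helpers. Following the text, the F-measure, F = 2PR/(P + R), is computed within each run and then averaged, rather than recomputed from the mean precision and recall.

```python
from statistics import mean
from typing import List, Tuple


def prf(predicted: List[bool], gold: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of error detection (positive = 'is an error')."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    n_pred = sum(predicted)
    n_gold = sum(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def mean_prf(runs: List[Tuple[List[bool], List[bool]]]) -> Tuple[float, ...]:
    """Average P, R and F over repeated runs (ten in the experiment above)."""
    scores = [prf(pred, gold) for pred, gold in runs]
    return tuple(mean(s[i] for s in scores) for i in range(3))


run = ([True, True, False, False], [True, False, True, False])
print(prf(*run))  # (0.5, 0.5, 0.5)
```
</Paragraph>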
    <Paragraph position="9"> For every homophone set, the F-measure of the proposed DL1 is higher than that of the original decision list DL0. We therefore conclude that the proposed method is effective.</Paragraph>
  </Section>
  <Section position="5" start_page="184" end_page="185" type="metho">
    <SectionTitle>
5 Remarks
</SectionTitle>
    <Paragraph position="0"> The recall of DL1 is never higher than that of DL0: our method aims to raise the F-measure by raising the precision at the cost of some recall. We confirmed the validity of this approach through the experiments in Sections 3 and 4. Consequently, our method offers little benefit when recall is the primary concern. Note, however, that the F-measure of DL1 is never worse than that of DL0.</Paragraph>
    <Paragraph position="1"> We set the occurrence probability of a homophone error to p = 0.05. In practice, however, each homophone set has its own p, and ideally p should be determined precisely, because the identifying strength of the written word depends on p. Nevertheless, DL1 will produce better results than DL0 whenever the true p is smaller than 0.05, because the precision of judgments by the written word then improves without lowering the recall. The recall does not fall for smaller p because R0 and R1 are independent of p. Moreover, the definitions of P0 and P1 confirm that the precision of judgments by the written word improves as p decreases.</Paragraph>
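    <Paragraph> Why the identifying strength of the written word depends on p can be illustrated under a simple error model, which is an assumption of this sketch and not necessarily the definition used in Section 3: a two-element homophone set {a, b} with prior probabilities for the intended word, where an error replaces the intended word with the other one with probability p. The strength is taken here as the log-odds of a versus b given that a was written, as is common for decision lists; the function names are hypothetical.

```python
import math


def written_word_precision(p: float, prior_a: float) -> float:
    """P(intended word is a | written word is a) under the assumed error model."""
    prior_b = 1.0 - prior_a
    keep = (1.0 - p) * prior_a  # a was intended and left unchanged
    swap = p * prior_b          # b was intended but mistakenly written as a
    return keep / (keep + swap)


def written_word_strength(p: float, prior_a: float) -> float:
    """Assumed identifying strength: log-odds of a versus b given the written word a."""
    prior_b = 1.0 - prior_a
    return math.log(((1.0 - p) * prior_a) / (p * prior_b))


for p in (0.10, 0.05, 0.01):
    # Smaller p: higher precision and higher strength for the written word.
    print(p, round(written_word_precision(p, 0.5), 3), round(written_word_strength(p, 0.5), 2))
```

Under this model, lowering p raises both the precision of the written-word judgment and its strength, which agrees qualitatively with the argument above.</Paragraph>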
    <Paragraph position="2"> Every homophone set used in this paper has two elements, but real homophone sets may have more.</Paragraph>
    <Paragraph position="3"> However, the larger the homophone set, the greater the advantage of our method, because the precision of judgments by the default evidence of DL0 drops as the set grows, whereas the precision of judgments by the written word in DL1 does not. Therefore, our method remains better than the original one even when the number of elements in the homophone set increases.</Paragraph>
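    <Paragraph> The effect of set size can be illustrated under an assumed model in which all k words of a set are equally likely and an error replaces the intended word with a uniformly chosen other member with probability p. The sketch also simplifies DL0's default evidence to always answering the most frequent word, ignoring which sentences actually reach it. Exact fractions keep the arithmetic readable; the function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def default_precision(priors: List[Fraction]) -> Fraction:
    """Precision of a default evidence that always answers the most frequent word."""
    return max(priors)


def written_precision(priors: List[Fraction], p: Fraction, w: int) -> Fraction:
    """P(intended = w | written = w) when an error replaces the intended word
    with a uniformly chosen other member of the set, with probability p."""
    k = len(priors)
    keep = (1 - p) * priors[w]
    swap = sum(priors[v] * p / (k - 1) for v in range(k) if v != w)
    return keep / (keep + swap)


p = Fraction(1, 20)
for k in (2, 3, 5):
    priors = [Fraction(1, k)] * k
    print(k, default_precision(priors), written_precision(priors, p, 0))
```

With p = 1/20, the default precision falls as 1/k (1/2, 1/3, 1/5), while the written-word precision stays at 19/20 for every k, matching the qualitative claim above.</Paragraph>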
    <Paragraph position="4"> Our method has the further advantage that DL1 is smaller than DL0. The size of a decision list does not affect precision or recall, but a smaller list is more efficient to compute with and easier to maintain.</Paragraph>
    <Paragraph position="5"> On the other hand, our method has the limitation that it does not use the written word in judgments from a, even though the identifying strength of the evidences in a should also depend on the written word.</Paragraph>
    <Paragraph position="6"> In future work, we intend to study how to use the written word in judgments from a. Moreover, the homophone errors in our experiments are artificial, so we must confirm the effectiveness of the proposed method on actual homophone errors.</Paragraph>
  </Section>
</Paper>