<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1026">
  <Title>Handling Japanese Homophone Errors in Revision Support System for Japanese Texts; REVISE</Title>
  <Section position="3" start_page="156" end_page="156" type="metho">
    <SectionTitle>
2 Definition of key terms
</SectionTitle>
    <Paragraph position="0"> Key terms used in this paper are defined as follows: * Japanese compound noun; A noun that consists of several nouns joined without any JOSHI (i.e., Japanese postpositions).</Paragraph>
  </Section>
  <Section position="4" start_page="156" end_page="156" type="metho">
    <SectionTitle>
* Homophone;
</SectionTitle>
    <Paragraph position="0"> A word that sounds the same as another word but has a different spelling (i.e., KANJI sequence) and a different meaning.</Paragraph>
    <Paragraph position="1"> * Homophone error; An error that occurs when a KANA sequence is converted into a wrong word that has the same KANA sequence (i.e., the same reading) as the correct word.</Paragraph>
  </Section>
  <Section position="5" start_page="156" end_page="156" type="metho">
    <SectionTitle>
* Semantic category;
</SectionTitle>
    <Paragraph position="0"> A class that groups nouns into concepts according to their meaning. For example, two different nouns can belong to the same semantic category [nature].</Paragraph>
  </Section>
  <Section position="6" start_page="156" end_page="156" type="metho">
    <SectionTitle>
3 A variety of homophone errors
</SectionTitle>
    <Paragraph position="0"> Handling homophone errors requires semantic information, such as the semantic restrictions between words in a sentence. We note that it is difficult, if not impossible, to handle all homophone errors uniformly.</Paragraph>
    <Paragraph position="1"> For example, within a compound noun, semantic restrictions hold mainly between adjacent words, whereas case-frame semantic restrictions span the whole sentence, so the two kinds of restriction call for different handling. Therefore, this paper focuses on the detection and correction of homophone errors in compound nouns.</Paragraph>
  </Section>
  <Section position="7" start_page="156" end_page="159" type="metho">
    <SectionTitle>
4 A method for handling homophone errors
</SectionTitle>
    <Paragraph position="0"> Tanaka and Yoshida (1987) pointed out that the collocation of words in compound nouns is semantically restricted: the presence of a compound noun component &quot;X&quot; restricts the set of words that can appear next to &quot;X&quot;. To describe this set, we use semantic categories instead of the words themselves, which significantly reduces the dictionary size.</Paragraph>
    <Paragraph position="1"> That is, a word is accepted as an immediate neighbor of &quot;X&quot; only if its semantic category is in the set defined by &quot;X&quot;.</Paragraph>
    <Paragraph position="2"> Handling homophone errors consists of two processes: error detection and error correction. In the error correction process, the correct candidates for detected homophone errors are indicated to the user automatically; the user is responsible for the final selection of the correct homophone from among them. The semantic restrictions used in both processes are described with semantic categories in a semantic restriction dictionary.</Paragraph>
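    <Paragraph>
The acceptance rule above can be sketched as a category lookup. This is a minimal illustration under stated assumptions, not REVISE's implementation: the word names, their category assignments, the allowed set for "X", and the function `accepts_prior_neighbor` are all hypothetical. Keying the allowed sets by semantic category rather than by individual word is what keeps the dictionary small.

```python
# Hypothetical category assignments for individual words.
SEMANTIC_CATEGORY = {
    "word_a": "nature",
    "word_b": "act",
}

# Hypothetical restriction for a compound-noun component "X": the semantic
# categories allowed immediately before it. Storing categories instead of
# individual words keeps one short set per component.
ALLOWED_BEFORE = {
    "X": {"nature", "scholarship"},
}


def accepts_prior_neighbor(component: str, neighbor: str) -> bool:
    """Return True if `neighbor` may appear immediately before `component`."""
    category = SEMANTIC_CATEGORY.get(neighbor)  # None for unknown words
    return category in ALLOWED_BEFORE.get(component, set())


print(accepts_prior_neighbor("X", "word_a"))  # True
print(accepts_prior_neighbor("X", "word_b"))  # False
```
    </Paragraph>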
    <Section position="1" start_page="156" end_page="157" type="sub_section">
      <SectionTitle>
4.1 Detecting homophone errors in compound nouns
</SectionTitle>
      <Paragraph position="0"> A compound noun that includes only one homophone, $h_i$, is represented as $w_p h_i w_n$, where $w_p$ and $w_n$ are words that have no homophones. The set of words with the same reading as $h_i$ is $H = \{h_1, h_2, \ldots, h_i, \ldots, h_m\}$. $PS_i$ is the set of semantic categories that can appear immediately before homophone $h_i$, and $NS_i$ is the set of semantic categories that can appear immediately after $h_i$. Here, we assume that the semantic restrictions of the words in set $H$ are mutually exclusive. That is, for every $i, j$ with $i \neq j$, $PS_i \cap PS_j = \emptyset$ and $NS_i \cap NS_j = \emptyset$. (1)</Paragraph>
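      <Paragraph>
Formula (1) is a pairwise disjointness condition over the homophone set $H$. A minimal Python sketch of that check follows; the restriction table is hypothetical toy data, and `is_exclusive` is an illustrative name, not part of REVISE.

```python
from itertools import combinations


def is_exclusive(restrictions: dict) -> bool:
    """Check formula (1) for one homophone set H.

    `restrictions` maps each homophone h_i to a pair (PS_i, NS_i) of sets of
    semantic categories. Returns True when, for every pair i != j, PS_i and
    PS_j are disjoint and NS_i and NS_j are disjoint.
    """
    for (ps_i, ns_i), (ps_j, ns_j) in combinations(restrictions.values(), 2):
        if not ps_i.isdisjoint(ps_j) or not ns_i.isdisjoint(ns_j):
            return False
    return True


# Hypothetical restrictions for a homophone set H.
H = {
    "h1": ({"nature", "scholarship"}, {"method"}),
    "h2": ({"industry"}, {"experiment"}),
}
print(is_exclusive(H))  # True

H["h3"] = ({"act"}, {"method"})  # shares "method" with h1 in NS
print(is_exclusive(H))  # False
```
      </Paragraph>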
      <Paragraph position="2"> In the compound noun $w_p h_i w_n$, when $h_i$ is the correct homophone, the semantic categories of $w_p$ and $w_n$ satisfy the semantic restrictions of $h_i$; that is, writing $\mathrm{cat}(w)$ for the semantic category of a word $w$, $\mathrm{cat}(w_p) \in PS_i$ and $\mathrm{cat}(w_n) \in NS_i$. (2) On the other hand, when $h_i$ is the wrong homophone, the correct homophone is some other $h_j \in H$, so by (2) the neighbors satisfy the restrictions of $h_j$, and by the exclusivity in (1) they do not satisfy the semantic restrictions of $h_i$: $\mathrm{cat}(w_p) \notin PS_i$ and/or $\mathrm{cat}(w_n) \notin NS_i$. (3) Therefore, we can detect homophone errors in compound nouns based on (2) and (3).</Paragraph>
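      <Paragraph>
Detection by formulae (2) and (3) reduces to two membership tests. The sketch below is illustrative only: `detect_error`, the use of `None` for an absent neighbor, and the category sets are assumptions rather than details taken from REVISE.

```python
from typing import Optional, Set


def detect_error(ps_i: Set[str], ns_i: Set[str],
                 cat_prev: Optional[str], cat_next: Optional[str]) -> bool:
    """Return True if homophone h_i is flagged by formula (3).

    cat_prev and cat_next are the semantic categories of w_p and w_n;
    None means the compound noun has no word on that side (an assumption
    made here so that two-word compounds can be checked too).
    """
    if cat_prev is not None and cat_prev not in ps_i:
        return True  # cat(w_p) is not in PS_i
    if cat_next is not None and cat_next not in ns_i:
        return True  # cat(w_n) is not in NS_i
    return False  # formula (2) holds for every neighbor present


# Hypothetical restrictions for one homophone h_i.
PS_I = {"scholarship", "industry"}
NS_I = {"experiment"}

print(detect_error(PS_I, NS_I, "industry", "experiment"))  # False
print(detect_error(PS_I, NS_I, "nature", None))             # True
```
      </Paragraph>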
    </Section>
    <Section position="2" start_page="157" end_page="157" type="sub_section">
      <SectionTitle>
4.2 Insufficient semantic discrimination
</SectionTitle>
      <Paragraph position="0"> Set H may contain two or more words whose PS and/or NS sets overlap, in which case the semantic category sets do not discriminate sufficiently between those words.</Paragraph>
      <Paragraph position="1"> That is, the semantic restrictions of some words in set H do not satisfy formula (1); for some $i \neq j$, $PS_i \cap PS_j \neq \emptyset$ and/or $NS_i \cap NS_j \neq \emptyset$.</Paragraph>
      <Paragraph position="3"> In this case, semantic categories that do not belong to $PS_i \cap PS_j$ or $NS_i \cap NS_j$ can still be used to detect homophone errors based on (2) and/or (3). Words whose semantic categories belong to $PS_i \cap PS_j$ or $NS_i \cap NS_j$, however, fail to distinguish $h_i$ from $h_j$, because such categories satisfy the semantic restrictions of both $h_i$ and $h_j$.</Paragraph>
      <Paragraph position="4"> It is very difficult to construct a semantic category system that would satisfy formula (1) for all words.</Paragraph>
      <Paragraph position="5"> Therefore, in REVISE, when a word whose semantic category belongs to $PS_i \cap PS_j$ or $NS_i \cap NS_j$ adjoins $h_i$ or $h_j$ in a compound noun, that homophone is detected as a homophone error. This may wrongly flag correct homophones as errors, but no error will be missed, which is a basic requirement of any text revision support system or text proofreading system.</Paragraph>
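      <Paragraph>
The conservative rule can be sketched for preceding words as follows. This is a hypothetical illustration: the function name and the labels h1 to h3 are invented, and the sets follow figure 5 only as far as the text states them (the homophone in the compound noun is known only to accept [act], so its set here contains nothing else).

```python
from typing import Dict, Set


def flag_prior(homophone: str, prior_sets: Dict[str, Set[str]], cat_prev: str) -> bool:
    """Flag `homophone` given the semantic category of its preceding word.

    prior_sets maps every word in the homophone set H to its PS set.
    """
    ps_i = prior_sets[homophone]
    if cat_prev not in ps_i:
        return True  # formula (3): the restriction of h_i is violated
    # cat_prev is in PS_i. If it is also in some PS_j (j != i), it lies in
    # PS_i ∩ PS_j and cannot discriminate h_i from h_j, so flag it anyway:
    # a correct homophone may be flagged, but no error is missed.
    return any(cat_prev in ps_j
               for other, ps_j in prior_sets.items() if other != homophone)


# Prior-neighbor sets for the reading "kikai" (labels h1-h3 are hypothetical).
PRIOR = {
    "h1": {"act"},                               # homophone in the compound noun
    "h2": {"body", "tool", "act"},               # another homophonic word
    "h3": {"dominate", "duty", "transaction"},   # the remaining homophonic word
}

print(flag_prior("h1", PRIOR, "act"))   # True: [act] is shared with h2
print(flag_prior("h3", PRIOR, "duty"))  # False: [duty] is unique to h3
```

Dropping the final `any(...)` test turns this back into the plain formula (3) check, which would leave h1 unflagged; keeping it trades extra flags for never missing an error.
      </Paragraph>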
    </Section>
    <Section position="3" start_page="157" end_page="157" type="sub_section">
      <SectionTitle>
4.3 Correcting homophone errors in compound nouns
</SectionTitle>
      <Paragraph position="0"> The correct homophone in a compound noun should satisfy the semantic restrictions established by its adjoining words; that is, the semantic category of each adjoining word of the homophone error should be included in the set of semantic categories that can appear immediately before or after the correct homophone. Hence, the correct candidates for a detected homophone error are the words that have the same KANA sequence (i.e., the same reading) as the error and that satisfy formula (2). When the semantic category sets of homophones partially overlap and the category of the adjoining word falls into the overlap, the homophone is detected as erroneous even if it is correct, as described in section 4.2. In this case, the detected homophone itself is also indicated as one of the correct candidates if it satisfies formula (2). Indicating only the candidates that satisfy formula (2) shortens the correction process, because the correct homophone is included among them.</Paragraph>
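      <Paragraph>
Candidate generation can be sketched as filtering the words that share the error's reading by formula (2). The sketch is hypothetical (function name and labels are invented) and reuses the figure 5 situation, where the category of the preceding word is [act]; the NS sets are left empty because that example has no following word. Note that the detected homophone itself survives the filter.

```python
from typing import Dict, List, Optional, Set, Tuple


def correct_candidates(reading: str,
                       by_reading: Dict[str, List[str]],
                       restrictions: Dict[str, Tuple[Set[str], Set[str]]],
                       cat_prev: Optional[str],
                       cat_next: Optional[str]) -> List[str]:
    """Words with the given reading that satisfy formula (2).

    restrictions maps each word to (PS, NS); None means no neighbor on that side.
    """
    result = []
    for word in by_reading.get(reading, []):
        ps, ns = restrictions[word]
        ok_prev = cat_prev is None or cat_prev in ps
        ok_next = cat_next is None or cat_next in ns
        if ok_prev and ok_next:
            result.append(word)
    return result


BY_READING = {"kikai": ["h1", "h2", "h3"]}  # h1 is the detected homophone
RESTRICTIONS = {
    "h1": ({"act"}, set()),
    "h2": ({"body", "tool", "act"}, set()),
    "h3": ({"dominate", "duty", "transaction"}, set()),
}

print(correct_candidates("kikai", BY_READING, RESTRICTIONS, "act", None))
# ['h1', 'h2']
```

Because h1 stays in the list, the user can simply keep the original spelling when it was correct.
      </Paragraph>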
    </Section>
    <Section position="4" start_page="157" end_page="157" type="sub_section">
      <SectionTitle>
4.4 Semantic restriction dictionary
</SectionTitle>
      <Paragraph position="0"> The semantic restriction dictionary describes which semantic categories can adjoin each homophone, either before or after it. Figure 3 shows the format of the semantic restriction dictionary. A record consists of the following four items: * homophone reading: in the error correction process, the dictionary is retrieved by homophone reading to find the correct candidates for the detected homophone error.</Paragraph>
      <Paragraph position="1"> * KANJI homophone spelling: the dictionary is retrieved by the KANJI homophone spelling in the error detection process, to determine whether the homophone is misused in the compound noun or not.</Paragraph>
      <Paragraph position="2"> * preceding or following: whether the semantic restrictions in this record apply to the preceding or the following word.</Paragraph>
      <Paragraph position="3"> * semantic restrictions: the set of semantic categories that can adjoin the homophone. Semantic categories that are included in the sets of two or more homophones are marked to indicate insufficient semantic discrimination.</Paragraph>
      <Paragraph position="4"> Ways of using the semantic restriction dictionary in both processes, error detection and error correction, will be described using examples in the next section.</Paragraph>
      <Paragraph position="5"> Figure 3: Format of the semantic restriction dictionary (columns: reading, spelling, preceding or following, semantic restrictions).</Paragraph>
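      <Paragraph>
One way to hold the four-item records is sketched below, with one index per process and the marking of shared categories computed per reading and side. The record layout mirrors the four items above, but the field names, `build_indexes`, and `marked_categories` are hypothetical, the spellings h1 to h3 are placeholder labels, and their sets reuse the [act]-related sets discussed with figure 5.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Record:
    reading: str                 # homophone reading (KANA), used for correction
    spelling: str                # KANJI spelling, used for detection
    side: str                    # "preceding" or "following"
    categories: FrozenSet[str]   # semantic restrictions for that side


def build_indexes(records: List[Record]) -> Tuple[Dict, Dict]:
    """Index records by reading (correction) and by spelling (detection)."""
    by_reading: Dict[str, List[Record]] = defaultdict(list)
    by_spelling: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_reading[rec.reading].append(rec)
        by_spelling[rec.spelling].append(rec)
    return by_reading, by_spelling


def marked_categories(records: List[Record], side: str) -> Set[str]:
    """Categories shared by two or more homophones on one side; these are
    the ones marked as showing insufficient semantic discrimination."""
    counts = Counter()
    for rec in records:
        if rec.side == side:
            counts.update(rec.categories)
    return {cat for cat, n in counts.items() if n >= 2}


RECORDS = [
    Record("kikai", "h1", "preceding", frozenset({"act"})),
    Record("kikai", "h2", "preceding", frozenset({"body", "tool", "act"})),
    Record("kikai", "h3", "preceding", frozenset({"dominate", "duty", "transaction"})),
]

by_reading, by_spelling = build_indexes(RECORDS)
print(sorted(marked_categories(by_reading["kikai"], "preceding")))  # ['act']
print([r.spelling for r in by_reading["kikai"]])  # ['h1', 'h2', 'h3']
```
      </Paragraph>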
    </Section>
    <Section position="5" start_page="157" end_page="159" type="sub_section">
      <SectionTitle>
4.5 Examples of handling homophone errors
</SectionTitle>
      <Paragraph position="0"> An example of detecting and correcting a homophone error in a compound noun that contains the homophone &quot;化学 (chemistry)&quot; is shown in figure 4. &quot;化学&quot;, whose reading is &quot;かがく (kagaku)&quot;, has the homophonic word &quot;科学 (science)&quot;, while the word preceding it in the compound noun has no homophonic word and belongs to the semantic category [nature]. As shown in figure 4, [nature] is not included in the set that represents the semantic restriction on the possible prior neighbors of &quot;化学&quot;. Therefore, &quot;化学&quot; is detected as a homophone error in the compound noun based on formula (3). Next, the error correction process is invoked. To find the correct candidates for the error, the semantic restriction dictionary is accessed using the reading &quot;かがく&quot;, and the semantic set of possible prior neighbors of the homophonic word &quot;科学&quot; is obtained. Because [nature] is included in this set, &quot;科学&quot; is indicated as a correct candidate for the homophone error &quot;化学&quot; based on formula (2).</Paragraph>
      <Paragraph position="2"> Figure 4: Detecting and correcting the homophone error &quot;化学&quot;. Error detection: [nature] is not in the semantic set of possible prior neighbors of &quot;化学&quot;. Error correction: the semantic set of possible prior neighbors of &quot;科学&quot; is {[organization], [region], [nature], [topography], [orb], [scholarship], [creation], [life], [temper]}; it contains [nature], so &quot;科学&quot; is indicated as a correct candidate.</Paragraph>
      <Paragraph position="5"> Next, consider an example that exhibits insufficient semantic discrimination. The compound noun shown in figure 5 consists of a word meaning &quot;operation&quot;, whose semantic category is [act] and which has no homophonic word, followed by a homophone meaning &quot;machine&quot;, whose reading is &quot;きかい (kikai)&quot;. This homophone has two homophonic words: another word also meaning &quot;machine&quot;, and &quot;機会 (chance)&quot;. As shown in figure 5, [act] is included in the semantic category set for the words that can precede the homophone in the compound noun, but it is also included in the set of another homophonic word (shown by outlining in figure 5). As mentioned in section 4.2, such a case is flagged as a homophone error even though the homophone is correct, as it actually is in this example. Therefore, the homophone is detected as an error, and the correction process is invoked. The semantic restriction dictionary is accessed using the reading &quot;きかい&quot;, and the semantic sets of possible prior neighbors of the two homophonic words are obtained. [act] is an element of the set for one of them but is not included in the set for the other. According to formulae (2) and (3), only that homophonic word and the original homophone are indicated as correct candidates. Although the correct homophone is detected as an error, the fact that the original homophone is itself among the candidates shortens the correction process.</Paragraph>
      <Paragraph position="7"> Figure 5: An example of insufficient semantic discrimination. Error detection: [act] belongs to the semantic set of possible prior neighbors of the homophone in the compound noun, but [act] can also appear prior to another homophonic word (shown by outlining in the figure). Error correction: the semantic restriction dictionary is accessed using the reading &quot;きかい&quot;; [act] ∈ {[body], [tool], [act]} for one homophonic word and [act] ∉ {[dominate], [duty], [transaction]} for the other, so the original homophone and the homophonic word whose set contains [act] are indicated as correct candidates.</Paragraph>
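      <Paragraph>
The first example can be replayed end to end in a few lines. This is a sketch under stated assumptions, not REVISE's code: the prior-neighbor set of 科学 is copied from figure 4, but the set for 化学 is a hypothetical stand-in, since the text only says that it does not contain [nature]. Following words are ignored because only the preceding word matters in this example, and `detect` and `candidates` are illustrative names.

```python
from typing import Dict, List, Set

BY_READING: Dict[str, List[str]] = {"かがく": ["化学", "科学"]}

PRIOR: Dict[str, Set[str]] = {
    "化学": {"industry"},  # hypothetical: the source only says [nature] is absent
    "科学": {"organization", "region", "nature", "topography", "orb",
             "scholarship", "creation", "life", "temper"},  # from figure 4
}


def detect(homophone: str, cat_prev: str) -> bool:
    """Formula (3), applied to the preceding word only."""
    return cat_prev not in PRIOR[homophone]


def candidates(reading: str, cat_prev: str) -> List[str]:
    """Words with the same reading that satisfy formula (2)."""
    return [w for w in BY_READING[reading] if cat_prev in PRIOR[w]]


cat_prev = "nature"  # category of the word preceding 化学 in the compound noun
if detect("化学", cat_prev):
    print(candidates("かがく", cat_prev))  # ['科学']
```
      </Paragraph>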
    </Section>
  </Section>
</Paper>