<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1010">
  <Title>Applying Repair Processing in Chinese Homophone Disambiguation</Title>
  <Section position="4" start_page="58" end_page="59" type="metho">
    <SectionTitle>
3 Spoken Corpus
</SectionTitle>
    <Paragraph position="0"> The spoken corpus used in this paper consists of two commonplace, everyday conversations among friends.</Paragraph>
    <Paragraph position="1"> Each is about forty minutes long. The two conversations involve four and five speakers, respectively.</Paragraph>
    <Paragraph position="2"> In total, this corpus contains 5,395 utterances, 22,409 words and 2,602 turns. There are 440 self-repairs in all (see footnote 5).</Paragraph>
    <Paragraph position="3"> On average, 17% of turns contain at least one repair. Table 1 lists the frequency distribution of each type of repair in the two conversations.</Paragraph>
    <Paragraph position="4"> Footnote 5: The speech repairs discussed in this paper are all self-repairs. That is, only repairs accomplished by the same speaker are considered, because this is the most common form of repair. Nevertheless, the present study includes repairs placed across different turns.</Paragraph>
    <Paragraph position="5"> In Table 1, repetition repairs form the majority of the repairs (72.62% in conversation 1 and 73.16% in conversation 2). Addition repairs account for 13.69% and 9.56%, and replacement repairs for 9.52% and 8.09%, in conversations 1 and 2, respectively. The rest (4.17% in conversation 1 and 9.19% in conversation 2) are the most complex type of repair, i.e., abandon repairs. Because this paper corrects repairs based on acoustic and prosodic cues, the Chinese characters in the spoken corpus are manually converted into the corresponding syllables (see footnote 6).</Paragraph>
  </Section>
  <Section position="5" start_page="59" end_page="59" type="metho">
    <SectionTitle>
4 Baseline Model
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
4.1 Simple Pattern Matching
</SectionTitle>
      <Paragraph position="0"> Because repetition repairs form the majority, we focus on them in this paper. Although repetition repairs have a simple surface form, correcting this kind of speech repair is not trivial.</Paragraph>
      <Paragraph position="1"> That is, a simple pattern matching mechanism cannot work perfectly. Table 2 explains this point. A repair is proposed when a string of syllables repeats within an utterance or between two consecutive utterances.</Paragraph>
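The proposal rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `find_adjacent_repeats` is hypothetical, syllables are assumed to be given as a list of strings, and a "repeat" is assumed to mean a syllable string immediately followed by a copy of itself. Two consecutive utterances could be handled by concatenating their syllable lists before calling the function.

```python
from typing import List, Tuple


def find_adjacent_repeats(syllables: List[str]) -> List[Tuple[int, int]]:
    """Return (start, length) for every syllable string that is
    immediately followed by an identical copy of itself."""
    proposals = []
    n = len(syllables)
    for start in range(n):
        # The longest string that can still be followed by a full copy.
        max_len = (n - start) // 2
        for length in range(1, max_len + 1):
            first = syllables[start:start + length]
            second = syllables[start + length:start + 2 * length]
            if first == second:
                proposals.append((start, length))
    return proposals


# Toy utterances (illustrative only, not taken from the corpus).
print(find_adjacent_repeats(["wo3", "wo3", "yao4", "qu4"]))
print(find_adjacent_repeats(["ni3", "hao3", "ni3", "hao3", "ma5"]))
```

Because every surface repetition is proposed, a matcher of this kind is expected to over-generate, which is the precision problem discussed next.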
      <Paragraph position="2">  Columns 2, 3 and 4 of Table 2 denote the total number of repetition repairs, the number of repairs proposed by the simple pattern matcher and the number of correctly proposed repairs, respectively. For example, 243 repairs are proposed by the simple pattern matcher in conversation 1, but only 118 of them are correct. That is, there are 125 false alarms. Since there are 122 repetition repairs in conversation 1, 4 repetition repairs are not captured.</Paragraph>
      <Paragraph position="3"> All four are English repairs; they are missed because only Chinese repairs are considered.</Paragraph>
      <Paragraph position="4"> Although this technique achieves a recall rate of 97.82%, it has a relatively low precision rate of 47.94%.</Paragraph>
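As a worked check of the conversation 1 figures quoted above, the following sketch computes precision and recall from the three counts. The helper name `precision_recall` is an assumption introduced here. Note that the resulting rates are for conversation 1 only and therefore differ slightly from the overall rates (47.94% and 97.82%) reported for both conversations.

```python
from typing import Tuple


def precision_recall(proposed: int, correct: int, actual: int) -> Tuple[float, float]:
    """Precision = correct / proposed; recall = correct / actual."""
    return correct / proposed, correct / actual


# Conversation 1 counts as stated in the text.
proposed, correct, actual = 243, 118, 122
precision, recall = precision_recall(proposed, correct, actual)

false_alarms = proposed - correct   # proposals that are not real repairs
missed = actual - correct           # real repairs that were not proposed

print(f"false alarms: {false_alarms}, missed: {missed}")
print(f"precision: {precision:.2%}, recall: {recall:.2%}")
```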
      <Paragraph position="5"> Since the simple pattern matching mechanism cannot solve this problem properly, two additional cues are first considered in the baseline model: the length of the repeated syllable string and the number of inter-utterances. Footnote 6: Because we focus our efforts on correcting speech repairs, the identification of acoustic and prosodic cues is not discussed in this paper.</Paragraph>
    </Section>
    <Section position="2" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
4.2 The Length of the Repeated Syllable
String
</SectionTitle>
      <Paragraph position="0"> How many syllables are repeated in repetition repairs is an interesting problem in cognition. Table 3 lists the distribution of the lengths of the repeated syllable strings in the repetition repairs.</Paragraph>
      <Paragraph position="1">  The length ranges from 1 to 4. Thus, when a string of syllables repeats and the length of this string is greater than 4, we do not regard it as a repetition repair.</Paragraph>
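The length cue amounts to a simple filter over candidate repeats. The sketch below assumes candidates are represented as hypothetical `(start, length)` pairs, with `length` counted in syllables; only the threshold of 4 comes from the text.

```python
from typing import List, Tuple

# Longest repeated syllable string observed in the corpus, per the text.
MAX_REPEAT_LENGTH = 4


def filter_by_length(
    proposals: List[Tuple[int, int]],
    max_length: int = MAX_REPEAT_LENGTH,
) -> List[Tuple[int, int]]:
    """Keep only candidates whose repeated string has at most max_length syllables."""
    return [(start, length) for (start, length) in proposals if max_length >= length]


# Toy candidates: the repeat of length 6 is discarded.
print(filter_by_length([(0, 1), (3, 4), (10, 6)]))
```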
    </Section>
    <Section position="3" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
4.3 The Number of Inter-Utterances
</SectionTitle>
      <Paragraph position="0"> In human conversation, most repetition repairs occur within an utterance or between two consecutive utterances of one speaker, without interruption by other speakers. That is, if many utterances by other speakers are inserted between two utterances of the same speaker, repetition repairs usually do not occur. The spoken corpus confirms this point.</Paragraph>
      <Paragraph position="1"> * In total, 13.69% of repetition repairs occur in the same utterance.</Paragraph>
      <Paragraph position="2"> * In total, 71.66% of repetition repairs occur between two consecutive utterances without interruption by other speakers.</Paragraph>
      <Paragraph position="3"> * Only 0.32% of repetition repairs occur across more than 3 utterances by other speakers.</Paragraph>
      <Paragraph position="4"> According to this heuristic rule, when more than 3 utterances by other speakers interrupt the speech of a speaker, we do not check whether there is a repetition repair.</Paragraph>
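The inter-utterance heuristic can be sketched as follows, assuming the conversation is available as a list of speaker labels, one per utterance. The function names and this representation are assumptions made for illustration; only the threshold of 3 intervening utterances comes from the text.

```python
from typing import List, Optional, Tuple

# Maximum number of other-speaker utterances allowed in between, per the text.
MAX_INTERVENING = 3


def previous_own_utterance(speakers: List[str], index: int) -> Optional[Tuple[int, int]]:
    """Return (previous_index, intervening_count) for the nearest earlier
    utterance by the same speaker, or None if there is none."""
    speaker = speakers[index]
    for prev in range(index - 1, -1, -1):
        if speakers[prev] == speaker:
            # Everything strictly between prev and index is by other speakers.
            return prev, index - prev - 1
    return None


def should_check_repair(
    speakers: List[str], index: int, max_intervening: int = MAX_INTERVENING
) -> bool:
    """Decide whether utterance `index` should be compared with the
    speaker's previous utterance for a cross-utterance repetition repair."""
    found = previous_own_utterance(speakers, index)
    if found is None:
        return False
    _, intervening = found
    return max_intervening >= intervening


# Toy speaker sequence: A is interrupted by four utterances of B and C.
turns = ["A", "B", "C", "B", "C", "A"]
print(should_check_repair(turns, 5))   # four intervening utterances
print(should_check_repair(turns, 3))   # B's previous utterance is one turn away
```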
    </Section>
  </Section>
  <Section position="6" start_page="59" end_page="60" type="metho">
    <SectionTitle>
5 Advanced Model
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="59" end_page="60" type="sub_section">
      <SectionTitle>
5.1 Unfilled Pause (...)
</SectionTitle>
      <Paragraph position="0"> In spontaneous or conversational speech, we find that there is a significant unfilled pause (silence) between the repaired segment and the repairing segment of a repetition repair (see footnote 7), whereas actual or intended repeated characters (syllables) usually have no unfilled pause between them. (1) and (2) are examples. After the unfilled pause information is added to the baseline model, the experimental results for the two conversations are listed below.</Paragraph>
      <Paragraph position="1">  The experimental results show that the precision rate increases to 84.14%, while the recall rate decreases to 76.01%.</Paragraph>
    </Section>
    <Section position="2" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
5.2 Glottal Stop (%)
</SectionTitle>
      <Paragraph position="0"> The glottal stop functions similarly to the unfilled pause.</Paragraph>
      <Paragraph position="1"> That is, a glottal stop may occur between the repaired segment and the repairing segment of a repetition repair, whereas actual repeated characters usually do not have such a marker between them. (1) is an example. Table 5 shows the results when the glottal stop information is used to enhance the baseline model.</Paragraph>
      <Paragraph position="2">  From Table 5, we find that the glottal stop is a more reliable cue than the unfilled pause, but it does not occur as frequently. These points are verified by the high precision rate (97.41%) and the low recall rate (35.20%). When the unfilled pause information and the glottal stop information are both applied to the baseline model, the experimental results for the two conversations are as listed in Table 6. Both the precision rate (84.71%) and the recall rate (82.87%) are better than those in the former models.</Paragraph>
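The two acoustic cues can be combined in a single check, sketched below under stated assumptions: the transcript is assumed to be a token list in which an unfilled pause is written "..." and a glottal stop "%" (the notation suggested by the subsection headings), and a repair is proposed only when exactly one such marker separates the two copies of the repeated string. The function name and token representation are hypothetical.

```python
from typing import List, Tuple

PAUSE = "..."      # unfilled pause marker (assumed transcript notation)
GLOTTAL = "%"      # glottal stop marker (assumed transcript notation)
MARKERS = {PAUSE, GLOTTAL}


def find_marked_repeats(tokens: List[str], max_length: int = 4) -> List[Tuple[int, int]]:
    """Return (start, length) for each syllable string that is followed by
    one pause or glottal-stop marker and then by an identical copy."""
    proposals = []
    n = len(tokens)
    for start in range(n):
        for length in range(1, max_length + 1):
            marker_pos = start + length
            end = marker_pos + 1 + length
            if end > n:
                break  # longer strings cannot fit either
            first = tokens[start:marker_pos]
            second = tokens[marker_pos + 1:end]
            has_marker = tokens[marker_pos] in MARKERS
            # The repeated string itself must consist of syllables only.
            if has_marker and first == second and not MARKERS.intersection(first):
                proposals.append((start, length))
    return proposals


# Toy examples: a pause-marked repair, and an unmarked repetition that is ignored.
print(find_marked_repeats(["wo3", "...", "wo3", "qu4"]))
print(find_marked_repeats(["kan4", "kan4", "shu1"]))
```

Requiring a marker trades recall for precision, which matches the direction of the changes reported for Tables 4 to 6.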
      <Paragraph position="3"> Footnote 7: Because filled pauses such as um, un and er do not occur frequently in the spoken corpus, the effects of filled pauses are not demonstrated in this paper.</Paragraph>
    </Section>
    <Section position="3" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
5.3 Two Consecutive Equal Utterances
</SectionTitle>
      <Paragraph position="0"> If two consecutive utterances are equal and long enough, repetition repairs usually do not occur within or between them, because a long matched string usually denotes emphasis. This cue can eliminate some implausible repairs, so the precision rate can be increased.</Paragraph>
    </Section>
    <Section position="4" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
5.4 Cue Patterns
</SectionTitle>
      <Paragraph position="0"> In Chinese conversation, some words or phrases are frequently repeated, but they are not repairs. Typical examples are interjections (e.g., o2, "oh") and phrase-final particles (e.g., a5, "a"). These patterns, called type I cue patterns, are used to increase the precision rate. That is, a repair is proposed when a string of syllables repeats, satisfies the criteria of the baseline model, unfilled pause and glottal stop, and the first syllable of the string does not belong to the type I cue patterns.</Paragraph>
      <Paragraph position="1"> In contrast to type I cue patterns, another kind of pattern, type II cue patterns, is considered in order to increase the recall rate. That is, some repeated syllable strings do not satisfy the criteria of unfilled pause and glottal stop but are nevertheless usually repetition repairs. Typical examples are pronouns such as 我 (wo3, I) and 你 (ni3, you). Based on type II cue patterns, additional repairs are proposed when a string of syllables repeats and does not satisfy the criteria of unfilled pause and glottal stop, but the first syllable of the string belongs to the type II cue patterns.</Paragraph>
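The interaction of the two kinds of cue patterns can be summarized in a small decision function. This is a sketch under assumptions: the sets below contain only the example syllables mentioned in the text (the full pattern lists are not given here), the function name is hypothetical, and the candidate is assumed to have already passed the baseline criteria.

```python
from typing import List

# Example members only; the complete cue pattern lists are not given in the text.
TYPE_I_CUES = {"o2", "a5"}      # interjections and phrase-final particles
TYPE_II_CUES = {"wo3", "ni3"}   # pronouns


def propose_repair(repeated: List[str], has_pause_or_glottal: bool) -> bool:
    """Decide whether a repeated syllable string that already satisfies the
    baseline criteria should be proposed as a repetition repair."""
    first = repeated[0]
    if first in TYPE_I_CUES:
        return False              # frequent non-repair repetition
    if has_pause_or_glottal:
        return True               # acoustic cue present
    return first in TYPE_II_CUES  # no acoustic cue, rescued by type II pattern


print(propose_repair(["a5"], has_pause_or_glottal=True))
print(propose_repair(["wo3"], has_pause_or_glottal=False))
print(propose_repair(["kan4"], has_pause_or_glottal=False))
```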
      <Paragraph position="2"> When all the cues proposed in the previous subsections are applied to the baseline model, the final experimental results are as listed in Table 7.</Paragraph>
      <Paragraph position="3">  The experimental results show that a precision rate of 93.87% and a recall rate of 90.65% are achieved.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="60" end_page="60" type="metho">
    <SectionTitle>
6 Repair Processing in Chinese
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
Homophone Disambiguation
</SectionTitle>
      <Paragraph position="0"> Mandarin Chinese has approximately 1,300 syllables, 13,094 commonly used characters, and more than 100,000 words. Each character is pronounced as a syllable, and many syllables are shared by several characters. Some syllables even correspond to more than 100 characters. Thus, Chinese homophone disambiguation is difficult but important in Chinese phonetic input methods and Chinese speech recognition systems.</Paragraph>
      <Paragraph position="1"> The problem of Chinese homophone disambiguation is defined as how to correctly convert a sequence of syllables S into the corresponding sequence of characters C. Thus, Chinese homophone disambiguation can be regarded as a process of syllable-to-character conversion. Let $S = \langle s_1, s_2, s_3, \ldots, s_n \rangle$ be a syllable string and $C = \langle c_1, c_2, c_3, \ldots, c_n \rangle$ be one corresponding character string. Here, $s_i$ denotes one of the 1,300 Chinese syllables and $c_i$ denotes one of the 13,094 Chinese characters. The conversion can be formulated as follows.</Paragraph>
      <Paragraph position="3"> The denominator does not affect the maximization; it merely serves as a constant multiplier. The above formula therefore becomes as follows.</Paragraph>
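The step described above is the usual application of Bayes' rule. The following is a hedged reconstruction consistent with the surrounding definitions, not a copy of the original typeset equation; the symbol $\hat{C}$ for the selected character string is notation introduced here.

```latex
\hat{C} \;=\; \arg\max_{C} P(C \mid S)
        \;=\; \arg\max_{C} \frac{P(S \mid C)\, P(C)}{P(S)}
\qquad\Longrightarrow\qquad
\hat{C} \;=\; \arg\max_{C} P(S \mid C)\, P(C)
```

Since $P(S)$ is the same for every candidate $C$, dropping it leaves the maximizing character string unchanged.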
      <Paragraph position="4"> only. Because repairs introduce much noise, direct application of this method without repair processing is expected to have worse performance (see footnote 8).</Paragraph>
      <Paragraph position="5"> To evaluate the effects of repair processing in this application, we count how many syllables in the repairing segments are wrongly converted and how many wrongly converted syllables are recovered after the repair processing. The experimental results are  converted syllables before the repair processing.</Paragraph>
      <Paragraph position="6"> Columns 3 and 4 then indicate the performance changes. They are classified into two types: Wrong-to-Correct (WC) and Correct-to-Wrong (CW). In the WC type, a wrongly converted syllable is changed to the correct one by the repair processing. In the CW type, a syllable that is correctly converted before the repair processing is changed to a wrong one after it. The performance of the repair processing can be evaluated as the net gain, defined as follows.</Paragraph>
      <Paragraph position="7"> Net Gain = # of WC - # of CW. In Table 8, the number of original errors is 126. After the repair processing, the number of errors is reduced to 63. That is, 63 (50%) of the errors are recovered by the repair processing. This shows that the repair processing has a substantial effect in these experiments.</Paragraph>
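The net gain measure and the resulting error reduction can be sketched as follows. The function names are hypothetical. The totals 126 and 63 are the figures stated in the text; the separate WC and CW counts are not given here, so the call to `net_gain` uses toy values that are purely illustrative.

```python
def net_gain(wrong_to_correct: int, correct_to_wrong: int) -> int:
    """Net Gain = number of WC changes minus number of CW changes."""
    return wrong_to_correct - correct_to_wrong


def errors_after(errors_before: int, gain: int) -> int:
    """Errors remaining once the net gain is subtracted."""
    return errors_before - gain


def recovery_rate(errors_before: int, gain: int) -> float:
    """Fraction of the original errors removed by repair processing."""
    return gain / errors_before


# Toy WC/CW counts (illustrative only, not from Table 8).
print(net_gain(wrong_to_correct=10, correct_to_wrong=3))

# Totals stated in the text: 126 original errors, net gain of 63.
print(errors_after(126, 63), recovery_rate(126, 63))
```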
    </Section>
  </Section>
  <Section position="8" start_page="60" end_page="60" type="metho">
    <SectionTitle>
$= \arg\max_{C} P(S \mid C)\, P(C)$
</SectionTitle>
    <Paragraph position="0"> As most Chinese characters are unambiguous in their pronunciation (Sproat, 1990), we assume that $P(S \mid C)$ is one in the general case. Finally, this formula is simplified to a Markov character bigram model, shown below.</Paragraph>
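A character bigram model of this kind is conventionally written as below. This is a reconstruction under assumptions rather than the original typeset equation: $\hat{C}$ denotes the selected character string, and $c_0$ is assumed to be a sentence-start symbol.

```latex
\hat{C} \;=\; \arg\max_{C} P(C)
        \;\approx\; \arg\max_{C} \prod_{i=1}^{n} P(c_i \mid c_{i-1})
```

The first equality uses the assumption $P(S \mid C) = 1$ for every character string $C$ whose pronunciation is $S$; the approximation is the first-order Markov assumption.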
    <Paragraph position="2"> The language model is usually trained on fluent text</Paragraph>
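Decoding under a character bigram model is commonly done with a Viterbi search. The sketch below is one possible realization, not the authors' system: the function name, the sentence-start symbol "^", the probability floor for unseen bigrams and the toy probabilities are all assumptions made for illustration. The `candidates` mapping encodes the assumption that each listed character is pronounced as the given syllable.

```python
import math
from typing import Dict, List, Tuple

START = "^"  # hypothetical sentence-start symbol


def decode(
    syllables: List[str],
    candidates: Dict[str, List[str]],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[str]:
    """Viterbi search for the character string that maximizes the product
    of bigram probabilities P(c_i | c_{i-1}) over the syllable sequence."""
    # best[c] = (log probability, path) of the best sequence ending in c
    best = {START: (0.0, [])}
    for syllable in syllables:
        new_best = {}
        for char in candidates[syllable]:
            scored = [
                (score + math.log(bigram.get((prev, char), floor)), path + [char])
                for prev, (score, path) in best.items()
            ]
            new_best[char] = max(scored, key=lambda item: item[0])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy model (probabilities invented for illustration only).
toy_candidates = {"wo3": ["我"], "shi4": ["是", "事"]}
toy_bigram = {("^", "我"): 0.5, ("我", "是"): 0.4, ("我", "事"): 0.01}
print(decode(["wo3", "shi4"], toy_candidates, toy_bigram))
```

Log probabilities are used to avoid numerical underflow on long syllable strings.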
  </Section>
  <Section position="9" start_page="60" end_page="60" type="metho">
    <SectionTitle>
7 Concluding Remarks
</SectionTitle>
    <Paragraph position="0"> No spoken language system will perform well without treating speech repairs. Correcting speech repairs provides a more reliable environment for subsequent processing. This paper employs acoustic and prosodic cues to correct repetition repairs.</Paragraph>
    <Paragraph position="1"> The experimental results show that our method can</Paragraph>
  </Section>
  <Section position="10" start_page="60" end_page="60" type="metho">
    <SectionTitle>
8 Stolcke and Shriberg (1996) reported that "cleaning up"
</SectionTitle>
    <Paragraph position="0"> disfluencies reduces perplexity.</Paragraph>
  </Section>
  <Section position="11" start_page="60" end_page="62" type="metho">
    <SectionTitle>
9 The Academia Sinica Balanced Corpus (1995) is adopted as the
</SectionTitle>
    <Paragraph position="0"> training corpus in this experiment. It contains text of several categories and includes approximately 360,000 sentences comprising about 3,300,000 characters.</Paragraph>
    <Paragraph position="1">  achieve a precision rate of 93.87% and a recall rate of 90.65%. At the same time, 50% of the errors in the repairing segments are eliminated in Chinese homophone disambiguation.</Paragraph>
    <Paragraph position="2"> O'Shaughnessy (1992) claims that most speech repairs do not have lengthening prior to the hesitation pause. If this cue is used in our model, the precision rate increases slightly (to 95.37%), but the recall rate decreases greatly (to 76.95%).</Paragraph>
    <Paragraph position="3"> Although our method performs well on repetition repairs, other kinds of repairs, such as addition, replacement and abandon repairs, are not addressed in this paper. They have more complex surface forms and should be investigated further.</Paragraph>
  </Section>
</Paper>