<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2125">
  <Title>Modelling Speech Repairs in German and Mandarin Chinese Spoken Dialogues</Title>
  <Section position="3" start_page="865" end_page="866" type="metho">
    <SectionTitle>
5 Verbatim translation: He should
</SectionTitle>
    <Paragraph position="0"> NEGATION-particle should promote engineer(word fragment) engineer so quickly DISCOURSE-particle.</Paragraph>
    <Paragraph position="1"> Sentential translation: He should should not be promoted to engineer(word fragment) engineer so soon.</Paragraph>
    <Paragraph position="2"> location of interruption and their repair structure.</Paragraph>
    <Section position="1" start_page="865" end_page="865" type="sub_section">
      <SectionTitle>
2.2 Mandarin Chinese Data: Taiwan
</SectionTitle>
      <Paragraph position="0"> Putonghua refers to Mandarin Chinese, was recorded in Taiwan. The speakers were all born in Taiwan and their first language is Taiwanese (Southern Min). The speakers were instructed in advance to speak in their usual conversational style, and they could speak on any topic they wanted, or even on no topic at all. Spontaneous, conversation-oriented speech data were thus obtained. A total of 40 speakers were recorded, in five dialogues and 30 monologues. Three dialogues, each about 20 minutes long, were analysed for the study in this paper. In total, 325 immediate speech repairs were identified in these three dialogues, and they were annotated according to the POS system developed for the Sinica Corpus (CKIP 1995).</Paragraph>
    </Section>
    <Section position="2" start_page="865" end_page="866" type="sub_section">
      <SectionTitle>
2.3 Comparison of Repair Data
</SectionTitle>
      <Paragraph position="0"> Some central statistics on the BAUFIX and TWPTH data are summarised in Table 1:  involved in repairs NP 38 % NP 41.2 % Table 1 shows that the percentage of problem words (words involved in speech repairs) is similar in both the BAUFIX and TWPTH corpora. With regard to the number of words (i.e. lexical items), 10.4% of all words in TWPTH are involved in repair sequences, whereas only 5.2% of words in BAUFIX are found in repair sequences. However, the figures are closer, 3.4% and 5.2% respectively, if we count characters instead of words in Chinese. Chinese words can be mono- or multi-syllabic. In Chinese, lexical items are composed of characters, where each character is an independent meaningful monosyllabic morpheme. This study can possibly provide insights into the role of characters in Chinese at the syntactic and morphological levels.</Paragraph>
      <Paragraph position="1"> Other interesting results that can be noted from Table 1 are the types of phrases involved in repair sequences. In BAUFIX, because of the task-oriented corpus setting, few verbs were used. Instead, the focus is more on NPs and PPs, since the speakers had to express exactly what the parts look like and where to place them. Unlike in BAUFIX, the TWPTH speakers did not have to give exact descriptions.</Paragraph>
      <Paragraph position="2"> Therefore, a considerable number of verbs were used, which we can observe from the high percentage of VPs involved in repair sequences. However, in both corpora, NPs make up a high percentage, 38% and 41.2% respectively. For this reason, NPs will be further investigated for their syntactic structures.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="866" end_page="867" type="metho">
    <SectionTitle>
3 Analysis of Repair Syntax in NPs
</SectionTitle>
    <Paragraph position="0"> This section is concerned with the distribution and patterns of NPs in the context of repair syntax in German and Mandarin Chinese.</Paragraph>
    <Section position="1" start_page="866" end_page="866" type="sub_section">
      <SectionTitle>
3.1 Regular Patterns
</SectionTitle>
      <Paragraph position="0"> Among 190 NPs involved in repair sequences in BAUFIX, there are 147 NPs for which the internal structure within the NPs can be given exactly as follows (Tseng 1999),</Paragraph>
      <Paragraph position="2"> where the other 43 NPs in repairs are abridged repairs; therefore, their internal structures cannot be determined.</Paragraph>
      <Paragraph position="3"> Compared with German NP-repairs, Chinese speakers produce rather simple repair sequences in NPs. Only 62.7% (84 out of 134) of Chinese repairs found in the corpus are single NP phrases. The rest of the repair sequences in which NPs are involved contain other phrasal categories such as verb phrases or adverbials. Since these dialogues are concerned with normal, everyday conversations, no complicated noun phrases were used. These NP-repairs have the following structures:</Paragraph>
      <Paragraph position="5"> where QUAN denotes numbers and CLASS means classifiers in Chinese.</Paragraph>
    </Section>
    <Section position="2" start_page="866" end_page="867" type="sub_section">
      <SectionTitle>
3.2 Syntactic Formalization
</SectionTitle>
      <Paragraph position="0"> 83.4% out of 147 specific NP repairs in German start at phrase-initial positions and end at phrase-final positions. In the Chinese data, only three NP-repairs among the 84 single NP-repairs were not traced back to the phrase-initial position. Phrasal boundaries play a role while speech repairs are produced in both languages, especially phrase-initial positions before the reparandum. The syntactic structure of the majority of German and Chinese repairs in NPs can be formally described by means of phrasal modelling.</Paragraph>
      <Paragraph position="1"> Figure 1 : Phrasal Modelling of German NP-Repairs  Figure 1 models 50% of NP repair sequences of the type DET ADJ N in BAUFIX, where the reflexive arrow on DET designates the sequence  DET DET. The first DET can be a fragmentary or a false determiner, whereas the second DET is supposed to be the corrected word accordingly. The initial element DET in a German noun phrase, i.e. the phrase-initial boundary, is the most frequent location at which a repair is restarted. In other words, while producing repairs, speakers tend to go back to the determiner to repair NPs.</Paragraph>
      <Paragraph position="2"> Although the data investigated here is not necessarily representative for most Chinese speakers, this result does not empirically confirm Chui's conclusion (1996) that syntax should play a less important role than the lexical complexity and the quantity constraint of the to-be-repaired lexical items. Instead, the phrase-initial position seems to be the location to restart repairs in Chinese. Therefore, the results indicate that the lexical content of the to-be-repaired items tends to play a less important role than syntax in both languages.</Paragraph>
    </Section>
    <Section position="3" start_page="867" end_page="867" type="sub_section">
      <SectionTitle>
3.3 Cross-Linguistic Differences
</SectionTitle>
      <Paragraph position="0"> In contrast to the similarities between German and Chinese speech repairs mentioned in the sections above, differences can also be identified.</Paragraph>
      <Paragraph position="1"> Some differences can be noted through a comparison of repair syntax in German and Mandarin Chinese. It is more common for NPs in German to be repaired directly within NPs, whereas in Chinese NPs are often repaired within a more complex syntactic context, i.e.</Paragraph>
      <Paragraph position="2"> Chinese repairs are composed of more than one phrasal category. To investigate the syntactic and morphological distribution of speech repairs in both languages, the length of retracing in both languages is examined. The results are presented in Table 2.</Paragraph>
      <Paragraph position="3">  No similarity between German and Chinese was obtained by checking the number of retraced words in Chinese, because the majority of &quot;the retraced parts&quot; in Chinese are word fragments. But it is clearly shown in Table 2 that German words and Chinese characters play a similar role in the production of speech repairs. Whether it has to do with the syllabic weighting in both languages or the semantic content of characters in Chinese needs further linguistic investigation.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="867" end_page="868" type="metho">
    <SectionTitle>
4 Formal Modelling
</SectionTitle>
    <Paragraph position="0"> With regard to the relation between repair syntax and the editing structure of repairs, the syntactic regularity in German and Chinese NP-repairs can be modelled in the form of finite state automata, instead of only looking at surface structure. We again take German as an example.</Paragraph>
    <Section position="1" start_page="867" end_page="868" type="sub_section">
      <SectionTitle>
4.1 Finite State Automata
</SectionTitle>
      <Paragraph position="0"> Finite state automata similar to M with ε-transitions, denoted by a quintuple &lt;Q, Σ, δ, q0, F&gt; and defined as follows, can model more than 80% of overall German NP-repairs:</Paragraph>
      <Paragraph position="2"> δ(q0, det) = q1, δ(q1, adj) = q2, δ(q2, n) = q3, δ(q0, det-d) = qf, δ(q1, adj-d) = qf, δ(q2, n-d) = qf, δ(qf, ε) = q0, δ(q1, ε) = q0, δ(q2, ε) = q0, δ(q3, ε) = q0. M is graphically illustrated in Figure 2. Several particularities are described in this automaton. First, when NP-repairs are produced, no matter where the real problem word is located (it can be det-d, adj, adj-d, n or n-d), speakers tend to go back to the phrase-initial position to restart their speech. In the case of NPs, the determiner is the most frequent location for re-initiating correct speech. The final position is in most cases phrase-final. Therefore, in M, there is only one final state, q3. This models the coherence within NPs in German: speakers usually complete phrases after they have started them.</Paragraph>
      <Paragraph position="3"> 6 Det-d, adj-d, and n-d denote fragmentary (or false) determiners, adjectives and nouns respectively.</Paragraph>
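      <Paragraph position="4"> The transition function of M can be tried out with a small simulation. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: the names DELTA, START, FINAL, eps_closure and accepts are hypothetical, the string "eps" stands for the ε symbol, and the ε-transition from q3 back to q0 is an assumed reading of a transition whose source state is illegible in the definition above.

```python
# Transition table of M, keyed by (state, symbol). "eps" marks an
# epsilon-transition, i.e. a move that consumes no input tag.
DELTA = {
    ("q0", "det"): {"q1"},
    ("q1", "adj"): {"q2"},
    ("q2", "n"): {"q3"},
    ("q0", "det-d"): {"qf"},   # fragmentary or false determiner
    ("q1", "adj-d"): {"qf"},   # fragmentary or false adjective
    ("q2", "n-d"): {"qf"},     # fragmentary or false noun
    ("qf", "eps"): {"q0"},     # restart at the phrase-initial position
    ("q1", "eps"): {"q0"},
    ("q2", "eps"): {"q0"},
    ("q3", "eps"): {"q0"},     # ASSUMPTION: source state illegible in the original
}
START = "q0"
FINAL = {"q3"}


def eps_closure(states):
    """Return every state reachable from `states` by eps-transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in DELTA.get((state, "eps"), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(tags):
    """Simulate M on a tag sequence; accept if q3 is active after the last tag."""
    current = eps_closure({START})
    for tag in tags:
        moved = set()
        for state in current:
            moved.update(DELTA.get((state, tag), set()))
        current = eps_closure(moved)
    return not current.isdisjoint(FINAL)


print(accepts(["det", "adj", "n"]))                   # fluent NP
print(accepts(["det", "adj-d", "det", "adj", "n"]))   # repair restarted at DET
print(accepts(["det", "adj"]))                        # NP left incomplete
```

The three calls print True, True and False. Tracking a set of active states, rather than a single state, is one simple way to handle the ε-transitions: after every tag the simulation also keeps q0 active, which mirrors the observation that speakers may return to the determiner from any point in the phrase.</Paragraph>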
    </Section>
    <Section position="2" start_page="868" end_page="868" type="sub_section">
      <SectionTitle>
4.2 Discussion
</SectionTitle>
      <Paragraph position="0"> The FSA M suggested above is suitable for the syntactic characteristics of speech repairs in both German and Chinese. Repair syntax has been taken into consideration from a procedural point of view, instead of simply describing the sequential structures. In this model, probabilities (for instance, word frequency or acoustic features) on the arcs can be implemented to operate a parsing system that can deal with speech repairs. However, speech data of appropriate size are needed to obtain significant probabilities.</Paragraph>
      <Paragraph position="1"> For more linguistic insights into the word-character relations in Chinese or across languages, i.e. the overlapping syntactic and morphological role of phrasal boundaries, further modification is needed to make the repair processing and detection in the Chinese case more realistic.</Paragraph>
      <Paragraph position="2"> Conclusion This paper has shown that speech repairs not only play a decisive role in speech processing technology systems, they also provide empirical evidence and insights into the inherent linguistic characteristics of languages. Based on the results of corpus analysis, similar syntactic features of speech repairs in German and Chinese were identified and the repair syntax was formally modelled by means of phrasal modelling and finite state automata. Discrepancy at the morphological level of both languages was shown and more detailed investigations are necessary. Further analyses on acoustic-prosodic features of cross-linguistic data are currently being carried out.</Paragraph>
    </Section>
  </Section>
</Paper>