<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1021">
  <Title>PCFGs with Syntactic and Prosodic Indicators of Speech Repairs</Title>
  <Section position="4" start_page="161" end_page="162" type="metho">
    <SectionTitle>
2 Prosodic disjuncture
</SectionTitle>
    <Paragraph position="0"> Everyday experience as well as acoustic analysis suggests that the syntactic interruption in speech repairs is typically accompanied by a change in prosody (Nakatani and Hirschberg, 1994; Shriberg, 1994). For instance, consider example (2): the jehovah's witness or [ or ] mormons or someone. Its spectrogram, shown in Figure 1, reveals a noticeable pause between the two occurrences of or, and an unexpected glottalization at the end of the first one. Both kinds of cues have been advanced as explanations for human listeners' ability to identify the reparandum even before the repair occurs.</Paragraph>
    <Paragraph position="1"> Retaining only the second explanation, Lickley (1996) proposes that there is no &quot;edit signal&quot; per se but that repair is cued by the absence of smooth formant transitions and lack of normal juncture phenomena.</Paragraph>
    <Paragraph position="2"> One way to capture this notion in the syntax is to enhance the input with a special disjuncture symbol. This symbol can then be propagated in the grammar, as illustrated in Figure 2. This work uses a suffix ~+ to encode the perception of abnormal prosody after a word, along with phrasal -BRK tags to decorate the path upwards to reparandum constituents labeled EDITED.</Paragraph>
    <Paragraph position="3"> Such disjuncture symbols are identified in the ToBI labeling scheme as break indices (Price et al., 1991; Silverman et al., 1992).</Paragraph>
    <Paragraph position="4"> The availability of a corpus annotated with ToBI labels makes it possible to design a break index classifier via supervised training. The corpus is a subset of the Switchboard corpus, consisting of sixty-four telephone conversations manually annotated by an experienced linguist according to a simplified ToBI labeling scheme (Ostendorf et al., 2001). In ToBI, degree of disjuncture is indicated by integer values from 0 to 4, where a value of 0 corresponds to clitic and 4 to a major phrase break. In addition, a suffix p denotes perceptually disfluent events reflecting, for example, hesitation or planning.</Paragraph>
    <Paragraph position="5"> In conversational speech the intermediate levels occur infrequently, and the break indices can be broadly categorized into three groups, namely 1, 4 and p, as in Wong et al. (2005).</Paragraph>
    <Paragraph position="6"> A classifier was developed to predict three break indices at each word boundary based on variations in pitch, duration and energy associated with word, syllable or sub-syllabic constituents (Shriberg et al., 2005; Sonmez et al., 1998). To compute these features, phone-level time-alignments were obtained from an automatic speech recognition (ASR) system. The durations of these phonological constituents were derived from the ASR alignment, while energy and pitch were computed every 10 ms with snack, a public-domain sound toolkit (Sjölander, 2001).</Paragraph>
    <Paragraph position="7"> The duration, energy, and pitch were post-processed according to stylization procedures outlined in Sonmez et al. (1998) and normalized to account for variability across speakers.</Paragraph>
    <Paragraph position="8"> Since the input vector can have missing values, such as the absence of pitch during unvoiced sounds, only decision-tree-based classifiers were investigated, because decision trees handle missing features gracefully. By choosing different combinations of splitting and stopping criteria, an ensemble of decision trees was built using the publicly-available IND package (Buntine, 1992).</Paragraph>
    <Paragraph position="9"> These individual classifiers were then combined into ensemble-based classifiers.</Paragraph>
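As a rough illustration of how bagging can combine tree classifiers that tolerate missing features, the following Python sketch trains one-split trees (stumps) on bootstrap samples and averages their posteriors over the three break classes. It is not the IND package: the feature names, the fixed candidate splits, and all function names are hypothetical, and giving missing values their own leaf is only one of several possible treatments.

```python
import random
from collections import Counter

LABELS = ("1", "4", "p")  # the three break classes used in this section


def distribution(labels):
    """Relative frequencies over LABELS; uniform when the leaf is empty."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lab: (counts[lab] / total if total else 1.0 / len(LABELS))
            for lab in LABELS}


def leaf_for(row, feature, threshold):
    """Pick a leaf; a missing feature value (None or absent) gets its own leaf."""
    value = row.get(feature)
    if value is None:
        return "missing"
    return "high" if value > threshold else "low"


def train_stump(rows, labels, feature, threshold):
    """Fit a one-split tree by collecting the labels that reach each leaf."""
    leaves = {"low": [], "high": [], "missing": []}
    for row, lab in zip(rows, labels):
        leaves[leaf_for(row, feature, threshold)].append(lab)
    return feature, threshold, {k: distribution(v) for k, v in leaves.items()}


def bagged_stumps(rows, labels, splits, n_bags=10, seed=0):
    """Train one stump per bootstrap sample, cycling through candidate splits."""
    rng = random.Random(seed)
    ensemble = []
    for i in range(n_bags):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature, threshold = splits[i % len(splits)]
        ensemble.append(train_stump([rows[j] for j in idx],
                                    [labels[j] for j in idx],
                                    feature, threshold))
    return ensemble


def ensemble_posterior(ensemble, row):
    """Average the member posteriors for one word boundary."""
    average = {lab: 0.0 for lab in LABELS}
    for feature, threshold, leaves in ensemble:
        member = leaves[leaf_for(row, feature, threshold)]
        for lab in LABELS:
            average[lab] += member[lab] / len(ensemble)
    return average
```

A boundary whose pitch is undefined can still be scored, for example `ensemble_posterior(ensemble, {"pause": 0.7, "pitch": None})`, because every stump routes the missing value to a dedicated leaf.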
    <Paragraph position="10"> Several classifiers were investigated for detecting break indices. On ten-fold cross-validation, a bagging-based classifier (Breiman, 1996) predicted prosodic breaks with an accuracy of 83.12% while chance was 67.66%. This compares favorably with the performance of the supervised classifiers on a similar task in Wong et al. (2005). Random forests and hidden Markov models provide marginal improvements at considerable computational cost (Harper et al., 2005).</Paragraph>
    <Paragraph position="11"> For speech repair, the focus is on detecting disfluent breaks. The trade-off between precision and recall in detecting them can be adjusted using a threshold on the posterior probability of predicting &quot;p&quot;, as shown in Figure 3.</Paragraph>
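The thresholding idea can be sketched in Python as follows. This is a minimal illustration, assuming one posterior for the p class per word boundary and a parallel list of gold labels; the function names and the convention of reporting precision 1.0 when nothing is predicted are assumptions of this sketch, not details from the paper.

```python
def precision_recall_at(posteriors, gold_is_p, threshold):
    """Precision and recall of 'p' detection when predicting p above threshold."""
    predicted = [prob > threshold for prob in posteriors]
    hits = sum(1 for pred, gold in zip(predicted, gold_is_p) if pred and gold)
    n_predicted = sum(predicted)
    n_gold = sum(gold_is_p)
    precision = hits / n_predicted if n_predicted else 1.0  # nothing predicted
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall


def sweep(posteriors, gold_is_p, thresholds):
    """One (threshold, precision, recall) triple per candidate threshold."""
    return [(t,) + precision_recall_at(posteriors, gold_is_p, t)
            for t in thresholds]
```

Raising the threshold generally trades recall for precision, which is the kind of curve a figure such as Figure 3 would plot.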
    <Paragraph position="12"> In essence, the large number of acoustic and prosodic features related to disfluency are encoded via the ToBI label 'p', and provided as additional observations to the PCFG.</Paragraph>
    <Paragraph position="13"> This is unlike previous work on incorporating prosodic information (Gregory et al., 2004; Lease et al., 2005; Kahn et al., 2005), as described further in Section 6.</Paragraph>
  </Section>
  <Section position="5" start_page="162" end_page="163" type="metho">
    <SectionTitle>
3 Syntactic parallelism
</SectionTitle>
    <Paragraph position="0"> The other striking property of speech repairs is their parallel character: subsequent repair regions 'line up' with preceding reparandum regions. This property can be harnessed to better estimate the length of the reparandum by considering parallelism from the perspective of syntax. For instance, in Figure 4(a) the unfinished reparandum noun phrase is repaired by another noun phrase; the syntactic categories are parallel.</Paragraph>
    <Section position="1" start_page="162" end_page="163" type="sub_section">
      <SectionTitle>
3.1 Levelt's WFR and Conjunction
</SectionTitle>
      <Paragraph position="0"> The idea that the reparandum is syntactically parallel to the repair can be traced back to Levelt (1983). Examining a corpus of Dutch picture descriptions, Levelt proposes a bi-conditional well-formedness rule for repairs (WFR) that relates the structure of repairs to the structure of conjunctions. The WFR conceptualizes repairs as the conjunction of an unfinished reparandum string (α) with a properly finished repair (γ). Its original formulation, repeated here, ignores optional interregna like &quot;er&quot; or &quot;I mean.&quot; Well-formedness rule for repairs (WFR): A repair &lt;α γ&gt; is well-formed if and only if there is a string β such that the string &lt;α β and* γ&gt; is well-formed, where β is a completion of the constituent directly dominating the last element of α. (and is to be deleted if that last element is itself a sentence connective.) In other words, the string α is a prefix of a phrase whose completion, β, if it were present, would render the whole phrase αβ grammatically conjoinable with the repair γ. In example (1) α is the string 'the first kind of invasion of', γ is 'the first type of privacy' and β is probably the single word 'privacy.' This kind of conjoinability typically requires the syntactic categories of the conjuncts to be the same (Chomsky, 1957, p. 36). That is, a rule schema such as (2), where X is a syntactic category, is preferred over one where X is not constrained to be the same on either side of the conjunction.</Paragraph>
      <Paragraph position="2"> If, as schema (2) suggests, conjunction does favor like-categories, and, as Levelt suggests, well-formed repairs are conjoinable with finished versions of their reparanda, then the syntactic categories of repairs ought to match the syntactic categories of (finished versions of) reparanda.</Paragraph>
    </Section>
    <Section position="2" start_page="163" end_page="163" type="sub_section">
      <SectionTitle>
3.2 A WFR for grammars
</SectionTitle>
      <Paragraph position="0"> Levelt's WFR imposes two requirements on a grammar: distinguishing a separate category of 'unfinished' phrases, and identifying a syntactic category for reparanda. Both requirements can be met by adapting Treebank grammars to mirror the analysis of McKelvie (1998a; 1998b). McKelvie derives phrase structure rules for speech repairs from fluent rules by adding a new feature called abort that can take values true and false. For a given grammar rule of the form A → B C, a metarule creates other rules of the form A[abort = Q] → B[abort = false] C[abort = Q], where Q is a propositional variable. These rules say, in effect, that the constituent A is aborted just in case the last daughter C is aborted. Rules that don't involve a constant value for Q ensure that the same value appears on parents and children. (Footnote 1: McKelvie's metarule approach declaratively expresses Hindle's (1983) Stack Editor and Category Copy Editor rules. This classic work effectively states the WFR as a program for the Fidditch deterministic parser.)</Paragraph>
      <Paragraph position="1"> The WFR is then implemented by rule schemas such as</Paragraph>
      <Paragraph position="3"> that permit the optional interregnum AFF to conjoin an unfinished X-phrase (the reparandum) with a finished X-phrase (the repair) that comes after it.</Paragraph>
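The metarule described in this subsection can be made concrete with a small Python sketch. It instantiates the propositional variable Q with both values for a single fluent rule; the string encoding of features and the function name are hypothetical, and extending the two-daughter pattern to longer right-hand sides (only the last daughter shares Q) is an assumption of this sketch.

```python
def apply_abort_metarule(lhs, rhs):
    """Expand one fluent rule lhs -> rhs into its abort-annotated variants.

    Every daughter except the last is marked abort=false; the last daughter
    shares the value Q with the parent, so the parent is aborted exactly
    when its last daughter is.
    """
    if not rhs:
        raise ValueError("rule needs at least one daughter")
    rules = []
    for q in ("true", "false"):  # instantiate the propositional variable Q
        daughters = [f"{d}[abort=false]" for d in rhs[:-1]]
        daughters.append(f"{rhs[-1]}[abort={q}]")
        rules.append((f"{lhs}[abort={q}]", tuple(daughters)))
    return rules
```

For the rule A → B C this yields one aborted variant, in which A and C carry abort=true, and one fluent variant, in which every category carries abort=false.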
    </Section>
    <Section position="3" start_page="163" end_page="163" type="sub_section">
      <SectionTitle>
3.3 A WFR for Treebanks
</SectionTitle>
      <Paragraph position="0"> McKelvie's formulation of Levelt's WFR can be applied to Treebanks by systematically recoding the annotations to indicate which phrases are unfinished and to distinguish matching from non-matching repairs.</Paragraph>
      <Paragraph position="1">  Some Treebanks already mark unfinished phrases. For instance, the Penn Treebank policy (Marcus et al., 1993; Marcus et al., 1994) is to annotate the lowest node that is unfinished with an -UNF tag as in Figure 4(a).</Paragraph>
      <Paragraph position="2"> It is straightforward to propagate this mark upwards in the tree from wherever it is annotated to the nearest enclosing EDITED node, just as -BRK is propagated upwards from disjuncture marks on individual words. This percolation simulates the action of McKelvie's [abort = true]. The resulting PCFG is one in which distributions on phrase structure rules with 'missing' daughters are segregated from distributions on 'complete' rules.</Paragraph>
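A minimal Python sketch of this upward percolation follows. It assumes trees are nested lists of the form [label, child, ...] with plain strings as words, marks every node on the path from an -UNF node up to, but not including, the nearest enclosing EDITED node, and leaves -UNF marks outside EDITED regions untouched. The representation, the function name, and the decision not to mark the EDITED node itself are assumptions of this sketch.

```python
def propagate_unf(tree, in_edited=False):
    """Return (new_tree, carries_unf) with -UNF percolated up inside EDITED."""
    if isinstance(tree, str):  # a word carries no mark
        return tree, False
    label = tree[0]
    is_edited = label.startswith("EDITED")
    new_children = []
    child_unf = False
    for child in tree[1:]:
        new_child, unf = propagate_unf(child, in_edited or is_edited)
        new_children.append(new_child)
        child_unf = child_unf or unf
    own_unf = label.endswith("-UNF")
    # Mark this node when a descendant is unfinished and an EDITED ancestor exists.
    if child_unf and in_edited and not is_edited and not own_unf:
        label = label + "-UNF"
    # An EDITED node absorbs the mark, so percolation stops at the nearest one.
    carries = (own_unf or child_unf) and not is_edited
    return [label] + new_children, carries
```

The same traversal could percolate -BRK from disjuncture marks on individual words by testing for that mark instead of -UNF.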
    </Section>
    <Section position="4" start_page="163" end_page="163" type="sub_section">
      <SectionTitle>
3.4 Reparanda categories
</SectionTitle>
      <Paragraph position="0"> The other key element of Levelt's WFR is the idea of conjunction of elements that are in some sense the same. In the Penn Treebank annotation scheme, reparanda always receive the label EDITED. This means that the syntactic category of the reparandum is hidden from any rule which could favor matching it with that of the repair.</Paragraph>
      <Paragraph position="1"> Adding an additional mark on this EDITED node (a kind of daughter annotation) rectifies the situation, as depicted in Figure 4(b), which adds the notation -childNP to a tree in which the unfinished tags have been propagated upwards. This allows a Treebank PCFG to represent the generalization that speech repairs tend to respect syntactic category.</Paragraph>
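The daughter annotation can be sketched in Python as below, again assuming nested-list trees of the form [label, child, ...]. Relabeling from the first phrasal daughter and stripping suffixes such as -UNF before building the -child mark are assumptions of this sketch; the helper names are hypothetical.

```python
def base_category(label):
    """Strip suffixes such as -UNF, keeping labels like -NONE- intact."""
    if label.startswith("-"):
        return label
    return label.split("-")[0]


def annotate_edited(tree):
    """Relabel each EDITED node as EDITED-childXP, XP being its first daughter."""
    if isinstance(tree, str):  # a word
        return tree
    label = tree[0]
    children = [annotate_edited(child) for child in tree[1:]]
    if label == "EDITED" and children and not isinstance(children[0], str):
        label = "EDITED-child" + base_category(children[0][0])
    return [label] + children
```

After this recoding, a rule such as NP → EDITED-childNP NP is distinct from NP → EDITED-childVP NP, so a grammar read off the trees can assign the category-matching repair its own probability.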
    </Section>
  </Section>
  <Section position="6" start_page="163" end_page="165" type="metho">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> Three kinds of experiments examined the effectiveness of syntactic and prosodic indicators of speech repairs. The first two use a CYK parser with a grammar read off from example trees annotated as in Figures 2 and 4. The third experiment measures the benefit from syntactic indicators alone in Charniak's lexicalized parser (Charniak, 2000). The tables in subsections 4.1, 4.2, and 4.3 summarize the accuracy of output parse trees on two measures. One is the standard Parseval F-measure, which tracks the precision and recall for all labeled constituents as compared to a gold-standard parse. The other measure, EDIT-finding F, restricts consideration to just constituents that are reparanda. It measures the per-word performance identifying a word as dominated by EDITED or not. As in previous studies, reference transcripts were used in all cases. A check (✓) indicates an experiment where prosodic breaks were automatically inferred by the classifier described in section 2, whereas in the (×) rows no prosodic information was used.</Paragraph>
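The per-word EDIT-finding measure can be sketched in Python as follows. The sketch assumes nested-list trees of the form [label, child, ...] over the same word sequence, and treats any label beginning with EDITED as a reparandum node; these conventions and the function names are assumptions, not the scoring script used for the reported tables.

```python
def edited_flags(tree, inside=False):
    """One boolean per word: is the word dominated by an EDITED node?"""
    if isinstance(tree, str):
        return [inside]
    inside = inside or tree[0].startswith("EDITED")
    flags = []
    for child in tree[1:]:
        flags.extend(edited_flags(child, inside))
    return flags


def edit_finding_f(gold_tree, predicted_tree):
    """Per-word precision, recall and F for membership in EDITED regions."""
    gold = edited_flags(gold_tree)
    pred = edited_flags(predicted_tree)
    if len(gold) != len(pred):
        raise ValueError("trees must span the same words")
    hits = sum(1 for g, p in zip(gold, pred) if g and p)
    precision = hits / sum(pred) if any(pred) else 0.0
    recall = hits / sum(gold) if any(gold) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```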
    <Section position="1" start_page="164" end_page="164" type="sub_section">
      <SectionTitle>
4.1 CYK on Fisher
</SectionTitle>
      <Paragraph position="0"> Table 1 summarizes the accuracy of a standard CYK parser on the newly-treebanked Fisher corpus (LDC2005E15) of phone conversations, collected as part of the DARPA EARS program. The parser was trained on the entire Switchboard corpus (ca. 107K utterances) and then tested on the 5368-utterance 'dev2' subset of the Fisher data. This test set was tagged using MX-POST (Ratnaparkhi, 1996), which was itself trained on Switchboard. Finally, as described in section 2, these tags were augmented with a special prosodic break symbol if the decision tree rated the probability of a ToBI 'p' symbol higher than the threshold value of 0.75.</Paragraph>
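The tag augmentation step might look like the following Python sketch. It assumes one posterior for the p class per word, aligned with the tagger output, and attaches the ~+ suffix from section 2 to the part-of-speech tag; whether the mark belongs on the tag or on the word, and the function name, are assumptions of this sketch.

```python
def augment_tags(tagged_words, p_posteriors, threshold=0.75, suffix="~+"):
    """Append the break suffix to tags whose 'p' posterior exceeds threshold."""
    if len(tagged_words) != len(p_posteriors):
        raise ValueError("need exactly one posterior per tagged word")
    augmented = []
    for (word, tag), prob in zip(tagged_words, p_posteriors):
        augmented.append((word, tag + suffix if prob > threshold else tag))
    return augmented
```

For instance, with posteriors 0.9 and 0.2 for two consecutive tokens tagged CC, only the first becomes CC~+.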
      <Paragraph position="1">  The Fisher results in Table 1 show that syntactic and prosodic indicators provide different kinds of benefits that combine in an additive way. Presumably because of state-splitting, improvement in EDIT-finding comes at the cost of a small decrement in overall parsing performance.</Paragraph>
    </Section>
    <Section position="2" start_page="164" end_page="165" type="sub_section">
      <SectionTitle>
4.2 CYK on Switchboard
</SectionTitle>
      <Paragraph position="0"> Table 2 presents the results of similar experiments on the Switchboard corpus following the  train/dev/test partition of Charniak and Johnson (2001). In these experiments, the parser was given correct part-of-speech tags as input.</Paragraph>
      <Paragraph position="1">  The Switchboard results demonstrate independent improvement from the syntactic annotations. The prosodic annotation helps on its own and in combination with the daughter annotation that implements Levelt's WFR.</Paragraph>
    </Section>
    <Section position="3" start_page="165" end_page="165" type="sub_section">
      <SectionTitle>
4.3 Lexicalized parser
</SectionTitle>
      <Paragraph position="0"> Finally, Table 3 reports the performance of Charniak's non-reranking, lexicalized parser on the Switchboard corpus, using the same test/dev/train partition.</Paragraph>
      <Paragraph position="1">  Since Charniak's parser does its own tagging, this experiment did not examine the utility of prosodic disjuncture marks. However, the combination of daughter annotation and -UNF propagation does lead to a better grammar-based reparandum-finder than parsers trained on flattened EDITED regions. More broadly, the results suggest that Levelt's WFR is synergistic with the kind of head-to-head lexical dependencies that Charniak's parser uses.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="165" end_page="165" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> The pattern of improvement in Tables 1, 2, and 3, from the none or baseline rows where no syntactic parallelism or break index information is used to the subsequent rows where it is used, suggests why these techniques work. Unfinished-category annotation improves performance by preventing the grammar of unfinished constituents from being polluted by the grammar of finished constituents.</Paragraph>
    <Paragraph position="1"> Such purification is independent of the fact that rules with daughters labeled EDITED-childXP tend to also mention categories labeled XP further to the right (or NP and VP, when XP starts with S). This preference for syntactic parallelism can be triggered either by externally-suggested ToBI break indices or grammar rules annotated with -UNF. The prediction of a disfluent break could be further improved by POS features and N-gram language model scores (Spilker et al., 2001; Liu, 2004).</Paragraph>
  </Section>
  <Section position="8" start_page="165" end_page="166" type="metho">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> There have been relatively few attempts to harness prosodic cues in parsing. In a spoken language system for the VERBMOBIL task, Batliner and colleagues (2001) utilize prosodic cues to dramatically reduce lexical analyses of disfluencies in an end-to-end real-time system. They tackle speech repair with a cascade of two stages: identification of potential interruption points using prosodic cues, with 90% recall and many false alarms, followed by lexical analysis of their neighborhood. Their approach, however, does not exploit the synergy between prosodic and syntactic features in speech repair. In Gregory et al. (2004), over 100 real-valued acoustic and prosodic features were quantized into a heuristically selected set of discrete symbols, which were then treated as pseudo-punctuation in a PCFG, assuming that prosodic cues function like punctuation. The resulting grammar suffered from data sparsity and failed to provide any benefits.</Paragraph>
    <Paragraph position="1"> Maximum entropy based models have been more successful in utilizing prosodic cues. For instance, in Lease et al. (2005), interruption point probabilities, predicted by prosodic classifiers, were quantized and introduced as features into a speech repair model along with a variety of TAG and PCFG features. Towards a clearer picture of the interaction between syntax and prosody, this work uses ToBI to capture prosodic cues. Such a method is analogous to Kahn et al. (2005) but in a generative framework.</Paragraph>
    <Paragraph position="2"> The TAG-based model of Johnson and Charniak (2004) is a separate-processing approach that represents the state of the art in reparandum-finding.</Paragraph>
    <Paragraph position="3"> Johnson and Charniak explicitly model the crossed dependencies between individual words in the reparandum and repair regions, intersecting this sequence model with a parser-derived language model for fluent speech. This second step improves on Stolcke and Shriberg (1996) and Heeman and Allen (1999) and outperforms the specific grammar-based reparandum-finders tested in section 4. However, because of separate-processing the TAG channel model's analyses do not reflect the syntactic structure of the sentence being analyzed, and thus that particular TAG-based model cannot make use of properties that depend on the phrase structure of the reparandum region. This includes the syntactic category parallelism discussed in section 3 but also predicate-argument structure. If edit hypotheses were augmented to mention particular tree nodes where the reparandum should be attached, such syntactic parallelism constraints could be exploited in the reranking framework of Johnson et al. (2004).</Paragraph>
    <Paragraph position="4"> The approach in section 3 is more closely related to that of Core and Schubert (1999) who also use metarules to allow a parser to switch from speaker to speaker as users interrupt one another.</Paragraph>
    <Paragraph position="5"> They describe their metarule facility as a modification of chart parsing that involves copying of specific arcs just in case specific conditions arise.</Paragraph>
    <Paragraph position="6"> That approach uses a combination of longest-first heuristics and thresholds rather than a complete probabilistic model such as a PCFG.</Paragraph>
    <Paragraph position="7"> Section 3's PCFG approach can also be viewed as a declarative generalization of Roark's (2004) EDIT-CHILD function. This function helps an incremental parser decide upon particular tree-drawing actions in syntactically-parallel contexts like speech repairs. Whereas Roark conditions the expansion of the first constituent of the repair upon the corresponding first constituent of the reparandum, in the PCFG approach there exists a separate rule (and thus a separate probability) for each alternative sequence of reparandum constituents.</Paragraph>
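To illustrate how each annotated rule receives its own probability, the following Python sketch reads a PCFG off a set of trees using plain relative-frequency estimates. Trees are assumed to be nested lists of the form [label, child, ...]; the absence of smoothing and the function names are assumptions of this sketch rather than details reported in the paper.

```python
from collections import Counter


def count_rules(tree, counts):
    """Add one count for every local tree (parent label, daughter labels)."""
    if isinstance(tree, str):  # a word has no expansion
        return
    rhs = tuple(child if isinstance(child, str) else child[0]
                for child in tree[1:])
    counts[(tree[0], rhs)] += 1
    for child in tree[1:]:
        count_rules(child, counts)


def read_off_pcfg(trees):
    """Relative-frequency rule probabilities: count(rule) / count(parent)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    parent_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        parent_totals[lhs] += n
    return {rule: n / parent_totals[rule[0]] for rule, n in counts.items()}
```

Because labels such as EDITED-childNP and NP-UNF are distinct symbols, rules that mention them are counted separately from their unannotated counterparts.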
  </Section>
</Paper>