<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1703">
  <Title>Utterance Segmentation Using Combined Approach Based on Bi-directional N-gram and Maximum Entropy</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Related Work and Our Motivations
2.1 Related Work
</SectionTitle>
    <Paragraph position="0"> Stolcke et al. (1998, 1996) proposed an approach to detection of sentence boundaries and disfluency locations in speech transcribed by an automatic recognizer, based on a combination of prosodic cues modeled by decision trees and N-gram language models. Their N-gram language model is mainly based on part of speech, and retains some words which are particularly relevant to segmentation. Of course, most part-of-speech taggers require sentence boundaries to be pre-determined; so to require the use of part-of-speech information in utterance segmentation would risk circularity. Cettolo et al.'s (1998) approach to sentence boundary detection is somewhat similar to Stolcke et al.'s.</Paragraph>
    <Paragraph position="1"> They applied word-based N-gram language models to utterance segmentation, and then combined them with prosodic models. Compared with N-gram language models, their combined models achieved an improvement of 0.5% and 2.3% in precision and recall respectively.</Paragraph>
    <Paragraph position="2"> Beeferman et al. (1998) used the CYBERPUNC system to add intra-sentence punctuation (especially commas) to the output of an automatic speech recognition (ASR) system. They claim that, since commas are the most frequently used punctuation symbols, their correct insertion is by far the most helpful addition for making texts legible.</Paragraph>
    <Paragraph position="3"> CYBERPUNC augmented a standard trigram speech recognition model with lexical information concerning commas, and achieved a precision of 75.6% and a recall of 65.6% when testing on 2,317 sentences from the Wall Street Journal.</Paragraph>
    <Paragraph position="4"> Gotoh et al. (1998) applied a simple non-speech interval model to detect sentence boundaries in English broadcast speech transcripts. They compared their results with those of N-gram language models and found theirs far superior. However, broadcast speech transcripts are not really spoken language, but something more like spoken written language. Further, radio broadcasters speak formally, so that their reading pauses match sentence boundaries quite well. It is thus understandable that the simple non-speech interval model outperforms the N-gram language model under these conditions; but segmentation of natural utterances is quite different. null Zong et al. (2003) proposed an approach to utterance segmentation aiming at improving the performance of spoken language translation (SLT) systems. Their method is based on rules which are oriented toward key word detection, template matching, and syntactic analysis. Since this approach is intended to facilitate translation of Chinese-to-English SLT systems, it rewrites long sentences as several simple units. Once again, these results cannot be regarded as general-purpose utterance segmentation. Furuse et al. (1998) similarly propose an input-splitting method for translating spoken language which includes many long or ill-formed expressions. The method splits an input into well-balanced translation units, using a semantic dictionary.</Paragraph>
    <Paragraph position="5"> Ramaswamy et al. (1998) applied a maximum entropy approach to the detection of command boundaries in a conversational natural language user interface. They considered as their features words and their distances to potential boundaries.</Paragraph>
    <Paragraph position="6"> They posited 400 feature functions, and trained their weights using 3000 commands. The system then achieved a precision of 98.2% in a test set of 1900 commands. However, command sentences for conversational natural language user interfaces contain much smaller vocabularies and simpler structures than the sentences of natural spoken language. In any case, this method has been very helpful to us in designing our own approach to utterance segmentation.</Paragraph>
    <Paragraph position="7"> There are several additional approaches which are not designed for utterance segmentation but which can nevertheless provide useful ideas. For example, Reynar et al. (1997) proposed an approach to the disambiguation of punctuation marks. They considered only the first word to the left and right of any potential sentence boundary, and claimed that examining wider context was not beneficial. The features they considered included the candidate's prefix and suffix; the presence of particular characters in the prefix or suffix; whether the candidate was honorific (e.g. Mr., Dr.); and whether the candidate was a corporate designator (e.g. Corp.). The system was tested on the Brown Corpus, and achieved a precision of 98.8%. Elsewhere, Nakano et al. (1999) proposed a method for incrementally understanding user utterances whose semantic boundaries were unknown. The method operated by incrementally finding plausible sequences of utterances that play crucial roles in the task execution of dialogues, and by utilizing beam search to deal with the ambiguity of boundaries and with syntactic and semantic ambiguities. Though the method does not require utterance segmentation before discourse processing, it employs special rule tables for discontinuation of significant utterance boundaries. Such rule tables are not easy to maintain, and experimental results have demonstrated only that the method outperformed the method assuming pauses to be semantic boundaries.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Our Motivations
</SectionTitle>
      <Paragraph position="0"> Though numerous methods for utterance segmentation have been proposed, many problems remain unsolved.</Paragraph>
      <Paragraph position="1"> One remaining problem relates to the language model. The N-gram model evaluates candidate sentence boundaries mainly according to their left context, and has achieved reasonably good results, but it can't take into account the distant right context to the candidate. This is the reason that N-gram methods often wrongly divide some long sentences into halves or multiple segments. For example:Xiao Wang Bing Liao [?] Ge Xing Qi . The N-gram method is likely to insert a boundary mark between &amp;quot;Liao &amp;quot; and &amp;quot;[?] &amp;quot;, which corresponds to our everyday impression that, if reading from the left and not considering several more words to the right of the current word, we will probably consider &amp;quot;Xiao Wang Bing Liao &amp;quot; as a whole sentence. However, we find that, if we search the sentence boundaries from right to left, such errors can be effectively avoided. In the present example, we won't consider &amp;quot;[?] Ge Xing Qi &amp;quot; as a whole sentence, and the search will be continued until the word &amp;quot;Xiao &amp;quot; is encountered. Accordingly, in order to avoid segmentation errors made by the normal N-gram method, we propose a reverse N-gram segmentation method (RN) which does seek sentence boundaries from right to left.</Paragraph>
      <Paragraph position="2"> Further, we simply integrate the two N-gram methods and propose a bi-directional N-gram method (BN), which takes into account both the left and the right context of a candidate segmentation site. Since the relative usefulness or significance of the two N-gram methods varies depending on the context, we propose a method of weighting them appropriately, using parameters generated by a maximum entropy method which takes as its features information about words in the context. This is our Maximum-Entropy-Weighted Bi-directional N-gram-based segmentation method.</Paragraph>
      <Paragraph position="3"> We hope MEBN can retain the correct segments discovered by the usual N-gram algorithm, yet effectively skip the wrong segments.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>