File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/03/w03-1703_metho.xml

Size: 12,037 bytes

Last Modified: 2025-10-06 14:08:36

<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1703">
  <Title>Utterance Segmentation Using Combined Approach Based on Bi-directional N-gram and Maximum Entropy</Title>
  <Section position="4" start_page="0" end_page="21" type="metho">
    <SectionTitle>
3 Maximum-Entropy-Weighted Bi-directional N-gram-based Segmentation Method
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="21" type="sub_section">
      <SectionTitle>
3.1 Normal N-gram Algorithm (NN) for Utterance Segmentation
</SectionTitle>
      <Paragraph position="0"> (where m is a natural number) is a word sequence, we consider it as an n order Markov chain, in which the word</Paragraph>
      <Paragraph position="2"> [?][?] is predicted by the n-1 words to its left. Here is the corresponding formula:</Paragraph>
      <Paragraph position="4"> From this conditional probability formula for a word, we can derive the probability of a word se-</Paragraph>
      <Paragraph position="6"/>
      <Paragraph position="8"> In the normal N-gram method, the above iterative formulas are computed to search the sentence  In contrast to the normal N-gram segmentation method, we compute the above iterative formulas to seek sentence boundaries from</Paragraph>
      <Paragraph position="10"/>
    </Section>
    <Section position="2" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
3.3 Bi-directional N-gram Algorithm for Utterance Segmentation
</SectionTitle>
      <Paragraph position="0"> From the iterative formulas of the normal N-gram algorithm and the reverse N-gram algorithm, we can see that the normal N-gram method recognizes a candidate sentence boundary location mainly according to its left context, while the reverse N-gram method mainly depends on its right context.</Paragraph>
      <Paragraph position="1"> Theoretically at least, it is reasonable to suppose that, if we synthetically consider both the left and the right context by integrating the NN and the RN, the overall segmentation accuracy will be improved. null  )11( [?][?][?] mi respectively to indicate the probability that the current site i really is, or is not, a sentence boundary. Thus, to compute the word sequence segmentation, we must compute )(iP is and )(iP no for each of the m-1 candidate sites. In the bi-directional BN, we compute )(iP</Paragraph>
      <Paragraph position="3"> by combining the NN results and RN results. The combination is described by the following formulas:</Paragraph>
      <Paragraph position="5"> denote the probabilities calculated by NN which correspond to  in section 3.2 respectively. We say there exits a sentence boundary at site i )11( [?][?][?] mi if and only if )()(</Paragraph>
      <Paragraph position="7"/>
    </Section>
    <Section position="3" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
3.4 Maximum Entropy Approach for Utterance Segmentation
</SectionTitle>
      <Paragraph position="0"> In this section, we explain our maximum-entropy-based model for utterance segmentation. That is, we estimate the joint probability distribution of the candidate sites and their surrounding words. Since we consider information concerning the lexical context to be useful, we define the feature functions for our maximum method as follows:  may contain the sentence boundary mark 'SB'.) The candidate c's state is denoted by b, where b=1 indicates that c is a sentence boundary and b=0 indicates that it is not a boundary. Prefix(c) denotes all the word sequences ending with c (that is, c's left context plus c) and Suffix(c) denotes all the word sequences beginning with c (in other words, c plus its right context). For example: in the utter-</Paragraph>
      <Paragraph position="2"> where k is the total number of the Matching Strings and p is a parameter set to make P(c,1) and P(c,0) sum to 1. The unknown parameters  a are chosen to maximize the likelihood of the training data using the Generalized Iterative Scaling (Darroch and Ratcliff, 1972) algorithm. In the maximum entropy approach, we say that a candidate site is a sentence boundary if and only if P(c, 1) &gt; P(c, 0). (At this point, we can anticipate a technical problem with the maximum approach to utterance segmentation. When a Matching String contains SB, we cannot know whether it belongs to the Prefixes or Suffixes of the candidate site until the left and right contexts of the candidate site have been segmented. Thus if the segmentation proceeds from left to right, the lexical information in the right context of the current candidate site will always remain uncertain. Likewise, if it proceeds from right to left, the information in the left context of the current candidate site remains uncertain. The next subsection will describe a pragmatic solution to this problem.)</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="21" end_page="21" type="metho">
    <SectionTitle>
3.5 Maximum-Entropy-Weighted Bi-directional N-gram Algorithm for Utterance Segmentation
</SectionTitle>
    <Paragraph position="0"> directional N-gram Algorithm for Utter-</Paragraph>
    <Section position="1" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
      </SectionTitle>
      <Paragraph position="0"> In the bi-directional N-gram based algorithm, we have considered the left-to-right N-gram algorithm and the right-to-left algorithm as having the same significance. Actually, however, they should be assigned differing weights, depending on the lexical contexts. The combination formulas are as follows: null</Paragraph>
      <Paragraph position="2"> are the functions of the context surrounding candidate site i which denotes the weights of</Paragraph>
      <Paragraph position="4">  respectively. Assuming that the weights of )(</Paragraph>
      <Paragraph position="6"> depend upon the context to the left of the candidate site, and that the weights of</Paragraph>
      <Paragraph position="8"> depend on the context to the right of the candidate site, the weight functions can be rewritten as:</Paragraph>
      <Paragraph position="10"> will increase in significance. (The joint probability in question is the probability of the current candidate's left context, taken together with the probability that the candidate is a sentence boundary.) Therefore the value of )(  We can easily get the values of</Paragraph>
      <Paragraph position="12"> using the method described in the maximum entropy approach section. For example:  )!,( ap As mentioned in last subsection, we need segmented contexts for maximum entropy approach. Since the maximum entropy parameters for MEBN algorithm are used as modifying NN and RN, we just estimate the joint probability of the candidate and its surrounding contexts based upon the segments by NN and RN. Using NLeftC i indicate the left context to the candidate i which has been segmented by NN algorithm and RRightC i indicate the right context to i which has been segmented by RN, the combination probability computing formulas for MEBN are as follows:</Paragraph>
    </Section>
    <Section position="2" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
4.1 Model Training
</SectionTitle>
      <Paragraph position="0"> Our models are trained on both Chinese and English corpora, which cover the domains of hotel reservation, flight booking, traffic information, sightseeing, daily life and so on. We replaced the full stops with &amp;quot;SB&amp;quot; and removed all other punctuation marks in the training corpora. Since in most actual systems part of speech information cannot be accessed before determining the sentence boundaries, we use Chinese characters and English words without POS tags as the units of our N-gram models. Trigram and reverse trigram probabilities are estimated based on the processed training corpus by using Modified Kneser-Ney Smoothing (Chen and Goodman, 1998). As to the maximum entropy model, the Matching Strings are chosen as all the word sequences occurring in the training corpus whose length is no more than 3 words. The unknown parameters corresponding to the feature functions are generated based on the training corpus using the Generalized Iterative Scaling algorithm. Table 1 gives an overview of the training corpus.</Paragraph>
    </Section>
    <Section position="3" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
4.2 Testing Results
</SectionTitle>
      <Paragraph position="0"> We test our methods using open corpora which are also limited to the domains mentioned above. All punctuation marks are removed from the test corpora. An overview of the test corpus appears in  We have implemented four segmentation algorithms using NN, RN, BN and MEBN respectively. If we use &amp;quot;RightNum&amp;quot; to denote the number of right segmentations, &amp;quot;WrongNum&amp;quot; denote the number of wrong segmentations, and &amp;quot;TotalNum&amp;quot; to denote the number of segmentations in the original testing corpus, the precision (P) can be computed using the formula P=RightNum/(RightNum+WrongNum), the recall (R) is computed as R=RightNum/TotalNum, and the F-Score is computed as F-Score =  From the result tables it is clear that RN, BN, and MEBN all outperforms the normal N-gram algorithm in the F-score for both Chinese and English utterance segmentation. MEBN achieved the best performance which improves the precision by 7.3% and the recall by 1.5% in the Chinese experiment, and improves the precision by 5.4% and the recall by 1.9% in the English experiment.</Paragraph>
    </Section>
    <Section position="4" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
4.3 Result analysis
</SectionTitle>
      <Paragraph position="0"> MEBN was proposed in order to maintain the correct segments of the normal N-gram algorithm while skipping the wrong segments. In order to see whether our original intention has been realized, we compared the segments as determined by RN with those determined by NN, compare the segments found by BN with those of NN and then compare the segments found by MEBN with those of NN. For RN, BN and MEBN, suppose TN denotes the number of total segmentations, CON denotes the number of correct segmentations overlapping with those found by NN; SWN denotes the number of wrong NN segmentations which were skipped; WNON denotes the number of wrong segmentations not overlapping with those of NN; and CNON denotes the number of segmentations which were correct but did not overlap with those of NN. The statistical results are listed in Table 5 and Table 6.</Paragraph>
      <Paragraph position="1">  Comparison.</Paragraph>
      <Paragraph position="2"> Focusing upon the Chinese results, we can see that RN skips 1098 incorrect segments found by NN, and has 9525 correct segments in common with those of NN. It verifies our supposition that RN can effectively avoid some errors made by NN.</Paragraph>
      <Paragraph position="3"> But because at the same time RN brings in 1077 new errors, RN doesn't improve much in precision. BN skips 753 incorrect segments and brings in 355 new segmentation errors; has 9906 correct segments in common with those of NN and brings in 622 new correct segments. So by equally integrating NN and RN, BN on one hand finds more correct segments, on the other hand brings in less wrong segments than NN. But in skipping incorrect segments by NN, BN still performs worse than RN, showing that it only exerts the error skipping ability of RN to some extent. As for MEBN, it skips 1274 incorrect segments and at the same time brings in only 223 new incorrect segments. Additionally it maintains 9646 correct segments in common with those of NN and brings in 678 new correct segments. In recall MEBN performs a little worse than BN, but in precision it achieves a much better performance than BN, showing that modified by the maximum entropy weights, MEBN makes use of the error skipping ability of RN more effectively. Further, in skipping wrong segments by NN, MEBN even outperforms RN, which indicates the weights we set on NN and RN not only act as modifying parameters, but also have direct beneficial affection on utterance segmentation.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>