<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1819">
  <Title>SS</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Rule Learning Algorithms
</SectionTitle>
    <Paragraph position="0"> Research on machine learning has concentrated in the main on inducing rules from unordered set of examples. And knowledge represented in a collection of rules is understandable and effective way to realize some kind of intelligence. C4.5 (Quinlan, 1986) and transformation-based learning (Brill, 1995) are typical rule-learning algorithms that have been applied to various NLP tasks such as part-of-speech tagging and named entity extraction etc.</Paragraph>
    <Paragraph position="1"> Both algorithms are supervised learning and can be used to induce rules from examples.</Paragraph>
    <Paragraph position="2"> But they also have difference from each other. Firstly the C4.5 rule induction is a completely automatic process. What we need to do is to extract appropriate features for our problem.</Paragraph>
    <Paragraph position="3"> As to transformation-based learning (henceforth TBL), transformation rule templates, which determine the effectiveness of the acquired rules, have to be designed manually before learning. Thus TBL can only be viewed as a semi-automatic method.</Paragraph>
    <Paragraph position="4"> Secondly the induction of C4.5 rules using a divide-and-conquer strategy is much faster than the greedy searching for TBL ones. In view of the above facts, C4.5 rules are induced from examples first in our experiments. And then the rules are used to guide the design of rule templates for TBL. See section 4.8 for detail.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="22" type="metho">
    <SectionTitle>
3 Prosodic Phrase Prediction
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="21" type="sub_section">
      <SectionTitle>
3.1 The Methodology
</SectionTitle>
      <Paragraph position="0"> Linguistic research has suggested that Chinese utterance is also structured in a prosodic hierarchy, in which there are mainly three levels of prosodic units: prosodic word, prosodic phrase and intonation phrase (Li and Lin, 2000).. Figure 1 shows the prosodic structure of a Chinese sentence. In the tree structure, the non-leaf nodes are prosodic units and the leaves are syntactic words. A prosodic phrase is composed of several prosodic words, each of which in turn consists of several syntactic words. Since intonation phrase is usually indicated by punctuation marks, we only need to consider the prediction of prosodic word and phrase.</Paragraph>
      <Paragraph position="2"> for intonation phrase, PP for prosodic phrase, PW for prosodic word) Suppose we have a string of syntactic words i.e.</Paragraph>
      <Paragraph position="4"> , the boundary between two neighbouring words is represented as</Paragraph>
      <Paragraph position="6"> . There are total three types of boundaries labelled as B  (the words are in the same prosodic phrase, but not the same prosodic word), or B  (the words are in different prosodic phrases) respectively. Thus prosodic phrase prediction is to predict such boundary labels, which can be viewed as a classification task. We believe these labels are determined by the contextual linguistic information around the boundary. If we have a speech corpus with prosodic labelling, features related to prosodic phrasing can be extracted at each boundary and combined with the corresponding boundary labels to establish an example database. Then rule-learning algorithms are executed on the database to induce rules for predicting boundary labels.</Paragraph>
    </Section>
    <Section position="2" start_page="21" end_page="22" type="sub_section">
      <SectionTitle>
3.2 Evaluation Parameters
</SectionTitle>
      <Paragraph position="0"> As a classification task, prosodic phrase prediction should be evaluated with consideration on all the classes. The rules induced from examples are applied on a test corpus to predict the label of each boundary.</Paragraph>
      <Paragraph position="1"> The predicted labels are compared with labels given by human, which are thought to be true, to get a confusion matrix as follows:</Paragraph>
      <Paragraph position="3"> but predicted as B j . From these counts, we can deduce the evaluation parameters for prosodic phrasing.</Paragraph>
      <Paragraph position="5"> defines the recall rate of boundary</Paragraph>
      <Paragraph position="7"> ePr defines the precision rate of</Paragraph>
      <Paragraph position="9"> is a combination of recall and precision rate, suggested by (Rijsbergen, 1979).</Paragraph>
      <Paragraph position="10">  into one label, which can be viewed as the prediction of prosodic word boundary,  Acc defines the overall accuracy of this case.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="22" end_page="22" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.1 The Corpus
</SectionTitle>
      <Paragraph position="0"> In our experiments, the speech corpus of our TTS system is used for training and testing.</Paragraph>
      <Paragraph position="1"> The corpus has 3167 sentences, which are randomly selected from newspaper and read by a radiobroadcaster. We manually labelled the sentences with two-level prosodic structure by listening to the record speech. For example, the sentence in Figure 1 is labelled as &amp;quot;G35GA7/ B  Preliminary tests show that manually labelling can achieve a high consistency rate among human. Therefore it is reasonable to make the manually labelled results as the target of learning algorithms.</Paragraph>
      <Paragraph position="2"> The sentences of the corpus are also processed with a text analyzer, where Chinese word segmentation and part-of-speech tagging are accomplished in one step using a statistical language model. The segmentation and tagging yields a gross accuracy rate over 94%. The output of the text analyzer is directly used as the training data of learning algorithms without correcting segmentation or tagging errors because we want to train classifiers with noisy data in the real situation.</Paragraph>
      <Paragraph position="3"> Here are some statistical figures about the corpus. There are 56446 Chinese characters in the corpus, which constitute 37669 words. The number of prosodic word boundaries is 16194 and that of prosodic phrase ones is only 7231. The average length of syntactic word, prosodic word, prosodic phrase and sentence are 1.5, 2.4, 7.8 and 17.0 in character, respectively.</Paragraph>
    </Section>
    <Section position="2" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.2 Candidate Features
</SectionTitle>
      <Paragraph position="0"> Feature selection is crucial to the classification of prosodic boundary labels. Linguistic information around the word boundary is the main source of features. The features may come from different levels including syllable, word, phrase, sentence level. And the type of features may be phonetic, lexical, syntactic, semantic or pragmatic. Which features have most close relation with prosodic phrasing and how to represent them are still open research problems. In our approach, we decide to list all the possible features first and figure out the most effective ones by experiments. The features we currently consider are presented in the following.</Paragraph>
      <Paragraph position="1">  Chinese is well known as a monosyllabic, tonal language. And phonetic study shows sound will change in continuous speech because of context or prosodic structure.</Paragraph>
      <Paragraph position="2"> Retroflex, neutral tone and tone sandhi are important phonetic phenomena that cause sound variation. (Li and Lin, 2000). Thus phonetic information about phone and syllable is related to prosodic phasing. There are too many tonal syllables (about 1300) in Chinese to consider. Instead, the initials and finals of the syllables (total about 60) near a word boundary are taken into accounts, which are represented as SYIF in the following text.</Paragraph>
      <Paragraph position="3"> Similarly the tones of the syllables, denoted by TONE, are also included as phonetic features.</Paragraph>
      <Paragraph position="4">  Words in natural language have different occurrence frequency. And words that have high occurrence frequency may be especially important to prosodic phrasing (e.g. some functional words in Chinese, G00G58G01GC8G00G60G01 etc). Therefore lexical word is treated as a candidate feature, represented as WORD.</Paragraph>
      <Paragraph position="5">  Syntactic information has close relation with prosodic structure. POS, which denotes part-of-speech of words, is a basic syntactic feature much easier to obtain with automatic POS taggers. And it has been widely adopted in previous researches. Since POS tag sets varies with taggers, we try to determine the best one for predicting prosodic phrase by experiments.</Paragraph>
      <Paragraph position="6">  From the statistical figures of the corpus, both prosodic word and phrase have limitation in length. The length of syntactic word (WLEN), the length of the sentence in character (SLENC) and word (SLENW) are considered as length features. In HMM-based methods, the chain of boundary labels in a sentence is supposed to conform to Markov assumption. And according to experience, it is less possible for two boundaries with label B  to locate very close to each other. Thus the label of previous boundaries (BTYPE) and the distances from them to current position are also possible features.</Paragraph>
    </Section>
    <Section position="3" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.3 Example Database
</SectionTitle>
      <Paragraph position="0"> All of the possible features are extracted from the corpus at each boundary to establish an example database. Table 2 shows parts of the example entries of two word boundaries in  row name has a format of feature name plus a number. The number indicates which word the feature comes from. And the range of the number is limited by a window size. For example, POS_0 denotes part-of-speech of the word just before the word boundary, POS_-1 denotes that of the second word previous to the boundary and POS_1 denotes that of the word just after the boundary. The rest may be deduced by analogy. BTYPE_0 is the label of current boundary and also the target to be predicted.</Paragraph>
    </Section>
    <Section position="4" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.4 Feature Selection Experiments
</SectionTitle>
      <Paragraph position="0"> Once the example database is established, we can begin to induce rules from it with rule learners. If all the features were used in one experiment, the feature space would get too large to learn rules quickly. Moreover we want to eliminate less significant features from the database. A series of comparative experiments is carried out to figure out the effective features. C4.5 learner is used to perform the learning task in the following experiments.</Paragraph>
      <Paragraph position="1">  Since POS features are widely used, a baseline experiment is performed with only two POS features that are POS_0 and POS_1. The POS tag set has total 30 tags from the tagger.</Paragraph>
      <Paragraph position="2">  The window size determines the number of words whose features are considered. Suppose the window size is L+R, which means the features of L words left to the boundary and R words right to it are used. We design experiments with the combination of different value of L and R to find the best window of POS features. The features in the window are denoted by POS{-L+1, R} in a range form.</Paragraph>
      <Paragraph position="3">  Experiments are conducted on three POS sets, which are BSET, LSET and CSET. BSET is the basic POS set from the tagger. LSET is an enlarged version of BSET, which includes the most frequent 100 words as independent tags. CSET is built with clustering technique. Each POS in the BSET is represented as a 6-dimension vector, whose components are the probabilities of the boundary labels after and before that POS. Then these vectors are clustered into 10 groups. The window size used is 1+1.</Paragraph>
      <Paragraph position="4">  WORDLEN and SLEN are added into the baseline system to investigate the importance of length features in No.12 and 13. SYIF, TONE features of syllables around the boundary are considered in No.14. Previous boundary labels (BTYPE_-1, BTYPE_-2) are tested in the experiments No.15 and 16.</Paragraph>
      <Paragraph position="5"> WORD features are used in No.17 to find if there exist some words that have special prosodic effects.</Paragraph>
      <Paragraph position="6">  are defined in section 3.2)</Paragraph>
    </Section>
    <Section position="5" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.5 Feature selection results
</SectionTitle>
      <Paragraph position="0"> The results of these experiments are listed in Table 3. From the evaluation figures in the table, we can draw the following conclusions on the effect of the features on prosodic phrase prediction: 1) Part-of-speech is a basic and useful feature. A window size of 2+1 is already enough. Larger window size will greatly lengthen the time of training but make no significant improvement on the accuracy rate.</Paragraph>
      <Paragraph position="1"> 2) The largest POS set LSET performs better than smaller ones like BSET and CSET.</Paragraph>
      <Paragraph position="2"> That's because small POS sets lead to small feature space, which may be not big enough to distinguish the training examples.</Paragraph>
      <Paragraph position="3">  3) Length features are beneficial to prosodic phrase prediction.</Paragraph>
      <Paragraph position="4"> 4) Phonetic features are less useful than what we think before.</Paragraph>
      <Paragraph position="5"> 5) Former boundary information is also  useful. When training, the former and latter boundary labels are both known, but when testing, exact former boundary labels do not exist. We can use the boundary labels that are already predicted to help make decision on current label. Although the error prediction of former labels may lead to error of current prediction, the result shows the accuracy rate is improved.</Paragraph>
      <Paragraph position="6"> 6) WORD feature is not appropriate to use, since the using of it greatly enlarges the feature space and needs more training examples.</Paragraph>
    </Section>
    <Section position="6" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.6 C4.5 Experiments
</SectionTitle>
      <Paragraph position="0"> According to the feature selection results, we know some features are effective to prosodic phrase prediction but some are not. And the solely using of effective features doesn't result in a high enough accuracy rate. In order to improve the prediction accuracy, we combine the effective features such as WLEN{-1, 1}, BTYPE{-1}, SLEN and POS{-1,1} in LSET tag set together to induce C4.5 rules.</Paragraph>
    </Section>
    <Section position="7" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.7 Examples of C4.5 Rules
</SectionTitle>
      <Paragraph position="0"> As mentioned above, rule systems have the advantage of simplicity and understandability.</Paragraph>
      <Paragraph position="1"> We examine the rules learned by C4.5 and find they certainly reflect the usage of prosodic structure in some sense. Here are some rules followed by example sentences with the current boundary labels in bold:  Rule 1, 2 and 3 shows the special prosodic effect of functional words such as &amp;quot;G5A&amp;quot;, &amp;quot;G58&amp;quot;, &amp;quot;GE1&amp;quot;, which tends to adhere to prosodic words in the sentences. Rule 4 exemplifies that the syntactic structure &amp;quot;Verb+G62&amp;quot; usually acts as a prosodic word. Rule 5 concerns the conjunction word, the boundary before which would be B  (prosodic phrase boundary) if the previous word had a length above 2. The B  boundary is thought to accentuate the word before the conjunction. Rule 6 deals with the structure &amp;quot;Noun+G03&amp;quot;. We can see that these rules coincide with the experience of prosodic phrasing by human.</Paragraph>
    </Section>
    <Section position="8" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
4.8 TBL Experiments
</SectionTitle>
      <Paragraph position="0"> A general TBL toolkit (Grace and Radu, 2001) is used in our TBL experiments. The analysis on C4.5 rules casts lights on the design of the transformation rule templates of TBL. Since the same features as C4.5 learning are used in the rule templates, linguistic knowledge, which has been embodied by C4.5 rules, should also be captured by transformation rule templates.</Paragraph>
      <Paragraph position="1"> Suppose a C4.5 rule, &amp;quot;if (POS_0 == n &amp;&amp;</Paragraph>
      <Paragraph position="3"> &amp;quot;, has a high prediction accuracy, it is reasonable to make this rule as an instantiation of TBL rule templates. Table 4 lists some of the rule  The left part of a rule template is a list of features, and the right is the target, BTYPE_0. For example, &amp;quot;POS_0 POS_1 =&gt; BTYPE_0&amp;quot;, which is a short form of &amp;quot;if (POS_0 == X &amp;&amp; POS_1 == Y) then BTYPE_0 = Z&amp;quot;, means if current POS were X and the next POS were Y, the boundary label would be Z. X, Y, Z are template variables. Let X=n Y=u Z=B</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="22" end_page="22" type="metho">
    <SectionTitle>
, the
</SectionTitle>
    <Paragraph position="0"> template is instantiated into the C4.5 rule above.</Paragraph>
    <Paragraph position="1"> Due to the mechanism of TBL rules, there exist rule templates like &amp;quot;BTYPE_0 POS_0 POS_1 =&gt; BTYPE_0&amp;quot;, in which the former BTYPE_0 is the label before applying the rule and the latter is after applying it. That's actually what transformation means. When training, the initial boundary labels are all set to B  . At each step, the algorithm tries all the possible values for template variables to find an instantiated rule that can achieve the best score. When testing, the initial boundary labels are set the same way, and then transformation rules are applied one by one.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>