File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/p06-2125_metho.xml
Size: 10,160 bytes
Last Modified: 2025-10-06 14:10:31
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2125"> <Title>Sydney, July 2006. c(c)2006 Association for Computational Linguistics An HMM-Based Approach to Automatic Phrasing for Mandarin Textto-Speech Synthesis</Title> <Section position="5" start_page="977" end_page="977" type="metho"> <SectionTitle> 3 Word-based Prediction </SectionTitle> <Paragraph position="0"> As noted previously, the prosodic phrasing is associated with words to some extent in Mandarin TTS synthesis. We observe that some function words (such as &quot; a0 &quot;) never occur in phrase-initial position. Some prepositions seldom act as phrase-finals. These observations lead to investigating the role of words in prediction of prosodic phrase. In addition, large-scale training data is readily available, which enables us to apply data-driven models more conveniently than before.</Paragraph> <Section position="1" start_page="977" end_page="977" type="sub_section"> <SectionTitle> 3.1 The Model </SectionTitle> <Paragraph position="0"> The sentence length in real text can vary significantly. A model with a fixed-dimension input does not fit the issue in prosodic breaking. Alternatively, the breaking prediction can be converted into an optimization problem that allows us to adopt the hidden Markov model (HMM).</Paragraph> <Paragraph position="1"> An HMM for discrete symbol observations is characterized by the following: - the state set Q ={qi}, where 1 [?] i [?] N, N is the number of states - the number of distinct observation symbol per state M -the state-transition probability distribution A={aij}, where aij=P[qt+1=j|qt=i], 1 [?] i,j [?] N -the observation symbol probability distribution B={bj(k)}, where</Paragraph> <Paragraph position="3"> - the initial state distribution pi={pii}, where pii =P[ot=vk|qt=j] , 1 [?] i,j [?] M .</Paragraph> <Paragraph position="4"> The complete parameter set of the model is denoted as a compact notation l=(A,B,pi).</Paragraph> <Paragraph position="5"> Here, we define our prosodic positions for a word to apply the HMM as follows.</Paragraph> </Section> </Section> <Section position="6" start_page="977" end_page="978" type="metho"> <SectionTitle> 3 separate </SectionTitle> <Paragraph position="0"> This means that Q can be represented as Q={0,1,2,3}, corresponding to the four prosodic positions. The word itself is defined as a discrete symbol observation.</Paragraph> <Section position="1" start_page="977" end_page="977" type="sub_section"> <SectionTitle> 3.2 The Corpus </SectionTitle> <Paragraph position="0"> The text corpus is divided into two parts. One serves as training data. This part contains 17,535 sentences, among which, 9,535 sentences have corresponding utterances. The other is a test set, which includes 1,174 sentences selected from the Chinese People's Daily. The sentence length, namely the number of words in a sentence varies from 1 to 30. The distribution of word length, phrase length and sentence length(all in character number) is shown in Figure 1.</Paragraph> <Paragraph position="1"> In a real text, there may exist words that are difficult to enumerate in the system lexicon, called &quot;non-standard&quot; words (NSW). Examples of NSW are proper names, digit strings, derivative words by adding prefix or suffix.</Paragraph> <Paragraph position="2"> Proper names include person name, place name, institution name and abbreviations, etc.</Paragraph> <Paragraph position="3"> Alternatively, some characters are usually viewed as prefix and suffix in Chinese text. 
<Section position="1" start_page="977" end_page="977" type="sub_section">
<SectionTitle> 3.2 The Corpus </SectionTitle>
<Paragraph position="0"> The text corpus is divided into two parts. One serves as training data; it contains 17,535 sentences, of which 9,535 have corresponding utterances. The other is a test set of 1,174 sentences selected from the Chinese People's Daily. Sentence length, namely the number of words in a sentence, varies from 1 to 30. The distributions of word length, phrase length and sentence length (all measured in characters) are shown in Figure 1.</Paragraph>
<Paragraph position="1"> Real text may contain words that are difficult to enumerate in the system lexicon, called &quot;non-standard&quot; words (NSWs). Examples of NSWs are proper names, digit strings, and words derived by adding a prefix or suffix.</Paragraph>
<Paragraph position="2"> Proper names include person names, place names, institution names, abbreviations, etc.</Paragraph>
<Paragraph position="3"> In addition, some characters usually act as prefixes or suffixes in Chinese text. For instance, the character a1 (pseudo-) always serves as a prefix, while the character a2 (-like) serves as a suffix. Roughly 130 such Chinese characters have been collected.</Paragraph>
<Paragraph position="4"> A word segmentation module is designed to identify these non-standard words.</Paragraph>
</Section>
<Section position="2" start_page="977" end_page="978" type="sub_section">
<SectionTitle> 3.3 Parameter estimation </SectionTitle>
<Paragraph position="0"> Parameter estimation of the model can be treated as an optimization problem. Parametric methods are optimal when the distribution derived from the training data belongs to the class of distributions under consideration, but no closed-form method is known for maximizing the probability of the observation sequence. In the present approach, a straightforward yet reasonable method is applied to estimate the parameters of the HMM. First, the occurrence counts of words, prosodic positions and prosodic-position pairs are collected. Second, the probability distributions are computed as simple ratios of these counts, using the following expressions:</Paragraph>
<Paragraph position="1"> - State probability distribution: π_i = F_i / Σ_{n=1..N} F_n, where F_i is the occurrence count of state q_i.
- State-transition probability distribution: a_ij = F_ij / Σ_{n=1..N} F_in, where F_ij is the occurrence count of the state pair (q_i, q_j).
- Observation probability distribution: b_j(k) = G_j(k) / Σ_{m=1..M} G_j(m), where G_j(k) is the co-occurrence count of state q_j and observation v_k.</Paragraph>
<Paragraph position="2"> With respect to proper names, all person names are treated identically. This is based on the assumption that proper names of a given category share the same usage.</Paragraph>
</Section>
<Section position="3" start_page="978" end_page="978" type="sub_section">
<SectionTitle> 3.4 Parameter adjustment </SectionTitle>
<Paragraph position="0"> Note that the training corpus is a discrete, finite set. The parameter set estimated from such limited samples cannot be guaranteed to converge to the &quot;true&quot; values. In particular, some words may not appear in the corpus at all, in which case the training expressions above yield zero-valued observation probabilities, which is undesirable. The parameters are therefore adjusted after automatic training: a sufficiently small positive constant ε replaces the zero-valued observation probabilities.</Paragraph>
</Section> </Section>
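A minimal sketch of the relative-frequency estimation of Sections 3.3-3.4, assuming the training data is available as sentences of (word, prosodic-position) pairs; the function name, counting containers, and the value of ε are illustrative choices, not taken from the paper:

```python
import numpy as np

EPS = 1e-6  # sufficiently small positive constant for zero observation probabilities (Sec. 3.4)

def estimate_hmm(tagged_sentences, n_states, word2id):
    """Estimate (A, B, pi) as simple ratios of occurrence counts, then adjust
    zero-valued observation probabilities with a small constant."""
    n_words = len(word2id)
    F = np.zeros(n_states)                    # F_i: occurrences of state q_i
    F_pair = np.zeros((n_states, n_states))   # F_ij: occurrences of state pair (q_i, q_j)
    G = np.zeros((n_states, n_words))         # G_j(k): co-occurrences of q_j and word v_k

    for sent in tagged_sentences:             # sent = [(word, prosodic_position), ...]
        positions = [pos for _, pos in sent]
        for word, pos in sent:
            F[pos] += 1
            G[pos, word2id[word]] += 1
        for prev, nxt in zip(positions, positions[1:]):
            F_pair[prev, nxt] += 1

    # Simple ratios of counts (assumes every denominator is non-zero in the corpus).
    pi = F / F.sum()
    A = F_pair / F_pair.sum(axis=1, keepdims=True)
    B = G / G.sum(axis=1, keepdims=True)

    # Parameter adjustment: unseen (state, word) pairs get probability EPS.
    B[B == 0.0] = EPS
    return A, B, pi
```

In line with the treatment of proper names above, NSWs identified by the segmentation module would be mapped to their category symbol before counting.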
<Section position="7" start_page="978" end_page="979" type="metho">
<SectionTitle> 3.5 The search procedure </SectionTitle>
<Paragraph position="0"> In this stage, we search for an optimal state sequence that explains the given observations under the model. That is, for an input sentence, an optimal prosodic-position sequence is predicted with the HMM. Instead of the popular Viterbi algorithm, which is asymptotically optimal, we apply the Forward-Backward procedure to conduct the search.</Paragraph>
<Paragraph position="1"> Forward and backward search. All the definitions described in (Rabiner, 1999) are followed in the present approach.</Paragraph>
<Paragraph position="2"> The forward procedure is defined by
- forward variable: α_t(i) = P(o_1 o_2 ... o_t, q_t = i | λ);
- initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N;
- induction: α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N,
where T is the number of observations.</Paragraph>
<Paragraph position="3"> The backward procedure is defined analogously by the backward variable β_t(i) = P(o_{t+1} o_{t+2} ... o_T | q_t = i, λ), with β_T(i) = 1 and β_t(i) = Σ_{j=1..N} a_ij b_j(o_{t+1}) β_{t+1}(j).</Paragraph>
<Paragraph position="4"> The a posteriori probability variable γ_t(i) is the probability of being in state i at time t given the observation sequence O and the model λ. It can be expressed as
γ_t(i) = α_t(i) β_t(i) / Σ_{j=1..N} α_t(j) β_t(j),
and the optimal state at time t is the one maximizing γ_t(i).</Paragraph>
<Paragraph position="5"> A question arises here: does the optimal state sequence correspond to the optimal path?</Paragraph>
<Paragraph position="6"> Search based on dynamic programming. The preceding search procedure targets the state sequence that is optimal under one criterion, but it does not reflect the probability of occurrence of the sequence of states as a whole. This issue is explored with a dynamic programming (DP) like approach, described below.</Paragraph>
<Paragraph position="7"> For convenience, we illustrate the problem as shown in Figure 2.</Paragraph>
<Paragraph position="8"> From Figure 2, it can be seen that a transition from state i to state j occurs only between two consecutive stages, i.e., the search is time-synchronous. In total there are T stages and TN^2 arcs. The optimal-path problem is therefore a multi-stage optimization problem, similar to a DP problem. The slight difference is that a node in a conventional DP problem carries no additional attribute, whereas a node in the HMM trellis carries an observation probability distribution. Considering this difference, we modify the conventional DP approach as follows.</Paragraph>
<Paragraph position="9"> In the trellis, we add a virtual start node (state) q_s at time 0, before time 1. All transitions from q_s to the nodes of the first stage (time 1) are set to 1/N, and all of its observation probabilities are set to 1/M. Denote the optimal path from q_s to node q_i at time t as path(t,i); path(t,i) is a sequence of states. Denote the score of path(t,i) as s(t,i); s(t,i) combines the state-transition and observation probability distributions. The induction process is as follows:
- initialization: s(1,i) = (1/N) b_i(o_1), path(1,i) = {q_s, i}, 1 ≤ i ≤ N;
- induction: s(t+1,j) = max_{1≤i≤N} [s(t,i) a_ij] b_j(o_{t+1}), and path(t+1,j) extends path(t,i*) with j, where i* is the maximizing state;
- termination: the state j* = argmax_{1≤j≤N} s(T,j) is selected, and path(T,j*) is the optimal path.</Paragraph>
<Paragraph position="10"> Basically, the main idea of our approach is that if the final optimal path passes through node j at time t, it passes through all the nodes in path(t,j) in sequence. This idea is similar to the forward procedure of DP; we could equally begin at the termination time T and derive a backward variant. As for time complexity, the trellis can be viewed as a special DAG: the state transitions from time t to time t+1 require 2N^2 calculations, giving an overall time complexity of O(TN^2).</Paragraph>
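The two search procedures can be sketched as follows, using the Rabiner-style definitions above; this is a schematic implementation under those definitions, not the authors' code, and the function names are ours:

```python
import numpy as np

def forward_backward_decode(A, B, pi, obs):
    """Pick, at each time t, the state with the largest posterior gamma_t(i)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # forward initialization
    for t in range(T - 1):                              # forward induction
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    beta[T - 1] = 1.0                                   # backward initialization
    for t in range(T - 2, -1, -1):                      # backward induction
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)           # gamma_t(i) = P(q_t = i | O, lambda)
    return list(gamma.argmax(axis=1))                   # individually most likely states

def dp_optimal_path(A, B, obs):
    """DP search for the single most probable state path, with a virtual start
    node q_s whose outgoing transitions are all 1/N (its uniform 1/M observation
    probability is a constant factor and is dropped)."""
    T, N = len(obs), A.shape[0]
    s = np.zeros((T, N))                 # s(t, i): score of the best path ending in state i
    back = np.zeros((T, N), dtype=int)   # backpointers encoding path(t, i)
    s[0] = (1.0 / N) * B[:, obs[0]]      # initialization from q_s
    for t in range(T - 1):               # induction: s(t+1, j) = max_i [s(t, i) a_ij] b_j(o_{t+1})
        cand = s[t][:, None] * A
        back[t + 1] = cand.argmax(axis=0)
        s[t + 1] = cand.max(axis=0) * B[:, obs[t + 1]]
    path = [int(s[T - 1].argmax())]      # termination: best final state
    for t in range(T - 1, 0, -1):        # follow backpointers to recover the path
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```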
<Paragraph position="11"> Intuitively, the optimal path differs from the optimal state sequence generated by the Forward-Backward procedure, whose underlying idea is that the target state sequence should explain the observations optimally at each individual time. To support this claim, we give a simple example (T = 2, N = 2, π = [0.5, 0.5]^T); [the example's transition and observation probabilities are given in a figure not preserved in this extraction]. Apparently, the optimal state sequence is (1, 1), while the optimal path is (1, 2).</Paragraph>
</Section> </Paper>
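Since the example's probabilities are lost here, the numbers below are hypothetical values of our own, chosen only to reproduce the stated outcome and not taken from the paper. With them, posterior (γ) decoding picks state 1 at both times, while the single most probable path is (1, 2):

```python
import numpy as np

pi = np.array([0.5, 0.5])            # T = 2, N = 2, pi = [0.5, 0.5]^T as in the text
A = np.array([[0.40, 0.60],          # state 1 tends to move to state 2 ...
              [0.85, 0.15]])         # ... while state 2 strongly prefers state 1
B = np.array([[0.6, 0.5],            # rows: states 1, 2; columns: symbols v1, v2
              [0.4, 0.5]])
obs = [0, 1]                         # observation sequence O = (v1, v2)

# Posterior (gamma) decoding, written out explicitly for T = 2.
alpha1 = pi * B[:, obs[0]]
alpha2 = (alpha1 @ A) * B[:, obs[1]]
beta2 = np.ones(2)
beta1 = A @ (B[:, obs[1]] * beta2)
gamma = np.vstack([alpha1 * beta1, alpha2 * beta2])
print(gamma.argmax(axis=1) + 1)      # -> [1 1]: optimal state sequence (1, 1)

# Joint probability of every complete path; the best single path is (1, 2).
paths = {(i + 1, j + 1): pi[i] * B[i, obs[0]] * A[i, j] * B[j, obs[1]]
         for i in range(2) for j in range(2)}
print(max(paths, key=paths.get))     # -> (1, 2): optimal path
```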