Word Informativeness and Automatic Pitch Accent Modeling 
Shimei Pan and Kathleen R. McKeown 
Dept. of Computer Science 
Columbia University 
New York, NY 10027, USA 
{pan, kathy}@cs, columbia, edu 
Abstract 
In intonational phonology and speech syn- 
thesis research, it has been suggested that 
the relative informativeness of a word can be 
used to predict pitch prominence. The more 
information conveyed by a word, the more 
likely it will be accented. But there are oth- 
ers who express doubts about such a correla- 
tion. In this paper, we provide some empiri- 
cal evidence to support the existence of such 
a correlation by employing two widely ac- 
cepted measures of informativeness. Our ex- 
periments show that there is a positive corre- 
lation between the informativeness of a word 
and its pitch accent assignment. They also 
show that informativeness enables statisti- 
cally significant improvements in pitch ac- 
cent prediction. The computation of word 
informativeness is inexpensive and can be 
incorporated into speech synthesis systems 
easily. 
1 Introduction 
The production of natural, intelligible 
speech remains a major challenge for speech 
synthesis research. Recent research has 
focused on prosody modeling (Silverman, 
1987; Hirschberg, 1990; Santen, 1992), which 
determines the variations in pitch, tempo 
and rhythm. One of the critical issues in 
prosody modeling is pitch accent assign- 
ment. Pitch accent is associated with the 
pitch prominence of a word. For example, 
some words may sound more prominent than 
others within a sentence because they are as- 
sociated with a sharp pitch rise or fall. Usu- 
ally, the prominent words bear pitch accents 
while the less prominent ones do not. A1- 
though native speakers of a particular lan- 
guage have no difficulty in deciding which 
words in their utterances should be accented, 
the general pattern of accenting in a lan- 
guage, such as English, is still an open ques- 
tion. 
Some linguists speculate that relative in- 
formativeness, or semantic weight of a word 
can influence accent placement. Ladd (1996) 
claims that "the speakers assess the relative 
semantic weight or informativeness of poten- 
tially accentable words and put the accent on 
the most informative point or points" (ibid, 
pg. 175). He also claims that "if we un- 
derstand relative semantic weight, we will 
automatically understand accent placement" 
(ibid, pg. 186). Bolinger (Bolinger, 1972) 
also uses the following examples to illustrate 
the phenomenon: 
1. "He was arrested because he KILLED a 
man." 
2. "He was arrested because he killed a 
POLICEMAN." 
The capitalized words in the examples are 
accented. In (1), "man" is semantically 
empty relative to "kill"; therefore, the verb 
"kill" gets accented. However, in (2), "po- 
liceman" is semantically rich and is accented 
instead. 
However, different theories, not based on 
informativeness, were proposed to explain 
the above phenomenon. For example, Bres- 
nan's (1971) explanation is based on syn- 
tactic function. She suggests that "man" 
in the above sentence does not get accented 
because "man" and other words like "guy" 
or "person" or "thing" form a category of 
148 
"semi-pronouns". Counter-examples listed 
below raise more questions about the use- 
fulness of semantic informativeness. The ac- 
cent pattern in the following examples can- 
not be explaihed solely by semantic informa- 
tiveness. 
3. "HOOVER dam." 
4. "Hoover TOWER." 
While researchers have discussed the pos- 
sible influence of semantic informativeness, 
there has been no known empirical study 
of the claim nor has this type of informa- 
tion been incorporated into computational 
models of prosody. In this work, we employ 
two measurements of informativeness. First, 
we adopt an information-based framework 
(Shannon, 1948), quantifying the "Informa- 
tion Content', (IC)" of a word as the negative 
log likelihood of a word in a corpus. The 
second measurement is TF*IDF (Term Fre- 
quency times Inverse Document Frequency) 
(Salton, 1989; Salton, 1991), which has been 
widely used :to quantify word importance in 
information retrieval tasks. Both IC and 
TF*IDF are well established measurements 
of informativeness and therefore, good can- 
didates to investigate. Our empirical study 
shows that word informativeness not only 
is closely related to word accentuation, but 
also provides new power in pitch accent pre- 
diction. Our results suggest that informa- 
tion content is a valuable feature to be in- 
coiporated in speech synthesis systems. 
In the following sections, we first define IC 
and TF*IDF. Then, a description of the cor- 
pus used in ~this study is provided. We then 
describe a Set of experiments conducted to 
study the relation between informativeness 
and pitch accent. We explain how machine 
learning techniques are used in the pitch ac- 
cent modeling process. Our results show 
that: 
• Both iC and TF*IDF scores are 
strongly correlated with pitch accent as- 
signment. 
• IC is a more powerful predictor than 
TF*IDF. 
• IC provides better prediction power in 
pitch accent prediction than previous 
techniques. 
The investigated pitch accent models can be 
easily adopted by speech synthesis systems. 
2 Definitions of IC and 
TF*IDF 
Following the standard definition in infor- 
mation theory (Shannon, 1948; Fano, 1961; 
Cover and Thomas, 1991) the IC of a word 
is IC(w) 
= -log(P(w)) 
where P(w) is the probability of the word 
w appearing in a corpus and P(w) is esti- 
matted as: _~2 where F(w) is the frequency 
of w in the corpus and N is the accumula- 
tive occurrence of all the words in the cor- 
pus. Intuitively, if the probability of a word 
increases, its informativeness decreases and 
therefore it is less likely to be an information 
focus. Similarly, it is therefore less likely to 
be communicated with pitch prominence. 
TF*IDF is defined by two components 
multiplied together. TF (Term Frequency) 
is the word frequency within a document; 
IDF (Inverse Document Frequency) is the 
logarithm of the ratio of the total number of 
documents to the number of documents con- 
taining the word. The product of TF*IDF is 
higher if a word has a high frequency within 
the document, which signifies high impor- 
tance for the current document, and low dis- 
persion in the corpus, which signifies high 
specificity. In this research, we employed 
a variant of TF*IDF score used in SMART 
(Buckley, 1985), a popular information re- 
trieval package: 
(TF*IDF)~o,,d; = 
N (1.0 + log F,o,,dj) log N,o~ 
I M N 2 E ((1"0 + log F~ok,dj ) log ~) 
k=l 
where F,~.dj is the the frequency of word wi 
in document dj, N is the total number of 
149 
documents, Nw~ is the number of documents 
containing word w~ and M is the number of 
distinct stemmed words in document dj. 
IC and TF*IDF capture different kinds of 
informativeness. IC is a matrix global in 
the domain of a corpus and each word in 
a corpus has a unique IC score. TF*IDF 
captures the balance of a matrix local to a 
given document (TF) and a matrix global 
in a corpus (IDF). Therefore, the TF*IDF 
score of a word changes from one document 
to another (different TF). However, some 
global features are also captured by TF*IDF. 
For example, a common word in the domain 
tends to get low TF*IDF score in all the doc- 
uments in the corpus. 
3 Corpus Description 
In order to empirically study the relations 
between word informativeness and pitch ac- 
cent, we use a medical corpus which includes 
a speech portion and a text portion. The 
speech corpus includes fourteen segments 
which total about 30 minutes of speech. The 
speech was collected at Columbia Presbyte- 
rian Medical Center (CPMC) where doctors 
informed residents or nurses about the post- 
operative status of a patient who has just un- 
dergone a bypass surgery. The speech corpus 
was transcribed orthographically by a medi- 
cal professional and is also intonationally la- 
beled with pitch accents by a ToBI (Tone 
and Break Index) (Silverman et al., 1992; 
Beckman and Hirschberg, 1994) expert. The 
text corpus includes 1.24 million, 2,422 dis- 
charge summaries, spanning a larger group 
of patients. The majority of the patients 
have also undergone cardiac surgery. The or- 
thographic transcripts as well as the text cor- 
pus are used to calculate the IC and TF*IDF 
scores. First, all the words in the text cor- 
pus as well as the speech transcripts are pro- 
cessed by a stemming model so that words 
like "receive" and "receives" are treated as 
one word. We employ a revised version of 
Lovins' stemming algorithm (Lovins, 1968) 
which is implemented in SMART. Although 
the usefulness of stemming is arguable, we 
choose to use stemming because we think 
"receive" and "receives" are equally likely 
to be accented. Then, IC and TF*IDF are 
calculated. After this, the effectiveness of 
informativeness in accent placement is veri- 
fied using the speech corpus. Each word in 
the speech corpus has an IC score, a TF*IDF 
score, a part-of-speech (POS) tag and a pitch 
accent label. Both IC and TF*IDF are used 
to test the correlation between informative- 
ness and accentuation. POS is also investi- 
gated by several machine learning techniques 
in automatic pitch accent modeling. 
4 Experiments 
We conducted a series of experiments to 
determine whether there is a correlation 
between informativeness and pitch accent 
and whether informativeness provides an im- 
provement over other known indicators on 
pitch accent, such as part-of-speech. We ex- 
perimented with different forms of machine 
learning to integrate indicators within a sin- 
gle framework, testing whether rule induc- 
tion or hidden Markov modeling provides a 
better model. 
4.1 Ranking Word Informativeness 
in the Corpus 
Table 1 and 2 shows the most and least in- 
formative words in the corpus. The IC order 
indicates the rank among all the words in the 
corpus, while TF*IDF order in the table in- 
dicates the rank among the words within a 
document. The document was picked ran- 
domly from the corpus. In general, most 
of the least informative words are function 
words, such as "with" or "and". However, 
some content words are selected, such as "pa- 
tient", "year", "old". These content words 
are very common in this domain and are 
mentioned in almost all the documents in the 
corpus. In contrast, the majority of the most 
informative words are content words. Some 
of the selections are less expected. For ex- 
ample "your" ranks as the most informative 
word in a document using TF*IDF. This in- 
dicates that listeners or readers are rarely ad- 
dressed in the corpus. It appears only once 
in the entire corpus. 
150 
Rank 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
ICMost Informative IC Least Informative 
Words IC Words IC 
zophrin 
namel 
xyphoid 
wytensin 
pyonephritis 
orobuccal 
tzanck 
synthetic 
Rx 
quote 
14.02725 
14.02725 
14.02725 
14.02725 
14.02725 
14.02725 
14.02725 
14.02725 
14.02725 
14.02725 
with 
on 
patient 
in 
she 
he 
for 
no 
day 
had 
4.08777 
4.20878 
4.26354 
4.35834 
4.52409 
4.52918 
4.66436 
4.69019 
4.78832 
4.98343 
Rank 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
Table I: IC Most and Least informative words 
TF*IDF Most Informative 
Words TF*IDF 
your 
vol 
tank 
sonometer 
papillary 
pancuronium 
name2 
name3 
incomplete 
yes 
0.15746 
0.15238 
0.15238 
0.15238 
0.15238 
0.15238 
0.15238 
0.15238 
0.14345 
0.13883 
TF*IDF Least Informative 
Words TF*IDF 
and 0.00008 
a 0.00009 
the 0.00009 
to 0.00016 
was 0.00020 
of 0.00024 
with 0.00034 
in 0.00041 
old 0.00068 
year 0.00088 
Table 2: TF*IDF Most and Least informative words 
4.2 Testing the Correlation of 
Informativeness and Accent 
Prediction 
In order to verify whether word informa- 
tiveness is correlated with pitch accent, we 
employ Spearman's rank correlation coeffi- 
cient p and associated test (Conover, 1980) 
to estimate the correlations between IC and 
pitch prominence as well as TF*IDF and 
pitch prominence. As shown in Table 3, 
both IC and TF*IDF are closely correlated 
to pitch accent with a significance level p = 
2.67.10 -85 and p = 2.90. i0 TM respectively. 
Because the! correlation coefficient p is pos- 
itive, this indicates that the higher the IC 
and TF*IDF are, the more likely a word is 
to be accented. 
4.3 Learning IC and TF*IDF Accent 
Models 
The correlation test suggests that there is 
a strong connection between informativeness 
and pitch accent. But we also want to show 
how much performance gain can be achieved 
by adding this information to pitch accent 
models. To study the effect of TF*IDF and 
IC on pitch accent, we use machine learning 
techniques to learn models that predict the 
151 
Feature Correlation Coefficient Significance Level 
TF*IDF p -- 0.29 p = 2.67.10 -65 
IC p = 0.34 p = 2.90.10 TM 
Table 3: The Correlation of Informativeness and Accentuation 
effect of these indicators on pitch accent. We 
use both RIPPER (Cohen, 1995) and Hid- 
den Markov Models (HMM) (Rabiner and 
Juang, 1986) to build pitch accent models. 
RIPPER is a system that learns sets of clas- 
sification rules from training data. It auto- 
matically selects rules which maximize the 
information gain and employs heuristics to 
decide when to stop to prevent over-fitting. 
The performance of RIPPER is compara- 
ble with most benchmark rule induction sys- 
tems such as C4.5 (Quinlan, 1993). We train 
RIPPER on the speech corpus using 10- 
fold cross-validation, a standard procedure 
for training and testing when the amount 
of data is limited. In this experiment, the 
predictors are IC or TF*IDF, and the re- 
sponse variable is the pitch accent assign- 
ment. Once a set of RIPPER rules are ac- 
quired, they can be used to predict which 
word should be accented in a new corpus. 
HMM is a probability model which has 
been successfully used in many applications, 
such as speech recognition (Rabiner, 1989) 
and part-of-speech tagging (Kupiec, 1992). 
A HMM is defined as a triple: )~=(A, B, H) 
where A is a state transition probability ma- 
trix, B is a observation probability distribu- 
tion matrix, and H is an initial state distribu- 
tion vector. In this experiment, the hidden 
states are the accent status of words which 
can be either "accented" or "not accented". 
The observations are IC or TF*IDF score 
of each word. Because of the limitation of 
the size of the speech corpus, we use a first- 
order HMM where the following condition is 
assumed: 
P( Qt+I -~i\[Qt = j, Qt-1 = k, . . . Q1 =n) = 
P(Qt+i =ilQt=j) 
where Qt is the state at time t. Because 
we employ a supervised training process, 
no sophisticated parameter estimation pro- 
cedure, such as the Baum-Welch algorithm 
(Rabiner, 1989) is necessary. Here all the 
parameters are precisely calculated using the 
following formula: 
A = {c~j :i = 1,..,N,j = 1,..,N} 
F(Qt-1--i, Qt=j) 
o~i3 -- F(Qt = j) 
B = (l~m: j = 1,..,N,m = 1,.., M} 
Zjm = F(Qt = j, = m) 
F(Qt = j) 
R = i= 1,..,N} 
F(Qi =i) 
~'i= F(Qi) 
where N is the number of hidden states and 
M is the number of observations. 
Once all the parameters of a HMM are set, 
we employ the Viterbi algorithm (Viterbi, 
1967; Forney, 1973) to find an optimal accen- 
tuation sequence which maximizes the pos- 
sibility of the occurrence of the observed IC 
or TF*IDF sequence given the HMM. 
Both RIPPER and HMM are widely ac- 
cepted machine learning systems. However, 
their theoretical bases are very different. 
HMM focuses on optimizing a sequence of 
accent assignments instead of isolated accent 
assignment. By employing both of them, we 
want to show that our conclusions hold for 
both approaches. Furthermore, we expect 
HMM to do better than RIPPER because 
the influence of context words is incorpo- 
rated. 
We use a baseline model where all words 
are assigned a default accent status (ac- 
cented). 52% of the words in the corpus 
are actually accented and thus, the baseline 
has a performance of 52~0. Our results in 
152 
Models HMM Performance RIPPER Performance 
Baseline 52.02% 52.02% 
TF*IDF Model 67.25% 65.66% 
IC Model 71.96% 70.06% 
TaBle 4: Comparison of IC, TF*IDF model with the baseline model 
Table 4 show that when TF*IDF is used 
to predict pitch accent, performance is in- 
creased over the baseline of 52% to 67.25% 
and 65.66 °7o for HMM and RIPPER respec- 
tively. In the IC model, the performance 
is further increased to 71.96% and 70.06%. 
These results 'are obtained by using 10-fold 
cross-validation. We can draw two conclu- 
sions from the results. First, both IC and 
TF*IDF are very effective in pitch accent 
prediction. All the improvements over the 
baseline model are statistically significant 
with p < i.Iii. 10 -16 1, using X 2 test (Fien- 
berg, 1983; Fleiss, 1981). Second, the IC 
model is more powerful than the TF*IDF 
model. It out performs the TF*IDF model 
with p = 3.8.10 -5 for the HMM model and 
p = 0.0002 for the RIPPER model. The low 
p-values show the improvements achieved by 
the IC models are significant. Since IC per- 
forms better than TF*IDF in pitch accent 
prediction, we choose IC to measure infor- 
mativeness in all the following experiments. 
Another observation of the results is that the 
HMM models do show some improvements 
over the RIPBER models. But the difference 
is marginal. More data is needed to test the 
significance of the improvements. 
4.4 Incorporating IC in Reference 
Accent Models 
In order to show that IC provides additional 
power in predicting pitch accent than cur- 
rent models, we need to directly compare 
the influence of IC with that of other ref- 
erence models. In this section, we describe 
experiments that compare IC alone against 
iS reports p=0 because of underflow. The real p 
value is less than I.ii • 10 -16, which is the smallest 
value the comptlter can represent in this case 
a part-of-speech (POS) model for pitch ac- 
cent prediction and then compare a model 
that integrates IC with POS against the POS 
model. Finally, anticipating the possibility 
that other features within a traditional TTS 
in combination with POS may provide equal 
or better performance than the addition of 
IC, we carried out experiments that directly 
compare the performance of Text-to-Speech 
(TTS) synthesizer alone with a model that 
integrates TTS with IC. 
In most speech synthesis systems, part-of- 
speech (POS) is the most powerful feature 
in pitch accent prediction. Therefore, show- 
ing that IC provides additional power over 
POS is important. In addition to the im- 
portance of POS within TTS for predicting 
pitch accent, there is a clear overlap between 
POS and IC. We have shown that the words 
with highest IC usually are content words 
and the words with lowest IC are frequently 
function words. This is an added incentive 
for comparing IC with POS models. Thus, 
we want to explore whether the new informa- 
tion added by IC can provide any improve- 
ment when both of them are used to predict 
accent assignment. 
In order to create a POS model, we first 
utilize MXPOST, a maximum entropy part- 
of-speech tagger (Ratnaparkhi, 1996) to get 
the POS information for each word. The 
performance of the MXPOST tagger is com- 
parable with most benchmark POS taggers, 
such as Brill's tagger (Brill, 1994). After 
this, we map all the part-of-speech tags into 
seven categories: "noun", "verb", "adjec- 
tive", "adverb", "number", "pronoun" and 
"others". The mapping procedure is con- 
ducted because keeping all the initial tags 
(about 35) will drastically increase the re- 
quirements for the amount of training data. 
Models HMM Performance RIPPER Performance 
IC Model 71.96% 70.06% 
POS Model 71.33% 70.52% 
POS+IC Model 74.06% 73.71% 
Table 5: Comparison of POS+IC model with POS model 
Models HMM Performance RIPPER Performance 
TTS Model 71.75% 71.75% 
TTS+IC Model 72.30% 72.75% 
POS+IC Model 74.06% 73.71% 
Table 6: Comparison of TTS+IC model with TTS model 
The obtained POS tag is the predictor in the 
POS model. As shown in table 5, the perfor- 
mance of these two POS models are 71.33% 
and 70.52% for HMM and RIPPER respec- 
tively, which is comparable with that of the 
IC model. This comparison further shows 
the strength of IC because it has similar 
power to POS in pitch accent prediction and 
it is very easy to compute. When the POS 
models are augmented with IC, the POS+IC 
model performance is increased to 74.06% 
and 73.71% respectively. The improvement 
is statistically significant with p -- 0.015 for 
HMM model and p = 0.005 for RIPPER 
which means the new information captured 
by IC provides additional predicting power 
for the POS+IC models. These experiments 
produce new evidence confirming that IC is 
a valuable feature in pitch accent modeling. 
We also tried another reference model, 
Text-to-Speech (TTS) synthesizer output, to 
evaluate the results. The TTS pitch ac- 
cent model is more comprehensive than the 
POS model. It has taken many features into 
consideration, such as discourse and seman- 
tic information. It is well established and 
has been evaluated in various situations. In 
this research, we adopted Bell Laboratories' 
TTS system (Sproat, 1997; Olive and Liber- 
man, 1985; Hirschberg, 1990). We run it on 
our corpus first to get the TTS pitch accent 
assignments. Comparing the TTS accent 
assignment with the expert accent assign- 
ment, the TTS performance is 71.75% which 
is statistically significantly lower than the 
HMM POS+IC model with p = 0.039. We 
also tried to incorporate IC in TTS model. 
A simple way of doing this is to use the 
TTS output and IC as predictors and train 
them with our data. The obtained TTS+IC 
models achieve marginal improvement. The 
performance of TTS+IC model increases to 
72.30% and 72.75% for HMM and RIPPER 
respectively, which is lower than that of the 
POS÷IC models. We speculate that this is 
may be due to the corpus we used. The 
Bell Laboratories' TTS pitch accent model 
is trained in a totally different domain, and 
our medical corpus seems to negatively affect 
the TTS performance (71.75% compared to 
around 80%, its normal performance). Since 
the TTS+IC models involve two totally dif- 
ferent domains, the effectiveness of IC may 
be compromised. If this assumption holds, 
we think that the TTSwIC model will per- 
form better when IC is trained together with 
the TTS internal features on our corpus di- 
rectly. But since this requires retraining a 
TTS system for a new domain and it is very 
hard for us to conduct such an experiment, 
no further comparison was conducted to ver- 
ify this assumption. 
Although TF*IDF is less powerhfl than IC 
in pitch accent prediction, since they mea- 
sure two different kinds of informativeness, 
it is possible that a TF*IDF+IC model can 
154 
! 
perform better than the IC model. Similarly, 
if TF*IDF is incorporated in the POS÷IC 
model, the overall performance may increase 
for the POS+IC+TF*IDF model. How- 
ever, our experiment shows no improvements 
when TF*IDF is incorporated in the IC and 
POS+IC model. Our experiments show that 
IC is always the dominant predictor when 
both IC and TF*IDF are presented. 
5 Related Work 
Information based approaches were applied 
in some natural languages applications be- 
fore. In (Resnik, 1993; Resnik, 1995), IC 
was used to measure semantic similarity be- 
tween words and it is shown to be more 
effective than traditional measurements of 
semantic distance within the WordNet hi- 
erarchy. A similar log-based information- 
like measurement was also employed in (Lea- 
cock and Chodorow, 1998) to measure se- 
mantic similarity. TF*IDF scores are mainly 
used in keyword-based information retrieval 
tasks. For example, TF*IDF has been used 
in (Salton, :1989; Salton, 1991) to index 
the words ini a document and is also imple- 
mented in SMART (Buckley, 1985) which 
is a general;-purpose information retrieval 
package, providing basic tools and libraries 
to facilitate information retrieval tasks. 
Some early work on pitch accent predic- 
tion in speech synthesis only uses the dis- 
tinction between content words and function 
words. Although this approach is simple, it 
tends to assign more pitch accents than nec- 
essary. We also tried the content/function 
word model on our corpus and as expected, 
we found it to be less powerful than the part- 
of-speech model. More advanced pitch ac- 
cent models make use of other information, 
such as part-of-speech, given/new distinc- 
tions and contrast information (Hirschberg, 
1993). Semantic information is also em- 
ployed in predicting accent patterns for com- 
plex nominal phrases (Sproat, 1994). Other 
comprehensive pitch accent models have 
been suggested in (Pan and McKeown, 1998) 
in the framework of Concept-to-Speech gen- 
eration where the output of a natural lan- 
guage generation system is used to predict 
pitch accent. 
6 Discussion 
Since IC is not a perfect measurement of in- 
formativeness, it can cause problems in ac- 
cent prediction. Moreover, even if a perfect 
measurement of informativeness is available, 
more features may be needed in order to 
build a satisfactory pitch accent model. In 
this section, we discuss each of these issues. 
IC does not directly measure the informa- 
tiveness of a word. It measures the rarity of a 
word in a corpus. That a word is rare doesn't 
necessarily mean that it is informative. Se- 
mantically empty words can be ranked high 
using IC as well. For example, CABG is a 
common operation in this domain. "CABG" 
is almost always used whenever the opera- 
tion is mentioned. However, in a few in- 
stances, it is referred to as a "CABG oper- 
ation". As a result, the semantically empty 
word (in this context) "operation" gets a 
high IC score and it is very hard to distin- 
guish high IC scores resulting from this sit- 
uation from those that accurately measure 
informativeness and this causes problems in 
precisely measuring the IC of a word. Simi- 
larly, misspelled words also can have high IC 
score due to their rarity. 
Although IC is not ideal for quantifying 
word informativeness, even with a perfect 
measurement of informativeness, there are 
still many cases where this information by 
itself would not be enough. For example, 
each word only gets a unique IC score re- 
gardless of its context; yet it is well known 
that context information, such as given/new 
and contrast, plays an important role in ac- 
centuation. In the future, we plan to build a 
comprehensive accent model with more pitch 
accent indicators, such as syntactic, seman- 
tic and discourse features. 
7 Conclusion 
In this paper, we have provided empirical ev- 
idence for the usefulness of informativeness 
for accent assignment. Overall, there is a 
155 
positive correlation between indicators of in- 
formativeness, such as IC and TF*IDF, and 
pitch accent. The more informative a word 
is, the more likely that a pitch accent is as- 
signed to the word. Both of the two measure- 
ments of informativeness improve over the 
baseline performance significantly. We also 
show that IC is a more powerful measure of 
informativeness than TF*IDF for pitch ac- 
cent prediction. Later, when comparing IC- 
empowered POS models with POS models, 
we found that IC enables additional, statis- 
tically significant improvements for pitch ac- 
cent assignment. This performance also out- 
performs the TTS pitch accent model signif- 
icantly. Overall, IC is not only effective, as 
shown in the results, but also relatively in- 
expensive to compute for a new domain. Al- 
most all speech synthesis systems, text-to- 
speech as well as concept-to-speech systems, 
can employ this feature as long as there is 
a large corpus. In the future, we plan to 
explore other information content measure- 
ments and incorporate them in a more com- 
prehensive accent model with more discourse 
and semantic features included. 
8 Acknowledgement 
Thanks to Julia Hirschberg, Vasileios Hatzi- 
vassiloglou and James Shaw for comments 
and suggestions on an earlier version of this 
paper. Thanks to Desmand Jordan for help- 
ing us with the collection of the speech and 
text corpus. This research is supported 
in part by the National Science Founda- 
tion under Grant No. IRI 9528998, the 
National Library of Medicine under project 
R01 LM06593-01 and the Columbia Univer- 
sity Center for Advanced Technology in High 
Performance Computing and Communica- 
tions in Healthcare (funded by the New York 
State Science and Technology Foundation). 

References 

Mary Beckman and Julia Hirschberg. 1994. 
The ToBI annotation conventions. Tech- 
nical report, Ohio State University, 
Columbus. 

Dwight Bolinger. 1972. Accent is pre- 
dictable (if you're a mind-reader). Lan- 
guage, 48:633-644. 

Joan Bresnan. 1971. Sentence stress 
and syntactic transformations. Language, 
47:257-280. 

Eric Brill. 1994. Some advances in rule- 
based part of speech tagging. In Proceed- 
ings of the 12th National Conference on 
Artificial Intelligence. 

Chris Buckley. 1985. Implementation of 
the SMART information retreival system. 
Technical Report 85-686, Cornell Univer- 
sity. 

William Cohen. 1995. Fast effective rule in- 
duction. In Proceedings of the 12th In- 
ternational Conference on Machine Learn- 
ing. 

W. J. Conover. 1980. Practical Nonparam- 
etic Statistics. Wiley, New York, 2nd edi- 
tion. 

Thomas M. Cover and Joy A. Thomas. 1991. 
Elements of Information Theory. Wiley, 
New York. 

Robert M. Fano. 1961. Transmission of 
Information: A Statistical Theory of 
Communications. MIT Press, Cambridge, 
Massachusetts. 

Stephen E. Fienberg. 1983. The Analysis 
of Cross-Classified Categorical Data. MIT 
Press, Cambridge, Mass, 2nd edition. 

Joseph L. Fleiss. 1981. Statistical Methods 
for Rates and Proportions. Wiley, New 
York, 2nd edition. 

G. David Forney. 1973. The Viterbi algo- 
rithm. Proceedings of IEEE, 61(3). 

Julia Hirschberg. 1990. Assigning pitch ac- 
cent in synthetic speech: The given/new 
distinction and deaccentability. In Pro- 
ceedings of the Seventh National Confer- 
ence of American Association of Artificial 
Intelligence, pages 952-957, Boston. 

Julia Hirschberg. 1993. Pitch accent in con- 
text: predicting intonational prominence 
from text. Artificial Intelligence, 63:305-340. 

Julian Kupiec. 1992. Robust part-of-speech 
tagging using a hidden markov model. 
Computer Speech and Language, 6(3):225- 
242, July. 

D. Robert Ladd. 1996. Intonational Phonol- 
ogy. Cambridge University Press, Cam- 
bridge. 

Claudia Leacock and Martin Chodorow. 
1998. Combining local context and Word- 
Net similai:ity for word sense identifi- 
cation. In Christiane Fellbaum, editor, 
WordNet: An electronic lexical database, 
chapter 11. MIT Press. 

Julie Beth Lovins. 1968. Development of 
a stemming algorithm. In Mechanical 
Translation and Computational Linguis- 
tics, volume 11. 

Joseph. P. Olive and Mark Y. Liberman. 
1985. Text to Speech--An overview. 
Journal of lthe Acoustic Society of Amer- 
ica, 78(Fall~ :s6. 

Shimei Pan and Kathleen R. McKeown. 
1998. Learning intonation rules for con- 
cept to speech generation. In Proceedings 
of COLING/A CL '98, Montreal, Canada. 

John R. Quinlan. 1993. C4.5: Programs for 
Machine Learning. Morgan Kaufmann, 
San Mateo. 

Lawrence R. Rabiner and B. H. Juang. 1986. 
An introduction to hidden Markov rood- 
els. IEEE ASSP Magazine, pages 4-15, 
January. 

Lawrence R. Rabiner. 1989. A tutorial on 
hidden Ma~:kov models and selected appli- 
cations in speech recognition. Proceedings 
of the IEEE, 77(2):257-286. 

Adwait Ratnaparkhi. 1996. A maximum en- 
tropy part iof speech tagger. In Eric Brill 
and Kenneth Church, editors, Conference 
on Empirical Natural Language Process- 
ing. Univ. of Pennsylvania. 

Philip Resnik. 1993. Semantic classes and 
syntactic ambiguity. In Proc. of ARPA 
Workshop on Human Language Technol- 
ogy, pages 278-283. Morgan Kaufmann 
Publishers. 

Philip Resnik. 1995. Using information con- 
tent to evaluate semantic similarity in 
a taxonomy. In Proceedings of the 14th 
Internatioi~al Joint Conference on Artifi- 
cial Intelligence, pages 448-453, Montreal, 
Canada. 

Gerard Salton. 1989. Automatic Text Pro- 
cessing: The Transformation, Analysis, 
and Retrieval of Information by Com- 
puter. Addison-Wesley, Reading, Mas- 
sachusetts. 

Gerard Salton. 1991. Developments in auto- 
matic text retrieval. Science, 253:974-980, 
August. 

Jan P. H. Van Santen. 1992. Contextual el- 
fects on vowel duration. Speech Commu- 
nicatio n, 11:513-546, January. 

Claude E. Shannon. 1948. A mathemati- 
cal theory of communication. Bell System 
Technical Journal, 27:379-423 and 623- 
656, July and October. 

Kim Silverman, Mary Beckman, John 
Pitrelli, Mari Ostendorf, Colin Wightman, 
Patti Price, Janet Pierrehumbert, and Ju- 
lia Hirschberg. 1992. ToBI: a standard for 
labelling English prosody. In Proceedings 
of ICSLP92, volume 2. 

Kim Silverman. 1987. The structure and 
processing of fundamental frequency con- 
tours. Ph.D. thesis, Cambridge Univer- 
sity. 

Richard Sproat. 1994. English noun- 
phrase accent prediction for Text-to- 
Speech. Computer Speech and Language, 
8:79-94. 

Richard Sproat. 1997. Multilingual Text- 
to-Speech Synthesis: The Bell Labs Ap- 
proach. Kluwer, Boston. 

Andrew J. Viterbi. 1967. Error bound 
for convolutionM codes and an asymp- 
totically optimum decoding algorithm. 
IEEE Transactions in Information The- 
or'y, 13(2). 
