Accenting and Deaccenting: 
a Declarative Approach 
Arthur Dirksen 
Institute for Perception Research/IPO 
P.O. Box 513, 5600 MB Eindhoven, The Netherlands 
E-mail: dirksen@heiipo5.bitnet 
1 Introduction 
One of the problems that must be addressed 
by a text-to-speech system is the derivation 
of pitch accent, marking the distinction be- 
tween "given" and "new" information in an 
utterance. This paper discusses a language- 
independent approach to this problem, which 
is based on focus-accent theory (e.g. Ladd 
1978, Gussenhoven 1984, Baart 1987), and 
implemented in my program PROS-3. This 
program has been developed as part of the 
ESPRIT-project POLYGLOT, and provides 
an integrated environment for modelling the 
syntax-to-prosody interface of a multi-lingual 
text-to-speech system. 
The program operates in the following 
manner. First, the input text is parsed using 
a variation of context-free phrase-structure 
rules, augmented with information about "ar- 
gument" structure of phrases. Next, the syn- 
tactic representation is mapped onto a met- 
rical tree. The metrical tree is then used to 
derive locations for pitch accents, as well as 
phonological and intonational phrase bound- 
aries. 
In this approach, differences between lan- 
guages are modelled entirely by the syntactic 
rules. Also, the system is strictly declarative, 
in the sense that once a piece of information is 
added by a rule, it is never removed. In this 
respect, our approach differs radically from 
systems which make use of derivational rules 
(e.g. Quené & Kager 1992). Such systems 
tend to become extremely complex, hard to 
verify and almost impossible to maintain or 
extend (Quené & Dirksen 1990, Dirksen & 
Quené in press). By contrast, in PROS-3 
there is a conspicuous relation between theory 
and implementation, and the program can be 
extended in a number of ways.1 
Below, I will focus on two major rules 
from focus-accent theory: Default Accent and 
Rhythmic Deaccenting. The first rule is used 
to model deaccenting of "given" information, 
e.g. the pronouns it, het and es in the English, 
Dutch and German sentences of (1), (2) and 
(3), respectively. 
(1)a I should have read a BOOK 
b I should have READ it 
(2)a ik had een BOEK moeten lezen 
b ik had het moeten LEZEN 
(3)a ich hatte ein BUCH lesen sollen 
b ich hatte es LESEN sollen 
The second rule is used to provide rhythmi- 
cal alternations between accented and deac- 
cented material in certain well-defined con- 
texts, as is illustrated by the sentences of (4). 
(4)a she is a NICE GIRL 
b she is REALLY NICE 
c she is a REALLY nice GIRL 
d she is REALLY a NICE GIRL 
1One extension we are currently considering is the 
addition of some kind of discourse model (along the 
lines of Hirschberg 1990) to model the "given/new" 
distinction more adequately. Also, some preliminary 
work has been done on phonological parsing (e.g. 
Coleman 1990, 1991; see also his paper in this vol- 
ume) to derive word stress and temporal structure of 
words. 
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992   865   PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
This paper is organized as follows. Section 
2 briefly introduces focus-accent theory and 
how it may be implemented. Next, sections 3 
and 4 discuss Default Accent and Rhythmic 
Deaccenting, respectively. In section 5, we 
make some concluding remarks. 
2 From Focus to Accent 
In focus-accent theory, metrical trees are 
used to represent relative prominence of 
nodes with respect to pitch accent. Whether 
a given node is accented or not is accounted 
for in terms of the focus/non-focus distinc- 
tion. 
For example, a pitch accent on book in the 
phrase read a book may be accounted for by 
assuming the metrical structure (5). 
(5)      +F
        /  \
      w     s
           / \
          w   s
    read  a   book
In (5), the entire phrase is marked 
+F(ocus), indicating that it is to be inter- 
preted as a "new" or otherwise important ad- 
dition to the discourse. The relation between 
the focus-marker and a pitch accent on book is 
mediated by the labels w(eak) and s(trong), 
and formally accounted for by the following 
recursive rule: 2 
Accent Rule 
For each node X, X is accented if 
a. X is marked +F, or 
b. X is strong, and the node immediately 
dominating X is accented. 
2By convention, only weak or root nodes are 
marked +F, thus indicating the upper bound of what 
is sometimes called the "focus set". 
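The recursion in the Accent Rule can be traced on structure (5) with a few lines of Python. This is only an illustrative sketch, not the PROS-3 implementation: the `Node` class, its field names and the function `accented_words` are assumptions introduced here, and the tree is walked top-down rather than stated declaratively.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node in a binary metrical tree (hypothetical representation)."""
    label: str                      # "w" (weak), "s" (strong) or "root"
    word: Optional[str] = None      # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    focus: bool = False             # True if the node is marked +F


def accented_words(node: Node, mother_accented: bool = False) -> List[str]:
    """Return the words that receive a pitch accent under the Accent Rule."""
    # Clause a: the node is marked +F.
    # Clause b: the node is strong and its mother is accented.
    accented = node.focus or (node.label == "s" and mother_accented)
    if node.word is not None:
        return [node.word] if accented else []
    words: List[str] = []
    for child in (node.left, node.right):
        if child is not None:
            words += accented_words(child, accented)
    return words


# Structure (5): [read [a book]], with +F on the root.
tree_5 = Node("root", focus=True,
              left=Node("w", word="read"),
              right=Node("s",
                         left=Node("w", word="a"),
                         right=Node("s", word="book")))

print(accented_words(tree_5))  # ['book']
```

Marking the weak node read as +F instead of the root would accent read and leave book unaccented, since book's mother is then not accented.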
Baart (1987) assumes that the metrical la- 
beling of a structure is determined by syn- 
tactic/thematic properties of phrases such 
as specification and complementation. More 
generally, we assume that "arguments" which 
are not deaccented are strong. For example, 
in (1) the NP a book is an argument of the 
verb read. Also, a determiner takes a noun 
as an argument. In a PROS-3 grammar, one 
must make this explicit by writing rules such 
as those in (6). 
(6)a VP -> (V/NP) (English) 
b VP -> (NP\V) (Dutch/German) 
c NP -> (Det/N) 
In such rules, (X/Y) or (Y\X) serves to 
indicate that Y is an argument of X. If 
we ignore deaccenting, argument structure 
directly determines the geometrical proper- 
ties of the metrical tree, and we may read 
(X/Y) or (Y\X) as weak-strong or strong- 
weak, respectively. 3 
Also, a PROS-3 grammar must indicate 
which nodes are eligible for focus (normally, 
all major phrasal categories). If a node is el- 
igible for focus, it must either be accented or 
deaccented. Words which are typically deac- 
cented are specified as such in a lexicon. 
In our implementation, a binary-branching 
metrical tree is used as the central data- 
structure, and the relation between focus and 
accent is defined by using sharing variables, 
which may become instantiated to a value 
"true" (=accented) or "false" (=deaccented), 
or remain unspecified (=not accented). The 
following definitions are used to implement 
accenting: 4 
accented(X) :- 
    X:accent === true.        % assign accent, or verify it 
strong(X, Y) :- 
    X:accent === Y:accent.    % X and its strong node Y share one value 
deaccented(X) :- 
    not accented(X).          % a test only: instantiates nothing 
focus(X) :- 
    accented(X); 
    deaccented(X). 
3Even though metrical trees are strictly binary- 
branching, multi-branching structures are accommodated 
by allowing rules such as S -> (NP/(Infl/VP)). 
4The notation has been borrowed from Gazdar & 
Mellish 1989; '===' is the unification operator, and 
Node:Attr indicates a path in a graph (or a field in a 
record). We assume negation by failure as in standard 
Prolog implementations. 
The statement accented(X) may be used 
to assign accent to a node, or to verify 
that the node is accented. The statement 
strong(X, Y), which reads "the strong 
node of X is Y", implements condition b of 
the Accent Rule above by unifying the val- 
ues for accent of X and Y. The statement 
deaccented(X) succeeds if the value for ac- 
cent of X is instantiated to "false", and fails 
otherwise, so it may be used as a test. Simi- 
larly, the statement not deaccented(X) may 
be used to test whether it might be possible 
to assign accent to X, but will not instantiate 
any values. Finally, the statement focus(X) 
is used to assign accent to those nodes marked 
by the grammar writer as "eligible for focus", 
unless they have been deaccented. 
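The behaviour of these four definitions can be imitated in Python with a small logic-variable class. The sketch below is an approximation under stated assumptions: `Var`, `bind`, `share`, `value` and `Node` are hypothetical helpers written for this illustration, and negation by failure is approximated by inspecting the current value without binding anything.

```python
class Var:
    """A logic variable: unbound (ref is None), bound to a value,
    or linked to another Var it has been unified with."""

    def __init__(self):
        self.ref = None

    def deref(self):
        v = self
        while isinstance(v.ref, Var):
            v = v.ref
        return v


def value(var):
    """Current value: True, False, or None if still unspecified."""
    return var.deref().ref


def bind(var, val):
    """Unify a variable with a constant; fail (False) on a clash."""
    root = var.deref()
    if root.ref is None:
        root.ref = val
        return True
    return root.ref == val


def share(a, b):
    """Unify two variables so that they always carry the same value."""
    ra, rb = a.deref(), b.deref()
    if ra is rb:
        return True
    if ra.ref is None:
        ra.ref = rb
        return True
    if rb.ref is None:
        rb.ref = ra
        return True
    return ra.ref == rb.ref


class Node:
    def __init__(self, name, deaccented=False):
        self.name = name
        self.accent = Var()
        if deaccented:                 # lexical specification, e.g. pronouns
            bind(self.accent, False)


def accented(x):
    return bind(x.accent, True)        # X:accent === true


def strong(x, y):
    return share(x.accent, y.accent)   # X:accent === Y:accent


def deaccented(x):
    return value(x.accent) is False    # "not accented(X)": a test, binds nothing


def focus(x):
    return accented(x) or deaccented(x)


# (1)a "read a book": the argument NP is strong, and so is N inside NP.
vp, v, np, det, n = (Node(c) for c in ("VP", "V", "NP", "Det", "N"))
strong(np, n)
strong(vp, np)
focus(vp)
print(value(n.accent), value(v.accent))     # True None

# (1)b "read it": the pronoun is deaccented, so V is the strong daughter.
vp2, v2, it = Node("VP"), Node("V"), Node("NP", deaccented=True)
strong(vp2, v2)
focus(vp2)
print(value(v2.accent), value(it.accent))   # True False
```

Because the variables are shared rather than copied, calling `focus` before or after the `strong` statements gives the same result, which is the property the declarative formulation relies on.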
3 Default Accent 
Consider again the sentences in (1), (2) and 
(3), and observe that when the NP a book 
is replaced by the pronoun it, pitch accent 
appears to "shift" from the NP to the most 
deeply embedded verb, read, of which it is 
an argument. Any differences between En- 
glish, Dutch and German seem to be strictly 
a matter of syntax. Assuming appropriate 
phrase-structure rules, such as (6)a and b, 
this is reflected in the corresponding metri- 
cal tree. The metrical structure of the verb 
phrase of (1)a is a strictly right-branching 
structure which is uniformly labeled as weak- 
strong. The metrical trees corresponding to 
the verb phrases of (2)a and (3)a, shown in 
(7) and (8), are less uniform. 
(7)       /      \
       s            w
     /  \         /   \
    w    s      w       s
...een  boek  moeten  lezen
(8)       /      \
       s            w
     /  \         /   \
    w    s      s       w
...ein  Buch  lesen  sollen
In order to account for the b-sentences of 
(1), (2) and (3), in which a (deaccented) pro- 
noun replaces NP, it seems that all that is 
needed is a reversal of the weak-strong label- 
ing of the VP-node. To this end, Baart (1987) 
assumes the following rule: 
DEFAULT ACCENT 
a    /\    =>    /\
    w  s        s  w
    A  B        A  B
b    /\    =>    /\
    s  w        w  s
    B  A        B  A
Condition: B is deaccented 
In PROS-3, this rule is implemented as a 
filter, called STP, which takes as input a syn- 
tactic structure assigned by the parser, and 
produces as output a metrical tree. A typical 
invocation might be: 
VP->(V/NP) => Prosody, 
    focus(VP). 
Using the definitions of section 2, STP is 
defined by the following set of rules: 5 
5Take note that we are rather frivolous in using 
the slash-notation to encode both argument structure 
and metrical structure, though, of course, the two are 
distinct. That is, the metrical tree does not replace 
argument structure, but is merely its realization in 
the domain of sentence prosody. 
STP 
a Z->(X/Y) => Z->(X\Y) :- 
    deaccented(Y), 
    strong(Z, X). 
b Z->(X/Y) => Z->(X/Y) :- 
    not deaccented(Y), 
    strong(Z, Y). 
c Z->(Y\X) => Z->(Y/X) :- 
    deaccented(Y), 
    strong(Z, X). 
d Z->(Y\X) => Z->(Y\X) :- 
    not deaccented(Y), 
    strong(Z, Y). 
Cases a and c implement Default Accent, 
whereas b and d represent the "normal" case. 
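The four cases collapse into a single choice: the argument is the strong daughter unless it is deaccented, in which case the head is. The Python sketch below illustrates this on examples (1) and (2). It is not the PROS-3 filter itself: the classes `Cell` and `Node`, the function names, the bottom-up construction and the verb-cluster rule V -> (V/V) for moeten lezen are all assumptions made for the illustration.

```python
class Cell:
    """An accent value that several nodes can share:
    True (accented), False (deaccented) or None (unspecified)."""

    def __init__(self, value=None):
        self.value = value


class Node:
    def __init__(self, cat, word=None, deaccented=False):
        self.cat = cat
        self.word = word                # leaves only
        self.children = ()              # (left, right) for phrases
        self.labels = None              # ("w", "s") or ("s", "w")
        self.accent = Cell(False if deaccented else None)


def is_deaccented(node):
    return node.accent.value is False


def stp(cat, left, right, arg):
    """Build the metrical node for a binary rule; `arg` says which
    daughter ("left" or "right") is the argument of the other."""
    z = Node(cat)
    z.children = (left, right)
    # arg == "right" corresponds to (X/Y), arg == "left" to (Y\X).
    head, argument = (left, right) if arg == "right" else (right, left)
    # Cases a and c (Default Accent): a deaccented argument makes the head strong.
    # Cases b and d (normal case): the argument is strong.
    strong_child = head if is_deaccented(argument) else argument
    z.labels = ("s", "w") if strong_child is left else ("w", "s")
    z.accent = strong_child.accent      # strong(Z, child): share one accent value
    return z


def focus(node):
    """Accent the node unless it has been deaccented."""
    if node.accent.value is None:
        node.accent.value = True


def render(node):
    """Spell out the phrase with accented words in capitals."""
    if node.word is not None:
        return node.word.upper() if node.accent.value else node.word
    return " ".join(render(child) for child in node.children)


def dutch_vp(obj):
    # Assumed rules: V -> (V/V) for the verb cluster, VP -> (NP\V).
    cluster = stp("V", Node("V", "moeten"), Node("V", "lezen"), arg="right")
    return stp("VP", obj, cluster, arg="left")


a_book = stp("NP", Node("Det", "a"), Node("N", "book"), arg="right")
een_boek = stp("NP", Node("Det", "een"), Node("N", "boek"), arg="right")

vp_1a = stp("VP", Node("V", "read"), a_book, arg="right")
vp_1b = stp("VP", Node("V", "read"), Node("NP", "it", deaccented=True), arg="right")
vp_2a = dutch_vp(een_boek)
vp_2b = dutch_vp(Node("NP", "het", deaccented=True))

for vp in (vp_1a, vp_1b, vp_2a, vp_2b):
    focus(vp)
    print(render(vp))
# read a BOOK
# READ it
# een BOEK moeten lezen
# het moeten LEZEN
```

The English and Dutch examples differ only in the `arg` direction passed to `stp`, in line with the claim that differences between languages are carried by the syntactic rules.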
4 Rhythmic Deaccenting 
Rhythmic factors provide a second source 
of deaccenting phenomena. They apply to 
structures such as (9), representing (4)c from 
section 1, and (10), representing the Dutch 
sentence "er is op VEEL plaatsen REGEN 
voorspeld" (there is in MANY places RAIN 
predicted), meaning: it has been predicted 
that it will rain in many places. 
(9)     /   \
      w        s
    /   \
  w      s
really  nice  girl
(10)     /       \
     w                s
   /   \
 w       s
       /   \
     w       s
op  veel  plaatsen  regen
Although the pitch accent patterns implied 
by these structures are well-formed, there is a 
strong preference for deaccenting nice in (9) 
and plaatsen in (10). In order to account for 
these phenomena, we assume the following 
optional rule (adapted from Baart 1987): 
RHYTHM RULE 
      /   \     =>         /   \
    w       s            w       s
  /   \                /   \
(w     s)            (w     s)
      / \                  / \
     w   s                s   w
     A   B  C             A   B  C
In this rule, brackets indicate a substruc- 
ture which may be repeated zero or more times. 
A further requirement is that nodes A, B and 
C are not deaccented. 
The Rhythm Rule differs from Default Ac- 
cent in that it is not a local rule: its struc- 
tural change, the weak-strong reversal of A 
and B, is dependent on the presence of a node 
C whose weak sister-node dominates A and 
B in a rather complex manner. One way to 
implement such context-sensitive rules in a 
declarative framework is to use feature per- 
colation. Space does not permit us to work 
out the implementation in full detail (there 
are also some additional requirements to be 
met), but the following should give the reader 
some idea. 
First, we add a new case to the STP-filter 
above, implementing the structural change of 
the Rhythm Rule, and marking the resulting 
structure with a feature annotation indicating 
that the Rhythm Rule has "applied": 
Z->(X/Y) => Z->(X\Y) :- 
    not deaccented(X), 
    not deaccented(Y), 
    strong(Z, X), 
    Z:rhythm_rule === true. 
Next, we make sure that this feature is 
percolated upwards in weak-strong configura- 
tions, and blocked wherever necessary in or- 
der to filter out over-generation. 
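To make the structural description of the Rhythm Rule concrete, the following Python sketch applies it to structures (9) and (10). It is a procedural stand-in and not the feature-percolation implementation alluded to above: it searches the tree for the context and flips the labels in place, which a strictly declarative system would not do. The `Node` fields, the function names and the placement of +F on the phrases are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    word: Optional[str] = None          # leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    strong: str = "right"               # "right" = weak-strong, "left" = strong-weak
    focus: bool = False                 # marked +F
    deaccented: bool = False            # lexically deaccented


def is_ws(n: Node) -> bool:
    """True for a branching node labeled weak-strong."""
    return n.word is None and n.strong == "right"


def lowest_ws(n: Node) -> Optional[Node]:
    """Descend through zero or more (w s) layers to the mother of A and B."""
    if not is_ws(n):
        return None
    if is_ws(n.right):
        return lowest_ws(n.right)
    return n


def rhythm_rule(z: Node) -> bool:
    """Apply the (optional) Rhythm Rule at z; return True if it applied."""
    if not is_ws(z):
        return False
    c = z.right
    target = lowest_ws(z.left)
    if target is None:
        return False
    a, b = target.left, target.right
    if a.deaccented or b.deaccented or c.deaccented:
        return False
    target.strong = "left"              # weak-strong reversal of A and B
    return True


def accented_words(n: Node, mother_accented: bool = False,
                   is_strong: bool = True) -> List[str]:
    """Accent Rule: +F, or strong with an accented mother."""
    acc = not n.deaccented and (n.focus or (is_strong and mother_accented))
    if n.word is not None:
        return [n.word] if acc else []
    return (accented_words(n.left, acc, n.strong == "left")
            + accented_words(n.right, acc, n.strong == "right"))


# Structure (9): [[really nice] girl]
ap = Node(left=Node("really"), right=Node("nice"), focus=True)
tree_9 = Node(left=ap, right=Node("girl"), focus=True)
print(accented_words(tree_9))   # ['nice', 'girl']
rhythm_rule(tree_9)
print(accented_words(tree_9))   # ['really', 'girl']

# Structure (10): [[op [veel plaatsen]] regen]
pp = Node(left=Node("op"),
          right=Node(left=Node("veel"), right=Node("plaatsen")),
          focus=True)
tree_10 = Node(left=pp, right=Node("regen"), focus=True)
rhythm_rule(tree_10)
print(accented_words(tree_10))  # ['veel', 'regen']
```

In (10) the prepositional phrase supplies one bracketed (w s) layer between C and the pair A, B, which is why `lowest_ws` has to descend one level before it finds veel plaatsen.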
5 Conclusion 
As emphasized above, PROS-3 is a language- 
independent system for deriving sentence 
prosody in a text-to-speech system. This is 
true, of course, only to the extent that focus- 
accent theory and its major rules are univer- 
sals of linguistic theory. Clearly, the proof of 
the pudding is in the eating. At IPO, PROS-3 
is currently being evaluated for Dutch, using 
a grammar of about 125 rules and a lexicon 
of some 80,000 word forms derived from the 
CELEX lexical database. Also, we are work- 
ing on grammars and lexicons of comparable 
size and scope for English and German, and 
PROS-3 is used in the POLYGLOT-project 
for several European languages. 
Although preliminary results are encour- 
aging, there are also problems which need 
mention. First, the focus/non-focus distinc- 
tion is modelled by rather crude heuristics 
(i.e. taking each major phrase as a candi- 
date for focus, deaccenting of pronouns etc. 
by lexical specification). It would be nice 
if something more flexible and "discourse- 
aware" could be built in. Second, we have 
deliberately kept the PROS-3 grammar for- 
malism rather simple (allowing only atomic 
syntactic categories), so we could guarantee 
fairly efficient processing. However, simple 
context-free rules do not disambiguate very 
well. Third, simple rules cannot fully take 
into account verb subcategorization. As a 
result, it is sometimes impossible to make 
the distinction between arguments and non- 
arguments, which is crucial to the metrical 
rules. So, what we need to do, is find an op- 
timal compromise between sophistication of 
syntactic analysis and efficiency of process- 
ing. We think that PROS-3 is the right tool 
to do this. 

Bibliography 
Baart, J.L.G. (1987), Focus, syntax and ac- 
cent placement, Diss. University of Leiden. 
Coleman, J.S. (1990), Unification Phonology: 
another look at "synthesis-by-rule". COL- 
ING 90, Vol. 3, 79-84. ACL. 
Coleman, J.S. (1991), Prosodic structure, parameter- 
setting and ID/LP grammar. S. Bird (ed.), 
Declarative Perspectives on Phonology. Edin- 
burgh Working Papers in Cognitive Science, 
Vol. 7, 65-78. 
Dirksen, A. & H. Quené (in press), Prosodic 
analysis: the next generation. V.J. van 
Heuven & L. Pols (eds.), Analysis and synthe- 
sis of speech: strategic research towards high- 
quality text-to-speech generation. Mouton de 
Gruyter, Berlin. 
Gazdar, G. & C. Mellish (1989), Natural lan- 
guage processing in Prolog: an introduction to 
computational linguistics. Addison-Wesley, 
Wokingham. 
Gussenhoven, C. (1984), On the grammar and 
semantics of sentence accents. Foris Publ., 
Dordrecht. 
Hirschberg, J. (1990), Accent and discourse 
context: assigning pitch accent in synthetic 
speech, in Proceedings of the IEEE, 73-11, 
1589-1601. 
Ladd, D.R. (1978), The structure of into- 
national meaning. Indiana University Press, 
Bloomington. 
Quené, H. & A. Dirksen (1990), A comparison 
of natural, theoretical and automatically de- 
rived accentuations of Dutch texts. G. Bailly 
& C. Benoit (eds.), Proceedings of the ESCA 
workshop on speech synthesis, Autrans, 137- 
140. 
Quené, H. & R. Kager (1992), The derivation 
of prosody for text-to-speech from prosodic 
sentence structure. Computer, speech and 
language, 6, 77-98.
