A Syntactic and Morphological Analyzer for a 
Text-to-Speech System 
Thomas Russi 
Institute of Electronics, Swiss Federal Institute of Technology (ETH) 
CH 8092 Zfirich, Switzerland, e-maih russi@strati.ethz.ch 
Abstract 
This paper presents a system which analyzes an in- 
'put text syntactically and morphologically and con- 
verts the text from the graphemic to the phonetic 
:representation (or vice versa). We describe the gram- 
mar formaSsm used and report a parsing experiment 
which compared eight parsing strategies within the 
:h'amework of chart parsing. Although the morpho- 
logical and syntactic analyzer has been developed for 
a text-to-speech system for German, it is language 
independent and general enough to be used for dia- 
log systems, NL-interfaces or speech recognition sys- 
tems. 
1 Introduction 
\]n order to convert text to speech one must first 
derive an underlying abstract linguistic represen- 
tation for the text. There are at least two rea- 
.,sons why a direct approach (e.g,, letter-to-sound 
rules) is inadequate. Firstly, rules for pronounc- 
ing words must take into consideration morpheme 
:;trueture, e.g., <sch> is pronouced differently in 
the German words "16sch+en" ("extinguish") and 
"Hgs+chen" (diminutive of "trousers"), and syn- 
tactic structure, e.g., to solve noun-verb arnbigui- 
~ies such as "Sucht" ("addiction") and "sucht" ("to 
~earch"). Secondly, sentence duration pattern and 
fundamental frequency contour depend largely on 
t;he structure of the sentence. 
While most commercial, but also some laboratory 
text-to-speech (TTS) systems use letter-to-sound 
rules without taking into account the morphologi- 
cal structure of a word, recently developed systems 
\[1,2,3\] incorporate morphological analysis. Although 
the influence of syntax on prosody is widely acknowl- 
edged \[2,3\], most TTS systems lack syntax analysis 
l.l,3\] or use some kind of phrase-level parsing \[2\] to 
obtain information on the syntactic structure of a 
:;entence. "_~'his is motivated more by current techno- 
logical limitations than by linguistic insights.We are 
convinced that in order to achieve highly intelligible 
and natural~sounding speech, not only the phonologi- 
cal and morphological but also the syntactic, seman- 
~,ic and even discourse structure of a text must be 
~,aken into account - although this is not yet feasible. 
As a step toward such a model, we have developed 
a morphological and syntactic analyzer that is based 
on simple but powerful formalisms which are linguis- 
tically well-motivated and computationMly effective. 
2 Morphological and Syntac- 
tic Analysis 
In our TTS system, morphological analysis consists 
of three stages: segmentation, parsing and genera- 
tion. The segmentation module finds possible ways 
to partition the input string into dictionary entries 
(morphs). Spelling changes, e.g., schwa-insertion or 
elision, are covered by morphographemic rules. The 
parsing module of the morphological analysis uses a 
word grammar to accept or reject combinations of 
dictionary entries and to percolate features from the 
lexicon to the syntactic analyzer. The generation 
module of the Inorphologica\] analysis generates the 
phonetic transcription by concatenating the phonetic 
strings, which are stored as part of each morph entry, 
and by applying morphophonetic rules. The syntac- 
tic analysis is based on a sentence grammar and a 
parser that takes as input the result of the morpho- 
logical analyzer. It assigns to each sentence its sur- 
face syntactic structure. The syntactic structure of 
the sentence and the phonetic transcription of each 
word are used at a later stage to determine prosodic 
features such as duration pattern and flmdamental 
frequency contour. 
3 UTN Formalism 
Morphographemic and morphophonetic rules are 
written ill a Kimmo-style formalism \[4 i. Unlike 
the original two-level model, a word grammar is 
used to parse the lexical strings and to determine 
the category of the overall word formed by several 
morphs..To express word and sentence grammars, 
we have developed a grammar formalism, called 
Unification-based Transition Networks (UTN), I~s 
skeleton are nondeterministic reeursive transition 
networks (RTNs), which are equivalent to comex~- 
free gramnmrs. A transition network speciIies the 
linear precedence and immediate dominance relation 
443 
within a constituent. Each label of a transition de- 
notes a preterminal, a constituent or an e-transition. 
As opposed to labels in RTNs, which are monadic, 
labels in UTNs are complex categories (features ma- 
trices). Each transition contains a set of attribute 
equations, which specify the constraints that must 
be satisfied between complex categories in a network. 
Our notation of attribute equations is very similar 
to that commonly used in unification-based rule for- 
malisms such as PArR \[5\]. The UTN formalism is 
fully declarative. It ix based on concatenation and 
recursion, which is reflected in the topology of the 
networks, and unification, which is used for match- 
ing, equality testing and feature passing. Although 
the UTN formalism is somewhat similar to ATNs 
!6!, it is much more concise and elegant because of 
its simplicity and declarativeness. The implemen- 
tation of several grammars for German syntax and 
morphosyntax revealed that transition networks are 
well-suited to design 1 and test grammars. We be- 
lieve that this formalism meets the general criteria 
of linguistic naturalness and mathematical power. 
in addition, the parsing experiment reported below 
shows that efficient parsers can be implemented for 
the UTN formalism. 
The design of our TTS system requires efIicient pars- 
ing algorithms and a flexible parser environment to 
compare several search and rule invocation strate- 
gies. Active chart parsing 18\] is well-suited for that 
purpose. We have implemented a general chart 
parser that can be parameterized fox several search 
and rule invocation strategies. The aim of the ex- 
periment reported below was to investigate to what 
extent a parser can be directed by using the FIRST, 
FOLLOW and REACHABILITY relations \[9,8} and 
combinations thereof, thereby reducing the nunrber 
of edges, the nmnber of applications of the funda- 
mental rule and parsing time. 
Str AE IE TOT FR TIME 
--B-2-- 89763 l~ 96812_ ~1-~ 
~B3- 123344 ~10080~ 133424 ~_1.~_~__~ 
87 86 60121 I 1.00A 
Table 1: Parsing sentence set SI with grammar GI 
strategy, uses the FIRST relation to test whether 
the next input symbol is in the FIRST set of the ac- 
tive edge each time an empty active edge is created. 
Strategy T3, a top-down strategy with lookahead, 
uses the FOLLOW set to test whether the next im 
put symbol belongs to the FOLLOW set of the im 
active edge each time an inactive edge is created. 
Strategy T4 combines the selectivity of strategy T2 
and lookahead of strategy T3. Strategy B1 imple- 
ments a left-corner algorithm \[19\]. Strategy B2 is ,~ 
left-corner parser directed by a top-down filter based 
on the tlEACHABILITY relation \[10\]. Strategy B3 
implements a left-corner algorithm with lookahead 
similar to that of strategy T3, while strategy B4 adds 
a top-down filter and lookahead to the left-corner al-- 
gorithm. 
4°2 Grammars and :~_?est Sets 
For the experiment presented h~re, we used a gram-- 
mar (GI) for German syntax:" that has been devel- 
oped for our TTS system and a grammar (Eli) for 
English syntax a (GII) to compare our ~esults with 
those of other experiments (\[7,11,10\]). Our s~entence 
sets consist of 35 German sentences (set SI, with an 
average sentence length of 9.8 words) and 39 English 
sentences (set SII, with an average sentence length 
of 15.3 words) from Tomita \[7\], pp. 185-189. 
4.1 Rule Invocation Strategies 
We compared eight parsing strategies, i.e., four top- 
down (T1 to T4) and four bottom-up (El to B4) 
strategies. The top-down strategies are variants of 
Earley's algorithm, the bottom-up strategies vari- 
ants of the left-corner algorithm \[9\]. T1, a pure top- 
down strategy, implements Earley's algorithm with- 
out lookahead. Strategy T2, a directed top-down 
1To compare the UTN formalism with rule-based for- 
malisms, we translated several grammars to transition net- 
works. As an example, the grammar GIII found in Tomita's 
book \[7\] with about 220 rules was translated to a strongly 
equivalent network grarrm~ar of 37 transition networks. We 
got the impression that it is easier to write and modify a net- 
work grammar of several dozen networks (that can be dis- 
played and edited graphically) than one of several hundreds 
of rules. 
4.3 Results 
Tables 1 and 2 show the results of parsing sets SI 
and SII with grammars GI and GII, respectively. We 
measured for all strategies (T1 to B4) the number of 
active (AE) and inactive (IE) edges, the total num- 
ber of edges (TOT = AE+IE) and parsing time 4 
(TIME). Since the UTN formalism is based on uni- 
fication, a time- and space-consuming operation, we 
also indicate the number of applications of the funda- 
mental rule (FR) to show the relation between pars- 
ing strategy and FR applications. 
2This grammar consists of 48 networks, 770 transitions, 
1246 unification equations and describes a substantial part of 
German syntax. 
This grammar is a strongly equivalent, network grammar 
of Tornita's grammar GIII. 
4 Parsing time is indicated relative to the fastest algorithm. 
444 
T1 
T2 
2t23 
T4 
B 1 
B 2 
B3 
B4 
91578 169461 
69160 169461 
76288 138801 
55173 13880 I 
210021 493721 
990{}1 227971 
169299 400221 
84984 194151 
TOT 
108524 
86106 54689 
90168 44226 
69053 44226 
259393 168871 
121798 75509 
209321 138232 
104399 65016 
54689 1.50 I 
1.08 { 
1.35 I 
1.00 I 
3.00 I 
1.56 I 
2.68 I 
1.55 I 
Table 2: Parsing sentence set SII with grammar GII 
Our experiments confirm the results of Shann and 
Wir6n \[10,1111 that parsing efficiency depends heavily 
on the grammar, the language, the grammar formal- 
ism and the sentence set. Nevertheless, by carefully 
tuning a parsing strategy, a significant increase in 
efficiency is gained. 
frndireeted top-down parsing performs better than 
undirected bottom-up. This coincides with the re- 
sults of Wir6n. Directed strategies 5 outperform 
undirected strategies with respect to parsing time 
and memory. This holds for top-down and bottom- 
up strategies. 
Previous experiments \[11,10,7\] did not investi- 
gate the influence of lookahead in top-down parsing. 
However, using lookahead (the FOLLOW relation) 
sig:tfificantly reduces the number of edges, the num- 
bex of applications of the fundamental rule and pars- 
ing time. 
Directed top-down parsing with lookahead is as 
fast as left-corner parsing with top-down fih.ering and 
lookahead. The difference between the two strate- 
gies is statistically insignificant when considering all 
experiments conducted with all German grammars 
and several sentence sets. However, it is uncertain 
to what extent this slatement can be generalized to 
other types of grammars and languages. Ba~sed on 
the results of our experiments, both strategies (T4 
and B4) are suited as main strategies in our TTS 
system. 
5 Concluding Remarks 
We have presented a language~-independenl model 
for syntactic and morphological analysis. Special 
emphasis has been laid on the description of the 
UTN formalism and a parser experiment which com- 
par{'d different rule invocation strategies. The ana- 
lyzer is fully implemented in Common Lisp and its 
application in a text-to-speech system has signitl- 
eantly improved the quality of the synthetic speech. 
Since the grapheme-to-phoneme conversion is bidi- 
reclional, our approach may also be promising for 
speech recognition. 
5The algorithm of TomJta can be considered a maximally 
directed charl-parser that uses the FIRST trod FOLLOW re- 
lation to construct an Lit-table at compile time. 

References 

A. Pounder and M. Kommenda. Morphological 
Analysis for a German Text-to-Speech System. 
In Proceedings of the llth International Confer° 
ence on Computational Linguistics, pages 263- 
268, 1986. 

D.H. Klatt. Review oftext-to.-speech conversion 
for English. Journal of the Acoustical Society of 
America, 82(3):737-793, September 1987. 

W Daelemans. Grafon: A Grapheme-to- 
Phoneme C.onversion System for Dutch. In Pro- 
ceedings of the 12th International Conference 
on Computational Linguistics, pages 133-138, 
1988. 

K. Koskenniemi. Two-level Morphology: A 
General Computational Model for Word-Form 
Recognition and Production. PhD thesis, Uni- 
versity of Helsinki, 1983. 

S. M. Shieber. An Introduction to Unification- 
Based Approaches to Grammar. CSLI Lecture 
Notes 4, Center for the Study of Language and 
Information, 1986. 

M. Bates. The Theory and Practice of Aug- 
mented rl)ansition Network Grammars. In L. 
Bole, editor, Natural Lang'aage Communication 
with Computers, pages 191--259, Springer Ver- 
lag, 1978. 

M. Tomita. Efficient t~arsing for Natural Lar~- 
guage. Kluwer Academic Publishers, 1986. 

M. Kay. Algorithm schemata and data strucures 
in syntactic processing. In St. Allfin: editor, 
Tezt Processing: Text Analysi'.~ and Geueration, 
Y~x~ Typology and A~tribution, pages 327-358, 
Almqvist and Wiksell International, Stockholm, 
Sweden, 1982. 

A.V. Aho and 3.D. Ulhnan. The Theory of 
Parsing, ~?'ansla~zo'n, and Compiling. Auto- 
matte Computation, Prentice-Hall Inc., Engle- 
wood Cliffs, N.Y., 1972. 

P. Shann. The selectiov of a parsing strategy 
for an on-line machine translation system in a 
sublanguage domain. A new practical compari- 
son. In Proc. of the International Workshop on 
Parsin 9 Technologies, pages 264-276, Carnegie 
Mellon University, 1989. 

M. Wir6n. A comparison of rule-invocation 
strategies in context-free chart parsing. In 
ACL Proceedings, Third E~ropean Conference, 
pages 226-233, Association for Computational 
Linguistics, 1987. 
