MORPHOLOGICAL ANALYSIS FOR A GERMAN TEXT-TO-SPEECH SYSTEM 
Amanda Pounder, Markus Kommenda 
Institut für Nachrichtentechnik und Hochfrequenztechnik
Technische Universität Wien
Gusshausstrasse 25, A-1040 Wien, Austria 
ABSTRACT 
A central problem in speech synthesis with unre- 
stricted vocabulary is the automatic derivation of 
correct pronunciation from the graphemic form of a 
text. The software module GRAPHON was developed to 
perform this conversion for German and is currently 
being extended by a morphological analysis compo- 
nent. This analysis is based on a morph lexicon and 
a set of rules and structural descriptions for German
word-forms. It provides each text input item with an
individual characterization such that the phonological,
syntactic, and prosodic components may operate upon
it. This systematic approach thus serves to minimize
the number of wrong transcriptions and at the same
time lays the foundation for the generation of stress
and intonation patterns, yielding more intelligible,
natural-sounding, and generally acceptable synthetic 
speech. 
1. INTRODUCTION 
Many applications of computer speech require unrestricted
vocabulary. In particular, voice output units
of this kind permit the linkage of the common tele- 
phone network to a central computer, thus enabling 
access for a large public. "Karlchen", the Frankfurt 
talking railway timetable, and other automatic infor- 
mation services are based on this principle. 
If a written text serves as input to a speech synthesis
system with unrestricted vocabulary (text-to-speech
synthesis), the derivation of a correct and
natural-sounding pronunciation and intonation must
be provided for. The software module GRAPHON
(GRAPHeme-PHONeme conversion) has been developed
to convert any given German text into its phonetic
transcription (I.P.A.), enriched by some prosodic
markers.
The text-to-speech system is being implemented on 
an HP 9816 workstation system with a 68000 CPU and 
768 kbyte of RAM. At present an SSI 263 phone synthesizer
serves as acoustical output unit; a simplified
articulatory model used to control a refined digital 
vocal tract synthesizer is under development. The 
software is written in PASCAL and operation of the 
whole system is expected to be almost real-time. (For 
further implementational details cf. \[1\].) 
While text-to-speech systems for the English language
are fairly advanced, there is much room for
development for German-speaking systems. It is possible
only to a limited extent to profit from work in
the field of English. Obviously, German pronunciation
rules differ from those of other languages; however,
the mere replacement of a given grapheme-to-phoneme
conversion rule by another is inadequate to
meet the demands of the very different principles on
which two writing-systems are founded. This also
applies to the structural levels of morphology and
syntax.
2. MOTIVATION FOR A MORPHOLOGICAL COMPONENT
The application of an English pronunciation rule is
lexically determined, that is to say, is restricted to a
generally arbitrary subset of the lexicon (compare,
for example, the values of <ea> in the sets {bread,
head, thread, ...} and {knead, bead, heat, ...}). It is for
this reason that many English-based systems include
very extensive dictionaries, for example the pioneering
work of Allen \[2\] with a 12000-morpheme lexicon.
On the other hand, German rules have in general a 
much wider scope of application, which has led re- 
searchers working in the field of German to consider 
large lexical inventories unnecessary. The inventories 
in e.g. SAMT \[3\] or SPRAUS-VS \[4\] are thus re- 
stricted to function words needed for the syntactic 
analysis (prepositions, pronouns, articles, etc.). 
Similarly, our earliest efforts in this area were based 
on a small lexicon and an extensive rule catalogue;
however, numerous incorrect transcriptions at morphological
boundaries and the frequent recourse to
ad hoc rules (cf. \[1\]) made the lack of some sort of
morphological indicator apparent.
However more closely German spelling may reflect 
pronunciation than is the case in English, difficulties
arise in producing a correct pronunciation auto- 
matically if knowledge available to the human 
speaker, such as the internal structure of a given 
word or its native as opposed to foreign origin, is 
not made use of. The following examples should 
suffice to demonstrate the relationship between 
morphology and the values of the written symbols: 
- One fundamental rule is that vocalic quantity is 
determined by the number of following consonants: 
the first rule given in the DUDEN Aussprachewörterbuch
\[5\] states that <a> is to be pronounced
/a:/ when followed by only one consonant grapheme
before the stem boundary, so that the inflectional
form rast of the verb rasen ("rush") becomes
/ra:st/, whereas the simplex noun Rast ("rest")
becomes /rast/.
- Consonant or vowel groups may be assigned 
digraph or trigraph value only when they appear 
within morphological boundaries; compare for 
example the different values of <sch> in löschen
/ʃ/ ("extinguish") and Höschen /sç/ (dim. of
"pants"), or of <ei> in Geier /ai/ ("vulture") and
geirrt /ə ɪ/ ("erred").
- The first stem syllable in German (native stock) 
receives the primary word stress, a rule which
implies this stem's being identifiable; compare
geben /'ge:bn/ ("give") and Gebein /gə'bain/
("bones").
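The first of these rules can be illustrated with a short sketch. The Python fragment below is not GRAPHON's rule set; the function name `a_quantity`, the use of "+" to mark morph boundaries in the input, and the consonant inventory are assumptions made for this illustration. It counts consonant letters (not true graphemes, so digraphs such as <ch> are not handled) between <a> and the end of its morph.

```python
# Consonant letters considered by this toy rule (an assumption).
CONSONANTS = set("bcdfghjklmnpqrstvwxzß")

def a_quantity(segmented: str) -> str:
    """Return 'long' or 'short' for the first <a> in a '+'-segmented word.

    Simplified reading of the rule quoted above: <a> is long when at most
    one consonant letter follows it before the next morph boundary.
    """
    # Pick the first morph that contains <a>.
    morph = next(m for m in segmented.lower().split("+") if "a" in m)
    tail = morph[morph.index("a") + 1:]
    n_consonants = sum(ch in CONSONANTS for ch in tail)
    return "long" if n_consonants <= 1 else "short"

print(a_quantity("ras+t"))  # inflected form of rasen: boundary after <s>
print(a_quantity("rast"))   # simplex noun Rast: no internal boundary
```

Without the boundary in ras+t, both spellings would be indistinguishable to the rule, which is the point made in the text.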
These phenomena play a role in the domain of derivation
and inflection, which has been dealt with in
several systems, e.g. SYNTEX \[6\] or REDE \[7\]; these
do contain lists of common prefixes and suffixes to
permit affix-stripping, although they are predominantly
rule-based. The same problems are found
in the field of composition; their import is heightened
by the very great frequency of this process in the
German language. Still, Rühl \[6\] proposes a decomposition
algorithm which relies on distributional
criteria and on lists of consonant clusters in initial
and final position (based on Kästner \[8\]). Other
authors too prefer to minimize the lexical component: 
"The attempt to incorporate this problem into a 
mainly rule-based system seemed to us to require a 
rather great and thus undesired step towards a kind 
of dictionary approach" (\[9\], p. 226).
It is however certainly possible to make a case for a 
morphological analysis containing a morph-lexicon of 
some depth. The conversion program presented here 
makes extensive use of such an analysis component 
(see fig. 1) and thus in our opinion profits from the 
following advantages: 
- inflection, derivation, and composition can be
treated simultaneously, more economically, and with
a reduced number of incorrect segmentations; the
latter is achieved by specifying the respective
environments of potential elements;
- simple and efficient treatment of exceptions, for
instance the pronunciation of foreign words; this
and the preceding result in a reduced transcription
error rate and in simplified and more transparent
grapheme-to-phoneme conversion rules;
- correct placement of word-internal boundaries,
labelling of the constituents and the lexically
stored information concerning native vs. foreign
status favour accurate word stress assignment;
- the lexicon-based approach prepares the ground
for word classification and extraction of certain
syntactic constraints, providing the input for an
elementary sentence parser.
[Figure 1: block diagram. Recoverable labels: Text; Morphological
Analysis; Morpheme Boundaries; Pronunciation; Phonetic
Transcription; Word Stress; Stress Pattern; Rhythmic Pauses;
Parts of Speech; Syntactic Analysis; Phrase Structure;
Intonation Pattern.]
Fig. 1: The role of the morphological
component within GRAPHON
3. SKETCH OF THE MORPHOLOGICAL COMPONENT
3.1. Lexical Inventory
Morphological analysis in our system relies on a 
single lexicon rather than on separate lists of, say, 
prefixes, stems, junctures etc. The entries in this 
lexicon are morphs and not morphemes in that stem 
variation, i.e. processes such as umlaut (e.g. Apfel -
Äpfel "apple"), ablaut (lauf - lief "run") and
e-deletion (trocken - trockn- "dry") are not covered
by rule but by storage of allomorphs. As we are not
concerned with generation, this appears to be the
most practical method. Forms that are in some way
irregular are then naturally provided with individual
entries, for example anomalous verb forms (sein - bin
- war - wär - ... "be") or forms of the definite
article (der, die, das, dem, ...). We have chosen to set
up the most basic forms wherever possible, e.g. NAM-
as opposed to NAME (nominative singular), which
permits an economical treatment of derivation and
inflection. As a matter of fact, the overriding
principle governing the decision what exactly should
constitute an entry is a pragmatic one: for example,
rather than taking sides on linguistic, historical, or
psychological grounds in such controversial cases as
antwort- vs. ant + wort- ("answer"), himbeer- vs.
him + beer- ("raspberry"), or verlier- vs. ver + lier-
("lose"), we choose the solution favouring the ideal
functioning of the system as a whole.
3.1.1. Structure of a Dictionary Entry 
A dictionary entry consists of the lemma, i.e. 
graphemic representation of the morph, on the one 
hand and an information-tree, serving to characterize 
its phonological, morphological and syntactic value on 
the other. 
A number of practical conventions have been set up
for the form of the lemma: a given morph is represented
by a maximum of ten lower-case letters; the
diacritic sign ¨ (umlaut) is made use of (cf. other
systems which decompose the vowels in question as
<ae>, <oe>, <ue>); likewise, the sign <ß> is not replaced
by <ss> either in the input text or in the lexicon. An
orthographic rule of German states that <ss> becomes
<ß> before a consonant or a word-boundary, so that
the latter sign's usual function as an indicator of
vowel length is neutralized in these positions
(compare Flüsse "rivers" vs. Füße "feet" with Fluß
(/ʊ/) vs. Fuß (/u:/)); this "defect" (cf. \[10\], p. 108)
can be got round by maintaining the opposition between
<ss> and <ß> in the lemma.
The information-tree contains classificatory data 
pertaining to the morph itself and to those it may 
immediately select; they concern morphological status 
(lexical stem - particle - derivational morph -
inflectional morph - juncture - ...), native or foreign
status, and combinatorial restrictions. In addition, the 
lexicon allows the introduction of information for the 
assignment to parts of speech and, wherever neces- 
sary, indications as to exceptional pronunciation or 
stress pattern. 
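The shape of such an entry can be pictured as a small record. The following Python sketch is only an illustration under stated assumptions: the class name `MorphEntry`, its field names, and the encoding of the selection restrictions as a set of status letters are hypothetical and do not reproduce the compact coding actually used in GRAPHON.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class MorphEntry:
    """One lexicon entry: a lemma plus its classificatory data (hypothetical layout)."""
    lemma: str                            # graphemic form of the morph
    status: str                           # 'S' stem, 'P' particle, 'D', 'I', 'J'
    native: Optional[bool] = None         # None means "unspecified"
    lexical: Optional[bool] = None        # capacity to receive inflection
    word_class: Optional[str] = None      # 'N', 'A', 'V' or None
    selects: FrozenSet[str] = frozenset() # statuses allowed in the neighbour (assumed encoding)
    pronunciation: Optional[str] = None   # only for exceptional pronunciations

    def __post_init__(self) -> None:
        # Lemma convention from the text: at most ten lower-case letters.
        if len(self.lemma) > 10 or self.lemma != self.lemma.lower():
            raise ValueError(f"invalid lemma: {self.lemma!r}")

teil = MorphEntry("teil", "S", native=True, lexical=True)
fuss = MorphEntry("fuß", "S", native=True, lexical=True, word_class="N")
print(teil.status, fuss.lemma)
```

Leaving most fields optional mirrors the remark that specification of these properties is not obligatory.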
3.1.2. Extent of the Lexical Inventory 
At present the lexical inventory comprises some 2000 
entries, the choice of which was based on 
Ortmann \[11\], itself compiled from four frequency 
lists. As for the contents of the entries, we relied on 
Augst \[12\], Mater \[13\], and Wahrig \[14\]. For the 
ongoing testing, revision, and supplementing of this
primary list we depend on the frequency list in 
Meier \[15\] as well as on sample texts from various 
random sources. Inasmuch as affixes, particles, and 
junctures (at least native ones) constitute closed 
classes, they should be represented exhaustively in 
the inventory. This is unfortunately not the case as
soon as one turns to foreign elements, to whose
number new candidates are constantly being added.
Moreover, it is very difficult if not impossible to 
establish general principles according to which 
foreign suffixes in particular may be isolated and the 
dividing line found between stem and suffix. 
Proper nouns are represented only to a very limited 
extent; their range should be adapted to the require- 
ments of the task at hand. In fact, the compilation of 
the inventory has been carried out with the aims of 
expandability and maximum flexibility. 
It is of course not to be expected that the lexicon
would ever cover the entire vocabulary of a native
speaker, nor is that our intention; consequently, we
foresee a "joker morph" which can stand for any
stem that may happen to occur. This is made possible
by the generalization that a German stem conforms to 
a number of structural principles: for example, every 
stem must contain a vowel and the variety of con- 
sonant clusters in initial, medial, and final position is 
restricted (cf. \[8\]). An even more general canonical
description can be exploited in the case of foreign
elements. Such a device has not yet been 
implemented. 
For the time being, 64 kbyte have been reserved to
accommodate the lexical inventory. Note that all lexical
data as described above are coded so as to achieve
maximum storage efficiency.
3.2. Word Parser 
The segmentation of a given (complex) word is
carried out automatically in a series of steps; the
process is bound from the very first of these to the
dictionary, as stated above. Just as the human
speaker seeks familiar units in his identification of a
word, the automatic analysis considers for further
attention only those segments which correspond to
forms available in the lexicon, such that the segments
are contiguous and no letters are left unaccounted
for. Thus a segmentation such as mein + un + g for
Meinung ("opinion") could not be produced in the
first place, as +g+ has no representation in the
lexicon. The number of potential analyses is further
reduced by the fact that no boundaries are searched
for in a word corresponding identically to a single
unit in the lexicon, for example der would not be
analyzed as d + er or d + e + r. For reasons of
run-time efficiency, a strategy is used which
"prefers" the longest segments, starting from the
beginning of a given word; thus deck + en ("cover")
would be the first segmentation proposed before
d + eck + en. The usefulness of this principle can be
seen from an example like Eintritt ("entrance"),
where the order of segmentations would be: ein +
tritt, ein + t + ritt, ei + n + tritt, ei + n + t + ritt,
e + in + tritt, e + in + t + ritt, e + i + n + tritt,
e + i + n + t + ritt. The first decision proposed by
the parser can be shown to be the correct one in
the overwhelming majority of cases, which allows us
to delay requiring a second proposal until the first
has been rejected on structural grounds in the following
step of the analysis procedure.
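The longest-segment-first strategy can be sketched as a lazy generator, so that a second proposal is only computed when the first has been rejected. This is a minimal Python illustration, not the PASCAL implementation; the names `segmentations`, `_split` and `TOY_LEXICON`, and the contents of the toy lexicon, are assumptions chosen so that the Eintritt example can be reproduced.

```python
from typing import Iterator, List, Set

# Hypothetical toy lexicon, just large enough for the examples in the text.
TOY_LEXICON: Set[str] = {"ein", "ei", "e", "in", "i", "n", "t", "tritt", "ritt", "der"}

def _split(rest: str, lexicon: Set[str]) -> Iterator[List[str]]:
    """Yield exhaustive segmentations of `rest`, longest leading morph first."""
    if not rest:
        yield []
        return
    for end in range(len(rest), 0, -1):       # try the longest prefix first
        head = rest[:end]
        if head in lexicon:
            for tail in _split(rest[end:], lexicon):
                yield [head] + tail

def segmentations(word: str, lexicon: Set[str]) -> Iterator[List[str]]:
    """Propose segmentations lazily, in the order described in the text."""
    word = word.lower()
    if word in lexicon:                       # identical to one entry: no split
        yield [word]
        return
    yield from _split(word, lexicon)

for seg in segmentations("Eintritt", TOY_LEXICON):
    print(" + ".join(seg))
```

Because the generator is lazy, calling `next()` on it once yields only the preferred proposal; further proposals cost nothing unless they are requested.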
In this second step the proposed segmentations are 
examined as to their conformity to the principles of 
German morphological structure. The following struc- 
tural formula describes every German word, whether 
of native or foreign origin:

    $[\,P + S + D + J\,] \;\#\; P + S + D + I$

whereby:
x_a^b ... there may be between a and b segments of this
          type in a given structure; such a range is attached
          to P, D, I and to the bracketed group
+, #  ... represent morphological boundaries of different
          strengths (differentiation relevant for
          the context of certain phonological rules)
P ... particle (in general equivalent to inseparable
      prefixes, e.g. +ent+, +prä+)
S ... stem
D ... derivational morph, always a suffix
      (e.g. +ig+, +ung+)
I ... inflectional morph, always a suffix
      (e.g. +em+, +e+)
J ... juncture morph
      (e.g. +es+ in Bundesbahn "national railway")
The segmentation is assigned a structural description 
by matching the combinatorial features of each unit 
with the morph status information of its neighbour as 
given in the respective lexicon entries. A morph may 
be specified according to the following properties 
and in turn select certain values for these properties 
in its neighbour: 
- native or foreign status, 
- lexical functionality (this property is manifested by
the capacity to receive inflection),
- morphological status (as in the above structure
definition with additional detailed classification), 
and 
- lexical class, i.e. part of speech as reflected in the 
inflectional ending. 
Specification of these properties is optional; however,
the more information provided, the more restrictions
with respect to the general structure formula are
achieved, so that the number of potential labellings
is reduced and the labellings themselves bear more
information. Thus, it is possible to provide at least a
partial treatment for words whose stems are not represented
in the 2000-entry lexicon.
Should no match be obtained in this step, the
process is repeated with a new segmentation until
compatible sets of features are found.
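The structural check of this second step can be approximated by matching the sequence of morph status letters against a pattern. The Python sketch below rests on several assumptions that are not taken from the paper: repetition counts are unbounded, the juncture is treated as optional, each morph has exactly one status, and the names `WORD_PATTERN`, `first_valid` and `STATUS` are hypothetical. The feature matching between neighbouring morphs is not modelled.

```python
import re
from typing import Dict, List, Optional, Sequence

# Status letters: P particle, S stem, D derivational, I inflectional, J juncture.
# Assumed reading of the structural formula: (P* S D* J?)* P* S D* I*
WORD_PATTERN = re.compile(r"(?:P*SD*J?)*P*SD*I*")

def first_valid(candidates: Sequence[List[str]],
                status: Dict[str, str]) -> Optional[List[str]]:
    """Return the first proposed segmentation whose status string fits the pattern."""
    for seg in candidates:
        labels = "".join(status[m] for m in seg)
        if WORD_PATTERN.fullmatch(labels):
            return seg
    return None                               # no structurally valid proposal

# Toy status table (illustrative values only).
STATUS = {"ant": "D", "eil": "S", "an": "S", "teil": "S",
          "er": "P", "werb": "S", "s": "J", "tät": "S", "ig": "D", "en": "I"}

print(first_valid([["ant", "eil"], ["an", "teil"]], STATUS))
print(first_valid([["er", "werb", "s", "tät", "ig", "en"]], STATUS))
```

In this toy setting the proposal ant + eil is rejected because a suffix would precede the stem, and the next proposal is accepted.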
+ ant + eil +
+ an + teil +
    flSX  fLSX  fXIN
    flSX  fLSX  fXIV

+ er + werb + st + ät + ig + en +
+ er + werb + st + ät + ig + e + n +
+ er + werb + st + ät + i + ge + n +
+ er + werb + s + tät + ig + en +
    fXPX  fLSX  fXJX  fLSX  fXDX  fXIA
    fXPX  fLSX  fXJX  fLSX  fXDX  fXIV
    fXPX  fLSX  fXJX  fLSX  fXDX  fXZX  fXIA

Fig. 2: Sample segmentations and
structural specifications
f/F ... native/foreign
l/L ... non-lexical/lexical
X ... unspecified
P ... particle
S ... stem
J ... juncture
D ... derivational morph
I ... inflectional morph
Z ... participle morph
N ... noun
A ... adjective
V ... verb
Fig. 2 presents examples of the resulting segmentations
and labellings. We see that the first
segmentation of Anteil ("portion") is rejected, as in
this case the stem would be preceded by a suffix
(+ant+ being a longer segment than +an+, it has
received "priority" up to this point). In the second
segmentation, +an+ is correctly recognized as a
non-lexical stem, upon which a lexical stem may
follow. It is not possible to specify the lexical class
selected by +an+, as it combines with all parts of
speech; and as +teil+ can function as a noun or a
verb stem, there result two potential labellings. The
ambiguity cannot be resolved at this stage.
The following example is somewhat more complicated.
Crucial here is the boundary between the two stems
of the compound Erwerbstätigen ("employees"): the
phonological consequences of an error (/ʃt/ instead
of /st/) are quite serious. After the correct segmentation
has finally been found, three possible
interpretations are proposed. Note that +en+ can
serve as a participle morph (Z), so that the word
would syntactically function as an adjective.
The third step consists of additional checks and 
finer specifications in order to isolate the correct 
structure and part-of-speech assignment for the 
whole word. For instance, if a suffix has been 
identified as a possible past participle morph, this 
could be verified by searching for a corresponding 
prefix (cf. teil + t "shares" vs. ge + teil + t
"shared"). Another check could exploit certain restrictions
on the sequence of lexical and non-lexical
stems in a complex word. Such tests have not as yet
been implemented.
The lexical class of a German word is, generally 
speaking, determined by its last element, so that the 
classification algorithm makes use of the results of 
the matching process at the end of the word. Some 
derivational morphs, e.g. +ung+, +keit+, +isch+, permit
unambiguous classification. Unfortunately the same 
cannot be said of inflectional endings in particular 
and many other elements as well, taken alone. By ex- 
ploiting the combinatorial information, however, many 
ambiguities are eliminated; moreover, capitalization 
can be treated as a signal for the lexical class noun. 
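A minimal sketch of this classification idea follows. It is written in Python for illustration only; the table `FINAL_CLASSES` with its class sets, the function name `lexical_classes`, and the `sentence_initial` guard (added because sentence-initial words are capitalized regardless of class) are assumptions and are not taken from GRAPHON.

```python
from typing import Dict, List, Set

# Classes licensed by the final morph (hypothetical, illustrative values).
FINAL_CLASSES: Dict[str, Set[str]] = {
    "ung": {"N"}, "keit": {"N"}, "isch": {"A"},   # unambiguous derivational morphs
    "en": {"N", "A", "V"}, "e": {"N", "A", "V"},  # ambiguous inflectional endings
}

def lexical_classes(token: str, morphs: List[str],
                    sentence_initial: bool = False) -> Set[str]:
    """Guess the possible lexical classes of `token` from its last morph."""
    # Unknown final morphs leave all three classes open.
    classes = set(FINAL_CLASSES.get(morphs[-1], {"N", "A", "V"}))
    # Capitalization inside a sentence is taken as a signal for 'noun'.
    if token[:1].isupper() and not sentence_initial and "N" in classes:
        classes = {"N"}
    return classes

print(lexical_classes("Betonung", ["be", "ton", "ung"]))
print(lexical_classes("richtige", ["richt", "ig", "e"]))
```

The combinatorial information of the preceding morphs, which the text uses to narrow the ambiguous cases further, is left out of this sketch.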
Each text unit is now provided with a structural 
specification such that the phonological, syntactic
and prosodic components may operate on it. Fig. 3
shows segmentations and lexical class assignment for 
a sample sentence; based on these, the phonological 
component already in place determines the correct 
pronunciation and generates the I.P.A. transcription, 
also given in fig. 3. 
(R)           +die+               [di:]
(A, V)        +richt+ig+e+        [rɪçtɪgə]
(N)           +zer+leg+ung+       [tsɛɐle:gʊŋ]
(R)           +von+               [fɔn]
(N, A)        +wört+er+n+         [vœrtɐn]
(V)           +ist+               [ɪst]
(A, V)        +wicht+ig+          [vɪçtɪç]
(R)           +für+               [fy:ɐ]
(R)           +die+               [di:]
(N)           +be+stimm+ung+      [bəʃtɪmʊŋ]
(A, N, V, A)  +ihr+er+            [i:rɐ]
(N, V)        +aus+sprach+e+      [ausʃpra:xə]
(R)           +und+               [ʊnt]
(N)           +be+ton+ung+        [bəto:nʊŋ]
(R)           +und+               [ʊnt]
(R)           +für+               [fy:ɐ]
(R)           +die+               [di:]
(N)           +er+zeug+ung+       [ɛɐtsɔigʊŋ]
(R)           +der+               [de:ɐ]
(N)           +satz+melod+ie+     [zatsmelodi:]
Fig. 3: Sample segmentations, lexical class
assignment and resulting I.P.A. transcription
N...noun; A...adjective; V...verb; R...other
4. CONCLUSION 
Although extensive tests on large corpora have not 
as yet been carried out, experiments with our current
system permit evaluation of the following aspects of
the morphological analysis component in GRAPHON: 
- The development of the phonological component has 
shown that the setting up of a catalogue of 
pronunciation rules became simpler and more 
systematic, and at the same time, the rate of 
transcription errors could be greatly reduced. 
- A relatively limited number of lexical entries is 
capable of handling a considerable quantity of 
running text. The morphological information stored 
in each entry has proved to be relevant and in 
general sufficient for correct segmentation. 
However, in order to increase accuracy in deter- 
mining lexical class, as required by the syntactic 
analysis, it would be advantageous to expand the 
number of categories represented in the lexicon 
entries. As it was not clear before the present 
tests exactly which additional classification would 
be useful, we chose to start from a minimum and 
provide for easy future expansion. For example, the 
experiments confirm our assumption that it would be
desirable to specify the potential junctures for a 
given stem and to differentiate several inflectional 
paradigms within a lexical class, in particular 
strong and weak verbs. These data would have 
resolved the ambiguities encountered for the 
sample words in Fig. 2. 
- As the aims of our system do not include any 
attempt to incorporate semantics and as moreover 
there is no feedback from the syntactic component 
planned, a unique structural specification cannot 
be expected in the case of ambiguities requiring 
reference to these structural levels. Since such 
ambiguities do not necessarily lead to incorrect 
grammatical specification and only rarely to 
incorrect pronunciation, this is only a relative 
limitation. 
Correctness of the phonemic transcription certainly 
accounts for a great part of the quality and accepta- 
bility of a text-to-speech system. Nevertheless it is 
often claimed (e.g. \[6\]) that synthetic speech should
be evaluated along further dimensions, such as intelligibility,
listening comprehension and naturalness.
One goal of the approach presented here is to lay
the ground for the incorporation of rules for the 
assignment and realization of stress and intonation 
patterns not only on the word but also on the sen- 
tence level. Thus the basic phonetic transcription will 
be extended and modified so as to give a represen- 
tation closer to natural speech. 
REFERENCES 

\[1\] Kommenda, M.: "GRAPHON - ein System zur
Sprachsynthese bei Texteingabe".
In: H. Trost and J. Retti (Eds.),
Österreichische Artificial Intelligence-Tagung.
Springer, Berlin, 1985.

\[2\] Allen, J.: "Synthesis of Speech from
Unrestricted Text".
Proc. IEEE, vol. 64 (1976), pp. 433-442.

\[3\] Wolf, H.E.: "Sprachvollsynthese mit automatischer
Transkription".
Der Fernmelde-Ingenieur, vol. 38 / no. 10 (1984),
pp. 1-42.

\[4\] Mangold, H.; Stall, D.S.: "Principles of
Text-Controlled Speech Synthesis with
Special Application to German".
In: L. Bolc (Ed.),
Speech Communication with Computers.
Carl Hanser, München, 1978, pp. 139-181.

\[5\] DUDEN Aussprachewörterbuch.
Bibl. Inst., Mannheim, 1974.

\[6\] Rühl, H.-W.: Sprachsynthese nach Regeln für
unbeschränkten deutschen Text.
Dissertation Ruhr-Universität Bochum,
Germany, 1984.

\[7\] Müller, B.S.: "Regelgesteuerte Umsetzung von
deutschen Texten in gesprochene Sprache für
das Sprachausgabegerät VOTRAX".
In: B.S. Müller (Ed.), Germanistische Linguistik,
vol. 79-80 (1985), pp. 83-112.

\[8\] Kästner, W.: Automatische Phonemisierung
orthographischer Texte im Deutschen.
Helmut Buske, Hamburg, 1972.

\[9\] Menzel, W.: "A Grapheme-to-Phoneme
Transformation for German".
Comp. & AI, vol. 3, 1984, pp. 223-234.

\[10\] Philipp, M.: Phonologie des Deutschen.
Kohlhammer, Stuttgart, 1974.

\[11\] Ortmann, W.: Wortbildung und Morphemstruktur
eines deutschen Gebrauchswortschatzes.
Goethe-Institut, München, 1983.

\[12\] Augst, G.: Lexikon zur Wortbildung.
Forschungsberichte des IdS, vol. 24.1-4.
Gunter Narr, Tübingen, 1975.

\[13\] Mater, E.: Rückläufiges Wörterbuch der
deutschen Gegenwartssprache.
Bibliographisches Institut Leipzig, 1983.

\[14\] Wahrig, G.: Deutsches Wörterbuch.
Gütersloh, 1983.

\[15\] Meier, H.: Deutsche Sprachstatistik.
Georg Olms, Hildesheim, 1978.
