PSI/PHI architecture for prosodic parsing
Dafydd GIBBON and Gunter BRAUN 
Faculty of Linguistics and Literary Studies 
University of Bielefeld 
Postfach 8640 
D-4800 Bielefeld 1
Abstract 
In this paper an architecture and an implementation for a
linguistically based prosodic analyser are presented. The
implementation is designed to handle typical prosodic input in 
the form of parallel input channels, and processes each input 
channel independently in a data-directed, phonologically 
motivated configuration of partly parallel, partly cascaded 
feature modules and module clusters, each implemented as a
finite transducer, producing intonationally relevant categories
as output. The design criteria included maximal restriction of 
computational power (the system could be compiled into one 
massive finite transducer); relevance to computational linguistic 
formalisms with a view to developing an integrated model 
mapping prosodic structures on to textual structures; 
relatability to speech recognition algorithms, and to 
phonological theories. It was implemented in an object 
oriented environment with parallel processing simulation 
(CheOPS), and a linguistically interesting surface language 
(BATLAN). 
1. Aims and design criteria 
In this paper a new architecture for the parallel processing 
of feature systems, particularly in phonology, is presented and 
applied to data-directed prosodic parsing in English. It uses 
independent feature processing modules in configurations 
which allow Parallel, Sequential, Incremental (PSI) or
Parallel, Hierarchical, Incremental (PHI) processing of 
phonetic data by the modules, with linguistically relevant 
output (in this case, prosodic categories such as pitch accent). 
The domain of prosodic features (especially intonation, 
stress or accent, tone) has not yet received significant attention 
from computational linguists. It poses problems which are 
rather different from the phoneme or letter concatenation 
models typically used in computational models of written 
language. In particular, the problems concern the parallel 
processing of segmental and suprasegmental features in 
functionally and structurally partly autonomous tiers; an 
architecture for this purpose has to be able to cope with a 
variety of different synchronisation protocols for feature 
modules and module clusters. In addition, a prosodic parser 
has to translate from a detailed phonetic feature representation 
into a more abstract prosodic structure suitable for mapping on 
to lexical items in syntactic, semantic and pragmatic contexts. 
Several styles of computational architecture could be used 
for this purpose, from relatively ad hoc blackboard 
architectures to the kind of virtual, abstract parallelism for
feature processing in unification grammars. The selection
criteria used here included: 
1 Maximal restriction of computational power in terms of 
time and space bounds. 
2 Conceptual compatibility with computational linguistic 
formalisms used at other linguistic levels, with a view to 
developing an integrated model. 
3 As far as possible, relatability to speech recognition 
algorithms. 
4 Suitability as a simulator for phonological theories of the 
autosegmental type. 
It would appear at first sight to be a nearly impossible task 
to fulfil all these criteria at once. However, study of each of 
these areas revealed that a concept using finite automata (in 
particular, finite transducers) to simulate the feature modules, 
and partly parallel, partly cascaded configurations of these to 
simulate feature clusters and autosegmental tiers comes close 
to satisfying them, including the speech recognition 
requirement (cf. the hidden Markov models, Levinson 1986). 
A secondary aim was to develop a versatile workbench for 
such descriptions. A linguistically interesting and useful 
surface language for transition networks (BATLAN, 
Eikmeyer/Gibbon 1983) was selected for formulating the
feature modules, and provided with control mechanisms for 
parallel and (unidirectionally) cascaded modules, clusters and 
tiers with different modes of synchronisation and interaction. 
The implementation was developed in an object oriented 
programming environment with facilities for the simulation of 
parallel processing (CheOPS, Eikmeyer 1986). 
2. The PSI/PHI architecture and prosodic parsing
The PSI/PHI concept is suitable for application to other 
linguistic domains which can be modelled with parallel feature 
processing. A pure PSI system (using finite automata in
feature modules) is weakly equivalent to a single massive finite
automaton, though obviously with greater descriptive (not generative) expressiveness,
while a pure PHI system (using push-down automata in feature 
modules) could be used to process languages up to and 
including context sensitive indexed languages. The prosodic 
parsing application is a pure PSI system. The PHI facility is 
not used at present; neither is a set of augmentations designed
to provide ATN-like facilities if required. The feature module
automata are formulated as finite transition networks.
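The weak equivalence claim can be illustrated with a product construction. The following is a minimal sketch in Python, not the BATLAN/CheOPS implementation: the `Transducer` class, the two toy modules (`rise`, `voiced`) and their symbol alphabets are assumptions made purely for illustration. It runs two deterministic finite transducers in parallel on separate channels, then compiles them into one product transducer over paired symbols and compares the outputs.

```python
from itertools import product


class Transducer:
    """Deterministic finite transducer: (state, symbol) -> (next state, output)."""

    def __init__(self, start, delta):
        self.start = start
        self.delta = delta

    def run(self, symbols):
        # No final states are checked: the run simply stops when input ends.
        state, outputs = self.start, []
        for symbol in symbols:
            state, out = self.delta[(state, symbol)]
            outputs.append(out)
        return outputs


def parallel_product(a, b):
    """Compile two parallel transducers into one over paired states and symbols."""
    delta = {}
    for ((qa, xa), (ra, oa)), ((qb, xb), (rb, ob)) in product(
        a.delta.items(), b.delta.items()
    ):
        delta[((qa, qb), (xa, xb))] = ((ra, rb), (oa, ob))
    return Transducer((a.start, b.start), delta)


# Hypothetical toy modules; "." stands for null output.
rise = Transducer("low", {
    ("low", "l"): ("low", "."),
    ("low", "h"): ("high", "RISE"),
    ("high", "h"): ("high", "."),
    ("high", "l"): ("low", "."),
})
voiced = Transducer("off", {
    ("off", "0"): ("off", "."),
    ("off", "1"): ("on", "ON"),
    ("on", "1"): ("on", "."),
    ("on", "0"): ("off", "OFF"),
})

f0 = list("llhhl")           # invented pitch channel (low/high)
intensity = list("01110")    # invented intensity channel (off/on)

combined = parallel_product(rise, voiced)
separate = list(zip(rise.run(f0), voiced.run(intensity)))
together = combined.run(list(zip(f0, intensity)))
print(separate == together)
```

The product table has one entry per pair of module transitions, so it grows multiplicatively with the number of modules. This is one reason why the modular formulation is more convenient to write and maintain than the single compiled automaton, even though both accept the same input.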
In the prosodic parsing application, the PSI system has a
two-level cascade structure, each level consisting of parallel tiers of
features and feature clusters. The configuration used at present
in a "stress parser" for German is shown in Figure 1.
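A two-level cascade of this kind might be sketched as follows, assuming Python generators as a stand-in for the simulated parallel processing. All names (`module`, `psi_cascade`, `RISE_DELTA`, `VOICE_DELTA`, `accent_delta`) are hypothetical, and the second-level rule (hypothesise an accent when a pitch rise occurs in a voiced stretch) is an invented toy rule, not the rule set of the parser described here.

```python
def module(start, delta):
    """Wrap a transition table as an incremental step function with private state."""
    state = start

    def step(symbol):
        nonlocal state
        state, out = delta[(state, symbol)]
        return out

    return step


# Level 1: hypothetical feature detector tables; "." stands for null output.
RISE_DELTA = {
    ("low", "l"): ("low", "."),
    ("low", "h"): ("high", "RISE"),
    ("high", "h"): ("high", "."),
    ("high", "l"): ("low", "."),
}
VOICE_DELTA = {
    ("off", "0"): ("off", "."),
    ("off", "1"): ("on", "ON"),
    ("on", "1"): ("on", "."),
    ("on", "0"): ("off", "OFF"),
}


def accent_delta():
    """Level 2 (toy rule): emit ACCENT when a RISE falls inside a voiced stretch."""
    delta = {}
    for state in ("unvoiced", "voiced"):
        for r in (".", "RISE"):
            for v in (".", "ON", "OFF"):
                nxt = "voiced" if v == "ON" else "unvoiced" if v == "OFF" else state
                out = "ACCENT" if r == "RISE" and nxt == "voiced" else "."
                delta[(state, (r, v))] = (nxt, out)
    return delta


def psi_cascade(channels, level1, level2):
    """Yield one level-2 output per time frame, reading all channels in step."""
    for frame in zip(*channels):
        # Parallel tier: each level-1 module sees only its own channel.
        features = tuple(m(x) for m, x in zip(level1, frame))
        # Cascade: level 2 consumes the tuple of level-1 outputs.
        yield level2(features)


level1 = [module("low", RISE_DELTA), module("off", VOICE_DELTA)]
level2 = module("unvoiced", accent_delta())
print(list(psi_cascade(["llhhl", "01110"], level1, level2)))
```

Here the modules are synchronised frame by frame, which is only one of the synchronisation modes the architecture is meant to support. The cascade is unidirectional: level 2 never feeds information back to level 1.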
Input to the parser consists of parallel channels of digitised
signal parameters such as intensity or fundamental frequency
(other spectral parameters could also be used). The initial
feature detector (FD) level plays a specific functional role; it
has the task of simulating the classical five tasks of a feature
detector: parameter identification, time window specification,
smoothing function, segmentation algorithm, and classification
(value assignment) algorithm. Fairly simple feature models for
acoustic edge-detection (zero crossing, slope maxima) and
contour detection (peak etc.) are formulated as transition
network transducers. Since the input signal is a continuous
stream of indefinite length, the transducers are not assigned
special final states, but can be stopped anywhere.
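The feature detector tasks listed above can be sketched for one contour feature, a pitch peak. This is an illustrative Python sketch only: the function names, the window size, the slope threshold and the sample values are assumptions, not parameters taken from the implementation. Smoothing uses a moving average over a time window, classification assigns a ternary slope value, and the peak detector is a small transition table with no final states, so it can be stopped at any point in the stream.

```python
from collections import deque


def smooth(samples, window=3):
    """Moving average over the last `window` samples (time window + smoothing)."""
    buf = deque(maxlen=window)
    for x in samples:
        buf.append(x)
        yield sum(buf) / len(buf)


def slopes(values, threshold=1.0):
    """Classify each step as '+', '-' or '0' (segmentation + value assignment)."""
    prev = None
    for v in values:
        if prev is not None:
            d = v - prev
            yield "+" if d > threshold else "-" if d < -threshold else "0"
        prev = v


# Peak detector as a transition table: a rise followed by a fall emits PEAK.
PEAK_DELTA = {
    ("flat", "+"): ("rising", "."),
    ("flat", "0"): ("flat", "."),
    ("flat", "-"): ("flat", "."),
    ("rising", "+"): ("rising", "."),
    ("rising", "0"): ("rising", "."),
    ("rising", "-"): ("flat", "PEAK"),
}


def peaks(symbols):
    """Run the peak transducer incrementally; it may be stopped after any symbol."""
    state = "flat"
    for s in symbols:
        state, out = PEAK_DELTA[(state, s)]
        yield out


# Invented fundamental frequency samples (Hz), one per analysis frame.
f0 = [100, 100, 110, 120, 110, 100, 100]
print(list(peaks(slopes(smooth(f0)))))
```

Because each stage is a generator, a frame is passed through all three stages before the next frame is read, which matches the incremental, data-directed style of processing described in the text.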
Figure 1: "stress parser"
Figure 2: "feature module representation" (digitised signal parameters over time, feature tapes with labels such as PEAK, CV/VC and RISE, phonetic transcription, and the orthographic output for the utterance "dann suchst du dir einen roten stein")
In lieu of segmental feature detectors, a phonetic
transcription is assigned manually using a resynthesis of the
digital data for fine labelling purposes; the phonetic
transcription is effectively regarded as an abbreviation for the
relevant tiers of segmental feature modules and module
clusters.
The second level takes the primitively segmented output of 
the FD level and filters out the relevant prosodic categories 
and category sequences. 
The two remaining parallel levels of accent sequences (in
the case of the present parser) and segmental phonetic
transcription (without syllable or word boundaries) are both
fed into the textual mapping component. The main component
so far implemented is the lexicon, formulated as a classical
discrimination net transducer. The output from the lexicon is
at present an orthographic representation with underlining of
accented words.
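A lexicon of this kind might be sketched as follows. The sketch makes several assumptions that are not taken from the implementation: the discrimination net is modelled as a simple trie over phone symbols, word segmentation uses greedy longest match without backtracking, the phone symbols and the three lexical entries are invented simplifications, and underlining is rendered in plain text by surrounding underscores. The names `build_net` and `map_to_text` are hypothetical.

```python
def build_net(lexicon):
    """Build a discrimination net (here: a trie) from {phone tuple: orthography}."""
    root = {}
    for phones, word in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})
        node[None] = word  # the key None marks a node where a word may end
    return root


def map_to_text(net, phones, accents):
    """Map an unsegmented phone sequence to words, marking accented words.

    `accents` is a parallel sequence of booleans, one per phone; a word is
    marked if any of its phones carries an accent.
    """
    out, i = [], 0
    while i < len(phones):
        node, j, best = net, i, None
        while j < len(phones) and phones[j] in node:
            node = node[phones[j]]
            j += 1
            if None in node:
                best = (j, node[None])  # remember the longest match so far
        if best is None:
            raise ValueError(f"no lexical entry at position {i}")
        end, word = best
        out.append(f"_{word}_" if any(accents[i:end]) else word)
        i = end
    return " ".join(out)


# Invented toy lexicon with simplified phone symbols.
net = build_net({
    ("d", "a", "n"): "dann",
    ("r", "o", "t", "n"): "roten",
    ("S", "t", "ai", "n"): "stein",
})
phones = ["d", "a", "n", "r", "o", "t", "n", "S", "t", "ai", "n"]
accents = [False] * 9 + [True] + [False]  # accent hypothesis on "ai"
print(map_to_text(net, phones, accents))
```

Greedy longest match is the simplest possible strategy for input without word boundaries; a realistic lexicon would need to keep alternative segmentations open, for instance by returning all matches at each position.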
The feature module representations defined by these levels
are shown in Figure 2. At the top is a representation of two
digitised signal parameters. The first group of "feature tapes"
shows the output of the FD level, with upward, downward
and central spikes representing binary or ternary-valued
features, and dots representing null output. The second group
shows the output of the FC level, with a considerably more
sparse representation of data-driven abstraction hypotheses
about possible occurrences of prosodic categories. The other
two levels show outputs within the textual mapping
component.
Close attention has been paid to the empirical basis of this 
application. Results of experimental phonetic studies were used 
in formulating the FDs, and accent perception tests were 
conducted in order to verify the output of the PSI system 
against native listeners (Braun & Jin 1987). The tests yielded a
satisfyingly high agreement rate of approx. 85%. Within a homogeneous
dialect group the parser is speaker-independent; the data are 
"raw" instructions for constructing blocks worlds, and include 
hesitations and repairs. 
The application is being developed within a project group 
financed by the DFG (Deutsche Forschungsgemeinschaft) to 
include further prosodic categories and a suitable syntax with 
strategies for coping with special speech processes such as 
slips and repairs. The pragmatic-semantic level of 
theme-rheme and focus structures has already been defined for
restricted blocks worlds dialogues (Pignataro 1987) and will be 
incorporated into an automatic focus assignment system. 
Other, structurally different, or less expressive, or more
heterogeneous systems using finite devices (particularly in 
phonology and phonetics), are being studied with a view to 
extending areas of application of the PSI/PHI architecture (cf. 
't Hart & Collier 1975, Pierrehumbert 1980 in intonation;
Church 1980, 1983 in syntax and segmental phonology; Kay
& Kaplan 1981, Kay 1987 in phonology and morphology;
Bolc & Maksymienko 1981, Chomyszyn 1986 in a Polish
text-to-speech interface with rules by Steffen-Batogowa;
Koskenniemi 1983 in morphology; Gibbon 1981, 1987 in
intonation and tonology; Berwick & Pilato 1987 in syntax
acquisition). 
References 

Berwick, R.C. & S. Pilato, 1987. "Learning syntax by automata
induction." Machine Learning 2, 9-35.

Bolc, L. & M. Maksymienko, 1981. Komputerowy system 
przetwarzania tekstow fonematycznych. U Warsaw Press. 

Braun, G. & Jin, F., 1987. Akzentwahrnehmung und 
Akzenterkennung. "Prosodische Kohäsion" Project Report,
U Bielefeld.

Chomyszyn, J., 1986. "A phonemic transcription program for 
Polish." Int. J. Man-Machine Studies 25, 271-293.

Church, K.W., 1980. Memory limitations in natural language 
processing. Master's thesis, M.I.T. 

Church, K.W., 1983. Phrase Structure Parsing. A method for 
taking advantage of allophonic constraints. Ph.D. thesis, 
M.I.T. 

Eikmeyer, H.J., 1986. "CheOPS: an object-oriented system in 
PROLOG." User Manual. Bielefeld. 

Eikmeyer, H.J. & Gibbon, D., 1983. "BATNET: ein 
ATN-System in einer Nicht-LISP-Umgebung." Sprache 
und Datenverarbeitung 7, 26-35. 

Gibbon, D., 1981. "A new look at intonation syntax and 
semantics". In: A. James, P. Westney, eds., New 
Linguistic Impulses in Foreign Language Teaching. 
Tübingen: Narr.

Gibbon, D., 1987. "Finite state processing of tone systems." 
In: Proc. 3rd Conf. European Chapter of ACL, 
Copenhagen, 1-3 April 1987, 291-298.

't Hart, J. & Collier, R., 1975. "Integrating different levels of
intonation analysis." J. Phonetics 3, 235-255.

Kay, M., 1987. "Nonconcatenative Finite-State Morphology." 
Proc. 3rd Conf. European Chapter of ACL, Copenhagen,
1-3 April 1987, 2-10. 

Kay, M. & Kaplan, R., 1981. "Phonological rules and
finite-state transducers." Paper at Annual Meeting of ACL, 
28.2.1981, NYC. (Cited by Koskenniemi). 

Levinson, S.E., 1986. "Continuously variable duration hidden 
Markov models for automatic speech recognition." 
Computer Speech and Language 1, 29-45. 

Pierrehumbert, J., 1980. The Phonology and Phonetics of 
English Intonation. Ph.D. thesis, M.I.T. 

Pignataro, V., 1987. Ein Sprachgenerierungsmodell mit Topik 
und Fokus. "Prosodische Kohäsion" Project Report, U
Bielefeld. 
