Incremental Sentence Production with a Parallel Marker-Passing Algorithm 
Hiroaki Kitano 
Center for Machine Translation 
Carnegie Mellon University 
Pittsburgh, PA 15213, U.S.A. 
hiroaki@cs.cmu.edu 
ABSTRACT

This paper describes a method of incremental natural language generation using a parallel marker-passing algorithm for modeling simultaneous interpretation. Semantic and syntactic knowledge are represented in a memory network in which several types of markers are passed around in order to make inferences and to explore the implicit parallelism of sentence production. The model is consistent with several psycholinguistic studies. The model is implemented as a part of the ΦDMDIALOG real-time speech-to-speech dialog translation system developed at the Center for Machine Translation at Carnegie Mellon University, and has been publicly demonstrated since March 1989.
1 Introduction 
Incremental sentence production has been gaining more attention in recent years. It is particularly important in application areas such as speech-to-speech translation, where real-time transaction is essential. The ΦDMDIALOG project is a research project to develop a speech-to-speech dialog translation system with simultaneous interpretation capability. At the outset of the project, we investigated actual simultaneous interpretation sessions and telephone dialogs. As a result, we found that one utterance in a real dialog can be quite long (15 seconds for one sentence is not rare in Japanese). This implies that if we adopt a sequential architecture, in which generation starts only after the entire parsing is completed, we inevitably create an unendurable delay in translation. Suppose one speaker makes an utterance of 15 seconds and the other responds with an utterance of 15 seconds in length; then the first speaker must wait at least 30 seconds to start hearing the translation of the utterance of his/her dialog partner. It is inconceivable that such a system could be practically deployed. Introduction of a simultaneous interpretation scheme, coupling incremental generation and incremental parsing technologies, is the only way to minimize this problem¹.
Incremental sentence production is interesting from the 
standpoint of psycholinguistics as well. There are many 
psycholinguistic studies which support incremental sen- 
tence production as a psycholinguistically plausible ap- 
proach. We will discuss the psycholinguistic relevancy 
of our model later. 
In this paper, we describe a model of incremental sentence production which is implemented as a part of the ΦDMDIALOG speech-to-speech dialog translation system [Kitano, 1989a], developed at the Center for Machine Translation at Carnegie Mellon University.

¹Although there is a problem of how to resolve ambiguities in parsing, discussion of such a topic is beyond the scope of this paper. For those who are interested in this topic, refer to [Kitano et al., 1989a][Kitano et al., 1989b].
2 Basic Organization of the Model 
We use a hybrid parallel paradigm [Kitano, 1989b], which is an integration of a parallel marker-passing scheme and a connectionist network, as a basic algorithm. Five types of markers (two types for parsing, two other types for generation, and another type for contextual priming) are passed around the memory network, which represents knowledge from the morphophonetic level to the discourse level. A connectionist network performs sub-symbolic computations with massive parallelism. Use of the hybrid parallel scheme on the memory network has its merit in exploring implicit parallelism in the process of natural language generation and parsing.
2.1 The Memory Network 
The memory network incorporates knowledge from morphophonetics to the plan hierarchies of each participant of a dialog. Each node is a type and represents either a concept (Concept Class node; CC) or a sequence of concepts (Concept Sequence Class node; CSC). Strictly speaking, both CC and CSC are a collection or family, since they are, for the most part, sets of classes. CCs represent such knowledge as concepts (e.g., *Conference, *Event, *Mtrans-Action) and plans (e.g., *Declare-Want-Attend). CSCs represent sequences of concepts and their relations, such as concept sequences² (e.g., <*Conference *Goal-Role *Attend *Want>) or plan sequences³ (e.g., <*Declare-Want-Attend *Listen-Instruction>) of the two participants of the dialog. CSCs have an internal structure composed of a concept sequence, constraint equations, presuppositions, and effects. This internal structure provides our scheme with the capability to handle unification-based processing as well as case-based processing, so that typical criticisms against DMAP-type NLP [Riesbeck and Martin, 1985], such as weak linguistic coverage and incapability of handling linguistically complex sentences, do not apply to our model⁴. Each type of node creates instances during parsing, which are called concept instances (CI) and concept sequence instances (CSI), respectively. CIs correspond to discourse entities. They are connected through labelled links such as IS-A or PART-OF, and through weighted links which form a connectionist network. CSIs record specific cases of utterances indexed into the memory network, whereas CSCs represent generalized cases and syntactic rules. Use of cases for generation is one of the unique features of our model, while most generators depend solely upon syntactic rules.

²Concept sequences are the representation of an integrated syntax/semantics level of knowledge in our model.

³This should not be confused with 'discourse segments' [Grosz and Sidner, 1985]. In our model, information represented in discourse segments is distributively incorporated in the memory network.

⁴Indeed, our model is substantially different from DMAP-type marker-passing or any other naive marker-passing models, because linguistic features are carried up by markers to conduct substantial linguistic analysis as well as case-based processing.
2.2 The Markers
A guided marker-passing scheme is employed for infer- 
ence in the memory network. Basically, our model uses 
four types of markers. These markers are (1) activation 
markers, (2) prediction markers, (3) generation markers, 
and (4) verbalization markers. 
Activation Markers (A-Markers) are created based on 
the input of the source language. These are passed up 
through IS-A links and carry instance, features and cost. 
This type of marker is used for parsing. 
Prediction Markers (P-Markers) are passed along the 
conceptual and phonemic sequences to make predictions 
about which nodes are to be activated next. Each P- 
Marker carries constraints, cost, and the information 
structure of the utterance which is built incrementally 
during parsing. 
Generation Markers (G-Markers) show activation of 
nodes in the target language, and each contains a surface 
string, features, cost and an instance which the surface 
string represents. G-Markers are passed up through IS-A 
links. 
Verbalization Markers (V-Markers) anticipate and 
keep track of verbalization of surface strings. Final sur- 
face realizations, cost and constraints are carried by V- 
Markers. 
Besides these markers, we assume Contextual Markers (C-Markers) [Tomabechi, 1987], which are used when a connectionist network is computationally too expensive. C-Markers are passed through weighted links to indicate contextually relevant nodes.
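As an illustration, the marker types above might be represented as follows. This is a minimal sketch in Python; all class and field names are our own illustrative choices (the actual system, as noted later, was implemented in CommonLisp):

```python
from dataclasses import dataclass, field

@dataclass
class AMarker:              # parsing: passed up IS-A links from the input
    instance: str           # the concept instance (CI) activated
    features: dict = field(default_factory=dict)
    cost: float = 0.0

@dataclass
class PMarker:              # parsing: prediction along concept/phoneme sequences
    constraints: dict = field(default_factory=dict)
    info_structure: dict = field(default_factory=dict)  # built incrementally
    cost: float = 0.0

@dataclass
class GMarker:              # generation: passed up IS-A links
    surface: str            # surface string in the target language
    instance: str           # CI which the surface string represents
    features: dict = field(default_factory=dict)
    cost: float = 0.0

@dataclass
class VMarker:              # generation: tracks verbalization over a CSC
    realized: str = ""      # final surface realization so far
    constraints: dict = field(default_factory=dict)
    cost: float = 0.0

@dataclass
class CMarker:              # contextual priming via weighted links
    source: str             # conceptual root node that placed this marker
```

The fields mirror the payloads listed in the text (surface string, features, cost, instance for G-Markers; constraints and realized string for V-Markers); anything beyond that list is an assumption.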
2.3 A Baseline Algorithm

Generally, natural language generation involves several stages: content delineation, text structuring, lexical selection, syntactic selection, coreference treatment, constituent ordering, and realization. In our model, the content is determined at the parsing stage, and most other processes are unified into one stage, because, in our model, lexical items, phrases, and sentences are treated by the same mechanism. The common thrust in our model is the hypothesis-activation-selection cycle, in which multiple hypotheses are activated and one of them is finally selected. Thus, the translation process of our model is composed of processes of (1) concept activation, (2) lexical and phrasal hypothesis activation, (3) propositional content activation, (4) syntactic and lexical selection, and (5) realization.
1. Concept Activation: A part of the parsing process 
as well as an initial process of generation. Individual 
concepts represented by CCs are activated as a result of 
parsing speech inputs. A-Markers are created and passed 
up by activating the concept. 
2. Lexical and Phrasal Hypotheses Activation: Hypotheses for lexical items and phrases which represent the activated concept are searched for, and G-Markers are created and passed up as a result of this process. Usually, multiple candidates are activated at a time.
3. Propositional Content Activation: A part of the 
parsing process by which propositional content of the ut- 
terance is determined. 
4. Syntactic and Lexical Selection: Selection of one 
hypothesis from multiple candidates of lexical entries or 
phrases. First, the syntactic and semantic constraints are 
checked to ensure the correctness of the hypotheses, and 
the final selection is made using a cost/activation-based 
selection. 
5. Realization: The surface string (which can be either a sequence of words or a sequence of phonological signs) is formed from the selected hypothesis and sent to the speech synthesis device.
The movement of V-Markers is important in understanding our algorithm. First, a V-Marker is located on the first element of the CSC. When a G-Marker hits the element with the V-Marker, the V-Marker is moved to the next element of the CSC (figure 1a), and unification is performed to ensure syntactic soundness of the sentence. In figure 1b, d1 is a closed-class lexical item⁵. When a G-Marker hits the first element, the V-Marker on the first element is moved to the third element by passing through the second element, which is a closed-class item. In this case, the element for the closed-class item need not have a G-Marker; the lexical realization for the element is retrieved when the V-Marker passes through it. In the case where a G-Marker hits an element without a V-Marker, the G-Marker is stored in the element. When another G-Marker hits the element with the V-Marker, the V-Marker is moved to the next element. Since the next element already has a G-Marker, the V-Marker is further moved to the subsequent element of the CSC (figure 1c). Although, in most cases, a bottom-up process driven by G-Markers handles the generation process, there are cases where a bottom-up process alone cannot identify the syntactic structure and lexical items to express a given meaning. In such cases, a top-down process is invoked which identifies the best syntactic structure and lexical items by searching downward from each element of the activated CSC. Each retrieval procedure is similar to the search for a closed-class lexical item.
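The V-Marker movement just described can be rendered as a short sketch. This is our own simplified Python rendering; the class names, the `closed_class` flag, and the omission of unification at each G-V collision are all simplifying assumptions:

```python
class Element:
    """One element of a CSC."""
    def __init__(self, name, closed_class=False, surface=None):
        self.name = name
        self.closed_class = closed_class
        self.surface = surface    # fixed realization for closed-class items
        self.g_marker = None      # G-Marker stored if the V-Marker is elsewhere

class CSC:
    """A concept sequence with a single V-Marker tracking verbalization."""
    def __init__(self, elements):
        self.elements = elements
        self.v_pos = 0            # V-Marker starts on the first element
        self.realized = []        # surface strings concatenated so far

    def receive_g_marker(self, index, surface):
        """A G-Marker carrying `surface` hits element `index`."""
        if index != self.v_pos:
            # No V-Marker here: store the G-Marker in the element and wait.
            self.elements[index].g_marker = surface
            return
        # G-V collision: realize this element and advance the V-Marker.
        # (The real system also unifies feature structures here.)
        self.realized.append(surface)
        self._advance()

    def _advance(self):
        """Move the V-Marker rightward, passing through closed-class items
        (retrieving their realization in passing) and through elements that
        already hold a stored G-Marker."""
        self.v_pos += 1
        while self.v_pos < len(self.elements):
            el = self.elements[self.v_pos]
            if el.closed_class:
                self.realized.append(el.surface)
            elif el.g_marker is not None:
                self.realized.append(el.g_marker)
            else:
                break
            self.v_pos += 1
```

For the figure 1b case, a CSC <a0 d1 a2> with closed-class d1 realizes d1 as soon as a G-Marker hits a0, and the V-Marker lands on a2; for figure 1c, a G-Marker stored on a later element is picked up when the V-Marker sweeps past it.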
There are cases in which an element of the CSC is linked to other CSCs, forming hierarchies of CSCs. Suppose each CSC represents a phrase structure rule; then the dynamically organized CSC hierarchy provides productive power, so that various types of structures of complex sentences can be generated. In the hierarchy of CSCs, G-Markers are passed up when a CSC is accepted, and carry feature structures which represent meaning fragments expressed by the CSC. V-Markers are passed down to lower CSCs when an element is predicted, and impose constraints on each element of the lower CSCs. The hierarchical organization of CSCs allows all types of tree expansions: upward, downward, and insertion.
Figure 2 shows an example of how an analysis tree can be constructed in our model. In this example, we assume Lexical-Functional Grammar (LFG) as a grammar formalism, and the order in which conceptual fragments are given is based on the order in which conceptual fragments can be identified when parsing a corresponding Japanese sentence incrementally. Notice that all three types of extensions are involved even in such a simple sentence.

⁵Closed-class lexical items refer to function words such as in, of, at in English and wo, ga, ni in Japanese. These words are non-referential and their number does not grow, whereas open-class lexical items are mostly referential and their number grows as the vocabulary expands.
Figure 1: Movement of the V-Marker in the CSC
Figure 2: An Incremental Tree Construction
3 Activation of Lexical and Phrasal Hypotheses and Propositional Contents

When a concept is recognized by the parsing process, hypotheses for its translation will be activated. The concept can be an individual concept, a phrase, or a sentence. In our model, these are all represented as CC nodes, and each instance of a concept is represented as a CI node. The basic process is, for each of the activated CCs, for LEX nodes⁶ in the target language to be activated. There are four possible mappings between source language nodes and the target language nodes which are activated: word-to-word, phrase-to-word, word-to-phrase, and phrase-to-phrase. In our model, hypotheses for sentences and phrases are represented as CSCs. From the viewpoint of generation, either LEX nodes representing words or CSC nodes representing phrases or entire sentences are activated.
LEX node activation: There are cases where a word or a phrase can be translated into a word in the target language. In figures 3a and 3c, the word LEX_SL or the phrase CSC_SL activates CC1. LEX1_TL is activated as a hypothesis of translation for LEX_SL or CSC_SL interpreted as CC1. A G-Marker is created at LEX1_TL containing a surface realization, cost, features, and the instance which LEX1_TL represents (CI). The G-Marker is passed up through an IS-A link. When CC1 does not have a LEX1_TL, CC2 is activated and a LEX2_TL will be activated. Thus, the most specific word in the target language will be activated as a hypothesis.
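The climb to the most specific target-language word can be sketched as follows. The dictionary encoding of IS-A links and LEX attachments is our own assumption for illustration, not the system's actual network representation:

```python
def activate_lex(cc, isa_parent, lex_of):
    """Climb IS-A links from the activated CC until a CC with an attached
    target-language LEX node is found, so that the most specific word
    available in the target language is hypothesized.

    isa_parent: maps a CC to its IS-A parent (absent/None at the root).
    lex_of:     maps a CC to its target-language LEX node, if any.
    """
    while cc is not None:
        if cc in lex_of:
            return lex_of[cc]      # create a G-Marker here (omitted)
        cc = isa_parent.get(cc)    # CC1 has no LEX: activate CC2, and so on
    return None                    # no word found: fall back to CSC hypotheses
```

The example CC and LEX names in the usage below are invented for illustration.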
CSC node activation: When a CC can be represented by a phrase or sentence, a CSC node is activated and a G-Marker which contains that phrase or sentence will be created. In figures 3b and 3d, LEX_SL and CSC_SL activate CC1, which has CSC1_TL. In this case, CSC1_TL will be activated as a hypothesis to translate LEX_SL or CSC_SL interpreted as CC1. In particular, activation of CSC_TL by CSC_SL is interesting, because it covers cases where two expressions can be translated only at a phrasal or sentential correspondence, not at the lexical level. Such cases are often found in greetings or canned phrases. It should be noted that CSCs represent either syntactic rules or cases of utterance. Assuming cases are acquired from legitimate utterances of native speakers, use of cases for the generation process should be preferred over purely syntactic formulation of sentences, because use of cases avoids generation of sentences which are syntactically sound but never uttered by native speakers.

⁶LEX nodes are a kind of CSC which represent a lexical entry and the phonological realization of the word.
4 Syntactic and Lexical Selections

Syntactic and lexical selections are conducted through three processes: feature aggregation, constraint satisfaction, and competitive activation. Feature aggregation and constraint satisfaction correspond to a symbolic approach to syntactic and lexical selection, which guarantees grammaticality and local semantic accuracy of the generated sentences, and the competitive activation process is added in order to select the best decision among multiple candidates.
Figure 3: Activation of Syntactic and Lexical Hypotheses

4.1 Feature Aggregation

Feature aggregation is an operation which combines features in the process of passing up G-Markers, so that minimal features are carried up. Due to the hierarchical organization of the memory network, the features which need to be carried by G-Markers differ depending upon which level of abstraction is used for generation⁷. Given the fact that unification is a computationally expensive operation, aggregation is an efficient mechanism for propagating features, because it ensures that only minimal features are present when features are unified, and aggregation itself is a cheap operation since it simply adds new features to the existing features in the G-Marker. One other advantage of this mechanism is that the case-based process and the constraint-based process are treated in one mechanism, because the features required for each level of processing are incrementally added to G-Markers.
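Aggregation as described, adding new features to the G-Marker's existing set rather than unifying full structures, can be sketched with flat dictionaries (our simplifying assumption; the system's feature structures are richer):

```python
def aggregate(g_features, new_features):
    """Cheap feature aggregation for a G-Marker moving up the network:
    add only features not already carried, leaving existing ones untouched.
    Unlike unification, this is a simple dictionary update, so only the
    minimal feature set is propagated upward."""
    out = dict(g_features)
    for key, value in new_features.items():
        out.setdefault(key, value)   # existing features take precedence
    return out
```

Existing features win on conflict here; how the real system treats a clash at aggregation time is not specified in the text, so that choice is an assumption.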
4.2 Constraint Satisfaction

Constraints are a central notion in modern syntactic theories. Each CSC has constraint equations which define the constraints imposed on that CSC, depending on its level of abstraction⁸. Feature structures and constraint equations interact at two stages. At the prediction stage, if a V-Marker placed on the first element of the CSC already contains a feature structure that is non-nil, the feature structure determines, according to the constraint equations, the possible feature structures of G-Markers which subsequent elements of the CSC can accept. At the G-V-collision stage, the feature structure in the G-Marker is tested to see whether it meets what was anticipated. If the feature structure passes this test, the information in the G-Marker and the V-Marker is combined, and more precise predictions are made as to what will be acceptable in subsequent elements. Thus, the grammaticality of the generated sentences is guaranteed. Semantic restrictions are also considered at this stage.
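The test-then-combine behavior at a G-V collision might be sketched as follows, using flat attribute-value dictionaries in place of real feature structures and constraint equations (a simplifying assumption):

```python
def compatible(v_features, g_features):
    """True if the G-Marker's features meet what the V-Marker anticipates:
    no shared attribute may carry conflicting values."""
    return all(g_features.get(k, v) == v for k, v in v_features.items())

def gv_collision(v_features, g_features):
    """At a G-V collision, test the arriving G-Marker against the
    anticipated constraints; on success, combine both structures so that
    predictions for subsequent elements become more precise."""
    if not compatible(v_features, g_features):
        return None                  # hypothesis rejected as ungrammatical
    merged = dict(v_features)
    merged.update(g_features)        # more specific combined prediction
    return merged
```

Rejecting on any value clash is the flat-dictionary analogue of a failed unification; recursive feature structures would need a full unifier.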
4.3 Competitive Activation

The competitive activation process, introduced either by C-Marker-passing or by the connectionist network, determines the final syntactic and lexical realization of the sentence. Here, we have adopted a cost-based scheme, as we have employed in parsing [Kitano et al., 1989a]. In the cost-based scheme, the hypothesis with the least cost will be selected. This idea reflects our view that parsing and generation are dynamic processes in which the state of the system tends toward a global minimum, and that cost represents a dispersion of energy, so that higher-cost hypotheses are less likely to be taken as the state of the system. In the actual implementation, we compute the cost of each hypothesis, which is determined by a C-Marker-passing scheme or a connectionist network.

⁷When knowledge of cases, similar to the phrasal lexicon, is used for generation, features are not necessary because this knowledge is already indexed to specific discourse entities.

⁸However, CSCs representing specific cases do not have constraint equations, since they are already instantiated and the CSCs are indexed in the memory network.
The C-Marker-passing scheme puts C-Markers at contextually relevant nodes when a conceptual root node is activated. A G-Marker which goes through a node without a C-Marker incurs a larger cost than others. When there are multiple hypotheses for a specific CC node, i.e., when multiple CSCs are linked with the CC, we add up the cost of each G-Marker used for each linearization, combined with any pragmatic constraints assigned to each CSC and the preference for each CSC, and the hypothesis with the least cost will be selected as the translated result.
The connectionist network can be adopted at some computational cost. When a connectionist network is fully deployed, every node in the network is connected by weighted links. A competitive excitation and inhibition process is performed to select one hypothesis. The final interpretation and translation in the target language are selected through a winner-take-all mechanism.
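Under the C-Marker scheme, cost-based selection amounts to penalizing hypotheses whose G-Markers traverse contextually unprimed nodes and keeping the minimum-cost survivor. The sketch below is illustrative; the particular penalty values and the path-of-nodes representation are our own assumptions:

```python
def score_hypothesis(path, c_marked, step_cost=1.0, unprimed_penalty=5.0):
    """Sum the cost a G-Marker accumulates along a path of nodes: passing
    through a node without a C-Marker adds a larger cost than passing
    through a contextually primed node."""
    return sum(step_cost + (0.0 if node in c_marked else unprimed_penalty)
               for node in path)

def select(hypotheses, c_marked):
    """Pick the least-cost hypothesis, mirroring the view of generation as
    a dynamic process settling toward a global minimum."""
    return min(hypotheses, key=lambda h: score_hypothesis(h["path"], c_marked))
```

A contextually primed hypothesis thus wins over an unprimed one even when both are grammatically acceptable.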
5 Commitment and Ambiguities

One of the most significant issues is how to resolve ambiguities of the parsing process as early as possible, so that the final translation hypothesis can be determined as early as possible. Since many sentences are ambiguous until, at least, the entire clause is analyzed, disambiguation necessarily imposes constraints upon the scheduling of the generation process. However, it should be noted that a human interpreter does not start translating unless she/he is sure about what the sentence means. This allows our model to take a wait-and-see strategy when multiple hypotheses are present during processing of input utterances. However, when some ambiguities still remain, the generator needs to commit to one of the hypotheses, which may turn out to be false. This is even more complicated when the source language and the target language have substantially different linguistic structures. For example, in English, negation comes before a verb, whereas in Japanese negation comes after the verb, and the verb comes at the very end of the sentence. In such a case, translation cannot start until the verb, which comes at the end of the sentence, has been processed and the existence of a negation after the verb has been checked. A decision has to be made, in this case, to delay translation until these ambiguities are resolved by encountering a clause which follows the initial clause. Fortunately, most Japanese utterances consist of multiple clauses, which makes simultaneous interpretation possible. In order to cope with these ambiguities, a simultaneous interpretation system should have capabilities such as (1) anticipating the possibility of negation at the end, (2) incorporating some heuristics which recover from a false translation to a correct one, and (3) making decisions on when to start or to delay translation. Theories of commitment in ambiguity resolution and generation are not yet established; thus they are a subject of further investigation. One possible solution which we are investigating is to use probabilistic speed control of marker propagation, as seen in [Wu, 1989], so that the best hypothesis is presented first. This would allow the generator to commit to the present hypothesis within its local decisions.
6 Psychological Plausibility

Psychological studies of sentence production [Garrett, 1975] [Garrett, 1980] [Levelt and Maassen, 1981] [Bock, 1982] [Bock, 1987] and [Kempen and Huijbers, 1983] were taken into account in designing the model. In [Kempen and Huijbers, 1983], two independent retrieval processes are assumed, one accounting for abstract prephonological items (L1-items) and the other for phonological items (L2-items). Lexicalization in their model proceeds as follows: (1) simultaneous multiple L1-item retrieval; (2) a monitoring process which watches the output of L1-lexicalization to check that it is keeping within constraints upon the utterance format; (3) retrieval of L2-items after waiting until the L1-item has been checked by the monitor and all other L1-items have become available. In our model, the CC activation stage corresponds to multiple L1-item retrieval, constraint checks by V-Markers correspond to the monitoring, and the realization stage, which concatenates the surface string in a V-Marker, corresponds to the L2-item retrieval stage. The difference between our model and theirs is that, in our model, L2-items are already incorporated in G-Markers, whereas they assume L2-items are accessed only after the monitoring. Phenomenologically, this does not make a significant difference, because L2-items (phonological realizations) in our model are not explicitly selected until constraints are met, at which point the monitoring is completed. However, this difference may be more explicit in the production of sentences because of the difference in the scheduling of the L2-item retrieval and the monitoring. This is due to the fact that our model retains interaction between the two levels, as investigated by [Bock, 1987]. Our model also explains the contradictory observations of [Bock, 1982] and [Levelt and Maassen, 1981], because activation of CC nodes (L1-items) and LEX nodes (L2-items) are separated with some interactions. Also, our model is consistent with the two-stage model of [Garrett, 1975] [Garrett, 1980]. The functional and positional levels of processing in his model correspond to the parallel activation of CCs and CSCs, the V-Marker movement which is left to right, and the surface string concatenation during that movement.
Studies of the planning unit in sentence production [Ford and Holmes, 1978] give additional support to the psychological plausibility of our model. They report that the deep clause, instead of the surface clause, is the unit of sentence planning. This is consistent with our model, which employs CSCs, accounting for deep propositional units and the realization of deep clauses, as the basic units of sentence planning. They also report that people plan the next clause while speaking the current clause. This is exactly what our model is performing, and it is consistent with our observations from transcripts of simultaneous interpretation.
7 Relevant Studies

Since most machine translation systems assume sequential parsing and generation, a simple extension of existing systems to combine speech recognition and synthesis would not suffice for interpreting telephony. The main problem is previously existing systems' inability to attain simultaneous interpretation (in which partial translation is performed while parsing is in progress), because in other systems a parser and a generator are independent modules, and the generation process is only invoked when the entire parse is completed and a full semantic representation is given to the generator. Our model serves as an example of approaches counter to the modular approach, and attains simultaneous interpretation capability by employing an incremental parsing and generation model. Pioneering studies of parallel incremental sentence production are seen in [Kempen and Hoenkamp, 1987] [Kempen, 1987]. They use a segment grammar, which is composed of Node-Arc-Node building blocks, to attain incremental formation of trees. Their studies parallel our model in many aspects. The segment grammar is a kind of semantic grammar, since the arc label of each segment makes each segment a syntax/semantics object. Feature aggregation and constraint satisfaction by G-Markers and V-Markers in our model correspond to distributed unification [De Smedt, 1989] in the segment grammar. [De Smedt, 1990] reports extensively on their approach to incremental sentence generation, which parallels our model in many aspects.
8 Current Implementation

The model of generation described in this paper has been implemented as a part of ΦDMDIALOG, a speech-to-speech dialog translation system developed at the Center for Machine Translation at Carnegie Mellon University. ΦDMDIALOG is implemented on an IBM RT-PC workstation using CMU CommonLisp running on the Mach OS. Speech input and voice synthesis are done by connected hardware systems; currently, we are using Matsushita Institute's Japanese speech recognition hardware and DECtalk. Figure 4 is an example of how sentences with multiple clauses are translated simultaneously in ΦDMDIALOG. Although the input is shown as a word sequence, a real run takes speech input, and a phoneme sequence is used to interface between the speech recognition device and the software. The current implementation translates between Japanese and English and operates on the conference registration domain, based on the corpus provided by the ATR Interpreting Telephony Research Laboratories. For more details of the generation scheme described in this paper, refer to [Kitano, 1990].

Currently, we are designing a version of our model to be implemented on massively parallel machines: IXM [Higuchi et al., 1989] and SNAP [Moldovan et al., 1989].
Figure 4: An Example of Simultaneous Interpretation

Input utterance: I want to attend the conference because I am interested in interpreting telephony

Incremental translation:
I → watashi ha (I, Role-Agent; this is ellipsed in the actual translation)
want to attend the conference → kaigi ni sanka shitai (want to attend the conference)
because → toiunoha (because)
I → watashi ha (I, Role-Agent; this is ellipsed in the actual translation)
am interested in interpreting telephony → tuuyaku denwa ni kyoumi ga arukara desu (interested in interpreting telephony)

9 Conclusion

We described a parallel incremental model of natural language generation designed for the speech-to-speech dialog translation system ΦDMDIALOG. We demonstrated that a parallel marker-passing scheme is one desirable way of exploring the inherent parallelism of sentence production. All types of tree expansion are attained, and the ability to incrementally generate complex sentences has been shown. It should be noted that, in our model, activations and selections of syntactic structures and lexical items are treated in a uniform mechanism. Psychological plausibility is another notable feature of our model, since most research in natural language generation has not taken psychological studies into account. We believe our parallel incremental generation model is a promising approach toward the development of interpreting telephony, where simultaneous interpretation is required.

References 

[Bock, 1987] Bock, J.K., "Exploring Levels of Processing in Sentence Production," In Kempen, G. (Ed.), Natural Language Generation, Nijhoff, 1987.

[Bock, 1982] Bock, J.K., "Toward a Cognitive Psychology of Syntax: Information Processing Contributions to Sentence Formulation," Psychological Review, 89, pp. 1-47, 1982.

[De Smedt, 1989] De Smedt, K., "Distributed Unification in Parallel Incremental Syntactic Tree Formation," In Proceedings of the Second European Workshop on Natural Language Generation, 1989.

[De Smedt, 1990] De Smedt, K., Incremental Sentence Generation, NICI Technical Report 90-01, Nijmegen Institute for Cognition Research and Information Technology, 1990.

[Ford and Holmes, 1978] Ford, M. and Holmes, V., "Planning Units and Syntax in Sentence Production," Cognition, 6, pp. 35-53, 1978.

[Garrett, 1980] Garrett, M.F., "Levels of Processing in Sentence Production," In Butterworth, B. (Ed.), Language Production (Vol. 1: Speech and Talk), Academic Press, 1980.

[Garrett, 1975] Garrett, M.F., "The Analysis of Sentence Production," In Bower, G. (Ed.), The Psychology of Learning and Motivation, Vol. 9, Academic Press, 1975.

[Grosz and Sidner, 1985] Grosz, B. and Sidner, C., "The Structure of Discourse Structure," CSLI Report No. CSLI-85-39, 1985.

[Higuchi et al., 1989] Higuchi, T., Furuya, T., Kusumoto, H., Handa, K. and Kokubu, A., "The Prototype of a Semantic Network Machine IXM," In Proceedings of the International Conference on Parallel Processing, 1989.

[Kempen, 1987] Kempen, G., "A Framework for Incremental Syntactic Tree Formation," In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-87), 1987.

[Kempen and Hoenkamp, 1987] Kempen, G. and Hoenkamp, E., "An Incremental Procedural Grammar for Sentence Formulation," Cognitive Science, 11, pp. 201-258, 1987.

[Kempen and Huijbers, 1983] Kempen, G. and Huijbers, P., "The Lexicalization Process in Sentence Production and Naming: Indirect Election of Words," Cognition, 14, pp. 185-209, 1983.

[Kitano, 1990] Kitano, H., "Parallel Incremental Sentence Production for a Model of Simultaneous Interpretation," In Dale, R. et al. (Eds.), Current Research in Natural Language Generation, Academic Press, 1990.

[Kitano, 1989a] Kitano, H., A Massively Parallel Model of Simultaneous Interpretation: The ΦDMDIALOG System, Technical Report CMU-CMT-89-116, Carnegie Mellon University, Pittsburgh, 1989.

[Kitano, 1989b] Kitano, H., "Hybrid Parallelism: A Case of Speech-to-Speech Dialog Translation," In Proceedings of the IJCAI-89 Workshop on Parallel Algorithms for Machine Intelligence, 1989.

[Kitano et al., 1989a] Kitano, H., Tomabechi, H. and Levin, L., "Ambiguity Resolution in DMTRANS PLUS," In Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, 1989.

[Kitano et al., 1989b] Kitano, H., Mitamura, T. and Tomita, M., "Massively Parallel Parsing in ΦDMDIALOG: Integrated Architecture for Parsing Speech Inputs," In Proceedings of the International Workshop on Parsing Technologies, 1989.

[Levelt and Maassen, 1981] Levelt, W.J.M. and Maassen, B., "Lexical Search and Order of Mention in Sentence Production," In Klein, W. and Levelt, W.J.M. (Eds.), Crossing the Boundaries in Linguistics: Studies Presented to Manfred Bierwisch, Dordrecht: Reidel, 1981.

[Moldovan et al., 1989] Moldovan, D., Lee, W. and Lin, C., SNAP: A Marker-Propagation Architecture for Knowledge Processing, Technical Report CENG 89-10, University of Southern California, 1989.

[Riesbeck and Martin, 1985] Riesbeck, C. and Martin, C., Direct Memory Access Parsing, Yale University Report 354, 1985.

[Tomabechi, 1987] Tomabechi, H., "Direct Memory Access Translation," In Proceedings of IJCAI-87, 1987.

[Wu, 1989] Wu, D., "A Probabilistic Approach to Marker Propagation," In Proceedings of IJCAI-89, 1989.
