A Formal Basis for Spoken Language Translation by Analogy 
Keiko Horiguchi 
D21 Laboratory 
6-7-35 Kitashinagawa 
Shinagawa-ku, Tokyo 141 
Japan 
keiko@pdp.crl.sony.co.jp
Alexander Franz 
Sony Computer Science Laboratory & 
D21 Laboratory 
6-7-35 Kitashinagawa 
Shinagawa-ku, Tokyo 141 
Japan 
amf@csl.sony.co.jp
Abstract 
Since spoken language is characterized 
by a number of properties that defy in- 
terpretation and translation by purely 
grammar-based techniques, recent in- 
terest has turned to analogical (also 
known as case-based or example-based) 
approaches. In this framework, the 
most important step consists of robustly 
matching the recognized input expression 
with the stored examples. This paper 
presents a probabilistic formalization of 
analogical matching, and describes how 
this model is applied to speech transla- 
tion in the framework of translation by 
analogy. 
1 Introduction 
The last decade has seen growing interest in the
example-based framework for translation of written
and spoken language (Nagao, 1984), (Jones, 1996).
This approach, sometimes called analogical, case- 
based, or memory-based, originated with the follow- 
ing insight. In the course of translating an expres- 
sion, a skilled human translator often recalls a sim- 
ilar translation that she has performed or studied 
before, and then carries out the new translation by 
analogy to the previous case, instead of applying a 
large number of lexical and grammatical rules in her 
head. In an example-based translation architecture, 
pairs of bilingual expressions are stored in the exam- 
ple database. The source language input expression 
is matched against the source language portion of 
each example pair, and the best matching example 
is selected. The system then returns the target language
portion of the best example as output. This
is illustrated in Figure 1. 
[Figure: Bilingual Example Database]
Figure 1: Example-based Translation Architecture
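The architecture in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the system described in this paper: the example pairs, the `overlap` similarity (plain word overlap), and the function names are all assumptions made for the sketch. The actual matching function is the probabilistic model of Section 4.

```python
from typing import List, Tuple

# Hypothetical toy database of (source, target) example pairs.
EXAMPLES: List[Tuple[str, str]] = [
    ("heya o yoyaku shitai", "I would like to reserve a room"),
    ("menyuu o kudasai", "May I have the menu, please"),
]


def overlap(a: List[str], b: List[str]) -> float:
    """Jaccard word overlap: a stand-in similarity, not the paper's model."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def translate(source: str) -> str:
    """Return the target side of the best-matching example pair."""
    words = source.split()
    _, best_target = max(EXAMPLES, key=lambda ex: overlap(words, ex[0].split()))
    return best_target
```

Because `max` always returns some example, the sketch never fails outright on imperfect input; it simply returns the closest stored translation.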
1.1 Pre-translation
The example-based approach has certain advantages 
over traditional rule-based approaches to translating 
spoken language. Since an analogical system relies 
on a database of pre-translated example pairs, it re- 
sults in high translation quality. High translation 
quality requires not only that the output be gram- 
matically correct, but also that the output sound 
natural and idiomatic. Spoken utterances consist of 
larger portions of fully-lexicalized or semi-lexicalized 
morpheme sequences, the use of which greatly con- 
tributes to sounding natural and native-like, but 
whose meanings are not totally predictable from
their forms (Pawley and Syder, 1983). An analogical 
system can generate natural-sounding output more 
easily than a compositional, rule-based system, be- 
cause it directly uses the correspondences between 
source-language and target-language expressions. 
1.2 Robustness 
Another important requirement for spoken language 
translation is that the system has to be very robust. 
Spoken utterances contain a lot of disfluencies, such 
as pronunciation errors, word selection errors, word 
fragments and repairs. Furthermore, a speech trans- 
lation module also has to handle the errors intro- 
duced by the speech recognition component. In an 
analogical system, the process that matches the in- 
put expression against examples can be very robust, 
and can always return the best matching output ex- 
pressions instead of failing completely. 
1.3 Improving Translation Quality 
An additional requirement of an automatic trans- 
lation system is that it should be possible to im- 
prove the translation quality by expending addi- 
tional effort. In a traditional rule-based system, as 
the knowledge sources (such as grammar rules, semantic
disambiguation rules, transfer rules, etc.) expand
in size, there comes a point at which the complex
interrelationships between the different types of
information preclude any further improvement. In
an analogical system, it is possible to incrementally 
improve the translation quality by adding more ex- 
amples to the example database, and by effecting 
corresponding improvements in the matching func- 
tion by e.g. refining the thesaurus or re-estimating 
word similarities from an expanded bilingual corpus. 
1.4 The Problem of Scalability
Unfortunately, the pure analogical approach lacks 
scalability. The effort required to acquire and main- 
tain the example database, the cost of the space re- 
quired to store the examples, and the cost of the 
time required to search the database can become 
prohibitively high, since a pure analogical system 
requires a separate example for every linguistic vari- 
ation. 
2 A Hybrid Analogical Approach 
Since language is productive, a realistic analogical 
system needs to be able to handle linguistic con- 
structions that do not have an exact match in the 
example database. Therefore it is important for a 
system to be able to combine fragments from more 
than one example expression to cover the input ex- 
pression. 
To meet this requirement, we have designed an ar- 
chitecture for robust, practical translation of spoken 
language in limited domains that integrates morpho- 
logical and syntactic linguistic processing with an 
analogical transfer component. The overall system 
is described briefly in this section. 
2.1 System Architecture 
The pipelined system architecture, shown in Figure 2,
separates speech recognition, morphological
analysis, shallow parsing, and recursive analogical
translation into different modules. This separation
of general linguistic, domain, and transfer knowledge
improves portability and scalability of the system.
[Figure: Morphological Analyzer, Analogical Transfer Module, Target Language Generator]
Figure 2: System Architecture
2.2 Shallow Source Language Analysis 
The purpose of the shallow analysis component is to 
identify clauses and phrases, to identify modifying 
relations as long as they are unambiguous (deriving 
a canonical interpretation in ambiguous cases), and 
to convert some surface variations into features. 
In our prototype implementation, an adapted version
of the JUMAN 3.1 Japanese morphological analyzer
(Kurohashi et al., 1994) is used for Part-of-Speech
disambiguation, and for dictionary and thesaurus
look-up.
The second step of source language analysis is
carried out using an augmented context-free grammar
for the NLYAcc parser (Ishii, Ohta, and Saito,
1994), which is an implementation of the Generalized
LR parsing algorithm (Tomita, 1985).
The shallow analysis module returns a shallow 
syntactic parse tree with various lexical and syn- 
tactic features. It is robust enough to tolerate ex- 
tragrammaticalities, disfluencies, and the like in the 
input. 
2.3 Analogical Transfer 
The recursive analogical transfer module matches 
the input shallow syntactic tree against the source 
language portions of example shallow syntactic trees. 
The example data is classified into different linguistic
constituent levels, such as clause-level examples,
phrase-level examples, and word-level examples.
The system tries to match the input against
examples of the largest unit.
Once the system finds the best matching examples
of the largest unit, it checks whether there are
portions that differ significantly between the input
and the example. If so, the system performs the
analogical matching process again on the identified
portion from the input, using examples of the corresponding
smaller unit. This recursive process continues
until all parts have been matched. Finally, the
target language portions of the selected best matches
are combined to form the complete target language
expression.
The analogical matching step is based on a prob- 
abilistic formalization of matching by analogy. The 
details of this model are described in Section 4. 
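The recursive descent from clause-level to phrase-level to word-level examples can be illustrated with a small Python sketch. It rests on several assumptions that are not part of the system described here: each example is a hypothetical triple of source units, a target template, and per-unit target fragments; matching simply counts identical units (a stand-in for the probabilistic matcher of Section 4); and only examples with the same number of units as the input are considered.

```python
from typing import Dict, List, Tuple

# (source units, target template, target fragment for each source unit)
Example = Tuple[Tuple[str, ...], str, Tuple[str, ...]]
LEVELS = ["clause", "phrase", "word"]

# Hypothetical toy example database, organized by constituent level.
EXAMPLES: Dict[str, List[Example]] = {
    "clause": [(("heya o", "yoyaku shitai"),
                "I would like to {1} {0}", ("a room", "reserve"))],
    "phrase": [(("heya", "o"), "{0}", ("a room", ""))],
    "word": [(("heya",), "{0}", ("a room",)),
             (("teeburu",), "{0}", ("a table",))],
}


def transfer(units: Tuple[str, ...], depth: int = 0) -> str:
    """Match at the current level, then recurse on the units that differ."""
    candidates = [ex for ex in EXAMPLES[LEVELS[depth]] if len(ex[0]) == len(units)]
    if not candidates:
        raise LookupError(f"no {LEVELS[depth]}-level example of matching size")
    source, template, targets = max(
        candidates, key=lambda ex: sum(u == s for u, s in zip(units, ex[0])))
    pieces = []
    for unit, src, tgt in zip(units, source, targets):
        if unit != src and depth + 1 < len(LEVELS):
            # Differing portion: repeat the matching on the next smaller unit.
            pieces.append(transfer(tuple(unit.split()), depth + 1))
        else:
            pieces.append(tgt)
    return template.format(*pieces)
```

In this toy setting, an input clause that differs from the stored clause example only in its object phrase reuses the clause-level template and fills the differing slot from the word-level examples.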
2.4 Target Language Generation 
The target language generation module is designed 
to perform a number of linguistic operations, such 
as enforcing subject-verb agreement, ensuring that
required definiteness information is present (such as 
English determiners, quantifiers, or possessives), and 
generating the appropriate inflectional morphology. 
In our prototype implementation, we are using the 
PC-KIMMO system for generating English morphology
(Antworth, 1990). After these operations, the
shallow syntactic tree is linearized to create an ex- 
pression in the target language. 
2.5 Speech Synthesis 
In the final step, spoken output is generated from the 
target language expression. In our Japanese-English
prototype, this step is carried out by the DECTALK
system (Hallahan, 1996).
3 Advantages of the Hybrid 
Analogical Approach 
The hybrid approach combines analogical matching 
and transfer with a rule-based component that accounts
for one of the fundamental properties of language:
its productiveness. This section describes
what we perceive to be the main advantages of the 
hybrid analogical approach to speech translation. 
3.1 Modular, Natural Knowledge Sources 
The system architecture separates general linguistic 
knowledge, domain knowledge, and transfer knowl- 
edge. This means it is easier to port it to different 
domains, and to apply it to new languages. 
We also consider the knowledge sources to be 
"natural". By this we mean that, from the point of 
view of knowledge representation, each knowledge 
source captures certain aspects of the translation 
process in its most natural form. For example, the 
example database captures translation correspondences
in a natural way - by means of corresponding
natural language expressions in the source and target
language. Other, less natural means of knowledge
representation would require significantly more
effort to acquire and maintain. As a result, it is easier
to improve the translation quality by adding and
modifying examples, and by modifying the thesaurus 
(if necessary). 
3.2 Examples vs. Syntactic or Semantic Grammars 
As described above, analogical translation relies on 
a database of example pairs which can encode idiomatic
translation correspondences at the lexical,
phrasal, and clausal levels in a natural way. This
is an improvement over previous approaches which 
rely on syntactic or semantic grammars. 
For example, the "transfer-driven" approach
of (Sobashima et al., 1994) relies on essentially
syntactically-based analysis and transfer rules that
are manually annotated with examples by providing
a sound formal basis for analogical matching.
This requires an extensive effort to create a body of
rules that covers all possible expressions, and which
can handle extragrammatical or disfluent input. As
an example of a semantic-grammar based approach,
"concept-based" translation (Mayfield et al., 1995)
requires an extensive manual knowledge acquisition
effort to create detailed, domain and task-specific
templates and semantic grammars. In addition, a
heavily semantics-based approach such as this work
suffers from a lack of generality due to the absence
of linguistic processing.
3.3 Examples vs. Interlingua
The framework of Interlingua-based translation rests 
on the presupposition that there can be a universal, 
unambiguous, language-neutral, and practically (if
not formally) sound knowledge representation formalism
to mediate between source and target languages.
In practice, defining, maintaining, and extending
such a formalism for multiple, not closely
related languages has proved to be a major chal- 
lenge. Analogical speech translation does not rely 
on this presupposition, and instead seeks to capture 
intuitive translation correspondences. 
3.4 Syntactic and Lexical Distance 
In the hybrid analogical approach, the example data 
is categorized by linguistic constituent. For example,
there are translation example pairs at the clause
level, phrase level, and word level. This yields a 
more efficient search procedure during the match- 
ing process, while only assuming non-controversial 
notions of syntactic constituency. By treating syntactic
similarity and semantic similarity as two
separate aspects of the matching process, we derive 
an improvement over methods that combine these 
two aspects. For example, (Sato and Nagao, 1990)
combine a measure of structural similarity with a 
measure of word distance in order to obtain the over- 
all distance measure that is used for matching. 
3.5 Computational Efficiency 
Analogical translation relies on a large database of 
example pairs. This incurs a significant computa- 
tional cost for searching and matching against all 
the examples, which is proportional to the number 
of examples multiplied by the average size of the rep- 
resentations of the examples. (In practice, this cost 
can be mitigated somewhat by clustering and indexing
schemes for the example database.) Hybrid
analogical translation greatly reduces the number of 
required examples by relying on the generality of 
linguistic rules. 
Pure statistical machine translation (Brown et al., 
1993) must in principle recover the most probable
alignment out of all possible alignments between the 
input and a translation. While this approach is the- 
oretically intriguing, it has yet to be shown to be 
computationally feasible in practice. 
3.6 Linguistic Efficiency 
In addition to computational efficiency, we also con- 
sider a factor that might be called "linguistic effi- 
ciency". We hold that a significant body of system- 
atic linguistic regularities has been identified that 
must be accounted for somehow during the process
of translation. Linguistic efficiency refers to the no- 
tion of how efficient the system is with regard to 
these regularities. 
In hybrid analogical translation, the use of a mor- 
phological and syntactic module for shallow analysis 
to derive a linguistic representation with syntactic 
and lexical features allows us to handle phenomena 
such as inflections, transformations, and language- 
specific phenomena (such as the English determiner 
system and certain Japanese constructions that en- 
code politeness information) in a linguistically effi- 
cient manner. 
3.7 Translation Adequacy 
In order to be able to provide stylistically and prag- 
matically adequate translations of spoken language, 
it is not sufficient to merely ignore or tolerate extra- 
grammaticalities in the input; in many cases, the 
information carried by such phenomena must be 
reflected in the target language output. The hy- 
brid analogical approach is able to model such phe- 
nomena using probabilistic operators, which are ex- 
plained in more detail in the next section. 
4 A Probabilistic Model for 
Analogical Matching 
When applied to spoken language, the central step in 
analogical translation is a robust matching step that
compares the output of the speech recognition com- 
ponent with the contents of the example database. 
This section presents the probabilistic model that 
provides a formal basis for this matching step. 
Figure 3: Viewing Input as Distorted Examples
4.1 Notation 
Let I denote the input expression, consisting of a sequence
of words along with certain features resulting
from shallow parsing. Thus, an input expression
I consists of a sequence of words $iw_1, iw_2, \ldots, iw_n$
and a set of features $if_1, if_2, \ldots, if_m$. Similarly, let
the source expression E of an example pair consist
of $ew_1, ew_2, \ldots, ew_p$ and $ef_1, ef_2, \ldots, ef_q$.
4.2 The Noisy Channel Model
The "noisy-channel" model from information theory 
has proven highly effective in speech recognition and, 
more recently, in language understanding (Epstein et 
al., 1996; Miller et al., 1994). We adopt this model 
for translation by analogy in the following manner. 
Given an input expression, the analogical match- 
ing algorithm must determine the example expres- 
sion that is closest in meaning to the input expres- 
sion. We denote the probability that an example 
expression is appropriate for translating some input 
as the conditional probability of the example, given 
the input: 
(1) $P(\mathrm{Example} \mid \mathrm{Input})$
Our aim is to find the example that has the high- 
est conditional probability of being appropriate to 
translate the given input. We denote that example
with $E_{max}$, where the max function chooses the
example with the maximum conditional probability:
(2) $E_{max} = \max_{E \in \mathrm{Examples}} [P(E \mid I)]$
Our approach to determining $E_{max}$ is as follows.
First, we can use Bayes' Law to obtain a
re-expression of the conditional probability that needs
to be maximized:
(3) $P(E \mid I) = \frac{P(E)\,P(I \mid E)}{P(I)}$
Since the input expression, and therefore P(I), remains
constant over different examples, we can disregard
the term P(I) in the denominator. Thus, we
need to determine $E_{max}$, which can be defined as
follows:
(4) $E_{max} = \max_{E \in \mathrm{Examples}} [P(E)\,P(I \mid E)]$
The probability distribution over the examples 
P(E) encodes the prior probability of using the different
examples to translate expressions in the domain.
It can be used to penalize certain specialized
expressions that should be used less frequently. The
conditional probability distribution is estimated using
a "distortion" model of utterances that is described
in the next section.
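As an illustration of equation (4), the selection of the best example can be written as a maximization in log space. The sketch below is only an assumption-laden example: `select_example` and `toy_likelihood` are hypothetical names, and `toy_likelihood` is a crude positional score standing in for the distortion model, not a properly normalized probability distribution.

```python
import math
from typing import Callable, Dict, Tuple

Words = Tuple[str, ...]


def select_example(inp: Words, prior: Dict[Words, float],
                   likelihood: Callable[[Words, Words], float]) -> Words:
    """Return the example E that maximizes log P(E) + log P(I | E)."""
    def score(example: Words) -> float:
        return math.log(prior[example]) + math.log(likelihood(inp, example))
    return max(prior, key=score)


def toy_likelihood(inp: Words, example: Words) -> float:
    """Hypothetical stand-in for P(I | E): reward matching positions."""
    matches = sum(a == b for a, b in zip(inp, example))
    mismatches = max(len(inp), len(example)) - matches
    return (0.9 ** matches) * (0.05 ** mismatches)
```

With equal priors the closest example wins. A sufficiently small prior can override a better surface match, which is how the prior penalizes specialized examples.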
4.3 Viewing Input as Distorted Examples 
The conditional probability distribution $P(I \mid E)$ is
modeled as follows. Consider that the speaker 
intends to express an underlying message S, but 
speech errors, certain speech properties, misrecog- 
nitions, and other factors interfere, resulting in the 
actual utterance I, which forms the input to the 
translation system (Figure 3). This is modeled using 
a number of "distortion" operators:
• echo-word($ew_i$). This operator simply
echoes the ith word, $ew_i$, from the example to
the input.
• delete-word($ew_i$). This operator deletes the
ith word, $ew_i$, from the example.
• add-word($iw_j$). This operator adds the jth
word, $iw_j$, to the input.
• alter-word($ew_i$, $iw_j$). This operator alters
the ith word, $ew_i$, from the example to the
jth word, $iw_j$, in the input expression. The
altered word is different, but usually semantically
somewhat similar.
• Corresponding operators for features.
Given these operators, we can view the input I 
as an example E to which a number of distortion 
operators have been applied. Thus, we can represent 
an input expression I as an example plus a set of 
distortion operators: 
(5) $I = \{E, \mathrm{distort}_1, \ldots, \mathrm{distort}_z\}$
This means that we can re-express the conditional 
probability distribution for an input expression I, 
given that the meaning expressed by example E is 
intended, as follows:
(6) $P(I \mid E) = P(\{\mathrm{distort}_1, \ldots, \mathrm{distort}_z\} \mid E)$
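To make the view of equation (5) concrete, the following Python sketch applies a list of word-level distortion operators to an example and produces the corresponding input. The tuple encoding of operators and the function name `apply_distortions` are assumptions for illustration. The sketch also assumes the operators are listed in left-to-right order, whereas the text treats them as a set.

```python
from typing import List, Tuple

Op = Tuple[str, ...]  # e.g. ("echo-word", "a") or ("alter-word", "c", "d")


def apply_distortions(example: List[str], ops: List[Op]) -> List[str]:
    """Rebuild the input I from an example E and ordered distortion operators."""
    out: List[str] = []
    i = 0  # position of the next unconsumed example word
    for op in ops:
        kind = op[0]
        if kind == "add-word":
            out.append(op[1])  # inserts an input word, consumes nothing
            continue
        if i >= len(example) or op[1] != example[i]:
            raise ValueError(f"{op} does not apply at example position {i}")
        if kind == "echo-word":
            out.append(example[i])
        elif kind == "alter-word":
            out.append(op[2])
        elif kind != "delete-word":
            raise ValueError(f"unknown operator {kind}")
        i += 1
    if i != len(example):
        raise ValueError("operators must account for every example word")
    return out
```

For instance, echoing the first word, deleting the second, adding a new word, and altering the third word turns a three-word example into a three-word input that shares only its first word with the example.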
4.4 Operator Independence Assumption 
Two independence assumptions are required to make
this model computationally feasible. For the first as- 
sumption, we assume that the individual distortion 
operators are conditionally independent, given the 
example E: 
(7) $P(\mathrm{distort}_k \mid \mathrm{distort}_1, \ldots, \mathrm{distort}_z; E) = P(\mathrm{distort}_k \mid E)$
This means that we can make the following sim- 
plification: 
(8) $P(\{\mathrm{distort}_1, \ldots, \mathrm{distort}_z\} \mid E) = \prod_{k=1}^{z} P(\mathrm{distort}_k \mid E)$
Thus, we obtain the following: 
(9) $P(I \mid E) = \prod_{k=1}^{z} P(\mathrm{distort}_k \mid E)$
Considering the individual components of the example
E, this leads us to the following.
(10) $P(I \mid E) = \prod_{k=1}^{z} P(\mathrm{distort}_k \mid ew_1, ew_2, \ldots, ew_p; ef_1, ef_2, \ldots, ef_q)$
4.5 Operator Localization Assumption 
For the second assumption, we assume
that the individual distortion operators only depend
on the words and features that they directly involve.
In effect, we stipulate that the operators only affect
a strictly local portion of the input. For example,
we assume that the probability of echoing a word 
depends only on the word itself, so that the following 
holds: 
(11) $P(\text{echo-word}(ew_i) \mid ew_{1 \ldots p}; ef_{1 \ldots q}) = P(\text{echo-word}(ew_i))$
Similarly, we assume that the probability of e.g. 
deleting a feature depends only on the feature itself,
so that the following holds:
(12) $P(\text{delete-feature}(ef_i) \mid ew_{1 \ldots p}; ef_{1 \ldots q}) = P(\text{delete-feature}(ef_i))$
This yields the following approximation: 
(13) $P(I \mid E) \approx \prod_{k=1}^{z} P(\text{distort-word}_k(ew_i, iw_j)) \cdot \prod_{l=1}^{y} P(\text{distort-feature}_l(ef_i, if_j))$
4.6 Computing the Match 
Given an input I and an expression E, it is straightforward
to determine the probability of the feature
distortion, since the features are indexed by name:
(14) $\prod_{l=1}^{y} P(\text{distort-feature}_l(ef_i, if_j))$
In order to determine the probability of the word
distortions, we must find the most probable set of
distortion operators. Given an input and an example,
there are many different sets of distortion operators
that could relate the two. Of course, we are
interested in the most straightforward relation be- 
tween the two, which corresponds to the least cost or 
highest probability. To further complicate matters, 
there may not be a single unique set of distortion op- 
erators with a unique minimum cost (corresponding 
to a unique maximum probability); instead, there 
may be a number of distortion sets that all share 
the same minimal cost (and maximal probability). 
In this case, we are content to choose one of the minimal
cost sets at random. This set is defined as follows:
(15) $\mathrm{Distort}_{max} = \max_{\mathrm{Distort}} [P(\mathrm{Distort} \mid E, I)]$
We solve this problem with a dynamic program- 
ming algorithm that finds a set of distortion oper- 
ators with maximal probability. First, to obtain a 
distance measure, we take the negative logarithm of
this expression:
(16) $-\log P(\mathrm{Distort} \mid E, I)$
Given that we have assumed independence be- 
tween individual distortion operators above, this can 
be simplified as follows: 
(17) $-\log \prod_{k=1}^{\text{no. of operators}} P(\mathrm{distort}_k \mid E, I)$
We have also assumed that the distortion operators
are independent of the part of the sentence
that does not directly involve them. Thus, we can
simplify further as follows:
(18) $-\log \prod_{k=1}^{z} P(\mathrm{distort}_k \mid ew_i, iw_j)$
This can be further split into the individual distortion
operators:
(19) $\sum_{k=1}^{z} -\log P(\mathrm{distort}_k(ew_i, iw_j))$
This corresponds directly to the individual costs that 
we use for the dynamic programming equation. Let 
the example expression be $E = e_1, e_2, \ldots, e_p$
and the input expression be $I = i_1, i_2, \ldots, i_n$. Then,
let D(p, n) be the distance between the example and
the input. This distance is defined by the following
recurrence:
$$D(p,n) = \min \begin{cases} D(p-1,n-1) - \log P(\mathrm{echo}(ew_p)) \\ D(p,n-1) - \log P(\mathrm{add}(iw_n)) \\ D(p-1,n) - \log P(\mathrm{delete}(ew_p)) \\ D(p-1,n-1) - \log P(\mathrm{alter}(ew_p, iw_n)) \end{cases}$$
The result of this is the optimal alignment be- 
tween the input and the example, as well as the min- 
imum distance between them. The matcher selects 
the example with the smallest distance to the input, 
and assembles the target language portions of the se- 
lected example pairs to form a complete translation 
in the target language. 
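The recurrence for D(p, n) is a weighted edit distance, and a minimal Python sketch of it follows. Several details are assumptions of the sketch rather than statements of this paper: the boundary condition D(0, 0) = 0, the restriction of the echo operator to identical words, and the use of four constant operator probabilities (`p_echo`, `p_alter`, `p_add`, `p_delete`) in place of word-specific distributions. The prior P(E) is left out for brevity.

```python
import math
from typing import List, Sequence


def distortion_distance(example: Sequence[str], inp: Sequence[str],
                        p_echo: float = 0.9, p_alter: float = 0.02,
                        p_add: float = 0.04, p_delete: float = 0.04) -> float:
    """Minimum summed -log probability of operators turning example into inp."""
    p, n = len(example), len(inp)
    D = [[math.inf] * (n + 1) for _ in range(p + 1)]
    D[0][0] = 0.0  # assumed boundary: empty example vs. empty input
    for i in range(p + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                # echo if the words are identical, otherwise alter
                step = p_echo if example[i - 1] == inp[j - 1] else p_alter
                options.append(D[i - 1][j - 1] - math.log(step))
            if j > 0:
                options.append(D[i][j - 1] - math.log(p_add))      # add iw_j
            if i > 0:
                options.append(D[i - 1][j] - math.log(p_delete))   # delete ew_i
            D[i][j] = min(options)
    return D[p][n]


def closest_example(examples: List[List[str]], inp: List[str]) -> List[str]:
    """Select the example with the smallest distance to the input."""
    return min(examples, key=lambda e: distortion_distance(e, inp))
```

The table has (p + 1)(n + 1) cells, so matching one example costs time proportional to the product of the two expression lengths.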
5 Probability Estimation 
The method for speech translation by analogy de- 
scribed in this paper was designed to overcome the 
manual knowledge acquisition bottleneck by relying 
on techniques from symbolic and statistical machine 
learning, while still allowing the kind of manual tuning
that is necessary to produce high-quality translations.
5.1 Prior Probability Distribution 
The prior probability distribution over the example
database P(Examples) is used to penalize highly
specialized example pairs that should be used less often.
After an initial distribution is estimated, these
probabilities can be adjusted to solve translation
problems due to idiosyncratic examples.
5.2 Alteration Probability Distribution 
The two distortion operators alter-word and alter- 
feature perform the function of matching semanti- 
cally similar words or feature values. If a monolin- 
gual or bilingual corpus from the application domain 
is available, these probability distributions can be 
estimated using iterative methods. If neither type 
of corpus is available, the probabilities can be esti- 
mated with the aid of a manually-constructed the- 
saurus. 
5.3 Thesaurus-based Estimation 
A thesaurus is a semantic IS-A hierarchy whose
nodes are semantic categories, and whose leaves are 
words. The traditional method of estimating word 
similarity, based on counting IS-A links, presupposes 
that every link encodes equal semantic distance - 
but in practice, this is never the case (Resnik,
1995). Thus, we adopt a new method for judging
semantic distance between two words. If appropriate
distributional information for words is available,
then the semantic similarity of two words could be
estimated from the entropy of their lowest common
dominating node lcdn:
(20) $\mathrm{Similarity}(ew_i, iw_j) \approx \frac{\mathrm{Entropy}(\text{Root Node})}{\mathrm{Entropy}(\mathit{lcdn})}$
In the absence of distributional information, the entropy
of a node depends only on the number of words
that the node dominates. 
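A small Python sketch can illustrate the thesaurus-based estimate under explicit assumptions. It reads equation (20) as the ratio of the root node's entropy to the entropy of the lowest common dominating node, and it assumes that, without distributional information, a node's entropy is the base-2 logarithm of the number of words it dominates (a uniform distribution over those words). The toy hierarchy and all function names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical toy IS-A hierarchy: child -> parent. Leaves are words.
PARENT: Dict[str, str] = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}
ROOT = "entity"
WORDS = [node for node in PARENT if node not in PARENT.values()]


def ancestors(node: str) -> List[str]:
    """The chain from a node up to the root, including the node itself."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def entropy(node: str) -> float:
    """Assumed uniform case: log2 of the number of words the node dominates."""
    return math.log2(sum(node in ancestors(word) for word in WORDS))


def lcdn(w1: str, w2: str) -> str:
    """Lowest common dominating node of two words."""
    above_w2 = set(ancestors(w2))
    return next(node for node in ancestors(w1) if node in above_w2)


def similarity(w1: str, w2: str) -> float:
    """Entropy(root) / Entropy(lcdn); infinite for identical words."""
    h = entropy(lcdn(w1, w2))
    return math.inf if h == 0 else entropy(ROOT) / h
```

Under this reading, two words that share a specific category score higher than two words whose only common node is the root, for which the ratio is 1.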
5.4 Other Distortion Probabilities 
The probability distributions for adding and deleting 
words and features can also be estimated from corpora,
if available. Since there are very few Japanese
spoken language corpora available, we are currently 
adopting a word-class based model for the remain- 
ing distributions that uses the categories for "strong 
content words" (nouns and verbs), "light content 
words" (adjectives and some adverbs), "grammatical 
function words" (e.g. particles and conjunctions),
and "modifiers and adjuncts". In addition, it is possible
to assign specific lexical penalties to individual
words.
5.5 Learning Example Pairs 
Since the contents of the database are central to
achieving high-quality translations, it is usually nec- 
essary to adjust it manually in response to errors in 
the translation. At the same time, since the example
database must be adapted for every new domain,
it is important to minimize the amount of manual 
effort. For this reason, the example database was de- 
signed in such a way that it is possible to acquire new 
examples by a semi-automatic method consisting of 
an automatic extraction step from a bilingual corpus 
(see, for example, (Watanabe, 1993)), followed by a
manual filtering and refinement step. 
6 Conclusions and Further Work 
Based on the probabilistic model of analogical
matching, we have implemented a prototype that 
translates from Japanese to English in a limited domain.
The application domain for the prototype is
expressions for traveling in a foreign country, such as 
expressions related to making reservations or dining 
in a restaurant. 
The initial results from our prototype are very 
promising, but extension of system coverage and 
subsequent large-scale evaluation are needed. Additional
topics for further work include the estimation
of lexical probabilities from bilingual corpora; im- 
proving the integration between the speech recogni- 
tion and the translation components in order to in- 
crease recognition accuracy and translation robust- 
ness; and extending the system to additional lan- 
guages. 

References 
Antworth, Evan L. 1990. PC-KIMMO: a two-level
processor for morphological analysis. Number 16
in Occasional Publications in Academic Computing.
Summer Institute of Linguistics, Dallas, TX.
Brown, Peter, Stephen A. Della Pietra, Vincent
J. Della Pietra, and Robert L. Mercer. 1993. The
mathematics of statistical machine translation:
Parameter estimation. Computational Linguistics,
19(2):263-312.
Epstein, M., K. Papineni, S. Roukos, T. Ward, and
S. Della Pietra. 1996. Statistical natural language
understanding using hidden clumpings. In
ICASSP-96, pages 176-179, Atlanta, GA.
Furuse, Osamu and Hitoshi Iida. 1992. An example-based
method for transfer-driven machine translation.
In Proceedings of the Fourth International
Conference on Theoretical and Methodological
Issues in Machine Translation (TMI-92),
pages 139-150, Montreal, Canada, June.
Furuse, Osamu and Hitoshi Iida. 1996. Incremental
translation utilizing constituent boundary patterns.
In COLING-96, pages 412-417, Copenhagen,
Denmark.
Hallahan, W. 1996. DECtalk software: Text-to-speech
technology and implementation. Digital
Technical Journal, 7(4), March.
Ishii, M., K. Ohta, and H. Saito. 1994. An efficient
parser generator for natural language. In
COLING-94, pages 417-420, Kyoto, Japan.
Jones, Daniel. 1996. Analogical Natural Language 
Processing. UCL Press, London. 
Kurohashi, S., T. Nakamura, Y. Matsumoto, and
M. Nagao. 1994. Improvements of Japanese morphological
analyzer JUMAN. In Proceedings of
the International Workshop on Sharable Natural
Language Resources, pages 417-420, Nara, Japan.
Mayfield, L., M. Gavalda, W. Ward, and A. Waibel.
1995. Concept-based speech translation. In
ICASSP-95, pages 97-100, Detroit, MI.
McKevitt, Paul, editor. 1994. Workshop on Integration
of Natural Language and Speech Processing.
Seattle, WA. AAAI-94.
Miller, S., R. Bobrow, R. Ingria, and R. Schwartz.
1994. Hidden understanding models of natural
language. In ACL-32, pages 25-32, Las Cruces,
NM.
Morimoto, Tsuyoshi, Masami Suzuki, Toshiyuki
Takezawa, Gen'ichiro Kikui, Masaaki Nagata, and
Mutsuko Tomokiyo. 1992. A spoken language
translation system: SL-TRANS2. In Proceedings of
the Fifteenth International Conference on Computational
Linguistics, volume 2, pages 1048-1052,
Nantes.
Nagao, M. 1984. A framework of a Machine Translation
between Japanese and English by analogy
principle. In A. Elithorn and R. Banerji, editors,
Artificial and Human Intelligence, pages 173-180.
North-Holland.
Pawley, Andrew and Frances Hodgetts Syder. 1983.
Two puzzles for linguistic theory: nativelike selection
and nativelike fluency. In Jack C. Richards
and Richard W. Schmidt, editors, Language and
Communication, pages 191-227. Longman.
Price, Patti. 1994. Combining linguistic with statistical
methods in automatic speech understanding.
In Proceedings of the workshop The Balancing
Act: Combining symbolic and statistical approaches
to language, pages 76-83, Las Cruces,
New Mexico.
Rayner, M., D. Carter, P. Price, and B. Lyberg.
1994. Estimating performance of pipelined spoken
language translation systems. In ICSLP-94,
pages 1251-1254, Yokohama, Japan.
Resnik, P. 1995. Using information content to
evaluate semantic similarity in a taxonomy. In
IJCAI-95.
Ringger, Eric K. and James F. Allen. 1996. A fertility
channel model for post-correction of continuous
speech recognition. In ICSLP-96, pages
897-900, Philadelphia, PA.
Sato, S. and M. Nagao. 1990. Towards memory-based
translation. In COLING-90, pages 247-252,
Helsinki, Finland.
Shirotsuka, O. and K. Murakami. 1994. An
example-based approach to semantic information
extraction from Japanese spontaneous speech. In
ICSLP-94, pages 91-94, Yokohama, Japan.
Sobashima, Y., O. Furuse, S. Akamine, J. Kawai,
and H. Iida. 1994. A bidirectional, transfer-driven
machine translation system for spoken dialogues.
In COLING-94, pages 64-68, Kyoto,
Japan.
Stemberger, Joseph P. 1982. Syntactic errors
in speech. Journal of Psycholinguistic Research,
11(4):313-333.
Tomita, Masaru. 1985. Efficient Parsing for Natural
Language. Kluwer Academic Publishing, Boston,
MA.
Watanabe, H. 1992. Similarity-driven transfer system.
In COLING-92, pages 770-776.
Watanabe, Hideo. 1993. A method for extracting 
translation patterns from translation examples. 
In TMI-93, pages 292-301. 
