CONSTRUCTION OF 
CORPUS-BASED SYNTACTIC RULES FOR 
ACCURATE SPEECH RECOGNITION 
JUNKO HOSAKA TOSHIYUKI TAKEZAWA 
ATR Interpreting Telephony Research Laboratories 
Hikaridai 2-2, Seika-cho, Soraku-gun 
Kyoto 619-02, Japan 
hosaka@atr-la.atr.co.jp
takezawa@atr-la.atr.co.jp
Abstract 
This paper describes the syntactic rules which are applied in the Japanese speech recognition module of a speech-to-speech translation system. Japanese is considered to be a free word/phrase order language. Since syntactic rules are applied as constraints to reduce the search space in speech recognition, applying rules which take into account all possible phrase orders can have almost the same effect as using no constraints. Instead, we take into consideration the recognition weaknesses of certain syntactic categories and treat them precisely, so that a minimal number of rules can work most effectively. In this paper we first examine which syntactic categories are easily misrecognized. Second, we consult our dialogue corpus, in order to provide the rules with great generality. Based on both studies, we refine the rules. Finally, we verify the validity of the refinement through speech recognition experiments.
1 Introduction 
We are developing the Spoken Language TRANSlation system (SL-TRANS)[1], in which both speech recognition processing and natural language processing are integrated. Currently we are studying automatic speech translation from Japanese into English in the domain of dialogues with the reception service of an international conference office. In this framework we are constructing syntactic rules for recognition of Japanese speech.
In speech recognition, the most significant concern is raising the recognition accuracy. For that purpose, applying linguistic information turns out to be promising. Various approaches have been taken, such as using stochastic models[2], syntactic rules[3], semantic information[4] and discourse plans[5]. Among stochastic models, the bigram and trigram succeeded in achieving a high recognition accuracy in languages that have a strong tendency toward a standard word order, such as English. On the contrary, Japanese belongs to the free word order languages[6]. For such a language, semantic information is more adequate as a constraint. However, building semantic constraints for a large vocabulary needs a tremendous amount of data. Currently, our data consist of dialogues between the conference registration office and prospective conference participants, with approximately 199,000 words in telephone conversations and approximately 72,000 words in keyboard conversations. But our data are still not sufficient to build appropriate semantic constraints for sentences with 700 distinct words. Processing a discourse plan requires excessive calculation, and the study of discourse itself must be further developed to be applicable to speech recognition. Syntax, on the other hand, has been studied in more detail and makes increasing the vocabulary easier.
As we are working on spoken language, we try to reflect real language usage. For this purpose, a stochastic approach beyond trigrams, namely stochastic sentence parsing[7], seems most promising. Ideally, syntactic rules should be generated automatically from a large dialogue corpus, and probabilities should also be automatically assigned to each node. But to do so, we need underlying rules. Moreover, coping with phoneme perplexity, which is crucial to speech recognition, with rules created from a dialogue corpus requires additional research[8].
In this paper we propose taking into account the weaknesses of the speech recognition system in the earliest stage, namely when we construct the underlying syntactic rules. First, we examined the speech recognition results to determine which syntactic categories tend to be recognized erroneously. Second, we utilized our dialogue corpus[9] to support the refinement of rules concerning those categories. As examples, we discuss formal nouns¹ and conjunctive postpositions². Finally, we carried out a speech recognition experiment with the refined rules to verify the validity of our approach.

¹Formal nouns: keishiki-meishi in Japanese.
²Conjunctive postpositions: setsuzoku-joshi in Japanese.
Actes de COLING-92, Nantes, 23-28 août 1992 — 806 — Proc. of COLING-92, Nantes, Aug. 23-28, 1992
2 Issues in HMM-LR Speech 
Recognition 
In the Japanese speech recognition module of our experimental system, the combination of generalized LR parsing and the Hidden Markov Model (HMM) is realized as HMM-LR [10]. The system predicts phonemes by using an LR parsing table and drives HMM phoneme verifiers to detect/verify them without any intervening structure, such as a phoneme lattice.
The speech recognition unit is the Japanese bunsetsu, which roughly corresponds to a phrase and is the next largest unit after the word. The ending of the bunsetsu (phrase) is usually marked by a breath point. This justifies its treatment as a distinct unit. A Japanese phrase consists of one independent word (e.g. noun, adverb, verb) and zero, one or more than one dependent words (e.g. postposition, auxiliary verb). The number of words in a phrase ranges from 1 to 14, and the mean number is about 3, according to our dialogue corpus.
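The bunsetsu structure just described can be sketched as a simple membership check. This is an illustrative sketch only; the category names below are assumptions, not the system's actual tag set.

```python
# Illustrative sketch of the bunsetsu structure: one independent word
# followed by zero or more dependent words, 1 to 14 words in total.
# The category names are assumptions for illustration.
INDEPENDENT = {"noun", "adverb", "verb"}
DEPENDENT = {"postposition", "auxiliary-verb"}

def is_valid_bunsetsu(pos_tags):
    """True if the POS sequence forms one bunsetsu (phrase)."""
    if not 1 <= len(pos_tags) <= 14:
        return False
    head, rest = pos_tags[0], pos_tags[1:]
    return head in INDEPENDENT and all(t in DEPENDENT for t in rest)
```
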
We will clarify the weaknesses of HMM-LR speech recognition both in phrases and in sentences.
2.1 Phrase Recognition Errors 
We examined which syntactic categories tend to be erroneously recognized when using HMM-LR phrase speech recognition. For this purpose, we applied syntactic rules containing no constraints on word sequences³. This means that any word can follow any word.
Examples (1) and (2) show the results of HMM-LR Japanese speech recognition⁴. The uttered phoneme strings are enclosed in | |.
(1) |sochirawa| (this, that)
> 1 : sochira-wa
  2 : sochira-wa-hu
  3 : sochira-ha-wa
  4 : sochira-hu-wa-hu
  5 : sochira-wa-hu-hu
(2) |aringatougozaimasu| (thank you)
............................................
  1 : ari-nga-to-wa-eN-hu-su-su-su
  2 : ari-nga-to-wa-eN-hu-su-su
  3 : ari-nga-to-wa-eN-hu-su-su-u
  4 : ari-nga-to-wa-eN-su-su
  5 : ari-nga-to-wa-eN-hu-su-su-su-a
³Japanese verbs, adjectives, etc. are always inflected when used. In syntactic rules containing no word sequence constraints, inflected verbs, inflected adjectives, etc. are considered to be words.
⁴The maximal amount of the whole beam width, the global beam width, is set to 16, and the maximal beam width of each branch, the local beam width, to 10.
In the examples, the symbols >, -, ng and N have special meanings:
• A correctly recognized phrase is marked with >.
• A word boundary is marked with -.
• A nasalized /g/ is transcribed ng.
• A syllabic nasal is transcribed N.
In (1), after recognizing the first word, the system selected subsequent words solely to produce a phoneme string similar to the original utterance.
(2) is an example of phrase recognition which failed. In this example tou was erroneously recognized as to. Subsequently, no further correct words were selected.
Examples (1) and (2) both show that HMM-LR tends to select words consisting of extremely few phonemes when it fails in word recognition. To avoid this problem, precise rules should be written for sequences of words with small numbers of phonemes. In Japanese, postpositions (e.g. ga, o, ni), wh-pronouns (e.g. itsu, nani, dare)[11], numerals (e.g. ichi, ni, san) and certain nouns (e.g. kata, mono) particularly fit this description.
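The vulnerability of short words can be illustrated by ranking a lexicon by phoneme count. The toy lexicon below is an assumption for illustration, not the system's actual 700-word vocabulary or phoneme inventory.

```python
# Toy lexicon mapping words to phoneme strings (illustrative only).
LEXICON = {
    "ga": ["g", "a"], "o": ["o"], "ni": ["n", "i"],      # postpositions
    "ichi": ["i", "ch", "i"], "san": ["s", "a", "N"],    # numerals
    "kata": ["k", "a", "t", "a"],                        # formal noun
    "kaigi": ["k", "a", "i", "g", "i"],                  # 'conference'
}

def fragile_words(lexicon, max_phonemes=4):
    """Words short enough that HMM-LR tends to insert them
    spuriously when recognition fails; these are the ones whose
    contexts should be constrained by precise rules."""
    return sorted(w for w, ph in lexicon.items() if len(ph) <= max_phonemes)
```
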
2.2 Sentence Recognition Errors
To examine the error tendency of sentence speech recognition we applied a two-step method[12]. First, we applied phrase rules to the HMM-LR speech recognition⁵. Second, we applied phrase-based sentence rules to the phrase candidates as a post-filter, in order to obtain sentence candidates while filtering out unacceptable ones. We experimented with the 353 phrases making up 137 sentences. The recognition rate for the top candidates was 68.3 % by exact string matching, and for the top 5 candidates 95.5 %.
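The two-step scheme can be sketched as follows. The `accept` predicate stands in for the phrase-based sentence rules; the toy rule used here is our assumption, not the paper's grammar.

```python
from itertools import product

def sentence_candidates(phrase_cands, accept):
    """Step 2 of the two-step method: combine the per-phrase candidate
    lists produced by HMM-LR (step 1) and keep only the combinations
    that the sentence rules accept (post-filter)."""
    return [seq for seq in product(*phrase_cands) if accept(seq)]

def toy_accept(seq):
    # Toy sentence rule (assumption): reject candidates whose final
    # phrase ends in the middle-only conj-pp 'shi'.
    return not seq[-1].endswith("-shi")
```
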
Based on the top 5 phrase candidates, we conducted a sentence experiment. In this experiment we applied loosely constrained sentence rules. With these rules, approximately 80 % of all the possible combinations of phrase candidates were accepted. Following are examples which did not exactly match the uttered sentences⁶. Notice that the misrecognized words consist of a relatively small number of phonemes, as we have seen in section 2.1.
(3) |kaingi-ni moushiko-mi-tai-no-desu-nga|
    (I would like to register for
    the conference.)
3a: kaingi-ni moushiko-mi-tai-N-desu-nga
3b: kaingi-ni moushiko-mi-gai-no-desu-ka
(4) |kochira-wa kaingizimukyoku-desu|
⁵The global beam width is set to 100 and the local beam width to 10.
⁶Since the phrase candidates are obtained by the HMM-LR speech recognition, word boundaries are already marked by -.
(This is the conference office.) 
............................................ 
4a: kata-wa kaingizimukyoku-desu 
............................................ 
(5) |doumo aringat-ou-gozaima-shi-ta|
(Thank you very much.) 
............................................ 
5a: go-o aringat-ou-gozaima-shi-ta 
5b: go-mo aringat-ou-gozaima-shi-ta
5c: mono aringat-ou-gozaima-shi-ta 
............................................ 
(6) |gozyuusho-to onamae-o onengai-shi-masu|
(Can I have your name and address?) 
............................................ 
6a: gozyuusho-to onamae-o
    onengai-shi-masu-shi
Though the phoneme string in 3a is different from the uttered phoneme string, the difference between no and N in meaning is minor and has no effect on translation with the current technique. While (3) is affirmative, 3b is interrogative, which is indicated by the sentence-final postposition ka. This cannot be treated with sentence rules. To handle this problem, we need dialogue management.
The uttered phrase kochira-wa in (4), meaning "this," was recognized erroneously as kata-wa in 4a, meaning "person." The word kata belongs to the formal noun group, a kind of noun which should be modified by a verbal phrase [13]. Sentence 4a is acceptable if modified by a verbal phrase, as in 4a':
4a': midori-no seihuku-o kiteiru kata-wa
kaigizimukyoku-desu
(The person who is wearing a green uniform is
[with] the conference office.)
This is also true of the phrase mono in 5c meaning 
"thing," which was erroneously recognized instead of 
doumo meaning "very much": 
5c': kouka-na mono aringat-ou-gozaima-shi-ta 
(Thank you for the expensive thing.) 
In sentence candidates 5a and 5b, the numeral go, 
meaning "five," is used. These sentences may seem 
strange at first glance, but in a situation such as play- 
ing cards, these sentences are quite natural. If some- 
one plays a 5 when you need one, you would say: 
"Thanks for the five." Similarly, when you need a 3 
and a 5, and someone plays a 3 and after that some- 
one else plays a 5, you would say: "Thanks for the 
five, too." 
In the sentence candidate 6a, the conjunctive postposition (conj-pp) shi is used sentence-finally. In principle, a conj-pp combines two sentences, functioning like a conjunction such as "while" or "though," and is used in the middle of a sentence.
Erroneous sentence recognition such as in the case 
of 3a-b cannot be treated by sentence rules. There- 
fore, we are trying to cope with erroneous recognition, 
as seen in sentence candidates 4a, 5a-c and 6a, with 
sentence rules. 
3 Dealing with Speech Recognition Errors
We are going to deal with sentences containing the following phrases:
• Phrases with formal nouns 
• Phrases with numerals 
• Phrases with conj-pps used in the sentence final 
position 
In order to decide how to cope with the above problems, we used our dialogue corpus. Currently we have 177 keyboard conversations consisting of approximately 72,000 words and 181 telephone conversations consisting of approximately 199,000 words⁷. We regard keyboard conversations as representing written Japanese and telephone conversations as representing spoken Japanese. When retrieving the dialogue corpus, we always compare written and spoken Japanese, in order to clarify the features of the latter. We examined the actual usage of formal nouns as well as that of conj-pps.
3.1 Formal Nouns 
We examined the behavior of formal nouns, such as koto and mono. Formal nouns are considered to be a kind of noun which lacks the content usually found in common nouns such as "sky" or "apple." They function similarly to relative pronouns and are therefore used with a verbal modifier[13], as in examples 7 and 8:
7 : kinou itta koto-wa torikeshitai.
(I would like to take back what I said yesterday.)
8 : nedan-ga takai mono-ga shitsu-ga ii wakedewa-nai.
(It is not always true that an expensive thing has good quality.)
In examples 7 and 8, the formal nouns koto and mono are modified by kinou itta (said yesterday) and nedan-ga takai (price expensive), respectively. But it is also true that these nouns behave like common nouns and can be used without any verbal modifier, as in examples 9 and 10:
9 : sore-wa koto desu ne.

⁷The dialogue corpus is growing constantly. When we retrieved formal nouns, we had 113 keyboard conversations and 96 telephone conversations.
ACTES DE COLING-92, NANa~2S, 23-28 AOUT 1992 8 0 8 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
(It is a grave matter.) 
10 : mono-wa tashika-da.
(This stuff is trustworthy.) 
Considering examples 7-10, we could define two kinds of usage for formal nouns. This distinction is applicable to sentence analysis, but is meaningless from the standpoint of applying syntactic rules as constraints.
3.1.1 Formal Nouns in the Corpus 
In our dialogue corpus, koto, mono, hou and kata are the most frequently used formal nouns. Table 1 shows how often the formal nouns are used with a verbal modifier. We have also retrieved formal nouns used in the sentence-initial position, as in example 10.
Table 1: Formal Nouns (occurrences with a verbal modifier, without a verbal modifier, and in sentence-initial position, for keyboard and telephone conversations)
Table 1 indicates that the coverage reaches 63 % in written Japanese when we allow only formal nouns preceded by a verbal modifier in the syntactic rules. However, the coverage remains at 40 %, which is less than half, in the spoken Japanese we are dealing with.
We have further examined those sentences in which formal nouns are not modified by verbals. Most of them are modified by phrases consisting of a noun and the postposition no, which approximately corresponds to "of." Further, some are modified by phrases consisting of a verb followed by the postpositions to and no. Others are modified by words which can be used exclusively as nominal modifiers, such as donna (what kind of) and sono (that). We found only one example in the keyboard conversations in which a formal noun is not modified at all:
11 : osoraku kyouju-ni koto-no shidai-o tsutaeru
koto-ga ii-to omoimasu.
(It might be good if you tell the professor how the thing is going.)
In our dialogue corpus we found 2,491 phrases containing the formal nouns koto, mono, hou and kata. Out of the 2,491 examples, there is only one which is not modified at all. If we define formal nouns as those which are always modified in some manner, i.e. even if we do not allow formal nouns to be used alone, the coverage still exceeds 99 %. Since the occurrence rate of formal nouns without any modifier is very low, we can treat the usage of formal nouns as in examples 9-11 as semi-frozen expressions.
3.2 Conjunctive Postpositions 
Japanese postpositions such as ga, o and ni, which function as case markers, are usually attached to nominals. Different from this kind of postposition, conj-pps such as ga, te and ba are used after verbals. Conj-pps combine two clauses, functioning similarly to conjunctions such as "because" and "while," and are thus often used in the middle of a sentence, as in example 12. But they can also be used in the sentence-final position, as in example 13.
12: kaigi-ni mousikomi-tai-no-desu-ga,
        DAT
    mousikomiyousi-o ookurikudasai.
                  AKK
(Because I would like to apply for the conference,
please send me a registration form.)
13: kaigi-ni mousikomi-tai-no-desu-ga.
        DAT
(I would like to apply for the conference, ...)
Example 13 sounds vague if uttered in isolation. Some additional words should follow to express the complete meaning. Sentences finishing with a conj-pp leave the interpretation to the hearer. And, in general, the hearer can correctly interpret the sentence from the context. Understanding conj-pps therefore plays an important role in treating spoken Japanese.
3.2.1 Sentence Final Conj-pps in the Corpus 
In the dialogue corpus the following conj-pps are used: ga (because, while), node and nde (because), te and de (and), kara (because, after), keredomo, keredo, kedo and kedomo (though, but), shi (and, and then), monode (because), tara (if), to (if, when), ba (if) and nagara (while).
Table 2 shows conj-pps used sentence finally. 
According to Table 2, the conj-pp ga is the one most used in keyboard conversations. While the usage of conj-pps in keyboard conversations is heavily concentrated on ga, with an occurrence rate of 85 %, it is more balanced in telephone conversations. In addition to ga (38 %), keredomo (30 %) and conj-pps which carry a similar meaning, such as keredo, kedo and kedomo, are frequently used. In telephone conversations, node (13 %) is also frequently used. Treating only these six conj-pps in the sentence-final position, the coverage reaches 91 % for spoken Japanese. Differentiating conj-pps which can be used in the sentence-final position from those which can be used only in the middle of a sentence is also supported by the speech recognition results[14]. The conj-pps shi and te are especially subject to erroneous recognition.
Table 2: Sentence Final Conj-pps

            Keyboard          Telephone
Conj-pp     Frequency   %     Frequency   %
ga              197    85         274    38
node             11     5          96    13
nde               0     0           5     1
te                8     3          23     3
de                0     0           1     0
kara              6     3          14     2
keredomo          5     2         212    30
keredo            1     0          18     3
kedo              1     0          12     2
kedomo            0     0          37     5
shi               2     1          10     1
monode            1     0           0     0
tara              0     0           5     1
to                0     0           2     0
ba                0     0           2     0
nagara            0     0           1     0
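The percentage columns of Table 2 can be re-derived from the frequency columns by rounding to the nearest percent. This recomputation is our illustration; zero-frequency rows are omitted from the dictionaries.

```python
# Frequencies from Table 2 (zero-frequency conj-pps omitted).
KEYBOARD = {"ga": 197, "node": 11, "te": 8, "kara": 6, "keredomo": 5,
            "keredo": 1, "kedo": 1, "shi": 2, "monode": 1}
TELEPHONE = {"ga": 274, "node": 96, "nde": 5, "te": 23, "de": 1,
             "kara": 14, "keredomo": 212, "keredo": 18, "kedo": 12,
             "kedomo": 37, "shi": 10, "tara": 5, "to": 2, "ba": 2,
             "nagara": 1}

def rates(freqs):
    """Occurrence rate of each conj-pp, as a whole-number percent."""
    total = sum(freqs.values())
    return {w: round(100 * f / total) for w, f in freqs.items()}
```

Running `rates` on both columns reproduces the figures cited in the text (ga 85 % in keyboard conversations, ga 38 %, keredomo 30 % and node 13 % in telephone conversations).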
3.3 Syntactic Rules for Speech Recognition
Based on the corpus retrieval we decided to deal with formal nouns and conj-pps as described below. Also, we decided to treat numerals only in a restricted environment, because they are significant noise factors in speech recognition⁸:
• Phrases with formal nouns must be modified.
• Phrases with numerals can be used only in certain environments. Numerals are allowed in addresses, telephone numbers, dates and prices. Japanese numerals consist of an extremely small number of phonemes, e.g. ichi, ni, san (1, 2, 3), and are therefore especially easy to misrecognize⁹. Thus, they should be strongly constrained. The domain we have chosen is limited to dialogues between an international conference receptionist and prospective participants, and we are going to deal only with the anticipated usage in the domain. Another condition, such as playing cards, will be treated when speech recognition is further improved.
• We classify conj-pps into two groups: conj-pps which can be used in the sentence-final position as well as in the middle of a sentence, and conj-pps which can be used only in the middle of a sentence.
We refined the loosely constrained syntactic rules introduced in section 2.2. In the new version of the sentence rules, formal nouns, numerals and conj-pps are treated more precisely. In the following, we explain the rules for formal nouns and conj-pps.

⁸See Figure 2.
⁹Numbers greater than ten are in principle combinations of basic numbers.
The format for syntactic rules is as follows:
(<CAT1> <--> (<CAT2> <CAT3>))
Nonterminals are surrounded by <>¹⁰. The above rule indicates that CAT1 consists of CAT2 and CAT3. To make the distinction between phrase categories which are terminals in phrase-based sentence rules and those which are not, we will write the former all in lower-case.
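For illustration, this rule notation can be parsed mechanically. `parse_rule` is a hypothetical helper written for this sketch, not part of the described system.

```python
import re

def parse_rule(text):
    """Parse a rule in the notation above, e.g.
    '(<CAT1> <--> (<CAT2> <CAT3>))', into (lhs, rhs_symbols).
    Hypothetical helper for illustration."""
    # Drop the arrow first so its '<' and '>' are not mistaken for
    # category brackets, then collect the bracketed symbols in order.
    symbols = re.findall(r"<([^<>]+)>", text.replace("<-->", " "))
    return symbols[0], symbols[1:]
```
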
In the process of sentence construction, phrases containing a formal noun np-formal are treated as follows¹¹:
(<M-NN> <--> (<NN>))
(<M-NN> <--> (<MOD-N> <NN-FORM>))
(<M-NN> <--> (<MOD-N> <M-NN>))
(<NN> <--> (<np>))
(<NN-FORM> <--> (<np-formal>))
The above rules say that noun phrases M-NN can, in principle, be modified by some modifier MOD-N. In the case of a common noun NN, the phrase can be modified but need not be. But in the case of a formal noun NN-FORM, the phrase must be modified.
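A minimal top-down recognizer over these rules shows the intended effect: a bare formal noun is rejected while a modified one is accepted. This is our sketch, not the authors' implementation, and MOD-N is treated as a preterminal token here for simplicity.

```python
# The formal-noun rules above, written as a small CFG. Symbols with
# no rules (np, np-formal, MOD-N) are treated as preterminal tokens.
RULES = {
    "M-NN": [["NN"], ["MOD-N", "NN-FORM"], ["MOD-N", "M-NN"]],
    "NN": [["np"]],
    "NN-FORM": [["np-formal"]],
}

def derives(symbol, tokens):
    """True if `symbol` derives exactly `tokens` (naive top-down)."""
    if symbol not in RULES:
        return list(tokens) == [symbol]
    return any(matches(rhs, tokens) for rhs in RULES[symbol])

def matches(rhs, tokens):
    """True if the right-hand-side symbols jointly derive `tokens`."""
    if not rhs:
        return not tokens
    return any(derives(rhs[0], tokens[:i]) and matches(rhs[1:], tokens[i:])
               for i in range(len(tokens) + 1))
```
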
Phrases with a conj-pp which is exclusively used in the middle of a sentence, vaux-s; those with a conj-pp which is used both in the middle of a sentence and in the sentence-final position, vaux-s+f; and verb phrases without any conj-pp, vaux, are treated as follows:
(<SS> <--> (<NVS>))
(<NVS> <--> (<VS>))
(<VS> <--> (<VC>))
(<VS> <--> (<ADVPH> <VS>))
(<ADVPH> <--> (<ADV-s>))
(<ADV-s> <--> (<ADV1>))
(<ADV-s> <--> (<ADV1> <ADV-s>))
(<ADVPH> <--> (<ADV-c>))
(<ADV-c> <--> (<VADVS>))
(<VADVS> <--> (<VADV>))
(<VADVS> <--> (<ADV-s> <VADVS>))
(<VADV> <--> (<vaux-s>))
(<VADV> <--> (<vaux-s+f>))
(<VC> <--> (<vaux>))
(<VC> <--> (<vaux-s+f>))
A sentence SS does not always need a noun phrase. A sentence SS can consist of only one verb phrase VC, or can be preceded by adverbial phrases ADVPH. A sentence SS can end either with a verb phrase without a conj-pp, vaux, or with a verb phrase with a certain kind of conj-pp, vaux-s+f. An adverbial phrase ADVPH can consist of only adverbs ADV1, and can also consist of verbal phrases VADVS. The verbal phrases VADVS can contain any conj-pps, which means both vaux-s and vaux-s+f.

¹⁰For terminals we have a different notation. Terminals in phrase rules are phoneme strings, whose transcription is defined by the HMM-LR phoneme model.
¹¹For the sake of explanation, the rules are simplified.
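The two-way classification behind vaux-s and vaux-s+f can be sketched directly. The set membership follows the corpus study in section 3.2.1; the checking function itself is our illustration.

```python
# Conj-pps that may close a sentence (vaux-s+f); all other conj-pps
# (shi, te, tara, to, ba, nagara, ...) are treated as middle-only
# (vaux-s). Membership follows the corpus study in section 3.2.1.
FINAL_OK = {"ga", "node", "keredomo", "keredo", "kedo", "kedomo"}

def may_end_sentence(conj_pp):
    """May a verb phrase ending in this conj-pp be sentence-final?
    None stands for a plain verb phrase (vaux), which always may."""
    return conj_pp is None or conj_pp in FINAL_OK
```
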
Compared with the first version, which accepts approximately 80 % of the sentence candidates consisting of all the possible combinations of phrase candidates, the refined version accepts far fewer. Table 3 shows the size and perplexity of the phrase rules and phrase-based sentence rules.

Table 3: Size and Perplexity of Syntactic Rules

                    Phrase Rules    Sentence Rules
No. of Rules           1,973             471
No. of Terminals         744             133
Perplexity          …/Phoneme     99.7/Phrase

4 Validity of Rule Refinements
We tested the improvement in two ways: speech recognition accuracy and the acceptance rate[12]. To estimate the latter, we checked how many sentence candidates were filtered out by applying phrase-based sentence rules as a post-filter. We verified the rule refinements through comparison of results gained by five different rule sets: the refined version of the sentence rules, which contains all three refinements (New Grammar); the refined version without conj-pp treatment (No Sentence Final Conj-pp), without formal noun treatment (No Formal Noun Treating), and without numeral treatment (No Numeral Treating); and rules which allow all combinations of phrase candidates (No Grammar). For the first four of these rule sets we determined ranks based on the probabilities of the phoneme strings predicted by the syntactic rules. But in the No Grammar case we determined the rank solely based on phoneme probability. We experimented with the same 353 phrases which make up 137 sentences as in section 2.2. The phrase recognition rate for the top 5 candidates was again 95.5 % by exact string matching.
4.1 Speech Recognition Accuracy 
We conducted speech recognition experiments. Figure 1 shows the constraint effectiveness of the phrase-based sentence rules under the five conditions examined. These five conditions are compared in the graph, based on their abilities to correctly recognize the spoken sentences among the top-ranked 20 candidates.
While the sentence recognition rate for the top candidates remains 37.2 % when probability is the only factor in determining the candidates, the recognition rate rises to 70.1 % when the refined syntactic rules are applied as constraints. Differentiating conj-pps is highly effective. Without this treatment, the recognition rate remains at 48.2 %. Formal noun and numeral treatments are not as effective. Figure 1 indicates that the effect of each syntactic constraint is especially distinct up to rank 5, and that the recognition rates saturate when we take into account sentence candidates up to rank 10.

Figure 1: Comparison of Recognition Rates (recognition rate vs. number of candidates for the five rule sets)
4.2 Acceptance Rate 
We also verified the validity of the sentence rules through the acceptance rate. We examined how many sentence candidates were filtered out. Table 4 shows the frequencies of sentences consisting of different numbers of phrases in our test corpus.

Table 4: Phrase Number and Frequency (number of sentences in the test corpus per phrase count)
Figure 2 shows the acceptance rates when applying the four different syntactic rule sets. When applying rules which allow all combinations of phrase candidates, the acceptance rate remains 100 %.
The effect of constraints is especially clear for sentences with a small number of phrases. For sentences with one phrase, the acceptance rate for the revised version is 41 %, and for the version without conj-pp constraints 70 %. In comparison with Figure 1, treating numerals contributes toward filtering out sentence candidates rather than raising speech recognition accuracy. Independent of the constraint strength, the more phrases there are in a sentence, the more effectively the rules work. The value for a sentence with 8 phrases is unreliable, as we have only one example.
Figure 2: Acceptance Rate (acceptance rate vs. number of phrases in one sentence, for New Grammar, No Sentence Final Conj-pp, No Formal Noun Treating and No Numeral Treating)
5 Conclusion 
We have described phrase-based syntactic rules which are used as constraints in the Japanese speech recognition module of our experimental speech-to-speech translation system. For constructing the rules we took into account the error tendencies in speech recognition. We treated precisely those syntactic categories which tend to be recognized erroneously. To increase the efficacy of each rule, the rule construction is strongly motivated by our dialogue corpus. By applying the refined phrase-based syntactic rules, the speech recognition rate for the top candidates improved from 37.2 % to 70.1 %, and for the top 5 candidates from 73.7 % to 83.9 %.
The implementation of syntactic rules based on our dialogue corpus is continuing in order to increase coverage. Currently we are studying postposition deletion in nominal phrases, which is one of the features of spoken Japanese. When adding rules and enlarging the vocabulary, we cannot avoid decreasing speech recognition accuracy, but our further experiments showed that careful rule construction filtered out unacceptable sentence candidates much more effectively. Though we believe that our dialogue corpus for the current domain provides enough expressions of spoken Japanese, we are going to apply the same method to other domains to establish the generality of the rules.
Acknowledgements 
The authors wish to thank Dr. A. Kurematsu, President of ATR Interpreting Telephony Research Labs, for his continued support, Mr. T. Morimoto for discussion at the various stages of this work, Mr. K. Inoue for his help in database retrieval, and Dr. S. Luperfoy and Dr. L. Fais for reading an earlier draft.
References 
[1] Morimoto, T., Shikano, K., Iida, H., Kurematsu, A. (1990): "Integration of Speech Recognition and Language Processing in Spoken Language Translation System (SL-TRANS)," Proc. of ICSLP-90, pp. 921-924.
[2] Lee, K.-F. and Hon, H.-W. (1988): "Large-Vocabulary Speaker-Independent Continuous Speech Recognition Using HMM," Proc. of ICASSP-88, pp. 123-126.
[3] Ney, H. (1987): "Dynamic Programming Speech Recognition Using a Context-Free Grammar," Proc. of ICASSP-87, pp. 69-72.
[4] Matsunaga, S., Sagayama, S., Homma, S. and Furui, S. (1990): "A Continuous Speech Recognition System Based on a Two-Level Grammar Approach," Proc. of ICASSP-90, pp. 589-592.
[5] Yamaoka, T. and Iida, H. (1990): "A Method to Predict the Next Utterance Using a Four-layered Plan Recognition Model," Proc. of ECAI-90, pp. 726-731.
[6] Kuno, S. (1973): The Structure of the Japanese Language, The MIT Press, Cambridge, Massachusetts and London.
[7] Fujisaki, T. (1984): "A Stochastic Approach to Sentence Parsing," Proc. of COLING-84, pp. 16-19.
[8] Ferretti, M., Maltese, G., Scarci, S. (1990): "Measuring Information Provided by Language Model and Acoustic Model in Probabilistic Speech Recognition: Theory and Experimental Results," Speech Communication 9, pp. 531-539.
[9] Ehara, T., Ogura, K., Morimoto, T. (1990): "ATR Dialogue Database," Proc. of ICSLP-90, pp. 1093-1096.
[10] Kita, K., Kawabata, T., Saito, H. (1989): "HMM Continuous Speech Recognition Using Predictive LR Parsing," Proc. of ICASSP-89, pp. 703-706.
[11] Hosaka, J., Ogura, K., Kogure, K. (1990): "Word Sequence Constraints for Japanese Speech Recognition," Proc. of ECAI-90, pp. 363-365.
[12] Takezawa, T., Kita, K., Hosaka, J., Morimoto, T. (1991): "Linguistic Constraints for Continuous Speech Recognition in Goal-Directed Dialogue," Proc. of ICASSP-91, pp. 801-804.
[13] Ogawa, Y., Hayashi, H., et al. (1982, 1988): Nihongo Kyouiku Jiten, Taishuukan, Tokyo. (In Japanese.)
[14] Hosaka, J., Takezawa, T., Ehara, T. (1991): "Utilizing Empirical Data for Postposition Classification toward Spoken Japanese Speech Recognition," Proc. of ESCA-91, pp. 573-576.
