COUPLING AN AUTOMATIC DICTATION SYSTEM WITH A GRAMMAR CHECKER 
Jean-Pierre CHANOD, Marc EL-BEZE, Sylvie GUILLEMIN-LANNE
IBM France, Paris Scientific Center 
Automatic dictation systems (ADS) are
nowadays powerful and reliable. However,
some inadequacies of the underlying
models still cause errors. In this paper, we
are essentially interested in the language
model implemented in the linguistic
component, and we leave aside the acoustic
module. More precisely, we aim at
improving this linguistic model by coupling
the ADS with a syntactic parser able to
diagnose and correct grammatical errors.
We describe the characteristics of such a
coupling, and show how the performance of
the ADS improves with the actual coupling
realized for French between the Tangora
ADS and the grammar checker developed at
the IBM France Scientific Center.
Description of the Tangora system
The Tangora system is implemented on an
IBM PS/2 personal computer or an IBM
RS/6000. A vocal I/O card is added, as well
as a specialized card equipped with two
micro-processors, which provide the
power needed by the decoding algorithms. The
programs are written in assembly or C.
The multi-lingual aspect of the Tangora
system (DeGennaro, 91) constitutes a major
asset. Indeed, it was initially conceived for
English (Averbuch, 87) by the F. Jelinek
team (IBM T. J. Watson Research Center),
but it has since been adapted to process Italian,
German and French inputs. As a whole, the
average error rate is close to 5%. But
problems specific to each language require
adapted solutions.
The user is required to train the system by
uttering 100 sentences during an enrollment
phase, and to leave a slight pause
between words. For the French system,
liaisons are currently prohibited.
Architecture of the system 
The voice signal is submitted to a chain of
signal processing steps, in order to extract
acoustic parameters from the sound wave.
The data flow is thus reduced from 30,000
to 100 bytes per second. Two passes of
acoustic evaluation are performed: a
relatively coarse pass (so-called Fast Match)
selects a first list of candidate words
(around 500 words); this list is further
reduced thanks to the language model (see
below), so that only a small number of
remaining candidates are submitted to a
second, more precise, acoustic pass (so-called
Detailed Match). Storage constraints
as well as the methods used to provide the
language model explain why the size of the
dictionary is limited to about 20,000 entries.
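The three-stage reduction described above can be pictured with the following minimal sketch. It is not the Tangora implementation: the function names, the score functions and the `lm_keep` value are hypothetical (the paper only says "a small number"), and real scores would come from the acoustic and language models.

```python
from typing import Callable, Dict, Iterable, List


def prune(scores: Dict[str, float], keep: int) -> List[str]:
    """Return the `keep` best-scoring words (higher score = better)."""
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def cascade(vocabulary: Iterable[str],
            fast_score: Callable[[str], float],
            lm_score: Callable[[str], float],
            detailed_score: Callable[[str], float],
            fast_keep: int = 500,   # "around 500 words" after the Fast Match
            lm_keep: int = 20       # hypothetical size of the short list
            ) -> str:
    """Pick one word through a coarse pass, an LM filter and a precise pass."""
    # Pass 1: cheap acoustic score over the whole vocabulary.
    fast_list = prune({w: fast_score(w) for w in vocabulary}, fast_keep)
    # Filter: the language model shortens the candidate list.
    short_list = prune({w: lm_score(w) for w in fast_list}, lm_keep)
    # Pass 2: the expensive acoustic score is computed only for the survivors.
    return max(short_list, key=detailed_score)
```

The point of the design is cost: the expensive score is evaluated on `lm_keep` words only. The price is that a word discarded by an early stage can never be recovered later, even if the detailed pass would have favoured it.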
The decoding algorithm 
This algorithm determines the most likely
uttered sequence of words. It works from
left to right by combining the various scores
estimated by the acoustic and linguistic
models, according to a so-called stack
decoding strategy. At this stage, the
elementary operation consists in expanding
the best existing hypothesis which is not yet
expanded, i.e. it consists in keeping the
sentence segment which, followed by the
contemplated current word, is rated with the
highest likelihood.
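The elementary operation (expand the best hypothesis not yet expanded) can be sketched as a best-first search over a priority queue. This is a simplified illustration under stated assumptions, not the Tangora decoder: one candidate list per word position is assumed, `candidates` and `lm_prob` are hypothetical names, and the score normalization that a real stack decoder needs to compare hypotheses of different lengths is left out.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple


def stack_decode(candidates: Sequence[Dict[str, float]],
                 lm_prob: Callable[[Tuple[str, ...], str], float]) -> List[str]:
    """Best-first decoding.

    candidates[i] maps each candidate word for position i to its acoustic
    probability; lm_prob(history, word) is the language-model probability.
    """
    # Each entry is (cost, words) with cost = -log(probability so far).
    heap: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    while heap:
        cost, words = heapq.heappop(heap)      # best hypothesis not yet expanded
        if len(words) == len(candidates):      # complete sentence: done
            return list(words)
        for word, acoustic_p in candidates[len(words)].items():
            p = acoustic_p * lm_prob(words, word)
            if p > 0.0:                        # drop impossible extensions
                heapq.heappush(heap, (cost - math.log(p), words + (word,)))
    return []
```

Because every extension multiplies the score by a probability of at most 1, costs never decrease along a path, so the first complete hypothesis popped from the heap is the most likely one in this simplified setting.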
Methods 
If one formulates the problem of speech
recognition according to an information
theory approach, one naturally chooses
probabilistic models among all available
language models (Jelinek, 76). The trigram
(Cerf, 90), triPOS¹ (Derouault, 84), or
trilemma (Derouault, 90) models offer ways
of estimating the probability of any
sequence of words. For instance, the trigram
model gives the probability of a sequence of n words as:
$$P(W_1^n) = P(w_1) \times P(w_2 \mid w_1) \times \prod_{i=3}^{n} P(w_i \mid w_{i-2}, w_{i-1})$$
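As an illustration of the trigram factorization, the sketch below estimates each factor by relative frequency from a toy corpus and multiplies them. It is an unsmoothed maximum-likelihood estimate written for this example; the class and method names are hypothetical, and a production model would smooth the counts so that unseen triplets do not get probability zero.

```python
from collections import Counter
from typing import Iterable, Sequence


class TrigramModel:
    """Unsmoothed maximum-likelihood trigram model (toy illustration)."""

    def __init__(self, sentences: Iterable[Sequence[str]]) -> None:
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.trigrams = Counter()
        self.bigram_histories = Counter()    # how often w1 is followed by a word
        self.trigram_histories = Counter()   # how often (w1, w2) is followed by a word
        self.total = 0
        for sentence in sentences:
            s = list(sentence)
            self.total += len(s)
            self.unigrams.update(s)
            for pair in zip(s, s[1:]):
                self.bigrams[pair] += 1
                self.bigram_histories[pair[0]] += 1
            for triple in zip(s, s[1:], s[2:]):
                self.trigrams[triple] += 1
                self.trigram_histories[triple[:2]] += 1

    def probability(self, words: Sequence[str]) -> float:
        """P(w1) * P(w2 | w1) * product over i >= 3 of P(wi | wi-2, wi-1)."""
        p = 1.0
        for i, w in enumerate(words):
            if i == 0:
                num, den = self.unigrams[w], self.total
            elif i == 1:
                num, den = self.bigrams[(words[0], w)], self.bigram_histories[words[0]]
            else:
                history = (words[i - 2], words[i - 1])
                num, den = self.trigrams[history + (w,)], self.trigram_histories[history]
            if num == 0:          # unseen event: the unsmoothed estimate is zero
                return 0.0
            p *= num / den
        return p
```

With the three-sentence corpus "le chat dort", "le chat mange", "le chien dort", the sequence "le chat dort" receives (3/9) × (2/3) × (1/2) = 1/9.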
The analysis of decoding errors shows that
half of them are due to the acoustic model,
the other half being associated with the
1 Model based on triplets of parts of speech (POS).
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 940 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
language model. Actually, the number of
homophones being quite high (2.6) in an
inflected language such as French, it is
clear that no acoustic model, as perfect as it
may be, can produce a satisfactory
decoding without the support of a language
model.
Power and limitations of probabilistic 
language models 
Probabilistic language models are powerful
enough to considerably reduce ambiguities
that the acoustic model alone cannot solve.
However, they suffer from local imperfections
that are inherent in their formulation.
This is clearly shown by testing a
probabilistic model on the lattice formed by
the set of the homophones of the words of
every sentence. The decoding obtained by
searching for the maximum likelihood path
(Cerf, 91) gives an error rate close to 3%,
thus showing some of the inadequacies of
the probabilistic language models.
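Searching a homophone lattice for its maximum likelihood path can be sketched with a Viterbi-style dynamic program. For brevity this sketch scores paths with a bigram model rather than the trigram, triPOS or trilemma models of the paper; `best_path`, `bigram_p` and the `"<s>"` start symbol are hypothetical names, and `bigram_p` is assumed to return strictly positive (for instance floored) probabilities.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def best_path(lattice: Sequence[Sequence[str]],
              bigram_p: Callable[[str, str], float]) -> List[str]:
    """Return the most likely word sequence through a homophone lattice.

    lattice[i] lists the homophones competing at position i.
    bigram_p(prev, word) must be > 0 for every pair it is asked about.
    """
    if not lattice:
        return []
    # best[w] = (log-probability of the best path ending in w, that path)
    best: Dict[str, Tuple[float, List[str]]] = {
        w: (math.log(bigram_p("<s>", w)), [w]) for w in lattice[0]
    }
    for slot in lattice[1:]:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for w in slot:
            # Keep only the best way of reaching w from the previous slot.
            score, path = max(
                ((s + math.log(bigram_p(prev, w)), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            new_best[w] = (score, path + [w])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]
```

Whatever the model order, the search only sees a fixed window of preceding words, which is why some homophone choices remain wrong even when the acoustic evidence is taken out of the picture, as in the lattice test described above.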
Besides, and again for reliability reasons,
statistics need to be gathered from large
learning corpora (tens or even hundreds of
millions of words). In spite of all the
preliminary cleaning that may be done
(automatic correction of typos, tripled
consonants for instance), such a huge
corpus contains a certain number of
grammatical errors, which introduce noise in
the model.
Probabilistic estimations are produced by
counting triplets of words or grammatical
classes. In any of the trigram, triPOS or
trilemma models, a word is generally
predicted according to the two preceding
words, classes or lemmas only. However,
grammatical rules may apply to larger
frames. Not only do the rules often apply to
words located outside the window used by
the probabilistic model, but the
grammatically significant words may also be
found either before or after the word to be
predicted. Let us mention, as illustrations,
some phenomena which the probabilistic
model does not fit:
• Adverbs and complements constitute an
obstacle to the transfer of information on
gender, number and person, while this
information is needed to choose
between different homophones, as in:
La COMMISSION chargée d'établir un
plan de soutien global aux populations
des territoires occupés s'est RÉUNIE
dimanche.
• Appositions and interpolated clauses
increase the distance between elements
which must agree:
Plusieurs PARTIS d'opposition de
gauche, notamment le parti communiste,
PARTAGENT ce point de vue.
• Predicting a word from the
preceding words does not allow the
system to appropriately control person
agreement when the subject follows the
verb. Example:
Que sont DEVENUS les principaux
PROTAGONISTES de la victoire du
onze novembre ?
• Moreover, some confusions due to
homophony induce changes of
grammatical category, which require a
complete interpretation of the sentence
to be properly diagnosed, as in "et"/"est"
(conjunction/verb) or "à"/"a"
(preposition/verb).
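The window argument behind these examples can be checked mechanically. The sketch below measures how far the word that controls agreement is from the word that must agree, and tests whether it falls among the n-1 preceding words that an n-gram model conditions on. Whitespace tokenization and the helper names are assumptions made for this illustration.

```python
from typing import List


def agreement_gap(tokens: List[str], controller: str, target: str) -> int:
    """Signed distance, in tokens, from the controlling word to the agreeing word.

    A negative value means the controller comes after the word to predict.
    """
    return tokens.index(target) - tokens.index(controller)


def in_ngram_window(tokens: List[str], controller: str, target: str, n: int = 3) -> bool:
    """True if `controller` is one of the n-1 words preceding `target`."""
    return 1 <= agreement_gap(tokens, controller, target) <= n - 1


first = ("La commission chargée d'établir un plan de soutien global aux "
         "populations des territoires occupés s'est réunie dimanche").split()
third = "Que sont devenus les principaux protagonistes de la victoire".split()

print(agreement_gap(first, "commission", "réunie"))        # subject far before the participle
print(in_ngram_window(first, "commission", "réunie"))
print(in_ngram_window(third, "protagonistes", "devenus"))  # subject follows the verb
```

In the first example the subject lies fourteen tokens before the participle, and in the third it comes after it, so in both cases a trigram model predicts the agreeing form without ever seeing the word that determines it.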
Coupling the ADS with the 
grammar checker 
To solve the problems
described above, we propose to perform a
grammatical analysis after the decoding
operation. The grammatical analysis applies
to the best of the hypotheses selected by
the ADS. It serves as a basis to diagnose
grammatical errors and to suggest
corrections².
The syntactic parser must prove powerful
and reliable enough to effectively improve
the performance of the ADS. It must provide
a broad coverage, in order to cope with a
large variety of texts, the source and the
domain of which are not known in advance.
It must also compute a global analysis of
the sentence in order to make up for the deficiencies
of the probabilistic model.
Description of the syntactic parser 
The syntactic parser we use meets the
requirements described above (Chanod, 91).
It is in fact designed to provide the global
syntactic analysis of extremely diversified
texts.
It is based on an original linguistic strategy
developed by Karen Jensen for US English
(Heidorn, 82; Jensen, 86). The parser initially
2 A similar approach was tested in English, but only to detect grammatically incorrect sentences (Bellegarda, 92).
computes a syntactic sketch, which
represents the likeliest syntactic surface
structure of the sentence; at this stage, such
phenomena as coordinations, ellipses and
interpolated clauses, if not totally resolved,
do not block the parsing. The analysis is
based on the so-called relaxed approach,
which consists in rejecting linguistic
constraints which, as pertinent as they may
be in descriptive linguistics, are rarely
satisfied stricto sensu in the surface
structures of free texts. This strategy
broadens the coverage of the grammar and
also allows the parser to deal with
erroneous texts.
Architecture of the parser
The system is written in PLNLP
(Programming Language for Natural
Language Processing, G. Heidorn, 72). It
includes:
• A morphological dictionary (50,000
lemmas plus their inflection tables)³,
• A morpho-syntactic dictionary, which
describes the sub-categorizations
attached to each lemma,
• A set of more than 300 PLNLP production
rules, which produce the syntactic
sketches,
• A set of procedures built to re-interpret
the syntactic sketches and to diagnose
errors,
• A form generator, which provides
corrected forms.
Some other techniques are also
used. Strong syntactic constraints are
relaxed during a second pass; this allows the
system to detect errors which induce major
syntactic changes (for instance the confusion
"et/est"), while forbidding undesired or too
numerous parses. Fitted parses are
computed in case the global analysis fails
(Jensen, 83) and multiple parses are ranked
thanks to specific procedures (Heidorn, 76).
This last point allows the system to
automatically select the strongest
hypothesis, according to the linguistic
features (including the grammar errors) of
the syntactic trees.
Adaptation of the parser to the ADS 
As mentioned above, many grammatical
errors in written French are actually caused
by homophones (gender, number
agreement, confusion between infinitive and
past participle, "chantez/chanter", "et/est",
etc.). The parser, initially built for written
French, is thus well prepared to detect
errors produced by an ADS.
It can however be adapted to the specific 
needs of the ADS, by adding specific 
procedures (detection of ill-recognized 
frozen phrases, etc.), and by filtering out 
non-homophonic corrections, or corrections 
which do not belong to the list of candidates 
initially proposed by the ADS. 
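The two filters mentioned above can be sketched as follows. The `Correction` record, the homophone table and the per-position candidate sets are hypothetical data structures chosen for the illustration; the paper does not describe how the actual system represents them, nor whether both filters are always applied together.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class Correction(NamedTuple):
    position: int      # index of the word in the decoded sentence
    original: str      # word produced by the ADS
    suggestion: str    # replacement proposed by the grammar checker


def filter_corrections(corrections: Iterable[Correction],
                       homophones: Dict[str, Set[str]],
                       candidates: Optional[Dict[int, Set[str]]] = None
                       ) -> List[Correction]:
    """Keep only the corrections that the decoder could plausibly have meant."""
    kept = []
    for c in corrections:
        # Filter 1: the suggestion must sound like the decoded word.
        if c.suggestion not in homophones.get(c.original, set()):
            continue
        # Filter 2 (optional): the suggestion must be one of the words the
        # ADS itself proposed at that position.
        if candidates is not None and c.suggestion not in candidates.get(c.position, set()):
            continue
        kept.append(c)
    return kept
```

A correction such as "durs" → "dures" passes the first filter, whereas a purely stylistic rewrite of a word into a non-homophone is discarded, since the speaker cannot have uttered it.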
Indeed, post-processing procedures are
widely used to diagnose errors after the
syntactic tree has been computed. This
offers the immense advantage of making the
system easy to evolve: it can be easily
modified in order to improve the scope of
the detections. This made the adaptation of
the grammar checker to the ADS quite
straightforward.
Description of the processing chain 
In the case of the ADS, the coupling is done by
a simple call to the parser for each
sentence. In the case of the homophone scheme,
the processing chain is
shown in the following figure:
3 These 50,000 lemmas produce about 350,000 inflected forms, which largely exceeds the 20,000 forms used by the Tangora system.
Figure 1. Coupling Diagram 
Experiments
Our tests were carried out on the following
texts:
corp1   AFP dispatches (1000 words)
corp2   AFP dispatches (3221 words)
corp3   e-mail notes (1909 words)
corp4   grammar books (1337 words)
Only the CORP1 file was obtained through a 
real decoding; the other corpora were 
processed by automatically generating their 
homophones. 
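Automatically generating the homophones of a text amounts to replacing each word by the set of forms that share its pronunciation. The sketch below assumes a pronunciation lexicon mapping each written form to a phonetic string; the function name, the lexicon and its phonetic codes are invented for the illustration and are not taken from the Tangora dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Set


def homophone_lattice(words: List[str], pronunciations: Dict[str, str]) -> List[List[str]]:
    """For each word, list every lexicon form with the same pronunciation.

    Words missing from the lexicon are kept as their only alternative.
    """
    # Index the lexicon by pronunciation.
    by_sound: Dict[str, Set[str]] = defaultdict(set)
    for form, phones in pronunciations.items():
        by_sound[phones].add(form)

    lattice = []
    for w in words:
        phones = pronunciations.get(w)
        alternatives = by_sound[phones] if phones is not None else set()
        lattice.append(sorted(alternatives | {w}))
    return lattice


# Toy lexicon with made-up phonetic codes.
toy_lexicon = {"et": "E", "est": "E", "a": "A", "à": "A",
               "signer": "siNE", "signé": "siNE", "signez": "siNE",
               "document": "dokymA"}
print(homophone_lattice(["document", "est", "a", "faire", "signé"], toy_lexicon))
```

The resulting lattice is what the language model, and then the parser, have to disambiguate; since the reference text is known, the words left uncorrected can be counted directly.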
Results 
The experiments were made at an early
stage of the coupling. They could certainly
be improved with more extensive tests, as
the adaptation of the grammar checker to
the ADS would gain in accuracy.
Percentage of erroneous words left uncorrected

          LM without parser    LM with parser
corp1     4.5%                 3.6%
corp2     4.6%                 3.6%
corp3     6.3%                 6.1%⁴
corp4     7%                   5.8%
Given the high performance of the ADS and
the difficulty of improving it within the framework of the
probabilistic model, the improvement of
around one percentage point observed on three of the test
corpora is very promising.
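To put the gain in perspective, the absolute and relative reductions can be recomputed from the error rates reported above. The figures are those of the results table; the variable and function names are only illustrative.

```python
from typing import Dict, Tuple

# (error rate without parser, error rate with parser), in percent.
RESULTS: Dict[str, Tuple[float, float]] = {
    "corp1": (4.5, 3.6),
    "corp2": (4.6, 3.6),
    "corp3": (6.3, 6.1),
    "corp4": (7.0, 5.8),
}


def reduction(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for corpus, (before, after) in RESULTS.items():
    absolute, relative = reduction(before, after)
    print(f"{corpus}: -{absolute} points, {relative}% fewer errors")
```

In relative terms, the parser removes about 20%, 22%, 3% and 17% of the remaining errors on corp1 to corp4 respectively.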
Samples of corrected sentences:
Example 1: Subject-predicate, attributive
adjective-noun, subject-verb agreement
Les conditions sont très durs mais le pays,
devenus indéfendable, les acceptent.
After parsing, the suggested correction is:
Les conditions sont très DURES mais le
pays, DEVENU indéfendable, les ACCEPTE.
Example 2: Subject-verb agreement; confusion
between the conjunction "et" and the verbal form
"est":
Le fait que le héros de chacun des trois
romans soient différents et révélateurs.
After parsing, the suggested correction is:
Le fait que le héros de chacun des trois
romans SOIT DIFFÉRENT EST
RÉVÉLATEUR.
Example 3: Confusion between the verbal form
"a" and the preposition "à"; confusion between
the past participle and the infinitive form of the
corresponding verb.
Ce document est a faire signé recto et
verso par le propriétaire et par le
gestionnaire.
After parsing, the suggested correction is:
Ce document est à faire SIGNER recto et
verso par le propriétaire et par le
gestionnaire.
Conclusion 
Coupling the ADS and the syntactic parser
meets the initially assigned objectives quite
satisfactorily: broad coverage of the texts
parsed by the grammar, meaningful
percentage of justified corrections,
adequacy of the syntactic parser to the
types of errors specifically generated by the
decoder.
The tests that we performed on various 
corpora are all the more encouraging, since 
a great deal of the remaining errors result 
from semantic ambiguities that no grammar 
checker based upon a syntactic analysis of 
the sentence can detect. 
4 The poor results on the CORP3 file are due in large part to the difficulties of e-mail text, which make parsing less
accurate.
Example of such a semantic ambiguity ("mer" decoded instead of its homophone "mère"):
L'âge de la MER le plus fréquent à
l'accouchement est de vingt-six ans.
A subsidiary advantage of the coupling
would be to detect errors that would not be
produced by the ADS but by the speaker
him/herself (punctuation, stylistic infelicities,
mood of subordinate clauses, etc.). We may
thus contemplate not only transcribing the
words of a speaker as accurately as
possible, but also offering him/her a stylistic
aid.
References 
Averbuch A. et al., 1987: "Experiments with
the TANGORA 20,000 word Speech
Recognizer", Proceedings of ICASSP, Dallas,
pp. 701-704.
Bellegarda J., Braden-Harder L., Jensen K.,
Kanevsky D., Zadrozny W., 1992:
"Post-recognizer language processing:
applications to speech, handwriting", submitted to
EUSIPCO'92.
Cerf-Danon H., de La Noue P., Diringer L.,
El-Bèze M., Marcadet J.C., 1990: "A 20,000
words, automatic speech recognizer.
Adaptation to French of the US TANGORA
system", Nato 1990.
Cerf-Danon H., El-Bèze M., 1991: "Three
different Probabilistic Language Models:
Comparison and Combination", ICASSP
1991.
Chanod J-P., 1991: "Analyse automatique
d'erreurs: stratégie linguistique et
computationnelle", Colloque Informatique et
Langue naturelle, 23-24 janvier 91, LIANA,
Univ. de Nantes.
DeGennaro S., Cerf-Danon H., Ferretti M.,
Gonzales J., Keppel E., 1991: "Tangora - a
large vocabulary speech recognition system
for five languages", EuroSpeech 1991,
Genoa.
Derouault A-M., Mérialdo B., 1984:
"Language modeling at the syntactic level",
7th International Conference on Pattern
Recognition, August 1984, Montreal.
Derouault A-M., El-Bèze M., 1990: "A
Morphological Model for Large Vocabulary
Speech Recognition", ICASSP 1990.
Heidorn G.E., 1972: "Natural Language Inputs
to a Simulation Programming System", Ph.D.
dissertation, Yale University.
Heidorn G.E., Jensen K., Miller L.A., Byrd
R.J., Chodorow M.S., 1982: "The EPISTLE
Text-Critiquing System", IBM Systems Journal,
vol. 21, n°3.
Heidorn G.E., 1976: "An Easily Computed
Metric for Ranking Alternative Parses",
presented at the Fourteenth Annual Meeting
of the ACL, San Francisco, October 1976.
Jelinek F., 1976: "Continuous Speech
Recognition by Statistical Methods",
Proceedings of the IEEE, Vol. 64, April 1976.
Jensen K., Heidorn G.E., 1983: "The Fitted
Parse: 100% Parsing Capability in a
Syntactic Grammar of English", Proc. Conf.
on Applied Natural Language Processing,
Santa Monica, California, pp. 93-98.
Jensen K., 1986: "A Broad-Coverage
Computational Syntax of English",
Unpublished documents, IBM T.J. Watson
Research Center, Yorktown Heights, N.Y.
