Compiling Language Models from a Linguistically Motivated 
Unification Grammar 
Manny Rayner t>, Beth Ann Hockey t, Frankie James t 
Elizabeth Owen Bratt ++, Sharon Goldwater ++ and Jean Mark Gawron ~ 
tResea.rch Institute for 
Advanced Computer Science 
Mail Stop 19-39 
NASA Ames Research Center 
Moffett Field, CA 94035-1000 
Abstract 
Systems now exist which are able to con:pile 
unification gralmnars into  models that 
can be included in a speech recognizer, but it 
is so far unclear whether non-trivial linguisti- 
cally principled gralnlnars can be used for this 
purpose. We describe a series of experiments 
which investigate the question empirica.lly, by 
incrementally constructing a grammar and dis- 
covering what prot)lems emerge when succes- 
sively larger versions are compiled into finite 
state graph representations and used as lan- 
guage models for a medium-vocabulary recog- 
nition task. 
1 Introduction ~ 
Construction of speech recognizers for n:ediuln- 
vocabulary dialogue tasks has now becolne an 
important I)ractical problem. The central task 
is usually building a suitable  model, 
and a number of standard methodologies have 
become established. Broadly speaking, these 
fall into two main classes. One approach is 
to obtain or create a domain corpus, and froln 
it induce a statistical  model, usually 
some kind of N-gram grammar; the alternative 
is to manually design a grammar which specifies 
the utterances the recognizer will accept. There 
are many theoretical reasons to prefer the first 
course if it is feasible, but in practice there is of- 
ten no choice. Unless a substantial domain cor- 
pus is available, the only method that stands a 
chance of working is hand-construction of an ex- 
i The majority of the research reported was performed 
at I{IACS under NASA Cooperative Agreement~ Number 
NCC 2-1006. The research described in Section 3 was 
supported by the Defense Advanced Research Projects 
Agency under Con~racl~ N66001-94 C-6046 with the 
Naval Command, Control, and Ocean Surveillance Cen- 
ter. 
SRI International 
333 Ravenswood Ave 
Menlo Park, CA 94025 
*netdecisions 
Wellington House 
East Road 
Cambridge CB1 1BH 
England 
plicit grammar based on the grammar-writer's 
intuitions. 
If the application is simple enough, experi- 
ence shows that good grammars of this kind 
can be constructed quickly and efficiently using 
commercially available products like ViaVoice 
SDK (IBM 1999) or the Nuance Toolkit (Nu- 
ance 1999). Systems of this kind typically al- 
low specification of some restricted subset of the 
class of context-free grammars, together with 
annotations that permit the grammar-writer to 
associate selnantic values with lexical entries 
and rules. This kind of framework is fl:lly ad- 
equate for small grammars. As the gran:mars 
increase in size, however, the limited expres- 
sive power of context-free  notation be- 
conies increasingly burdensome. The grainn:a,r 
tends to beconie large and unwieldy, with many 
rules appearing in multiple versions that con- 
stantly need to be kept in step with each other. 
It represents a large developn:ent cost, is hard 
to maintain, and does not usually port well to 
new applications. 
It is tempting to consider the option of mov- 
ing towards a :::ore expressive grammar tbrmal- 
isln, like unification gramnm.r, writing the orig- 
inal grammar in unification grammar form and 
coml)iling it down to the context-free notation 
required by the underlying toolkit. At least 
one such system (Gemilfi; (Moore ct al 1997)) 
has been implemented and used to build suc- 
cessful and non-trivial applications, most no- 
tably ComnmndTalk (Stent ct al 1999). Gem- 
ini accepts a slightly constrained version of the 
unification grammar formalism originally used 
in the Core Language Engine (Alshawi 1992), 
and compiles it into context-free gran:nmrs in 
the GSL formalism supported by the Nuance 
Toolkit. The Nuance Toolkit con:piles GSL 
gran:mars into sets of probabilistic finite state 
670 
gra.phs (PFSGs), which form the final bmguage 
model. 
The relative success of the Gemilfi system 
suggests a new question. Ulfification grammars 
ha.re been used many times to build substantial 
general gramlnars tbr English and other na.tu- 
ra\[ s, but the  model oriented 
gra.mln~rs so far developed fi)r Gemini (includ- 
ing the one for ColnmandTalk) have a.ll been 
domain-sl)ecific. One naturally wonders how 
feasible it is to take yet another step in the di- 
rection of increased genera.lity; roughly, what 
we want to do is start with a completely gen- 
eral, linguistically motivated gramma.r, combine 
it with a domain-specific lexicon, and compile 
the result down to a domain-specitic context- 
free grammar that can be used as a la.nguage 
model. If this 1)tetra.mine can be rea.lized, it is 
easy to believe that the result would 1)e a.n ex- 
tremely useful methodology tbr rapid construc- 
tion of la.nguage models. It is ilnportant to note 
tha.t there are no obvious theoretical obstacles 
in our way. The clailn that English is context- 
free has been respectable since a.t least the early 
8(Is (Pullum and Gazda.r 1982) 'e, and the idea. 
of using unification grammar as a. compact wa 5, 
of tel)resenting an ulMerlying context-fl'e~e, lan- 
guage is one of the main inotivations for GPSG 
(Gazdar et al 1985) and other formalislns based 
on it. The real question is whether the goal is 
practically achievable, given the resource limi- 
tations of current technology. 
In this l)a.1)er, we describe work aimed at the 
target outlined above, in which we used the 
Gemini system (described in more detail in Sec- 
tion 2) to a.ttempt to compile a. va.riety of lin- 
guistically principled unification gralnlna.rs into 
la.ngua.ge lnodels. Our first experiments (Sec- 
tion 3) were pertbrmed on a. large pre-existing 
unification gramlna.r. These were unsuccessful, 
for reasons that were not entirely obvious; in 
order to investigate the prol)lem more system- 
atically, we then conducted a second series of 
experilnents (Section 4), in which we increlnen- 
tally 1)uilt up a smMler gra.lnlna.r. By monitor- 
ing; the behavior of the compilation process and 
the resulting langua.ge model as the gra.lmnar~s 
2~1e m'e aware l, hal, this claim is most~ 1)robably not 
l;rue for natural s ill gelmraI (lh'csnall cl al 
1987), but furl~hcr discussion of t.his point is beyond I.he 
scope of t, llC paper. 
cover~ge was expanded, we were a.ble to iden- 
tit~ the point a,t which serious problems began 
to emerge (Section 5). In the fina.1 section, we 
summarize and suggest fltrther directions. 
2 Tile Genfini Language Model 
Compiler 
To lnake the paper nlore self-contained, this sec- 
tion provides some background on the method 
used by Gemini to compile unifica.tion grain- 
mars into CFGs, and then into  mod- 
els. The ha.sic idea. is the obvious one: enu- 
mera.te all possible instantiations of the feal;ures 
in the grammar rules and lexicon entries, and 
thus tra.nsform esch rule and entry in the ()rig- 
inal unification grammar into a set of rules in 
the derived CFG. For this to be possible, the 
relevant fe~ttul'es Inust be constrained so that 
they can only take values in a finite predefined 
range. The finite range restriction is inconve- 
nient for fea.tures used to build semantic repre- 
sentations, and the tbrmalism consequently dis- 
tinguishes syntactic and semantic features; se- 
lmmtic features axe discarded a.t the start of the 
compilation process. 
A naive iml)lelnentation of the basic lnethod 
would be iml)raetical for any but the small- 
est a.nd simplest grammars, and considera.ble 
ingemfity has been expended on various opti- 
mizations. Most importantly, categories axe ex- 
panded in a demand-driven fa.shion, with infer 
lnatiotl being percolated 1)oth t)otton>up (from 
the lexicon) and top-down (fl'om the grammar's 
start symbol). This is done in such a. way 
that potentially valid colnl)inations of feature 
instantiations in rules are successively filtered 
out if they are not licensed by the top-down 
and bottom-ul) constra.ints. Ranges of feature 
values are also kept together when possible, so 
that sets of context-free rules produced by the 
mdve algorithm may in these cases be merged 
into single rules. 
By exploiting the structure of the gram- 
mar a.nd lexicon, the demand-driven expansion 
lnethod can often effect substa.ntial reductions 
in the size of the derived CFG. (For the type 
of grammar we consider in this paper, the re- 
duction is typically by ,~ fa.etor of over 102°). 
The downside is that even an app~trently slnall 
cha.nge in the syntactic t>atures associated with 
a. rule may have a large eIfect on the size of 
671 
the CFG, if it opens up or blocks an impor- 
tant percolation path. Adding or deleting lexi- 
con entries can also have a significant effect on 
the size of the CFG, especially when there are 
only a small number of entries in a given gram- 
matical category; as usual, entries of this type 
behave from a software engineering standpoint 
like grammar rules. 
The  model compiler also performs 
a number of other non-trivial transformations. 
The most important of these is related to the 
fact that Nuance GSL grammars are not al- 
lowed to contain left-recursive rules, and left- 
recursive unification-grammar rules must con- 
sequently be converted into a non-left-recursive 
fort::. Rules of this type do not however occur 
in the gramlnars described below, and we conse- 
quently omit further description of the method. 
3 Initial Experiments 
Our initial experiments were performed on a 
recent unification grammar in the ATIS (Air 
Travel Information System) domain, developed 
as a linguistically principled grammar with a 
domain-specific lexicon. This grammar was 
cre~ted for an experiment COl::t)aring cover- 
age and recognition performance of a hand- 
written grammar with that of a.uto:::atically de- 
rived recognition  models, as increas- 
ing amounts of data from the ATIS corpus 
were made available for each n:ethod. Exam- 
ples of sentences covered by this gralnlnar are 
"yes", "on friday", "i want to fly from boston 
to denver on united airlines on friday septem- 
ber twenty third", "is the cheapest one way 
fare from boston to denver a morning flight", 
and "what flight leaves earliest from boston to 
san francisco with the longest layover in den- 
ver". Problems obtaining a working recognition 
grammar from the unification grammar ended 
our original experiment prematurely, and led 
us to investigate the factors responsible for the 
poor recognition performance. 
We explored several likely causes of recogni- 
tion trouble: number of rules, ::umber of vocab- 
ulary items, size of node array, perplexity, and 
complexity of the grammar, measured by aver- 
age and highest number of transitions per graph 
in the PFSG form of the grammar. 
We were able to in:mediately rule out sim- 
ple size metrics as the cause of Nuance's diffi- 
culties with recognition. Our smallest air travel 
grammar had 141 Gemini rules and 1043 words, 
producing a Nuance grammar with 368 rules. 
This compares to the Con:mandTalk grammar, 
which had 1231 Gemini rules and 1771 words, 
producing a Nuance gran:n:ar with 4096 rules. 
To determine whether the number of the 
words in the grammar or the structure of 
the phrases was responsible for the recognition 
problems, we created extreme cases of a Word+ 
grammar (i.e. a grammar that constrains the 
input to be any sequence of the words in the 
vocabulary) and a one-word-per-category gram- 
mar. We found that both of these variants 
of our gralmnar produced reasonable recogni- 
tion, though the Word+ grammar was very in- 
accurate. However, a three-words-per-category 
grammar could not produce snccessflfl speech 
recognition. 
Many thature specifications can lnake a gram- 
mar ::tore accurate, but will also result in a 
larger recognition grammar due to multiplica- 
tion of feature w~lues to derive the categories 
of the eontext-fl'ee grammar. We experimented 
with various techniques of selecting features to 
be retained in the recognition grammar. As de- 
scribed in the previous section, Gemini's default 
method is to select only syntactic features and 
not consider semantic features in the recogni- 
tion grammar. We experimented with selecting 
a subset of syntactic features to apply and with 
applying only se:nantic sortal features, and no 
syntactic features. None of these grammars pro- 
duced successful speech recognition. 
/.Fro::: these experiments, we were unable to 
isolate any simple set of factors to explain which 
grammars would be problematic for speech 
recognition. However, the numbers of transi- 
tions per graph in a PFSG did seem suggestive 
of a factor. The ATIS grammar had a high of 
1184 transitions per graph, while the semantic 
grammar of CommandTalk had a high of 428 
transitions per graph, and produced very rea- 
sonable speech recognition. 
Still, at; the end of these attempts, it beca.me 
clear that we did not yet know the precise char- 
acteristic that makes a linguistically motivated 
grammar intractable for speech recognition, nor 
the best way to retain the advantages of the 
hand-written grammar approach while provid- 
ing reasonable speech recognition. 
672 
4 Incremental Grammar 
Development 
In our second series of experiments, we in- 
crelnenta.lly developed a. new grammar front 
s('ra.tch. The new gra.mma.r is basica.lly a s('a.led- 
down and a.dapted version of tile Core Lan- 
guage Engine gramme\ for English (Puhnan 
1!)92; Rayner 1993); concrete development work 
a.nd testing were organized a.round a. speech in- 
terfa c(; to a. set; of functionalities oflhred by a 
simple simula,tion of the Space Shuttle (Rather, 
Hockey gll(l James 2000). Rules and lexical 
entries were added in sma.ll groups, typically 
2-3 rules or 5 10 lexical entries in one incre- 
ment. After each round of exl)a.nsion , we tested 
to make sure that the gramlnar could still 1)e 
compiled into a. usa.bh; recognizer, a.nd a.t sev- 
ere.1 points this suggested changes in our iln- 
1)\]ementation strategy. The rest of this section 
describes tile new grmmnar in nlore detail. 
4.1 Overview of Rules 
The current versions of the grammar and lexi- 
con contain 58 rules a.nd 30J. Ulfinflectesl entries 
respectively. They (:over the tbllowing phenom- 
eli :~IZ 
1. Top-level utl;er~tnces: declarative clauses, 
WH-qtlestions, Y-N questions, iml)erat;ives, 
etlil)tical NPs and I)Ps, int(;rject.ions. 
~.. /\] 9 \,~ H-lnovement of NPs and PPs. 
3. The fbllowing verb types: intr~nsi- 
tive, silnple transitive, PP con:plen-mnt, 
lnodaJ/a.uxiliary, -ing VP con-q)len:ent, par- 
ticleq-NP complement, sentential comple- 
lnent, embedded question complement. 
4. PPs: simple PP, PP with postposition 
("ago")~ PP lnodifica,tion of VP and NP. 
5. Relat;ive clauses with both relative NP pro- 
1101111 ("tit(; telnperature th,tt I measured ) 
and relative PP ("the (loci: where I am"). 
6. Numeric determiners, time expressions, 
and postmodification of NP 1)y nun:eric ex- 
pressions. 
7. Constituent conjunction of NPs and 
cl~ulses. 
Tilt following examl)le sentences illustrate 
current covera,ge: 3 '-. , ':how ~d)out scenario 
three.?", "wha, t is the temperature?", "mea- 
sure the pressure a,t flight deck", "go to tile 
crew ha.tch a.nd (:lose it", "what were ten:per- 
a.tttt'e a, nd pressure a.t iifteen oh five?", "is the 
telnpera.ture going ttp'. ~', "do the fi×ed sensors 
sa.y tha.t the pressure is decreasing. , "find out 
when the pressure rea.ched fifteen p s i .... wh~t 1 
is the pressure that you mea.sured?", "wha.t is 
the tempera.lure where you a.re?", ¢~(:a.n you find 
out when the fixed sensors say the temperature 
at flight deck reached thirty degrees celsius?". 
4.2 Unusual Features of the Grammar 
Most of the gramn:~u', as already sta.ted, is 
closely based on the Core Language Eng!ne 
gra.nlnla.r. \¥e briefly sllnllna.rize the main di- 
vergences between the two gramnlars. 
4.2.1 Inversion 
The new gramlna, r uses a. novel trea.tment of 
inversion, which is p~trtly designed to simplify 
the l)l'ocess of compiling a, fea,ture gl'anllna, r into 
context-free form. The CLE grammar's trea.t- 
lltent of inversion uses a, movement account, in 
which the fronted verb is lnoved to its notional 
pla.ce in the VP through a feature. So, tbr 
example, the sentence "is pressure low?" will 
in the origina.1 CLE gramma.r ha.re the phrase- 
structure 
::\[\[iS\]l" \[pressure\]N/, \[\[\]V \[IO\V\]AI),\]\]V'\]'\],'g" 
in whk:h the head of th(, VP is a V gap coin- 
dexed with tile fronted main verb 1,~ . 
Our new gra.mn:ar, in contrast, hal:dles in- 
version without movement, by making the con> 
bination of inverted ver\]) and subject into a. 
VBAR constituent. A binary fea.ture invsubj 
picks o:ll; these VBARs, a.nd there is a. question- 
forma,tion rule of tilt form 
S --> VP : Einvsubj=y\] 
Continuing the example, the new gram- 
mar a.ssigns this sentence tilt simpler phrase- 
structure 
"\[\[\[is\] v \[press:ire\] N*'\] v.A. \[\[low\] J\] V.\] S" 
4.2.2 Sortal Constraints 
Sortal constra,ints are coded into most gr~un:nnr 
rules as synta.ctic features in a straight-forward 
lna.nner, so they are available to the compilation 
673 
process which constructs the context-free gram- 
mar, ~nd ultimately tile  model. The 
current lexicon allows 11 possible sortal values 
tbr nouns, and 5 for PPs. 
We have taken the rather non-standard step 
of organizing tile rules for PP modification so 
that a VP or NP cannot be modified by two 
PPs of the same sortal type. The principal mo- 
tivation is to tighten the  model with 
regard to prepositions, which tend to be pho- 
netically reduced and often hard to distinguish 
from other function words. For example, with- 
out this extra constraint we discovered that an 
utterance like 
measure temperature at flight deck 
and lower deck 
would frequently be misrecognized as 
measure temperature at flight deck in 
lower deck 
5 Experiments with Incremental 
G r am 111 ar S 
Our intention when developing the new gram- 
mar was to find out just when problems began 
to emerge with respect to compilation of tan- 
gm~ge models. Our initial hypothesis was that 
these would l)robably become serious if the rules 
for clausal structure were reasonably elaborate; 
we expected that the large number of possible 
ways of combining modal and auxiliary verbs, 
question forlnation, movement, and sentential 
complements would rapidly combine to produce 
an intractably loose  model. Interest- 
ingly, this did not prove to be the case. In- 
stead, the rules which appear to be the primary 
ca.use of difficulties are those relating to relative 
clauses. We describe the main results in Sec- 
tion 5.1; quantitative results on recognizer per- 
tbrmance are presented together in Section 5.2. 
5.1 Main Findings 
We discovered that addition of the single rule 
which allowed relative clause modification of an 
NP had a dr~stic effect on recognizer perfor- 
lnance. The most obvious symptoms were that 
recognition became much slower and the size of 
the recognition process much larger, sometimes 
causing it to exceed resource bounds. The false 
reject rate (the l)roportion of utterances which 
fell below the recognizer's mininmnl confidence 
theshold) also increased substantially, though 
we were surprised to discover no significant in- 
crea.se in the word error rate tbr sentences which 
did produce a recognition result. To investi- 
gate tile cause of these effects, we examined the 
results of perfornfing compilation to GSL and 
PFSG level. The compilation processes are such 
that symbols retain mnemonic names, so that it 
is relatively easy to find GSL rules and gral)hs 
used to recognize phrases of specified gralnmat- 
ical categories. 
At the GSL level, addition of the relative 
clause rule to the original unification grammar 
only increased the number of derived Nuance 
rules by about 15%, from 4317 to 4959. The av- 
erage size of the rules however increased much 
more a. It, is easiest to measure size at the level of 
PFSGs, by counting nodes and transitions; we 
found that the total size of all the graphs had in- 
creased from 48836 nodes and 57195 tra.nsitions 
to 113166 nodes and 140640 transitions, rather 
more than doubling. The increase was not dis- 
tributed evenly between graphs. We extracted 
figures for only the graphs relating to specific 
grammatical categories; this showed that, the 
number of gra.1)hs fbr NPs had increased from 
94 to 258, and lnoreover that the average size 
of each NP graph had increased fronl 21 nodes 
and 25.5 transitions to 127 nodes and 165 tra.nsi- 
tions, a more than sixfold increase. The graphs 
for clause (S) phrases had only increased in 
number froln 53 to 68. They ha.d however also 
greatly increased in average size, from 171 nodes 
and 212 transitions to 445 nodes and 572 tran- 
sitions, or slightly less than a threefold increase. 
Since NP and S are by far the most important 
categories in the grammar, it is not strange that 
these large changes m~tke a great difference to 
the quality of the  model, and indi- 
rectly to that of speech recognition. 
Colnparing the original unification gramlnar 
and the compiled CSL version, we were able to 
make a precise diagnosis. The problem with the 
relative clause rules are that they unify feature 
values in the critical S and NP subgralnlnars; 
this means that each constrains the other, lead- 
ing to the large observed increase in the size 
and complexity of the derived Nuance grammar. 
aGSL rules are written in all notation which allows 
disjunction and Klccne star. 
674 
Specifically, agreement ilffbrmation and sortal 
category are shared between the two daugh- 
ter NPs in the relative clause modification rule, 
which is schematically as follows: 
Igp: \[agr=A, sort=S\] --+ 
NP: \[agr=A, sort=S\] 
REL:\[agr=A, sort=S\] 
These feature settings ~re needed in order to get 
tile right alternation in pairs like 
the robot that *measure/measures 
the teml)erature \[agr\] 
the *deck/teml)era.ture tha.t you 
measured \[sort\] 
We tested our hypothesis by colnlnenting ()lit 
the agr and sort features in the above rule. 
This completely solves the main 1)robh;in of ex- 
1)lesion in the size of the PFSG representation; 
tile new version is only very slightly larger than 
tile one with no relative clause rule (50647 nodes 
and 59322 transitions against 48836 nodes and 
57195 transitions) Most inL1)ortantty, there is 
no great increase in the number or average size 
of the NP and S graphs. NP graphs increase in 
number froin 94 to 130, and stay constant in a.v- 
era ge size.; S graphs increase in number f}om 53 
to 64, and actually decrease, in aa;erage size to 
13,5 nodes and 167 transitions. Tests on st)eech 
(l~t;a. show that recognition quality is nea~rly :lie 
sa.me as for the version of the recognizer which 
does not cover relative clauses. Although speed 
is still significantly degraded, the process size 
has been reduced sufficiently that the 1)roblen:s 
with resource bounds disappear. 
It would be rea.sonal)le 1:o expect tim: remov- 
ing the explosion in the PFSG ret)resentation 
would result in mL underconstrained  
model for the relative clause paxt of the gram- 
mar, causing degraded 1)erformance on utter- 
ances containing a, relative clause. Interestingly, 
this does not appear to hapl)en , though recog- 
nition speed under the new grammar is signif- 
icaatly worse for these utterances COml)ared to 
utterances with no relative clause. 
5.2 Recognition Results 
This section summarizes our empirical recog- 
nition results. With the help of the Nuance 
Toolkit batchrec tool, we evah:ated three ver- 
sions of the recognizer, which differed only with 
respect to tile  model, no_rels used 
the version of the  model derived fl'onI a 
granLn:a.r with the relative clause rule removed; 
rels is the version derived from the fltll gram- 
lnar; and unlinked is the colnl)romise version, 
which keeps the relative clause rule but removes 
the critical features. We constructed a corpus 
of 41 utterances, of mean length 12.1 words. 
The utterances were chosen so that the first, 31 
were within the coverage of all three versions 
of the grammar; the last 10 contained relative 
clauses, and were within the coverage of re:s 
and un:inked but :tot of no_rels. Each utter- 
anee was recorded by eight different subjects, 
none of whom had participated in development 
of the gra.mmar or recognizers. Tests were run 
on a dual-processor SUN Ultra60 with 1.5 GB 
of RAM. 
The recognizer was set, to reject uttera.nces if 
their a.ssociated confidence measure fell under 
the default threshold. Figures 1 and 2 sum- 
marize the re.suits for the first 31 utterances 
(no relative clauses) and the last 10 uttera:Lces 
(relative clauses) respectively. Under '×RT', 
we give inean recognition speed (averaged over 
subjects) e×pressed as a multiple of real time; 
'PRe.j' gives the false reject rate, the :heart l)er - 
centage of utterances which were reiected due to 
low confidence measures; 'Me:n' gives the lnean 
1)ercentage of uttera.nces which fhiled due to the. 
recognition process exceeding inemory resource 
bounds; and 'WER,' gives the mean word er- 
ror rate on the sentences that were neither re- 
jected nor failed due to resource bound prob- 
lems. Since the distribution was highly skewed, 
all mea.ns were calculated over the six subjects 
renm.i:fing after exclusion of the extreme high 
and low values. 
Looking first at Figure 1, we see that rels is 
clearly inferior to no_rels on tile subset of the 
corpus which is within the coverage of both ver- 
sions: nea.rly twice as many utterances are re- 
jected due to low confidence values or resource 
1)roblems, and recognition speed is about five 
times slower, unlinked is in contrast :tot sig- 
nificantly worse than no_rels in terms of recog- 
nition performance, though it is still two and a 
half times slower. 
Figure 2 compares rels and unlinked on the 
utterances containing a relative clause. It seems 
reasona.ble to say that recognition performance 
675 
I C4ran"nar I I FR .i I IWER 1 
no_rels 1.04 9.0% - 6.0% 
rels 4.76 16.1% 1.1% 5.7% 
unlinked 2.60 9.6% - 6.5% 
Figure 1: Evaluation results for 31 utterances 
not containing relative clauses, averaged across 
8 subjects excluding extreme values. 
Grammar xRT FRej Men: WER\] 
rels 4.60 26.7% 1.6% 3.5%\] 
unlinked 5.29 20.0% - 5.4%J 
Figure 2: Evaluation results for i0 utter~mces 
containing relative clauses, averaged across 8 
subjects excluding extreme values. 
is comparable for the two versions: rels has 
lower word error rate, but also rqjects more 
utterances. Recognition speed is marginally 
lower for unlinked, though it is not clear to us 
whether the difference is significant given the 
high variability of the data. 
6 Conclusions and Further 
Directions j 
We found the results presented above surpris- 
ing and interesting. When we 1)egal: our pro- 
gramme of attempting to compile increasingly 
larger linguistically based unification grammars 
into  models, we had expected to see a 
steady combinatorial increase, which we guessed 
would be most obviously related to complex 
clause structure. This did not turn out to be the 
case. Instead, the serious problems we encoun- 
tered were caused by a small number of crit- 
ical rules, of which the one for relative clause 
modification was by the far the worst. It was 
not immediately obvious how to deal with the 
problem, but a careful analysis revealed a rea- 
sonable con:promise solution, whose only draw- 
back was a significant but undisastrous degra- 
dation in recognition speed. 
It seems optimistic to hope that the rela- 
tive clause problem is the end of the story; the 
obvious way to investigate is by continuing to 
expand the gramlnar in the same incremental 
fashion, and find out what happens next. We 
intend to do this over the next few months, and 
expect in due course to be able to l)resent fur- 
ther results. 

References 

H. Alshawi. 1992. The Core Language Engine. 
Cambridge, Massachusetts: The MIT Press. 

J. Bresnan, R.M. Kapla.n, S. Peters and A. Za- 
enen. Cross-Serial Dependencies in Dutch. 
1987. In W. J. Savitch et al (eds.), The For- 
real Complexity of Natural Languagc, Reidel, 
Dordrecht, pages 286-319. 

G. Gazdar, E. Klein, G. Pullum and I. Sag. 
1985. Generalized Phrase Structure Gram- 
mar Basil Blackwell. 

IBM. 1999. ViaVoice SDK tbr Windows, ver- 
sion 1.5. 

R. Moore, J. Dowding, H. Bratt, J.M. Gawron, 
Y. Gorfl:, and A. Cheyer. 1997. Com- 
mandTalk: A Spoken-Language Interface 
tbr Battlefield Simulations. Proceedings 
of the Fifth Conference on Applied Nat- 
uraI Languagc Processing, pages 1-7, 
Washington, DC. Available online from 
http ://www. ai. sri. com/natural- 
/project s/arpa-sl s / commandt alk. html. 

Nuance Communications. 1999. Nuance Speech 
Recognition System Developer's Manv, aI, Ver- 
sion 6.2 

G. Pullum and G. Gazdar. 1982. Natural Lan- 
guages and Context-Free Languages. Lin- 
guistics and Philosophy, 4, pages 471-504. 

S.G. Puhnan. 1992. Unification-Based Synta.c- 
tic Analysis. In (Alshawi 1992) 

M. Rayner. 1993. English Linguistic Coverage. 
In M.S. Agn~s et al. 1993. Spoken Language 
Translator: First Year Report. SRI Techni- 
cal Report CRC-043. Available online from 
http ://www. sri. com. 

M. Rayner, B.A. Hockey and F. James. 2000. 
Turning Speech into Scripts. To appear in 
P~vceedings of the 2000 AAAI Spring Sym- 
posium on Natural Language Dialogues with 
Practical Robotic Devices 

A. Stent, J. Dowding, J.M. Gawron, E.O. 
Bratt, and R. Moore. 1999. The Coin- 
mandTalk Spoken Dialogue System. P'rv- 
cecdings of the 37th Annual Meeting of the 
ACL, pages 183-190. Available online from 
http ://www. ai. sri. com/natural- 
/project s/arpa-s :s / commandt a:k. html. 
