Proceedings of the 3rd Workshop on Scalable Natural Language Understanding, pages 9–16,
New York City, June 2006. c©2006 Association for Computational Linguistics
Backbone Extraction and Pruning for Speeding Up a Deep Parser for
Dialogue Systems
Myroslava O. Dzikovska
Human Communication Research Centre
University of Edinburgh
Edinburgh, EH8 9LW, United Kingdom,
mdzikovs@inf.ed.ac.uk
Carolyn P. Ros´e
Carnegie Mellon University
Language Technologies Institute,
Pittsburgh, PA 15213,
cprose@cs.cmu.edu
Abstract
In this paper we discuss issues related to
speeding up parsing with wide-coverage
uni cation grammars. We demonstrate
that state-of-the-art optimisation tech-
niques based on backbone parsing before
uni cation do not provide a general so-
lution, because they depend on speci c
properties of the grammar formalism that
do not hold for all uni cation based gram-
mars. As an alternative, we describe an
optimisation technique that combines am-
biguity packing at the constituent structure
level with pruning based on local features.
1 Introduction
In this paper we investigate the problem of scaling
up a deep uni cation-based parser developed specif-
ically for the purpose of robust interpretation in dia-
logue systems by improving its speed and coverage
for longer utterances. While typical sentences in di-
alogue contexts are shorter than in expository text
domains, longer utterances are important in discus-
sion oriented domains. For example, in educational
applications of dialogue it is important to elicit deep
explanation from students and then offer focused
feedback based on the details of what students say.
The choice of instructional dialogue as a target ap-
plication in uenced the choice of parser we needed
to use for interpretation in a dialogue system. Sev-
eral deep, wide-coverage parsers are currently avail-
able (Copestake and Flickinger, 2000; Ros·e, 2000;
Baldridge, 2002; Maxwell and Kaplan, 1994), but
many of these have not been designed with issues re-
lated to interpretation in a dialogue context in mind.
The TRIPS grammar (Dzikovska et al., 2005) is a
wide-coverage uni cation grammar that has been
used very successfully in several task-oriented di-
alogue systems. It supports interpretation of frag-
ments and lexical semantic features (see Section 2
for a more detailed discussion), and provides addi-
tional robustness through  robust rules that cover
common grammar mistakes found in dialogue such
as missing articles or incorrect agreement. These
enhancements help parsing dialogue (both spoken
and typed), but they signi cantly increase grammar
ambiguity, a common concern in building grammars
for robust parsing (Schneider and McCoy, 1998). It
is speci cally these robustness-ef ciency trade-offs
that we address in this paper.
Much work has been done related to enhanc-
ing the ef ciency of deep interpretation systems
(Copestake and Flickinger, 2000; Swift et al., 2004;
Maxwell and Kaplan, 1994), which forms the foun-
dation that we build on in this work. For example,
techniques for speeding up uni cation in HPSG lead
to dramatic improvements in ef ciency (Kiefer et
al., 1999). Likewise ambiguity packing and CFG
backbone parsing (Maxwell and Kaplan, 1994; van
Noord, 1997) are known to increase parsing ef -
ciency. However, as we show in this paper, these
techniques depend on speci c grammar properties
that do not hold for all grammars. This claim is con-
sistent with observations of Carroll (1994) that pars-
ing software optimisation techniques tend to be lim-
ited in their applicability to the individual grammars
they were developed for. While we used TRIPS as
our example uni cation-based grammar, this inves-
tigation is important not only for this project, but in
the general context of speeding up a wide-coverage
uni cation grammar which incorporates fragment
9
rules and lexical semantics, which may not be im-
mediately provided by other available systems.
In the remainder of the paper, we begin with a
brief description of the TRIPS parser and grammar,
and motivate the choice of LCFLEX parsing algo-
rithm to provide a fast parsing foundation. We then
discuss the backbone extraction and pruning tech-
niques that we used, and evaluate them in compar-
ison with the original parsing algorithm. We con-
clude with discussion of some implications for im-
plementing grammars that build deep syntactic and
semantic representations.
2 Motivation
The work reported in this paper was done as part
of the process of developing a dialogue system that
incorporates deep natural language understanding.
We needed a grammar that provides lexical seman-
tic interpretation, supports parsing fragmentary ut-
terance in dialogue, and could be used to start de-
velopment without large quantities of corpus data.
TRIPS ful lled our requirements better than sim-
ilar alternatives, such as LINGO ERG (Copestake
and Flickinger, 2000) or XLE (Maxwell and Kaplan,
1994).
TRIPS produces logical forms which include se-
mantic classes and roles in a domain-independent
frame-based formalism derived from FrameNet and
VerbNet (Dzikovska et al., 2004; Kipper et al.,
2000). Lexical semantic features are known to be
helpful in both deep (Tetreault, 2005) and shal-
low interpretation tasks (Narayanan and Harabagiu,
2004). Apart from TRIPS, these have not been in-
tegrated into existing deep grammars. While both
LINGO-ERG and XLE include semantic features
related to scoping, in our applications the avail-
ability of semantic classes and semantic role as-
signments was more important to interpretation,
and these features are not currently available from
those parsers. Finally, TRIPS provides a domain-
independent parse selection model, as well as rules
for interpreting discourse fragments (as was also
done in HPSG (Schlangen and Lascarides, 2003), a
feature actively used in interpretation.
While TRIPS provides the capabilities we need,
its parse times for long sentences (above 15 words
long) are intolerably long. We considered two pos-
sible techniques for speeding up parsing: speeding
up uni cation using the techniques similar to the
LINGO system (Copestake and Flickinger, 2000),
or using backbone extraction (Maxwell and Ka-
plan, 1994; Ros·e and Lavie, 2001; Briscoe and Car-
roll, 1994). TRIPS already uses a fast uni cation
algorithm similar to quasi-destructive uni cation,
avoiding copying during uni cation.1 However,
the TRIPS grammar retains the notion of phrase
structure, and thus it was more natural to chose to
use backbone extraction with ambiguity packing to
speed up the parsing.
As a foundation for our optimisation work, we
started with the freely available LCFLEX parser
(Ros·e and Lavie, 2001). LCFLEX is an all-paths
parser that uses left-corner prediction and ambigu-
ity packing to make all-paths parsing tractable, and
which was shown to be ef cient for long sentences
with somewhat less complex uni cation augmented
context-free grammars. We show that all-paths pars-
ing with LCFLEX is not tractable for the ambiguity
level in the TRIPS grammar, but that by introduc-
ing a pruning method that uses ambiguity packing to
guide pruning decisions, we can achieve signi cant
improvements in both speed and coverage compared
to the original TRIPS parser.
3 The TRIPS and LCFLEX algorithms
3.1 The TRIPS parser
The TRIPS parser we use as a baseline is a bottom-
up chart parser with lexical entries and rules repre-
sented as attribute-value structures. To achieve pars-
ing ef ciency, TRIPS uses a best- rst beam search
algorithm based on the scores from a parse selection
model (Dzikovska et al., 2005; Elsner et al., 2005).
The constituents on the parser’s agenda are grouped
into buckets based on their scores. At each step, the
bucket with the highest scoring constituents is se-
lected to build/extend chart edges. The parsing stops
once N requested analyses are found. This guaran-
tees that the parser returns the N-best list of analyses
according to the parse selection model used, unless
the parser reaches the chart size limit.
1Other enhancements used by LINGO depend on disallow-
ing disjunctive features, and relying instead on the type system.
The TRIPS grammar is untyped and uses disjunctive features,
and converting it to a typed system would require as yet unde-
termined amount of additional work.
10
In addition to best- rst parsing, the TRIPS parser
uses a chart size limit, to prevent the parser from
running too long on unparseable utterances, similar
to (Frank et al., 2003). TRIPS is much slower pro-
cessing utterances not covered in the grammar, be-
cause it continues its search until it reaches the chart
limit. Thus, a lower chart limit improves parsing
ef ciency. However, we show in our evaluation that
the chart limit necessary to obtain good performance
in most cases is too low to  nd parses for utterances
with 15 or more words, even if they are covered by
the grammar.
The integration of lexical semantics in the TRIPS
lexicon has a major impact on parsing in TRIPS.
Each word in the TRIPS lexicon is associated with a
semantic type from a domain-independent ontology.
This enables word sense disambiguation and seman-
tic role labelling for the logical form produced by
the grammar. Multiple word senses result in addi-
tional ambiguity on top of syntactic ambiguity, but it
is controlled in part with the use of weak selectional
restrictions, similar to the restrictions employed by
the VerbNet lexicon (Kipper et al., 2000). Check-
ing semantic restrictions is an integral part of TRIPS
parsing, and removing them signi cantly decreases
speed and increases ambiguity of the TRIPS parser
(Dzikovska, 2004). We show that it also has an im-
pact on parsing with a CFG backbone in Section 4.1.
3.2 LCFLEX
The LCFLEX parser (Ros·e and Lavie, 2001) is an
all-paths robust left corner chart parser designed to
incorporate various robustness techniques such as
word skipping,  exible uni cation, and constituent
insertion. Its left corner chart parsing algorithm
is similar to that described by Briscoe and Carroll
(1994). The system supports grammatical speci -
cation in a uni cation framework that consists of
context-free grammar rules augmented with feature
bundles associated with the non-terminals of the
rules. LCFLEX can be used in two parsing modes:
either context-free parsing can be done  rst, fol-
lowed by applying the uni cation rules, or uni ca-
tion can be done interleaved with context-free pars-
ing. The context free backbone allows for ef cient
left corner predictions using a pre-compiled left cor-
ner prediction table, such as that described in (van
Noord, 1997). To enhance its ef ciency, it incor-
porates a provably optimal ambiguity packing algo-
rithm (Lavie and Ros·e, 2004).
These ef ciency techniques make feasible all-
path parsing with the LCFLEX CARMEL grammar
(Ros·e, 2000). However, CARMEL was engineered
with fast all-paths parsing in mind, resulting in cer-
tain compromises in terms of coverage. For exam-
ple, it has only very limited coverage for noun-noun
compounding, or headless noun phrases, which are a
major source of ambiguity with the TRIPS grammar.
4 Combining LCFLEX and TRIPS
4.1 Adding CFG Backbone
A simpli ed TRIPS grammar rule for verb phrases
and a sample verb entry are shown in Figure 1. The
features for building semantic representations are
omitted for brevity. Each constituent has an assigned
category that corresponds to its phrasal type, and a
set of (complex-valued) features.
The backbone extraction algorithm is reason-
ably straightforward, with CFG non-terminals cor-
responding directly to TRIPS constituent categories.
To each CFG rule we attach a corresponding TRIPS
uni cation rule. After parsing is complete, the
parses found are scored and ordered with the parse
selection model, and therefore parsing accuracy in
all-paths mode is the same or better than TRIPS ac-
curacy for the same model.
For constituents with subcategorized arguments
(verbs, nouns, adverbial prepositions), our back-
bone generation algorithm takes the subcategoriza-
tion frame into account. For example, the TRIPS
VP rule will split into 27 CFG rules corresponding
to different subcategorization frames: VP→V intr,
VP→V NP NP, VP→V NP CP NP CP, etc. For
each lexical entry, its appropriate CFG category is
determined based on the subcategorization frame
from TRIPS lexical representation. This improves
parsing ef ciency using the prediction algorithms in
TFLEX operating on the CFG backbone. The ver-
sion of the TRIPS grammar used in testing con-
tained 379 grammar rules with 21 parts of speech
(terminal symbols) and 31 constituent types (non-
terminal symbols), which were expanded into 1121
CFG rules with 85 terminals and 36 non-terminals
during backbone extraction.
We found, however, that the previously used tech-
11
(a) ((VP (SUBJ ?!subj) (CLASS ?lf))
-vp1-role .99
(V (LF ?lf) (SUBJ ?!subj) (DOBJ ?dobj)
(IOBJ ?iobj) (COMP3 ?comp3))
?iobj ?dobj ?comp3)
(b) ((V (agr 3s) (LF LF::Filling)
(SUBJ (NP (agr 3s)))
(DOBJ (NP (case obj))) (IOBJ -) (COMP3 -)))
Figure 1: (a) A simpli ed VP rule from the TRIPS
grammar; (b) a simpli ed verb entry for a transitive
verb. Question marks denote variables.
nique of context-free parsing  rst followed by full
re-uni cation was not suitable for parsing with the
TRIPS grammar. The CFG structure extracted from
the TRIPS grammar contains 43 loops resulting
from lexical coercion rules or elliptical construc-
tions. A small number of loops from lexical coer-
cion were both obvious and easy to avoid, because
they are in the form N→ N. However, there were
longer loops, for example, NP → SPEC for sen-
tences like  John’s car and SPEC → NP for head-
less noun phrases in sentences like  I want three .
LCFLEX uses a re-uni cation algorithm that asso-
ciates a set of uni cation rules with each CFG pro-
duction, which are reapplied at a later stage. To
be able to apply a uni cation rule corresponding to
N→ N production, it has to be explicitly present in
the chart, leading to an in nite number of N con-
stituents produced. Applying the extra CFG rules
expanding the loops during re-uni cation would
complicate the algorithm signi cantly. Instead, we
implemented loop detection during CFG parsing.
The feature structures prevent loops in uni ca-
tion, and we considered including certain grammat-
ical features into backbone extraction as done in
(Briscoe and Carroll, 1994). However, in the TRIPS
grammar the feature values responsible for break-
ing loops belonged to multi-valued features (6 val-
ued in the worst case), with values which may de-
pend on other multiple-valued features in daughter
constituents. Thus adding the extra features resulted
in major backbone size increases because of cate-
gory splitting. This can be remedied with additional
pre-compilation (Kiefer and Krieger, 2004), how-
ever, this requires that all lexical entries be known
in advance. One nice feature of the TRIPS lex-
icon is that it includes a mechanism for dynami-
cally adding lexical entries for unknown words from
wide-coverage lexicons such as VerbNet (Kipper et
al., 2000), which would be impractical to use in pre-
compilation.
Therefore, to use CFG parsing before uni cation
in our system, we implemented a loop detector that
checked the CFG structure to disallow loops. How-
ever, the next problem that we encountered is mas-
sive ambiguity in the CFG structure. Even a very
short phrase such as  a train had over 700 possi-
ble CFG analyses, and took 910 msec to parse com-
pared to 10 msec with interleaved uni cation. CFG
ambiguity is so high because noun phrase fragments
are allowed as top-level categories, and lexical am-
biguity is compounded with semantic ambiguity and
robust rules normally disallowed by features during
uni cation. Thus, in our combined algorithm we had
to use uni cation interleaved with parsing to  lter
out the CFG constituents.
4.2 Ambiguity Packing
For building semantic representations in parallel
with parsing, ambiguity packing presents a set of
known problems (Oepen and Carroll, 2000). One
possible solution is to exclude semantic features dur-
ing an initial uni cation stage, use ambiguity pack-
ing, and re-unify with semantic features in a post-
processing stage. In our case, we found this strategy
dif cult to implement, since selectional restrictions
are used to limit the ambiguity created by multiple
word senses during syntactic parsing. Therefore, we
chose to do ambiguity packing on the CFG structure
only, keeping the multiple feature structures associ-
ated with each packed CFG constituent.
To begin to evaluate the contribution of ambiguity
packing on ef ciency, we ran a test on the  rst 39 ut-
terances in a hold out set not used in the formal eval-
uation below. Sentences ranged from 1 to 17 words
in length, 16 of which had 6 or more words. On this
set, the average parse time without ambiguity pack-
ing was 10 seconds per utterance, and 30 seconds per
utterance on utterances with 6 or more words. With
ambiguity packing turned on, the average parse time
decreased to 5 seconds per utterance, and 13.5 sec-
onds per utterance on the utterances with more than
6 words. While this evaluation showed that ambi-
12
guity packing improves parsing ef ciency, we deter-
mined that further enhancements were necessary.
4.3 Pruning
We added a pruning technique based on the scor-
ing model discussed above and ambiguity packing
to enhance system performance. As an illustration,
consider an example from a corpus used in our eval-
uation where the TRIPS grammar generates a large
number of analyses,  we have a heart attack vic-
tim at marketplace mall . The phrase  a heart at-
tack victim has at least two interpretations, a [N1
heart [N1 attack [N1 victim]]] and  a [N1 [N1 heart
[N1 attack]] [N1 victim]] . The prepositional phrase
 at marketplace mall can attach either to the noun
phrase or to the verb. Overall, this results in 4 basic
interpretations, with additional ambiguity resulting
from different possible senses of  have .
The best- rst parsing algorithm in TRIPS uses
parse selection scores to suppress less likely inter-
pretations. In our example, the TRIPS parser will
chose the higher-scoring one of the two interpreta-
tions for  a heart attack victim , and use it  rst. For
this NP the features associated with both interpreta-
tions are identical with respect to further processing,
thus TRIPS will never come back to the other in-
terpretation, effectively pruning it.  At also has 2
possible interpretations due to word sense ambigu-
ity: LF::TIME-LOC and LF::SPATIAL-LOC. The
former has a slightly higher preference, and TRIPS
will try it  rst. But then it will be unable to  nd an
interpretation for  at Marketplace Mall , and back-
track to LF::SPATIAL-LOC to  nd a correct parse.
Without chart size limits the parser is guaran-
teed to  nd a parse eventually through backtracking.
However, this algorithm does not work quite as well
with chart size limits. If there are many similarly-
scored constituents in the chart for different parts of
the utterance, the best- rst algorithm expands them
 rst, and the the chart size limit tends to interfere
before TRIPS can backtrack to an appropriate lower-
scoring analysis.
Ambiguity packing offers an opportunity to make
pruning more strategic by focusing speci cally on
competing interpretations for the same utterance
span. The simplest pruning idea would be for ev-
ery ambiguity packed constituent to eliminate the in-
terpretations with low TRIPS scores. However, we
need to make sure that we don’t prune constituents
that are required higher up in the tree to make a
parse. Consider our example again.
The constituent for  at will be ambiguity
packed with its two meanings. But if we prune
LF::SPATIAL-LOC at that point, the parse for  at
Marketplace Mall will fail later. Formally, the com-
peting interpretations for  at have non-local fea-
tures, namely, the subcategorized complement (time
versus location) is different for those interpretations,
and is checked higher up in the parse. But for  a
heart attack victim the ambiguity-packed interpre-
tations differ only in local features. All features as-
sociated with this NP checked higher up come from
the head noun  victim and are identical in all inter-
pretations. Therefore we can eliminate the low scor-
ing interpretations with little risk of discarding those
essential for  nding a complete parse. Thus, for
any constituent where ambiguity-packed non-head
daughters differ only in local features, we prune
the interpretations coming from them to a speci ed
prune beam width based on their TRIPS scores.
This pruning heuristic based on local features
can be generalised to different uni cation grammars.
For example, in HPSG pruning would be safe at all
points where a head is combined with ambiguity-
packed non-head constituents, due to the locality
principle. In the TRIPS grammar, if a trips rule
uses subcategorization features, the same locality
principle holds. This heuristic has perfect precision
though not complete recall, but, as our evaluation
shows, it is suf cient to signi cantly improve per-
formance in comparison with the TRIPS parser.
5 Evaluation
The purpose of our evaluation is to explore the ex-
tent to which we can achieve a better balance be-
tween parse time and coverage using backbone pars-
ing with pruning compared to the original best- rst
algorithm. For our comparison we used an excerpt
from the Monroe corpus that has been used in previ-
ous TRIPS research on parsing speed and accuracy
(Swift et al., 2004) consisting of dialogues s2, s4,
s16 and s17. Dialogue s2 was a hold out set used for
pilot testing and setting parameters. The other three
dialogues were set aside for testing. Altogether, the
test set contained 1042 utterances, ranging from 1 to
13
45 words in length (mean 5.38 words/utt, st. dev. 5.7
words/utt). Using our hold-out set, we determined
that a beam width of three was an optimal setting.
Thus, we compared TFLEX using a beam width of 3
to three different versions of TRIPS that varied only
in terms of the maximum chart size, giving us a ver-
sion that is signi cantly faster than TFLEX overall,
one that has parse times that are statistically indis-
tinguishable from TFLEX, and one that is signi -
cantly slower. We show that while lower chart sizes
in TRIPS yield speed ups in parse time, they come
with a cost in terms of coverage.
5.1 Evaluation Methodology
Because our goal is to explore the parse time versus
coverage trade-offs of two different parsing architec-
tures, the two evaluation measures that we report are
average parse time per sentence and probability of
 nding at least one parse, the latter being a measure
estimating the effect of parse algorithm on parsing
coverage.
Since the scoring model is the same in TRIPS and
TFLEX, then as long as TFLEX can  nd at least one
parse (which happened in all but 1 instances on our
held-out set), the set returned will include the one
produced by TRIPS. We spot-checked the TFLEX
utterances in the test set for which TRIPS could
not  nd a parse to verify that the parses produced
were reasonable. The parses produced by TFLEX on
these sentences were typically acceptable, with er-
rors mainly stemming from attachment disambigua-
tion problems.
5.2 Results
We  rst compared parsers in terms of probability of
producing at least one parse (see Figure 2). Since
the distribution of sentence lengths in the test corpus
was heavily skewed toward shorter sentences, we
grouped sentences into equivalence classes based on
a range of sentence lengths with a 5-word increment,
with all of the sentences over 20 words aggregated
in the same class. Given a large number of short sen-
tences, there was no signi cant difference overall in
likelihood to  nd a parse. However, on sentences
greater than 10 words long, TFLEX is signi cantly
more likely to produce a parse than any of the TRIPS
parsers (evaluated using a binary logistic regression,
N = 334, G = 16.8, DF = 1, p < .001). Fur-
Parser <= 20 words >= 6 words
TFLEX 6.2 (20.2) 29.1 (96.3)
TRIPS-1500 2.3 (5.4) 6.9 (8.2)
TRIPS-5000 7.7 (30.2) 28.1 (56.4)
TRIPS-10000 22.7 (134.4) 107.6 (407.4)
Table 1: The average parse times for TRIPS and
TFLEX on utterances 6 words or more.
thermore, for sentences greater than 20 words long,
no form of TRIPS parser ever returned a complete
parse.
Next we compared the parsers in terms of aver-
age parse time on the whole data set across equiva-
lence classes of sentences, assigned based on Aggre-
gated Sentence Length (see Figure 2 and Table 1).
An ANOVA with Parser and Aggregated Sentence
Length as independent variables and Parse Time as
the dependent variable showed a signi cant effect
of Parser on Parse Time (F(3, 4164) = 270.03,
p < .001). Using a Bonferroni post-hoc analysis, we
determined that TFLEX is signi cantly faster than
TRIPS-10000 (p < .001), statistically indistinguish-
able in terms of parse time from TRIPS-5000, and
signi cantly slower than TRIPS-1500 (p < .001).
Since none of the TRIPS parsers ever returned a
parse for sentences greater than 20 words long, we
recomputed this analysis excluding the latter. We
still  nd a signi cant effect of Parser on Parse Time
(F(3, 4068) = 18.6, p < .001). However, a post-
hoc analysis reveals that parse times for TFLEX,
TRIPS-1500, and TRIPS-5000 are statistically in-
distinguishable for this subset, whereas TFLEX is
signi cantly faster than TRIPS-10000 (p < .001).
See Table 1 for for parse times of all four parsers.
Since TFLEX and TRIPS both spent 95% of their
computational effort on sentences with 6 or more
words, we also include results for this subset of the
corpus.
Thus, TFLEX presents a superior balance of cov-
erage and ef ciency especially for long sentences
(10 words or more) since for these sentences it is
signi cantly more likely to  nd a parse than any ver-
sion of TRIPS, even a version where the chart size is
expanded to an extent that it becomes signi cantly
slower (i.e., TRIPS-10000). And while TRIPS-1500
is consistently faster than the other parsers, it is
not signi cantly faster than TFLEX on sentences 20
14
Figure 2: Parse times and probability of getting a parse depending on (aggregated) sentence lengths. 5
denotes sentences with 5 or fewer words, 25 sentences with more than 20 words.
words long or less, which is the subset of sentences
for which it is able to  nd a parse.
5.3 Discussion and Future Work
The most obvious lesson learned in this experience
is that the speed up techniques developed for speci c
grammars and uni cation formalisms do not transfer
easily to other uni cation grammars. The features
that make TRIPS interesting  the inclusion of lex-
ical semantics, and the rules for parsing fragments
 also make it less amenable to using existing ef -
ciency techniques.
Grammars with an explicit CFG backbone nor-
mally restrict the grammar writer from writing
grammar loops, a restriction not imposed by gen-
eral uni cation grammars. As we showed, there
can be a substantial number of loops in a CFG due
to the need to cover various elliptical constructions,
which makes CFG parsing not interleaved with uni-
 cation less attractive in cases where we want to
avoid expensive CFG precompilation. Moreover, as
we found with the TRIPS grammar, in the context
of robust parsing with lexical semantics the ambigu-
ity in a CFG backbone grows large enough to make
CFG parsing followed by uni cation inef cient. We
described an alternative technique that uses pruning
based on a parse selection model.
Another option for speeding up parsing that we
have not discussed in detail is using a typed gram-
mar without disjunction and speeding up uni cation
as done in HPSG grammars (Kiefer et al., 1999). In
order to do this, we must  rst address the issue of
integrating the type of lexical semantics that we re-
quire with HPSG’s type system. Adding lexical se-
mantics while retaining the speed bene ts obtained
through this type system would require that the se-
mantic type ontology be expressed in the same for-
malism as the syntactic types. We plan to further
explore this option in our future work.
Though longer sentences were relatively rare
in our test set, using the system in an educa-
tional domain (our ultimate goal) means that the
longer sentences are particularly important, because
they often correspond to signi cant instructional
events, speci cally answers to deep questions such
as  why and  how questions. Our evaluation has
been designed to show system performance with ut-
terances of different length, which would roughly
correspond to the performance in interpreting short
and long student answers. Since delays in respond-
ing can de-motivate the student and decrease the
quality of the dialogue, improving handling of long
utterances can have an important effect on the sys-
tem performance. Evaluating this in practice is a
possible direction for future work.
6 Conclusions
We described a combination of ef cient parsing
techniques to improve parsing speed and coverage
with the TRIPS deep parsing grammar. We showed
that context-free parsing was inef cient on a back-
bone extracted from an existing uni cation gram-
mar, and demonstrated how to make all-path pars-
ing more tractable by a new pruning algorithm based
15
on ambiguity packing and local features, general-
isable to other uni cation grammars. We demon-
strated that our pruning algorithm provides better
ef ciency-coverage balance than the best- rst pars-
ing with chart limits utilised by the TRIPS parser,
and discussed the design implications for other ro-
bust parsing grammars.
Acknowledgements
We thank Mary Swift and James Allen for their
help with the TRIPS code and useful comments.
This material is based on work supported by grants
from the Of ce of Naval Research under numbers
N000140510048 and N000140510043.

References
J. Baldridge. 2002. Lexically Specified Derivational
Control in Combinatory Categorial Grammar. Ph.D.
thesis, University of Edinburgh.
T. Briscoe and J. Carroll. 1994. Generalized proba-
bilistic LR parsing of natural language (corpora) with
uni cation-based grammars. Computational Linguis-
tics, 19(1):25 59.
J. Carroll. 1994. Relating complexity to practical per-
formance in parsing with wide-coverage uni cation
grammars. In Proceedings of ACL-2004.
A. Copestake and D. Flickinger. 2000. An open
source grammar development environment and broad-
coverage English grammar using HPSG. In Proceed-
ings of LREC-2000, Athens, Greece.
M. O. Dzikovska, M. D. Swift, and J. F. Allen. 2004.
Building a computational lexicon and ontology with
framenet. In LREC workshop on Building Lexical Re-
sources from Semantically Annotated Corpora.
M. Dzikovska, M. Swift, J. Allen, and W. de Beaumont.
2005. Generic parsing for multi-domain semantic in-
terpretation. In Proceedings of the 9th International
Workshop on Parsing Technologies (IWPT-05).
M. O. Dzikovska. 2004. A Practical Semantic Represen-
tation For Natural Language Parsing. Ph.D. thesis,
University of Rochester.
M. Elsner, M. Swift, J. Allen, and D. Gildea. 2005. On-
line statistics for a uni cation-based dialogue parser.
In Proceedings of the 9th International Workshop on
Parsing Technologies (IWPT-05).
A. Frank, B. Kiefer, B. Crysmann, M. Becker, and
U. Schafer. 2003. Integrated shallow and deep pars-
ing: TopP meets HPSG. In Proceedings of ACL 2003.
B. Kiefer and H.-U. Krieger. 2004. A context-free ap-
proximation of head-driven phrase structure grammar.
In H. Bunt, J. Carroll, and G. Satta, editors, New De-
velopments in Parsing Technology. Kluwer.
B. Kiefer, H. Krieger, J. Carroll, and R. Malouf. 1999.
Bag of useful techniques for ef cient and robust pars-
ing. In Proceedings of ACL 1999.
K. Kipper, H. T. Dang, and M. Palmer. 2000. Class-
based construction of a verb lexicon. In Proceedings
of the 7th Conference on Artificial Intelligence and of
the 12th Conference on Innovative Applications of Ar-
tificial Intelligence.
A. Lavie and C. P. Ros·e. 2004. Optimal ambiguity pack-
ing in context-free parsers with interleaved uni cation.
In H. Bunt, J. Carroll, and G. Satta, editors, Current Is-
sues in Parsing Technologies. Kluwer Academic Press.
J. T. Maxwell and R. M. Kaplan. 1994. The interface
between phrasal and functional constraints. Computa-
tional Linguistics, 19(4):571 590.
S. Narayanan and S. Harabagiu. 2004. Question answer-
ing based on semantic structures. In Proceedings of
International Conference on Computational Linguis-
tics (COLING 2004), Geneva, Switzerland.
S. Oepen and J. Carroll. 2000. Ambiguity packing in
constraint-based parsing - practical results. In Pro-
ceedings of NAACL’00.
C. P. Ros·e and A. Lavie. 2001. Balancing robustness
and ef ciency in uni cation-augmented context-free
parsers for large practical applications. In J. Junqua
and G. Van Noord, editors, Robustness in Language
and Speech Technology. Kluwer Academic Press.
C. Ros·e. 2000. A framework for robust semantic in-
terpretation. In Proceedings 1st Meeting of the North
American Chapter of the Association for Computa-
tional Linguistics.
D. Schlangen and A. Lascarides. 2003. The interpreta-
tion of non-sentential utterances in dialogue. In Pro-
ceedings of the 4th SIGdial Workshop on Discourse
and Dialogue, Japan, May.
D. Schneider and K. F. McCoy. 1998. Recognizing syn-
tactic errors in the writing of second language learners.
In Proceedings of COLING-ACL’98.
M. Swift, J. Allen, and D. Gildea. 2004. Skeletons in
the parser: Using a shallow parser to improve deep
parsing. In Proceedings of COLING-04.
J. Tetreault. 2005. Empirical Evaluations of Pronoun
Resolution. Ph.D. thesis, University of Rochester.
G. van Noord. 1997. An ef cient implementation of
the head-corner parser. Computational Linguistics,
23(3):425 456.
