Proceedings of EACL '99 
Repair Strategies for Lexicalized Tree Grammars 
Patrice Lopez 
LORIA, 
BP239, 54500 Vandoeuvre, 
FRANCE 
lopez@loria.fr 
Abstract 
This paper presents a framework for the 
definition of monotonic repair rules on 
chart items and Lexicalized Tree Gram- 
mars. We exploit island representations 
and a new level of granularity for the 
linearization of a tree called connected 
routes. It allows to take into account the 
topology of the tree in order to trigger 
additional rules. These local rules cover 
ellipsis and common extra-grammatical 
phenomena such as self-repairs. First re- 
sults with a spoken language corpora are 
presented. 
Introduction 
In the context of spoken task-oriented man- 
machine and question-answering dialogues, one of 
the most important problem is to deal with spon- 
taneous and unexpected syntactical phenomena. 
Utterances can be very incomplete and difficult 
to predict which questions the principle of gram- 
maticality. Moreover large covering grammars are 
generally dedicated to written text parsing and 
it is not easy to exploit such a grammar for the 
analysis of spoken language even if complex syn- 
tax does not occur. 
For such sentences, robust parsing techniques 
are necessary to extract a maximum of informa- 
tion from the utterance even if a Complete parsing 
fails (at least all possible constituents). Consid- 
ering parsing of word-graphs and the large search 
space of parsing algorithms in order to compute all 
possible ambiguities, the number of partial parses 
can be very important. A robust semantic pro- 
cessing on these partial derivations would result in 
a prohibitive number of hypotheses. We argue in 
this paper that appropriate syntactical constraints 
expressed in a Lexicalized Tree Grammar (LTG) 
can trigger efficient repair rules for specific oral 
phenomena. 
First results of a classical grammatical parsing 
are presented, they show that robust parsing need 
to cope with oral phenomena. We argue then that 
extended domain of locality and lexicalization of 
LTG can be exploited in order to express repair 
local rules for these specific spoken phenomena. 
First results of this approach are presented. 
1 LTG parsing and repairing 
strategy 
1.1 Experimental results 
Table 1 presents parsing test results of the Go- 
cad corpora. This corpora contains 861 utterances 
in French of transcribed spontaneous spoken lan- 
guage collected with a Wizard of Oz experiment 
(Chapelier et al., 1995). We used a bottom-up 
parser (Lopez, 1998b) for LTAG. The size of the 
grammar was limited compared with (Candito, 
1999) and corresponds to the sublanguage used in 
the Gocad application. However designing princi- 
ples of the grammar was close to the large covering 
French LTAG grammar just including additional 
elementary trees (for example for unexpected ad- 
verbs which can modify predicative nouns) and a 
notation enrichment for the possible ellipsis occur- 
rences (Lopez, 1998a). The LTAG grammar for 
the sublanguage corresponds to a syntactical lex- 
icon of 529 entries and a set of 80 non-instancied 
elementary trees. 
A taxonomy of parsing errors occurring in oral 
dialogue shows that the majority of failures are 
linked to orality: hesitations, repetitions, self re- 
pairs and some head ellipsis. The table 2 gives the 
occurrence of these oral phenomena in the Gocad 
corpora. Of course more than one phenomenon 
can occur in the same utterance. 
Prediction of these spoken phenomena would re- 
sult in a very high parsing cost. However if we 
can detect these oral phenomena with additional 
techniques combining partial results, the number 
of hypotheses at the semantic level will decrease. 
249 
Proceedings of EACL '99 
Corpus % complete \] Average no 
parses , of parses/utter. 
Cocad II 78.3 II 2.o 
Average no of 
partial results/utter. 
7.1 
Table 1: Global results for the parsing of the Gocad corpora utterances 
ill-formed with with with I agrammatical 
utterances hesitations repetitions self-repairs \[ ellipsis 
Occurrences II 123 II 28 22 II 15 
Table 2: Occurrences of error oral phenomena in the Gocad corpora 
1.2 Exploiting Lexicalized Tree 
Grammars 
The choice of a LTG (Lexicalized Tree Grammar), 
more specifically a LTAG (Lexicalized Tree Adjo- 
ing Grammar), can be justified by the two main 
following reasons: first the lexicalization and the 
extended domain of locality allow to express easily 
lexical constraints in partial parsing trees (elemen- 
tary trees), secondly robust bottom-up parsing al- 
gorithms, stochastic models and efficient precom- 
pilation of the grammar (Evans and Weir, 1998) 
exist for LTG. 
When the parsing of an utterance fails, a ro- 
bust bottom-up algorithm gives partial derived 
and derivation trees. With a classical chart pars- 
ing, items are obtained from other items and cor- 
respond to a well-recognized chunk of the utter- 
ance. The chart is an acyclic graph representing 
all the derivations. A partial result corresponds 
to the maximal expansion of an island, so to an 
item which is not the origin of any other item. 
The main difference between a Context Free 
Grammar and a Lexicalized Tree Grammar is that 
a tree directly encodes for a specific anchor a par- 
tial parsing tree. This representation is richer 
than a set of Context Free rules. We argue that 
we can exploit this feature by triggering rules not 
only according to the category of the node N cor- 
responding to an item but considering some nodes 
near N. 
2 Island representation and 
connected routes in repair local 
rules 
2.1 Finite States Automata 
representation of an elementary tree 
The linearization of a tree can be represented 
with a Finite State Automaton (FSA) as in figure 
2. Every tree traversal (left-to-right, bidirectional 
from an anchor, ...) can be performed on this au- 
tomaton. Doted trees used for example in (Sch- 
abes, 1994) are equivalent to the states of these 
automata. It is then possible to share all the FSA 
of a lexicalized grammar in a single one with tech- 
niques presented in (Evans and Weir, 1998). 
~ S 
<> 
S N$ V <> V S 
Figure 2: Simple FSA representing an elementary 
tree for the normal form of French intransive verb. 
We consider the following definitions and nota- 
tions : 
Each automaton transition is annotated with 
a category of node. Each non-leaf node ap- 
pears twice in the list of transition fram- 
ing the nodes which it dominates. In order 
to simplify our explanation the transition is 
shown by the annotated category. 
Transitions can be bidirectional in order to 
be able to start a bidirectional tree walk of a 
tree starting from any state. 
• Considering a direction of transition (left-to- 
right, right-to-left) the FSA becomes acyclic. 
2.2 Parsing invariant and island 
representation 
A set of FSA corresponds to a global represen- 
tation of the grammar, for the parsing we use 
a local representation called item. An item is 
defined as a 7-tuple of the following form: 
250 
Proceedings of EACL '99 
(a) Rule for hesitations : 
(i, j, rE, fR) (j, k, f£, f~) (k, l, o~, f~) 
(i, k, fL, fiR) (k, l, f~, o'~) (head(F'L) = tail(F'R) = H) 
(b) Rule for head ellipsis on the left : 
(i, j, aL, aR) (j, k, a~, a~) (tait(rR) = X, 
(i, k, aL, a~) head(UL) = X*) 
n ((head(r'L) = X $ 
n ta/l(r~) = X $)) 
V 
(c) Rule for argument ellipsis on the right : 
(i, j, oL, fR) (ta/l(rR) = X ~) (i, j, fL, next(rR)) 
(d) Rule 1 for self repair : 
O-r O-t (i,j, aL,aR) (j,k, L, R/ 
(i, k, aL, a'R) 
(3i = (v, w, a~, a~) E A, i ~* (i, j, aL, aR) 
(3X 6 r'~ A head(F~L) = X*)V 
(tail(r'~) = x $ i head(F'L) = X ~)) 
A 
Figure 1: Example of repair rules 
item: ( left index, right index, 
left state, right state, 
foot left index, 
foot right index, star state) 
The two first indices are the limits on the in- 
put string of the island (an anchor or consecutive 
anchors) corresponding to the item. During the 
initialization, we build an item for each anchor 
present in the input string. An item also stores 
two states of the same FSA corresponding to the 
maximal extension of the island on the left and 
on the right, and only if necessary we represent 
two additional indices for the position of the foot 
node of a wrapping auxiliary tree and the state 
star corresponding to the node where the current 
wrapping adjunction have been predicted. 
This representation maintains the following in- 
variant: an item of the form (p, q, fL, O'R) specifies 
the fact that the linearized tree represented by a 
FSA A is completely parsed between the states 
aL and ct R of A and between the indices p and q. 
No other attachment on the tree can happen on 
the nodes located between the anchors p and q-1. 
2.3 Connected routes 
Considering an automaton representing the lin- 
earization of an elementary tree, we can define a 
connected route as a part of this automaton corre- 
sponding to the list of nodes crossed successively 
until reaching a substitution, a foot node or a root 
node (included transition) or an anchor (excluded 
transition). Connected route is an intermediate 
level of granularity when representing a linearized 
tree: each elementary (or a derived tree) can be 
represented as a list of connected routes. Consid- 
ering connected routes during the parsing permits 
to take into account the topology of the elemen- 
tary trees and to locate significative nodes for an 
attachment (Loper, 1998b). We use the following 
additional simplified notations : 
• The connected route passing through the 
state ad is noted Fd. 
• next(r) (resp. previous(F)) gives the first 
state of the connected route after (resp. be- 
fore) F according to a left-to-right automaton 
walk. 
• next(N) (resp. previous(N)) gives the state 
after (resp. before) the transition N. 
• headiF. ) (resp. tail(F)) gives the first right 
(resp. left) transition of the leftmost (resp. 
rightmost) state of the connected route F. 
2.4 Inference rules system 
The derivation process can be viewed as infer- 
ence rules which use and introduce items. The 
inference rules (Schabes, 1994) have the following 
meaning, if q items (itemi)o<i<q are present in the 
chart and if the requirements are fulfilled then add 
the r items (itemj)o<_j<r in the chart i\[ necessary: 
(item~)o<~<q ( conditions ) add (itemj)o<j<r) 
We note O* the reflexive transitive closure 
of the derivation relation between two items: if 
il ~* i2 then the item identified with i2 can be ob- 
tained from il after applying to it a set of deriva- 
tions. We note a root node with $. 
Figure 1 presents examples of repair rules. This 
additional system deals with the following phe- 
nomena: 
251 
Proceedings of EACL '99 
ill-formed 
utterances 
% Correctly 
recovered 
with ii  ith L with unexpected 
hesitations repetitions self-repairs ellipsis 
Table 3: Repair results for the Gocad corpora 
• Hesitations : Rule (a) for hesitations absorbs 
adjacent initial trees whose head is a H node. 
Such a tree can correspond to different kind 
of hesitation. 
• Ellipsis : two rules and their symmetrical con- 
figurations try to detect and recover respec- 
tively an empty head (b) and an empty argu- 
ment (c). 
• Self-repair : The (Cori et ai., 1997) definition 
of self repairs stipulates that the right side of 
the interrupted structure (the partial derived 
tree on the left of the interruption point) and 
the reparandum (the adjacent syntactic is- 
land) must match. Instead of modifing the 
parsing algorithm as (Cori et al., 1997) do, we 
consider a more expressive connected route 
matching condition. Rule (d) deals with self- 
repair where the repaired structure has been 
connected on the target node. 
3 First results 
The rules has been implemented in Java and are 
integrated in a grammatical environment system 
dedicated to design and test the parsing of spo- 
ken dialogue system sublangages. We use a two 
stage strategy (Ros@ and Lavie, 1997) correspond- 
ing to two sets of rules: the first one is the set 
for a bottom-up parsing of LTAG using FSA and 
connected routes (Lopez, 1998b), the second one 
gathers the repair rules presented in this paper. 
This strategy separates parsing of grammatical 
utterances (resulting from substitution and ad- 
junction) from the parsing of admitted utterances 
(performed by the additional set). This kind of 
strategy permits to keep a normal parsing com- 
plexity when the utterance is grammatical. We 
present in table 3 statistics for the parsing repairs 
of the Gocad copora. 
Discussion 
Connected routes give robustness capacities in a 
Lexicalized Tree Framework. Note that the re- 
sults has been obtained for transcribed spoken 
language. Considering parsing of word-graphs re- 
sulting from a state-of-the-art HMM speech recog- 
nizer, non-regular phenomena encountered in spo- 
ken language might cause a recognition error on 
a neighbouring word and so could not always be 
detected. 
To prevent overgeneration during the second 
stage, both semantic additional well-formed crite- 
ria and a restrictive scoring method can be used. 
Future works will focus on a mecanism which al- 
lows a syntactic and semantic control in the case 
of robust parsing based on a LTAG and a syn- 
chronous Semantic Tree Grammar. 
References 
Marie-H@l~ne Candito. 1999. Structuration d'une 
grammaire LTAG : application au fran ais et d 
l'italien. Ph.D. thesis, University of Paris 7. 
Lanrent Chapelier, Christine Fay-Varnier, and 
Azim Roussanaiy. 1995. Modelling an Intel- 
ligent Help System from a Wizard of Oz Exper- 
iment. In ESCA Workshop on Spoken Dialogue 
Systems, Vigso, Danemark. 
Marcel Cori, Michel de Fornel, and Jean-Marie 
Marandin. 1997. Parsing Repairs. In Rus- 
lan Mitkov and Nicolas Nicolov, editors, Recent 
advances in natural language processing. John 
Benjamins. 
Roger Evans and David Weir. 1998. A structure- 
sharing parser for lexicaiized grammars. In 
COLING-ALC, Montr@al, Canada. 
Patrice Lopez. 1998a. A LTAG grammar for 
parsing incomplete and oral utterances. In 
European Conference on Artificial Intelligence 
(ECAI), Brighton, UK. 
Patrice Lopez. 1998b. Connection driven pars- 
ing of Lexicalized TAG. In Workshop on Text, 
Speech and Dialog (TSD), Brno, Czech Repub- 
lic. 
C.P. Ros@ and A. Lavie. 1997. An efficient dis- 
tribution of Labor in Two Stage Robust In- 
terpretation Process. In Proceeding of Empir- 
ical Methods in Natural Language Processing, 
EMNLP'97, Rhode Island, USA. 
Yves Schabes. 1994. Left to Right Parsing of 
Lexicalized Tree Adjoining Grammars. Com- 
putational Intelligence, 10:506-524. 
252 
