An Empirical Evaluation of LFG-DOP 
Rens Bod 
Informatics Research Institute, University of Leeds, Leeds LS2 9JT, &
Institute for Logic, Language and Computation, University of Amsterdam 
rens@scs.leeds.ac.uk
Abstract 
This paper presents an empirical assessment of the LFG-DOP model introduced by Bod & Kaplan (1998). The parser we describe uses fragments from LFG-annotated sentences to parse new sentences and Monte Carlo techniques to compute the most probable parse. While our main goal is to test Bod & Kaplan's model, we will also test a version of LFG-DOP which treats generalized fragments as previously unseen events. Experiments with the Verbmobil and Homecentre corpora show that our version of LFG-DOP outperforms Bod & Kaplan's model, and that LFG's functional information improves the parse accuracy of tree structures.
1 Introduction 
We present an empirical evaluation of the LFG-DOP model introduced by Bod & Kaplan (1998). LFG-DOP is a Data-Oriented Parsing (DOP) model (Bod 1993, 98) based on the syntactic representations of Lexical-Functional Grammar (Kaplan & Bresnan 1982). A DOP model provides linguistic representations for an unlimited set of sentences by generalizing from a given corpus of annotated exemplars. It operates by decomposing the given representations into (arbitrarily large) fragments and recomposing those pieces to analyze new sentences. The occurrence-frequencies of the fragments are used to determine the most probable analysis of a sentence.
So far, DOP models have been implemented for phrase-structure trees and logical-semantic representations (cf. Bod 1993, 98; Sima'an 1995, 99; Bonnema et al. 1997; Goodman 1998). However, these DOP models are limited in that they cannot account for underlying syntactic and semantic dependencies that are not reflected directly in a surface tree. DOP models for a number of richer representations have been explored (van den Berg et al. 1994; Tugwell 1995), but these approaches have remained context-free in their generative power. In contrast, Lexical-Functional Grammar (Kaplan & Bresnan 1982) is known to be beyond context-free. In Bod & Kaplan (1998), a first DOP model was proposed based on representations defined by LFG theory ("LFG-DOP").¹

¹ DOP models have recently also been proposed for Tree-Adjoining Grammar and Head-driven Phrase Structure Grammar (cf. Neumann & Flickinger 1999).
This model was studied from a mathematical perspective by Cormons (1999), who also carried out a first simple experiment with LFG-DOP. Next, Way (1999) studied LFG-DOP as an architecture for machine translation. The current paper contains the first extensive empirical evaluation of LFG-DOP on the currently available LFG-annotated corpora: the Verbmobil corpus and the Homecentre corpus. Both corpora were annotated at Xerox PARC.
Our parser uses fragments from LFG-annotated sentences to parse new sentences, and Monte Carlo techniques to compute the most probable parse. Although our main goal is to test Bod & Kaplan's LFG-DOP model, we will also test a modified version of LFG-DOP which uses a different model for computing fragment probabilities. While Bod & Kaplan treat all fragments as probabilistically equal regardless of whether they contain generalized features, we propose a more fine-grained probability model which treats fragments with generalized features as previously unseen events and assigns probabilities to these fragments by means of discounting. The experiments indicate that our probability model outperforms Bod & Kaplan's probability model on the Verbmobil and Homecentre corpora.
The rest of this paper is organized as follows: we first summarize the LFG-DOP model and present our proposed extension. Next, we explain the Monte Carlo parsing technique for estimating the most probable LFG-parse of a sentence. In section 3, we test our parser on sentences from the LFG-annotated corpora.
2 Summary of LFG-DOP and an Extension 
In accordance with Bod (1998), a particular DOP model is described by specifying settings for the following four parameters:

• a formal definition of a well-formed representation for utterance analyses,
• a set of decomposition operations that divide a given utterance analysis into a set of fragments,
• a set of composition operations by which such fragments may be recombined to derive an analysis of a new utterance, and
• a probability model that indicates how the probability of a new utterance analysis is computed.
In defining a DOP model for Lexical-Functional Grammar representations, Bod & Kaplan (1998) give the following settings for DOP's four parameters.
2.1 Representations 
The representations used by LFG-DOP are directly taken from LFG: they consist of a c-structure, an f-structure and a mapping φ between them (see Kaplan & Bresnan 1982). The following figure shows an example representation for the utterance Kim eats. (We leave out some features to keep the example simple.)
Figure 1. A representation for Kim eats
Bod & Kaplan also introduce the notion of accessibility, which they later use for defining the decomposition operations of LFG-DOP:

An f-structure unit f is φ-accessible from a node n iff either n is φ-linked to f (that is, f = φ(n)) or f is contained within φ(n) (that is, there is a chain of attributes that leads from φ(n) to f).
According to LFG representation theory, c-structures and f-structures must satisfy certain formal well-formedness conditions. A c-structure/f-structure pair is a valid LFG representation only if it satisfies the Nonbranching Dominance, Uniqueness, Coherence and Completeness conditions (see Kaplan & Bresnan 1982).
2.2 Decomposition operations and Fragments

The fragments for LFG-DOP consist of connected subtrees whose nodes are in φ-correspondence with the corresponding subunits of f-structures. To give a precise definition of LFG-DOP fragments, it is convenient to recall the decomposition operations employed by the simpler "Tree-DOP" model, which is based on phrase-structure trees only (Bod 1998); a small illustrative sketch follows the two definitions below:
(1) Root: the Root operation selects any node of a tree to be the root of the new subtree and erases all nodes except the selected node and the nodes it dominates.

(2) Frontier: the Frontier operation then chooses a set (possibly empty) of nodes in the new subtree different from its root and erases all subtrees dominated by the chosen nodes.
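As an illustration, here is a minimal Python sketch of these two operations on plain phrase-structure trees; the nested-list tree encoding and the function names are our assumptions for this example, not part of any DOP implementation:

    # A minimal sketch of Tree-DOP's Root and Frontier operations.
    # Trees are hypothetical nested lists [label, child, ...]; a leaf is a string.

    def root(node):
        """Root: the selected node becomes the new root; everything
        outside its subtree is implicitly erased."""
        return node

    def frontier(tree, frontier_ids):
        """Frontier: erase all subtrees dominated by the chosen frontier
        nodes, keeping only their root labels as open substitution sites."""
        if not isinstance(tree, list):
            return tree                      # a lexical leaf stays as-is
        if id(tree) in frontier_ids:
            return [tree[0]]                 # keep the label, erase below it
        return [tree[0]] + [frontier(c, frontier_ids) for c in tree[1:]]

    # Example: from the tree for "John fell", cut away the NP's daughter.
    t = ['S', ['NP', 'John'], ['VP', 'fell']]
    fragment = frontier(root(t), {id(t[1])})
    print(fragment)                          # ['S', ['NP'], ['VP', 'fell']]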
Bod & Kaplan extend Tree-DOP's Root and Frontier operations so that they also apply to the nodes of the c-structure in LFG, while respecting the fundamental principles of c-structure/f-structure correspondence.

When a node is selected by the Root operation, all nodes outside of that node's subtree are erased, just as in Tree-DOP. Further, for LFG-DOP, all φ links leaving the erased nodes are removed and all f-structure units that are not φ-accessible from the remaining nodes are erased. For example, if Root selects the NP in figure 1, then the f-structure corresponding to the S node is erased, giving figure 2 as a possible fragment:
Figure 2. An LFG-DOP fragment obtained by Root
In addition, the Root operation deletes from the remaining f-structure all semantic forms that are local to f-structures that correspond to erased c-structure nodes, and it thereby also maintains the fundamental two-way connection between words and meanings. Thus, if Root selects the VP node so that the NP is erased, the subject semantic form 'Kim' is also deleted:
Figure 3. Another LFG-DOP fragment
As with Tree-DOP, the Frontier operation then selects a set of frontier nodes and deletes all subtrees they dominate. Like Root, it also removes the φ links of the deleted nodes and erases any semantic form that corresponds to any of those nodes. Frontier does not delete any other f-structure features, however. For instance, if the NP in figure 1 is selected as a frontier node, Frontier erases the predicate 'Kim' from the fragment:
Figure 4. A Frontier-generated fragment
Finally, Bod & Kaplan present a third decomposition operation, Discard, defined to construct generalizations of the fragments supplied by Root and Frontier. Discard acts to delete combinations of attribute-value pairs subject to the following condition: Discard does not delete pairs whose values φ-correspond to remaining c-structure nodes. Discard produces fragments such as in figure 5, where the subject's number in figure 3 has been deleted:
eats    [ SUBJ  [ ]
          TENSE PRES
          PRED  'eat(SUBJ)' ]

Figure 5. A Discard-generated fragment
2.3 The composition operation 
In LFG-DOP the operation for combining fragments, indicated by ∘, is carried out in two steps. First the c-structures are combined by leftmost substitution subject to the category-matching condition, just as in Tree-DOP (cf. Bod 1993, 98). This is followed by the recursive unification of the f-structures corresponding to the matching nodes. A derivation for an LFG-DOP representation R is a sequence of fragments the first of which is labeled with S and for which the iterative application of the composition operation produces R.

The two-stage composition operation is illustrated by a simple example. We therefore assume a corpus containing the representation in figure 1 for the sentence Kim eats and the representation in figure 6 for the sentence John fell.
Figure 6. Corpus representation for John fell
Figure 7 shows the effect of the LFG-DOP composition 
operation using two fragments from this corpus, resulting 
in a representation for the new sentence Kim fell. 
Figure 7. Illustration of the composition operation
This representation satisfies the well-formedness conditions and is therefore valid. Note that the sentence Kim fell can be parsed by fragments that are generated by the decomposition operations Root and Frontier only, without using generalized fragments (i.e. fragments generated by the Discard operation). Bod & Kaplan (1998) call a sentence "grammatical with respect to a corpus" if it can be parsed without generalized fragments. Generalized fragments are needed only to parse sentences that are "ungrammatical with respect to the corpus".
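To make the second composition step concrete, the following small Python sketch shows recursive f-structure unification with a Uniqueness check; representing f-structures as plain dicts with string-valued atoms is our simplifying assumption, not LFG-DOP's actual data structure:

    # A sketch of the f-structure half of composition: recursive unification.
    # F-structures are modeled as nested dicts (an assumption for this example).

    def unify(f1, f2):
        """Unify two f-structures; raise ValueError on a Uniqueness violation."""
        result = dict(f1)
        for attr, val2 in f2.items():
            if attr not in result:
                result[attr] = val2
            elif isinstance(result[attr], dict) and isinstance(val2, dict):
                result[attr] = unify(result[attr], val2)   # recurse into sub-f-structure
            elif result[attr] != val2:
                raise ValueError(f"Uniqueness violation on {attr}")
        return result

    # Composing the f-structures of the fragments for "Kim" and "__ fell"
    # (cf. figure 7):
    kim  = {'SUBJ': {'PRED': 'Kim', 'NUM': 'SG'}}
    fell = {'TENSE': 'PAST', 'PRED': 'fall(SUBJ)', 'SUBJ': {'NUM': 'SG'}}
    print(unify(kim, fell))
    # {'SUBJ': {'PRED': 'Kim', 'NUM': 'SG'}, 'TENSE': 'PAST', 'PRED': 'fall(SUBJ)'}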
2.4 Probability models 
As in Tree-DOP, an LFG-DOP representation R can typically be derived in many different ways. If each derivation D has a probability P(D), then the probability of deriving R is the sum of the individual derivation probabilities, as shown in (1):

(1)  P(R) = Σ_{D derives R} P(D)
An LFG-DOP derivation is produced by a stochastic process which starts by randomly choosing a fragment whose c-structure is labeled with the initial category (e.g. S). At each subsequent step, a next fragment is chosen at random from among the fragments that can be composed with the current subanalysis. The chosen fragment is composed with the current subanalysis to produce a new one; the process stops when an analysis results with no non-terminal leaves. We will call the set of composable fragments at a certain step in the stochastic process the competition set at that step. Let CP(f | CS) denote the probability of choosing a fragment f from a competition set CS containing f; then the probability of a derivation D = <f1, f2 ... fk> is

(2)  P(<f1, f2 ... fk>) = Π_i CP(fi | CSi)

where the competition probability CP(f | CS) is expressed in terms of fragment probabilities P(f):

(3)  CP(f | CS) = P(f) / Σ_{f' ∈ CS} P(f')
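The following toy Python sketch restates formulas (1)-(3); the fragment probabilities and competition sets below are invented numbers, purely for illustration:

    # Formulas (1)-(3) as code. P maps fragments to probabilities P(f);
    # a derivation is a list of (chosen fragment, competition set) steps.

    def competition_prob(f, CS, P):
        """(3)  CP(f | CS) = P(f) / sum of P(f') for f' in CS"""
        return P[f] / sum(P[g] for g in CS)

    def derivation_prob(steps, P):
        """(2)  P(<f1 ... fk>) = product over i of CP(fi | CSi)"""
        p = 1.0
        for f, CS in steps:
            p *= competition_prob(f, CS, P)
        return p

    def representation_prob(derivations, P):
        """(1)  P(R) = sum over all derivations D of R of P(D)"""
        return sum(derivation_prob(D, P) for D in derivations)

    P = {'fA': 0.2, 'fB': 0.1, 'fC': 0.05}          # toy numbers
    D1 = [('fA', ['fA', 'fB']), ('fC', ['fC'])]     # a two-step derivation
    print(derivation_prob(D1, P))                   # (0.2/0.3) * 1.0 ≈ 0.667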
Bod & Kaplan give three definitions of increasing complexity for the competition set: the first definition groups all fragments that only satisfy the Category-matching condition of the composition operation (thus leaving out the Uniqueness, Coherence and Completeness conditions); the second definition groups all fragments which satisfy both Category-matching and Uniqueness; and the third definition groups all fragments which satisfy Category-matching, Uniqueness and Coherence. Bod & Kaplan point out that the Completeness condition cannot be enforced at each step of the stochastic derivation process. It is a property of the final representation which can only be enforced by sampling valid representations from the output of the stochastic process.
In this paper, we will only deal with the third definition of competition set, as it selects only those fragments at each derivation step that may finally result in a valid LFG representation, thus reducing the off-line validity checking just to the Completeness condition.
Notice that the computation of the competition probability in (3) still requires a definition for the fragment probability P(f). Bod & Kaplan define the probability of a fragment simply as its relative frequency in the bag of all fragments generated from the corpus. Thus Bod & Kaplan do not distinguish between Root/Frontier-generated fragments and Discard-generated fragments, the latter being generalizations over Root/Frontier-generated fragments. Although Bod & Kaplan illustrate with a simple example that their probability model exhibits a preference for the most specific representation containing the fewest feature generalizations (mainly because specific representations tend to have more derivations than generalized representations), they do not perform an empirical evaluation of their model. We will assess their model on the LFG-annotated Verbmobil and Homecentre corpora in section 3 of this paper.
However, we will also assess an alternative definition of fragment probability which is a refinement of Bod & Kaplan's model. This definition does distinguish between fragments supplied by Root/Frontier and fragments supplied by Discard. We will treat the first type of fragments as seen events, and the second type of fragments as previously unseen events. We thus create two separate bags corresponding to two separate distributions: a bag with fragments generated by Root and Frontier, and a bag with fragments generated by Discard. We assign probability mass to the fragments of each bag by means of discounting: the relative frequencies of seen events are discounted and the gained probability mass is reserved for the bag of unseen events (cf. Ney et al. 1997). We accomplish this by a very simple estimator: the Turing-Good estimator (Good 1953), which computes the probability mass of unseen events as n1/N, where n1 is the number of singleton events and N is the total number of seen events. This probability mass is assigned to the bag of Discard-generated fragments. The remaining mass (1 - n1/N) is assigned to the bag of Root/Frontier-generated fragments.
l'ragments. Thus tim total probability mass is 
redistributed over tim seen and unseen fragments. The 
probability of each l'ragment is then computed as its 
relative frequency 2 in its bag multiplied by the prol)a- 
bility mass assigned to this bag. Let Ill denote the 
frequency of a fragment f, then its probability is given 
by: 
2 Bed (2000) discusses some alternative fragment probability 
estimators, e.g. based on maximum likelihood. 
(4) P(/'ll'is generated byRootlFrontier) = 
(I - n i/N) Ifl 
"~-"J': .fis generated by Rool/Fro,,tier I:'l 
(5) PUlJis generated byDiscard) = 
Ill 0q/N) 
"~'f:fisgene,'aledby Discard I.rl 
Note that this probability model assigns less probability mass to Discard-generated fragments than Bod & Kaplan's model. For each Root/Frontier-generated fragment there are exponentially many Discard-generated fragments (exponential in the number of features the fragment contains), which means that in Bod & Kaplan's model the Discard-generated fragments absorb a vast amount of probability mass. Our model, on the other hand, assigns a fixed probability mass to the distribution of Discard-generated fragments, and therefore the exponential explosion of these fragments does not affect the probabilities of Root/Frontier-generated fragments.
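The discounting scheme of formulas (4) and (5) can be stated in a few lines of Python; the fragment bags below are toy counts, and the encoding of fragments as strings is an assumption made for the example:

    from collections import Counter

    # Turing-Good discounting over two fragment bags: the unseen mass n1/N
    # goes to the Discard bag, the rest stays with Root/Frontier fragments.

    def fragment_probs(root_frontier_bag, discard_bag):
        N  = sum(root_frontier_bag.values())                       # total seen events
        n1 = sum(1 for c in root_frontier_bag.values() if c == 1)  # singleton events
        unseen_mass = n1 / N                # Turing-Good estimate (Good 1953)

        probs = {}
        for f, c in root_frontier_bag.items():                     # formula (4)
            probs[f] = (1 - unseen_mass) * c / N
        M = sum(discard_bag.values())
        for f, c in discard_bag.items():                           # formula (5)
            probs[f] = unseen_mass * c / M
        return probs

    rf = Counter({'S->NP VP': 3, 'NP->Kim': 1, 'VP->fell': 1})     # toy bags
    dc = Counter({'S->NP VP (gen)': 2, 'VP->fell (gen)': 2})
    p = fragment_probs(rf, dc)
    print(p['NP->Kim'], p['S->NP VP (gen)'])    # ≈ 0.12 and 0.2 (n1/N = 2/5)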
3 Testing the LFG-DOP model

3.1 Computing the most probable analysis
In his PhD thesis, Cormons (1999) describes a parsing algorithm for LFG-DOP which is based on the Tree-DOP parsing technique given in Bod (1998). Cormons first converts LFG-representations into more compact indexed trees: each node in the c-structure is assigned an index which refers to the φ-corresponding f-structure unit. For example, the representation in figure 6 is indexed as

(S.1 (NP.2 John.2)
     (VP.1 fell.1))

where

1 --> [ (SUBJ = 2)
        (TENSE = PAST)
        (PRED = fall(SUBJ)) ]
2 --> [ (PRED = John)
        (NUM = SG) ]
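One possible in-memory encoding of such an indexed tree, in Python, is a (label, index, children) tuple per c-structure node plus a table mapping indices to f-structure units; this encoding is our assumption, not Cormons' actual data structure:

    # The indexed tree for "John fell" (cf. the display above), encoded as
    # (label, index, children) tuples plus an index -> f-structure table.

    ctree = ('S', 1, [('NP', 2, [('John', 2, [])]),
                      ('VP', 1, [('fell', 1, [])])])

    fstructs = {
        1: {'SUBJ': 2, 'TENSE': 'PAST', 'PRED': 'fall(SUBJ)'},
        2: {'PRED': 'John', 'NUM': 'SG'},
    }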
The indexed trees are then fragmented by applying the Tree-DOP decomposition operations described in section 2. Next, the LFG-DOP decomposition operations Root, Frontier and Discard are applied to the f-structure units that correspond to the indices in the c-structure subtrees. Having obtained the set of LFG-DOP fragments in this way, each test sentence is parsed by a bottom-up chart parser using initially the indexed subtrees only. Thus only the Category-matching condition is enforced during the chart-parsing process. The Uniqueness and Coherence conditions of the corresponding f-structure units are enforced during the disambiguation (or chart-decoding) process. Disambiguation is accomplished by
computing a large number of random derivations from the chart; this technique is known as "Monte Carlo disambiguation" and has been extensively described in the literature (e.g. Bod 1998; Chappelier & Rajman 1998; Goodman 1998). Sampling a random derivation from the chart consists of choosing at random one of the fragments from the set of composable fragments at every labeled chart-entry (in a top-down, leftmost order so as to maintain the LFG-DOP derivation order). Thus the competition set of composable fragments is computed on the fly at each derivation step during the Monte Carlo sampling process, by grouping the f-structure units that unify and that are coherent with the subderivation built so far.
As mentioned in 2.4, the Completeness condition can only be checked after the derivation process. Incomplete derivations are simply removed from the sampling distribution. After sampling a large number of random derivations that satisfy the LFG validity requirements, the most probable analysis is estimated by the analysis which results most often from the sampled derivations. For our experiments in section 3.2, we used a sample size of N = 10,000 derivations, which corresponds to a maximal standard error σ of 0.005 (σ ≤ 1/(2√N); see Bod 1998).
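Schematically, the disambiguation stage looks as follows in Python; sample_derivation is a hypothetical stand-in for drawing one random, valid derivation from the chart as described above:

    import random
    from collections import Counter

    # Monte Carlo disambiguation: sample many random derivations and return
    # the analysis that results most often from them.

    def monte_carlo(sample_derivation, n_samples=10_000):
        counts = Counter()
        for _ in range(n_samples):
            analysis = sample_derivation()   # one random derivation's analysis
            if analysis is not None:         # Incomplete derivations are dropped
                counts[analysis] += 1
        return counts.most_common(1)[0][0]   # most frequently derived analysis

    # With N = 10,000 samples the standard error is at most 1/(2*sqrt(N)) = 0.005.
    def sample_derivation():                 # toy stand-in distribution
        return random.choices(['parse_A', 'parse_B'], weights=[0.7, 0.3])[0]

    print(monte_carlo(sample_derivation))    # almost surely 'parse_A'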
3.2 Experiments with LFG-DOP 
We tested LFG-DOP on two LFG-annotated corpora: the Verbmobil corpus, which contains appointment planning dialogues, and the Homecentre corpus, which contains Xerox printer documentation. Both corpora have been annotated by Xerox PARC. They contain packed LFG-representations (Maxwell & Kaplan 1991) of the grammatical parses of each sentence, together with an indication of which of these parses is the correct one. The parses are represented in a binary form and were debinarized using software provided to us by Xerox PARC (thanks to Hadar Shemtov for providing us with the relevant software). For our experiments we only used the correct parses of each sentence, resulting in 540 Verbmobil parses and 980 Homecentre parses. Each corpus was divided into a 90% training set and a 10% test set. This division was random except for one constraint: that all the words in the test set actually occurred in the training set. The sentences from the test set were parsed and disambiguated by means of the fragments from the
training set. Due to memory limitations, we limited the depth of the indexed subtrees to 4. Because of the small size of the corpora, we averaged our results over 10
different training/test set splits. Besides an exact match accuracy metric, we also used a more fine-grained metric based on the well-known PARSEVAL metrics that evaluate phrase-structure trees (Black et al. 1991). The PARSEVAL metrics compare a proposed parse P with the corresponding correct treebank parse T as follows:
Precision = # correct constituents in P / # constituents in P

Recall = # correct constituents in P / # constituents in T
In order to apply these metrics to LFG analyses, we extend the PARSEVAL notion of "correct constituent" in the following way: a constituent in P is correct if there exists a constituent in T of the same label that spans the same words and that φ-corresponds to the same f-structure unit.
We illustrate the evaluation metrics with a simple example. In the next figure, a proposed parse P is compared with the correct parse T for the test sentence Kim fell. The proposed parse is incorrect since it has the incorrect feature value for the TENSE attribute. Thus, if this were the only test sentence, the exact match would be 0%. The precision, on the other hand, is higher than 0%, as it compares the parses on a constituent basis. Both the proposed parse and the correct parse contain three constituents: S, NP and VP. While all three constituents in P have the same label and span the same words as in T, only the NP constituent in P also maps to the same f-structure unit as in T. The precision is thus equal to 1/3. Note that in this example the recall is equal to the precision, but this need not always be the case.
Proposed parse P
Correct parse T
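The extended metrics are straightforward to compute once each parse is reduced to a set of (label, span, f-structure unit) triples; the sketch below encodes the Kim fell example, with the f-structure units abbreviated to hypothetical identifiers:

    # Extended PARSEVAL: a constituent counts as correct only if its label,
    # word span, and phi-corresponding f-structure unit all match.

    def precision_recall(proposed, correct):
        """Each parse is a set of (label, (start, end), fstruct_id) triples."""
        matched = proposed & correct
        return len(matched) / len(proposed), len(matched) / len(correct)

    # Kim fell: the S and VP of the proposed parse map to an f-structure
    # with the wrong TENSE value, so only the NP constituent matches.
    P = {('S', (0, 2), 'f_bad'), ('NP', (0, 1), 'f_kim'), ('VP', (1, 2), 'f_bad')}
    T = {('S', (0, 2), 'f_good'), ('NP', (0, 1), 'f_kim'), ('VP', (1, 2), 'f_good')}
    print(precision_recall(P, T))            # ≈ (1/3, 1/3)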
In our experiments we are first of all interested in comparing the performance of Bod & Kaplan's probability model against our probability model (as explained in section 2.4). Moreover, we also want to study the contribution of Discard-generated fragments to the parse accuracy. We therefore created for each training set two sets of fragments: one which contains all fragments (up to depth 4) and one which excludes the fragments generated by Discard. The exclusion of the Discard-generated fragments means that all probability mass goes to the fragments generated by Root and Frontier, in which case our model is equivalent to Bod & Kaplan's. The following two tables present the results of our experiments, where +Discard refers to the full set of fragments and -Discard refers to the fragment set without Discard-generated fragments.
                  Exact Match            Precision              Recall
                  +Discard   -Discard    +Discard   -Discard    +Discard   -Discard
Bod & Kaplan 98   1.1%       35.2%       13.8%      76.0%       11.5%      74.9%
Our Model         35.9%      35.2%       77.5%      76.0%       76.4%      74.9%

Table 1. Experimental results on the Verbmobil corpus
                  Exact Match            Precision              Recall
                  +Discard   -Discard    +Discard   -Discard    +Discard   -Discard
Bod & Kaplan 98   2.7%       37.9%       17.1%      77.8%       15.5%      77.2%
Our Model         38.4%      37.9%       80.0%      77.8%       78.6%      77.2%

Table 2. Experimental results on the Homecentre corpus
The tables show that Bod & Kaplan's model scores extremely badly if all fragments are used: the exact match is only 1.1% on the Verbmobil corpus and 2.7% on the Homecentre corpus, whereas our model scores respectively 35.9% and 38.4% on these corpora. The more fine-grained precision and recall scores of Bod & Kaplan's model are also quite low: e.g. 13.8% and 11.5% on the Verbmobil corpus, where our model obtains 77.5% and 76.4%. We found that even for the few test sentences that occur literally in the training set, Bod & Kaplan's model does not always generate the correct analysis, whereas our model does. Interestingly, the accuracy of Bod & Kaplan's model is much higher if Discard-generated fragments are excluded. This suggests that treating generalized fragments probabilistically in the same way as ungeneralized fragments is harmful. Cormons (1999) has made a mathematical observation which also shows that generalized fragments can get too much probability mass.
The tables also show that our way of assigning probabilities to Discard-generated fragments leads only to a slight accuracy increase (compared to the experiments in which Discard-generated fragments are excluded). According to paired t-testing, none of these differences in accuracy were statistically significant. This suggests that Discard-generated fragments do not significantly contribute to the parse accuracy, or that perhaps these fragments are too numerous to be reliably estimated on the basis of our small corpora. We also varied the probability mass assigned to Discard-generated fragments: except for very small (≤ 0.01) or large values (≥ 0.88), which led to an accuracy decrease, there was no significant change.⁴

⁴ Although generalized fragments thus seem statistically unimportant for these corpora, they remain important for parsing ungrammatical sentences (which was the original motivation for including them; see Bod & Kaplan 1998).
It is difficult to say how good or bad our results are with respect to other approaches. The only other published results on the LFG-annotated Verbmobil and Homecentre corpora are by Johnson et al. (1999) and Johnson & Riezler (2000), who use a log-linear model to estimate probabilities. But while we first parse the test sentences with fragments from the training set and subsequently compute the most probable parse, Johnson et al. directly use the packed LFG-representations from the test set to select the most probable parse, thereby completely skipping the parsing phase (Mark Johnson, p.c.). Moreover, 42% of the Verbmobil sentences and 51% of the Homecentre sentences are unambiguous (i.e. their packed LFG-representations contain only one analysis), which makes Johnson et al.'s task completely trivial for these sentences. In our approach, all test sentences were ambiguous, resulting in a much more difficult task. A quantitative comparison between our model and Johnson et al.'s is therefore meaningless.
Finally, we are interested in the impact of functional structures on predicting the correct constituent structures. We therefore removed all f-structure units from the fragments (thus yielding a Tree-DOP model) and compared the results against our version of LFG-DOP (which includes the Discard-generated fragments). We evaluated the parse accuracy on the tree-structures only, using exact match together with the PARSEVAL measures. We used the same training/test set splits as in the previous experiments and limited the maximum subtree depth again to 4. The following tables show the results.
           Exact Match   Precision   Recall
Tree-DOP   46.6%         88.9%       86.7%
LFG-DOP    50.8%         90.3%       88.4%

Table 3. C-structure accuracy on the Verbmobil
           Exact Match   Precision   Recall
Tree-DOP   49.0%         93.4%       92.1%
LFG-DOP    53.2%         95.8%       94.7%

Table 4. C-structure accuracy on the Homecentre
The results indicate that LFG-DOP's functional structures help to improve the parse accuracy of tree-structures. In other words, LFG-DOP outperforms Tree-DOP if evaluated on tree-structures only. According to paired t-tests, the differences in accuracy were statistically significant.
4 Conclusion 
We have given an empirical assessment of the LFG-DOP model introduced by Bod & Kaplan (1998). We developed a new probability model for LFG-DOP which treats fragments with generalized features as previously unseen events. The experiments showed that our probability model outperforms Bod & Kaplan's model on the Verbmobil and Homecentre corpora. Moreover, Bod & Kaplan's model turned out to be inadequate in dealing with generalized fragments. We also established that the contribution of generalized fragments to the parse accuracy in our model is minimal and statistically insignificant. Finally, we showed that LFG's functional structures contribute to significantly higher parse accuracy on tree structures. This suggests that our model may be successfully used to exploit the functional annotations in the Penn Treebank (Marcus et al. 1994), provided that these annotations can be converted into LFG-style functional structures. As future research, we want to test LFG-DOP using log-linear models, as such models maximize the likelihood of the training corpus.

References 

M. van den Berg, R. Bod and R. Scha, 1994. "A Corpus-Based
Approach to Semantic Interpretation", Proceedings 
Ninth Amsterdam Colloquium, Amsterdam, The 
Netherlands. 

E. Black et al., 1991. "A Procedure for Quantitatively 
Comparing the Syntactic Coverage of English", 
Proceedings DARPA Speech and Natural Language 
Workshop, Pacific Grove, Morgan Kaufmann.

R. Bod, 1993. "Using an Annotated Language Corpus as a Virtual Stochastic Grammar", Proceedings AAAI'93,
Washington D.C. 

R. Bod, 1998. Beyond Grammar, CSLI Publications, Cambridge
University Press. 

R. Bod, 2000. "Parsing with the Shortest Derivation", Proceedings COLING-2000, Saarbrücken, Germany.

R. Bod and R. Kaplan, 1998. "A Probabilistic Corpus-Driven 
Model for Lexical Functional Analysis", Proceedings 
COLING-ACL'98, Montreal, Canada. 

R. Bonnema, R. Bod and R. Scha, 1997. "A DOP Model for Semantic Interpretation", Proceedings ACL/EACL'97,
Madrid, Spain. 

J. Chappelier and M. Rajman, 1998. "Extraction stochastique d'arbres d'analyse pour le modèle DOP", Proceedings
TALN'98, Paris, France. 

B. Cormons, 1999. Analyse et désambiguïsation: Une approche à base de corpus (Data-Oriented Parsing) pour les représentations lexicales fonctionnelles. PhD thesis, Université de Rennes, France.

I. Good, 1953. "The Population Frequencies of Species and the 
Estimation of Population Parameters", Biometrika 40, 
237-264. 

J. Goodman, 1998. Parsing Inside-Out, PhD thesis, Harvard
University, Mass. 

M. Johnson, S. Geman, S. Canon, Z. Chi and S. Riezler, 1999. 
"Estimators for Stochastic Unification-Based Grammars", Proceedings ACL'99, Maryland. 

M. Johnson and S. Riezler, 2000. "Exploiting Auxiliary 
Distributions in Stochastic Unification-Based Grammars", Proceedings ANLP-NAACL-2000, Seattle, 
Washington. 

R. Kaplan and J. Bresnan, 1982. "Lexical-Functional
Grammar: A Formal System for Grammatical 
Representation", in J. Bresnan (ed.), The Mental 
Representation of Grammatical Relations, The MIT 
Press, Cambridge, Mass. 

M. Marcus, G. Kim, M. Marcinkiewicz, R. MacIntyre, A. Bies, 
M. Ferguson, K. Katz and B. Schasberger, 1994. "The 
Penn Treebank: Annotating Predicate Argument 
Structure". In: ARPA Human Language Technology 
Workshop, 110-115.

J. Maxwell and R. Kaplan, 1991. "A Method for Disjunctive 
Constraint Satisfaction", in M. Tomita (ed.), Current 
Issues in Parsing Technology, Kluwer Academic
Publishers. 

G. Neumann and D. Flickinger, 1999. "Learning Stochastic 
Lexicalized Tree Grammars from HPSG", DFKI Technical Report, Saarbrücken, Germany.

H. Ney, S. Martin and F. Wessel, 1997. "Statistical Language 
Modeling Using Leaving-One-Out", in S. Young & G. 
Bloothooft (eds.), Corpus-Based Methods in Language 
and Speech Processing, Kluwer Academic Publishers. 

K. Sima'an, 1995. "An optimized algorithm for Data Oriented Parsing", in R. Mitkov and N. Nicolov (eds.), Recent Advances in Natural Language Processing 1995, volume 136 of Current Issues in Linguistic Theory. John Benjamins, Amsterdam.

K. Sima'an, 1999. Learning Efficient Disambiguation. PhD
thesis, ILLC dissertation series number 1999-02. 
Utrecht / Amsterdam. 

D. Tugwell, 1995. "A State-Transition Grammar for Data-Oriented Parsing", Proceedings European Chapter of the ACL'95, Dublin, Ireland.

A. Way, 1999. "A Hybrid Architecture for Robust MT using LFG-DOP", Journal of Experimental and Theoretical Artificial Intelligence 11 (Special Issue on Memory-Based Language Processing).
