File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/c00-1010_metho.xml
Size: 23,926 bytes
Last Modified: 2025-10-06 14:07:07
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1010"> <Title>An Empirical Evaluation of LFG-DOP</Title> <Section position="3" start_page="0" end_page="64" type="metho"> <SectionTitle> 2 Summary of LFG-DOP and an Extension </SectionTitle> <Paragraph position="0"> In accordance with Bod (1998), a particular DOP model is described by specifying settings for the following four parameters: * a formal definition of a well-formed representation for utterance analyses, * a set of decomposition operations that divide a given utterance analysis into a set of fragments, * a set of composition operations by which such fragments may be recombined to derive an analysis of a new utterance, and * a probability model that indicates how the probability of a new utterance analysis is computed.</Paragraph> <Paragraph position="1"> In defining a DOP model for Lexical-Functional Grammar representations, Bod & Kaplan (1998) give the following settings for DOP's four parameters.</Paragraph> <Section position="1" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 2.1 Representations </SectionTitle> <Paragraph position="0"> The representations used by LFG-DOP are directly taken from LFG: they consist of a c-structure, an f-structure and a mapping φ between them (see Kaplan & Bresnan 1982). The following figure shows an example representation for the utterance Kim eats. (We leave out some features to keep the example simple.) [Figure 1. A representation for Kim eats: a c-structure over Kim eats, φ-linked to an f-structure containing PRED 'eat(SUBJ)' and a SUBJ with PRED 'Kim'.] Bod & Kaplan also introduce the notion of accessibility which they later use for defining the decomposition operations of LFG-DOP: An f-structure unit f is φ-accessible from a node n iff either n is φ-linked to f (that is, f = φ(n)) or f is contained within φ(n) (that is, there is a chain of attributes that leads from φ(n) to f).</Paragraph> <Paragraph position="1"> According to the LFG representation theory, c-structures and f-structures must satisfy certain formal well-formedness conditions. A c-structure/f-structure pair is a valid LFG representation only if it satisfies the Nonbranching Dominance, Uniqueness, Coherence and Completeness conditions (see Kaplan & Bresnan 1982).</Paragraph> </Section> <Section position="2" start_page="62" end_page="63" type="sub_section"> <SectionTitle> 2.2 Decomposition operations and Fragments </SectionTitle> <Paragraph position="0"> The fragments for LFG-DOP consist of connected subtrees whose nodes are in φ-correspondence with the corresponding sub-units of f-structures. To give a precise definition of LFG-DOP fragments, it is convenient to recall the decomposition operations employed by the simpler "Tree-DOP" model which is based on phrase-structure trees only (Bod 1998): (1) Root: the Root operation selects any node of a tree to be the root of the new subtree and erases all nodes except the selected node and the nodes it dominates.</Paragraph> <Paragraph position="1"> (2) Frontier: the Frontier operation then chooses a set (possibly empty) of nodes in the new subtree different from its root and erases all subtrees dominated by the chosen nodes; a sketch of these two operations on plain trees is given below.</Paragraph>
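<Paragraph> The following minimal Python sketch shows Root and Frontier on plain phrase-structure trees. The Node class and function names are our own illustrative assumptions (not notation from Bod 1998), and for simplicity frontier nodes are picked out by their labels.

from copy import deepcopy

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # an empty list marks a leaf

def root(selected):
    """Root: take the selected node as the root of a new subtree,
    erasing every node it does not dominate."""
    return deepcopy(selected)

def frontier(subtree, frontier_labels):
    """Frontier: erase all subtrees dominated by the chosen set of
    non-root nodes, leaving them as open substitution sites."""
    subtree = deepcopy(subtree)
    def prune(node):
        for child in node.children:
            if child.label in frontier_labels:
                child.children = []      # now an open leaf
            else:
                prune(child)
    prune(subtree)
    return subtree

# Example: the fragment "S with an open NP and the subtree [VP eats]".
s = Node("S", [Node("NP", [Node("Kim")]), Node("VP", [Node("eats")])])
fragment = frontier(root(s), {"NP"})
</Paragraph>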
<Paragraph position="2"> Bod & Kaplan extend Tree-DOP's Root and Frontier operations so that they also apply to the nodes of the c-structure in LFG, while respecting the fundamental principles of c-structure/f-structure correspondence. When a node is selected by the Root operation, all nodes outside of that node's subtree are erased, just as in Tree-DOP. Further, for LFG-DOP, all φ links leaving the erased nodes are removed and all f-structure units that are not φ-accessible from the remaining nodes are erased. For example, if Root selects the NP in figure 1, then the f-structure corresponding to the S node is erased, giving figure 2 as a possible fragment: [Figure 2. An LFG-DOP fragment obtained by Root.] In addition the Root operation deletes from the remaining f-structure all semantic forms that are local to f-structures that correspond to erased c-structure nodes, and it thereby also maintains the fundamental two-way connection between words and meanings. Thus, if Root selects the VP node so that the NP is erased, the subject semantic form "Kim" is also deleted: [Figure: the VP fragment for eats, whose f-structure retains PRED 'eat(SUBJ)'.] As with Tree-DOP, the Frontier operation then selects a set of frontier nodes and deletes all subtrees they dominate. Like Root, it also removes the φ links of the deleted nodes and erases any semantic form that corresponds to any of those nodes. Frontier does not delete any other f-structure features, however. For instance, if the NP in figure 1 is selected as a frontier node, Frontier erases the predicate "Kim" from the fragment:</Paragraph> <Paragraph position="4"> [Figure: the resulting fragment with an open NP node and the SUBJ's PRED erased.] Finally, Bod & Kaplan present a third decomposition operation, Discard, defined to construct generalizations of the fragments supplied by Root and Frontier. Discard acts to delete combinations of attribute-value pairs subject to the following condition: Discard does not delete pairs whose values φ-correspond to remaining c-structure nodes.</Paragraph> </Section> <Section position="3" start_page="63" end_page="63" type="sub_section"> <SectionTitle> 2.3 The composition operation </SectionTitle> <Paragraph position="0"> In LFG-DOP the operation for combining fragments, indicated by ∘, is carried out in two steps. First the c-structures are combined by leftmost substitution subject to the category-matching condition, just as in Tree-DOP (cf. Bod 1993, 1998). This is followed by the recursive unification of the f-structures corresponding to the matching nodes (a schematic sketch is given at the end of this subsection). A derivation for an LFG-DOP representation R is a sequence of fragments the first of which is labeled with S and for which the iterative application of the composition operation produces R.</Paragraph> <Paragraph position="1"> The two-stage composition operation is illustrated by a simple example. We therefore assume a corpus containing the representation in figure 1 for the sentence Kim eats and the representation in figure 6 for the sentence John fell.</Paragraph> <Paragraph position="2"> [Figure 6. A representation for John fell.]</Paragraph> <Paragraph position="3"> The next figure illustrates the composition operation using two fragments from this corpus, resulting in a representation for the new sentence Kim fell.</Paragraph> <Paragraph position="4"> This representation satisfies the well-formedness conditions and is therefore valid. Note that the sentence Kim fell can be parsed by fragments that are generated by the decomposition operations Root and Frontier only, without using generalized fragments (i.e. fragments generated by the Discard operation). Bod & Kaplan (1998) call a sentence "grammatical with respect to a corpus" if it can be parsed without generalized fragments. Generalized fragments are needed only to parse sentences that are "ungrammatical with respect to the corpus".</Paragraph>
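<Paragraph> As a schematic rendering of the two-step composition operation, the Python sketch below substitutes a fragment at the leftmost open leaf of the current c-structure and then unifies the corresponding f-structures, signalling a Uniqueness violation on clashing atomic values. All data structures and names are our own simplifying assumptions; in particular, the toy model gives each node its own f-structure and abstracts away from genuine φ-sharing between nodes.

def unify(f1, f2):
    """Recursively unify two f-structures, modeled as nested dicts.
    A clash of atomic values is a Uniqueness violation."""
    result = dict(f1)
    for attr, val in f2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            result[attr] = unify(result[attr], val)
        elif result[attr] != val:
            raise ValueError("Uniqueness violation at " + attr)
    return result

class Fragment:
    """A c-structure node carrying its f-structure; open substitution
    sites are leaves flagged with is_open."""
    def __init__(self, label, children=(), fstruct=None, is_open=False):
        self.label, self.children = label, list(children)
        self.fstruct = fstruct if fstruct is not None else {}
        self.is_open = is_open

def leftmost_open_site(node):
    """Depth-first search for the leftmost open leaf."""
    if node.is_open:
        return node
    for child in node.children:
        site = leftmost_open_site(child)
        if site is not None:
            return site
    return None

def compose(analysis, fragment):
    """Step 1: leftmost substitution under category matching.
    Step 2: unification of the corresponding f-structures."""
    site = leftmost_open_site(analysis)
    if site is None or site.label != fragment.label:
        raise ValueError("category-matching condition fails")
    site.children = fragment.children        # substitution
    site.is_open = False
    # merge in place so references to this f-structure stay valid
    site.fstruct.update(unify(site.fstruct, fragment.fstruct))
    return analysis

# Example: an S fragment with an open NP composed with an NP fragment.
s_frag = Fragment("S", [Fragment("NP", is_open=True),
                        Fragment("VP", [Fragment("fell")])],
                  {"PRED": "fall(SUBJ)"})
np_frag = Fragment("NP", [Fragment("Kim")], {"PRED": "Kim"})
result = compose(s_frag, np_frag)
</Paragraph>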
</Section> <Section position="4" start_page="63" end_page="64" type="sub_section"> <SectionTitle> 2.4 Probability models </SectionTitle> <Paragraph position="0"> As in Tree-DOP, an LFG-DOP representation R can typically be derived in many different ways. If each derivation D has a probability P(D), then the probability of deriving R is the sum of the individual derivation probabilities, as shown in (1):</Paragraph> <Paragraph position="1"> (1) P(R) = Σ_{D derives R} P(D)</Paragraph> <Paragraph position="2"> An LFG-DOP derivation is produced by a stochastic process which starts by randomly choosing a fragment whose c-structure is labeled with the initial category (e.g. S). At each subsequent step, a next fragment is chosen at random from among the fragments that can be composed with the current subanalysis. The chosen fragment is composed with the current subanalysis to produce a new one; the process stops when an analysis results with no non-terminal leaves. We will call the set of composable fragments at a certain step in the stochastic process the competition set at that step. Let CP(f | CS) denote the probability of choosing a fragment f from a competition set CS containing f; then the probability of a derivation D = <f1, f2 ... fk> is (2) P(<f1, f2 ... fk>) = Π_i CP(f_i | CS_i) where the competition probability CP(f | CS) is expressed in terms of fragment probabilities P(f):</Paragraph> <Paragraph position="3"> (3) CP(f | CS) = P(f) / Σ_{f' ∈ CS} P(f')</Paragraph> <Paragraph position="4"> Bod & Kaplan give three definitions of increasing complexity for the competition set: the first definition groups all fragments that only satisfy the Category-matching condition of the composition operation (thus leaving out the Uniqueness, Coherence and Completeness conditions); the second definition groups all fragments which satisfy both Category-matching and Uniqueness; and the third definition groups all fragments which satisfy Category-matching, Uniqueness and Coherence. Bod & Kaplan point out that the Completeness condition cannot be enforced at each step of the stochastic derivation process. It is a property of the final representation which can only be enforced by sampling valid representations from the output of the stochastic process.</Paragraph> <Paragraph position="5"> In this paper, we will only deal with the third definition of competition set, as it selects only those fragments at each derivation step that may finally result in a valid LFG representation, thus reducing the off-line validity checking just to the Completeness condition.</Paragraph> <Paragraph position="6"> Notice that the computation of the competition probability in (3) still requires a definition for the fragment probability P(f). Bod & Kaplan define the probability of a fragment simply as its relative frequency in the bag of all fragments generated from the corpus.</Paragraph> <Paragraph position="7"> Thus Bod & Kaplan do not distinguish between Root/Frontier-generated fragments and Discard-generated fragments, the latter being generalizations over Root/Frontier-generated fragments. Although Bod & Kaplan illustrate with a simple example that their probability model exhibits a preference for the most specific representation containing the fewest feature generalizations (mainly because specific representations tend to have more derivations than generalized representations), they do not perform an empirical evaluation of their model. We will assess their model on the LFG-annotated Verbmobil and Homecentre corpora in section 3 of this paper; a small computational sketch of equations (1)-(3) follows below.</Paragraph>
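<Paragraph> To make equations (1)-(3) concrete, here is a small Python sketch under our own illustrative naming: fragment probabilities as relative frequencies, the competition probability of equation (3), and the derivation probability of equation (2). Equation (1) then sums derivation probabilities over all derivations of R.

from collections import Counter

def fragment_probs(fragment_bag):
    """P(f): relative frequency of f in the bag of all fragments."""
    counts = Counter(fragment_bag)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def competition_prob(f, competition_set, P):
    """Equation (3): CP(f | CS) = P(f) / sum of P(f') for f' in CS."""
    return P[f] / sum(P[g] for g in competition_set)

def derivation_prob(steps, P):
    """Equation (2): the product over the derivation's steps, each
    step being a (chosen fragment, competition set) pair."""
    p = 1.0
    for f, cs in steps:
        p *= competition_prob(f, cs, P)
    return p

# Equation (1): P(R) is the sum of derivation_prob over all
# derivations that produce the representation R.
</Paragraph>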
<Paragraph position="8"> However, we will also assess an alternative definition of fragment probability which is a refinement of Bod & Kaplan's model. This definition does distinguish between fragments supplied by Root/Frontier and fragments supplied by Discard. We will treat the first type of fragments as seen events, and the second type of fragments as previously unseen events. We thus create two separate bags corresponding to two separate distributions: a bag with fragments generated by Root and Frontier, and a bag with fragments generated by Discard. We assign probability mass to the fragments of each bag by means of discounting: the relative frequencies of seen events are discounted and the gained probability mass is reserved for the bag of unseen events (cf. Ney et al. 1997). We accomplish this by a very simple estimator: the Turing-Good estimator (Good 1953), which computes the probability mass of unseen events as n1/N, where n1 is the number of singleton events and N is the total number of seen events. This probability mass is assigned to the bag of Discard-generated fragments. The remaining mass (1 - n1/N) is assigned to the bag of Root/Frontier-generated fragments. Thus the total probability mass is redistributed over the seen and unseen fragments. The probability of each fragment is then computed as its relative frequency in its bag multiplied by the probability mass assigned to this bag.2 [Footnote 2: Bod (2000) discusses some alternative fragment probability estimators, e.g. based on maximum likelihood.] Let |f| denote the frequency of a fragment f; then its probability is given by:</Paragraph> <Paragraph position="9"> (4) P(f | f is generated by Root/Frontier) = (1 - n1/N) · |f| / Σ_{f': f' is generated by Root/Frontier} |f'| (5) P(f | f is generated by Discard) = (n1/N) · |f| / Σ_{f': f' is generated by Discard} |f'| Note that this probability model assigns less probability mass to Discard-generated fragments than Bod & Kaplan's model. For each Root/Frontier-generated fragment there are exponentially many Discard-generated fragments (exponential in the number of features the fragment contains), which means that in Bod & Kaplan's model the Discard-generated fragments absorb a vast amount of probability mass. Our model, on the other hand, assigns a fixed probability mass to the distribution of Discard-generated fragments and therefore the exponential explosion of these fragments does not affect the probabilities of Root/Frontier-generated fragments. A small sketch of this two-bag estimator is given below.</Paragraph>
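<Paragraph> The following Python sketch implements equations (4) and (5) under our own naming assumptions: the Turing-Good mass n1/N is estimated from the Root/Frontier bag and reserved for Discard-generated fragments, and each fragment's probability is its relative frequency in its own bag scaled by that bag's mass.

from collections import Counter

def two_bag_probs(root_frontier_bag, discard_bag):
    """Discounted relative-frequency estimates over two fragment bags,
    per equations (4) and (5)."""
    rf_counts = Counter(root_frontier_bag)
    N = sum(rf_counts.values())                        # seen events
    n1 = sum(1 for c in rf_counts.values() if c == 1)  # singletons
    unseen_mass = n1 / N                               # Turing-Good (Good 1953)

    dis_counts = Counter(discard_bag)
    dis_total = sum(dis_counts.values())

    probs = {}
    for f, c in rf_counts.items():                     # equation (4)
        probs[("Root/Frontier", f)] = (1 - unseen_mass) * c / N
    for f, c in dis_counts.items():                    # equation (5)
        probs[("Discard", f)] = unseen_mass * c / dis_total
    return probs

The keys pair each fragment with its bag, keeping the two distributions separate even when the same fragment shape occurs in both.
</Paragraph>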
</Section> </Section> <Section position="4" start_page="64" end_page="67" type="metho"> <SectionTitle> 3 Testing the LFG-DOP model </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="64" end_page="65" type="sub_section"> <SectionTitle> 3.1 Computing the most probable analysis </SectionTitle> <Paragraph position="0"> In his PhD thesis, Cormons (1999) describes a parsing algorithm for LFG-DOP which is based on the Tree-DOP parsing technique given in Bod (1998). Cormons first converts LFG representations into more compact indexed trees: each node in the c-structure is assigned an index which refers to the φ-corresponding f-structure unit. For example, the representation in figure 6 is indexed as (S.1 (NP.2 John.2) (VP.1 fell.1)), where index 1 refers to the f-structure of the utterance as a whole and index 2 to its SUBJ f-structure.</Paragraph> <Paragraph position="2"> The indexed trees are then fragmented by applying the Tree-DOP decomposition operations described in section 2. Next, the LFG-DOP decomposition operations Root, Frontier and Discard are applied to the f-structure units that correspond to the indices in the c-structure subtrees.</Paragraph> <Paragraph position="3"> Having obtained the set of LFG-DOP fragments in this way, each test sentence is parsed by a bottom-up chart parser using initially the indexed subtrees only. Thus only the Category-matching condition is enforced during the chart-parsing process. The Uniqueness and Coherence conditions of the corresponding f-structure units are enforced during the disambiguation (or chart-decoding) process. Disambiguation is accomplished by computing a large number of random derivations from the chart; this technique is known as "Monte Carlo disambiguation" and has been extensively described in the literature (e.g. Bod 1998; Chappelier & Rajman 1998; Goodman 1998). Sampling a random derivation from the chart consists of choosing at random one of the fragments from the set of composable fragments at every labeled chart-entry (in a top-down, leftmost order so as to maintain the LFG-DOP derivation order). Thus the competition set of composable fragments is computed on the fly at each derivation step during the Monte Carlo sampling process by grouping the f-structure units that unify and that are coherent with the subderivation built so far.</Paragraph> <Paragraph position="4"> As mentioned in 2.4, the Completeness condition can only be checked after the derivation process. Incomplete derivations are simply removed from the sampling distribution. After sampling a large number of random derivations that satisfy the LFG validity requirements, the most probable analysis is estimated by the analysis which results most often from the sampled derivations; a schematic sampling loop is sketched at the end of this subsection. For our experiments in section 3.2, we used a sample size of N = 10,000 derivations, which corresponds to a maximal standard error σ of 0.005 (σ ≤ 1/(2√N), see Bod 1998).</Paragraph>
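<Paragraph> A schematic version of this sampling loop in Python follows. The chart-access helpers (initial_fragments, has_open_leaves, composable_fragments, is_complete, freeze) are our own hypothetical names standing in for the machinery described above, and compose is the composition operation; fragments are drawn from each competition set in proportion to their probabilities.

import random
from collections import Counter

def monte_carlo_disambiguate(chart, P, n_samples=10000):
    """Estimate the most probable analysis by sampling random
    derivations from the chart and keeping the valid ones."""
    tally = Counter()
    for _ in range(n_samples):
        analysis = random.choice(initial_fragments(chart))   # assumed helper
        while analysis is not None and has_open_leaves(analysis):
            cs = composable_fragments(chart, analysis)       # competition set
            if not cs:
                analysis = None            # dead end: discard this sample
                break
            weights = [P[f] for f in cs]
            frag = random.choices(cs, weights=weights, k=1)[0]
            analysis = compose(analysis, frag)
        # Completeness can only be checked on the final representation;
        # incomplete derivations are removed from the sample.
        if analysis is not None and is_complete(analysis):   # assumed helper
            tally[freeze(analysis)] += 1   # hashable rendering of the analysis
    return tally.most_common(1)[0][0]
</Paragraph>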
</Section> <Section position="2" start_page="65" end_page="67" type="sub_section"> <SectionTitle> 3.2 Experiments with LFG-DOP </SectionTitle> <Paragraph position="0"> We tested LFG-DOP on two LFG-annotated corpora: the Verbmobil corpus, which contains appointment planning dialogues, and the Homecentre corpus, which contains Xerox printer documentation. Both corpora have been annotated by Xerox PARC. They contain packed LFG-representations (Maxwell & Kaplan 1991) of the grammatical parses of each sentence together with an indication of which of these parses is the correct one. The parses are represented in a binary form and were debinarized using software provided to us by Xerox PARC.3 [Footnote 3: Thanks to Hadar Shemtov for providing us with the relevant software.] For our experiments we only used the correct parses of each sentence, resulting in 540 Verbmobil parses and 980 Homecentre parses. Each corpus was divided into a 90% training set and a 10% test set. This division was random except for one constraint: that all the words in the test set actually occurred in the training set. The sentences from the test set were parsed and disambiguated by means of the fragments from the training set. Due to memory limitations, we limited the depth of the indexed subtrees to 4. Because of the small size of the corpora, we averaged our results on 10 different training/test set splits.</Paragraph> <Paragraph position="1"> Besides an exact match accuracy metric, we also used a more fine-grained metric based on the well-known PARSEVAL metrics that evaluate phrase-structure trees (Black et al. 1991). The PARSEVAL metrics compare a proposed parse P with the corresponding correct treebank parse T as follows: Precision = (# correct constituents in P) / (# constituents in P); Recall = (# correct constituents in P) / (# constituents in T). In order to apply these metrics to LFG analyses, we extend the PARSEVAL notion of "correct constituent" in the following way: a constituent in P is correct if there exists a constituent in T with the same label that spans the same words and that φ-corresponds to the same f-structure unit.</Paragraph> <Paragraph position="2"> We illustrate the evaluation metrics with a simple example. In the next figure, a proposed parse P is compared with the correct parse T for the test sentence Kim fell. [Figure: the proposed parse P and the correct parse T for Kim fell, differing in the value of TENSE.] The proposed parse is incorrect since it has the incorrect feature value for the TENSE attribute. Thus, if this were the only test sentence, the exact match would be 0%. The precision, on the other hand, is higher than 0% as it compares the parse on a constituent basis. Both the proposed parse and the correct parse contain three constituents: S, NP and VP. While all three constituents in P have the same label and span the same words as in T, only the NP constituent in P also maps to the same f-structure unit as in T. The precision is thus equal to 1/3. Note that in this example the recall is equal to the precision, but this need not always be the case.</Paragraph> <Paragraph position="3"> In our experiments we are first of all interested in comparing the performance of Bod & Kaplan's probability model against our probability model (as explained in section 2.4). Moreover, we also want to study the contribution of Discard-generated fragments to the parse accuracy. We therefore created for each training set two sets of fragments: one which contains all fragments (up to depth 4) and one which excludes the fragments generated by Discard. The exclusion of the Discard-generated fragments means that all probability mass goes to the fragments generated by Root and Frontier, in which case our model is equivalent to Bod & Kaplan's. The following two tables present the results of our experiments, where +Discard refers to the full set of fragments and -Discard refers to the fragment set from which Discard-generated fragments are excluded. [Tables: exact match, precision and recall for Bod & Kaplan's model and our model, with and without Discard, on the Verbmobil and Homecentre corpora.] The tables show that Bod & Kaplan's model scores extremely badly if all fragments are used: the exact match is only 1.1% on the Verbmobil corpus and 2.7% on the Homecentre corpus, whereas our model scores respectively 35.9% and 38.4% on these corpora. Also the more fine-grained precision and recall scores of Bod & Kaplan's model are quite low: e.g. 13.8% and 11.5% on the Verbmobil corpus, where our model obtains 77.5% and 76.4%. We found that even for the few test sentences that occur literally in the training set, Bod & Kaplan's model does not always generate the correct analysis, whereas our model does. Interestingly, the accuracy of Bod & Kaplan's model is much higher if Discard-generated fragments are excluded.
This suggests that treating generalized fragments probabilistically in the same way as ungeneralized fragments is harmful.</Paragraph> <Paragraph position="4"> Cormons (1999) has made a mathematical observation which also shows that generalized fragments can get too much probability mass.</Paragraph> <Paragraph position="5"> The tables also show that our way of assigning probabilities to Discard-generated fragments leads only to a slight accuracy increase (compared to the experiments in which Discard-generated fragments are excluded). According to paired t-testing, none of these differences in accuracy were statistically significant. This suggests that Discard-generated fragments do not significantly contribute to the parse accuracy, or that perhaps these fragments are too numerous to be reliably estimated on the basis of our small corpora. We also varied the probability mass assigned to Discard-generated fragments: except for very small (≤ 0.01) or large (≥ 0.88) values, which led to an accuracy decrease, there was no significant change.4 [Footnote 4: Although Discard-generated fragments thus seem relatively unimportant for these corpora, they remain important for parsing ungrammatical sentences (which was the original motivation for including them -- see Bod & Kaplan 1998).] It is difficult to say how good or bad our results are with respect to other approaches. The only other published results on the LFG-annotated Verbmobil and Homecentre corpora are by Johnson et al. (1999) and Johnson & Riezler (2000), who use a log-linear model to estimate probabilities. But while we first parse the test sentences with fragments from the training set and subsequently compute the most probable parse, Johnson et al. directly use the packed LFG-representations from the test set to select the most probable parse, thereby completely skipping the parsing phase (Mark Johnson, p.c.). Moreover, 42% of the Verbmobil sentences and 51% of the Homecentre sentences are unambiguous (i.e. their packed LFG-representations contain only one analysis), which makes Johnson et al.'s task completely trivial for these sentences. In our approach, all test sentences were ambiguous, resulting in a much more difficult task. A quantitative comparison between our model and Johnson et al.'s is therefore meaningless.</Paragraph> <Paragraph position="6"> Finally, we are interested in the impact of functional structures on predicting the correct constituent structures. We therefore removed all f-structure units from the fragments (thus yielding a Tree-DOP model) and compared the results against our version of LFG-DOP (which includes the Discard-generated fragments). We evaluated the parse accuracy on the tree-structures only, using exact match together with the PARSEVAL measures. We used the same training/test set splits as in the previous experiments and limited the maximum subtree depth again to 4. The following tables show the results.</Paragraph> <Paragraph position="7"> [Tables: exact match, precision and recall on tree-structures for Tree-DOP and LFG-DOP on the Verbmobil and Homecentre corpora.] The results indicate that LFG-DOP's functional structures help to improve the parse accuracy of tree-structures. In other words, LFG-DOP outperforms Tree-DOP if evaluated on tree-structures only. According to paired t-tests, the differences in accuracy were statistically significant.</Paragraph> </Section> </Section> </Paper>