<?xml version="1.0" standalone="yes"?> <Paper uid="J98-2002"> <Title>Generalizing Case Frames Using a Thesaurus and the MDL Principle</Title> <Section position="9" start_page="238" end_page="240" type="concl"> <SectionTitle> 5. Conclusions </SectionTitle> <Paragraph position="0"> We proposed a new method of generalizing case frames. Our approach of applying MDL to estimate a tree cut model in an existing thesaurus is not limited to the problem of generalizing the values of a case frame slot. It is potentially useful in other natural language processing tasks, such as the problem of estimating n-gram models (Brown et al. 1992) or the problem of semantic tagging (Cucchiarelli and Velardi 1997).</Paragraph> <Paragraph position="1"> We believe that our method has the following merits: (1) it is theoretically sound; (2) it is computationally efficient; (3) it is robust against noise. Our experimental results indicate that the performance of our method is better than, or at least comparable to, that of existing methods. One disadvantage of our method is that its performance depends on the structure of the particular thesaurus used. This, however, is a problem shared by any generalization method that uses a thesaurus as prior knowledge.</Paragraph> <Section position="1" start_page="239" end_page="240" type="sub_section"> <SectionTitle> Appendix A: Proof of Proposition 1 </SectionTitle> <Paragraph position="0"> Proof. For an arbitrary subtree $T'$ of a thesaurus tree $T$ and an arbitrary tree cut model $M = (\Gamma, \theta)$ of $T$, let $M_{T'} = (\Gamma_{T'}, \theta_{T'})$ denote the submodel of $M$ that is contained in $T'$. Also, for any sample $S$ and any subtree $T'$ of $T$, let $S_{T'}$ denote the subsample of $S$ contained in $T'$. (Note that $M_T = M$ and $S_T = S$.) Then define, in general for any submodel $M_{T'}$ and subsample $S_{T'}$, $L(S_{T'} \mid \Gamma_{T'}, \hat{\theta}_{T'})$ to be the data description length of subsample $S_{T'}$ using submodel $M_{T'}$, $L(\hat{\theta}_{T'} \mid \Gamma_{T'})$ to be the parameter description length for the submodel $M_{T'}$, and $L'(M_{T'}, S_{T'})$ to be $L(S_{T'} \mid \Gamma_{T'}, \hat{\theta}_{T'}) + L(\hat{\theta}_{T'} \mid \Gamma_{T'})$. (Note that, when calculating the parameter description length for a submodel, the size of the entire sample $|S|$ is used.) First note that for any (sub)tree $T$, (sub)model $M_T = (\Gamma_T, \hat{\theta}_T)$ contained in $T$, (sub)sample $S_T$ contained in $T$, and $T$'s child subtrees $T_i : i = 1, \ldots, k$, we have:</Paragraph> <Paragraph position="1"> $$L'(M_T, S_T) = \sum_{i=1}^{k} L'(M_{T_i}, S_{T_i}) \qquad (17)$$ </Paragraph> <Paragraph position="2"> provided that $\Gamma_T$ is not a single node (the root node of $T$). This follows from the mutual disjointness of the $T_i$ and the independence of the parameters in the $T_i$.</Paragraph> <Paragraph position="3"> We also have, when $T$ is a proper subtree of the thesaurus tree:</Paragraph> <Paragraph position="4"> $$L(\hat{\theta}_T \mid \Gamma_T) = \frac{|\Gamma_T|}{2} \times \log |S| \qquad (18)$$ </Paragraph> <Paragraph position="5"> Since the number of free parameters of a model in the entire thesaurus tree equals the number of nodes in the model minus one, due to the stochastic condition (that the probability parameters must sum to one), when $T$ equals the entire thesaurus tree, theoretically the parameter description length for a tree cut model of $T$ should be:</Paragraph> <Paragraph position="6"> $$L(\hat{\theta} \mid \Gamma) = \frac{|\Gamma| - 1}{2} \times \log |S| = \frac{|\Gamma|}{2} \log |S| - \frac{\log |S|}{2} \qquad (19)$$ </Paragraph> <Paragraph position="7"> where $|S|$ is the size of the entire sample. Since the second term $-\frac{\log |S|}{2}$ in (19) is constant once the input sample $S$ is fixed, it is irrelevant for the purpose of finding a model with the minimum description length. We will thus use the identity (18) both when $T$ is the entire tree and when it is a proper subtree. (This allows us to use the same recursive algorithm, Find-MDL, in all cases.) It follows from (17) and (18) that the minimization of description length can be done essentially independently for each subtree. Namely, if we let $l_{\min}(M_T, S_T)$ denote the minimum description length (as defined by (17) and (18)) achievable for (sub)model $M_T$ on (sub)sample $S_T$ contained in (sub)tree $T$, $\hat{P}(n)$ the MLE estimate for node $n$ using the entire sample $S$, and $\mathrm{root}(T)$ the root node of tree $T$, then we have:</Paragraph> <Paragraph position="8"> $$l_{\min}(M_T, S_T) = \min \left\{ L'\big( (\{\mathrm{root}(T)\}, \hat{P}(\mathrm{root}(T))), S_T \big),\ \sum_{i=1}^{k} l_{\min}(M_{T_i}, S_{T_i}) \right\} \qquad (20)$$ </Paragraph>
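The recursion in (20) is what allows a simple bottom-up procedure: each subtree is either collapsed to its root or left as the union of its children's optimal cuts, whichever yields the smaller description length. The Python sketch below illustrates this idea under simplifying assumptions; the Node class, the uniform spreading of a class's probability over its leaves, and the helper names (total_freq, num_leaves, description_length, find_mdl) are illustrative choices, not the paper's actual implementation of Find-MDL.

    from math import log2

    class Node:
        """Thesaurus node: a leaf carries the frequency with which its noun
        appears in the sample; an internal node carries child subtrees.
        (Illustrative data structure, not taken from the paper.)"""
        def __init__(self, name, children=None, freq=0):
            self.name = name
            self.children = children or []  # empty list for a leaf
            self.freq = freq                # leaf frequency count

    def total_freq(node):
        # Total frequency of the subsample dominated by this node.
        if not node.children:
            return node.freq
        return sum(total_freq(c) for c in node.children)

    def num_leaves(node):
        # Number of noun leaves dominated by this node.
        if not node.children:
            return 1
        return sum(num_leaves(c) for c in node.children)

    def description_length(cut, sample_size):
        """L'(M_T, S_T) for a cut: the data description length under the MLE,
        with each class's probability spread uniformly over its leaves, plus
        the parameter description length (|cut| / 2) * log |S|, i.e. identity
        (18) applied uniformly to every subtree."""
        data_dl = 0.0
        for node in cut:
            f = total_freq(node)
            if f > 0:
                p = (f / sample_size) / num_leaves(node)  # per-noun MLE
                data_dl -= f * log2(p)
        param_dl = (len(cut) / 2.0) * log2(sample_size)
        return data_dl + param_dl

    def find_mdl(node, sample_size):
        """Recursion of equation (20): the optimal cut below `node` is either
        the node itself or the union of the optimal cuts of its child
        subtrees, whichever has the smaller description length."""
        if not node.children:
            return [node]
        child_cut = [n for c in node.children for n in find_mdl(c, sample_size)]
        root_cut = [node]
        # Keeping the more general (root) cut on ties is a tie-breaking
        # choice made here for illustration.
        if description_length(root_cut, sample_size) > description_length(child_cut, sample_size):
            return child_cut
        return root_cut

    # Hypothetical usage, loosely following the paper's "fly" example; the
    # frequencies are invented for illustration.
    animal = Node("ANIMAL", [Node("swallow", freq=4), Node("crow", freq=3),
                             Node("eagle", freq=2)])
    artifact = Node("ARTIFACT", [Node("kite", freq=1), Node("balloon", freq=0)])
    top = Node("TOP", [animal, artifact])
    print([n.name for n in find_mdl(top, sample_size=10)])

A production version would memoize total_freq and num_leaves (or thread the counts through the recursion) to obtain the linear running time noted at the end of the proof; the sketch recomputes them for brevity.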
<Paragraph position="9"> The rest of the proof proceeds by induction. First, when $T$ consists of a single leaf node, the submodel consisting solely of that node and the MLE of the generation probability for the class represented by $T$ is returned, which is clearly a submodel with minimum description length in the subtree $T$. Next, inductively assume that Find-MDL($T'$) correctly outputs a (sub)model with the minimum description length for any tree $T'$ of size less than $n$. Then, given a tree $T$ of size $n$ whose root node has at least two children, say $T_i : i = 1, \ldots, k$, for each $T_i$, Find-MDL($T_i$) returns a (sub)model with the minimum description length by the inductive hypothesis. Then, since (20) holds, whichever way the if-clause on lines 8 and 9 of Find-MDL evaluates, what is returned on line 11 or line 13 will still be a (sub)model with the minimum description length, completing the inductive step.</Paragraph> <Paragraph position="10"> It is easy to see that the running time of the algorithm is linear in both the number of leaf nodes of the input thesaurus tree and the input sample size. ∎</Paragraph> </Section> </Section> </Paper>