<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-2002">
  <Title>Generalizing Case Frames Using a Thesaurus and the MDL Principle</Title>
  <Section position="4" start_page="220" end_page="222" type="intro">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0">[Figure 3: An example thesaurus. Root: ANIMAL; internal nodes: BIRD, INSECT; leaf nouns: swallow, crow, eagle, bird, bug, bee, insect.]</Paragraph>
    <Paragraph position="1"> case-pattern acquisition process. It is also possible to extend our model so that each word probabilistically belongs to several different classes, which would allow us to resolve both structural and word-sense ambiguities at the time of disambiguation. 2 Employing probabilistic membership, however, would make the estimation process significantly more computationally demanding. We therefore leave this issue as a future topic, and employ a simple heuristic of equally distributing each word occurrence in the data to all of its potential word senses in our experiments. Since our learning method based on MDL is robust against noise, this should not significantly degrade performance.</Paragraph>
    <Section position="1" start_page="220" end_page="222" type="sub_section">
      <SectionTitle>
2.3 The Tree Cut Model
</SectionTitle>
      <Paragraph position="0"> Since the number of partitions for a given set of nouns is extremely large, the problem of selecting the best model from among all possible class-based models is most likely intractable. In this paper, we reduce the number of possible partitions to consider by using a thesaurus as prior knowledge, following a basic idea of Resnik's (1992).</Paragraph>
      <Paragraph position="1"> In particular, we restrict our attention to those partitions that exist within the thesaurus in the form of a cut. By thesaurus, we mean a tree in which each leaf node stands for a noun, while each internal node represents a noun class, and domination stands for set inclusion (see Figure 3). A cut in a tree is any set of nodes in the tree that defines a partition of the leaf nodes, viewing each node as representing the set of all leaf nodes it dominates. For example, in the thesaurus of Figure 3, there are five cuts: \[ANIMAL\], \[BIRD, INSECT\], \[BIRD, bug, bee, insect\], \[swallow, crow, eagle, bird, INSECT\], and \[swallow, crow, eagle, bird, bug, bee, insect\]. The class of tree cut models of a fixed thesaurus tree is then obtained by restricting the partition P in the definition of a class-based model to be those partitions that are present as a cut in that thesaurus tree.</Paragraph>
      <Paragraph position="2"> Formally, a tree cut model M can be represented by a pair consisting of a tree cut lP and a probability parameter vector 0 of the same length, that is:</Paragraph>
      <Paragraph position="4"> k+l where C1, C2 ..... Ck+l is a cut in the thesaurus tree and ~i=1 P(Ci) = 1 is satisfied.</Paragraph>
      <Paragraph position="5"> For simplicity we sometimes write P(Ci), i = 1 ..... (k + 1) for P(Ci \[ v, r).</Paragraph>
      <Paragraph position="6"> If we use MLE for the parameter estimation, we can obtain five tree cut models from the co-occurrence data in Figure 1; Figures 4-6 show three of these. For example,  swa'll .... ow eagle bi'rd bug bee ins'ect Figure 5 A tree cut model with \[BIRD, bug, bee, insect\].</Paragraph>
      <Paragraph position="7"> ~- (\[BIRD, bug, bee, insect\], \[0.8,0,0.2,0\]) shown in Figure 5 is one such tree cut model. Recall that M defines a conditional probability distribution PM(n I v,r) as follows: For any noun that is in the tree cut, such as bee, the probability is given as explicitly specified by the model, i.e., PM(bee I flY, argl) = 0.2. For any class in the tree cut, the probability is distributed uniformly to all nouns dominated by it. For example, since there are four nouns that fall under the class BIRD, and swallow is one of them, the probability of swallow is thus given by Pt~(swallow I flY, argl) = 0.8/4 = 0.2. Note that the probabilities assigned to the nouns under BIRD are smoothed, even if the nouns have different observed frequencies.</Paragraph>
      <Paragraph position="8"> We have thus formalized the problem of generalizing values of a case frame slot as that of estimating a model from the class of tree cut models for some fixed thesaurus tree; namely, selecting a model that best explains the data from among the class of tree cut models.</Paragraph>
      <Paragraph position="9"> 3. Generalization Method Based On MDL The question now becomes what strategy (criterion) we should employ to select the best tree-cut model. We adopt the Minimum Description Length principle (Rissanen 1978,</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>