<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1003">
  <Title>Clustering Words with the MDL Principle</Title>
  <Section position="6" start_page="0" end_page="7" type="evalu">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> (The algorithm used for MLE was the same as that shown in Figure 1, except that the 'data description length' replaces the '(total) description length' in Step 2.) Figure 3(a) plots the number of obtained noun clusters (leaf nodes in the obtained thesaurus tree) versus the input data size, averaged over 10 trials.</Paragraph>
    <Paragraph position="1"> (The number of noun clusters in the true model is 4.) Figure 3(b) plots the KL distance versus the data size, also averaged over the same 10 trials. The results indicate that MDL converges to the true model faster than MLE. Also, MLE tends to select a model overfitting the data, while MDL tends to select a model which is simple and yet fits the data reasonably well.</Paragraph>
    <Paragraph position="3"> We describe our experimental results in this section.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Experiment 1: MDL vs. MLE
</SectionTitle>
      <Paragraph position="0"> We compared the performance of employing MDL as a criterion in our simulated annealing algorithm against that of employing MLE, by simulation experiments. We artificially constructed a true model of word co-occurrence, and then generated data according to its distribution. We then used the data to estimate a model (clustering words), and measured the KL distance between the true model and the estimated model. The KL distance (relative entropy), which is widely used in information theory and statistics, is a measure of 'distance' between two distributions</Paragraph>
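The KL distance used as the evaluation measure above can be computed directly from two discrete distributions. A minimal sketch in Python; the probability tables here are hypothetical stand-ins for the true and estimated co-occurrence models, not the paper's actual models:

```python
import math

def kl_distance(p, q):
    """KL distance (relative entropy) D(p || q) between two discrete
    distributions given as dicts mapping events to probabilities.
    Events with p[x] == 0 contribute nothing; q must assign positive
    probability wherever p does."""
    total = 0.0
    for x, px in p.items():
        if px > 0.0:
            total += px * math.log(px / q[x])
    return total

# Hypothetical word distributions standing in for the models.
p = {"eat": 0.5, "buy": 0.5}
q = {"eat": 0.7, "buy": 0.3}

print(kl_distance(p, p))       # 0.0: zero iff the distributions are identical
print(kl_distance(p, q) > 0)   # True: always non-negative
```

The two printed facts mirror the properties stated in the footnote: the distance is always non-negative and is zero iff the two distributions are identical.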
    </Section>
    <Section position="2" start_page="0" end_page="7" type="sub_section">
      <SectionTitle>
5.2 Experiment 2: Qualitative Evaluation
</SectionTitle>
      <Paragraph position="0"> We extracted roughly 180,000 case frames from the bracketed WSJ (Wall Street Journal) corpus of the Penn Tree Bank (Marcus et al., 1993) as co-occurrence data. We then constructed a number of thesauri based on these data, using our method. Figure 2 shows an example thesaurus for the 20 most frequently occurring nouns in the data, constructed based on their appearances as subject and object of roughly 2000 verbs. The obtained thesaurus seems to agree with human intuition to some degree. For example, 'million' and 'billion' are classified in one noun cluster, and 'stock' and 'share' are classified together. Not all of the noun clusters, however, seem to be meaningful in the useful sense. This is probably because the data size we had was not large enough.</Paragraph>
      <Paragraph position="1"> Pragmatically speaking, however, whether the obtained thesaurus agrees with our intuition is in itself only of secondary concern, since the main purpose is to use the constructed thesaurus to help improve on a disambiguation task.</Paragraph>
      <Paragraph position="2"> (Cover and Thomas, 1991). It is always non-negative and is zero iff the two distributions are identical.</Paragraph>
    </Section>
    <Section position="3" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
5.3 Experiment 3: Disambiguation
</SectionTitle>
      <Paragraph position="0"> We also evaluated our method by using a constructed thesaurus in a pp-attachment disambiguation experiment.</Paragraph>
      <Paragraph position="1"> We used as training data the same 180,000 case frames as in Experiment 2. We also extracted as our test data 172 (verb, noun1, prep, noun2) patterns from the data in the same corpus, which are not used in the training data. For the 150 words that appear in the position of noun2 in the test data, we constructed a thesaurus based on the co-occurrences between heads and slot values of the frames in the training data. This is because in our disambiguation test we only need a thesaurus consisting of these 150 words. We then applied the learning method proposed in (Li and Abe, 1995) to learn case frame patterns with the constructed thesaurus as input, using the same training data. That is, we used it to learn the conditional distributions P(Class1 | verb, prep) and P(Class2 | noun1, prep), where Class1 and Class2 vary over the internal nodes in a certain 'cut' in the thesaurus tree. (Each 'cut' in a thesaurus tree defines a different noun partition; see (Li and Abe, 1995) for details.) We then compare these two probabilities, which are estimated based on the case frame patterns, to determine the attachment site of (prep, noun2). More specifically, if the former is larger than the latter, we attach it to verb; if the latter is larger than the former, we attach it to noun1; and otherwise (including when both are 0), we conclude that we cannot make a decision.</Paragraph>
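The comparison just described is a simple three-way decision rule. A minimal sketch in Python, assuming the two conditional probabilities have already been estimated from the case frame patterns (the function name and its arguments are hypothetical):

```python
def attachment_site(p_verb, p_noun1):
    """Decide the attachment site of (prep, noun2) by comparing the
    estimated probabilities P(Class1 | verb, prep) and
    P(Class2 | noun1, prep).

    Returns 'verb' or 'noun1', or None when no decision can be made
    (including the case where both probabilities are 0)."""
    if p_verb > p_noun1:
        return "verb"
    if p_noun1 > p_verb:
        return "noun1"
    return None

print(attachment_site(0.6, 0.2))  # verb
print(attachment_site(0.1, 0.4))  # noun1
print(attachment_site(0.0, 0.0))  # None
```

Returning None here corresponds to the "cannot make a decision" case, which is what the 'coverage' figures in Table 1 measure.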
      <Paragraph position="2"> Table 1 shows the results of our pp-attachment disambiguation experiment in terms of 'coverage' and 'accuracy.' Here 'coverage' refers to the proportion (in percentage) of the test patterns on which the disambiguation method could make a decision. 'Base Line' refers to the method of always attaching (prep, noun2) to noun1. 'Word-Based', 'MLE-Thesaurus', and 'MDL-Thesaurus' respectively stand for using word-based estimates, using a thesaurus constructed by employing MLE, and using a thesaurus constructed by our method.</Paragraph>
      <Paragraph position="3"> Note that the coverage of 'MDL-Thesaurus' significantly outperformed that of 'Word-Based', while basically maintaining high accuracy (though it drops somewhat), indicating that using an automatically constructed thesaurus can improve disambiguation results in terms of coverage.</Paragraph>
      <Paragraph position="4"> We also tested the method proposed in (Li and Abe, 1995) of learning case frame patterns using an existing thesaurus. In particular, we used this method with WordNet (Miller et al., 1993) and the same training data, and then conducted a pp-attachment disambiguation experiment using the obtained case frame patterns. We show the result of this experiment as 'WordNet' in Table 1.</Paragraph>
      <Paragraph position="5"> We can see that in terms of 'coverage', 'WordNet' outperforms 'MDL-Thesaurus', but in terms of 'accuracy', 'MDL-Thesaurus' outperforms 'WordNet'. These results can be interpreted as follows. An automatically constructed thesaurus is more domain dependent and captures the domain-dependent features better, and thus using it achieves high accuracy. On the other hand, since the training data we had available was insufficient, its coverage is smaller than that of a hand-made thesaurus.</Paragraph>
      <Paragraph position="6"> In practice, it makes sense to combine both types of thesauri. More specifically, an automatically constructed thesaurus can be used within its coverage, and outside its coverage, a hand-made thesaurus can be used. Given the current state of the word clustering technique (namely, it requires a data size that is usually not available, and it tends to be computationally demanding), this strategy is practical. We show the result of this combined method in Table 1; it improves the coverage of disambiguation. We also tested 'MDL-Thesaurus + WordNet + LA + Default', which stands for using the learned thesaurus and WordNet first, then the lexical association value proposed by (Hindle and Rooth, 1991), and finally the default (i.e. always attaching (prep, noun2) to noun1). Our best disambiguation result, obtained using this last combined method, somewhat improves on the accuracy reported in (Li and Abe, 1995) (84.3%).</Paragraph>
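The combined strategy is a back-off chain: each stage is consulted in turn, and the first one that can make a decision wins, with the default applied last. A minimal sketch in Python; the stage functions here are hypothetical stand-ins, each returning 'verb', 'noun1', or None when it cannot decide:

```python
def disambiguate(pattern, methods):
    """Back-off disambiguation: try each method in order and return
    the first decision made; if none decides, fall back to the
    default of attaching (prep, noun2) to noun1."""
    for method in methods:
        decision = method(pattern)
        if decision is not None:
            return decision
    return "noun1"  # Default: always attach to noun1.

# Hypothetical stand-ins for the learned thesaurus, WordNet, and
# lexical association stages; here none of them can decide.
mdl_thesaurus = lambda pat: None
wordnet = lambda pat: None
lexical_association = lambda pat: None

pattern = ("eat", "salad", "with", "fork")  # (verb, noun1, prep, noun2)
print(disambiguate(pattern, [mdl_thesaurus, wordnet, lexical_association]))
# -> noun1 (the default, since no stage made a decision)
```

Ordering the automatically constructed thesaurus before the hand-made one reflects the interpretation above: it is more accurate within its (smaller) coverage, so it should be given the first chance to decide.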
    </Section>
  </Section>
</Paper>