<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1002">
  <Title>Learning Word Clusters from Data Types</Title>
  <Section position="5" start_page="12" end_page="12" type="evalu">
    <SectionTitle>
4 Experiment and evaluation
</SectionTitle>
    <Paragraph position="0"> We were able to extract all SIs relative to the entire KB. However, we report here an intrinsic evaluation of the accuracy of acquired centroids which involves only a small subset of our results, since provision of a reference class typology is extremely labour intensive. 1 We consider 20 Italian verbs and their object collocates. The object collocates were automatically extracted from the &quot;Italian SPARKLE  aspettare 'expect', cambiare 'change', causare 'cause', chiedere 'ask', considerare 'consider', dare 'give', decidere 'decide', fornire 'provide', muovere 'move', permettere 'allow', portare 'bring', produrre 'produce', scegliere 'choose', sentire 'feel', stabilire 'establish', tagliare 'cut', terminare 'end', trovare 'find'.</Paragraph>
    <Paragraph position="1"> newspapers of about one million word tokens (Federici et al. 1998).</Paragraph>
    <Paragraph position="2"> For each test verb, an independent classification of its collocates was created manually, by partitioning the collocates into disjoint sets of semantically coherent lexical preferences, each set pointing to distinct senses of the test verb, according to a reference monolingual dictionary (Garzanti 1984). This considerably reduces the amount of subjectivity inevitably involved in the creation of a reference partition, and minimizes the probability that more than one sense of a polysemous noun can appear in the same class of collocates.</Paragraph>
    <Paragraph position="3"> The inferred centroids, selected from clusters ranked by c~(SI) defined as in (9), are projected against the reference classification. Precision is defined as the ratio between the number of centroids properly included in one reference class and the number of inferred centroids. Recall is defined as the ratio between the number of reference classes which properly include at least one centroid and the number of all reference classes.</Paragraph>
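The two ratios defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' evaluation code: the function name, the representation of centroids and reference classes as sets of nouns, and the toy data are all assumptions, and "properly included" is read here as plain set inclusion.

```python
def precision_recall(centroids, reference_classes):
    """Score inferred centroids against a manual reference partition.

    Both arguments are lists of sets of nouns. Assumption: a centroid counts
    as "properly included" when it is a subset of a reference class.
    """
    centroids = [frozenset(c) for c in centroids]
    classes = [frozenset(r) for r in reference_classes]
    if not centroids or not classes:
        raise ValueError("need at least one centroid and one reference class")

    # Precision: share of centroids that fall entirely inside one class.
    included = [c for c in centroids if any(c.issubset(r) for r in classes)]
    precision = len(included) / len(centroids)

    # Recall: share of reference classes that contain at least one centroid.
    covered = [r for r in classes if any(c.issubset(r) for c in centroids)]
    recall = len(covered) / len(classes)
    return precision, recall


# Hypothetical toy data for one verb, invented for illustration only.
reference = [{"capelli", "erba", "legna"}, {"spese", "costi"}, {"strada"}]
inferred = [{"capelli", "erba"}, {"spese", "costi"}, {"legna", "strada"}]
precision, recall = precision_recall(inferred, reference)
```

In this toy case two of the three centroids sit inside a single reference class, and two of the three reference classes contain a centroid, so both ratios come out at two thirds. A centroid that straddles two classes, such as the third one, lowers precision without helping recall.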
    <Paragraph position="4"> Fig. 3 shows results for the sets of object collocates of polysemous test verbs only, as monosemous verbs trivially yield 100% precision and recall. An average value over the sets of object collocates of all verbs is also shown, with 86% precision and 88% recall. Another average value is also plotted (as a black upright triangle), obtained by ranking noun clusters by ~(SI) calculated as in (10). This average value (53% precision and 53% recall) provides a sort of baseline of the difficulty of the task, and sheds considerable light on the use of APs, rather than simple verb-noun pairs, as information units for measuring internal cohesion of centroids.</Paragraph>
    <Paragraph position="6"/>
  </Section>
</Paper>