<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1050">
  <Title>Projecting Corpus-Based Semantic Links on a Thesaurus*</Title>
  <Section position="7" start_page="393" end_page="394" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> Table 2 shows the results of the projection of corpus-based links. The first column indicates the semantic class from Table 1. The next three columns indicate the number of multi-word links projected through Specialization, the number of correct links, and the corresponding precision. The same values are provided for Transfer projections in the following three columns.</Paragraph>
    <Paragraph position="1"> Transfer projections are more frequent (507 links) than Specializations (77 links). Some classes, such as chemical elements, cereals and fruits, are very productive because they are composed of generic terms. Other classes, such as trees, vegetables, polyols or proteins, yield few semantic variations because they tend to contain more specific or less frequent terms.</Paragraph>
    <Paragraph position="2"> The precision of Specializations is relatively low (58.4% on average) and varies widely (from 16.7% to 100%).</Paragraph>
    <Paragraph position="3"> Conversely, the precision of Transfers is higher (83.8% on average) and varies less (from 69.0% to 100%). Since Transfers are more than six times as numerous as Specializations (507 against 77 links), the overall precision of projections is high: 80.5%.</Paragraph>
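The overall figure can be checked as a link-weighted average of the two reported precisions. The sketch below is only an illustration: the function name pooled_precision is hypothetical, and it starts from the rounded percentages quoted in the text rather than from the raw counts of Table 2.

```python
def pooled_precision(groups):
    """Link-weighted average precision over (n_links, precision) pairs."""
    total_links = sum(n for n, _ in groups)
    # Estimated number of correct links, summed over the groups.
    correct_links = sum(n * p for n, p in groups)
    return correct_links / total_links


# Figures quoted in the text: 77 Specializations at 58.4%,
# 507 Transfers at 83.8%.
corpus_based = [(77, 0.584), (507, 0.838)]
overall = pooled_precision(corpus_based)
print(f"{overall:.1%}")
```

Because Transfers make up most of the 584 projected links, the pooled value stays close to the Transfer precision.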
    <Paragraph position="4"> In addition to relations between multi-word terms, the projection of single-word hierarchies on multi-word terms yields new candidate terms: the variants of candidate terms produced at the first step. For instance, séchage de la banane (banana drying) is a semantic variant of séchage de fruits (fruit drying) which is not provided by the first step of the process. As in the case of links, the production of multi-word terms is greater with Transfers (345 multi-word terms) than with Specializations (72 multi-word terms) (see Table 3). In all, 417 relevant multi-word terms are acquired through semantic variation.</Paragraph>
    <Paragraph position="5"> Comparison with AGROVOC Links. In order to compare the projection of corpus-based links with the projection of links extracted from a thesaurus, a similar study was made using semantic links from the thesaurus (AGROVOC, 1995). The results of this second experiment are very similar to those of the first experiment. Here, the precision of Specializations is similar (57.8% for 45 links inferred), while the precision of Transfers is slightly lower (72.4% for 326 links inferred). Interestingly, these results show that links resulting from the projection of a thesaurus have a significantly lower precision (70.6%) than projected corpus-based links (80.5%). Footnote 6: (AGROVOC, 1995) is composed of 15,800 descriptors, but only single-word terms found in the corpus [AGRO] are used in this evaluation (1,580 descriptors). From these descriptors, 168 terms representing 4 topics (cultivation, plant anatomy, plant products and flavorings) are selected for the purpose of evaluation.</Paragraph>
    <Paragraph position="6"> A study of Table 3 shows that, while 197 projected links are produced from 94 corpus-based links (ratio 2.1), only 88 such projected links are obtained through the projection of 159 links from AGROVOC (ratio 0.6). The ratio of projected links is higher with corpus-based links than with thesaurus links because corpus-based links better represent the ontology embodied in the corpus and associate more easily with other single words to produce projected hierarchies.</Paragraph>
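The two ratios quoted from Table 3 are simply projected links divided by source links. A minimal sketch, assuming only the four counts given in the text (the function name projection_ratio is hypothetical):

```python
def projection_ratio(projected_links, source_links):
    """Number of projected links obtained per source link."""
    return projected_links / source_links


# Counts quoted in the text from Table 3.
corpus_ratio = projection_ratio(197, 94)    # corpus-based links
agrovoc_ratio = projection_ratio(88, 159)   # AGROVOC thesaurus links
print(f"corpus-based: {corpus_ratio:.1f}, AGROVOC: {agrovoc_ratio:.1f}")
```

A ratio above 1 means that each source link yields, on average, more than one projected link.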
  </Section>
</Paper>