<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0716">
  <Title>A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity</Title>
  <Section position="5" start_page="118" end_page="119" type="relat">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> Agirre and Rigau (Agirre and Rigau 1996) use a conceptual distance formula that was created to be sensitive to the length of the shortest path that connects the concepts involved, the depth of the hierarchy, and the density of concepts in the hierarchy. Their work was designed for measuring words in context and is not directly applicable to the isolated word pair measurements done here. Agirre and Rigau argue that concepts in a dense part of the hierarchy are relatively closer than those in a sparser region, a point which was covered above. To measure the distance, they use a conceptual density formula. The Conceptual Density of a concept, as they define it, is a ratio of areas: the area expected beneath the concept divided by the area actually beneath it.</Paragraph>
    <Paragraph position="1"> Some of the results given in Table 1 seem to support the use of density. The word pairs forest-graveyard and chord-smile both have an edge distance of 8. The numbers of intervening words for the two pairs differ considerably (296 and 3253 respectively). For these particular word pairs the latter numbers more closely match the ranking given by humans. If one considers density important, then perhaps a different measure of density can be used: the number of intervening words per edge. This metric was tested with the 28 word pairs, and the results were a slight improvement (r=0.6472) over the number of intervening words but are still well below that attained by simple edge counting.</Paragraph>
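    As an illustration only, the words-per-edge idea can be sketched for the two word pairs quoted above. The function name words_per_edge and the exact form of the ratio (intervening words divided by edge distance) are assumptions made for this sketch; the text does not spell out the formula.

```python
# Hypothetical helper: one plausible reading of "intervening words per edge".
# The name and the exact ratio are assumptions, not the paper's stated formula.
def words_per_edge(intervening_words, edge_distance):
    """Average number of intervening words per edge on the connecting path."""
    if edge_distance == 0:
        raise ValueError("edge distance must be nonzero")
    return intervening_words / edge_distance


# Figures quoted in the text: both pairs have an edge distance of 8.
pairs = {
    "forest-graveyard": (296, 8),
    "chord-smile": (3253, 8),
}

density = {name: words_per_edge(words, edges)
           for name, (words, edges) in pairs.items()}

for name, value in density.items():
    print(name, value)
```

    Under this reading the two pairs, which are indistinguishable by edge distance alone, are separated by roughly an order of magnitude.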
    <Paragraph position="2"> This paper presented the results of using Roget's International Thesaurus as the taxonomy in a semantic similarity measurement task. Four similarity metrics were taken from the literature and applied to Roget's. The experimental evaluation suggests that the traditional edge counting approach does surprisingly well (a correlation of r=0.8862 with a benchmark set of human similarity judgements, with an upper bound of r=0.9015 for human subjects performing the same task). The results should provide incentive to those wishing to understand the effect of various attributes on metrics for semantic relatedness across hierarchies. Further investigation is needed into why this dramatic improvement in edge counting occurs in the shallow, uniform hierarchy of Roget's. The results should prove beneficial to those doing research with Roget's, WordNet and other semantic-based hierarchies.</Paragraph>
  </Section>
</Paper>