<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0716">
<Title>A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity</Title>
<Section position="3" start_page="116" end_page="117" type="intro">
<SectionTitle> 3 The frequencies were computed for Roget's as the </SectionTitle>
<Paragraph position="0"> total frequency for each word divided by the number of senses in Roget. This gives us an approximation of the information content for each concept. The frequency data were taken from the MRC Psycholinguistic database available from the Oxford Text Archive.</Paragraph>
<Paragraph position="1"> the hierarchy) computations for the entire Roget hierarchy. This is a sizeable overhead compared to edge counting, which requires no a priori computations. Of course, once the computations are done, they do not need to be repeated until a new word is added to the hierarchy. Since the values for information content bubble up from the words, each addition of a word would require that all of the hierarchy above it be recomputed.</Paragraph>
<Paragraph position="2"> Jiang and Conrath (Jiang and Conrath 97) also used information content to measure semantic relatedness, but they combined it with edge counting using a formula that also took into consideration local density, node depth and link type. They optimized the formula by using two parameters, α and β, that controlled how much the node depth and density factors contributed to the edge weighting computation. If α = 0 and β = 1, then their formula for the distance between two concepts c1 and c2 simplifies to</Paragraph>
<Paragraph position="3"> Dist(c1, c2) = IC(c1) + IC(c2) - 2 × IC(LS(c1, c2))</Paragraph>
<Paragraph position="4"> where IC(c) is the information content of concept c and LS(c1, c2) denotes the lowest superordinate of c1 and c2.</Paragraph>
</Section>
</Paper>