<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2180">
  <Title>MindNet: acquiring and structuring semantic information from text</Title>
  <Section position="9" start_page="1099" end_page="1100" type="evalu">
    <SectionTitle>
8 Similarity and inference
</SectionTitle>
    <Paragraph position="0"> Many researchers, in both the dictionary-based and corpus-based camps, have worked extensively on methods to identify similarity between words, since similarity determination is crucial to many word sense disambiguation and parameter-smoothing/inference procedures. However, some researchers have failed to distinguish between substitutional similarity and general relatedness. The similarity procedure of MindNet focuses on measuring substitutional similarity, but a function is also provided for producing clusters of generally related words.</Paragraph>
    <Paragraph position="1"> Two general strategies have been described in the literature for identifying substitutional similarity. One is based on identifying direct, paradigmatic relations between the words, such as Hypernym or Synonym.</Paragraph>
    <Paragraph position="2"> For example, paradigmatic relations in WordNet have been used by many to determine similarity, including Li et al. (1995) and Agirre and Rigau (1996). The other strategy is based on identifying syntagmatic relations with other words that similar words have in common.</Paragraph>
    <Paragraph position="3"> Syntagmatic strategies for determining similarity have often been based on statistical analyses of large corpora that yield clusters of words occurring in similar bigram and trigram contexts (e.g., Brown et al. 1992, Yarowsky 1992), as well as in similar predicate-argument structure contexts (e.g., Grishman and Sterling 1994).</Paragraph>
    <Paragraph position="4"> There have been a number of attempts to combine paradigmatic and syntagmatic similarity strategies (e.g., Hearst and Grefenstette 1992, Resnik 1995). However, none of these has completely integrated both syntagmatic and paradigmatic information into a single repository, as is the case with MindNet.</Paragraph>
    <Paragraph position="5"> The MindNet similarity procedure is based on the top-ranked (by weight) semrel paths between words.</Paragraph>
    <Paragraph position="6"> For example, some of the top semrel paths in MindNet between pen and pencil are shown below:
    [path listing not preserved; only its final word, pencil, survives]
    In the above example, a pattern of semrel symmetry clearly emerges in many of the paths. This observation of symmetry led to the hypothesis that similar words are typically connected in MindNet by semrel paths that frequently exhibit certain patterns of relations (exclusive of the words they actually connect), many patterns being symmetrical, but others not.</Paragraph>
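    The notion of a path pattern "exclusive of the words" and of its symmetry can be made concrete with a minimal sketch. The path encoding, the direction labels fwd and back, and the three example paths are assumptions made for illustration; they are not MindNet's internal representation, and the example paths are invented rather than taken from MindNet output.

```python
from typing import Tuple

# A step is (relation, direction): "fwd" follows the relation from its
# head word to its tail word, "back" follows it in reverse.
Step = Tuple[str, str]
# A path alternates words and steps: [word, step, word, step, ..., word].
Path = list


def path_pattern(path: Path) -> Tuple[Step, ...]:
    """Drop the words and keep only the (relation, direction) steps."""
    return tuple(item for item in path if isinstance(item, tuple))


def is_symmetric(pattern: Tuple[Step, ...]) -> bool:
    """A pattern is symmetric if it reads the same from either end word."""
    flip = {"fwd": "back", "back": "fwd"}
    mirrored = tuple((rel, flip[d]) for rel, d in reversed(pattern))
    return mirrored == pattern


# Hypothetical paths, invented for illustration only.
means_path = ["pen", ("Means", "back"), "write", ("Means", "fwd"), "pencil"]
hyp_path = ["pen", ("Hyp", "fwd"), "instrument", ("Hyp", "back"), "pencil"]
chain_path = ["pen", ("Hyp", "fwd"), "instrument", ("Hyp", "fwd"), "tool"]

for p in (means_path, hyp_path, chain_path):
    print(path_pattern(p), is_symmetric(path_pattern(p)))
```

    In this encoding the first two patterns mirror themselves around the middle word, while the chain of two forward Hyp steps does not, which matches the remark that many, but not all, patterns are symmetrical.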
    <Paragraph position="7"> Several experiments were performed in which word pairs from a thesaurus and an anti-thesaurus (the latter containing dissimilar words) were used in a training phase to identify semrel path patterns that indicate similarity. These path patterns were then used in a testing phase to determine the substitutional similarity or dissimilarity of unseen word pairs (algorithms are described in Richardson 1997). The results, summarized in the table below, demonstrate the strength of this integrated approach, which uniquely exploits both the paradigmatic and the syntagmatic relations in MindNet.</Paragraph>
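    The training and testing phases described above could be sketched as follows. This is not the algorithm of Richardson (1997), which the text cites but does not reproduce; it is a simple frequency-vote stand-in, and the pattern labels, the scoring formula, and the toy word-pair data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pattern = Tuple[str, ...]


def train(similar: Iterable[List[Pattern]],
          dissimilar: Iterable[List[Pattern]]) -> Dict[Pattern, float]:
    """Score each pattern by how much more often it links similar pairs.

    Each word pair is represented by the patterns of its top semrel paths.
    Scores range from -1.0 (seen only with dissimilar pairs) to 1.0.
    """
    sim_counts, dis_counts = Counter(), Counter()
    for patterns in similar:
        sim_counts.update(set(patterns))
    for patterns in dissimilar:
        dis_counts.update(set(patterns))
    scores = {}
    for p in set(sim_counts) | set(dis_counts):
        s, d = sim_counts[p], dis_counts[p]
        scores[p] = (s - d) / (s + d)
    return scores


def is_similar(patterns: List[Pattern], scores: Dict[Pattern, float]) -> bool:
    """Judge an unseen pair by summing the scores of its path patterns."""
    total = sum(scores.get(p, 0.0) for p in patterns)
    return total > 0


# Hypothetical training data: one list of patterns per word pair.
HYP_SYM = ("Hyp-fwd", "Hyp-back")
MEANS_SYM = ("Means-back", "Means-fwd")
PART_HYP = ("Part-fwd", "Hyp-fwd")

thesaurus_pairs = [[HYP_SYM, MEANS_SYM], [HYP_SYM]]
anti_thesaurus_pairs = [[PART_HYP], [PART_HYP, HYP_SYM]]

scores = train(thesaurus_pairs, anti_thesaurus_pairs)
print(is_similar([MEANS_SYM], scores))
print(is_similar([PART_HYP], scores))
```

    A pair whose paths show only patterns never seen in training gets a total of zero and is judged dissimilar here; that tie-breaking choice is arbitrary.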
    <Paragraph position="8">  Training: over 100,000 word pairs from a thesaurus and anti-thesaurus produced 285,000 semrel paths containing approx. 13,500 unique path patterns.</Paragraph>
    <Paragraph position="9"> Testing: over 100,000 (different) word pairs from a thesaurus and anti-thesaurus were evaluated using the path patterns. Similar correct: 84%. Dissimilar correct: 82%.
    Human benchmark: a random sample of 200 similar and dissimilar word pairs was evaluated by 5 humans and by MindNet. Similar correct: [value not preserved]. Dissimilar correct: [value not preserved].
    This powerful similarity procedure may also be used to extend the coverage of the relations in MindNet. Analogous to the use of similarity determination in corpus-based approaches to infer absent n-grams or triples (e.g., Dagan et al. 1994, Grishman and Sterling 1994), an inference procedure has been developed which allows semantic relations not presently in MindNet to be inferred from those that are. This procedure also exploits the top-ranked paths between the words in the relation to be inferred. For example, if the relation watch--Means--&gt;telescope were not in MindNet, it could be inferred by first finding the semrel paths between watch and telescope, examining those paths to see if another word appears in a Means relation with telescope, and then checking the similarity between that word and watch. As it turns out, the word observe satisfies these conditions in the path watch--Hyp--&gt;observe--Means--&gt;telescope, and it may therefore be inferred that one can watch by Means of a telescope. The seamless integration of the inference and similarity procedures, both utilizing the weighted, extended paths derived from inverted semrel structures in MindNet, is a unique strength of this approach.</Paragraph>
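    The inference steps in the watch and telescope example can be traced with a minimal sketch. The triple encoding, the function name infer_relation, and the lookup-table similarity test are assumptions made for illustration; in MindNet the similarity check would be the path-pattern procedure described earlier, and the paths would come from MindNet itself.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head word, relation, tail word)


def infer_relation(head: str, rel: str, tail: str,
                   paths: List[List[Triple]],
                   similar: Callable[[str, str], bool]) -> Optional[str]:
    """Return a word that licenses head--rel-->tail, or None.

    Scans the paths between head and tail (assumed ordered best first)
    for another word in the wanted relation with tail, then checks that
    this word is similar to head.
    """
    for path in paths:
        for h, r, t in path:
            if r == rel and t == tail and h != head and similar(head, h):
                return h
    return None


# The single path quoted in the text: watch--Hyp-->observe--Means-->telescope.
paths = [[("watch", "Hyp", "observe"), ("observe", "Means", "telescope")]]

# Stand-in similarity test: a hypothetical lookup table.
known_similar = {("watch", "observe")}


def toy_similar(a: str, b: str) -> bool:
    return (a, b) in known_similar or (b, a) in known_similar


witness = infer_relation("watch", "Means", "telescope", paths, toy_similar)
print(witness)
```

    If the similarity check fails for every candidate word, the function returns None and the relation is not inferred.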
  </Section>
</Paper>