<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0601">
<Title>Effective use of WordNet semantics via kernel-based learning</Title>
<Section position="7" start_page="6" end_page="7" type="relat">
<SectionTitle>5 Related Work</SectionTitle>
<Paragraph position="0"> IR studies in this area focus on term similarity models as a means to embed statistical and external knowledge in document similarity.</Paragraph>
<Paragraph position="1"> In (Kontostathis and Pottenger, 2002) a Latent Semantic Indexing analysis was used for term clustering. Such an approach assumes that a value x_ij in the transformed term-term matrix represents the similarity between terms i and j when positive and their anti-similarity when negative, enabling both positive and negative clusters of terms. An evaluation of query expansion techniques showed that positive clusters can improve recall by about 18% for the CISI collection, 2.9% for MED and 3.4% for CRAN. Furthermore, the negative clusters, when used to prune the result set, improve precision.</Paragraph>
<Paragraph position="2"> The use of external semantic knowledge appears to be more problematic in IR. In (Smeaton, 1999), the impact of semantic ambiguity on IR is studied: a WN-based semantic similarity function between noun pairs is used to improve indexing and document-query matching. However, the WSD algorithm achieved an accuracy of only 60-70%, which made the overall semantic similarity ineffective.</Paragraph>
<Paragraph position="3"> Other studies using semantic information to improve IR were carried out in (Sussna, 1993) and (Voorhees, 1993; Voorhees, 1994), where word semantic information was used for text indexing and query expansion, respectively. In (Voorhees, 1994) it is shown that semantic information derived directly from WN without a priori WSD produces poor results.</Paragraph>
<Paragraph position="4"> The latter methods are even more problematic in TC (Moschitti and Basili, 2004). Word senses tend to correlate systematically with the positive examples of a category: different categories are better characterized by different words than by different senses, and patterns of lexical co-occurrence in the training data seem to suffice for automatic disambiguation. In (Scott and Matwin, 1999), WN senses are used to replace simple words without word sense disambiguation, and small improvements are obtained only on a small corpus. The larger-scale assessment in (Moschitti and Basili, 2004) (three corpora, evaluated with cross-validation) showed that even accurate disambiguation of WN senses (about 80% accuracy on nouns) did not improve TC.</Paragraph>
<Paragraph position="5"> An approach similar to the one presented in this article was proposed in (Siolas and d'Alché-Buc, 2000). A term proximity function is used to design a kernel able to semantically smooth the similarity between two document terms. This semantic kernel was designed as a combination of the Radial Basis Function (RBF) kernel with a term proximity matrix whose entries are inversely proportional to the length of the WN hierarchy path linking the two terms. The performance, measured on the 20NewsGroups corpus, showed an improvement of 2% over the bag-of-words. Three main differences exist with respect to our approach. First, the term proximity does not fully capture the WN topological information: equidistant terms receive the same similarity irrespective of their generalization level. For example, Sky and Location (direct hyponyms of Entity) receive the same similarity score as Knife and Gun (hyponyms of Weapon). More accurate measures have been widely discussed in the literature, e.g. (Resnik, 1997). Second, the kernel-based CD similarity is an elegant combination of lexicalized and semantic information, whereas in (Siolas and d'Alché-Buc, 2000) the combination of weighting schemes, the RBF kernel and the proximity matrix has a much less clear interpretation. Finally, (Siolas and d'Alché-Buc, 2000) selected only 200 features via Mutual Information statistics; in this way, rare or statistically non-significant terms are neglected, even though they often provide relevant contributions in the SK space modeled over WN.</Paragraph>
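To make the comparison above concrete, here is a minimal sketch of a proximity-smoothed kernel in the spirit of (Siolas and d'Alché-Buc, 2000). It is one plausible reading rather than their exact formulation: the tiny vocabulary, the first-noun-sense disambiguation, NLTK's path_similarity (which equals 1/(1 + shortest path length) in the WN hierarchy) and the RBF width gamma are all illustrative assumptions, as are the helper names proximity_matrix and semantic_rbf.

```python
import numpy as np
from nltk.corpus import wordnet as wn  # needs: nltk.download('wordnet')

def proximity_matrix(vocab):
    """P[i, j] = 1 / (1 + WN path length) between the first noun senses
    of vocab[i] and vocab[j]; 0.0 when a word has no noun sense or no
    path exists.  Caveat: P built this way is not guaranteed to be
    positive semi-definite, so the result is a similarity score rather
    than a provably valid kernel."""
    synsets = [(wn.synsets(w, pos=wn.NOUN) or [None])[0] for w in vocab]
    n = len(vocab)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            if synsets[i] is not None and synsets[j] is not None:
                # path_similarity = 1 / (1 + shortest path length)
                P[i, j] = P[j, i] = synsets[i].path_similarity(synsets[j]) or 0.0
    return P

def semantic_rbf(x, y, P, gamma=1.0):
    """RBF kernel evaluated in the metric induced by P:
    exp(-gamma * (x - y)^T P (x - y)) over term-frequency vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * d @ P @ d)

# Toy usage: synset pairs linked by WN paths of the same length receive
# the same proximity regardless of their generalization level -- the
# depth-insensitivity criticized in the paragraph above.
vocab = ["sky", "location", "knife", "gun"]
P = proximity_matrix(vocab)
print(P.round(2))
print(semantic_rbf([1, 0, 1, 0], [0, 1, 0, 1], P))
```

A depth-sensitive measure, e.g. an information-content similarity in the spirit of (Resnik, 1997), could be substituted for path_similarity here without changing the rest of the sketch.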
<Paragraph position="6"> Other important work on semantic kernels for retrieval has been developed in (Cristianini et al., 2002; Kandola et al., 2002), where two methods for inferring semantic similarity from a corpus were proposed. In the first, a system of equations is derived from the dual relation between word similarity based on document similarity and vice versa; the equilibrium point of this system is used to derive the semantic similarity measure. The second method models semantic relations by means of a diffusion process on a graph defined by lexicon and co-occurrence information. The major difference with respect to our approach is the use of a different source of prior knowledge.</Paragraph>
<Paragraph position="7"> Similar techniques were also applied in (Hofmann, 2000) to derive a Fisher kernel based on a latent class decomposition of the term-document matrix.</Paragraph>
</Section>
</Paper>