<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0705">
  <Title>Indexing with WordNet synsets can improve text retrieval</Title>
  <Section position="3" start_page="38" end_page="39" type="metho">
    <SectionTitle>
2 The test collection
</SectionTitle>
    <Paragraph position="0"> The best-known publicly available corpus hand-tagged with WordNet senses is SEMCOR (Miller et al., 1993), a subset of the Brown Corpus of about 100 documents that occupies about 11 Mb. (including tags) The collection is rather heterogeneous, covering politics, sports, music, cinema, philosophy, excerpts from fiction novels, scientific texts... A new, bigger version has been made available recently (Landes et al., 1998), but we have not still adapted it for our collection.</Paragraph>
    <Paragraph position="1"> We have adapted SEMCOR in order to build a test collection -that we call IR-SEMCOR- in four manual steps: * We have split the documents to get coherent chunks of text for retrieval. We have obtained 171 fragments that constitute our text collection, with an averagv length of 1331 words per fragment.</Paragraph>
    <Paragraph position="2"> * We have extended the original TOPIC tags of the Brown Corpus with a hierarchy of subtags, assigning a set of tags to each text in our collection. This is not used in the experiments reported here.</Paragraph>
    <Paragraph position="3"> * We have written a summary for each of the fragments, with lengths varying between 4 and 50 words and an average of 22 words per summary.</Paragraph>
    <Paragraph position="4"> Each summary is a human explanation of the text contents, not a mere bag of related keywords. These summaries serve as queries on the text collection, and then there is exactly one relevant document per query.</Paragraph>
    <Paragraph position="5"> * Finally, we have hand-tagged each of the summaries with WordNet 1.5 senses. When a word or term was not present in the database, it was left unchanged. In general, such terms correspond to groups (vg. Fulton_County_Grand-Jury), persons (Cervantes) or locations (Fulton).</Paragraph>
    <Paragraph position="6"> We also generated a list Of &amp;quot;stop-senses&amp;quot; and a list of &amp;quot;stop-synsets', automatically translating a standard list of stop words for English.</Paragraph>
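As an illustration of that step, here is a minimal sketch (our own, not the authors' code) of how a stop-word list could be translated into stop-senses/stop-synsets with the NLTK WordNet interface; note that NLTK ships a modern WordNet rather than the 1.5 version used in the paper, and the helper names are illustrative.

```python
# Sketch: derive "stop-synsets" from a standard English stop-word list.
# Requires the NLTK 'stopwords' and 'wordnet' corpora; uses a modern
# WordNet, not the WordNet 1.5 used for IR-SEMCOR.
from nltk.corpus import stopwords, wordnet as wn

def stop_synsets(stop_words):
    """Every synset reachable from a stop word counts as a stop-synset.
    Words with no WordNet entry (e.g. 'the', 'of') contribute nothing,
    just as unknown terms were left unchanged in the collection."""
    stops = set()
    for word in stop_words:
        for synset in wn.synsets(word):
            stops.add(synset.name())      # e.g. 'be.v.01'
    return stops

if __name__ == "__main__":
    english_stops = stopwords.words("english")
    print(len(stop_synsets(english_stops)), "stop-synsets generated")
```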
    <Paragraph position="7"> Such a test collection offers the chance to measure the adequacy of WordNet-based approaches to IR independently from the disambiguator being used, but also offers the chance to measure the role of automatic disambiguation by introducing different rates  of &amp;quot;disambignation errors&amp;quot; in the collection. The only disadvantage is the small size of the collection, which does not allow fine-grained distinctions in the results. However, it has proved large enough to give meaningful statistics for the experiments reported here.</Paragraph>
    <Paragraph position="8"> Although designed for our concrete text retrieval testing purposes, the resulting database could also be useful for many other tasks. For instance, it could be used to evaluate automatic summarization systems (measuring the semantic relation between the manually written and hand-tagged summaries of IR-SEMCOR and the output of text summarization systems) and other related tasks.</Paragraph>
  </Section>
  <Section position="4" start_page="39" end_page="40" type="metho">
    <SectionTitle>
3 The experiments
</SectionTitle>
    <Paragraph position="0"> We have performed a number of experiments using a standard vector-model based text retrieval system, SMART (Salton, 1971), and three different indexing spaces: the original terms in the documents (for standard SMART runs), the word-senses corresponding to the document terms (in other words, a manually disambiguated version of the documents) and the WordNet synsets corresponding to the document terms (roughly equivalent to concepts occurring in the documents).</Paragraph>
    <Paragraph position="1"> These are all the experiments considered here:  1. The original texts as documents and the summaries as queries. This is a classic SMART run, with the peculiarity that there is only one relevant document per query.</Paragraph>
    <Paragraph position="2"> 2. Both documents (texts) and queries (sum null maries) are indexed in terms of word-senses.</Paragraph>
    <Paragraph position="3"> That means that we disambiguate manually all terms. For instance &amp;quot;debate&amp;quot; might be substituted with &amp;quot;debate~l:10:01:?'. The three numbers denote the part of speech, the WordNet lexicographer's file and the sense number within the file. In this case, it is a noun belonging to the noun.communication file.</Paragraph>
    <Paragraph position="4"> With this collection we can see if plain disambiguation is helpful for retrieval, because word senses are distinguished but synonymous word senses are not identified.</Paragraph>
    <Paragraph position="5"> 3. In the previous collection, we substitute each word sense for a unique identifier of its associated synset. For instance, &amp;quot;debate~l:lO:01:.&amp;quot; is substituted with &amp;quot;n04616654&amp;quot;, which is an identifier for &amp;quot;{argument, debate1}&amp;quot; (a discussion in which reasons are advanced for and against some proposition or proposal; &amp;quot;the argument over foreign aid goes on and on') This collection represents conceptual indexing, as equivalent word senses are represented with a unique identifier.</Paragraph>
    <Paragraph position="6"> 4. We produced different versions of the synset indexed collection, introducing fixed percentages of erroneous synsets. Thus we simulated a word-sense disambiguation process with 5%, 10%, 20%, 30% and 60% error rates. The errors were introduced randomly in the ambiguous words of each document. With this set of experiments we can measure the sensitivity of the retrieval process to disambiguation errors.</Paragraph>
    <Paragraph position="7"> 5. To complement the previous experiment, we also prepared collections indexed with all possible meanings (in their word sense and synset versions) for each term. This represents a lower bound for automatic disambiguation: we should not disambiguate if performance is worse than considering all possible senses for every word form.</Paragraph>
    <Paragraph position="8"> 6. We produced also a non-disambiguated version of the queries (again, both in its word sense and  against the manually disambiguated collection.</Paragraph>
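To make the indexing spaces and the error-rate simulation concrete, the following sketch (hypothetical data structures and function names, not the code actually used for the experiments) maps a sense-tagged document to the word, word-sense and synset spaces of experiments 1-3 and injects random disambiguation errors as in experiment 4.

```python
# Sketch of the three indexing spaces (experiments 1-3) and of the random
# error injection used to simulate imperfect disambiguation (experiment 4).
# Token format (illustrative): (surface, sense_id, synset_id, candidate_synsets),
# e.g. ("debate", "debate~1:10:01", "n04616654", ["n04616654", "n05783456"]).
import random

def word_index(tokens):
    return [surface for surface, _, _, _ in tokens]       # plain terms

def sense_index(tokens):
    return [sense for _, sense, _, _ in tokens]            # word senses

def synset_index(tokens):
    return [synset for _, _, synset, _ in tokens]          # synsets (concepts)

def inject_errors(tokens, error_rate, rng=random):
    """With probability `error_rate`, replace the correct synset of an
    ambiguous token with one of its wrong candidate synsets."""
    noisy = []
    for surface, sense, synset, candidates in tokens:
        wrong = [c for c in candidates if c != synset]
        if wrong and rng.random() < error_rate:
            synset = rng.choice(wrong)
        noisy.append((surface, sense, synset, candidates))
    return noisy
```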
    <Paragraph position="9"> In all cases, we compared arc and ann standard weighting schemes, and they produced very similar results. Thus we only report here on the results for nnn weighting scheme.</Paragraph>
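In SMART's three-letter notation, nnn stands for raw term frequency with no idf component and no length normalization; below is a minimal sketch of that scheme (illustrative Python, standing in for the actual SMART runs).

```python
# Sketch of nnn weighting (natural tf, no idf, no normalization) with
# inner-product ranking, as a stand-in for the SMART nnn runs.
from collections import Counter

def nnn_vector(tokens):
    """weight(t) = raw term frequency of t."""
    return Counter(tokens)

def score(query_tokens, doc_tokens):
    q, d = nnn_vector(query_tokens), nnn_vector(doc_tokens)
    return sum(q[t] * d[t] for t in q)          # inner product of tf vectors

def rank(query_tokens, docs):
    """docs: {doc_id: token list}; returns doc ids by decreasing score."""
    return sorted(docs, key=lambda doc_id: score(query_tokens, docs[doc_id]),
                  reverse=True)
```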
  </Section>
  <Section position="5" start_page="40" end_page="42" type="metho">
    <SectionTitle>
4 Discussion of results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="40" end_page="41" type="sub_section">
      <SectionTitle>
4.1 Indexing approach
</SectionTitle>
      <Paragraph position="0"> In Figure 1 we compare different indexing approaches: indexing by synsets, indexing by words (basic SMART) and indexing by word senses (experiments 1, 2 and 3). The leftmost point in each curve represents the percentage of documents that were successfully ranked as the most relevant for its summary/query. The next point represents the documents retrieved as the first or the second most relevant to its summary/query, and so on. Note that, as there is only one relevant document per query, the leftmost point is the most representative of each curve. Therefore, we have included this results separately in Table 1.</Paragraph>
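Because every query has exactly one relevant document, each curve point is simply the fraction of queries whose relevant document appears among the top k retrieved; here is a small sketch of how those points could be computed (our own illustration, with hypothetical data structures).

```python
# Sketch: success-at-rank-k points, i.e. the fraction of queries whose single
# relevant document is ranked within the top k results (k = 1 is the leftmost
# point of each curve in Figure 1).
def success_at_k(rankings, relevant, max_k=10):
    """rankings: {query_id: [doc_id, ...]} in decreasing score order.
    relevant:  {query_id: doc_id}, one relevant document per query."""
    points = []
    for k in range(1, max_k + 1):
        hits = sum(1 for q, ranked in rankings.items() if relevant[q] in ranked[:k])
        points.append(hits / len(rankings))
    return points
```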
      <Paragraph position="1"> The results are encouraging: * Indexing by WordNet synsets produces a remarkable improvement on our test collection.</Paragraph>
      <Paragraph position="2"> A 62% of the documents are retrieved in first place by its summary, against 48% of the basic SMART run. This represents 14% more documents, a 29% improvement with respect to SMART. This is an excellent result, although we should keep in mind that is obtained with manually disambiguated queries and documents. Nevertheless, it shows that WordNet can greatly enhance text retrieval: the problem resides in achieving accurate automatic Word Sense Disambiguation.</Paragraph>
      <Paragraph position="3"> * Indexing by word senses improves performance when considering up to four documents retrieved for each query/summary, although it is worse than indexing by synsets. This confirms our intuition that synset indexing has advantages over plain word sense disambiguation, because it permits matching semantically similar terms.</Paragraph>
      <Paragraph position="4"> Taking only the first document retrieved for each summary, the disambiguated collection gives a 53.2% success against a 48% of the plain SMART query, which represents a 11% improvement. For recall levels higher than 0.85, however, the disambiguated collection performs slightly worse. This may seem surprising, as word sense disambiguation should only increase our knowledge about queries and documents.</Paragraph>
      <Paragraph position="5"> But we should bear in mind that WordNet 1.5 is not the perfect database for text retrieval, and indexing by word senses prevents some matchings that can be useful for retrieval. For instance, design is used as a noun repeatedly in one of the documents, while its summary uses design as a verb. WordNet 1.5 does not include cross-part-of-speech semantic relations, so this relation cannot be used with word senses, while term indexing simply (and successfully!) does not distinguish them. Other problems of WordNet for text retrieval include too fine-grained sense distinctions and the lack of domain information; see (Gonzalo et al., In press) for a more detailed discussion of the adequacy of the WordNet structure for text retrieval.</Paragraph>
      <Paragraph position="7">[Figure: recall plot comparing 1. Manual disambiguation, 2. 5% error, 3. 10% error, 4. 20% error, 5. 30% error, 6. All possible synsets per word (without disambiguation), 7. 60% error, 8. SMART]</Paragraph>
    </Section>
    <Section position="3" start_page="41" end_page="42" type="sub_section">
      <SectionTitle>
4.2 Sensitivity to disambiguation errors
</SectionTitle>
      <Paragraph position="0"> Figure 2 shows the sensitivity of the synset indexing system to degradation of disambiguation accuracy (corresponding to the experiments 4 and 5 described above). Prom the plot, it can be seen that: * Less than 10% disambiguating errors does not substantially affect performance. This is roughly in agreement with (Sanderson, 1994).</Paragraph>
      <Paragraph position="1"> * For error ratios over 10%, the performance degrades quickly. This is also in agreement with (Sanderson, 1994).</Paragraph>
      <Paragraph position="2"> * However, indexing by synsets remains better than the basic SMART run up to 30% disambiguation errors. From 30% to 60%, the data does not show significant differences with standard SMART word indexing. This prediction differs from (Sanderson, 1994) result (namely, that it is better not to disambiguate below a 90% accuracy). The main difference is that we are using concepts rather than word senses.</Paragraph>
      <Paragraph position="3"> But, in addition, it must be noted that Sanderson's setup used artificially created ambiguous pseudo words (such as 'bank/spring ~ which are not guaranteed to behave as real ambiguous words. Moreover, what he understands as disambiguating is selecting -in the example- bank or spring which remain to be ambiguous words themselves.</Paragraph>
      <Paragraph position="4"> * If we do not disambiguate, the performance is slightly worse than disambiguating with 30% errors, but remains better than term indexing, although the results are not definitive. An interesting conclusion is that, if we can disambiguate reliably the queries, WordNet synset indexing could improve performance even without disambiguating the documents. This could be confirmed on much larger collections, as it does not involve manual disambiguation.</Paragraph>
      <Paragraph position="5"> It is too soon to say if state-of-the-art WSD techniques can perform with less than 30% errors, because each technique is evaluated in fairly different settings. Some of the best results on a comparable setting (namely, disambiguating against Word-Net, evaluating on a subset of the Brown Corpus, and treating the 191 most frequently occurring and  ambiguous words of English) are reported reported in (Ng, 1997). They reach a 58.7% accuracy on a Brown Corpus subset and a 75.2% on a subset of the Wall Street Journal Corpus. A more careful evaluation of the role of WSD is needed to know if this is good enough for our purposes.</Paragraph>
      <Paragraph position="6"> Anyway, we have only emulated a WSD algorithm that just picks up one sense and discards the rest. A more reasonable approach here could be giving different probabilities for each sense of a word, and use them to weight synsets in the vectorial representation of documents and queries.</Paragraph>
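A minimal sketch of that alternative (our own illustration, not something evaluated in the paper): each candidate synset of a word contributes to the document or query vector with a weight equal to the probability the disambiguator assigns to it, instead of keeping only the single top-ranked sense.

```python
# Sketch: probability-weighted synset vector. Every candidate sense adds a
# fractional count instead of the winner-take-all choice emulated above.
from collections import defaultdict

def soft_synset_vector(tagged_words):
    """tagged_words: one dict per word token mapping synset id -> probability,
    e.g. [{"n04616654": 0.7, "n05783456": 0.3}, ...] (hypothetical ids)."""
    vector = defaultdict(float)
    for sense_probs in tagged_words:
        for synset_id, probability in sense_probs.items():
            vector[synset_id] += probability
    return dict(vector)
```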
    </Section>
    <Section position="4" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
4.3 Performance for non-disambiguated queries
</SectionTitle>
      <Paragraph position="0"> In Figure 3 we have plotted the results of runs with a non-disambiguated version of the queries, both for word sense indexing and synset indexing, against the manually disambiguated collection (experiment 6). The synset run performs approximately like the basic SMART run. It therefore seems useless to apply conceptual indexing if no disambiguation of the query is feasible. This is not a major problem in an interactive system that can help the user disambiguate his query, but it must be taken into account if the process is not interactive and the query is too short for reliable disambiguation.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML