<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1051">
  <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 403-410, Vancouver, October 2005. (c) 2005 Association for Computational Linguistics. Differentiating Homonymy and Polysemy in Information Retrieval</Title>
  <Section position="6" start_page="406" end_page="407" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> The first set of results (section 5.1) addresses the question of granularity by quantifying the impact that adding either additional homonymy or polysemy has on retrieval effectiveness. The second set of results (section 5.2) looks at the question of disambiguation accuracy by simulating the impact that varied levels of accuracy have on retrieval effectiveness.</Paragraph>
    <Section position="1" start_page="406" end_page="407" type="sub_section">
      <SectionTitle>
5.1 Homonymy vs. Polysemy
</SectionTitle>
      <Paragraph position="0"> Let us first consider the impact of adding additional homonymy. Figure 3 graphs precision across the 11 standard points of recall for retrieval from both the baseline collection and one where additional homonymy has been added. Note that the introduction of additional homonymy brings about a small drop in retrieval effectiveness. With regard to the single value measures contained in table 1, this is a decrease of 2.5% in terms of absolute R-Precision (average precision after the total number of known relevant documents in the collection has been retrieved). This is a relative decrease of 14.3%. Similar drops in both precision@10 (precision after the first 10 documents retrieved) and average precision are also seen.</Paragraph>
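As a concrete reminder of the single-value measures used above, here is a minimal Python sketch of precision@k and R-Precision; the document IDs and relevance judgments are invented purely for illustration:

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

def r_precision(ranking, relevant):
    """Precision after retrieving R documents, where R is the number
    of known relevant documents for the query."""
    r = len(relevant)
    return precision_at_k(ranking, relevant, r)

# Illustrative data: a ranked retrieval run and 3 known relevant docs.
ranking = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d3", "d1", "d4"}
print(r_precision(ranking, relevant))  # precision@3, since |relevant| = 3
```

An absolute drop of 2.5% in R-Precision thus means the precision measured at this fixed cutoff fell by 0.025.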
      <Paragraph position="1"> Next let us consider retrieval effectiveness over the root pseudoword collection where additional polysemy has been added (figure 4). Note that the introduction of additional polysemy has a more substantive impact upon retrieval effectiveness. In terms of R-Precision this decrease is 5.3% in absolute terms, a relative decrease of 30% compared to baseline retrieval from the unmodified collection.</Paragraph>
      <Paragraph position="2"> In addition, an even larger decrease in precision@10 occurs where the introduction of additional polysemy brings about a 7% drop in retrieval effectiveness.</Paragraph>
      <Paragraph position="3"> Turning to the relative effects of homonymy and polysemy on retrieval effectiveness, note that adding additional polysemy has over double the impact of adding additional homonymy. This provides a clear indication that the retrieval process is more substantially affected by polysemy than by homonymy.</Paragraph>
    </Section>
    <Section position="2" start_page="407" end_page="407" type="sub_section">
      <SectionTitle>
5.2 The Impact of Disambiguation
</SectionTitle>
      <Paragraph position="0"> We now address the second part of the research question: to what accuracy must disambiguation be performed in order to enhance retrieval effectiveness? Figure 5 plots the impact, in terms of R-Precision, of performing disambiguation to varying degrees of accuracy after additional homonymy has been added to the collection. The dotted line represents the breakeven point; R-Precision below this line indicates reduced performance as a result of disambiguation. Results show that where additional homonymy has been added to the collection, disambiguation accuracy at or above 76% is required for disambiguation to be of benefit. Performing disambiguation which is less than 76% accurate leads to lower performance than if the additional homonymy had been left unresolved. Moving on to the root pseudoword collection (figure 6), note that where additional polysemy has been added the breakeven point is only 55%. Recall that the results in section 5.1 showed that the introduction of additional polysemy had over double the impact of introducing additional homonymy. This is reflected in the relative effects of disambiguation: the breakeven point is considerably lower for polysemy than for homonymy.</Paragraph>
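The simulation of imperfect disambiguation described above can be sketched as follows. This is a minimal illustration of the uniform-error model (correct sense kept with probability `accuracy`, otherwise a uniformly chosen wrong sense); the sense labels are invented, and the paper's actual implementation may differ in detail:

```python
import random

def simulate_disambiguation(true_senses, accuracy, rng=None):
    """Return simulated sense tags: each occurrence keeps its true sense
    with probability `accuracy`, otherwise receives a uniformly chosen
    wrong sense (error uniform across words and senses)."""
    rng = rng or random.Random(0)
    all_senses = sorted(set(true_senses))
    tagged = []
    for sense in true_senses:
        if rng.random() < accuracy:
            tagged.append(sense)  # correctly disambiguated occurrence
        else:
            wrong = [s for s in all_senses if s != sense]
            tagged.append(rng.choice(wrong))  # disambiguation error
    return tagged

# Tag a toy corpus at the 76% breakeven accuracy for homonymy.
gold = ["bank/1", "bank/2", "bank/1", "bank/1", "bank/2"]
tags = simulate_disambiguation(gold, accuracy=0.76)
```

Retrieval is then run over the collection indexed by these simulated tags, and R-Precision is compared against leaving the ambiguity unresolved.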
    </Section>
  </Section>
  <Section position="7" start_page="407" end_page="408" type="evalu">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> The results in section 5.1 show that retrieval effectiveness is more sensitive to polysemy than homonymy.</Paragraph>
    <Section position="1" start_page="407" end_page="408" type="sub_section">
      <SectionTitle>
[Figure captions: "... Points of Recall for the Baseline and the Collection Containing Additional Polysemy"; "... Retrieval Effectiveness after the Addition of Polysemy (Note the dashed line is the breakeven point)"]
</SectionTitle>
      <Paragraph position="0"> One explanation for this can be hypothesized from previous studies (Krovetz and Croft, 1992; Sanderson and Van Rijsbergen, 1999), which highlight the importance of co-occurrence between query words. Where two (or more) words appear together in a query, statistical retrieval inherently performs some element of disambiguation. However, in the case of a word with many closely related senses, co-occurrence between query words may not be sufficient for a given sense to become apparent. This is particularly exacerbated in Web retrieval, given that the average query length in these experiments was 2.9 words.</Paragraph>
      <Paragraph position="1"> Clearly, the inherent disambiguation performed by statistical IR techniques is sensitive to polysemy in the same way as systems which explicitly perform disambiguation.</Paragraph>
      <Paragraph position="2"> With regard to disambiguation accuracy and IR (section 5.2), these experiments establish that performance gains begin to occur where disambiguation accuracy is between 55% and 76%. Where within this range the actual breakeven point lies depends on the granularity of the disambiguation and on the balance between polysemy and homonymy in a given collection. Consider that coarse-grained disambiguation is frequently advocated on the basis that it can be performed more accurately. Whilst this is undoubtedly true, these results suggest that homonymy has to be resolved to a much higher level of accuracy than polysemy in order to be of benefit in IR.</Paragraph>
      <Paragraph position="3"> It would seem prudent to consider the results of this study in relation to the state of the art in disambiguation. At Senseval-3 (Mihalcea et al., 2004) the top systems were considered to have reached a performance ceiling at 72% for fine-grained disambiguation and 80% for coarse-grained. When producing the English-language test collections, the rate of agreement between humans performing manual disambiguation was approximately 74%. This suggests that machine disambiguation has reached levels comparable to human performance. In parallel with this, the IR community has begun to report increased retrieval effectiveness through explicitly performing disambiguation at varying levels of granularity.</Paragraph>
      <Paragraph position="4"> A final point of discussion is the way in which we simulate disambiguation both in this study and those previously (Sanderson, 1994; Gonzalo et al., 1998). There is growing evidence (Leacock et al., 1998; Agirre and Martinez, 2004) to suggest that simulating uniform rates of accuracy and error across both words and senses may not reflect the performance of modern disambiguation systems.</Paragraph>
      <Paragraph position="5"> Supervised approaches are known to exhibit the inherent biases present in their training data. Examples include Zipf's law (Zipf, 1949), which states that a small number of words account for a large percentage of word use, and Krovetz and Croft's (1992) observation that one sense of a word accounts for the majority of all use. It would seem logical to presume that supervised systems show their best performance over the most frequent senses of the most frequent words in their training data. Not enough is known about the potential impact of these biases to allow for them to be incorporated into this simulation. Still, it should be noted that Stokoe et al. (2003) utilized frequency statistics in their disambiguator and that a by-product of Schutze and Pederson's (1992) approach was that they eliminated infrequently observed senses.</Paragraph>
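One hypothetical way such a frequency bias could be folded into a simulation (not the model used in this paper, which applies uniform accuracy) is to let per-sense accuracy grow with the sense's training frequency. The `base` and `boost` parameters below are invented for illustration:

```python
from collections import Counter

def biased_accuracies(sense_counts, base=0.55, boost=0.25):
    """Hypothetical per-sense accuracy: a sense's accuracy rises from
    `base` toward `base + boost` in proportion to its share of the
    training occurrences. Illustrative only; real per-sense error
    rates would come from a disambiguator's confusion statistics."""
    total = sum(sense_counts.values())
    return {s: base + boost * (c / total) for s, c in sense_counts.items()}

# Skewed sense distribution, as Krovetz and Croft (1992) observed:
counts = Counter({"bank/1": 900, "bank/2": 100})
print(biased_accuracies(counts))
```

Under such a model the dominant sense is tagged more reliably, which is one way to read Sanderson and Van Rijsbergen's (1999) suggestion that accounting for frequency bias is advantageous.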
      <Paragraph position="6"> There is supporting evidence from Sanderson and Van Rijsbergen (1999) to suggest that accounting for this frequency bias is in some way advantageous. Therefore, it is worth considering that simulating a uniform accuracy and error rate across all words and senses might actually offer a pessimistic picture of the potential for disambiguation and IR.</Paragraph>
      <Paragraph position="7"> Whilst this merits further study, the focus of this research was contrasting the relative effects of two types of ambiguity and both models were subject to the same uniform disambiguation.</Paragraph>
    </Section>
  </Section>
</Paper>