<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0704"> <Title>Examining the Effect of Improved Context Sensitive Morphology on Arabic Information Retrieval</Title> <Section position="6" start_page="26" end_page="31" type="evalu"> <SectionTitle> 4 Results and Discussion </SectionTitle> <Paragraph position="0"> Figure 1 shows a summary of the results for different index terms. Tables 1 and 2 show statistical significance between different index terms using the p value of the Wilcoxon test.</Paragraph> <Paragraph position="1"> When comparing index terms obtained using IBM-LM and Sebawai, the results clearly show that using better morphological analysis produces better retrieval effectiveness. The dramatic difference in retrieval effectiveness between Sebawai and IBM-LM highlights the effect of errors in morphology that lead to inconsistency in analysis. Using contextual information in analysis (compared to analyzing words in isolation, out of context) resulted in only a 3% increase in mean average precision when using stems (IBM-LMS), which is a small difference compared to the effect of blind relevance feedback (about a 6% increase), and produced mixed results when using roots (IBM-SEB-r). Nonetheless, the improvement for stems was almost statistically significant, with p values of 0.063 and 0.054 for the cases with and without blind relevance feedback. Also, considering that the improvement in retrieval effectiveness resulted from changing the analysis for only 0.12% of the words in the collection (from analyzing them out of context to analyzing them in context) and that the authors of IBM-LM report an error rate of about 2.9% in morphology, further improvement in morphology may lead to further improvement in retrieval effectiveness. However, further improvements in morphology and retrieval effectiveness are likely to be difficult. 
One of the difficulties associated with developing better morphology is the disagreement on what constitutes &quot;better&quot; morphology. For example, should &quot;mktb&quot; and &quot;ktb&quot; be conflated? &quot;mktb&quot; translates to office, while &quot;ktb&quot; translates to books. Both words share the common root &quot;ktb,&quot; but they are not interchangeable in meaning or usage. One would expect that increasing conflation would improve recall at the expense of precision and that decreasing conflation would have the opposite effect. It is known that IR is more tolerant of over-conflation than under-conflation [18]. This fact is apparent in the results when comparing roots and stems: even though roots result in greater conflation than stems, the results for stems and roots are almost the same. Another property of IR is that it is sensitive to consistency of analysis. In the case of light stemming, stemming often mistakenly removes prefixes and suffixes, leading to over-conflation, for which IR is tolerant, but the mistakes are made in a consistent manner. It is noteworthy that sense disambiguation has been reported to decrease retrieval effectiveness [18]. However, since improving the correctness of morphological analysis using contextual information is akin to sense disambiguation, the fact that retrieval results improved, though slightly, using context sensitive morphology is a significant result. [Footnote: Approximately 7% of unique tokens had two or more different analyses in the collection when doing in-context morphology. For tokens with more than one analysis, one of the analyses was typically used more than 98% of the time.]</Paragraph> <Paragraph position="2"> When comparing the IBM-LM analyzer (in context or out of context) to light stemming (using Al-Stem), the difference in retrieval effectiveness is small and not statistically significant; however, using the IBM-LM analyzer, unlike using Al-Stem, leads to a statistically significant improvement over using words. 
Therefore, there is some advantage, though only a small one, to using statistical analysis over using light stemming. The major drawback to morphological analysis (especially in-context analysis) is that it requires considerably more computing time than light stemming.</Paragraph> </Section> </Paper>