<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2069">
  <Title>Examining the Content Load of Part of Speech Blocks for Information Retrieval</Title>
  <Section position="7" start_page="537" end_page="537" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We described a block-based part-of-speech (POS) model of language distribution, induced from a corpus and statistically smoothed using two different estimators. We hypothesised that high-frequency POS blocks bear more content than low-frequency POS blocks, and that the more closed-class components a POS block contains, the less content it bears. We evaluated both hypotheses in the context of Information Retrieval, across two standard test collections and five statistically different term weighting schemes. Applying our hypotheses led to a general improvement in retrieval performance. This improvement was overall higher for the smaller of the two collections, indicating that data sparseness may have an effect on retrieval. Query expansion worked well with our hypotheses, helping weaker weighting schemes to benefit more from the reduction of noise in the queries.</Paragraph>
    <Paragraph position="1"> In the future, we wish to investigate varying the size of POS blocks, as well as testing our hypotheses on shorter queries.</Paragraph>
  </Section>
</Paper>