File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/00/w00-0102_concl.xml

Size: 2,293 bytes

Last Modified: 2025-10-06 13:52:50

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0102">
  <Title>Using Long Runs as Predictors of Semantic Coherence in a Partial Document Retrieval System</Title>
  <Section position="8" start_page="11" end_page="12" type="concl">
    <SectionTitle>
5. Conclusion
</SectionTitle>
    <Paragraph position="0"> This research tested three statistical hypotheses that follow from two observations: (1) Jang (1997) observed the clustering of long runs of content words and established that the distributions of long run lengths and short run lengths are drawn from different populations; (2) we observed that these long runs of content words originate from the prepositional phrase and subject complement positions. According to Halliday (1985), those grammatical structures function as minor predication and, as such, are loci of semantic intent or coherence. To facilitate the use of long runs as predictors, we modified the traditional measures of Boyd et al. (1994) and Wendlandt (1991) to accommodate semantic categories and partial text retrieval. The revised metrics and the computational method we propose were used in the statistical experiments presented above. The main findings of this work are: 1. the distribution of semantic coherence (SEMCAT weights) of long runs is not statistically greater than that of short runs; 2. for paragraphs containing both long runs and short runs, the SEMCAT weight distributions are drawn from different populations; 3. there is a positive correlation between the sum of long run SEMCAT weights and the total SEMCAT weight of the paragraph (its semantic coherence).</Paragraph>
    <Paragraph position="1"> Significant additional work is required to validate these preliminary results. The collection employed in Jang (1997) is not a standard corpus, so we have no way to test the precision and relevance of the proposed method. The results of the proposed method also depend on the accuracy of the stop lists and filtering function.</Paragraph>
    <Paragraph position="2"> Nonetheless, we feel the proposed approach has the potential to improve performance through reduced token processing and to increase relevance through consideration of the semantic coherence of long runs. Significantly, our approach does not require knowledge of the collection.</Paragraph>
  </Section>
</Paper>