File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/p06-1076_abstr.xml
Size: 1,401 bytes
Last Modified: 2025-10-06 13:44:59
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1076"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. A Comparison of Document, Sentence, and Term Event Spaces</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in a question-answering system. Despite this trend, systems continue to model language at the document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with inverse sentence frequency (ISF) and inverse term frequency (ITF). A direct comparison reveals that all language models are highly correlated; however, the average ISF and ITF values are 5.5 and 10.4 higher than IDF, respectively. All language models appeared to follow a power law distribution with a slope coefficient of 1.6 for documents and 1.7 for sentences and terms. We conclude with an analysis of IDF stability with respect to random, journal, and section partitions of the 100,830 full-text scientific articles in our experimental corpus.</Paragraph> </Section> </Paper>