<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-3007">
  <Title>Document Representation and Multilevel Measures of Document Similarity</Title>
  <Section position="5" start_page="237" end_page="237" type="concl">
    <SectionTitle>
4 Conclusion
</SectionTitle>
    <Paragraph position="0"> We developed the GLSA framework for computing semantically motivated term and document vectors. This framework takes advantage of the availability of large document collections and of recent research on corpus-based term similarity measures, and combines them with dimensionality reduction algorithms. Different measures of similarity may be required for different groups of terms, such as content-bearing vocabulary words and named entities. To extend the GLSA approach to computing document vectors, we use a combination of similarity measures between terms to model document similarity. This approach defines a fine-grained similarity measure between documents and sentences. Our goal is to develop a multilevel measure of document similarity that will be helpful for summarization and information extraction.</Paragraph>
  </Section>
</Paper>