<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0701"> <Title>Sydney, July 2006. © 2006 Association for Computational Linguistics. Dimensionality Reduction Aids Term Co-Occurrence Based Multi-Document Summarization</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> We present experiments on the task of query-oriented multi-document summarisation as explored in the DUC 2005 and DUC 2006 shared tasks, which aim to model real-world complex question-answering. Input consists of a detailed query and a set of 25 to 50 relevant documents. We implement an extractive approach in which pieces of the original texts are selected to form a summary, and smoothing is then performed to create a discursively coherent summary text.</Paragraph> <Paragraph position="1"> The key modelling task in the extraction phase of such a system consists of estimating responsiveness to the query and avoiding redundancy. Both of these are often approached through some textual measure of semantic similarity. In the Embra system, we follow this approach in a sentence extraction framework. However, we model the semantics of a sentence using a very large distributional semantics (i.e. term co-occurrence) space. A number of papers in the literature look at singular value decomposition (SVD) and compare it to unreduced term × document or term co-occurrence matrix representations. These explore varied tasks and obtain mixed results. For example, Pedersen et al. (2005) find that SVD does not improve performance in a name discrimination task, while Matveeva et al. (2005) and Rohde et al. (In prep) find that dimensionality reduction with SVD does help on word similarity tasks.</Paragraph> <Paragraph position="2"> The experiments reported here investigate the contribution of singular value decomposition to the query-oriented multi-document summarisation task. 
We compare the singular value decomposition of a term co-occurrence matrix derived from a corpus of approximately 100 million words (DS+SVD) to an unreduced version of the matrix (DS). These representations are described in Section 2. Next, Section 3 contains a discussion of related work using SVD for summarisation and a description of the sentence selection component in the Embra system. The paper goes on to give an overview of the experimental design and results in Section 4. This includes a detailed analysis of the statistical significance of the results.</Paragraph> </Section> </Paper>