<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1004">
  <Title>Sentence Alignment for Monolingual Comparable Corpora</Title>
  <Section position="3" start_page="0" end_page="0" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Most work on monolingual corpus alignment has been done in the context of summarization. In single-document summarization, alignments between full documents and human-written summaries are used to learn rules for text compression. Marcu (1999) computes sentence similarity using a cosine-based metric. Jing (2002) identifies phrases that were cut and pasted together using a Hidden Markov Model with features incorporating word identity and positioning within sentences, thereby providing an alignment of the document and its summary. However, both of these methods construct an alignment by looking at sentences one at a time, independently of the decisions made about other sentences. They nevertheless achieve good results, because summaries often reuse original document text to a large extent.</Paragraph>
    <Paragraph position="1"> In the context of multidocument summarization, SimFinder (Hatzivassiloglou et al., 1999) identifies sentences that convey similar information across input documents to select the summary content. Even though the input documents are about the same subject, they exhibit a great deal of lexical variability. To address this issue, SimFinder employs a complex similarity function, combining features that extend beyond a simple word count and include noun phrase, proper noun, and WordNet sense overlap.</Paragraph>
    <Paragraph position="2"> Since many documents are processed in parallel, clustering is used to combine pairwise alignments.</Paragraph>
    <Paragraph position="3"> In contrast to our approach, SimFinder does not take the context around sentences into account.</Paragraph>
  </Section>
</Paper>