<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1008">
  <Title>Extracting Paraphrases from a Parallel Corpus</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> In this paper, we presented a method for corpus-based identification of paraphrases from multiple English translations of the same source text.</Paragraph>
    <Paragraph position="1"> We showed that a co-training algorithm based on contextual and lexico-syntactic features of paraphrases achieves high performance on our data.</Paragraph>
    <Paragraph position="2"> The wide range of paraphrases extracted by our algorithm sheds light on paraphrasing phenomena, which have not been studied from an empirical perspective.</Paragraph>
    <Paragraph position="3"> Future work will extend this approach to extract paraphrases from comparable corpora, such as multiple reports from different news agencies about the same event or different descriptions of a disease from the medical literature. This extension will require a more selective alignment technique, similar to that of Hatzivassiloglou et al. (1999). We will also investigate a more powerful representation of contextual features. Fortunately, statistical parsers produce reliable results on news texts and can therefore be used to improve context representation. This will allow us to extract macro-syntactic paraphrases in addition to the local paraphrases that the algorithm currently produces.</Paragraph>
  </Section>
</Paper>