<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1024">
  <Title>Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Data
</SectionTitle>
    <Paragraph position="0"> The data we use in this work is the Multiple-Translation Chinese (MTC) Corpus, available from the LDC and developed for machine translation evaluation. It contains 105 news stories (993 sentences) from three sources of journalistic Mandarin Chinese text. These stories were independently translated into English by 11 translation agencies. Each sentence group, which consists of 11 semantically equivalent translations, is a rich source for learning lexical and structural paraphrases. In our experiments, we use 899 of the sentence groups; the groups containing sentences longer than 45 words were dropped.</Paragraph>
  </Section>
</Paper>