File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/93/p93-1001_abstr.xml

Size: 1,097 bytes

Last Modified: 2025-10-06 13:47:54

<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1001">
  <Title>Char_align: A Program for Aligning Parallel Texts at the Character Level</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al (1991), Gale and Church (to appear), Isabelle (1992), Kay and Rösenschein (to appear), Simard et al (1992), Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), then these methods tend to break down because the noise can make it difficult to find paragraph boundaries, let alone sentences. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.</Paragraph>
  </Section>
</Paper>