File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/95/e95-1010_abstr.xml

Size: 1,366 bytes

Last Modified: 2025-10-06 13:48:23

<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1010">
  <Title>Text Alignment in the Real World: Improving Alignments of Noisy Translations Using Common Lexical Features, String Matching Strategies and N-Gram Comparisons ~</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Alignment methods based on byte-length comparisons of alignment blocks have been remarkably successful for aligning good translations from legislative transcriptions. For noisy translations in which the parallel text of a document has significant structural differences, byte-alignment methods often do not perform well. The Pan American Health Organization (PAHO) corpus is a series of articles that were first translated by machine methods and then improved by professional translators. Many of the Spanish PAHO texts do not share formatting conventions with the corresponding English documents, refer to tables in stylistically different ways and contain extraneous information. A method based on a dynamic programming framework, but using a decision criterion derived from a combination of byte-length ratio measures, hard matching of numbers, string comparisons and n-gram co-occurrence matching substantially improves the performance of the alignment process.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML