<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1010">
  <Title>Text Alignment in the Real World: Improving Alignments of Noisy Translations Using Common Lexical Features, String Matching Strategies and N-Gram Comparisons</Title>
  <Section position="8" start_page="72" end_page="73" type="evalu">
    <SectionTitle>
7 Performance
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the performance of the original alignment algorithm compared to the improved algorithm.</Paragraph>
    <Paragraph position="1"> The results are for two documents. #HAND is the number of alignment blocks in the hand-aligned set. #FOUND is the number of alignment blocks found by the algorithms. A value of #FOUND lower than the value of #HAND indicates that the algorithm has found alignment blocks containing multiple segments (e.g., a 3-3 match has supplanted three 1-1 matches). #CORRECT is the number of found blocks that exactly match blocks in the hand-aligned set. Note that the number of exact matches is a conservative estimate of the number of acceptable alignments, as different translators may differ, for example, about whether a 2-2 match can take the place of two 1-1 matches and still be considered aligned.</Paragraph>
    <Paragraph position="2"> In general, the improved alignment algorithm performed very well, raising the hit rate from 23% to 49% on Document 1 and from 0.00381% to 70% on Document 2. The abysmal performance of the byte-length method on Document 2 can be attributed to the massive amounts of header information, significant added whitespace, and inconsistent table and list formats that occurred in one document but not the other. The algorithm encountered only 1 hit (the document start) in the first quarter of the document. The training texts for these runs were the texts themselves, and therefore the results must be interpreted with care. Statistics derived from just two documents and applied directly to those same two documents for evaluation do not necessarily provide a direct estimate of the same statistics for a broader spectrum of documents.</Paragraph>
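The counts described in this section can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name score_alignment, the representation of an alignment block as a pair of segment-index tuples, the toy data, and in particular the definition of hit rate as #CORRECT divided by #HAND, which the text does not state explicitly.

```python
def score_alignment(hand_blocks, found_blocks):
    """Count exact block matches between a hand alignment and an automatic one."""
    hand = set(hand_blocks)
    found = set(found_blocks)
    # A found block is correct only if the identical block occurs in the hand set.
    correct = hand.intersection(found)
    return {
        "hand": len(hand),        # #HAND
        "found": len(found),      # #FOUND
        "correct": len(correct),  # #CORRECT
        # Assumption: hit rate = #CORRECT / #HAND (not defined in the text).
        "hit_rate": len(correct) / len(hand) if hand else 0.0,
    }


# Toy data invented for illustration; not taken from Table 1.
# Each block is (source segment ids, target segment ids).
hand_blocks = [((0,), (0,)), ((1,), (1,)), ((2,), (2,)), ((3,), (3,))]
# The aligner merged segments 1 and 2 into a single 2-2 block.
found_blocks = [((0,), (0,)), ((1, 2), (1, 2)), ((3,), (3,))]

scores = score_alignment(hand_blocks, found_blocks)
print(scores)
```

On this toy input #FOUND (3) is lower than #HAND (4) because of the merged 2-2 block, and that block counts as a miss under exact matching even though a translator might accept it. This is the sense in which the exact-match count is conservative.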
  </Section>
</Paper>