<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0201">
  <Title>A Geometric Approach to Mapping Bitext Correspondence</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2. Definitions
</SectionTitle>
    <Paragraph position="0"> Several key terms will help to explain SIMR. First, a bitext (Harris 1988) comprises two versions of a text, such as a text in two different languages. Translators create a bitext each time they translate a text.</Paragraph>
    <Paragraph position="1"> Second, each bitext defines a rectangular bitext space, as illustrated in Figure 1. The width and height of the rectangle are the lengths of the two component texts, in characters. The lower left corner of the rectangle is the origin of the bitext space and represents the two texts' beginnings. The upper right corner is the terminus and represents the texts' ends. The line between the origin and the terminus is the main diagonal, and its slope is the bitext slope.</Paragraph>
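The definitions of the bitext space reduce to a few quantities computed directly from the two texts. The following is a minimal Python sketch, assuming that "characters" can be approximated by Unicode code points (Python's `len`). The class name `BitextSpace` and its members are hypothetical illustrations, not part of SIMR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitextSpace:
    """Hypothetical container for the geometry of one bitext space."""

    x_text: str  # component text laid out along the x-axis
    y_text: str  # component text laid out along the y-axis

    @property
    def width(self) -> int:
        # Length of the x-axis text in characters (here: Unicode code points).
        return len(self.x_text)

    @property
    def height(self) -> int:
        # Length of the y-axis text in characters.
        return len(self.y_text)

    @property
    def origin(self) -> tuple[int, int]:
        # Lower left corner: the beginnings of both texts.
        return (0, 0)

    @property
    def terminus(self) -> tuple[int, int]:
        # Upper right corner: the ends of both texts.
        return (self.width, self.height)

    @property
    def slope(self) -> float:
        # Bitext slope: slope of the main diagonal (fails for an empty x_text).
        return self.height / self.width

    def diagonal_y(self, x: float) -> float:
        # y coordinate of the main diagonal at x-axis position x.
        return self.slope * x


space = BitextSpace("The cat sleeps.", "Le chat dort.")
print(space.terminus)  # (15, 13)
print(round(space.slope, 3))  # 0.867
```

Making the class frozen reflects that a bitext space is fully determined by its two texts; every other quantity is derived from them.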
    <Paragraph position="2"> In addition to the origin and the terminus, each bitext space contains a number of true points of correspondence (TPCs). For example, if a token at position p on the x-axis and a token at position q on the y-axis are translations of each other, then the coordinate (p, q) in the bitext space is a TPC. TPCs also exist at corresponding boundaries of text units such as sentences, paragraphs, and sections. Groups of TPCs with a roughly linear arrangement in the bitext space are called chains.</Paragraph>
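The definitions of TPCs and chains can be illustrated with a small Python sketch. It rests on several assumptions. A hypothetical toy bilingual `lexicon` stands in for knowing which tokens are translations. A token's position is the character offset where it starts. "Roughly linear" is taken to mean that every point lies within a chosen `tolerance` of the least-squares line through the points. The names `token_offsets`, `candidate_points`, and `is_chain` are illustrative. The linearity test is a generic stand-in and not SIMR's own chain-recognition procedure.

```python
import re


def token_offsets(text):
    # (character offset, lowercased word) for every word token in text.
    return [(m.start(), m.group().lower()) for m in re.finditer(r"\w+", text)]


def candidate_points(x_text, y_text, lexicon):
    """Emit (p, q) for each x-token/y-token pair the lexicon lists as translations."""
    y_tokens = token_offsets(y_text)
    points = []
    for p, x_word in token_offsets(x_text):
        for q, y_word in y_tokens:
            if lexicon.get(x_word) == y_word:
                points.append((p, q))
    return points


def is_chain(points, tolerance):
    """True if every point lies within tolerance of the least-squares line."""
    n = len(points)
    mean_x = sum(p for p, _ in points) / n
    mean_y = sum(q for _, q in points) / n
    sxx = sum((p - mean_x) ** 2 for p, _ in points)
    if sxx == 0:
        raise ValueError("need at least two points with distinct x positions")
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Largest vertical distance from any point to the fitted line.
    worst = max(abs(q - (slope * p + intercept)) for p, q in points)
    return tolerance >= worst


x_text = "The cat sleeps on the mat."
y_text = "Le chat dort sur le tapis."
lexicon = {"cat": "chat", "sleeps": "dort", "mat": "tapis"}  # hypothetical toy lexicon
points = candidate_points(x_text, y_text, lexicon)
print(points)  # [(4, 3), (8, 8), (22, 20)]
print(is_chain(points, tolerance=1.0))  # True
```

In practice a lexicon also matches tokens that are not true translations in context, so points found this way are candidates for TPCs rather than TPCs themselves.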
    <Paragraph position="3"> Bitext maps are bijective functions in bitext spaces. For each bitext, the true bitext map (TBM) is the shortest bitext map that runs through all the TPCs. The purpose of a bitext mapping algorithm is to produce bitext maps that are the best possible approximations of each bitext's TBM.</Paragraph>
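Consider the simplifying case where the TPCs increase strictly in both coordinates. Then the shortest bitext map through them is the polyline that joins the origin, the TPCs in order of x, and the terminus, because a straight segment is the shortest path between consecutive points. The Python sketch here builds that polyline, measures its length, and reads off the y position it assigns to an x position. The helper names `true_bitext_map`, `map_length`, and `map_y` are hypothetical. The sketch rejects point sets that are not strictly increasing, a case it does not attempt to handle.

```python
import math


def true_bitext_map(tpcs, terminus):
    """Return the polyline vertices: origin, TPCs sorted by x, then terminus."""
    vertices = [(0, 0)] + sorted(tpcs) + [terminus]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        # Assumption of this sketch: both coordinates strictly increase,
        # so the polyline is a one-to-one map.
        if not (x1 > x0 and y1 > y0):
            raise ValueError(f"{(x0, y0)} -> {(x1, y1)} is not strictly increasing")
    return vertices


def map_length(vertices):
    # Total length of the straight segments between consecutive vertices.
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def map_y(vertices, x):
    """Linearly interpolate the y position the map assigns to x-axis position x."""
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        if x1 >= x >= x0:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError(f"x={x} lies outside the bitext space")


# Hypothetical TPCs for two texts that are each 26 characters long.
vertices = true_bitext_map([(8, 8), (4, 3), (22, 20)], terminus=(26, 26))
print(vertices)  # [(0, 0), (4, 3), (8, 8), (22, 20), (26, 26)]
print(map_y(vertices, 6))  # 5.5
```

A bitext mapping algorithm only sees noisy candidate points, so its output approximates this polyline rather than reproducing it exactly.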
  </Section>
</Paper>