<?xml version="1.0" standalone="yes"?>
<Paper uid="J93-2003">
  <Title>The Mathematics of Statistical Machine Translation: Parameter Estimation</Title>
  <Section position="4" start_page="265" end_page="267" type="intro">
    <SectionTitle>
3. Alignments
</SectionTitle>
    <Paragraph position="0"> We say that a pair of strings that are translations of one another form a translation, and we show this by enclosing the strings in parentheses and separating them by a vertical bar. Thus, we write the translation (Qu'aurions-nous pu faire? | What could we have done?) to show that What could we have done? is a translation of Qu'aurions-nous pu faire? When the strings end in sentences, we usually omit the final stop unless it is a question mark or an exclamation point.</Paragraph>
    <Paragraph position="1"> Brown et al. (1990) introduce the idea of an alignment between a pair of strings as an object indicating for each word in the French string that word in the English string from which it arose. Alignments are shown graphically, as in Figure 1, by drawing lines, which we call connections, from some of the English words to some of the French words. The alignment in Figure 1 has seven connections: (the, Le), (program, programme), and so on. Following the notation of Brown et al., we write this alignment as (Le programme a été mis en application | And the(1) program(2) has(3) been(4) implemented(5,6,7)). The list of numbers following an English word shows the positions in the French string of the words to which it is connected. Because And is not connected to any French words here, there is no list of numbers after it. We consider every alignment to be correct with some probability, and so we find (Le programme a été mis en application | And(1,2,3,4,5,6,7) the program has been implemented) perfectly acceptable. Of course, we expect it to be much less probable than the alignment shown in Figure 1.</Paragraph>
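    <Paragraph> The parenthesized notation can be read mechanically. The following is a minimal sketch, assuming the English side is whitespace-tokenized and each word is optionally followed by a comma-separated list of French positions; the function name parse_alignment is hypothetical and is not taken from the paper.

```python
import re


def parse_alignment(english_side):
    """Return connections as (english_position, french_position) pairs.

    Both positions are 1-based, matching the notation in the text.
    """
    connections = []
    for i, token in enumerate(english_side.split(), start=1):
        # A token is a word, optionally followed by a parenthesized list
        # of French positions, e.g. "implemented(5,6,7)".
        match = re.fullmatch(r"([^()]+)(?:\(([\d,]+)\))?", token)
        if match is None:
            raise ValueError(f"cannot parse token: {token!r}")
        positions = match.group(2)
        if positions:
            for j in positions.split(","):
                connections.append((i, int(j)))
    return connections


# English side of the Figure 1 alignment as written in the text.
connections = parse_alignment(
    "And the(1) program(2) has(3) been(4) implemented(5,6,7)"
)
print(len(connections))
```

Here And occupies English position 1 and contributes no pairs, so the first pair produced belongs to the word the at position 2.</Paragraph>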
    <Paragraph position="2"> In Figure 1 each French word is connected to exactly one English word, but more general alignments are possible and may be appropriate for some translations. For example, we may have a French word connected to several English words as in Figure 2, which we write as (Le reste appartenait aux autochtones | The(1) balance(2) was(3) the(3) territory(3) of(4) the(4) aboriginal(5) people(5)). More generally still, we may have several French words connected to several English words as in Figure 3, which we write as (Les pauvres sont démunis | The(1) poor(2) don't(3,4) have(3,4) any(3,4) money(3,4)). Here, the four English words don't have any money work together to generate the two French words sont démunis.</Paragraph>
    <Paragraph position="3"> In a figurative sense, an English passage is a web of concepts woven together according to the rules of English grammar. When we look at a passage, we cannot see the concepts directly but only the words that they leave behind. To show that these words are related to a concept but are not quite the whole story, we say that they form a cept. Some of the words in a passage may participate in more than one cept, while others may participate in none, serving only as a sort of syntactic glue to bind the whole together. When a passage is translated into French, each of its cepts contributes some French words to the translation. We formalize this use of the term cept and relate it to the idea of an alignment as follows.</Paragraph>
    <!-- Running header: Computational Linguistics Volume 19, Number 2 -->
    <Paragraph position="4"> We call the set of English words connected to a French word in a particular alignment the cept that generates the French word. Thus, an alignment resolves an English string into a set of possibly overlapping cepts that we call the ceptual scheme of the English string with respect to the alignment. The alignment in Figure 3 contains the three cepts The, poor, and don't have any money. When one or more of the French words is connected to no English words, we say that the ceptual scheme includes the empty cept and that each of these words has been generated by this empty cept.</Paragraph>
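    <Paragraph> The definition above can be sketched in code. This is an illustrative sketch only, assuming an alignment is given as a list of (english_position, french_position) pairs with 1-based positions; the function name ceptual_scheme and the frozenset representation of a cept are assumptions made here, not the paper's notation.

```python
def ceptual_scheme(connections, french_length):
    """Map each French position to the cept that generates it.

    A cept is represented as a frozenset of 1-based English positions;
    the empty frozenset stands for the empty cept.
    """
    generators = {j: set() for j in range(1, french_length + 1)}
    for english_pos, french_pos in connections:
        generators[french_pos].add(english_pos)
    cept_of = {j: frozenset(s) for j, s in generators.items()}
    # The ceptual scheme is the set of distinct cepts.
    return cept_of, set(cept_of.values())


english = ["The", "poor", "don't", "have", "any", "money"]
# Figure 3 alignment:
# The(1) poor(2) don't(3,4) have(3,4) any(3,4) money(3,4)
connections = [(1, 1), (2, 2)]
connections += [(i, j) for i in (3, 4, 5, 6) for j in (3, 4)]

cept_of, scheme = ceptual_scheme(connections, french_length=4)
cept_words = sorted(
    " ".join(english[i - 1] for i in sorted(cept)) for cept in scheme
)
print(cept_words)
```

A French position with no connections maps to the empty frozenset, which corresponds to the empty cept described in the text.</Paragraph>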
    <Paragraph position="5"> Formally, a cept is a subset of the positions in the English string together with the words occupying those positions. When we write the words that make up a cept, we sometimes affix a subscript to each one showing its position. The alignment in Figure 2 includes the cepts the_1 and of_6 the_7, but not the cepts of_6 the_1 or the_7. In (J'applaudis à la décision | I(1) applaud(2) the(4) decision(5)), à is generated by the empty cept. Although the empty cept has no position, we place it by convention in position zero, and write it as e_0. Thus, we may also write the previous alignment as (J'applaudis à la décision | e_0(3) I(1) applaud(2) the(4) decision(5)).</Paragraph>
    <Paragraph position="7"> We denote the set of alignments of (f|e) by A(e, f). If e has length l and f has length m, there are lm different connections that can be drawn between them because each of the m French words can be connected to any of the l English words. Since an alignment is determined by the connections that it contains, and since a subset of the possible connections can be chosen in 2^{lm} ways, there are 2^{lm} alignments in A(e, f).</Paragraph>
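    <Paragraph> The counting argument can be checked by brute force for small strings. The sketch below assumes, as in the paragraph above, that an alignment is any subset of the possible connections between an English string of length l and a French string of length m, so the count is 2 raised to the product of the two lengths; the function names are hypothetical.

```python
from itertools import combinations, product


def count_alignments(l, m):
    """Closed-form count: every subset of the l*m connections is allowed."""
    return 2 ** (l * m)


def enumerate_alignments(l, m):
    """Yield every alignment as a frozenset of (english_pos, french_pos)."""
    possible = list(product(range(1, l + 1), range(1, m + 1)))
    for size in range(len(possible) + 1):
        for subset in combinations(possible, size):
            yield frozenset(subset)


# Brute-force check for an English string of length 2 and a French
# string of length 3 (six possible connections).
total = sum(1 for _ in enumerate_alignments(2, 3))
print(total, count_alignments(2, 3))
```

Enumeration is only practical for very short strings, since the count doubles with every additional possible connection.</Paragraph>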
  </Section>
</Paper>