<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1011">
  <Title>Loosely Tree-Based Alignment for Machine Translation</Title>
  <Section position="3" start_page="0" end_page="4" type="metho">
    <SectionTitle>
2 The Tree-to-String Model
</SectionTitle>
    <Paragraph position="0"> We begin by summarizing the model of Yamada and Knight (2001), which can be thought of as representing translation as an Alexander Calder mobile. If we follow the process of an English sentence's transformation into French, the English sentence is first given a syntactic tree representation by a statistical parser (Collins, 1999). As the first step in the translation process, the children of each node in the tree can be re-ordered.</Paragraph>
    <Paragraph position="1"> For any node with m children, m! re-orderings are possible, each of which is assigned a probability</Paragraph>
    <Paragraph position="3"> conditioned on the syntactic categories of the parent node and its children. As the second step, French words can be inserted at each node of the parse tree. Insertions are modeled in two steps, the first predicting whether an insertion to the left, an insertion to the right, or no insertion takes place with probability P ins , conditioned on the syntactic category of the node and that of its parent. The second step is the choice of the inserted</Paragraph>
    <Paragraph position="5"> (fjNULL), which is predicted without any conditioning information. The final step, a French translation of each original English word, at the leaves of the tree, is chosen according to a distribution P t (fje). The French word is predicted conditioned only on the English word, and each English word can generate at most one French word, or can generate a NULL symbol, representing deletion. Given the original tree, the re-ordering, insertion, and translation probabilities at each node are independent of the choices at any other node. These independence relations are analogous to those of a stochastic context-free grammar, and allow for efficient parameter estimation by an inside-outside Expectation Maximization (EM) algorithm. The computation of inside probabilities , outlined below, considers possible reordering of nodes in the original tree in a bottom-up manner: for all nodes &amp;quot; i in input tree T do for all k;l such that 1 &lt;k&lt;l&lt;Ndo for all orderings of the children &amp;quot;</Paragraph>
    <Paragraph position="7"> for all partitions of span k;l into k</Paragraph>
    <Paragraph position="9"> This algorithm has computational complexity</Paragraph>
    <Paragraph position="11"> ), where m is the maximum number of children of any node in the input tree T, and N the length of the input string. By storing partially completed arcs in the chart and interleaving the inner two loops, complexity of O(jTjn</Paragraph>
    <Paragraph position="13"> achieved. Thus, while the algorithm is exponential in m, the fan-out of the grammar, it is polynomial in the size of the input string. Assuming jTj = O(n), the algorithm is O(n  ).</Paragraph>
    <Paragraph position="14"> The model's efficiency, however, comes at a cost. Not only are many independence assumptions made, but many alignments between source and target sentences simply cannot be represented. As a minimal example, take the tree: A</Paragraph>
    <Paragraph position="16"> Of the six possible re-orderings of the three terminals, the two which would involve crossing the bracketing of the original tree (XZY and YZX) are not allowed. While this constraint gives us a way of using syntactic information in translation, it may in many cases be too rigid. In part to deal with this problem, Yamada and Knight (2001) flatten the trees in a pre-processing step by collapsing nodes with the same lexical head-word. This allows, for example, an English subject-verb-object (SVO) structure, which is analyzed as having a VP node spanning the verb and object, to be re-ordered as VSO in a language such as Arabic. Larger syntactic divergences between the two trees may require further relaxation of this constraint, and in practice we expect such divergences to be frequent. For example, a nominal modifier in one language may show up as an adverbial in the other, or, due to choices such as which information is represented by a main verb, the syntactic correspondence between the two  cloning (indicated by the arrow), and word translation. After the insertion operation (not shown), the tree's English yield is: How many pairs of gloves is each of you issued in winter? sentences may break down completely.</Paragraph>
    <Section position="1" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
2.1 Tree-to-String Clone Operation
</SectionTitle>
      <Paragraph position="0"> In order to provide some flexibility, we modify the model in order to allow for a copy of a (translated) subtree from the English sentences to occur, with some cost, at any point in the resulting French sentence. For example, in the case of the input tree</Paragraph>
      <Paragraph position="2"> This operation, combined with the deletion of the original node Z, produces the alignment (XZY) that was disallowed by the original tree reordering model. Figure 1 shows an example from our Korean-English corpus where the clone operation allows the model to handle a case of wh-movement in the English sentence that could not be realized by any reordering of subtrees of the Korean parse.</Paragraph>
      <Paragraph position="3"> The probability of adding a clone of original node</Paragraph>
      <Paragraph position="5"> as a child of node &amp;quot; j is calculated in two steps: first, the choice of whether to insert a clone under</Paragraph>
      <Paragraph position="7"> ), and the choice of which original node to copy, with probability</Paragraph>
      <Paragraph position="9"> is the probability of an original node producing a copy. In our implementation, for simplicity, P ins (clone) is a single number, estimated by the EM algorithm but not conditioned on the parent node &amp;quot; j , and P makeclone is a constant, meaning that the node to be copied is chosen from all the nodes in the original tree with uniform probability.  It is important to note that P makeclone is not dependent on whether a clone of the node in question has already been made, and thus a node may be &amp;quot;reused&amp;quot; any number of times. This independence assumption is crucial to the computational tractability of the algorithm, as the model can be estimated using the dynamic programming method above, keeping counts for the expected number of times each node has been cloned, at no increase in computational complexity. Without such an assumption, the parameter estimation becomes a problem of parsing with crossing dependencies, which is exponential in the length of the input string (Barton, 1985).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="4" end_page="4" type="metho">
    <SectionTitle>
3 The Tree-to-Tree Model
</SectionTitle>
    <Paragraph position="0"> The tree-to-tree alignment model has tree transformation operations similar to those of the tree-to-string model described above. However, the transformed tree must not only match the surface string of the target language, but also the tree structure assigned to the string by the treebank annotators. In order to provide enough flexibility to make this possible, additional tree transformation operations allow a single node in the source tree to produce two nodes in the target tree, or two nodes in the source tree to be grouped together and produce a single node in the target tree. The model can be thought of as a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.</Paragraph>
    <Paragraph position="1">  is modeled in a sequence of steps proceeding from the root of the target tree down. At each level of the tree: 1. At most one of the current node's children is  grouped with the current node in a single elementary tree, with probability P  operation is similar to the re-order operation in the tree-to-string model, with the extension that 1) the alignment can include insertions and deletions of individual children, as nodes in either the source or target may not correspond to anything on the other side, and 2) in the case where two nodes have been grouped into t a , their children are re-ordered together in one step.</Paragraph>
    <Paragraph position="2"> In the final step of the process, as in the tree-to-string model, lexical items at the leaves of the tree are translated into the target language according to a distribution P</Paragraph>
    <Paragraph position="4"> Allowing non-1-to-1 correspondences between nodes in the two trees is necessary to handle the fact that the depth of corresponding words in the two trees often differs. A further consequence of allowing elementary trees of size one or two is that some reorderings not allowed when reordering the children of each individual node separately are now possible. For example, with our simple tree  jA ) BZ), their collective children will be reordered with probability</Paragraph>
    <Paragraph position="6"> giving the desired word ordering XZY. However, computational complexity as well as data sparsity prevent us from considering arbitrarily large elementary trees, and the number of nodes considered at once still limits the possible alignments. For example, with our maximum of two nodes, no transformation of the tree A</Paragraph>
    <Paragraph position="8"> is capable of generating the alignment WYXZ.</Paragraph>
    <Paragraph position="9"> In order to generate the complete target tree, one more step is necessary to choose the structure on the target side, specifically whether the elementary tree has one or two nodes, what labels the nodes have, and, if there are two nodes, whether each child attaches to the first or the second. Because we are ultimately interested in predicting the correct target string, regardless of its structure, we do not assign probabilities to these steps. The nonterminals on the target side are ignored entirely, and while the alignment algorithm considers possible pairs of nodes as elementary trees on the target side during training, the generative probability model should be thought of as only generating single nodes on the target side.</Paragraph>
    <Paragraph position="10"> Thus, the alignment algorithm is constrained by the bracketing on the target side, but does not generate the entire target tree structure.</Paragraph>
    <Paragraph position="11"> While the probability model for tree transformation operates from the top of the tree down, probability estimation for aligning two trees takes place by iterating through pairs of nodes from each tree in bottom-up order, as sketched below: for all nodes &amp;quot;</Paragraph>
    <Paragraph position="13"> The outer two loops, iterating over nodes in each tree, require O(jTj  ). Because we restrict our elementary trees to include at most one child of the root node on either side, choosing elementary trees for a node pair is O(m  ), where m refers to the maximum number of children of a node. Computing the alignment between the 2m children of the elementary tree on either side requires choosing which sub-set of source nodes to delete, O(2  ), and how to reorder the remaining nodes from source to target tree, O((2m)!). Thus overall complexity of the algo- null (2m)!), quadratic in the size of the input sentences, but exponential in the fan-out of the grammar.</Paragraph>
    <Section position="1" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
3.1 Tree-to-Tree Clone Operation
</SectionTitle>
      <Paragraph position="0"> Allowing m-to-n matching of up to two nodes on either side of the parallel treebank allows for limited non-isomorphism between the trees, as in HajiVc et al. (2002). However, even given this flexibility, requiring alignments to match two input trees rather than one often makes tree-to-tree alignment more constrained than tree-to-string alignment. For example, even alignments with no change in word order may not be possible if the structures of the two trees are radically mismatched. This leads us to think it may be helpful to allow departures from  the constraints of the parallel bracketing, if it can be done in without dramatically increasing computational complexity.</Paragraph>
      <Paragraph position="1"> For this reason, we introduce a clone operation, which allows a copy of a node from the source tree to be made anywhere in the target tree. After the clone operation takes place, the transformation of source into target tree takes place using the tree decomposition and subtree alignment operations as before. The basic algorithm of the previous section remains unchanged, with the exception that the alignments between children of two elementary trees can now include cloned, as well as inserted, nodes on the target side. Given that specifies a new cloned node as a child of &amp;quot; j , the choice of which node to clone is made as in the tree-to-string model:</Paragraph>
      <Paragraph position="3"> Because a node from the source tree is cloned with equal probability regardless of whether it has already been &amp;quot;used&amp;quot; or not, the probability of a clone operation can be computed under the same dynamic programming assumptions as the basic tree-to-tree model. As with the tree-to-string cloning operation, this independence assumption is essential to keep the complexity polynomial in the size of the input sentences.</Paragraph>
      <Paragraph position="4"> For reference, the parameterization of all four models is summarized in Table 1.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="4" end_page="4" type="metho">
    <SectionTitle>
4 Data
</SectionTitle>
    <Paragraph position="0"> For our experiments, we used a parallel Korean-English corpus from the military domain (Han et al., 2001). Syntactic trees have been annotated by hand for both the Korean and English sentences; in this paper we will be using only the Korean trees, modeling their transformation into the English text. The corpus contains 5083 sentences, of which we used 4982 as training data, holding out 101 sentences for evaluation. The average Korean sentence length was 13 words. Korean is an agglutinative language, and words often contain sequences of meaning-bearing suffixes. For the purposes of our model, we represented the syntax trees using a fairly aggressive tokenization, breaking multimorphemic words into separate leaves of the tree. This gave an average of 21 tokens for the Korean sentences. The average English sentence length was 16. The maximum number of children of a node in the Korean trees was 23 (this corresponds to a comma-separated list of items). 77% of the Korean trees had no more than four children at any node, 92% had no more than five children, and 96% no more than six children. The vocabulary size (number of unique types) was 4700 words in English, and 3279 in Korean -before splitting multi-morphemic words, the Korean vocabulary size was 10059. For reasons of computation speed, trees with more than 5 children were excluded from the experiments described below.</Paragraph>
  </Section>
class="xml-element"></Paper>