File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/j97-3002_metho.xml

Size: 19,034 bytes

Last Modified: 2025-10-06 14:14:31

<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-3002">
  <Title>Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora</Title>
  <Section position="5" start_page="379" end_page="381" type="metho">
    <SectionTitle>
2. Inversion Transduction Grammars
</SectionTitle>
    <Paragraph position="0"> [Footnotes] 1. The expressiveness of simple transduction grammars is equivalent to that of nondeterministic pushdown transducers (Savitch 1982). 2. Also keep in mind that ITGs turn out to be especially suited for bilingual parsing applications, whereas pushdown transducers and syntax-directed transduction grammars are designed for monolingual parsing (in tandem with generation).</Paragraph>
    <Paragraph position="2"> Alternatively, a graphical parse tree notation is shown in Figure 2, where the (/ level of bracketing is indicated by a horizontal line. The English is read in the usual depth-first left-to-right order, but for the Chinese, a horizontal line means the right subtree is traversed before the left.</Paragraph>
    <Paragraph position="3"> Parsing, in the case of an ITG, means building matched constituents for input sentence-pairs rather than sentences. This means that the adjacency constraints given by the nested levels must be obeyed in the bracketings of both languages. The result of the parse yields labeled bracketings for both sentences, as well as a bracket alignment indicating the parallel constituents between the sentences. The constituent alignment includes a word alignment as a by-product.</Paragraph>
    <Paragraph position="4"> The nonterminals may not always look like those of an ordinary CFG. Clearly, the nonterminals of an ITG must be chosen in a somewhat different manner than for a monolingual grammar, since they must simultaneously account for syntactic patterns of both languages. One might even decide to choose nonterminals for an ITG that do not match linguistic categories, sacrificing this to the goal of ensuring that all corresponding substrings can be aligned.</Paragraph>
    <Paragraph position="5"> An ITG can accommodate a wider range of ordering variation between the lan- null Computational Linguistics Volume 23, Number 3 Where is the Secretary of Finance when needed ?</Paragraph>
    <Paragraph position="7"> An extremely distorted alignment that can be accommodated by an ITG.</Paragraph>
    <Paragraph position="8"> guages than might appear at first blush, through appropriate decomposition of productions (and thus constituents), in conjuction with introduction of new auxiliary non-terminals where needed. For instance, even messy alignments such as that in Figure 3 can be handled by interleaving orientations:</Paragraph>
    <Paragraph position="10"> This bracketing is of course linguistically implausible, so whether such parses are acceptable depends on one's objective. Moreover, it may even remain possible to align constituents for phenomena whose underlying structure is not context-free--say, ellipsis or coordination--as long as the surface structures of the two languages fortuitously parallel each other (though again the bracketing would be linguistically implausible).</Paragraph>
    <Paragraph position="11"> We will return to the subject of ITGs' ordering flexibility in Section 4.</Paragraph>
    <Paragraph position="12"> We stress again that the primary purpose of ITGs is to maximize robustness for parallel corpus analysis rather than to verify grammaticality, and therefore writing grammars is made much easier since the grammars can be minimal and very leaky.</Paragraph>
    <Paragraph position="13"> We consider elsewhere an extreme special case of leaky ITGs, inversion-invariant transduction grammars, in which all productions occur with both orientations (Wu 1995). As the applications below demonstrate, the bilingual lexical constraints carry greater importance than the tightness of the grammar.</Paragraph>
    <Paragraph position="14"> Formally, an inversion transduction grammar, or ITG, is denoted by G = (N, W1,W2,TC/,S), where dV is a finite set of nonterminals, W1 is a finite set of words (terminals) of language 1, }4;2 is a finite set of words (terminals) of language 2, TC/ is a finite set of rewrite rules (productions), and S E A/&amp;quot; is the start symbol. The space of word-pairs (terminal-pairs) X = (W1 U {c}) x (W2 U {c}) contains lexical translations denoted x/y and singletons denoted x/C/ or C//y, where x E W1 and y E W2. Each production is either of straight orientation written A --~ \[ala2 ... ar\], or of inverted orientation written A ~ (ala2.. * ar), where ai E A/&amp;quot; U X and r is the rank of the production. The set of transductions generated by G is denoted T(G). The sets of (monolingual) strings generated by G for the first and second output languages are denoted LffG) and L2(G), respectively.</Paragraph>
  </Section>
  <Section position="6" start_page="381" end_page="383" type="metho">
    <SectionTitle>
3. A Normal Form for Inversion Transduction Grammars
</SectionTitle>
    <Paragraph position="0"> We now show that every ITG can be expressed as an equivalent ITG in a 2-normal form that simplifies algorithms and analyses on ITGs. In particular, the parsing algorithm of the next section operates on ITGs in normal form. The availability of a 2-normal</Paragraph>
    <Section position="1" start_page="382" end_page="383" type="sub_section">
      <SectionTitle>
Wu Bilingual Parsing
</SectionTitle>
      <Paragraph position="0"> form is a noteworthy characteristic of ITGs; no such normal form is available for unrestricted context-free (syntax-directed) transduction grammars (Aho and Ullman 1969b). The proof closely follows that for standard CFGs, and the proofs of the lemmas are omitted.</Paragraph>
      <Paragraph position="1">  inversion transduction grammar G, there exists an equivalent inversion transgrammar G' where T(G) = T(G'), such that the right-hand side of any proof G t contains either a single terminal-pair or a list of nonterminals.  types. The remaining two types are transformed as follows: For each production of the form A --~ \[B1... Bn\] we introduce new nonterminals X1... X,_2 in order to replace the production with the set of rules A --* \[B1X1\],X1 ---+ \[B2X2\] ..... Xn-3 --+ \[Bn-2Xn-a\],Xn-2 ---+ \[Bn-IB,\]. Let (e,c) be any string-pair derivable from A ~ \[B1.&amp;quot; Bn\], where e is output on stream 1 and c on stream 2. Define e i as the substring of e derived from Bi, and similarly define c i. Then Xi generates (e i+1.. .en, c i+1 ...C n) for all 1 ~ i &lt; n - 1, so the new production A --+ \[BIX1\] also generates (e, c). No additional string-pairs are generated due to the new productions (since each Xi is only reachable from Xi-1 and X1 is only reachable from A).</Paragraph>
      <Paragraph position="2"> For each production of the form A -~ (B1 ... Bn) we replace the production with the set of rules A ~ ( B1Y1) , Y1 --~ ( B2 Y2) , . . . , Yn- 3 ---+ ( Bn- R Yn- 2), Yn- 2 --~ ( Bn- I Bn). Let (e, c) be any string-pair derivable from A ~ (B1 ''. Bn), where e is output on stream 1 and c on stream 2. Again define e i and c i as the substrings derived from Bi, but in this case (e, c) = (e 1 * * * e &amp;quot;, c&amp;quot; * * * c 1 ). Then Yi generates (e i+1 * * * e n, c n * * * c i+1 ) for all  Computational Linguistics Volume 23, Number 3 1 _~ i &lt; n - 1, so the new production A --* (B1Y1) also generates (e,c). Again, no additional string-pairs are generated due to the new productions. \[\] Henceforth all transduction grammars will be assumed to be in normal form.</Paragraph>
    </Section>
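The construction in the proof is mechanical enough to state as code. Below is a rough sketch, assuming an invented (lhs, orientation, children) rule representation; the fresh nonterminals play the role of the Xi and Yi above:

```python
# A sketch of the normal-form binarization from the proof above.
# The (lhs, orientation, children) rule representation is invented for
# illustration; fresh nonterminal names come from a global counter.
from itertools import count

_fresh = count()

def binarize(lhs, orient, children):
    """Replace lhs -> orient(children) of rank n >= 3 with a chain of
    rank-2 rules, preserving the orientation of the original rule."""
    n = len(children)
    if n <= 2:
        return [(lhs, orient, children)]
    new_rules = []
    prev = lhs
    # A -> [B1 X1], X1 -> [B2 X2], ..., Xn-2 -> [Bn-1 Bn]
    for i in range(n - 2):
        fresh = f"X{next(_fresh)}"
        new_rules.append((prev, orient, [children[i], fresh]))
        prev = fresh
    new_rules.append((prev, orient, children[-2:]))
    return new_rules

for rule in binarize("A", "<>", ["B1", "B2", "B3", "B4"]):
    print(rule)
# ('A', '<>', ['B1', 'X0'])
# ('X0', '<>', ['B2', 'X1'])
# ('X1', '<>', ['B3', 'B4'])
```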
  </Section>
  <Section position="7" start_page="383" end_page="385" type="metho">
    <SectionTitle>
4. Expressiveness Characteristics
</SectionTitle>
    <Paragraph position="0"> We now turn to the expressiveness desiderata for a matching formalism. It is of course difficult to make precise claims as to what characteristics are necessary and/or sufficient for such a model, since no cognitive studies that are directly pertinent to bilingual constituent alignment are available. Nonetheless, most related previous parallel corpus analysis models share certain conceptual approaches with ours, loosely based on cross-linguistic theories related to constituency, case frames, or thematic roles, as well as computational feasibility needs. Below we survey the most common constraints and discuss their relation to ITGs.</Paragraph>
    <Paragraph position="1"> Crossing Constraints. Arrangements where the matchings between subtrees cross each another are prohibited by crossing constraints, unless the subtrees' immediate parent constituents are also matched to each other. For example, given the constituent matchings depicted as solid lines in Figure 4, the dotted-line matchings corresponding to potential lexical translations would be ruled illegal. Crossing constraints are implicit in many phrasal matching approaches, both constituency-oriented (Kaji, Kida, and Morimoto 1992; Cranias, Papageorgiou, and Peperidis 1994; Grishman 1994) and dependency-oriented (Sadler and Vendelmans 1990; Matsumoto, Ishimoto, and Utsuro 1993). The theoretical cross-linguistic hypothesis here is that the core arguments of frames tend to stay together over different languages. The constraint is also useful for computational reasons, since it helps avoid exponential bilingual matching times.</Paragraph>
    <Paragraph position="2"> ITGs inherently implement a crossing constraint; in fact, the version enforced by ITGs is even stronger. This is because even within a single constituent, immediate subtrees are only permitted to cross in exact inverted order. As we shall argue below, this restriction reduces matching flexibility in a desirable fashion.</Paragraph>
    <Paragraph position="3"> Rank Constraints. The second expressiveness desideratum for a matching formalism is to somehow limit the rank of constituents (the number of children or right-hand-side symbols), which dictates the span over which matchings may cross. As the number of subtrees of an Ll-constituent grows, the number of possible matchings to subtrees of the corresponding L2-constituent grows combinatorially, with corresponding time complexity growth on the matching process. Moreover, if constituents can immediately dominate too many tokens of the sentences, the crossing constraint loses effectiveness--in the extreme, if a single constituent immediately dominates the entire sentence-pair, then any permutation is permissible without violating the crossing constraint. Thus, we would like to constrain the rank as much as possible, while still permitting some reasonable degree of permutation flexibility.</Paragraph>
    <Paragraph position="4"> Recasting this issue in terms of the general class of context-free (syntax-directed) transduction grammars, the number of possible subtree matchings for a single constituent grows combinatorially with the number of symbols on a production's right-hand side. However, it turns out that the ITG restriction of allowing only matchings with straight or inverted orientation effectively cuts the combinatorial growth, while still maintaining flexibility where needed.</Paragraph>
    <Paragraph position="5"> To see how ITGs maintain needed flexibility, consider Figure 5, which shows all 24 possible complete matchings between two constituents of length four each. Nearly all of these--22 out of 24--can be generated by an ITG, as shown by the parse trees (whose</Paragraph>
    <Section position="1" start_page="384" end_page="385" type="sub_section">
      <SectionTitle>
Wu Bilingual Parsing
</SectionTitle>
      <Paragraph position="0"> The Security Bureau grante / authority to__the polic~ station Figure 4 The crossing constraint.</Paragraph>
      <Paragraph position="1"> nonterminal labels are omitted). 3 The 22 permitted matchings are representative of real transpositions in word order between the English-Chinese sentences in our data. The only two matchings that cannot be generated are very distorted transpositions that we might call &amp;quot;inside-out&amp;quot; matchings. We have been unable to find real examples in our data of constituent arguments undergoing &amp;quot;inside-out&amp;quot; transposition. Note that this hypothesis is for fixed-word-order languages that are lightly inflected, such as English and Chinese. It would not be expected to hold for so-called scrambling or free-word-order languages, or heavily inflected languages. However, inflections provide alternative surface cues for determining constituent roles (and  Growth in number of legal complete subconstituent matchings for context-free (syntax-directed) transduction grammars with rank r, versus ITGs on a pair of subconstituent sequences of length r each.</Paragraph>
    </Section>
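The 22-of-24 claim, and the growth comparison in the figure above, can be checked mechanically: with binary ITG rules, a permutation of sibling constituents is generable exactly when it can be split recursively into two blocks combined in straight or inverted orientation. A small sketch (invented code, not from the paper):

```python
# Count the permutations of r sibling constituents that a binary ITG
# can generate: a permutation is generable iff it splits recursively
# into two blocks whose value ranges are contiguous, combined straight
# (low block first) or inverted (high block first).
from itertools import permutations
from math import factorial

def itg_generable(perm):
    if len(perm) <= 1:
        return True
    for k in range(1, len(perm)):
        left, right = perm[:k], perm[k:]
        # straight split (left values all lower) or inverted (all higher)
        if max(left) < min(right) or min(left) > max(right):
            renorm = lambda p: tuple(sorted(p).index(x) for x in p)
            if itg_generable(renorm(left)) and itg_generable(renorm(right)):
                return True
    return False

for r in range(2, 6):
    ok = sum(itg_generable(p) for p in permutations(range(r)))
    print(f"rank {r}: {ok} of {factorial(r)} complete matchings")
# rank 4 prints 22 of 24; the two excluded permutations are the
# "inside-out" matchings (2,4,1,3) and (3,1,4,2) in 1-based notation.
```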
  </Section>
  <Section position="8" start_page="385" end_page="389" type="metho">
    <SectionTitle>
5. Stochastic Inversion Transduction Grammars
</SectionTitle>
    <Paragraph position="0"> In a stochastic ITG (SITG), a probability is associated with each rewrite rule. Following the standard convention, we use a and b to denote probabilities for syntactic and lexical rules, respectively. For example, the probability of the rule NN 0~ \[A N\] is aNN-,\[A N\] = 0.4. The probability of a lexical rule A 0.0001 x/y is bA(X,y) ~- 0.001. Let W1, W2 be the vocabulary sizes of the two languages, and X = {A1 ..... AN} be the set of nonterminals with indices 1,...,N. (For conciseness, we sometimes abuse the notation by writing an index when we mean the corresponding nonterminal symbol, as long as this introduces no confusion.) Then for every 1 &lt; i &lt; N, the production probabilities are subject to the constraint that</Paragraph>
    <Paragraph position="2"> We now introduce an algorithm for parsing with stochastic ITGs that computes an optimal parse given a sentence-pair using dynamic programming. In bilingual parsing, just as with ordinary monolinguat parsing, probabilizing the grammar permits ambiguities to be resolved by choosing the maximum-likelihood parse. Our algorithm is similar in spirit to the recognition algorithm for HMMs (Viterbi 1967) and to CYK parsing (Kasami 1965; Younger 1967).</Paragraph>
    <Paragraph position="3"> Let the input English sentence be el ..... eT and the corresponding input Chinese sentence be cl ..... cv. As an abbreviation we write es.t for the sequence of words %+1, es+2 ..... et, and similarly for cu v; also, es s = c is the empty string. It is convenient to use a 4-tuple of the form q = (s, t, u, v) to identify each node of the parse tree, where  Growth in number of all legal subconstituent matchings (complete or partial, meaning that some subconstituents are permitted to remain unmatched as singletons) for context-flee (syntax-directed) transduction grammars with rank r, versus ITGs on a pair of subconstituent sequences of length r each.</Paragraph>
    <Paragraph position="4"> the substrings es..t and C/u..v both derive from the node q. Denote the nonterminal label on q by f(q). Then for any node q = (s, t, u, v), define</Paragraph>
    <Paragraph position="6"> as the maximum probability of any derivation from i that successfully parses both es .t and cu..v. Then the best parse of the sentence pair has probability 60,T,0,v(S).</Paragraph>
    <Paragraph position="7"> The algorithm computes 60,T,0,v(S) using the following recurrences. Note that we generalize argmax to the case where maximization ranges over multiple indices, by making it vector-valued. Also note that \[\] and 0 are simply constants, written mnemonically. The condition (S - s)(t -S) + (U - u)(v - U) ~ 0 is a way to specify  that the substring in one, but not both, languages may be split into an empty string c and the substring itself; this ensures that the recursion terminates, but permits words that have no match in the other language to map to an ~ instead.</Paragraph>
    <Paragraph position="8"> 1. Initialization</Paragraph>
    <Paragraph position="10"/>
    <Paragraph position="12"> Initialize by setting the root of the parse tree to ql = (0, T, 0, V) and its nonterminal label to t(ql) = S. The remaining descendants in the optimal parse tree are then given recursively for any q = (s, t, u, v) by:</Paragraph>
    <Paragraph position="14"> The time complexity of this algorithm in the general case is O(N3T3V3), where N is the number of distinct nonterminals and T and V are the lengths of the two sentences. This is a factor of V 3 more than monolingual chart parsing, but has turned out to remain quite practical for corpus analysis, where parsing need not be real-time.</Paragraph>
  </Section>
  <Section position="9" start_page="389" end_page="389" type="metho">
    <SectionTitle>
6. Translation-driven Segmentation
</SectionTitle>
    <Paragraph position="0"> Segmentation of the input sentences is an important step in preparing bilingual corpora for various learning procedures. Different languages realize the same concept using varying numbers of words; for example, a single English word may surface as a compound in French. This complicates the problem of matching the words between a sentence-pair, since it means that compounds or collocations must sometimes be treated as lexical units. The translation lexicon is assumed to contain collocation translations to facilitate such multiword matchings. However, the input sentences do not come broken into appropriately matching chunks, so it is up to the parser to decide when to break up potential collocations into individual words.</Paragraph>
    <Paragraph position="1"> The problem is particularly acute for English and Chinese because word boundaries are not orthographically marked in Chinese text, so not even a default chunking exists upon which word matchings could be postulated. (Sentences (2) and (5) demonstrate why the obvious trick of taking single characters as words is not a workable strategy.) The usual Chinese NLP architecture first preprocesses input text through a word segmentation module (Chiang et al. 1992; Lin, Chiang, and Su 1992, 1993; Chang and Chen 1993; Wu and Tseng 1993; Sproat et al. 1994; Wu and Fung 1994), but, clearly, bilingual parsing will be hampered by any errors arising from segmentation ambiguities that could not be resolved in the isolated monolingual context because even if the Chinese segmentation is acceptable monolingually, it may not agree with the words present in the English sentence. Matters are made still worse by unpredictable omissions in the translation lexicon, even for valid compounds.</Paragraph>
    <Paragraph position="2"> We therefore extend the algorithm to optimize the Chinese sentence segmentation in conjunction with the bracketing process. Note that the notion of a Chinese &amp;quot;word&amp;quot; is a longstanding linguistic question, that our present notion of segmentation does not address. We adhere here to a purely task-driven definition of what a correct &amp;quot;segmentation&amp;quot; is, namely that longer segments are desirable only when no compositional translation is possible. The algorithm is modified to include the following computations, and remains the same otherwise:</Paragraph>
  </Section>
class="xml-element"></Paper>