File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/95/p95-1033_metho.xml
Size: 18,431 bytes
Last Modified: 2025-10-06 14:14:06
<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1033"> <Title>An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words</Title>
<Section position="5" start_page="244" end_page="244" type="metho"> <SectionTitle> [Det NN] </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="244" end_page="244" type="sub_section"> <SectionTitle> (DetClass NN) </SectionTitle>
<Paragraph position="0"> Before turning to bracketing, we take note of three lemmas for IITGs (proofs omitted): Lemma 1 For any inversion-invariant transduction grammar G, there exists an equivalent inversion-invariant transduction grammar G' where T(G) = T(G'), such that: 1. If ε ∈ L1(G) and ε ∈ L2(G), then G' contains a single production of the form S' → ε/ε, where S' is the start symbol of G' and does not appear on the right-hand side of any production of G'; 2. otherwise G' contains no productions of the form A → ε/ε.</Paragraph>
<Paragraph position="1"> Lemma 2 For any inversion-invariant transduction grammar G, there exists an equivalent inversion-invariant transduction grammar G' where T(G) = T(G'), such that the right-hand side of any production of G' contains either a single terminal-pair or a list of nonterminals.</Paragraph>
<Paragraph position="2"> Lemma 3 For any inversion-invariant transduction grammar G, there exists an equivalent inversion-invariant transduction grammar G' where T(G) = T(G'), such that G' does not contain any productions of the form A → B.</Paragraph> </Section> </Section>
<Section position="6" start_page="244" end_page="248" type="metho"> <SectionTitle> 3 Bracketing Transduction Grammars </SectionTitle>
<Paragraph position="0"> For the remainder of this paper, we focus our attention on pure bracketing. We confine ourselves to bracketing transduction grammars (BTGs), which are IITGs where constituent categories are not differentiated. Aside from the start symbol S, BTGs contain only one nonterminal symbol, A, which rewrites either recursively as a string of A's or as a single terminal-pair. In the former case, the productions have the form A → A^f, where we use A^f to abbreviate A...A and the fanout f denotes the number of A's. Each A corresponds to a level of bracketing and can be thought of as demarcating some unspecified kind of syntactic category. (This same "repetitive expansion" restriction used with standard context-free grammars and transduction grammars yields bracketing grammars without orientation invariance.) A full bracketing transduction grammar of degree f contains A-productions of every fanout between 2 and f, thus allowing constituents of any length up to f. In principle, a full BTG of high degree is preferable, having the greatest flexibility to accommodate arbitrarily long matching sequences. However, the following theorem simplifies our algorithms by allowing us to get away with degree-2 BTGs. Later we will see how postprocessing restores the fanout flexibility (Section 5.2).</Paragraph>
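The two concatenation orientations of a degree-2 BTG can be made concrete with a minimal sketch (Python; the tree encoding and the toy word pairs are illustrative assumptions, not the paper's notation): a [] node concatenates its children in the same order on both output streams, a ⟨⟩ node inverts the order on the second stream, and a singleton leaf contributes a word to only one stream.

# Minimal sketch of reading the string-pair off a degree-2 BTG derivation tree.
# A leaf is a pair (english_word, chinese_word); "" marks a singleton.
# An internal node is ("[]" or "⟨⟩", left_subtree, right_subtree).

def read_off(node):
    """Return (stream1, stream2) word lists generated by a derivation tree."""
    if len(node) == 2:                       # terminal pair x/y
        e, c = node
        return ([e] if e else []), ([c] if c else [])
    op, left, right = node
    e1, c1 = read_off(left)
    e2, c2 = read_off(right)
    if op == "[]":                           # straight: same order on both streams
        return e1 + e2, c1 + c2
    return e1 + e2, c2 + c1                  # inverted: stream 2 order is reversed

# Hypothetical derivation: A → [A A], whose second child is A → ⟨A A⟩.
tree = ("[]", ("the", "c1"), ("⟨⟩", ("authority", "c3"), ("will", "c2")))
print(read_off(tree))    # (['the', 'authority', 'will'], ['c1', 'c2', 'c3'])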
<Paragraph position="1"> Theorem 1 For any full bracketing transduction grammar T, there exists an equivalent bracketing transduction grammar T' in normal form, containing only productions of the form S → ε/ε, A → x/y, A → x/ε, A → ε/y, and A → AA. For proof by induction, we need only show that any full BTG T of degree f > 2 is equivalent to a full BTG T' of degree f - 1. It suffices to show that the production A → A^f can be removed without any loss to the generated language, i.e., that the remaining productions in T' can still derive any string-pair derivable by T (removing a production cannot increase the set of derivable string-pairs). Let (E, C) be any string-pair derivable from A → A^f, where E is output on stream 1 and C on stream 2. Define E^i as the substring of E derived from the ith A of the production, and similarly define C^i. There are two cases depending on the concatenation orientation, but (E, C) is derivable by T' in either case.</Paragraph>
<Paragraph position="2"> In the first case, if the derivation used was A → [A^f], then E = E^1...E^f and C = C^1...C^f. Let (E', C') = (E^1...E^{f-1}, C^1...C^{f-1}). Then (E', C') is derivable from A → [A^{f-1}], and thus (E, C) = (E'E^f, C'C^f) is derivable from A → [A A]. In the second case, the derivation used was A → ⟨A^f⟩, and we still have E = E^1...E^f but now C = C^f...C^1. Now let (E', C') = (E^1...E^{f-1}, C^{f-1}...C^1). Then (E', C') is derivable from A → ⟨A^{f-1}⟩, and thus (E, C) = (E'E^f, C^f C') is derivable from A → ⟨A A⟩.</Paragraph>
<Paragraph position="3"> In a stochastic BTG (SBTG), each rewrite rule has a probability. Let a_f denote the probability of the A-production with fanout degree f. For the remaining (lexical) productions, we use b(x, y) to denote P[A → x/y | A]. The probabilities obey the constraint that Σ_f a_f + Σ_{x,y} b(x, y) = 1.</Paragraph>
<Paragraph position="4"> For our experiments we employed a normal form transduction grammar, so a_f = 0 for all f ≠ 2. The A-productions used were A → [A A], A → ⟨A A⟩, A → x/y for all x, y lexical translations, A → x/ε for all x in the English vocabulary, and A → ε/y for all y in the Chinese vocabulary. The b(x, y) distribution actually encodes the English-Chinese translation lexicon. As discussed below, the lexicon we employed was automatically learned from a parallel corpus, giving us the b(x, y) probabilities directly. The latter two singleton forms permit any word in either sentence to be unmatched. A small ε-constant is chosen for the probabilities b(x, ε) and b(ε, y), so that the optimal bracketing resorts to these productions only when it is otherwise impossible to match words.</Paragraph>
<Paragraph position="5"> With BTGs, to parse means to build matched bracketings for sentence-pairs rather than sentences. This means that the adjacency constraints given by the nested levels must be obeyed in the bracketings of both languages. The result of the parse gives bracketings for both input sentences, as well as a bracket alignment indicating the corresponding brackets between the sentences. The bracket alignment includes a word alignment as a byproduct.</Paragraph>
<Paragraph position="6"> Consider the following sentence pair from our corpus: (6) a. The Authority will be accountable to the Financial Secretary.</Paragraph>
<Paragraph position="7"> b. Ift~l~t'~l~t~t~o Assume we have the productions in Figure 2, which is a fragment excerpted from our actual BTG. Ignoring capitalization, an example of a valid parse that is consistent with our linguistic ideas is:</Paragraph>
<Paragraph position="9"> Figure 3 shows a graphic representation of the same bracketing, where the ⟨⟩ level of bracketing is marked by the horizontal line. The English is read in the usual depth-first left-to-right order, but for the Chinese, a horizontal line means the right subtree is traversed before the left.</Paragraph>
<Paragraph position="10"> The ⟨⟩ notation concisely displays the common structure of the two sentences. However, the bracketing is clearer if we view the sentences monolingually, which allows us to invert the Chinese constituents within the ⟨⟩ so that only [] brackets need to appear.</Paragraph>
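The monolingual views can be read off the same tree. The sketch below (using the same illustrative Python encoding as the earlier sketch; the bracket-printing conventions are assumptions for exposition, not the paper's output format) swaps the children of ⟨⟩ nodes on the Chinese side and drops the extra brackets that singletons in the other language would otherwise create, anticipating the monolingual view discussed just below.

# Sketch: one language's bracketing, with Chinese constituents inverted under ⟨⟩
# and brackets that would surround a lone constituent (singleton residue) dropped.

def monolingual(node, side):
    """side 0 = English (stream 1), side 1 = Chinese (stream 2)."""
    if len(node) == 2:                        # terminal pair; "" if unmatched here
        return node[side]
    op, left, right = node
    if side == 1 and op == "⟨⟩":              # invert Chinese constituents
        left, right = right, left
    parts = [p for p in (monolingual(left, side), monolingual(right, side)) if p]
    if len(parts) == 2:
        return "[ " + " ".join(parts) + " ]"
    return parts[0] if parts else ""          # discard the bracket around a singleton

tree = ("[]", ("the", ""), ("⟨⟩", ("authority", "c2"), ("will", "c1")))
print(monolingual(tree, 0))    # [ the [ authority will ] ]
print(monolingual(tree, 1))    # [ c1 c2 ]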
<Paragraph position="11"> (8) a. [[[ The Authority ] [ will [[ be accountable ] [ to [ the [[ Financial Secretary ]]]]]]] . ]</Paragraph>
<Paragraph position="13"> In the monolingual view, extra brackets appear in one language whenever there is a singleton in the other language.</Paragraph>
<Paragraph position="14"> If the goal is just to obtain bracketings for monolingual sentences, the extra brackets can be discarded after parsing:</Paragraph>
<Paragraph position="16"> The basis of the bracketing strategy can be seen as choosing the bracketing that maximizes the (probabilistically weighted) number of words matched, subject to the BTG representational constraint, which has the effect of limiting the possible crossing patterns in the word alignment. A simpler, related idea of penalizing distortion from some ideal matching pattern can be found in the statistical translation (Brown et al. 1990; Brown et al. 1993) and word alignment (Dagan et al. 1993; Dagan & Church 1994) models. Unlike these models, however, the BTG aims to model constituent structure when determining distortion penalties. In particular, crossings that are consistent with the constituent tree structure are not penalized. The implicit assumption is that core arguments of frames remain similar across languages, and that core arguments of the same frame will surface adjacently. The accuracy of the method on a particular language pair will therefore depend upon the extent to which this language universals hypothesis holds.</Paragraph>
<Paragraph position="17"> However, the approach is robust because if the assumption is violated, damage will be limited to dropping the fewest possible crossed word matchings.</Paragraph>
<Paragraph position="18"> We now describe how a dynamic-programming parser can compute an optimal bracketing given a sentence-pair and a stochastic BTG. In bilingual parsing, just as with ordinary monolingual parsing, probabilizing the grammar permits ambiguities to be resolved by choosing the maximum likelihood parse. Our algorithm is similar in spirit to the recognition algorithm for HMMs (Viterbi 1967).</Paragraph>
<Paragraph position="19"> Denote the input English sentence by e_1, ..., e_T and the corresponding input Chinese sentence by c_1, ..., c_V.</Paragraph>
<Paragraph position="20"> As an abbreviation, we write e_{s..t} for the sequence of words e_{s+1}, e_{s+2}, ..., e_t, and similarly for c_{u..v}. Let δ_{stuv} = max P[e_{s..t}/c_{u..v}] be the maximum probability of any derivation from A that successfully parses both substrings e_{s..t} and c_{u..v}. The best parse of the sentence pair is that with probability δ_{0,T,0,V}.</Paragraph>
<Paragraph position="21"> The algorithm computes δ_{0,T,0,V} following the recurrences below.2 The time complexity of this algorithm is O(T³V³), where T and V are the lengths of the two sentences.</Paragraph>
<Paragraph position="22"> To reconstruct the optimal parse tree, initially set q_1 = (0, T, 0, V) to be the root. The remaining descendants in the optimal parse tree are then given recursively for any q = (s, t, u, v) by LEFT(q) and RIGHT(q), with one case for θ_{stuv} = [] and one for θ_{stuv} = ⟨⟩, where θ_{stuv} records the orientation chosen for the span.</Paragraph>
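A simplified sketch of the dynamic program (Python; the names delta, a2, and b are assumptions for illustration) may help fix ideas: it fills a table of bilingual spans in order of increasing total length, combining sub-spans in straight or inverted orientation, and it ignores the extensions of Section 4 as well as the argmax bookkeeping needed to rebuild the parse tree.

# Sketch of the O(T^3 V^3) Viterbi-style DP for a degree-2 SBTG (illustrative only).
# b(x, y) is the lexical translation probability; EPS marks a singleton.
# a2 is the probability of the fanout-2 A-production (shared here by both orientations).

EPS = ""

def best_score(e, c, b, a2):
    """delta[s][t][u][v]: best probability of deriving e[s:t] / c[u:v] from A."""
    T, V = len(e), len(c)
    delta = [[[[0.0] * (V + 1) for _ in range(V + 1)]
              for _ in range(T + 1)] for _ in range(T + 1)]
    for s in range(T):                               # single-word couples
        for u in range(V):
            delta[s][s + 1][u][u + 1] = b(e[s], c[u])
    for s in range(T):                               # English singletons
        for u in range(V + 1):
            delta[s][s + 1][u][u] = b(e[s], EPS)
    for u in range(V):                               # Chinese singletons
        for s in range(T + 1):
            delta[s][s][u][u + 1] = b(EPS, c[u])
    for total in range(2, T + V + 1):                # larger spans, shorter ones first
        for s in range(T + 1):
            for t in range(s, T + 1):
                for u in range(V + 1):
                    v = u + total - (t - s)
                    if v > V or u > v:
                        continue
                    best = delta[s][t][u][v]
                    for S in range(s, t + 1):        # split points on both streams
                        for U in range(u, v + 1):
                            straight = a2 * delta[s][S][u][U] * delta[S][t][U][v]
                            inverted = a2 * delta[s][S][U][v] * delta[S][t][u][U]
                            best = max(best, straight, inverted)
                    delta[s][t][u][v] = best
    return delta[0][T][0][V]

In a full implementation, argmax versions of the two inner maxima would also record θ_{stuv} and the best split point, which is what the reconstruction step above walks back through.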
<Paragraph position="24"> Several additional extensions to this algorithm were found to be useful, and are briefly described below. Details are given in Wu (1995).</Paragraph>
<Paragraph position="25"> 2 We are generalizing argmax so as to allow arg to specify the index of interest.</Paragraph>
<Section position="1" start_page="247" end_page="247" type="sub_section"> <SectionTitle> 4.1 Simultaneous segmentation </SectionTitle>
<Paragraph position="0"> We often find the same concept realized using different numbers of words in the two languages, creating potential difficulties for word alignment; what is a single word in English may be realized as a compound in Chinese. Since Chinese text is not orthographically separated into words, the standard methodology is to first preprocess input texts through a segmentation module (Chiang et al. 1992; Lin et al. 1992; Chang & Chen 1993; Lin et al. 1993; Wu & Tseng 1993; Sproat et al. 1994). However, this seriously degrades our algorithm's performance, since the segmenter may encounter ambiguities that are unresolvable monolingually and thereby introduce errors.</Paragraph>
<Paragraph position="1"> Even if the Chinese segmentation is acceptable monolingually, it may not agree with the division of words present in the English sentence. Moreover, conventional compounds are frequently and unpredictably missing from translation lexicons, and this can further degrade performance. To avoid such problems we have extended the algorithm to optimize the segmentation of the Chinese sentence in parallel with the bracketing process. Note that this treatment of segmentation does not attempt to address the open linguistic question of what constitutes a Chinese "word". Our definition of a correct "segmentation" is purely task-driven: longer segments are desirable if and only if no compositional translation is possible.</Paragraph> </Section>
<Section position="2" start_page="247" end_page="247" type="sub_section"> <SectionTitle> 4.2 Pre/post-positional biases </SectionTitle>
<Paragraph position="0"> Many of the bracketing errors are caused by singletons.</Paragraph>
<Paragraph position="1"> With singletons, there is no cross-lingual discrimination to increase the certainty between alternative bracketings.</Paragraph>
<Paragraph position="2"> A heuristic to deal with this is to specify, for each of the two languages, whether prepositions or postpositions are more common, where "preposition" here is meant not in the usual part-of-speech sense, but rather in a broad sense of the tendency of function words to attach left or right. This simple stratagem is effective because the majority of unmatched singletons are function words that have no counterparts in the other language. This observation holds assuming that the translation lexicon's coverage is reasonably good. For both English and Chinese, we specify a prepositional bias, which means that singletons are attached to the right whenever possible.</Paragraph> </Section>
<Section position="3" start_page="247" end_page="248" type="sub_section"> <SectionTitle> 4.3 Punctuation constraints </SectionTitle>
<Paragraph position="0"> Certain punctuation characters give strong constituency indications with high reliability. "Perfect separators", which include colons and Chinese full stops, and "perfect delimiters", which include parentheses and quotation marks, can be used as bracketing constraints. We have extended the algorithm to preclude hypotheses that are inconsistent with such constraints, by initializing those entries in the DP table corresponding to illegal sub-hypotheses with zero probabilities. These entries are blocked from recomputation during the DP phase. As their probabilities always remain zero, the illegal bracketings can never participate in any optimal bracketing.</Paragraph>
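A sketch of how such constraints can be imposed on the DP table (Python; the span convention and the helper name are assumptions for illustration, and only stream-1 separators are shown for brevity):

# Sketch: spans that put material on both sides of a "perfect separator" are
# marked illegal; the corresponding delta entries are pinned to zero and the
# DP loop simply skips recomputing them.

def illegal_spans(tokens, separators):
    """Return the set of (s, t) spans of tokens[s:t] that cross a separator."""
    sep_positions = [k for k, w in enumerate(tokens) if w in separators]
    bad = set()
    n = len(tokens)
    for s in range(n + 1):
        for t in range(s, n + 1):
            if any(k > s and t > k + 1 for k in sep_positions):
                bad.add((s, t))
    return bad

english = ["the", "plan", ":", "a", "new", "authority"]
print(illegal_spans(english, {":"}))
# e.g. (1, 4) is illegal: "plan : a" spans both sides of the colon.
# During the DP, delta[s][t][u][v] is left at 0.0 for every (s, t) in this set.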
</Section> </Section>
<Section position="7" start_page="248" end_page="249" type="metho"> <SectionTitle> 5 Postprocessing </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="248" end_page="248" type="sub_section"> <SectionTitle> 5.1 A Singleton-Rebalancing Algorithm </SectionTitle>
<Paragraph position="0"> We now introduce an algorithm for further improving the bracketing accuracy in cases of singletons. Consider the following bracketing produced by the algorithm of the previous section:</Paragraph>
<Paragraph position="2"> The prepositional bias has already correctly restricted the singleton "The/ε" to attach to the right, but of course "The" does not belong outside the rest of the sentence, but rather with "Authority". The problem is that singletons have no discriminative power between alternative bracket matchings; they only contribute to the ambiguity. However, we can minimize the impact by moving singletons as deep as possible, closer to the individual word they precede or succeed, by widening the scope of the brackets immediately following the singleton. In general this improves precision, since wide-scope brackets are less constraining.</Paragraph>
<Paragraph position="3"> The algorithm employs a rebalancing strategy reminiscent of balanced-tree structures using left and right rotations. A left rotation changes a (A(BC)) structure to a ((AB)C) structure, and vice versa for a right rotation.</Paragraph>
<Paragraph position="4"> The task is complicated by the presence of both [] and ⟨⟩ brackets with both L1- and L2-singletons, since each combination presents different interactions. To be legal, a rotation must preserve symbol order on both output streams. However, the following lemma shows that any subtree can always be rebalanced at its root if either of its children is a singleton of either language.</Paragraph>
<Paragraph position="5"> Lemma 4 Let x be an L1 singleton, y be an L2 singleton, and A, B, C be arbitrary constituent subtrees. Then the following properties hold for the [] and ⟨⟩ operators:</Paragraph>
<Paragraph position="7"> The method of Figure 4 modifies the input tree to attach singletons as closely as possible to couples, but remains consistent with the input tree in the following sense: singletons cannot "escape" their immediately surrounding brackets. The key is that for any given subtree, if the outermost bracket involves a singleton that should be rotated into a subtree, then exactly one of the singleton rotation properties will apply. The method proceeds depth-first, sinking each singleton as deeply as possible.</Paragraph>
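A minimal sketch of the depth-first sinking idea (Python; the tree encoding, helper names, and the search over candidate rotations are illustrative assumptions; the paper's Figure 4 applies the specific rotation properties of Lemma 4 directly, whereas this sketch simply tries the possible rebracketings and keeps one that leaves both output streams unchanged):

# Sketch of singleton sinking by order-preserving rotations.
# Leaves are (english, chinese) pairs with "" marking a singleton;
# internal nodes are (op, left, right) with op in {"[]", "⟨⟩"}.

STRAIGHT, INVERTED = "[]", "⟨⟩"

def yields(node):
    """Word lists produced on the two output streams."""
    if len(node) == 2:
        e, c = node
        return ([e] if e else []), ([c] if c else [])
    op, l, r = node
    e1, c1 = yields(l)
    e2, c2 = yields(r)
    return (e1 + e2, c1 + c2) if op == STRAIGHT else (e1 + e2, c2 + c1)

def is_leaf(node):
    return len(node) == 2

def is_singleton(node):
    return is_leaf(node) and "" in node

def candidates(sing, other):
    """All ways of pushing a singleton one level down into other = (op, A, B)."""
    _, A, B = other
    for o1 in (STRAIGHT, INVERTED):
        for o2 in (STRAIGHT, INVERTED):
            yield (o1, (o2, sing, A), B)
            yield (o1, (o2, A, sing), B)
            yield (o1, A, (o2, sing, B))
            yield (o1, A, (o2, B, sing))

def sink(node):
    """Depth-first: rotate singleton children into their siblings whenever the
    rotation preserves the word order of both output streams."""
    if is_leaf(node):
        return node
    op, l, r = node
    l, r = sink(l), sink(r)
    node = (op, l, r)
    for sing, other in ((l, r), (r, l)):
        if is_singleton(sing) and not is_leaf(other):
            for cand in candidates(sing, other):
                if yields(cand) == yields(node):
                    o1, a, b = cand
                    return (o1, sink(a), sink(b))
    return node

# "The" attaches outside in the input but is sunk next to "authority":
tree = ("[]", ("the", ""), ("[]", ("authority", "c1"), ("will", "c2")))
print(sink(tree))    # ('[]', ('[]', ('the', ''), ('authority', 'c1')), ('will', 'c2'))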
<Paragraph position="8"> For example, after rebalancing, sentence (10) is bracketed as follows:</Paragraph> </Section>
<Section position="2" start_page="248" end_page="249" type="sub_section"> <SectionTitle> 5.2 Flattening the Bracketing </SectionTitle>
<Paragraph position="0"> Because the BTG is in normal form, each bracket can only hold two constituents. This improves parsing efficiency, but requires overcommitment, since the algorithm is always forced to choose between (A(BC)) and ((AB)C) structures even when no choice is clearly better. In the worst case, both sentences might have perfectly aligned words, lending no discriminative leverage whatsoever to the bracketer. This leaves a very large number of choices: if both sentences are of the same length l, there are a Catalan number of possible bracketings with fanout 2, none of which is better justified than any other. Thus to improve accuracy, we should reduce the specificity of the bracketing's commitment in such cases.</Paragraph>
<Paragraph position="1"> We implement this with another postprocessing stage.</Paragraph>
<Paragraph position="2"> The algorithm proceeds bottom-up, eliminating as many brackets as possible, by making use of the associativity equivalences [ABC] = [A[BC]] = [[AB]C] and ⟨ABC⟩ = ⟨A⟨BC⟩⟩ = ⟨⟨AB⟩C⟩. The directionality and flipping commutativity equivalences (see Lemma 4) are also applied, whenever they render the associativity equivalences applicable.</Paragraph>
<Paragraph position="5"> The final result after flattening sentence (11) is as follows:</Paragraph> </Section> </Section> </Paper>