<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0303">
  <Title>Word Alignment Based on Bilingual Bracketing</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Bilingual Bracketing
</SectionTitle>
    <Paragraph position="0"> In [Wu 1997], the Bilingual Bracketing PCFG was introduced, which can be simplified as the following production rules:  A ! [AA] (1) A ! &lt; AA &gt; (2) A ! f=e (3) A ! f=null (4) A ! null=e (5)  Where f and e are words in the target vocabulary Vf and source vocabulary Ve respectively. A is the alignment of texts. There are two operators for bracketing: direct bracketing denoted by [ ], and inverse bracketing, denoted by &lt;&gt;. The A-productions are divided into two classes: syntactic f(1),(2)gand lexical rules f(3),(4),(5)g. Each A-production rule has a probability. In our algorithm, we use the same PCFG. However, instead of estimating the probabilities for the production rules via EM as described in [Wu 1997], we assign the probabilities to the rules using the Model-1 statistical translation lexicon [Brown et al. 1993].</Paragraph>
    <Paragraph position="1"> Because the syntactic A-production rules do not compete with the lexical rules, we can set them some default values. Also we make no assumptions which bracketing direction is more likely to occur, thus the probabilities for [ ] and &lt;&gt; are set to be equal. As for the lexical rules, we experimented with the conditional probabilities p(ejf), p(fje) and the interpolation of p(fje;epos) and p(fje) (described in section 4.1). As for these probabilities of aligning a word to the null word or to unknown words, they are set to be 1e-7, which is the default small value used in training Model-1.</Paragraph>
    <Paragraph position="2"> The word alignment can then be done via maximizing the likelihood of matched words subject to the bracketing grammar using dynamic programming.</Paragraph>
    <Paragraph position="3"> The result of the parsing gives bracketing for both input sentences as well as bracket alignments indicating the corresponding brackets between the sentence pairs. The bracket alignment includes a word alignment as a byproduct. One example for French-English (the test set sentence pair #18) is shown as below:</Paragraph>
    <Paragraph position="5"/>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Boosting Strategy of Model-1 Lexicon
</SectionTitle>
    <Paragraph position="0"> The probabilities for the lexical rules are Model-1 conditional probabilities p(fje), which can be estimated using available toolkits such as [Franz 2000].</Paragraph>
    <Paragraph position="1"> This strategy is a three-pass training of Model-1, which was shown to be effective in our Chinese-English alignment experiments. The first two passes are carried out to get Viterbi word alignments based on Model-1's parameters in both directions: from source to target and then vice versa. An intersection of the two Viterbi word alignments is then calculated. The highly frequent word-pairs in the intersection set are considered to be important samples supporting the alignment of that word-pair. This approach, which is similar to importance sampling, can be summarized as follows: Denote a sample as a co-occurred word-pair as x = (ei;fj) with its observed frequency: C(x) = freq(ei;fj); Denote I(x) = freq(ei;fj) as the frequency of that word-pair x observed in the intersection of the two Viterbi alignments.</Paragraph>
    <Paragraph position="2"> + Build I(x) = freq(ei;fj) from the intersection of alignments in two directions.</Paragraph>
    <Paragraph position="3">  + Generate x = (ei;fj) and its C(x) = freq(ei;fj) observed from a given parallel corpus; + Generate random variable u from uniform [0,1] distribution independent of x; + If I(x)MC/C(x) , u, then accept x, where M is a finite known constant M &gt; 0; + Re-weight sample x: Cb(x) = C(x)/(1+&amp;quot;);&amp;quot; &gt; 0)  The modified counts (weighted samples) are renormalized to get a proper probability distribution, which is used in the next iteration of EM training. The constant M is a threshold to remove the potential noise from the intersection set. M's value is related to the size of the training corpus, the larger its size, the larger M should be. &amp;quot; is chosen as a small positive value. The overall idea is to collect those word-pairs which are reliable and give an additional pseudo count to them.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Incorporating English Grammatical
Constraints
</SectionTitle>
    <Paragraph position="0"> There are several POS taggers, base noun phrase detectors and parsers available for English. Both the shallow and full parsing information of English sentences can be used as constraints in Bilingual Bracketing. Here, we explored utilizing English POS and English base noun phrase boundaries.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Incorporating English POS
</SectionTitle>
      <Paragraph position="0"> The correctly aligned words from two languages are very likely to have the same POS. For example, a Chinese noun is very likely to be aligned with a English noun.</Paragraph>
      <Paragraph position="1"> While the English POS tagging is often reliable and accurate, the POS tagging for other languages is usually not easily acquired nor accurate enough. Modelling only the English POS in word alignment is usually a practical way.</Paragraph>
      <Paragraph position="2"> Given POS information for only the English side, we can discriminate English words and thus disambiguate the translation lexicon. We tagged each English word in the parallel corpus, so that each English word is associated with its POS denoted as epos. The English word and its POS were concatenated into one pseudo word. For example: beginning/NN and beginning/VBG are two pseudo words which occurred in our training corpus. Then the Model-1 training was carried out on this concatenated parallel corpus to get estimations of p(fje;epos).</Paragraph>
      <Paragraph position="3"> One potential problem is the estimation of p(fje;epos).</Paragraph>
      <Paragraph position="4"> When we concatenated the word with its POS, we implicitly increased the vocabulary size. For example, for French-English training set, the English vocabulary increased from 57703 to 65549. This may not cause a problem when the training data's size is large. But for small parallel corpora, some correct word-pair's p(fje;epos) will be underestimated due to the sparse data, and some word-pairs become unknown in p(fje;epos). So in our system, we actually interpolated p(fje;epos) with p(fje) as a mixture model for robustness:</Paragraph>
      <Paragraph position="6"> Where , can be estimated by EM for this two-mixture model on the training data, or a grid search via crossvalidation. null</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Incorporating English Base Noun Boundaries
</SectionTitle>
      <Paragraph position="0"> The English sentence is bracketed according to the syntactic A-production rules. This bracketing can break an English noun phrase into separated pieces, which are not in accordance with results from standard base noun phrase detectors. Though the word-alignments may still be correct, but for the phrase level alignment, it is not desired.</Paragraph>
      <Paragraph position="1"> One solution is to constrain the syntactic A-production rules to penalize bracketing English noun phrases into separated pieces. The phrase boundaries can be obtained by using a base noun phrase detection toolkit [Ramshaw 1995], and the boundaries are loaded into the bracketing program. During the dynamic programming, before applying a syntactic A-production rule, the program checks if the brackets defined by the syntactic rule violate the noun phrase boundaries. If so, an additional penalty is attached to this rule.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>