<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1059">
  <Title>Language Model Adaptation for Statistical Machine Translation with Structured Query Models</Title>
  <Section position="4" start_page="0" end_page="11" type="metho">
    <SectionTitle>
N-Best List Query Model: Q_TN
</SectionTitle>
    <Paragraph position="0"> Q has several good characteristics: First it contains translation candidates, and thus is more informative than</Paragraph>
    <Paragraph position="2"> translated words usually occur in every hypothesis in the n-best list, therefore have a stronger impact on the retrieval result due to the higher term frequency (tf) in the query. Thirdly, most of the hypotheses are only different from each other in one word or two. This means, there is not so much noise and variance introduced in this query model.</Paragraph>
    <Paragraph position="3">  To fully leverage the available knowledge from the translation system, the translation model can be used to guide the language model adaptation process. As introduced in section 1, the translation model represents the full knowledge of translating words, as it encodes all possible translations candidates for a given source sentence. Thus the query model based on the translation model, has potential advantages over both</Paragraph>
    <Paragraph position="5"> To utilize the translation model, all the n-grams from the source sentence are extracted, and the corresponding candidate translations are collected from the translation model. These are then converted into a bag-of-words representation as follows:</Paragraph>
    <Paragraph position="7"> is a source n-gram, and I is the number of n-grams in the source sentence.</Paragraph>
    <Paragraph position="9"> is a candidate target word as translation of</Paragraph>
    <Paragraph position="11"> . Thus the translation model is converted into a collection of target words as a bag-of-word query model. There is no decoding process involved to build  is subject to more noise.</Paragraph>
    <Paragraph position="12"> 3 Structured Query Models Word proximity and word order is closely related  to syntactic and semantic characteristics. However, it is not modeled in the query models presented so far, which are simple bag-of-words representations. Incorporating syntactic and semantic information into the query models can potentially improve the effectiveness of LM adaptation.</Paragraph>
    <Paragraph position="13"> The word-proximity and word ordering information can be easily extracted from the first-best hypothesis, the n-best hypothesis list, and the translation lattice built from the translation model. After extraction of the information, structured query models are proposed using the structured query language, described in the Section 3.1.</Paragraph>
    <Section position="1" start_page="0" end_page="11" type="sub_section">
      <SectionTitle>
3.1 Structured Query Language
</SectionTitle>
      <Paragraph position="0"> This query language essentially enables the use of proximity operators (ordered and unordered windows) in queries, so that it is possible to model the syntactic and semantic information encoded in phrases, n-grams, and co-occurred word pairs.</Paragraph>
      <Paragraph position="1"> The InQuery implementation (Lemur 2003) is applied. So far 16 operators are defined in InQuery to model word proximity (ordered, unordered, phrase level, and passage level). Four of these operators are used specially for our language model adaptation:</Paragraph>
      <Paragraph position="3"> ) are treated as having equal influence on the final retrieval result. The belief values provided by the arguments of the sum are averaged to produce the belief value of the #sum node.</Paragraph>
      <Paragraph position="4"> Weighted Sum Operator: #wsum(</Paragraph>
      <Paragraph position="6"> unequally to the final result according to the</Paragraph>
      <Paragraph position="8"> The terms must be found within N words of each other in the text in order to contribute to the document's belief value. An n-gram phrase can be modeled as an ordered distance operator with N=n.</Paragraph>
      <Paragraph position="9"> Unordered Distance Operator: #uwN(</Paragraph>
      <Paragraph position="11"> The terms contained must be found in any order within a window of N words in order for this operator to contribute to the belief value of the document.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
3.2 Structured Query Models
</SectionTitle>
      <Paragraph position="0"> Given the representation power of the structured query language, the Top-1 hypothesis, Top-N Best hypothesis list, and the translation lattice can be converted into three Structured Query Models respectively.</Paragraph>
      <Paragraph position="1"> For first-best and n-best hypotheses, we collect related target n-grams of a given source word according to the alignments generated in the Viterbi decoding process. While for the translation lattice, similar to the construction of</Paragraph>
      <Paragraph position="3"> collect all the source n-grams, and translate them into target n-grams. In either case, we get a set of target n-grams for each source word. The structured query model for the whole source sentence is a collection of such subsets of target ngrams. null</Paragraph>
      <Paragraph position="5"> In our experiments, we consider up to trigram for better retrieval efficiency, but higher order n-grams could be used as will. The second simplification is that every source word is equally important, thus each n-gram subset</Paragraph>
      <Paragraph position="7"> will have an equal contribution to the final retrieval results. The last simplification is each n-gram within the set of</Paragraph>
      <Paragraph position="9"> has an equal weight, i.e. we do not use the translation probabilities of the translation model.</Paragraph>
      <Paragraph position="10"> If the system is a phrase-based translation system, we can encode the phrases using the ordered distance operator (#N) with N equals to the number of the words of that phrase, which is denoted as the #phrase operator in InQuery implementation. The 2-grams and 3-grams can be encoded using this operator too.</Paragraph>
      <Paragraph position="11"> Thus our final structured query model is a sum operator over a set of nodes. Each node corresponds to a source word. Usually each source word has a number of translation candidates (unigrams or phrases). Each node is a weighted sum over all translation candidates weighted by their frequency in the hypothesis set. An example is shown below, where #phrase indicates the use of the ordered distance operator with varying n:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> Experiments are carried out on a standard statistical machine translation task defined in the NIST evaluation in June 2002. There are 878 test sentences in Chinese, and each sentence has four human translations as references. NIST score (NIST 2002) and Bleu score (Papineni et. al. 2002) of mteval version 9 are reported to evaluate the translation quality.</Paragraph>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.1 Baseline Translation System
</SectionTitle>
      <Paragraph position="0"> Our baseline system (Vogel et al., 2003) gives scores of 7.80 NIST and 0.1952 Bleu for Top-1 hypothesis, which is comparable to the best results reported on this task.</Paragraph>
      <Paragraph position="1"> For the baseline system, we built a translation model using 284K parallel sentence pairs, and a trigram language model from a 160 million words general English news text collection. This LM is the background model to be adapted.</Paragraph>
      <Paragraph position="2"> With the baseline system, the n-best hypotheses list and the translation lattice are extracted to build the query models. Experiments are carried out on the adapted language model using the three bag-of-words query models:</Paragraph>
      <Paragraph position="4"> corresponding structured query models.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.2 Data: GigaWord Corpora
</SectionTitle>
      <Paragraph position="0"> The so-called GigaWord corpora (LDC, 2003) are very large English news text collections. There are four distinct international sources of English  As the Lemur toolkit could not handle the two large corpora (APW and NYT) we used only 200 million words from each of these two corpora. In the preprocessing all words are lowercased and punctuation is separated. There is no explicit removal of stop words as they usually fade out by tf.idf weights, and our experiments showed not positive effects when removing stop words.</Paragraph>
    </Section>
    <Section position="3" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.3 Bag-of-Words Query Models
</SectionTitle>
      <Paragraph position="0"> Table-2 shows the size of</Paragraph>
      <Paragraph position="2"> Table-2: Query size in number of tokens As words occurring several times are reduced to word-frequency pairs, the size of the queries generated from the 100-best translation lists is only 9 times as big as the queries generated from the first-best translations. The queries generated from the translation model contain many more translation alternatives, summing up to almost 3.4 million tokens. Using the lattices the whole information of the translation model is kept.</Paragraph>
      <Paragraph position="4"> In the first experiment we used the first-best translations to generate the queries. For each of the 4 corpora different numbers of similar sentences (1, 10, 100, and 1000) were retrieved to build specific language models. Figure-2 shows the language model adaptation after tuning the interpolation factor l by a grid search over [0,1].</Paragraph>
      <Paragraph position="5">  We see that each corpus gives an improvement over the baseline. The best NIST score is 7.94, and the best Bleu score is 0.2018. Both best scores are realized using top 100 relevant sentences corpus per source sentence mined from the AFE.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4.3.2 Results for Query Q_TN
</SectionTitle>
    <Paragraph position="0"> Using the translation alternatives to retrieve the data for language model adaptation gives an improvement over using the first-best translation only for query construction. Using only one translation hypothesis to build an adapted language model has the tendency to reinforce that translation.</Paragraph>
  </Section>
  <Section position="7" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4.3.3 Results for Query Q_TM
</SectionTitle>
    <Paragraph position="0"> The third bag-of-words query model uses all translation alternatives for source words and source phrases. Figure-4 shows the results of this query model</Paragraph>
  </Section>
  <Section position="8" start_page="11" end_page="11" type="metho">
    <SectionTitle>
TM
</SectionTitle>
    <Paragraph position="0"> Q . The best results are 7.91 NIST score and 0.1995 Bleu. For this query model best results were achieved using the top 1000 relevant sentences mined from the AFE corpus per source sentence.</Paragraph>
    <Paragraph position="1"> The improvement is not as much as the other two query models. The reason is probably that all translation alternatives, even wrong translations resulting from errors in the word and phrase alignment, contribute alike to retrieve similar sentences. Thereby, an adapted language model is built, which reinforces not only good translations, but also bad translations.</Paragraph>
    <Paragraph position="2"> All the three query models showed improvements over the baseline system in terms of NIST and Bleu scores. The best bag-of-words query model is</Paragraph>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.4 Structured Query Models
</SectionTitle>
      <Paragraph position="0"> The next series of experiments was done to study if using word order information in constructing the queries could help to generate more effective adapted language models. By using the structured query language we converted the same first-best hypothesis, the 100-best list, and the translation lattice into structured query models. Results are reported for the AFE corpus only, as this corpus gave best translation scores.</Paragraph>
      <Paragraph position="1"> Figure-5 shows the results for all three structured query models, built from the first-best hypothesis (&amp;quot;1-Best&amp;quot;), the 100 best hypotheses list (&amp;quot;100-Best&amp;quot;), and translation lattice (&amp;quot;TM-Lattice&amp;quot;). Using these query models, different numbers of most similar sentences, ranging from 100 to 4000, where retrieved from the AFE corpus. The given baseline results are the best results achieved from the corresponding bag-of-words query models.</Paragraph>
      <Paragraph position="2"> Consistent improvements were observed on NIST and Bleu scores. Again, optimal interpolation factors to interpolate the specific language models with the background language model were used, which typically were in the range of [0.6, 0.7]. Structured query models give most improvements when using more sentences for language model adaptation. The effect is more pronounced for Bleu then for NIST score.</Paragraph>
      <Paragraph position="3">  structured query models The really interesting result is that the structured query model</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>