<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1045">
  <Title>Improving Word Alignment Quality using Morpho-syntactic Information</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Statistical Alignment Models
</SectionTitle>
    <Paragraph position="0"> The goal of statistical machine translation is to translate an input word sequence f1;:::;fJ in the source language into a target language word sequence e1;:::;eI. Given the source language sequence, we have to choose the target language sequence that maximises the product of the language model probability Pr(eI1) and the translation model probability Pr(fJ1 jeI1). The translation model describes the correspondence between the words in the source and the target sequence whereas the language model describes well-formedness of a produced target sequence.</Paragraph>
    <Paragraph position="1"> The translation model can be rewritten in the following way:</Paragraph>
    <Paragraph position="3"> where aJ1 are called alignments and represent a mapping from the source word position j to the target word position i = aj. Alignments are introduced into translation model as a hidden variable, similar to the concept of Hidden Markov Models (HMM) in speech recognition.</Paragraph>
    <Paragraph position="4"> The translation probability Pr(fJ1 ;aJ1jeI1) can be further rewritten as follows:</Paragraph>
    <Paragraph position="6"> probability.</Paragraph>
    <Paragraph position="7"> In all popular translation models IBM-1 to IBM-5 as well as in HMM translation model, the lexicon probability Pr(fjjfj!11 ;aj1;eI1) is approximated with the simple single-word lexicon probability p(fjjeaj) which takes into account only full forms of the words fj and eaj. The difference between these models is based on the definition of alignment model Pr(ajjfj!11 ;aj!11 ;eI1). Detailed description of those models can be found in (Brown et al., 1993), (Vogel et al., 1996) and (Och and Ney, 2003).</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Hierarchical Representation of the Lexicon Model
</SectionTitle>
    <Paragraph position="0"> Typically, the statistical lexicon model is based only on the full forms of the words and does not have any information about the fact that some different full forms are actually derivations of the same base form. For a highly inflected language like German this can cause problems: the coverage of the lexicon may be low, since the token/type ratio for German is typically much lower than for English (e.g. for Verbmobil: English 99.4, German 56.3).</Paragraph>
    <Paragraph position="1"> To take these interdependencies into account, we use the hierarchical representation of the statistical lexicon model as proposed in (Niessen and Ney, 2001). A constraint grammar parser GERCG for lexical analysis and morphological and syntactic disambiguation for German language is used to obtain morpho-syntactic information. For each German word, this tool provides its base form and the sequence of morpho-syntactic tags, and this information is then added into the original corpus. For example, the German word &amp;quot;gehe&amp;quot; (go), a verb in the indicative mood and present tense which is derived from the base form &amp;quot;gehen&amp;quot; is annotated as &amp;quot;gehe#gehen-V-IND-PRES#gehen&amp;quot;.</Paragraph>
    <Paragraph position="2"> This new representation of the corpus where full word forms are enriched with its base forms and tags enables gradual accessing of information with different levels of abstraction. Consider for example the above mentioned German word &amp;quot;gehe&amp;quot; which can be translated into the English word &amp;quot;go&amp;quot;. Another derivation of the same base form &amp;quot;gehen&amp;quot; is &amp;quot;gehst&amp;quot; which also can be translated by &amp;quot;go&amp;quot;. Existing statistical translation models cannot handle the fact that &amp;quot;gehe&amp;quot; and &amp;quot;gehst&amp;quot; are derivatives of the same base form and both can be translated into the same English word &amp;quot;go&amp;quot;, whereas the hierarchical representation makes it possible to take such interdependencies into account.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 EM Training
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Standard EM training (review)
</SectionTitle>
      <Paragraph position="0"> In this section, we will briefly review the standard EM algorithm for the training of the lexicon model.</Paragraph>
      <Paragraph position="1"> In the E-step the lexical counts are collected over all sentences in the corpus:</Paragraph>
      <Paragraph position="3"> The procedure is similar for the other model parameters, i.e. alignment and fertility probabilities. null For models IBM-1, IBM-2 and HMM, an efficient computation of the sum over all alignments is possible. For the other models, the sum is approximated using an appropriately defined neighbourhood of the Viterbi alignment (see (Och and Ney, 2003) for details).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 EM training using hierarchical counts
</SectionTitle>
      <Paragraph position="0"> In this section we describe the EM training of the lexicon model using so-called hierarchical counts which are collected from the hierarchicaly annotated corpus.</Paragraph>
      <Paragraph position="1"> In the E-step the following types of counts are collected: + full form counts:</Paragraph>
      <Paragraph position="3"> where fbt represents the base form of the word f with sequence of corresponding tags, e.g. &amp;quot;gehen-V-IND-PRES&amp;quot;; + base form counts:</Paragraph>
      <Paragraph position="5"> where fb is the base form of the word f, e.g. &amp;quot;gehen&amp;quot;.</Paragraph>
      <Paragraph position="6"> For each full form, refined hierarchical counts are obtained in the following way:</Paragraph>
      <Paragraph position="8"> and the M-step is then performed using hierarchical counts:</Paragraph>
      <Paragraph position="10"> The training procedure for the other model parameters remains unchanged.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>