<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1040">
  <Title>Feature-Rich Statistical Translation of Noun Phrases</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Noun Phrase Translation as a Subtask
</SectionTitle>
    <Paragraph position="0"> In this work, we consider both noun phrases and prepositional phrases, which we will refer to as NP/PPs. We include prepositional phrases for a number of reasons. Both are attached at the clause level. Also, the translation of the preposition often depends heavily on the noun phrase (in the morning). Moreover, the distinction between noun phrases and prepositional phrases is not always clear (note the Japanese bunsetsu) or hard to separate (German joining of preposition and determiner into one lexical unit, e.g., ins a0 in das a1 in the).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Definition
</SectionTitle>
      <Paragraph position="0"> We define the NP/PPs in a sentence as follows: Given a sentence a2 and its syntactic parse tree a3 , the NP/PPs of the sentence a2 are the subtrees a3a5a4 that contain at least one noun and no verb, and are not part of a larger subtree that contains no verb.</Paragraph>
      <Paragraph position="1">  The NP/PPs are the maximal noun phrases of the sentence, not just the base NPs. This definition excludes NP/PPs that consist of only a pronoun. It also excludes noun phrases that contain relative clauses. NP/PPs may have connectives such as and.</Paragraph>
      <Paragraph position="2"> For an illustration, see Figure 1.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Translation of NP/PPs
</SectionTitle>
      <Paragraph position="0"> To understand the behavior of noun phrases in the translation process, we carried out a study to examine how they are translated in a typical parallel corpus. Clearly, we cannot simply expect that certain syntactic types in one language translate to equivalent types in another language. Equivalent types might not even exist.</Paragraph>
      <Paragraph position="1"> This study answers the questions: a0 Do human translators translate noun phrases in foreign texts into noun phrases in English? a0 If all noun phrases in a foreign text are translated into noun phrases in English, is an acceptable sentence translation possible? a0 What are the properties of noun phrases which cannot be translated as noun phrases without rendering the overall sentence translation unacceptable? null Using the Europarl corpus1, we consider a translation task from German to English. We marked the NP/PPs in the German side of a small 100 sentence parallel corpus manually. This yielded 168 NP/PPs according to our definition.</Paragraph>
      <Paragraph position="2"> We examined if these units are realized as noun phrases in the English side of the parallel corpus. This is the case for 75% of the NP/PPs.</Paragraph>
      <Paragraph position="3"> Second, we tried to construct translations of these NP/PPs that take the form of NP/PPs in English in an overall acceptable translation of the sentence. We could do this for 98% of the NP/PPs.</Paragraph>
      <Paragraph position="4"> The four exceptions are: a0 in Anspruch genommen; Gloss: take in demand a0 Abschied nehmen; take good-bye a0 meine Zustimmung geben; give my agreement a0 in der Hauptsache; in the main-thing The first three cases are noun phrases or prepositional phrases that merge with the verb. This is similar to the English construction make an observation, which translates best into some languages as a verb equivalent to observe. The fourth example, literally translated as in the main thing, is best translated as mainly.</Paragraph>
      <Paragraph position="5">  Why is there such a considerable discrepancy between the number of noun phrases that can be translated as noun phrases into English and noun phrases that are translated as noun phrases? The main reason is that translators generally try to translate the meaning of a sentence, and do not feel bound to preserve the same syntactic structure. This leads them to sometimes arbitrarily restructure the sentence. Also, occasionally the translations are sloppy.</Paragraph>
      <Paragraph position="6"> The conclusion of this study is: Most NP/PPs in German are translated to English as NP/PPs. Nearly all of them, 98%, can be translated as NP/PPs into English. The exceptions to this rule should be treated as special cases and handled separately.</Paragraph>
      <Paragraph position="7"> We carried out studies for Chinese-English and Portuguese-English NP/PPs with similar results.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 The Role of External Context
</SectionTitle>
      <Paragraph position="0"> One interesting question is if external context is necessary for the translation of noun phrases. While the sentence and document context may be available to the NP/PP subsystem, the English output is only assembled later and therefore harder to integrate.</Paragraph>
      <Paragraph position="1"> To address this issue, we carried out a manual experiment to check if humans can translate NP/PPs without any external context. Using the same corpus of 168 NP/PPs as in the previous section, a human translator translated 89% of the noun phrases correctly, 9% had the wrong leading preposition, and only 2% were mistranslated with the wrong content word meaning.</Paragraph>
      <Paragraph position="2"> Picking the right phrase start (e.g., preposition or determiner) can sometimes only be resolved when the English verb is chosen and its subcategorization is known. Otherwise, sentence context does not play a big role: Word choice can almost always be resolved within the internal context of the noun phrase.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Integration into an MT System
</SectionTitle>
      <Paragraph position="0"> The findings of the previous section indicate that NP/PP translation can be conceived as a separate subsystem of a complete machine translation system - with due attention to special cases. We will now estimate the importance of such a system.</Paragraph>
      <Paragraph position="1"> As a general observation, we note that NP/PPs cover roughly half of the words in news or similar  sentence translations and BLEU score texts. All nouns are covered by NP/PPs. Nouns are the biggest group of open class words, in terms of the number of distinct words. Constantly, new nouns are added to the vocabulary of a language, be it by borrowing foreign words such as Fahrvergn&amp;quot;ugen or Zeitgeist, or by creating new words from acronyms such as AIDS, or by other means. In addition to new words, new phrases with distinct meanings are constantly formed: web server, home page, instant messaging, etc. Learning new concepts from text sources when they become available is an elegant solution for this knowledge acquisition problem.</Paragraph>
      <Paragraph position="2"> In a preliminary study, we assess the impact of an NP/PP subsystem on the quality of an overall machine translation system. We try to answer the following questions: a0 What is the impact on a machine translation system if noun phrases are translated in isolation? null a0 What is the performance gain for a machine translation system if an NP/PP subsystem provides perfect translations of the noun phrases? We built a subsystem for NP/PP translation that uses the same modeling as the overall system (IBM Model 4), but is trained on only NP/PPs. With this system, we translate the NP/PPs in isolation, without the assistance of sentence context. These translations are fixed and provided to the general machine translation system, which does not change the fixed NP/PP translation.</Paragraph>
      <Paragraph position="3"> In a different experiment, we also provided correct translations (motivated by the reference translation) for the NP/PPs to the general machine translation system. We carried out these experiments on the same 100 sentence corpus as in the previous sections. The 164 translatable NP/PPs are marked and translated in isolation.</Paragraph>
      <Paragraph position="4"> The results are summarized in Table 1. Treating NP/PPs as isolated units, and translating them in iso- null system: The base model generates an n-best list that is rescored using additional features lation with the same methods as the overall system has little impact on overall translation quality. In fact, we achieved a slight improvement in results due to the fact that NP/PPs are consistently translated as NP/PPs. A perfect NP/PP subsystem would triple the number of correctly translated sentences. Performance is also measured by the BLEU score (Papineni et al., 2002), which measures similarity to the reference translation taken from the English side of the parallel corpus.</Paragraph>
      <Paragraph position="5"> These findings indicate that solving the NP/PP translation problem would be a significant step toward improving overall translation quality, even if the overall system is not changed in any way. The findings also indicate that isolating the NP/PP translation task as a subtask does not harm performance.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Framework
</SectionTitle>
    <Paragraph position="0"> When translating a foreign input sentence, we detect its NP/PPs and translate them with an NP/PP translation subsystem. The best translation (or multiple best translations) is then passed on to the full sentence translation system which in turn translates the remaining parts of the sentence and integrates the chosen NP/PP translations.</Paragraph>
    <Paragraph position="1"> Our NP/PP translation subsystem is designed as follows: We train a translation system on a NP/PP parallel corpus. We use this system to generate an n-best list of possible translations. We then rescore this n-best list with the help of additional features.</Paragraph>
    <Paragraph position="2"> This design is illustrated by Figure 2.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Evaluation
</SectionTitle>
      <Paragraph position="0"> To evaluate our methods, we automatically detected all of the 1362 NP/PPs in 534 sentences from parts of the Europarl corpus which are not already used as training data. Our evaluation metric is human assessment: Can the translation provided by the system be part of an acceptable translation of the whole sentence? In other words, the noun phrase has to be translated correctly given the sentence context.</Paragraph>
      <Paragraph position="1"> The NP/PPs are extracted in the same way that NP/PPs are initially detected for the acquisition of the NP/PP training corpus. This means that there are some problems with parse errors, leading to sentence fragments extracted as NP/PPs that cannot be translated correctly. Also, the test corpus contains all detected NP/PPs, even untranslatable ones, as discussed in Section 2.2.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Acquisition of an NP/PP Training Corpus
</SectionTitle>
      <Paragraph position="0"> To train a statistical machine translation model, we need a training corpus of NP/PPs paired with their translation. We create this corpus by extracting NP/PPs from a parallel corpus.</Paragraph>
      <Paragraph position="1"> First, we word-align the corpus with Giza++ (Och and Ney, 2000). Then, we parse both sides with syntactic parsers (Collins, 1997; Schmidt and Schulte im Walde, 2000)2. Our definition easily translates into an algorithm to detect NP/PPs in a sentence.</Paragraph>
      <Paragraph position="2"> Recall that in such a corpus, only part of the NP/PPs are translated as such into the foreign language. In addition, the word-alignment and syntactic parses may be faulty. As a consequence, initially only 43.4% of all NP/PPs could be aligned. We raise this number to 67.2% with a number of automatic data cleaning steps:  projekte/gramotron/SOFTWARE/LoPar-en.html we always strip the adverb from these constructions. null a0 German verbal adjective constructions are broken up if they involve arguments or adjuncts (e.g., der von mir gegessene Kuchen = the by me eaten cake), because this poses problems more related to verbal clauses.</Paragraph>
      <Paragraph position="3"> a0 Alignment points involving punctuation are stripped from the word alignment. Punctuation is also stripped from the edges of NP/PPs.</Paragraph>
      <Paragraph position="4"> A total of 737,388 NP/PP pairs are collected from the German-English Europarl corpus as training data.</Paragraph>
      <Paragraph position="5"> Certain German NP/PPs consistently do not align to NP/PPs in English (see the example in Section 2.2). These are detected at this point. The obtained data of unaligned NP/PPs can be used for dealing with these special cases.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Base Model
</SectionTitle>
      <Paragraph position="0"> Given the NP/PP corpus, we can use any general statistical machine translation method to train a translation system for noun phrases. As a baseline, we use an IBM Model 4 (Brown et al., 1993) system3 with a greedy decoder4 (Germann et al., 2001).</Paragraph>
      <Paragraph position="1"> We found that phrase based models achieve better translation quality than IBM Model 4. Such models segment the input sequence into a number of (non-linguistic) phrases, translate each phrase using a phrase translation table, and allow for reordering of phrases in the output. No phrases may be dropped or added.</Paragraph>
      <Paragraph position="2"> We use a phrase translation model that extracts its phrase translation table from word alignments generated by the Giza++ toolkit. Details of this model are described by Koehn et al. (2003).</Paragraph>
      <Paragraph position="3"> To obtain an n-best list of candidate translations, we developed a beam search decoder. This decoder employs hypothesis recombination and stores the search states in a search graph - similar to work by Ueffing et al. (2002) - which can be mined with stan- null list for different sizes a2</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Acceptable Translations in the n-Best List
</SectionTitle>
      <Paragraph position="0"> One key question for our approach is how often an acceptable translation can be found in an n-best list.</Paragraph>
      <Paragraph position="1"> The answer to this is illustrated in Figure 3: While an acceptable translation comes out on top for only about 60% of the NP/PPs in our test corpus, one can be found in the 100-best list for over 90% of the NP/PPs6. This means that rescoring has the potential to raise performance by 30%.</Paragraph>
      <Paragraph position="2"> What are the problems with the remaining 10% for which no translation can be found? To investigate this, we carried out an error analysis of these NP/PPs. Results are given in Table 2. The main sources of error are unknown words (34%) or words for which the correct translation does not occur in the training data (14%), and errors during tagging and parsing that lead to incorrectly detected NP/PPs (28%).</Paragraph>
      <Paragraph position="3"> There are also problems with NP/PPs that require complex syntactic restructuring (7%), and NP/PPs that are too long, so an acceptable translation could not be found in the 100-best list, but only further down the list (6%). There are also NP/PPs that cannot be translated as NP/PPs into English (2%), as discussed in Section 2.2.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.5 Maximum Entropy Reranking
</SectionTitle>
      <Paragraph position="0"> Given an n-best list of candidates and additional features, we transform the translation task from a search problem into a reranking problem, which we address using a maximum entropy approach.</Paragraph>
      <Paragraph position="1"> As training data for finding feature values, we collected a development corpus of 683 NP/PPs. Each  NP/PP comes with an n-best list of candidate translations that are generated from our base model and are annotated with accuracy judgments. The initial features are the logarithm of the probability scores that the model assigns to each candidate translation: the language model score, the phrase translation score and the reordering (distortion) score. The task for the learning method is to find a probability distribution a0a2a1a4a3a6a5 a7a9a8 that indicates if the candidate translation a3 is an accurate translation of the input a7 . The decision rule to pick the best translation</Paragraph>
      <Paragraph position="3"> The development corpus provides the empirical probability distribution by distributing the probability mass over the acceptable translations a14a15a3a17a16a19a18a21a20 :</Paragraph>
      <Paragraph position="5"> lations for a given input a7 is acceptable, we pick the candidates that are closest to reference translations measured by minimum edit distance.</Paragraph>
      <Paragraph position="6"> We use a maximum entropy framework to parametrize this probability distribution as</Paragraph>
      <Paragraph position="8"> are the feature values and the</Paragraph>
      <Paragraph position="10"> weights.</Paragraph>
      <Paragraph position="11"> Since we have only a sample of the possible translations a3 for the given input a7 , we normalize the probability distribution, so that a35  each feature a38 a4 . These expectations are computed as sums over all candidate translations a3 for all inputs a7 : a35a57a56a59a58a61a60a11a21a62 a22a0a63a1a4a7a9a8a64a0a33a32a65a1a4a3a28a5 a7a9a8a21a38 a4a27a1a4a7a40a39a24a3a15a8</Paragraph>
      <Paragraph position="13"> A nice property of maximum entropy training is that it converges to a global optimum. There are a number of methods and tools available to carry out this training of feature values. We use the toolkit7 developed by Malouf (2002). Berger et al. (1996) and Manning and Sch&amp;quot;utze (1999) provide good introductions to maximum entropy learning.</Paragraph>
      <Paragraph position="14"> Note that any other machine learning, such as support vector machines, could be used as well. We chose maximum entropy for its ability to deal with both real-valued and binary features. This method is also similar to work by Och and Ney (2002), who use maximum entropy to tune model parameters.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Properties of NP/PP Translation
</SectionTitle>
    <Paragraph position="0"> We will now discuss the properties of NP/PP translation that we exploit in order to improve our NP/PP translation subsystem. The first of these (compounding of words) is addressed by preprocessing, while the others motivate features which are used in n-best list reranking.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Compound Splitting
</SectionTitle>
      <Paragraph position="0"> Compounding of words, especially nouns, is common in a number of languages (German, Dutch, Finnish, Greek), and poses a serious problem for machine translation: The word Aktionsplan may not be known to the system, but if the word were broken up into Aktion and Plan, the system could easily translate it into action plan, or plan for action.</Paragraph>
      <Paragraph position="1"> The issues for breaking up compounds are: Knowing the morphological rules for joining words, resolving ambiguities of breaking up a word (Hauptsturm a1 Haupt-Turm or Haupt-Sturm), and finding the right level of splitting granularity (Frei-Tag or Freitag).</Paragraph>
      <Paragraph position="2"> Here, we follow an approach introduced by Koehn and Knight (2003): First, we collect frequency statistics over words in our training corpus. Compounds may be broken up only into known words in the corpus. For each potential compound we check if morphological splitting rules allow us to break it up into such known words.</Paragraph>
      <Paragraph position="3"> Finally, we pick a splitting option (perhaps not breaking up the compound at all). This decision is based on the frequency of the words involved.</Paragraph>
      <Paragraph position="5"> The German side of both the training and testing corpus is broken up in this way. The base model is trained on a compound-split corpus, and input is broken up before being passed on to the system.</Paragraph>
      <Paragraph position="6"> This method works especially well with our phrase-based machine translation model, which can recover more easily from too eager or too timid splits than word-based models. After performing this type of compound splitting, hardly any errors occur with respect to compounded words.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Web n-Grams
</SectionTitle>
      <Paragraph position="0"> Generally speaking, the performance of statistical machine translation systems can be improved by better translation modeling (which ensures correspondence between input and output) and language modeling (which ensures fluent English output).</Paragraph>
      <Paragraph position="1"> Language modeling can be improved by different types of language models (e.g., syntactic language models), or additional training data for the language model.</Paragraph>
      <Paragraph position="2"> Here, we investigate the use of the web as a language model. In preliminary studies we found that 30% of all 7-grams in new text can be also found on the web, as measured by consulting the search engine Google8, which currently indexes 3 billion web pages. This is only the case for 15% of 7-grams generated by the base translation system.</Paragraph>
      <Paragraph position="3"> There are various ways one may integrate this vast resource into a machine translation system: By building a traditional n-gram language model, by using the web frequencies of the n-grams in a candidate translation, or by checking if all n-grams in a candidate translation occur on the web.</Paragraph>
      <Paragraph position="4"> We settled on using the following binary features: Does the candidate translation as a whole occur in the web? Do all n-grams in the candidate translation occur on the web? Do all n-grams in the candidate translation occur at least 10 times on the web? We use both positive and negative features for n-grams of the size 2 to 7.</Paragraph>
      <Paragraph position="5"> We were not successful in improving performance by building a web n-gram language model or using  the actual frequencies as features. The web may be too noisy to be used in such a straight-forward way without significant smoothing efforts.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Syntactic Features
</SectionTitle>
      <Paragraph position="0"> Unlike in decoding, for reranking we have the complete candidate translation available. This means that we can define features that address any prop-erty of the full NP/PP translation pair. One such set of features is syntactic features.</Paragraph>
      <Paragraph position="1"> Syntactic features are computed over the syntactic parse trees of both input and candidate translation. For the input NP/PPs, we keep the syntactic parse tree we inherit from the NP/PP detection process. For the candidate translation, we use a part-of-speech tagger and syntactic parser to annotate the candidate translation with its most likely syntactic parse tree.</Paragraph>
      <Paragraph position="2"> We use the following three syntactic features: a0 Preservation of the number of nouns: Plural nouns generally translate as plural nouns, while singular nouns generally translate as singular a0 Preservation of prepositions: base prepositional phrases within NP/PPs generally translate as prepositional phrases, unless there is movement involved. BaseNPs generally translate as baseNPs. German genitive baseNP are treated as basePP.</Paragraph>
      <Paragraph position="3"> a0 Within a baseNP/PP the determiner generally agree in number with the final noun (e.g., not: this nice green flowers).</Paragraph>
      <Paragraph position="4"> The features are realized as integers, i.e., how many nouns did not preserve their number during translation? These features encode relevant general syntactic knowledge about the translation of noun phrases.</Paragraph>
      <Paragraph position="5"> They constitute soft constraints that may be overruled by other components of the system.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>