<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1628"> <Title>A Discriminative Model for Tree-to-Tree Translation</Title>

<Section position="5" start_page="233" end_page="236" type="metho"> <SectionTitle> 3 A Translation Architecture Based on Aligned Extended Projections </SectionTitle> <Paragraph position="0"/>

<Section position="1" start_page="233" end_page="234" type="sub_section"> <SectionTitle> 3.1 Background: Extended Projections (EPs) </SectionTitle>

<Paragraph position="0"> Extended projections (EPs) play a crucial role in the lexicalized tree adjoining grammar (LTAG) (Joshi, 1985) approach to syntax described by Frank (2002). In this paper we focus almost exclusively on extended projections associated with main verbs; note, however, that EPs are typically associated with all content words (nouns, adjectives, etc.). As an example, a parse tree for the sentence we know that the main obstacle has been the predictable resistance of manufacturers would make use of EPs for the words we, know, main, obstacle, been, predictable, resistance, and manufacturers. Function words (in this sentence that, the, has, and of) do not have EPs; instead, as we describe shortly, each function word is incorporated in an EP of some content word.</Paragraph>

<Paragraph position="1"> Figure 1 has examples of EPs. Each one is an LTAG elementary tree which contains a single content word as one of its leaves. Substitution nodes (such as NP-A or SBAR-A) in the elementary trees specify the positions of arguments of the content words. Each EP may contain one or more function words that are associated with the content word. For verbs, these function words include items such as modal verbs and auxiliaries (e.g., should and has); complementizers (e.g., that); and wh-words (e.g., which). For nouns, function words include determiners and prepositions.</Paragraph>

<Paragraph position="2"> Elementary trees corresponding to EPs form the basic units in the LTAG approach described by Frank (2002). They are combined to form a full parse tree for a sentence using the TAG operations of substitution and adjunction. For example, the EP for been in Figure 1 can be substituted into the SBAR-A position in the EP for know; the EP for obstacle can be substituted into the subject position of the EP for been.</Paragraph> </Section>

<Section position="2" start_page="234" end_page="236" type="sub_section"> <SectionTitle> 3.2 Aligned Extended Projections (AEPs) </SectionTitle>

<Paragraph position="0"> We now build on the idea of extended projections to give a detailed description of AEPs. Figure 2 shows examples of German clauses paired with the AEPs found in training data.2 The German clause is assumed to have n (where n ≥ 0) modifiers. For example, the first German parse in Figure 2 has two arguments, indexed as 1 and 2. Each of these modifiers must either have a translation in the corresponding English clause, or must be deleted.</Paragraph>

<Paragraph position="1"> An AEP consists of the following parts: STEM: A string specifying the stemmed form of the main verb in the clause.</Paragraph>

<Paragraph position="2"> SPINE: A syntactic structure associated with the main verb. The structure has the symbol V as one of its leaf nodes; this is the position of the main verb. It includes higher projections of the verb such as VPs, Ss, and SBARs.
It also includes leaf nodes NP-A in positions corresponding to noun-phrase arguments (e.g., the subject or object) of the main verb. In addition, it may contain leaf nodes labeled with categories such as WHNP or WHADVP where a wh-phrase may be placed. It may include leaf nodes corresponding to one or more complementizers (common examples being that, if, so that, and so on).</Paragraph>

<Paragraph position="3"> VOICE: One of two alternatives, active or passive, specifying the voice of the main verb.</Paragraph>

<Paragraph position="4"> 2Note that in this paper we consider translation from German to English; in the remainder of the paper we take English to be synonymous with the target language in translation and German to be synonymous with the source language.</Paragraph>

<Paragraph position="5"> SUBJECT: This variable can be one of three types. If there is no subject position in the SPINE variable, then the value for SUBJECT is NULL.</Paragraph>

<Paragraph position="6"> Otherwise, SUBJECT can either be a string, for example there,3 or an index of one of the n modifiers in the German clause.</Paragraph>

<Paragraph position="7"> OBJECT: This variable is similar to SUBJECT, and can also take three types: NULL, a specific string, or an index of one of the n German modifiers. It is always NULL if there is no object position in the SPINE; it can never be a modifier index that has already been assigned to SUBJECT.</Paragraph>

<Paragraph position="8"> WH: This variable is always NULL if there is no wh-phrase position within the SPINE; it is always a non-empty string (such as which, or in which) if a wh-phrase position does exist.</Paragraph>

<Paragraph position="9"> MODALS: This is a string of verbs that constitute the modals that appear within the clause. We use NULL to signify an absence of modals.</Paragraph>

<Paragraph position="10"> INFL: The inflected form of the verb.</Paragraph>

<Paragraph position="11"> MOD(i): There are n modifier variables MOD(1), MOD(2), ..., MOD(n) that specify the positions for German arguments that have not already been assigned to the SUBJECT or OBJECT positions in the spine. Each variable MOD(i) can take one of six possible values: * null: This value is chosen if and only if the modifier has already been assigned to the subject or object position.</Paragraph>

<Paragraph position="12"> * deleted: This means that a translation of the i'th German modifier is not present in the English clause.</Paragraph>

<Paragraph position="13"> * pre-sub: The modifier appears after any complementizers or wh-phrases, but before the subject of the English clause.</Paragraph>

<Paragraph position="14"> * post-sub: The modifier appears after the subject of the English clause, but before the modals.</Paragraph>

<Paragraph position="15"> * in-modals: The modifier appears after the first modal, but before any remaining modals and the main verb. * post-verb: The modifier appears somewhere after the main verb.</Paragraph>

[Figure 2 caption, recovered fragment: ... the correspondence between the German clause and its English translation is not entirely direct. The subject in the English is the expletive there; the subject in the German clause becomes the object in English. This is a typical pattern for the German verb bestehen. The German PP zwischen ... appears at the start of the clause in German, but is post-verbal in the English. The modifier also, whose English translation is so, is in an intermediate position in the German clause, but appears in the pre-subject position in the English clause.]
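To make the AEP record concrete, the following sketch shows one possible encoding as a Python data structure, filled in with a guess at the values for the first AEP of Figure 2. This is an illustration only: the spine string, field types, and names are assumptions, not the paper's representation.

from dataclasses import dataclass, field
from typing import List, Optional, Union

# SUBJECT/OBJECT fillers: None (NULL), a literal English string such as
# "there", or the 1-based index of a German modifier.
Filler = Optional[Union[int, str]]

@dataclass
class AEP:
    stem: str                 # STEM: stemmed main verb
    spine: str                # SPINE: structure with a V leaf (format assumed)
    voice: str                # VOICE: "active" or "passive"
    subject: Filler           # SUBJECT
    object_: Filler           # OBJECT (underscore avoids the built-in name)
    wh: Optional[str]         # WH: None or a wh-string such as "which"
    modals: Optional[str]     # MODALS: None or a string such as "has"
    infl: str                 # INFL: inflected form of the main verb
    mods: List[str] = field(default_factory=list)
                              # MOD(i): "null", "deleted", "pre-sub",
                              # "post-sub", "in-modals", or "post-verb"

# Guessed values for the first example of Figure 2 ("dass das haupthemmnis
# ... war." -> "that the main obstacle has been ..."):
aep = AEP(stem="be", spine="(SBAR that (S NP-A (VP V NP-A)))",
          voice="active", subject=1, object_=2, wh=None,
          modals="has", infl="been", mods=["null", "null"])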
<SectionTitle> 4 Extracting AEPs from a Corpus </SectionTitle>

<Paragraph position="16"> A crucial step in our approach is the extraction of training examples from a translation corpus.</Paragraph>

<Paragraph position="17"> Each training example consists of a German clause paired with an English AEP (see Figure 2).</Paragraph>

<Paragraph position="18"> In our experiments, we used the Europarl corpus (Koehn, 2005). For each sentence pair from this data, we used a version of the German parser described by Dubey (2005) to parse the German component, and a version of the English parser described by Collins (1999) to parse the English component. To extract AEPs, we perform the following steps: NP and PP Alignment: To align NPs and PPs, first all German and English nouns, personal and possessive pronouns, numbers, and adjectives are identified in each sentence and aligned using GIZA++ (Och and Ney, 2003). Next, each NP in an English tree is aligned to an NP or PP in the corresponding German tree in a way that is consistent with the word-alignment information. That is, the words dominated by the English node must be aligned only to words dominated by the German node, and vice versa. Note that if there is more than one German node that is consistent, then the one rooted at the minimal subtree is selected.</Paragraph>

<Paragraph position="19"> Clause Alignment and AEP Extraction: The next step in the training process is to identify German/English clause pairs which are translations of each other. We first break each English or German parse tree into a set of clauses; see Appendix A for a description of how we identify clauses. We retain only those training examples where the English and German sentences have the same number of clauses. For these retained examples, define the English sentence to contain the clause sequence <e1, e2, ..., en>, and the German sentence to contain the clause sequence <g1, g2, ..., gn>. The clauses are ordered according to the position of their main verbs in the original sentence. We create n candidate pairs <(e1,g1), (e2,g2), ..., (en,gn)> (i.e., we force a one-to-one correspondence between the two clause sequences). We then discard any clause pairs (e,g) which are inconsistent with the NP/PP alignments for that sentence (a clause pair is inconsistent with the NP/PP alignments if it contains an NP/PP on either the German or English side which is aligned to an NP/PP that is not within the clause pair).</Paragraph>

<Paragraph position="20"> Note that this method is deliberately conservative (i.e., high precision, but lower recall), in that it discards sentence pairs where the English/German sentences have different numbers of clauses. In practice, we have found that the method yields a large number of training examples, and that these training examples are of relatively high quality.</Paragraph>

<Paragraph position="21"> Future work may consider improved methods for identifying clause pairs, for example methods that make use of labeled training examples.</Paragraph>
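The clause pairing and consistency filter just described might be sketched as follows. This is illustrative Python, not the paper's code; the clause objects and their np_ids attribute are hypothetical stand-ins for the parser output and NP/PP alignment.

def consistent_clause_pair(e_clause, g_clause, np_alignment):
    """A clause pair is inconsistent if an NP/PP inside it is aligned
    to an NP/PP outside it (the condition from Section 4)."""
    e_inside, g_inside = set(e_clause.np_ids), set(g_clause.np_ids)
    return all((e_np in e_inside) == (g_np in g_inside)
               for e_np, g_np in np_alignment)

def candidate_clause_pairs(e_clauses, g_clauses, np_alignment):
    # Discard sentence pairs with different numbers of clauses, then
    # force a one-to-one correspondence and filter inconsistent pairs.
    if len(e_clauses) != len(g_clauses):
        return []
    return [(e, g) for e, g in zip(e_clauses, g_clauses)
            if consistent_clause_pair(e, g, np_alignment)]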
<Paragraph position="22"> An AEP can then be extracted from each clause pair. The EP for the English clause is first extracted, giving values for all variables except for SUBJECT, OBJECT, and MOD(1), ..., MOD(n). The values for the SUBJECT, OBJECT, and MOD(i) variables are derived from the alignments between NPs/PPs, and an alignment of other phrases (ADVPs, ADJPs, etc.) derived from GIZA++ alignments. If the English clause has a subject or object which is not aligned to a German modifier, then the value for SUBJECT or OBJECT is taken to be the full English string.</Paragraph> </Section> </Section>

<Section position="6" start_page="236" end_page="238" type="metho"> <SectionTitle> 5 The Model </SectionTitle> <Paragraph position="0"/>

<Section position="1" start_page="236" end_page="236" type="sub_section"> <SectionTitle> 5.1 Beam search and the perceptron </SectionTitle>

<Paragraph position="0"> In this section we describe linear history-based models with beam search, and the perceptron algorithm for learning in these models. These methods will form the basis for our model that maps German clauses to AEPs.</Paragraph>

<Paragraph position="1"> We have a training set of n examples, (xi, yi) for i = 1...n, where each xi is a German parse tree, and each yi is an AEP. We follow previous work on history-based models by representing each yi as a series of N decisions <d1, d2, ..., dN>. In our approach, N will be a fixed number for any input x: we take the N decisions to correspond to the sequence of variables STEM, SPINE, ..., MOD(1), MOD(2), ..., MOD(n) described in section 3. Each di is a member of a set Di which specifies the set of allowable decisions at the i'th point (for example, D2 would be the set of all possible values for SPINE). We assume a function ADVANCE(x, <d1, d2, ..., di-1>) which maps an input x together with a prefix of decisions d1...di-1 to a subset of Di. ADVANCE is a function that specifies which decisions are allowable for a past history <d1, ..., di-1> and an input x. In our case the ADVANCE function implements hard constraints on AEPs (for example, the constraint that the SUBJECT variable must be NULL if no subject position exists in the SPINE). For any input x, a well-formed decision sequence for x is a sequence <d1, ..., dN> such that for i = 1...N, di ∈ ADVANCE(x, <d1, ..., di-1>). We define GEN(x) to be the set of all decision sequences (or AEPs) which are well-formed for x.</Paragraph>

<Paragraph position="2"> The model that we will use is a discriminatively-trained, feature-based model. A significant advantage to feature-based models is their flexibility: it is very easy to sensitize the model to dependencies in the data by encoding new features. To define a feature-based model, we assume a function φ(x, <d1, ..., di-1>, di) ∈ R^d which maps a decision di in context (x, <d1, ..., di-1>) to a feature vector. We also assume a vector ᾱ ∈ R^d of parameter values. We define the score for any partial or complete decision sequence y = <d1, d2, ..., dm> as:</Paragraph>

<Paragraph position="3"> score(x, y) = Σj=1...m φ(x, <d1, ..., dj-1>, dj) · ᾱ     (1)</Paragraph>

<Paragraph position="4"> In particular, given the definitions above, the output structure F(x) for an input x is the highest-scoring well-formed structure for x:</Paragraph>

<Paragraph position="5"> F(x) = argmax_{y ∈ GEN(x)} score(x, y)</Paragraph>

<Paragraph position="6"> To decode with the model we use a beam-search method. The method incrementally builds an AEP in the decision order d1, d2, ..., dN. At each point, a beam contains the top M highest-scoring partial paths for the first m decisions, where M is taken to be a fixed number. The score for any partial path is defined in Eq. 1. The ADVANCE function is used to specify the set of possible decisions that can extend any given path in the beam.
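A minimal sketch of the decoder and of the training loop described in the next paragraph (illustrative Python; it uses plain rather than averaged perceptron updates, and none of the paper's efficiency details):

def beam_search(x, advance, phi, alpha, num_decisions, beam_size):
    """Build an AEP decision sequence left to right, keeping the top
    `beam_size` partial paths scored as in Eq. 1."""
    beam = [((), 0.0)]  # (decision prefix, score)
    for _ in range(num_decisions):
        expanded = [
            (prefix + (d,),
             score + sum(alpha.get(f, 0.0) * v
                         for f, v in phi(x, prefix, d).items()))
            for prefix, score in beam
            for d in advance(x, prefix)   # hard AEP constraints live here
        ]
        beam = sorted(expanded, key=lambda p: -p[1])[:beam_size]
    return beam[0][0]                     # highest-scoring well-formed AEP

def perceptron_train(data, advance, phi, num_decisions, beam_size, epochs=3):
    """Simplified perceptron training (the paper uses the averaged variant)."""
    alpha = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = beam_search(x, advance, phi, alpha,
                                num_decisions, beam_size)
            if y_hat != y_gold:
                # Promote features of the gold sequence, demote those of
                # the predicted sequence.
                for i in range(num_decisions):
                    for f, v in phi(x, y_gold[:i], y_gold[i]).items():
                        alpha[f] = alpha.get(f, 0.0) + v
                    for f, v in phi(x, y_hat[:i], y_hat[i]).items():
                        alpha[f] = alpha.get(f, 0.0) - v
    return alpha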
To train the model, we use the averaged perceptron algorithm described by Collins (2002).</Paragraph>

<Paragraph position="7"> This combination of the perceptron algorithm with beam search is similar to that described by Collins and Roark (2004); future work may consider alternative algorithms, such as those described by Daumé and Marcu (2005). The perceptron algorithm is a convenient choice because it converges quickly, usually taking only a few iterations over the training set (Collins, 2002; Collins and Roark, 2004).</Paragraph> </Section>

<Section position="2" start_page="236" end_page="238" type="sub_section"> <SectionTitle> 5.2 The Features of the Model </SectionTitle>

<Paragraph position="0"> The model's features allow it to capture dependencies between the AEP and the German clause, as well as dependencies between different parts of the AEP itself. The features included in φ can consist of any function of the decision history <d1, ..., di-1>, the current decision di, or the German clause. In defining features over AEP/clause pairs, we make use of some basic functions which look at the German clause and the AEP (see Tables 1 and 2). We use various combinations of these basic functions in the prediction of each decision di, as described below.</Paragraph>

[Captions of Tables 1 and 2, recovered fragments: "... features in the AEP prediction model."]
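As an illustration of how such features might be encoded (as sparse indicator features keyed by strings), here is a sketch for the STEM decision described next. The clause accessor and the lexicon stub are hypothetical, and the feature names are invented for the example.

def lexicon_rank(german_verb, english_stem):
    # Stub for the rank of the candidate stem in an externally-compiled
    # lexicon (hypothetical lookup).
    return 0

def stem_features(german_clause, history, stem):
    """Indicator features for the STEM decision: the candidate stem is
    conjoined with basic functions of the German clause, e.g. its main
    verb (german_clause.main_verb is an assumed accessor)."""
    return {
        f"STEM={stem}&g_main_verb={german_clause.main_verb}": 1.0,
        f"STEM={stem}&lex_rank={lexicon_rank(german_clause.main_verb, stem)}": 1.0,
    }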
<Paragraph position="3"> STEM: Features for the prediction of STEM conjoin the value of this variable with each of the functions in lines 1-13 of Table 1. For example, one feature is the value of STEM conjoined with the main verb of the German clause. In addition, φ includes features sensitive to the rank of a candidate stem in an externally-compiled lexicon.6</Paragraph>

<Paragraph position="4"> SPINE: Spine prediction features make use of the values of the variables SPINE and STEM from the AEP, as well as functions of the spine in lines 1-7 of Table 2, conjoined in various ways with the functions in lines 4, 12, and 14-21 of Table 1. Note that the functions in Table 2 allow us to look at substructure in the spine. For instance, one of the features for SPINE is the label SBARQ or SQ, if it exists in the candidate spine, conjoined with a verbal preterminal label if there is a verb in the first position of the German clause. This feature captures the fact that German yes/no questions begin with a verb in the first position.</Paragraph>

<Paragraph position="5"> VOICE: Voice features in general combine values of VOICE, SPINE, and STEM, with the functions in lines 1-5, 22, and 23 of Table 1.</Paragraph>

<Paragraph position="6"> SUBJECT: Features used for subject prediction make use of the AEP variables VOICE and STEM.</Paragraph>

<Paragraph position="7"> In addition, if the value of SUBJECT is an index i (see section 3), then φ looks at the nonterminal label of the German node indexed by i as well as the surrounding context in the German clausal tree. Otherwise, φ looks at the value of SUBJECT. These basic features are combined with the functions in lines 1, 3, and 24-27 of Table 1.</Paragraph>

<Paragraph position="8"> OBJECT: We use features similar to those for the prediction of SUBJECT. In addition, φ can look at the value predicted for SUBJECT.</Paragraph>

<Paragraph position="9"> WH: Features for WH look at the values of WH and SPINE, conjoined with the functions in lines 1, 15, and 19 of Table 1.</Paragraph>

<Paragraph position="10"> MODALS: For the prediction of MODALS, φ looks at MODALS, SPINE, and STEM, conjoined with the functions in lines 2-5 and 12 of Table 1.</Paragraph>

<Paragraph position="11"> INFL: The features for INFL include the values of INFL, MODALS, SUBJECT, and VOICE, and the function in line 8 of Table 2.</Paragraph>

<Paragraph position="12"> MOD(i): For the MOD(i) variables, φ looks at the value of MODALS, SPINE, and the current MOD(i), as well as the nonterminal label of the root node of the German modifier being placed, and the functions in lines 24 and 28 of Table 1.</Paragraph> </Section> </Section>

<Section position="7" start_page="238" end_page="238" type="metho"> <SectionTitle> 6 Deriving Full Translations </SectionTitle>

<Paragraph position="0"> As we described in section 1.1, the translation of a full German sentence proceeds in a series of steps: a German parse tree is broken into a sequence of clauses; each clause is individually translated; and finally, the clause-level translations are combined to form the translation for a full sentence. The first and last steps are relatively straightforward. We now show how the second step is achieved, i.e., how AEPs can be used to derive English clause translations from German clauses.</Paragraph>

<Paragraph position="1"> We will again use the following translation pair as an example: dass das haupthemmnis der vorhersehbare widerstand der hersteller war. / that the main obstacle has been the predictable resistance of manufacturers.</Paragraph>

<Paragraph position="2"> First, an AEP like the one at the top of Figure 2 is predicted. Then, for each German modifier which does not have the value deleted, an English translation is predicted. In the example, the modifiers das haupthemmnis and der vorhersehbare widerstand der hersteller would be translated to the main obstacle, and the predictable resistance of manufacturers, respectively.</Paragraph>

<Paragraph position="3"> A number of methods could be used for translation of the modifiers. In this paper, we use the phrase-based system of Koehn et al. (2003) to generate n-best translations for each of the modifiers, and we then use a discriminative reranking algorithm (Bartlett et al., 2004) to choose among these candidate translations. The features in the reranking model can be sensitive to various properties of the candidate English translation, for example the words, the part-of-speech sequence, or the parse tree for the string. The reranker can also take into account the original German string. Finally, the features can be sensitive to properties of the AEP, such as the main verb or the position in which the modifier appears (e.g., subject, object, pre-sub, post-verb, etc.) in the English clause. See Appendix B for a full description of the features used in the modifier translation model. Note that the reranking stage allows us to filter translation candidates which do not fit syntactically with the position in the English tree. For example, we can parse the members of the n-best list, and then learn a feature which strongly disprefers prepositional phrases if the modifier appears in subject position.</Paragraph>
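A sketch of this reranking step (illustrative Python; the candidate attributes and feature names are assumptions, and the real model uses the richer feature set of Appendix B):

def rerank_features(cand, german_string, aep_slot):
    """Features over a candidate modifier translation: its POS sequence,
    parse root, first word, the original German string, and the AEP
    position (slot) it will fill."""
    return {
        f"pos_seq={' '.join(cand.pos_tags)}&slot={aep_slot}": 1.0,
        # Lets the model disprefer, e.g., a PP filling the subject slot.
        f"parse_root={cand.parse_root}&slot={aep_slot}": 1.0,
        f"first_words={cand.words[0]}|{german_string.split()[0]}": 1.0,
    }

def choose_translation(nbest, german_string, aep_slot, weights):
    # Pick the highest-scoring candidate under a linear model.
    return max(nbest, key=lambda c: sum(
        weights.get(f, 0.0) * v
        for f, v in rerank_features(c, german_string, aep_slot).items()))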
<Paragraph position="4"> Finally, the full string is predicted. In our example, the AEP variables SPINE, MODALS, and INFL in Figure 2 give the ordering <that SUBJECT has been OBJECT>. The AEP and modifier translations would be combined to give the final English string. In general, any modifiers assigned to pre-sub, post-sub, in-modals, or post-verb are placed in the corresponding position within the spine. For example, the second AEP in Figure 2 has a spine with ordering <SUBJECT are OBJECT>; modifier 1 would be placed in the post-verb position and modifier 2 in the pre-sub position, giving the ordering <MOD2 SUBJECT are OBJECT MOD1>. Note that modifiers assigned post-verb are placed after the object. If multiple modifiers appear in the same position (e.g., post-verb), then they are placed in the order seen in the original German clause.</Paragraph> </Section> </Paper>
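The modifier-placement rules above can be sketched as a small assembly routine (illustrative Python; in-modals placement is omitted for brevity):

def assemble_clause(spine_tokens, subject, obj, placed):
    """Insert the subject, object, and positioned modifiers into the
    spine ordering. `placed` maps "pre-sub"/"post-sub"/"post-verb" to
    lists of translated modifier strings, each kept in the order seen
    in the original German clause."""
    out = []
    for tok in spine_tokens:
        if tok == "SUBJECT":
            out += placed.get("pre-sub", []) + [subject] + placed.get("post-sub", [])
        elif tok == "OBJECT":
            out += [obj] + placed.get("post-verb", [])  # post-verb follows the object
        else:
            out.append(tok)
    return " ".join(out)

# Second AEP of Figure 2: spine <SUBJECT are OBJECT>, modifier 2 in
# pre-sub and modifier 1 in post-verb.
print(assemble_clause(["SUBJECT", "are", "OBJECT"], "SUBJ", "OBJ",
                      {"pre-sub": ["MOD2"], "post-verb": ["MOD1"]}))
# -> "MOD2 SUBJ are OBJ MOD1"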