<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1608"> <Title>The impact of parse quality on syntactically-informed statistical machine translation</Title> <Section position="4" start_page="0" end_page="63" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> We trained statistical machine translation systems on technical text. In the following sections we provide background on the data used for training, the dependency parsing framework used to produce treelets, the treelet translation framework and salient characteristics of the target languages.</Paragraph> <Section position="1" start_page="0" end_page="62" type="sub_section"> <SectionTitle> 2.1 Dependency parsing </SectionTitle> <Paragraph position="0"> Dependency analysis is an alternative to constituency analysis (Tesni`ere, 1959; MelVcuk, 1988).</Paragraph> <Paragraph position="1"> In a dependency analysis of syntax, words directly modify other words, with no intervening non-lexical nodes. We use the terms child node and parent node to denote the tokens in a dependency relation. Each child has a single parent, with the lexical root of the sentence dependent on a synthetic ROOT node.</Paragraph> <Paragraph position="2"> We use the parsing approach described in (Corston-Oliver et al., 2006). The parser is trained on dependencies extracted from the English Penn Treebank version 3.0 (Marcus et al., 1993) by using the head-percolation rules of (Yamada and Matsumoto, 2003).</Paragraph> <Paragraph position="3"> Given a sentence x, the goal of the parser is to find the highest-scoring parse ^y among all possible</Paragraph> <Paragraph position="5"> The score of a given parse y is the sum of the</Paragraph> <Paragraph position="7"> where the link (i, j) indicates a parent-child dependency between the token at position i and the token at position j. The score d(i, j) of each dependency link (i, j) is further decomposed as the weighted sum of its features f(i, j).</Paragraph> <Paragraph position="8"> The feature vector f(i, j) computed for each possible parent-child dependency includes the part-of-speech (POS), lexeme and stem of the parent and child tokens, the POS of tokens adjacent to the child and parent, and the POS of each token that intervenes between the parent and child.</Paragraph> <Paragraph position="9"> Various combinations of these features are used, for example a new feature is created that combines the POS of the parent, lexeme of the parent, POS of the child and lexeme of the child. Each feature is also conjoined with the direction and distance of the parent, e.g. does the child precede or follow the parent, and how many tokens intervene? To set the weight vector w, we train twenty averaged perceptrons (Collins, 2002) on different shuffles of data drawn from sections 02-21 of the Penn Treebank. The averaged perceptrons are then combined to form a Bayes Point Machine (Herbrich et al., 2001; Harrington et al., 2003), resulting in a linear classifier that is competitive with wide margin techniques.</Paragraph> <Paragraph position="10"> To find the optimal parse given the weight vector w and feature vector f(i, j) we use the decoder described in (Eisner, 1996).</Paragraph> </Section> <Section position="2" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 2.2 Treelet translation </SectionTitle> <Paragraph position="0"> For syntactically-informed translation, we follow the treelet translation approach described in (Quirk et al., 2005). 
</Section> <Section position="2" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 2.2 Treelet translation </SectionTitle> <Paragraph position="0"> For syntactically-informed translation, we follow the treelet translation approach described in (Quirk et al., 2005). In this approach, translation is guided by treelet translation pairs. Here, a treelet is a connected subgraph of a dependency tree. A treelet translation pair consists of a source treelet $S$, a target treelet $T$, and a word alignment $A \subseteq S \times T$ such that for all $s \in S$ there exists a unique $t \in T$ such that $(s, t) \in A$, and if $t$ is the root of $T$, there is a unique $s \in S$ such that $(s, t) \in A$.</Paragraph> <Paragraph position="1"> Translation of a sentence begins by parsing that sentence into a dependency representation. This dependency graph is partitioned into treelets; like (Koehn et al., 2003), we assume a uniform probability distribution over all partitions. Each source treelet is matched to a treelet translation pair; together, the target-language treelets in those treelet translation pairs will form the target translation. Next the target-language treelets are joined to form a single tree: the parent of the root of each treelet is dictated by the source. Let $t_r$ be the root of a target-language treelet, and $s_r$ the source node aligned to it. If $s_r$ is the root of the source sentence, then $t_r$ is made the root of the target-language tree. Otherwise, let $s_p$ be the parent of $s_r$, and $t_p$ the target node aligned to $s_p$: $t_r$ is attached to $t_p$. Finally, the ordering of all the nodes is determined; once the target tree is fully specified, the target sentence is produced by reading off the labels of the nodes in order.</Paragraph> <Paragraph position="2"> Translations are scored according to a log-linear combination of feature functions, each scoring a different aspect of the translation process. We use a beam search decoder to find the best translation $T^{\ast}$ according to the log-linear combination of models: $T^{\ast} = \arg\max_{T} \sum_{m} \lambda_{m} f_{m}(S, T)$, where each $f_{m}$ is a model score and $\lambda_{m}$ its weight.</Paragraph> <Paragraph position="3"> The models include inverted and direct channel models estimated by relative frequency, lexical weighting channel models following (Vogel et al., 2003), a trigram target language model using modified Kneser-Ney smoothing (Goodman, 2001), an order model following (Quirk et al., 2005), and word count and phrase count functions. The weights for these models are determined using the method described in (Och, 2003).</Paragraph> <Paragraph position="4"> To estimate the models and extract the treelets, we begin from a parallel corpus. First the corpus is word-aligned using GIZA++ (Och and Ney, 2000), then the source sentences are parsed, and finally dependencies are projected onto the target side following the heuristics described in (Quirk et al., 2005). This word-aligned parallel dependency tree corpus provides training material for an order model and a target-language tree-based language model. We also extract treelet translation pairs from this parallel corpus. To limit the combinatorial explosion of treelets, we only gather treelets that contain at most four words and at most two gaps in the surface string. This limits the number of mappings to $O(n^{3})$ in the worst case, where $n$ is the number of nodes in the dependency tree.</Paragraph>
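To make these extraction limits concrete, the following minimal sketch (same toy Python conventions as above; the position-list representation and helper name are assumptions, not the system's actual code) checks whether a candidate treelet respects the limits of at most four words and at most two gaps in the surface string.

# Illustrative check of the treelet extraction limits: at most four words,
# and at most two gaps when the treelet's words are laid out in surface order.
# The position-set representation is a hypothetical simplification.

def within_treelet_limits(positions, max_words=4, max_gaps=2):
    """positions: surface-string indices of the words in one candidate treelet."""
    if len(positions) > max_words:
        return False
    ordered = sorted(positions)
    # A gap is a maximal run of surface positions not covered by the treelet.
    gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > 1)
    return gaps <= max_gaps

# Example: words at surface positions 2, 3, 5, 9 leave two gaps (4 and 6-8).
print(within_treelet_limits([2, 3, 5, 9]))   # True: 4 words, 2 gaps
print(within_treelet_limits([0, 2, 4, 6]))   # False: 4 words but 3 gaps

Counting gaps as maximal uncovered runs between the sorted surface positions is one natural reading of "gaps in the surface string"; the paper does not spell out the bookkeeping.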
</Section> <Section position="3" start_page="62" end_page="63" type="sub_section"> <SectionTitle> 2.3 Language pairs </SectionTitle> <Paragraph position="0"> In the present paper we focus on English-to-German and English-to-Japanese machine translation. German and Japanese differ from English in ways that we believe illuminate well the strengths of a syntactically-informed SMT system. We provide a brief sketch of the linguistic characteristics of German and Japanese relevant to the present study.</Paragraph> [Figure 1: aligned sentence pairs, including the English-German example &quot;you can set this property using Visual Basic&quot; / &quot;Sie können diese Eigenschaft auch mit Visual Basic festlegen&quot;.] <Paragraph position="1"> Although English and German are closely related - they both belong to the western branch of the Germanic family of Indo-European languages - the languages differ typologically in ways that are especially problematic for current approaches to statistical machine translation, as we shall now illustrate. We believe that these typological differences make English-to-German machine translation a fertile test bed for syntax-based SMT. German has richer inflectional morphology than English, with obligatory marking of case, number and lexical gender on nominal elements, and of person, number, tense and mood on verbal elements. This morphological complexity, combined with pervasive, productive noun compounding, is problematic for current approaches to word alignment (Corston-Oliver and Gamon, 2004).</Paragraph> <Paragraph position="2"> Equally problematic for machine translation is the issue of word order. The position of verbs in German is strongly determined by clause type. For example, in main clauses in declarative sentences, finite verbs occur as the second constituent of the sentence, but certain non-finite verb forms occur in final position. In Figure 1, for example, the English &quot;can&quot; aligns with German &quot;können&quot; in second position and &quot;set&quot; aligns with German &quot;festlegen&quot; in final position.</Paragraph> <Paragraph position="3"> Aside from verbs, German is usually characterized as a &quot;free word-order&quot; language: major constituents of the sentence may occur in various orders, so-called &quot;separable prefixes&quot; may occur bound to the verb or may detach and occur at a considerable distance from the verb on which they depend, and extraposition of various kinds of subordinate clause is common. In the case of extraposition, for example, more than one third of relative clauses in human-translated German technical text are extraposed; for comparable English text the figure is considerably less than one percent (Gamon et al., 2002).</Paragraph> <Paragraph position="4"> Word order in Japanese is rather different from that of English. English has the canonical constituent order subject-verb-object, whereas Japanese prefers subject-object-verb order. Prepositional phrases in English generally correspond to postpositional phrases in Japanese. Japanese noun phrases are strictly head-final, whereas English noun phrases allow postmodifiers such as prepositional phrases, relative clauses and adjectives. Japanese has little nominal morphology and does not obligatorily mark number, gender or definiteness. Verbal morphology in Japanese is complex, with morphological marking of tense, mood and politeness. Topicalization and subjectless clauses are pervasive, and problematic for current SMT approaches.</Paragraph> <Paragraph position="5"> The Japanese sentence in Figure 1 illustrates several of these typological differences. The sentence-initial imperative verb &quot;move&quot; in the English corresponds to a sentence-final verb in the Japanese. The Japanese translation of the object noun phrase &quot;the camera slider switch&quot; precedes the verb in Japanese. The English preposition &quot;to&quot; aligns to a postposition in Japanese.</Paragraph> </Section> </Section> </Paper>