<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3119">
  <Title>Syntax Augmented Machine Translation via Chart Parsing</Title>
  <Section position="3" start_page="138" end_page="138" type="metho">
    <SectionTitle>
2 Rule Generation
</SectionTitle>
    <Paragraph position="0"> We start with phrase translations on the parallel training data using the techniques and implementation described in (Koehn et al., 2003a). This phrase table provides the purely lexical entries in the final hierarchical rule set that will be used in decoding.</Paragraph>
    <Paragraph position="1"> We then use Charniak's parser (Charniak, 2000) to generate the most likely parse tree for each English target sentence in the training corpus. Next, we determine all phrase pairs in the phrase table whose source and target side occur in each respective source and target sentence pair defining the scope of the initial rules in our SynCFG.</Paragraph>
    <Paragraph position="2"> Annotation If the target side of any of these initial rules correspond to a syntactic category C of the target side parse tree, we label the phrase pair with that syntactic category. This label corresponds to the left-hand side of our synchronous grammar. Phrase pairs that do not correspond to a span in the parse tree are given a default category &amp;quot;X&amp;quot;, and can still play a role in the decoding process. In work done after submission to the 2006 data track, we assign such phrases an extended category of the form C1 + C2, C1/C2, or C2\C1, indicating that the phrase pair's target side spans two adjacent syntactic categories (e.g., she went: NP+V), a partial syntactic category C1 missing a C2 to the right (e.g., the great: NP/NN), or a partial C1 missing a C2 to the left (e.g., great wall: DT\NP), respectively.</Paragraph>
    <Paragraph position="3"> Generalization In order to mitigate the effects of sparse data when working with phrase and n-gram models we would like to generate generalized phrases, which include non-terminal symbols that can be filled with other phrases. Therefore, after annotating the initial rules from the current training sentence pair, we adhere to (Chiang, 2005) to recursively generalize each existing rule; however, we abstract on a per-sentence basis. The grammar extracted from this evaluation's training data contains 75 nonterminals in our standard system, and 4000 nonterminals in the extended-category system.</Paragraph>
    <Paragraph position="4"> Figure 1 illustrates the annotation and generalization process.</Paragraph>
  </Section>
  <Section position="4" start_page="138" end_page="139" type="metho">
    <SectionTitle>
3 Scoring
</SectionTitle>
    <Paragraph position="0"> We employ a log-linear model to assign costs to the SynCFG. Given a source sentence f, the preferred translation output is determined by computing the lowest-cost derivation (combination of hierarchical and glue rules) yielding f as its source side, where the cost of a derivation R1 *****Rn with respective feature vectors v1,...,vn [?] Rm is given by</Paragraph>
    <Paragraph position="2"> Here, l1,...,lm are the parameters of the log-linear model, which we optimize on a held-out portion of the training set (2005 development data) using minimum-error-rate training (Och, 2003). We use the following features for our rules:</Paragraph>
  </Section>
  <Section position="5" start_page="139" end_page="139" type="metho">
    <SectionTitle>
4 Parsing
</SectionTitle>
    <Paragraph position="0"> Our SynCFG rules are equivalent to a probabilistic context-free grammar and decoding is therefore an application of chart parsing. Instead of the common method of converting the CFG grammar into Chomsky Normal Form and applying a CKY algorithm to produce the most likely parse for a given source sentence, we avoided the explosion of the rule set caused by the introduction of new non-terminals in the conversion process and implemented a variant of the CKY+ algorithm as described in (J.Earley, 1970).</Paragraph>
    <Paragraph position="1"> Each cell of the parsing process in (J.Earley, 1970) contains a set of hypergraph nodes (Huang and Chiang, 2005). A hypergraph node is an equivalence class of complete hypotheses (derivations) with identical production results (left-hand sides of the corresponding applied rules). Complete hypotheses point directly to nodes in their backwards star, and the cost of the complete hypothesis is calculated with respect to each back pointer node's best cost.</Paragraph>
    <Paragraph position="2"> This structure affords efficient parsing with minimal pruning (we use a single parameter to restrict the number of hierarchical rules applied), but sacrifices effective management of unique language model states contributing to significant search errors during parsing. At initial submission time we simply re-scored a K-Best list extracted after first best parsing using the lazy retrieval process in (Huang and Chiang, 2005).</Paragraph>
    <Paragraph position="3"> Post-submission After our workshop submission, we modified the K-Best list extraction process to integrate an n-gram language model during K-Best extraction. Instead of expanding each derivation (complete hypothesis) in a breadth-first fashion, we expand only a single back pointer, and score this new derivation with its translation model scores and a language model cost estimate, consisting of an accurate component, based on the words translated so far, and an estimate based on each remaining (not expanded) back pointer's top scoring hypothesis.</Paragraph>
    <Paragraph position="4"> To improve the diversity of the final K-Best list, we keep track of partially expanded hypotheses that have generated identical target words and refer to the same hypergraph nodes. Any arising twin hypothesis is immediately removed from the K-Best extraction beam during the expansion process.</Paragraph>
  </Section>
</Paper>