<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3014">
  <Title>Improving Bitext Word Alignments via Syntax-based Reordering of English</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 System
</SectionTitle>
    <Paragraph position="0"> Figure 1 shows the system architecture. We start by running the Collins parser (Collins, 1999) on the English side of both training and testing data, and apply our source-language-specific heuristics to the  AP: appositional (prepositional or postpositional) phrases with Apposition and Object, and NP: noun phrases withNoun andAdjective orRelative clause. Chinese has both prepositions and postpositions.</Paragraph>
    <Paragraph position="1"> resulting trees. This yields Englishprime text, along with traces recording correspondences between Englishprime words and the English originals. We use GIZA++ (Och and Ney, 2000) to align the Englishprime with the source language text, yielding alignments in terms of the Englishprime. Finally, we use the traces to map these alignments to the original English words.</Paragraph>
    <Paragraph position="2"> Figure 2 shows an illustrative Hindi-English sentence pair, with true word alignments, and parse-tree over the English sentence. Although it is only a short sentence, the large number of crossing alignments clearly show the high-degree of reordering, and especially long-distance motion, caused by the syntactic divergences between Hindi and English.</Paragraph>
    <Paragraph position="3"> Figure 3 shows the same sentence pair after English has been transformed into Englishprime by our system. Tree nodes whose children have been reordered  are marked by a subtended arc. Crossings have been eliminated, and the alignment is now monotonic.</Paragraph>
    <Paragraph position="4"> Table 1 shows the basic word order of three major phrase types for each of the languages we treated. In each case, our heuristics transform the English trees to achieve these same word orders. For the Chinese case, we apply several more language-specific transformations. Because Chinese has both prepositions and postpositions, we retain the original preposition and add an additional bracketing postposition. We also move verb modifiers other than noun phrases to the left of the head verb.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="35" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> For each language we treated, we assembled sentence-aligned, tokenized training and test corpora, with hand-annotated gold-standard word alignments for the latter1. We did not apply any sort of morphological analysis beyond basic word tokenization. We measured system performance with wa eval align.pl, provided by Rada Mihalcea and Ted Pedersen.</Paragraph>
    <Paragraph position="1"> Each training set provides the aligner with information about lexical affinities and reordering patterns. For Hindi, Korean and Chinese, we also tested our system under the more difficult situation of having only a bilingual word list but no bitext available. This is a plausible low-resource language scenario  of sentence pairs, mean length of English sentences, and correlation r2 between English and source-language normalized word positions in gold-standard data, for direct and Englishprime situations.</Paragraph>
    <Paragraph position="2"> and a test of the ability of the system to take sole responsibility for knowledge of reordering.</Paragraph>
    <Paragraph position="3"> Table 3 describes the test sets and shows the correlation in gold standard aligned word pairs between the position of the English word in the English sentence and the position of the source-language word in the source-language sentence (normalizing the positions to fall between 0 and 1). The baseline (direct) correlations give quantitative evidence of differing degrees of syntactic divergence with English, and the Englishprime correlations demonstrate that our heuristics do have the effect of better fitting source language word order.</Paragraph>
  </Section>
class="xml-element"></Paper>