<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1406">
  <Title>A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora</Title>
  <Section position="4" start_page="0" end_page="0" type="relat">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Evaluation methodology
</SectionTitle>
      <Paragraph position="0"> In the evaluation process, we found that various evaluation metrics of alignment in isolation bore very little relationship to the quality of the translations produced by a system that used the results of such alignment. Since it is the overall translation quality that we care about, we use the output quality (as judged by humans) of the MT system incorporating the transfer mappings produced by an alignment algorithm (keeping all other aspects of the system constant) as the metric for that algorithm.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Translation system
</SectionTitle>
      <Paragraph position="0"> Our translation system (Richardson, 2001) begins by parsing an input sentence and obtaining a logical form (LF). We then search the transfer mappings acquired during alignment for mappings that match portions of the input LF. We prefer larger (more specific) mappings to smaller (more general) mappings. Among mappings of equal size, we prefer higher-frequency mappings. We allow overlapping mappings that do not conflict. The lemmas in any portion of the LF not covered by a transfer mapping are translated using the same bilingual dictionary employed during alignment, or by a handful of hard-coded transfer rules (see Section 5.7 for a discussion of the contribution made by each of these components). Target LF fragments from matched transfer mappings and default dictionary translations are stitched together to form an output LF. From this, a rule-based generation component produces an output sentence.</Paragraph>
      <Paragraph position="1"> The system provides output for every input sentence. Sentences for which spanning parses are not found are translated anyway, albeit with lower quality.</Paragraph>
    </Section>
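The selection policy described in section 5.2 (prefer larger mappings, then more frequent ones, and admit overlaps only when they do not conflict) could be sketched roughly as follows. The Mapping shape, the conflict test, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the mapping-selection policy: larger (more
# specific) mappings beat smaller ones, frequency breaks ties, and an
# overlapping mapping is rejected only if it assigns different target
# material to an already-covered node.
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    nodes: frozenset   # input-LF node ids covered by this mapping
    frequency: int
    target: str        # target-LF fragment (opaque placeholder here)


def conflicts(m, assignment):
    # Approximate "conflict" as two mappings claiming a shared node
    # with different target material.
    return any(node in assignment and assignment[node] != m.target
               for node in m.nodes)


def select_mappings(candidates):
    # Larger first, then higher frequency.
    ordered = sorted(candidates,
                     key=lambda m: (len(m.nodes), m.frequency),
                     reverse=True)
    chosen, assignment = [], {}
    for m in ordered:
        if not conflicts(m, assignment):
            chosen.append(m)
            for node in m.nodes:
                assignment[node] = m.target
    return chosen
```

A two-node mapping is thus kept in preference to a more frequent one-node mapping that would translate a shared node differently.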
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Training corpus
</SectionTitle>
      <Paragraph position="0"> We use a sentence-aligned Spanish-English training corpus consisting of 208,730 sentence pairs mostly from technical manuals. The data was already aligned at the sentence level since it was taken from sentence-level translation memories created by human translators using a commercial translation-memory product. This data was parsed and aligned at the sub-sentence level by our system, using the techniques described in this paper. Our parser produces a parse in every case, but in each language roughly 15% of the parses produced are &amp;quot;fitted&amp;quot; or non-spanning. Since we have a relatively large training corpus, we apply a conservative heuristic and only use in alignment those sentence pairs that produced spanning parses in both languages. In this corpus 161,606 pairs (or 77.4% of the corpus) were used. This is a substantially larger training corpus than those used in previous work on learning transfer mappings from parsed data. Table-1 presents some data on the mappings extracted from this corpus using Best-First.</Paragraph>
      <Paragraph position="1"> Table-1: Transfer mappings extracted using Best-First.
Total sentence pairs: 208,730
Sentence pairs used: 161,606
Number of transfer mappings: 1,202,828
Transfer mappings per pair: 7.48
Num. unique transfer mappings: 437,479
Num. unique after elim. conflicts: 369,067
Num. unique with frequency &gt; 1: 58,314</Paragraph>
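The conservative heuristic above (use only sentence pairs with spanning parses on both sides) amounts to a simple filter; the predicate name below is a hypothetical stand-in for the parser's &amp;quot;fitted&amp;quot; flag.

```python
# Illustrative sketch of the corpus-filtering heuristic: retain a
# sentence pair only if both the source and target sentences received
# spanning (non-fitted) parses. has_spanning_parse is a hypothetical
# predicate, not part of the authors' system.
def filter_training_pairs(pairs, has_spanning_parse):
    return [(src, tgt) for src, tgt in pairs
            if has_spanning_parse(src) and has_spanning_parse(tgt)]
```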
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.4 Experiments
</SectionTitle>
      <Paragraph position="0"> In each experiment we used 5 human evaluators in a blind evaluation to compare the translations produced by the test system with those produced by a comparison system. For each sentence, evaluators were presented with a reference human translation and with the two machine translations in random order, but not with the original source-language sentence. They were asked to pick the better overall translation, taking into account both content and fluency. They were allowed to choose &amp;quot;Neither&amp;quot; if they considered both translations equally good or equally bad.</Paragraph>
      <Paragraph position="1"> All the experiments were run with our Spanish-English system. The test sentences were randomly chosen from unseen data from the same domain. Experiment-1 used 200 sentences and each sentence was evaluated by all raters.</Paragraph>
      <Paragraph position="2"> Sentences were rated better for one system or the other if a majority of the raters agreed.</Paragraph>
      <Paragraph position="3"> Experiments 2-4 used 500 sentences each, but each sentence was rated by a single rater.</Paragraph>
      <Paragraph position="4"> In each experiment, the test system was the system described in section 5.2, loaded with transfer mappings acquired using the techniques described in this paper (hereafter &amp;quot;Best-First&amp;quot;).</Paragraph>
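The majority-vote aggregation used in Experiment-1 (a sentence counts for a system only if a majority of the 5 raters agree) could be tallied as in the following sketch; the function and label names are hypothetical.

```python
# Hedged sketch of the Experiment-1 rating scheme: each rater picks
# "test", "comparison", or "neither"; a sentence is credited to a
# system only when a strict majority of raters chose that system.
from collections import Counter


def majority_verdict(ratings, n_raters=5):
    counts = Counter(ratings)
    winner, votes = counts.most_common(1)[0]
    if winner != "neither" and votes > n_raters // 2:
        return winner
    return "neither"
```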
    </Section>
    <Section position="6" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.5 Comparison systems
</SectionTitle>
      <Paragraph position="0"> In the first experiment the comparison system is a highly rated commercial system, Babelfish (http://world.altavista.com).</Paragraph>
      <Paragraph position="1"> Each of the next three experiments varies some key aspect of Best-First in order to explore the properties of the algorithm.</Paragraph>
      <Paragraph position="2">  Experiment-2 compares Best-First to the previous algorithm we employed, which used a bottom-up approach, similar in spirit to that used by Meyers (1998a).</Paragraph>
      <Paragraph position="3"> This algorithm follows the procedure described in section 3.1 to establish tentative lexical correspondences. However, it does not use an alignment grammar, and relies on a bottom-up rather than a best-first strategy. It starts by aligning the leaf nodes and proceeds upwards, aligning nodes whose child nodes have already aligned. Nodes that do not align are skipped over, and later rolled-up with ancestor nodes that have successfully aligned.</Paragraph>
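The bottom-up comparison algorithm described above can be sketched as a post-order pass over the parse tree; the tree representation and the node_aligner predicate are illustrative placeholders, and the roll-up of skipped nodes is simplified away.

```python
# Rough sketch of the Bottom-Up baseline: align leaf nodes first, then
# move upward, aligning a node only once its children have been
# settled; nodes that fail to align are skipped (their later roll-up
# into aligned ancestors is omitted here). Names are hypothetical.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def bottom_up_align(node, node_aligner, aligned=None):
    """Post-order pass; returns the set of aligned node labels."""
    if aligned is None:
        aligned = set()
    for child in node.children:
        bottom_up_align(child, node_aligner, aligned)
    # A node may align only once all of its children have aligned.
    if all(c.label in aligned for c in node.children) and node_aligner(node):
        aligned.add(node.label)
    return aligned
```

In this simplified form, a leaf that fails to align blocks its parent, which is where the roll-up step of the real algorithm would take over.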
      <Paragraph position="4">  Experiment-3 uses a comparison algorithm that differs from Best-First in that it retains no context (see section 4.1) when emitting transfer mappings.</Paragraph>
      <Paragraph position="5">  The comparison algorithm used in Experiment-4 differs from Best-First in that the frequency threshold (see section 4.2.1) is not applied, i.e. all transfer mappings are retained.</Paragraph>
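The frequency threshold that Experiment-4's baseline omits corresponds to the &amp;quot;frequency &gt; 1&amp;quot; cutoff reflected in Table-1, and could be applied as in this minimal sketch (data shape and names are assumptions):

```python
# Minimal sketch of the frequency threshold from section 4.2.1: count
# how often each (hashable) transfer mapping was extracted and keep
# only those seen more than `threshold` times.
from collections import Counter


def apply_frequency_threshold(mappings, threshold=1):
    counts = Counter(mappings)
    return {m for m, c in counts.items() if c > threshold}
```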
    </Section>
    <Section position="7" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.6 Discussion
</SectionTitle>
      <Paragraph position="0"> The results of the four experiments are presented in Table-2.</Paragraph>
      <Paragraph position="1"> Experiment-1 establishes that the algorithm presented in this paper automatically acquires translation knowledge of sufficient quantity and quality to enable translations that exceed the quality of a highly rated traditional MT system. Note, however, that Babelfish/Systran was not customized to this domain.</Paragraph>
      <Paragraph position="2"> Experiment-2 shows that Best-First produces transfer mappings resulting in significantly better translations than Bottom-Up. Using Best-First produced better translations for a net of 22.6% of the sentences.</Paragraph>
      <Paragraph position="3"> Experiment-3 shows that retaining sufficient context in transfer mappings is crucial to translation quality, producing better translations for a net of 23.6% of the sentences.</Paragraph>
      <Paragraph position="4"> Experiment-4 shows that the frequency threshold hurts translation quality slightly (a net loss of 2%), but as Table-3 shows it results in a much smaller (approx. 6 times) and faster system.</Paragraph>
    </Section>
    <Section position="8" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.7 Transfer mapping coverage
</SectionTitle>
      <Paragraph position="0"> Using end-to-end translation quality as a metric for alignment leaves open the question of how much of the translation quality derives from alignment versus other sources of translation knowledge in our system, such as the bilingual dictionary or the 2 hand-coded transfer rules. To address this issue we measured the contribution of each using a 3264-sentence test set. Table-4 presents the results. The first column indicates the total number of words in each category. The next four columns indicate, respectively, the percentage translated using each knowledge source and the percentage not translated.</Paragraph>
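The per-category breakdown reported in Table-4 amounts to tallying, for each word, which knowledge source supplied its translation; a sketch under assumed data shapes (all names hypothetical):

```python
# Hedged sketch of the coverage measurement: each word is tagged with
# its category (content word, pronoun, preposition, ...) and the
# knowledge source that translated it (or None if untranslated);
# report per-category percentages by source.
from collections import Counter


def coverage_by_source(words):
    """words: iterable of (category, source) pairs.
    Returns {category: {source: percent}}."""
    by_cat = {}
    for category, source in words:
        by_cat.setdefault(category, Counter())[source] += 1
    report = {}
    for category, counts in by_cat.items():
        total = sum(counts.values())
        report[category] = {src: round(100.0 * n / total, 2)
                            for src, n in counts.items()}
    return report
```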
      <Paragraph position="1"> As the table shows, the vast majority of content words get translated using transfer mappings obtained via alignment.</Paragraph>
      <Paragraph position="2"> Our alignment algorithm does not explicitly attempt to learn transfer mappings for pronouns, but pronouns are sometimes included in transfer mappings when they form part of the context that is included with each mapping (see section 4.1). The 31.89% of pronoun translations that the table indicates as coming from alignment fall into this category.</Paragraph>
      <Paragraph position="3"> Our algorithm does try to learn transfer mappings for prepositions and conjunctions, which are represented in the Logical Form as labels on arcs (see Figure-1). Mappings for prepositions and conjunctions always include the nodes on both ends of this arc. These mappings may translate a preposition in the source language to a preposition in the target language, or to an entirely different relationship, such as direct object, indirect object, modifier etc.</Paragraph>
      <Paragraph position="4"> As the table shows, the system is currently less successful at learning transfer mappings for prepositions and conjunctions than it is for content words.</Paragraph>
      <Paragraph position="5"> As a temporary measure we have 2 hand-coded transfer rules that apply to prepositions, which account for 8.4% of such transfers. We intend for these to eventually be replaced by mappings learned from the data.</Paragraph>
    </Section>
  </Section>
</Paper>