<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1031">
  <Title>Word to word alignment strategies</Title>
  <Section position="5" start_page="65" end_page="65" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Several alignment search strategies have been discussed in the previous sections. Our clue aligner implements these strategies in order to test their impact on the alignment performance.</Paragraph>
    <Paragraph position="1"> In the experiments we used one of our English-Swedish bitexts from the PLUG corpus (Sågvall Hein, 2002), the novel &quot;To Jerusalem and back: A personal account&quot; by Saul Bellow. This corpus is fairly small (about 170,000 words) and therefore well suited for extensive studies of alignment parameters. For evaluation, we use a gold standard of 468 manually aligned links (Merkel et al., 2002). It includes 122 links with MWUs on the source side, the target side, or both (26% of the gold standard): 109 links contain source-language MWUs, 59 links contain target-language MWUs, and 46 links contain MWUs in both languages. Ten links are null links, i.e., links from a single word to the empty string. Three different clue types are used for the alignment: the Dice coefficient (dice), lexical translation probabilities derived from statistical translation models (giza) using the GIZA++ toolbox (Och and Ney, 2003), and, finally, POS/relative-word-position clues learned from previous alignments (pp). Alignment strategies are compared in three clue settings: dice+pp, giza, and giza+pp. Figure 3 shows the alignment results for these three settings under the different search strategies discussed earlier.</Paragraph>
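The dice clue scores a source/target word pair by how often the two co-occur in aligned sentence pairs relative to their individual frequencies, Dice(s, t) = 2 f(s, t) / (f(s) + f(t)). The following is a minimal Python sketch, assuming pre-tokenized sentence pairs and counting each word (and each word pair) at most once per sentence pair. The function `dice_scores` and the toy data are hypothetical, and the clue aligner's own estimation (for example of MWU candidates or score thresholds) may differ.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Dice coefficient 2*f(s,t) / (f(s) + f(t)) for every co-occurring word pair.

    Frequencies count sentence pairs, not tokens: each word (and each
    source/target pair) is counted at most once per aligned sentence pair.
    """
    src_freq = Counter()
    trg_freq = Counter()
    pair_freq = Counter()
    for src_tokens, trg_tokens in sentence_pairs:
        src_set = set(src_tokens)
        trg_set = set(trg_tokens)
        src_freq.update(src_set)
        trg_freq.update(trg_set)
        pair_freq.update(product(src_set, trg_set))
    return {
        (s, t): 2 * n / (src_freq[s] + trg_freq[t])
        for (s, t), n in pair_freq.items()
    }


# Toy English-Swedish sentence pairs (hypothetical data).
pairs = [
    (["the", "house"], ["huset"]),
    (["a", "house"], ["ett", "hus"]),
    (["the", "car"], ["bilen"]),
]
scores = dice_scores(pairs)
print(scores[("house", "huset")])  # f(house)=2, f(huset)=1, co-occurrence 1: 2*1/(2+1)
print(scores[("car", "bilen")])    # -> 1.0
```

Counting co-occurrences per sentence pair keeps every score in [0, 1], since f(s, t) can never exceed f(s) or f(t). As a consistency check on the gold-standard figures above: by inclusion-exclusion, 109 + 59 - 46 = 122 links contain an MWU on at least one side, and 122 / 468 is about 0.26, matching the reported 26%.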
    <Section position="1" start_page="65" end_page="65" type="sub_section">
      <SectionTitle>
5.1 Discussion
</SectionTitle>
      <Paragraph position="0"> Figure 3 illustrates the relation between precision and recall for the different algorithms. As expected, the intersection of directional alignments yields the highest precision at the expense of recall, which is generally lower than for the other approaches. In contrast, the union of directional links produces the highest recall but lower precision than all other search algorithms, because it includes too many (partially) incorrect MWUs. The intersection, on the other hand, includes only one-to-one word links, which tend to be correct. However, this strategy misses many links, as is evident from its low recall values. Directional alignment strategies generally yield lower F-values than the refined symmetric alignment strategies. Their implementation is straightforward, but the results are highly dependent on the language pair under consideration. The differences between the two alignment directions in our example are surprisingly inconsistent. With the giza clues, the two directional alignments are very close in terms of precision and recall, whereas the other two clue settings show a larger difference between the two directions. Competitive linking falls somewhere between the intersection approach and the two symmetric approaches, &quot;best-first&quot; and &quot;refined&quot;. This is also to be expected, since competitive linking allows only non-overlapping one-to-one word links. The refined bi-directional alignment approach and the constrained best-first approach are almost identical in our experiments, with a roughly balanced relation between precision and recall. One advantage of the best-first approach is that it can incorporate different constraints suited to the task at hand. [Table 1 (giza+pp): non-MWU links; MWU links (122 in total): English MWU, Swedish MWU, both]</Paragraph>
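The contrast between the intersection, the union, and competitive linking can be made concrete on a toy sentence pair. This minimal Python sketch assumes that each directional alignment is a set of (source position, target position) links and scores them with plain exact-match precision, recall and F (not the partiality measure used for the MWU analysis). Competitive linking is implemented as the usual greedy procedure: repeatedly accept the highest-scoring remaining pair whose source and target words are both still unlinked. The function names, the alignments and the scores are hypothetical.

```python
def precision_recall_f(links, gold):
    """Exact-match precision, recall and balanced F-score of a link set."""
    correct = len(links.intersection(gold))
    p = correct / len(links) if links else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def competitive_linking(scores):
    """Greedy one-to-one linking: best remaining pair first, no word reused."""
    used_src, used_trg, links = set(), set(), set()
    for (s, t), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s not in used_src and t not in used_trg:
            links.add((s, t))
            used_src.add(s)
            used_trg.add(t)
    return links


# Hypothetical gold links: source words 1 and 2 form an MWU aligned to target word 1.
gold = {(0, 0), (1, 1), (2, 1), (3, 2)}

# Hypothetical directional alignments as (source index, target index) links.
src_to_trg = {(0, 0), (1, 1), (2, 1), (3, 3)}  # finds the MWU, misaligns word 3
trg_to_src = {(0, 0), (1, 1), (3, 2)}

intersection = src_to_trg.intersection(trg_to_src)  # only links both directions agree on
union = src_to_trg.union(trg_to_src)                 # every link from either direction

print(precision_recall_f(intersection, gold))
print(precision_recall_f(union, gold))

scores = {(0, 0): 0.9, (1, 1): 0.8, (2, 1): 0.7, (3, 2): 0.6, (3, 3): 0.5}
linked = competitive_linking(scores)
print(sorted(linked), precision_recall_f(linked, gold))
```

On this toy data the intersection has precision 1.0 and recall 0.5, the union reaches recall 1.0 but drops to precision 0.8 because it keeps the wrong link (3, 3), and competitive linking recovers (3, 2) while still rejecting (2, 1), which would reuse target word 1.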
      <Paragraph position="1"> The adjacency check is only one of many possible constraints. For example, syntactic criteria could be applied to force linked items to be complete with respect to existing syntactic markup. Non-contiguous elements could also be identified with the same approach simply by removing the adjacency constraint. However, according to experiments not reported in this paper, this appears to increase noise significantly.</Paragraph>
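The adjacency check can be illustrated with a small best-first search. This is a minimal Python sketch under stated assumptions, not the clue aligner's implementation: scores are given per (source position, target position) pair, links are accepted in descending score order down to a threshold, and a new link that joins existing links (by sharing a source or target word) is accepted only if the resulting group still covers a contiguous span on both sides. Setting `adjacency=False` drops the check, so non-contiguous units are allowed. All names, the threshold and the scores are hypothetical.

```python
def contiguous(positions):
    """True if the set of word positions forms one unbroken span."""
    return max(positions) - min(positions) + 1 == len(positions)


def link_group(links, start):
    """All links connected to `start` through shared source or target words."""
    group, frontier = {start}, [start]
    while frontier:
        s, t = frontier.pop()
        for link in links:
            if link not in group and (link[0] == s or link[1] == t):
                group.add(link)
                frontier.append(link)
    return group


def best_first(scores, threshold, adjacency=True):
    """Hypothetical best-first link search with an optional adjacency check."""
    links = set()
    for link, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if threshold > score:
            break  # all remaining candidates score below the threshold
        candidate = links.union({link})
        group = link_group(candidate, link)
        src = {s for s, _ in group}
        trg = {t for _, t in group}
        # Accept the link unless it would create a non-contiguous unit.
        if not adjacency or (contiguous(src) and contiguous(trg)):
            links = candidate
    return links


# Hypothetical scores: source words 0-3, target words 0-1.
scores = {(0, 0): 0.9, (3, 1): 0.8, (2, 0): 0.5, (1, 0): 0.4, (2, 1): 0.2}
print(sorted(best_first(scores, threshold=0.3)))
print(sorted(best_first(scores, threshold=0.3, adjacency=False)))
```

With the check, the link (2, 0) is rejected because it would group source words 0 and 2 without word 1; without the check it is accepted as soon as it is reached. Note that (2, 0) is rejected even though (1, 0), considered later, would have made the group contiguous: in a greedy search the order in which candidates are considered determines which units survive.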
      <Paragraph position="2"> Optimizing alignment constraints for particular tasks requires further investigation in future work. Turning to MWUs, the numbers in Table 1 clearly show that all approaches have difficulty finding correct MWU links. Symmetric alignment strategies such as refined and best-first generally produce the best results for MWU links. However, even for these approaches, most such links are only partially correct. Under our partiality measure, the intersection of directional alignments still yields the highest precision when only MWU links are considered, even though these alignments contain no MWUs at all. Among MWU links, the best results are achieved for those with MWUs in both languages. However, these results are still significantly lower than those for single-word (non-MWU) links.</Paragraph>
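The observation that the intersection achieves the highest precision on MWU links, despite containing no MWUs, follows from giving partial credit. This section does not define its partiality measure, so the minimal Python sketch below assumes one plausible scheme for illustration: for a reference link, count the proposed source and target words that overlap the reference words, and divide by the number of proposed words (precision) or reference words (recall). The function name, the scoring scheme and the example are assumptions, not necessarily the measure used in the paper.

```python
def partial_scores(proposed_src, proposed_trg, gold_src, gold_trg):
    """Assumed partial-credit precision/recall for one reference link.

    Each argument is a set of word positions; the proposed sets collect the
    words of all proposed links that overlap the reference link.
    """
    overlap = len(proposed_src.intersection(gold_src)) + len(proposed_trg.intersection(gold_trg))
    proposed = len(proposed_src) + len(proposed_trg)
    gold = len(gold_src) + len(gold_trg)
    precision = overlap / proposed if proposed else 0.0
    recall = overlap / gold
    return precision, recall


# Hypothetical gold MWU link: "in spite of" (source 0-2) aligned to "trots" (target 0).
gold_src, gold_trg = {0, 1, 2}, {0}

# An intersection-style alignment proposes only the one-to-one link spite-trots.
print(partial_scores({1}, {0}, gold_src, gold_trg))  # -> (1.0, 0.5)

# A union-style alignment proposes an oversized MWU with one wrong word on each side.
print(partial_scores({0, 1, 2, 3}, {0, 1}, gold_src, gold_trg))
```

Under this assumed scheme, a one-to-one link inside a gold MWU link is fully precise but recovers only half of the reference words, whereas the oversized MWU recovers every reference word at the cost of precision.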
    </Section>
  </Section>
</Paper>