<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0306">
  <Title>Word Alignment Baselines</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
7 Results
</SectionTitle>
    <Paragraph position="0"> Table 1 shows results of the explored methods on the trial data, ordered by degree of supervision and by AER on the Romanian-English dataset. The biased-coin random aligner is indicated as random, and the final-punctuation aligner is fpunct. The classifier based on relative length is len. The three edit distance measures are exact match (exact), edit distance (wedit), and lower-case edit distance (lcedit). The geometric measures are word distance to the diagonal (wdiag), distance to the character diagonal (cdiag), and distance from the character box made by the word pair to the character diagonal (cbox).</Paragraph>
    <Paragraph position="1"> The aligners that take advantage of the training data are below the first horizontal line inside the table. freqratio is the classifier based on the relative frequency of the two tokens; P(L|R) aligns words in the LHS with words from the RHS that are often collocated in the training sentences, and P(R|L) does the reverse. The bag-of-documents distance classifier is reported as bos.</Paragraph>
    <Paragraph position="2"> The two supervised fusion methods are presented in the final two lines of the table: the binary nearest neighbor rule based on the classification output of the aligners (bnnrule), and the nearest neighbor rule based on the distances produced by the aligners (nnrule). Both of these results are leave-one-out estimates of performance on the trial set. Note that neither representation dominates the other: the binary representation was superior for English-French, and the distance representation was superior for Romanian-English.</Paragraph>
    <Paragraph position="3"> Table 2 shows results of the explored methods on the test data, presented in the same order as in Table 1. None of the results varied widely from observations on the trial dataset, suggesting that none of the classifiers was drastically overtrained in the course of optimization on the trial data.</Paragraph>
  </Section>
</Paper>