<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3228">
  <Title>Dependencies vs. Constituents for Tree-Based Alignment</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> The constituent-based version of the alignment model significantly outperforms the dependency-based model. The IBM models outperform the constituent tree-to-tree model to a lesser degree, with tree-to-tree achieving higher recall, and IBM higher precision. It is particularly significant that the tree-based model gets higher recall than the other models, since it is limited to one-to-one alignments unless the clone operation is used, bounding the recall it can achieve.</Paragraph>
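The recall ceiling mentioned above can be made concrete with a small sketch. It assumes alignments are stored as sets of (English index, Chinese index) links; the function names are hypothetical and are not taken from the paper's system. The bound ignores the clone operation and is deliberately loose: a one-to-one alignment cannot contain more links than the shorter sentence has words.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (eng_idx, chi_idx) links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted.intersection(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def one_to_one_recall_bound(gold, n_english, n_chinese):
    """Loose upper bound on recall when every word takes part in at most one link.

    Assumption for illustration: no clone operation, so the model emits
    at most min(n_english, n_chinese) links in total.
    """
    gold = set(gold)
    if not gold:
        return 0.0
    max_links = min(n_english, n_chinese, len(gold))
    return max_links / len(gold)


if __name__ == "__main__":
    # Toy gold alignment with one-to-many and many-to-one links.
    gold = {(0, 0), (0, 1), (1, 2), (2, 2)}
    predicted = {(0, 0), (1, 2), (2, 1)}
    print(precision_recall(predicted, gold))
    print(one_to_one_recall_bound(gold, n_english=3, n_chinese=3))
```

In the toy data the gold standard has four links over three-word sentences, so no one-to-one alignment can recover all of them, whatever the model does.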
    <Paragraph position="1"> To better understand the differences between the constituent and dependency representations of our data, we analyzed how well the two representations match our hand-annotated alignment data. We looked for consistently aligned pairs of constituents in the two parse trees. By consistently aligned, we mean that all words within the English constituent are aligned to words inside the Chinese constituent (if they are aligned to anything), and vice versa. In our example in Figure 1, the NP &quot;14 Chinese border cities&quot; and the Chinese subject NP &quot;Zhongguo shisi ge bianjing kaifang chengshi&quot; are consistently aligned, but the PP &quot;in economic construction&quot; has no consistently aligned constituent in the Chinese sentence. Of the 2623 constituents in our English parse trees (not counting unary constituents, which have the same boundaries as their children), 1044, or 40%, have some consistently aligned constituent in the Chinese parse tree. This confirms the results of Fox (2002) and Galley et al. (2004) that many translation operations must span more than one parse tree node. For each of our consistently aligned pairs, we then found the head word of both the Chinese and English constituents according to our head rules.</Paragraph>
    <Paragraph position="2"> The two head words correspond in the annotated alignments 67% of the time (700 out of 1044 consistently aligned constituent pairs). While the head-swapping operation of our translation model will be able to handle some cases of differing heads, it can only do so if the two heads are adjacent in both tree structures.</Paragraph>
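The consistency test and the head-word check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: constituents are represented as half-open (start, end) word spans, alignments as sets of (English index, Chinese index) links, and all function names are hypothetical. Note that, read literally, the definition also accepts a pair of constituents with no links at all between them.

```python
def is_consistent(eng_span, chi_span, links):
    """True if no alignment link crosses the boundary of the span pair.

    A link violates consistency when exactly one of its two endpoints
    falls inside the corresponding constituent.
    """
    eng = range(*eng_span)
    chi = range(*chi_span)
    for e, c in links:
        if (e in eng) != (c in chi):
            return False
    return True


def count_consistent(eng_spans, chi_spans, links):
    """Count English constituents that have some consistently aligned Chinese constituent."""
    return sum(
        any(is_consistent(e, c, links) for c in chi_spans)
        for e in eng_spans
    )


def heads_correspond(eng_head, chi_head, links):
    """True if the two head words are linked in the annotated alignment."""
    return (eng_head, chi_head) in links


if __name__ == "__main__":
    # Toy sentence pair: four words on each side, first two words swapped.
    links = {(0, 1), (1, 0), (2, 2), (3, 3)}
    eng_spans = [(0, 2), (2, 4), (1, 3)]
    chi_spans = [(0, 2), (2, 4)]
    print(count_consistent(eng_spans, chi_spans, links))
    print(heads_correspond(1, 0, links), heads_correspond(1, 1, links))
```

Applied to the counts reported in the text, the same ratios are 1044 / 2623, or about 40%, for consistently aligned constituents and 700 / 1044, or about 67%, for matching head words.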
    <Paragraph position="3"> Our system is trained and tested on automatically generated parse trees, which may contribute to the mismatches in the tree structures. As our test data was taken from the Chinese Treebank, hand-annotated parse trees were available for the Chinese, but not the English, sentences. Running the analysis on hand-annotated Chinese trees found slightly better English/Chinese agreement overall, but there were still disagreements in the head word choices for a third of all consistently aligned constituent pairs. Running our alignment system on gold standard trees did not improve results. The comparison between parser output and gold standard trees is summarized in Table 3.</Paragraph>
    <Paragraph position="4"> We used head rules developed for statistical parsers in both languages, but other rules may be better suited to the alignment task. For example, the tensed auxiliary verb is considered the head of English progressive and perfect verb phrases, rather than the present or past participle of the main verb.</Paragraph>
    <Paragraph position="5"> Such auxiliaries carry agreement information relevant to parsing, but generally have no counterpart in Chinese. A semantically oriented dependency structure, such as Tree Adjoining Grammar derivation trees, may be more appropriate for alignment.</Paragraph>
  </Section>
</Paper>