<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3014">
  <Title>Improving Bitext Word Alignments via Syntax-based Reordering of English</Title>
  <Section position="7" start_page="35" end_page="35" type="concl">
    <SectionTitle>
6 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have developed a system to improve the performance of bitext word alignment between English and a source language by first reordering parsed English into an order more closely resembling that of the source language, based only on knowledge of the coarse basic word order of the source language, such as can be obtained from any cross-linguistic survey of languages, and requiring no parsing of the source language. We applied the system to the task of aligning English with Hindi, Korean, Chinese and Romanian. Performance improvement is greatest for Hindi and Korean, which exhibit longer-distance constituent reordering with respect to English. These properties suggest the proposed English-prime word alignment method can be an effective approach for word alignment to languages with both greater cross-linguistic word-order divergence and an absence of available parsers.</Paragraph>
    <Paragraph position="1"> 1 Hindi training: news text from the LDC for the 2003 DARPA TIDES Surprise Language exercise; Hindi testing: news text from Rebecca Hwa, then at the University of Maryland; Hindi dictionary: The Hindi-English Dictionary, v. 2.0 from IIIT (Hyderabad) LTRC; Korean training: Unbound Bible; Korean testing: half from Penn Korean Treebank and half from Universal Declaration of Human Rights, aligned by Woosung Kim at the Johns Hopkins University; Korean dictionary: EngDic v. 4; Chinese training: news text from FBIS; Chinese testing: Penn Chinese Treebank news text aligned by Rebecca Hwa, then at the University of Maryland; Chinese dictionary: from the LDC; Romanian training and testing: (Mihalcea and Pedersen, 2003).</Paragraph>
  </Section>
</Paper>