<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3005">
  <Title>Customizing Parallel Corpora at the Document Level</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0">We have examined the issue of selecting appropriate training resources for cross-lingual information retrieval (CLIR). We have proposed and evaluated a simple method for creating a customized parallel corpus from other available parallel corpora by matching the domain of the test documents with that of individual parallel documents. We found that choosing the largest collection, using all available resources without weighting, and even choosing a large collection in the medical domain are all sub-optimal strategies. The techniques we have presented here are not restricted to CLIR and can be applied to other areas where parallel corpora are necessary, such as statistical machine translation. The trained translation matrix can also be reused and can be converted to any of the formats required by such applications.</Paragraph>
  </Section>
</Paper>