<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3114">
  <Title>Manual and Automatic Evaluation of Machine Translation between European Languages</Title>
  <Section position="7" start_page="109" end_page="109" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> We carried out an extensive manual and automatic evaluation of machine translation performance on European language pairs. While many systems had similar performance, the results offer interesting insights, especially about the relative performance of statistical and rule-based systems.</Paragraph>
    <Paragraph position="1"> Due to many similarly performing systems, we are not able to draw strong conclusions on the question of correlation of manual and automatic evaluation metrics. The bias of automatic methods in favor of statistical systems seems to be less pronounced on out-of-domain test data.</Paragraph>
    <Paragraph position="2"> The manual evaluation of scoring translation on a graded scale from 1-5 seems to be very hard to perform. Replacing this with an ranked evaluation seems to be more suitable. Human judges also pointed out difficulties with the evaluation of long sentences.</Paragraph>
  </Section>
class="xml-element"></Paper>