<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0825">
  <Title>A Generalized Alignment-Free Phrase Extraction</Title>
  <Section position="6" start_page="143" end_page="143" type="evalu">
    <SectionTitle>
7 Experimental Results
</SectionTitle>
    <Paragraph position="0"> Our system is based on the IBM Model-4 parameters. We train IBM Model 4 with a scheme of 1720h73043 using GIZA++ (Och and Ney, 2003).</Paragraph>
    <Paragraph position="1"> The maximum fertility for an English word is 3. All the data is used as given, i.e. we do not have any preprocessing of the English-French data. The word alignment provided in the workshop is not used in our evaluations. The language model is provided by the workshop, and we do not use other language models.</Paragraph>
    <Paragraph position="2"> The French phrases up to 8-gram in the development and test sets are extracted with top-3 candidate English phrases. There are in total 2.6 million phrase pairs 1 extracted for both development set and the unseen test set. We did minimal tuning of the parameters in the pharaoh decoder (Koehn, 2004) settings, simply to balance the length penalty for Bleu score. Most of the weights are left as they are given: [ttable-limit]=20, [ttable-threshold]=0.01, 1Our phrase table is to be released to public in this workshop [stack]=100, [beam-threshold]=0.01, [distortionlimit]=4, [weight-d]=0.5, [weight-l]=1.0, [weightw]=-0.5. Table 1 shows the algorithm's performance on several settings for the seven basic scores provided in section 6.</Paragraph>
    <Paragraph position="3">  In Table 1, setting s1 was our submission without using the inverse relative frequency of Prf(ei+ki |fj+lj ). s2 is using all the seven scores.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML