<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0814"> <Title>ISI's Participation in the Romanian-English Alignment Task</Title> <Section position="5" start_page="91" end_page="91" type="metho"> <SectionTitle> 3 Vocabulary Size Reduction </SectionTitle> <Paragraph position="0"> Romanian is a Romance language with a system of inflectional suffixes richer than that of English. Given the small amount of training data, we decided that vocabulary size reduction was desirable. As a baseline for vocabulary reduction, we lowercased both corpora and then reduced words to prefixes of varying sizes for both English and Romanian. We also tried Porter stemming (Porter, 1997) for English.</Paragraph> <Paragraph position="1"> (Rogati et al., 2003) extended Model 1 with an additional hidden variable representing the split points in Arabic between the prefix, the stem, and the suffix, generating a stemming for use in cross-lingual information retrieval. As in (Rogati et al., 2003), we can find the most probable stemming given the model, apply this stemming, and retrain our word alignment system. However, we can also use the modified model directly to find the best word alignment without converting the text to its stemmed form.</Paragraph> <Paragraph position="2"> We introduce a variable $r_j$ for the Romanian stem and a variable $s_j$ for the Romanian suffix (which, concatenated, give the Romanian word $f_j$) into the formula for the probability of generating a Romanian word $f_j$ using an alignment $a_j$ given only an English sentence $e$. We use the index $z$ to denote a particular stemming possibility. For a given Romanian word, the stemming possibilities are simply every possible split point where the stem is at least one character long (this includes the null suffix):</Paragraph> <Paragraph position="3"> $$p(f_j, a_j \mid e) = \sum_{z} p(r_{j,z}, s_{j,z}, a_j \mid e) \qquad (1)$$ </Paragraph> <Paragraph position="4"> If the assumption is made that the stem and the suffix are generated independently from $e$, we have conditional independence and the sum decomposes as</Paragraph> <Paragraph position="5"> $$p(f_j, a_j \mid e) = \sum_{z} p(r_{j,z} \mid a_j, e) \, p(s_{j,z}, a_j \mid e) \qquad (2)$$ </Paragraph> <Paragraph position="6"> We performed two sets of experiments: one in which the English was stemmed using the Porter stemmer, and one in which each English word was stemmed down to its first four characters. We tried the best-performing scoring heuristic for Arabic from (Rogati et al., 2003), in which $p(s_{j,z}, a_j \mid e)$ is modeled using the heuristic $p(s_{j,z} \mid l_j)$, where $s_{j,z}$ is the Romanian suffix and $l_j$ is the last letter of the Romanian word $f_j$; these estimates are updated during EM training. We also tried several other approximations of $p(s_{j,z}, a_j \mid e)$, with and without updates during EM training. We were unable to produce better results and elected to use the baseline vocabulary reduction technique for the shared task.</Paragraph>
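<Paragraph position="7"> To make the stemming model concrete, the sketch below enumerates the stemming possibilities of a Romanian word and evaluates equation (2) for a single alignment link. It is a minimal illustration rather than our implementation: the names stem_prob and suffix_prob are hypothetical stand-ins for the translation tables that EM training would estimate, and the suffix term is approximated by conditioning only on the aligned English word.

```python
def stemmings(word):
    """All (stem, suffix) split points: the stem keeps at least one
    character, and the empty string is the null suffix."""
    return [(word[:i], word[i:]) for i in range(1, len(word) + 1)]

def word_prob(f_j, e_aj, stem_prob, suffix_prob, floor=1e-10):
    """p(f_j, a_j | e) for one link: sum over split points z, assuming the
    stem and the suffix are generated independently, as in equation (2)."""
    return sum(stem_prob.get((stem, e_aj), floor)
               * suffix_prob.get((suffix, e_aj), floor)
               for stem, suffix in stemmings(f_j))

# Toy example: "casele" ("the houses") aligned to English "houses";
# the probabilities below are invented purely for illustration.
stem_prob = {("case", "houses"): 0.4, ("casele", "houses"): 0.1}
suffix_prob = {("le", "houses"): 0.5, ("", "houses"): 0.2}
print(word_prob("casele", "houses", stem_prob, suffix_prob))  # ~0.22
```
</Paragraph>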
</Section> <Section position="6" start_page="91" end_page="92" type="metho"> <SectionTitle> 4 New Model and Training Algorithm </SectionTitle> <Paragraph position="0"> Our motivation for a new model and a new training approach which combines likelihood maximization with error rate minimization is threefold; chief among our reasons is that we wish to learn from gold standard alignments, but we have very few labels. We create a new model and train it using an algorithm which alternates a step that increases likelihood (like one iteration of the EM algorithm) with a step that decreases error. We accomplish this by:
* grouping the parameters of Model 4 into 5 submodels
* implementing 6 new submodels
* combining these into a single log-linear model with 11 weights, $\lambda_1$ to $\lambda_{11}$, which we group into the vector $\lambda$
* defining a search algorithm for finding the alignment of highest probability given the submodels and $\lambda$
* devising a method for finding a $\lambda$ which minimizes alignment error given fixed submodels and a set of gold standard alignments
* inventing a training method which alternates steps that estimate the submodels by increasing likelihood with steps that set $\lambda$ to decrease alignment error (see the sketch after this section's training procedure)</Paragraph> <Paragraph position="1"> The submodels in our new alignment model are listed in Table 1, where for ease of exposition we consider English to be the source language and Romanian the target language.

Table 1 (submodels 6-11; submodels 1-5 are the groups of Model 4 parameters):
6  TTABLE ESTIMATED FROM INTERSECTION OF TWO STARTING ALIGNMENTS FOR THIS ITERATION
7  TRANSLATION TABLE FROM ENGLISH TO ROMANIAN, MODEL 1 ITERATION 5
8  TRANSLATION TABLE FROM ROMANIAN TO ENGLISH, MODEL 1 ITERATION 5
9  BACKOFF FERTILITY (FERTILITY ESTIMATED OVER ALL ENGLISH WORDS)
10 ZERO FERTILITY ENGLISH WORD PENALTY
11 NON-ZERO FERTILITY ENGLISH WORD PENALTY</Paragraph> <Paragraph position="2"> The log-linear alignment model is specified by equation 3. Like Model 4, the model assigns non-zero probability only to 1-to-many alignments. (Cettolo and Federico, 2004) used a log-linear model trained using error minimization for the translation task; 3 of their submodels were taken from Model 4 in a similar way to our first 5 submodels.</Paragraph> <Paragraph position="3"> $$p_{\lambda}(a, f \mid e) = \frac{\exp\left(\sum_{m} \lambda_m h_m(a, f, e)\right)}{\sum_{a', f'} \exp\left(\sum_{m} \lambda_m h_m(a', f', e)\right)} \qquad (3)$$ </Paragraph> <Paragraph position="4"> Given $\lambda$, the alignment search problem is to find the alignment $a$ of highest probability according to equation 3. We solve this using the local search defined in (Brown et al., 1993).</Paragraph> <Paragraph position="5"> We set $\lambda$ as follows. Given a sequence $A$ of alignments, we can calculate an error function $E(A)$; for these experiments, average sentence AER was used. We wish to minimize this error function, so we select $\lambda$ accordingly:</Paragraph> <Paragraph position="6"> $$\hat{\lambda} = \operatorname{argmin}_{\lambda} \, E\big(\operatorname{argmax}_{a} \, p_{\lambda}(a, f \mid e)\big) \qquad (4)$$
where the inner argmax is computed for each sentence pair, yielding the alignment sequence $A$ scored by $E$.</Paragraph> <Paragraph position="7"> Maximizing performance for all of the weights at once is not computationally tractable, but (Och, 2003) describes an efficient one-dimensional search for a similar problem. We search over each $\lambda_m$ (holding the others constant) using this technique to find the best $\lambda_m$ to update and the best value to update it to. We repeat the process until no further gain can be found.</Paragraph> <Paragraph position="8"> Our new training method is:

REPEAT
  estimate the submodels so as to increase likelihood, as in one EM iteration
  set $\lambda$ to minimize alignment error on the discriminative training set (new D-step)</Paragraph> <Paragraph position="9"> We use the first 148 sentences of the 2003 test set as the discriminative training set. 10 settings for $\lambda$ are found, the hypothesis list is augmented using the results of 10 searches using these settings, and then another 10 settings for $\lambda$ are found. We then select the best $\lambda$. The discriminative training regimen is otherwise similar to (Och, 2003).</Paragraph>
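<Paragraph position="10"> The D-step weight search can be pictured with the following sketch. It is a simplified stand-in, not our implementation: the efficient one-dimensional search of (Och, 2003) is replaced by a coarse grid of candidate values per weight, and hypotheses (per-sentence candidate alignments paired with their submodel log-scores) and error_fn (average sentence AER of the chosen alignments against the gold standard) are assumed inputs.

```python
import itertools

def viterbi_choice(hypotheses, lam):
    """For each sentence, pick the candidate alignment maximizing
    sum_m lam_m * h_m; each candidate is a pair (alignment, features)."""
    return [max(cands, key=lambda c: sum(l * h for l, h in zip(lam, c[1])))[0]
            for cands in hypotheses]

def minimize_error(hypotheses, error_fn, lam, grid=(0.0, 0.5, 1.0, 2.0)):
    """Search each weight with the others held constant, apply the single
    best update, and repeat until no update reduces the error."""
    best = error_fn(viterbi_choice(hypotheses, lam))
    while True:
        trials = []
        for m, v in itertools.product(range(len(lam)), grid):
            trial = lam[:m] + [v] + lam[m + 1:]
            trials.append((error_fn(viterbi_choice(hypotheses, trial)), trial))
        err, trial = min(trials, key=lambda t: t[0])
        if err >= best:          # no single-weight update helps any more
            return lam, best
        best, lam = err, trial

# Toy usage: two sentences, two candidate alignments each, gold = A1, B1.
hyps = [[("A1", (1.0, 0.0)), ("A2", (0.0, 1.0))],
        [("B1", (0.5, 0.2)), ("B2", (0.1, 0.9))]]
gold = ["A1", "B1"]
err = lambda chosen: sum(c != g for c, g in zip(chosen, gold)) / len(gold)
print(minimize_error(hyps, err, [1.0, 1.0]))  # finds weights with error 0.0
```
</Paragraph>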
</Section> <Section position="9" start_page="92" end_page="93" type="metho"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> Table 2 provides a comparison of our baseline systems, using the &quot;refined&quot; symmetrization heuristic, with the best limited resources track system from WPT03 (Dejean et al., 2003) on the 2003 test set. The best results are obtained by stemming both English and Romanian words down to the first four letters, as described in Section 3.</Paragraph> <Paragraph position="1"> Table 3 provides details on our shared task submission. RUN1 is the word-based baseline system.</Paragraph> <Paragraph position="2"> RUN2 is the stem-based baseline system. RUN4 uses only the first 6 submodels, while RUN5 uses all 11 submodels. RUN3 had errors in processing, so we omit it.</Paragraph> <Paragraph position="3"> Results:
* Our new 1-to-many alignment model and training method are successful, producing decreases of 0.03 AER when the source is Romanian and 0.01 AER when the source is English.
* This did not, however, carry over into a comparable improvement in the end-to-end task of producing many-to-many alignments with balanced precision and recall: we had only a very small decrease of 0.002 AER using the &quot;refined&quot; heuristic.
* The many-to-many alignments produced using &quot;union&quot; and the 1-to-1 alignments produced using &quot;intersection&quot; were also improved.
* It may be a problem that we trained $p_0$ using likelihood (it is in submodel 3) rather than optimizing $p_0$ discriminatively as we did for the baseline.</Paragraph>
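<Paragraph position="4"> The AER figures above follow the standard alignment error rate; the quantity $E(A)$ of Section 4 is this value averaged over the sentences of the discriminative training set. For reference, a minimal sketch, assuming alignments are given as sets of (i, j) link pairs with the sure links contained in the possible links:

```python
def aer(hypothesis, sure, possible):
    """Alignment Error Rate for one sentence:
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|),
    where S (sure links) is a subset of P (possible links).
    Assumes at least one hypothesized or sure link."""
    a, s = set(hypothesis), set(sure)
    p = set(possible) | s  # sure links also count as possible
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Toy example for a 2-word sentence pair:
print(aer(hypothesis={(0, 0), (1, 1)}, sure={(0, 0)}, possible={(1, 1), (1, 0)}))
# 1 - (1 + 2) / (2 + 1) = 0.0
```
</Paragraph> </Section> </Paper>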