<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1414">
  <Title>Adding Domain Specificity to an MT system</Title>
  <Section position="3" start_page="0" end_page="1" type="metho">
    <SectionTitle>
2 Domain Specificity
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Word-Association list
</SectionTitle>
      <Paragraph position="0"> Moore (2001) describes a method for learning translation relationships between words from bilingual corpora. The five-step process is restated here: 1. Extract word lemmas from the Logical Form created by parsing the raw training data.</Paragraph>
      <Paragraph position="1"> 2. Compute association scores for individual lemmas.</Paragraph>
      <Paragraph position="2"> 3. Hypothesize occurrences of compounds in the training data, replacing lemmas constituting hypothesized occurrences of a compound with a single token representing the compound.</Paragraph>
      <Paragraph position="3"> 4. Recompute association scores for compounds and remaining individual lemmas.</Paragraph>
      <Paragraph position="4"> 5. Recompute association scores, taking into account only co-occurrences such that there is no equally strong or stronger association for either item in the aligned logical-form pair.</Paragraph>
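Step 5 is the least obvious of the five steps. The following Python sketch shows one reading of its filtering rule, assuming the scores from the previous pass are already available as a dictionary keyed by (French lemma, English lemma). The association measure itself is not specified in this section and is left out; the function name `competitive_cooccurrences` and the toy lemmas and scores are illustrative only.

```python
from collections import Counter
from itertools import product

# Hypothetical score table from the previous pass: (french_lemma, english_lemma) -> score.
Scores = dict[tuple[str, str], float]


def competitive_cooccurrences(pairs: list[tuple[list[str], list[str]]], score: Scores) -> Counter:
    """Count co-occurrences that survive the step-5 filter.

    In each aligned pair, (f, e) is kept only if no other English item in the pair
    is as strongly or more strongly associated with f, and no other French item is
    as strongly or more strongly associated with e.
    """
    kept: Counter = Counter()
    for french, english in pairs:
        fs, es = set(french), set(english)
        for f, e in product(fs, es):
            if (f, e) not in score:  # assumption: unscored pairs are not candidates
                continue
            s = score[(f, e)]
            rival_e = any(score.get((f, e2), float("-inf")) >= s for e2 in es if e2 != e)
            rival_f = any(score.get((f2, e), float("-inf")) >= s for f2 in fs if f2 != f)
            if not (rival_e or rival_f):
                kept[(f, e)] += 1
    return kept


pairs = [(["fichier", "ouvrir"], ["file", "open"])]
score = {("fichier", "file"): 10.0, ("fichier", "open"): 3.0,
         ("ouvrir", "open"): 8.0, ("ouvrir", "file"): 2.0}
print(competitive_cooccurrences(pairs, score))
```

Under this reading, the recomputed scores of step 5 would be derived from these filtered counts rather than from the raw co-occurrence counts.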
      <Paragraph position="5"> The word-association list (WA) was created by applying this method to our training data set of 200,000 aligned French-English sentences from computer manuals and help files. A French linguist determined the best cutoff for the raw data, i.e. the association score below which pairs were discarded, and otherwise left the file unedited for inclusion in the transfer training stage. For internal reasons, we used only associations that are conceptually single word to single word, where a single word is defined as an item returned as one unit by the analyzer, even though it might be a multi-word item in the source text, e.g. base_de_donnee &lt;-&gt; database. The file included 30,000 pairs, which in their totality were judged to be 60%</Paragraph>
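The two filters described above, the linguist's score cutoff and the single-unit restriction, can be sketched as follows. This is a minimal illustration, assuming each candidate arrives as a tuple of analyzer units per side plus a score; the candidate format, the helper name `build_wa_list`, the inclusive comparison with the cutoff, and the toy scores are all assumptions.

```python
# Hypothetical candidate format: (french_units, english_units, score). Each side is the
# tuple of units returned by the analyzer, so a compound like base_de_donnee is one unit.
Candidate = tuple[tuple[str, ...], tuple[str, ...], float]


def build_wa_list(candidates: list[Candidate], cutoff: float) -> list[tuple[str, str]]:
    """Keep single-unit to single-unit associations whose score reaches the cutoff."""
    wa = []
    for fr, en, score in candidates:
        if score >= cutoff and len(fr) == 1 and len(en) == 1:
            wa.append((fr[0], en[0]))
    return wa


candidates = [
    (("base_de_donnee",), ("database",), 12.0),
    (("journal",), ("log",), 4.0),                   # below the illustrative cutoff
    (("voir", "aussi"), ("related", "topics"), 9.0),  # multi-unit, excluded
]
print(build_wa_list(candidates, cutoff=5.0))  # [('base_de_donnee', 'database')]
```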
      <Paragraph position="7"> The word association file was used only in training (see Figure 2) to enhance the opportunity for alignment during the detection of transfer patterns.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Title Association list
</SectionTitle>
      <Paragraph position="0"> The second file used was a specialized file created using the same algorithm, but allowing multi-word English titles in which every word is capitalized to associate with multiple words in French that have mixed capitalization on major content words. Because these phrases are identified by their capitalization, they are also referred to as Captoids (Moore, 2001). Items such as Organizational Units, which occur with complete capitalization in English, are associated with the French translation, Unités d'organisation, a unit which is less easily identified on its own, due to the mixed case.</Paragraph>
      <Paragraph position="1"> Note: The WA file size of 42,486 reported in Moore (2001) includes multi-word associations, which were not used in this experiment.</Paragraph>
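On the English side, Captoid candidates can be spotted from capitalization alone. The sketch below finds maximal runs of two or more capitalized tokens. It is a simplification under stated assumptions: the French side, with its mixed case, and the pairing step, which uses the same association algorithm, are not shown, and a capitalized sentence-initial word would wrongly be attached to a following title. The function name is hypothetical.

```python
def english_captoids(tokens: list[str], min_len: int = 2) -> list[str]:
    """Return maximal runs of at least min_len consecutive capitalized tokens."""
    runs: list[str] = []
    current: list[str] = []
    for tok in tokens + [""]:  # the empty sentinel flushes the final run
        if tok[:1].isupper():
            current.append(tok)
        else:
            if len(current) >= min_len:
                runs.append(" ".join(current))
            current = []
    return runs


sentence = "For more information about using Device Manager , see Related Topics ."
print(english_captoids(sentence.split()))  # ['Device Manager', 'Related Topics']
```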
      <Paragraph position="2"> The information yields approximately 2,600 pairs of this type, for example:
Unités d'organisation &lt;-&gt; Organizational Units
Voir aussi &lt;-&gt; Related Topics</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <Paragraph position="0"> This title association file (TA) is used in training the transfer patterns, but its pairs are also added to the processing of the French training text, where they are treated as multi-word lexical entries like any other French dictionary entry. They also become part of the translation dictionary. Including Voir aussi as a lexical noun phrase at the analysis stage (French) allows it to parse correctly and permits the correct translation. Many of the occurrences of Title association pairs are menu names that are syntactically verb phrases (Voir aussi) and would have parsed less well without the TA file.</Paragraph>
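Treating the French side of a title pair as a single lexical entry means that, before parsing, a sequence such as Voir aussi is matched as one unit. A minimal greedy longest-match sketch follows; the analyzer's actual lexical lookup is not described here, and the function name and the underscore-joined token (borrowed from the base_de_donnee notation) are assumptions.

```python
def merge_multiword(tokens: list[str], entries: set[tuple[str, ...]]) -> list[str]:
    """Greedy longest match: replace each multi-word dictionary entry with one token."""
    max_len = max((len(e) for e in entries), default=1)
    out: list[str] = []
    i = 0
    while len(tokens) > i:
        # Try the longest possible entry starting at position i, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in entries:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no multi-word entry starts here; keep the single token
            out.append(tokens[i])
            i += 1
    return out


entries = {("Voir", "aussi")}
tokens = "consultez Voir aussi .".split()
print(merge_multiword(tokens, entries))  # ['consultez', 'Voir_aussi', '.']
```

Longest match is one common policy when entries overlap; other tie-breaking rules are possible.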
      <Paragraph position="2"> (1) Source: Pour plus d'informations sur l'utilisation du Gestionnaire de périphériques, consultez Voir aussi.</Paragraph>
      <Paragraph position="3"> Reference: For more information about using Device Manager, see Related Topics.</Paragraph>
      <Paragraph position="4"> ALL translation: For more information about using of the manager of devices, see Related Topics.</Paragraph>
      <Paragraph position="5"> NONE translation: See for more information on using of the Device Manager; also See.</Paragraph>
      <Paragraph position="6"> However, the evaluation shows that the overall effect of title associations is much smaller than that of word associations, presumably because these items occur infrequently in the test set as a whole.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1" end_page="2" type="metho">
    <SectionTitle>
3 Experiment and Methodology
</SectionTitle>
    <Paragraph position="0"> In order to evaluate the relative quality of the translations with and without the word association and title association strategies, we performed several evaluations of machine translation quality. These evaluations were performed by an independent organization that provides support for NL application development; the evaluators are completely independent of development activities.</Paragraph>
    <Paragraph position="1"> We performed two separate sets of evaluations. In the first, we evaluated the full version of our system with the Word Association and Title Association components against versions of the system from which we had removed those components. We thus expected that versions of the system with the WA and TA components would outperform those without.</Paragraph>
    <Paragraph position="2"> In the second evaluation, we tested the versions of our system with and without the WA and TA components against a benchmark system (the latest release of the French-English Systran system, run with settings appropriate for the computer domain) to see whether adding these components in combination would significantly improve our scores relative to that benchmark.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 Evaluation design
</SectionTitle>
      <Paragraph position="0"> For each condition to be tested, seven evaluators were asked to evaluate the same set of 250 blind test sentences. For each sentence, raters were presented with a reference sentence, the original English translation from which the human French translation was derived. In order to maintain consistency among raters who may have different levels of fluency in the source language, raters were not shown the original French sentence (for similar methodologies, see Ringger et al., 2001; White et al., 1993). Raters were also shown two machine translations, one from the system with the component being tested (System 1), and one from the comparison system (System 2). Because the order of the two machine translation sentences was randomized on each sentence, evaluators could not determine which sentence was from System 1.</Paragraph>
      <Paragraph position="1"> The order of presentation of sentences was also randomized for each rater in order to eliminate any ordering effect.</Paragraph>
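The two randomizations, sentence order per rater and the position of the two system outputs, can be sketched as a presentation plan. This is a hypothetical illustration: the function, the fixed seed, and the choice to redraw the output order for every rater are assumptions; the section does not say how the randomization was implemented.

```python
import random


def presentation_plan(n_sentences: int, raters: list[str], seed: int = 0) -> dict[str, list[tuple[int, bool]]]:
    """Build a shuffled sentence order and a per-sentence output order for each rater.

    Each entry is (sentence_index, system1_first). The flag is the key needed later
    to map a rater's choice back to System 1 or System 2.
    """
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    plan: dict[str, list[tuple[int, bool]]] = {}
    for rater in raters:
        order = list(range(n_sentences))
        rng.shuffle(order)
        plan[rater] = [(idx, rng.random() >= 0.5) for idx in order]
    return plan


plan = presentation_plan(250, [f"rater{k}" for k in range(1, 8)])
print(plan["rater1"][:3])
```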
      <Paragraph position="2"> The raters were asked to make a three-way choice. For each sentence, the raters were to determine which of the two automatically translated sentences was the better translation of the (unseen) source sentence, assuming that the reference sentence was a perfect translation, with the option of choosing "neither" if the differences were negligible. Raters were instructed to use their best judgment about the relative importance of fluency/style and accuracy/content preservation. We chose to use this simple three-way scale in order to avoid making any a priori judgments about the relative importance of these parameters for subjective judgments of quality. The three-way scale also allows sentences to be rated on the same scale, regardless of whether the differences between output from System 1 and System 2 were substantial or relatively small, and regardless of whether either version of the system produced an adequate translation.</Paragraph>
      <Paragraph position="3"> The scoring system is similarly simple; each judgment by a rater was represented as 1 (sentence from System 1 judged better), 0 (neither sentence judged better), or -1 (System 2 judged better). The score for each condition is the mean of the scores of all sentences for all raters.</Paragraph>
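The scoring rule amounts to a single mean over every (rater, sentence) judgment. A minimal sketch, assuming the judgments are stored as one list of 1, 0, or -1 values per rater; the data layout and function name are illustrative, and the toy judgments are not from the evaluation.

```python
from statistics import mean


def condition_score(judgments: list[list[int]]) -> float:
    """Mean of all judgments; judgments[r][s] is 1 (System 1 better), 0, or -1 (System 2 better)."""
    flat = [j for per_rater in judgments for j in per_rater]
    if any(j not in (-1, 0, 1) for j in flat):
        raise ValueError("each judgment must be 1, 0, or -1")
    return float(mean(flat))


toy = [[1, 0, -1, 1], [1, 1, 0, 0]]  # two raters, four sentences
print(condition_score(toy))  # 0.375
```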
    </Section>
    <Section position="2" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
4 Results
4.1 Results with multiple versions of our system
</SectionTitle>
      <Paragraph position="0"> In order to isolate the effects of the WA and TA components on the system as a whole, we built three new versions of the system:
* NONE: includes neither TA nor WA.
* NoTA: includes WA but not TA.</Paragraph>
      <Paragraph position="1"> * NoWA: includes TA but not WA.</Paragraph>
      <Paragraph position="2"> We evaluated each of these versions of the system against our baseline system (ALL), which contains both the WA and TA components. Our hypothesis was that removing either component would cause the experimental system to significantly underperform the ALL system.</Paragraph>
      <Paragraph position="3"> In each condition, we evaluated 250 sentences for which the output strings of System 1 (ALL) and System 2 (NONE, NoWA, or NoTA, respectively) were not identical. In other words, this analysis measures the improvement between the systems only on those sentences that change at all in each condition. For each condition, we calculated the statistical significance of the hypothesis that the ALL system is better than the comparison system (i.e. that the score is greater than 0), taking into account both variation in the sentence sample and variation across the judgments of individual raters.</Paragraph>
      <Paragraph position="4"> The data used for testing is blind, i.e. withheld from development and not included in the training set. The results show that, for sentences affected by the combination of the WA and TA components, the ALL condition is significantly better than the NONE condition at the 95% confidence level (significance level 0.05). In addition, for sentences affected by the presence of the WA component only, the ALL condition is significantly better than the NoWA condition. However, the ALL condition is not significantly better than the NoTA condition. Another question of interest is the effect of the experimental components on the corpus as a whole, rather than just on the sentences that changed; the effects we found might have been diluted below the significance threshold because differences are sparse across the whole corpus. Rather than run additional evaluations, we determined the proportion of sentences that differed in each condition and extrapolated a larger sample from the same 250 judgments used in the previous analysis, assuming that sentences with identical output would receive a score of 0.</Paragraph>
      <Paragraph position="5"> As expected, the results using the projected sample were still positive, though the scores were lower because the larger projected sample includes the identical sentences, each scored 0. Again, the improvements in the NONE/ALL and NoWA/ALL conditions are significant across the whole data set.</Paragraph>
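The extrapolation can be written down directly: if only a fraction p of the corpus sentences differ between the two systems, the changed-sentence scores are padded with zeros until they make up that fraction of the sample, so the projected mean is p times the mean over changed sentences. The sketch below assumes per-sentence scores (for instance averaged over raters) and a known p; the function name and the toy numbers are illustrative.

```python
from statistics import mean


def projected_scores(changed: list[float], proportion_changed: float) -> list[float]:
    """Pad changed-sentence scores with zeros (identical outputs) to model the whole corpus."""
    if not (1 >= proportion_changed > 0):
        raise ValueError("proportion_changed must be in (0, 1]")
    n_total = round(len(changed) / proportion_changed)
    return changed + [0.0] * (n_total - len(changed))


changed = [1.0, 0.0, 1.0, -1.0]            # toy scores for sentences whose output changed
projected = projected_scores(changed, 0.5)  # suppose half of all sentences changed
print(len(projected), mean(changed), mean(projected))  # 8 0.25 0.125
```

Padding shrinks the mean by the factor p but also enlarges the sample, so the test statistic does not necessarily fall in proportion to the mean.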
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Results against benchmark system
</SectionTitle>
      <Paragraph position="0"> In a second analysis, we tested to see if the experimental changes to the system improved the performance of our system against our regular benchmark. We selected a random sample of 250 sentences, and translated them using first the ALL, and then the NONE, versions of our system. We also translated them using the benchmark system. We predicted that sentences translated using the ALL system would be significantly better than the sentences translated using the NONE system in its performance against the benchmark.</Paragraph>
      <Paragraph position="1">  The difference between these two scores is on the border of significance using a one-tailed paired t-test (p = .051825; t = -1.6334).</Paragraph>
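For reference, a paired t statistic and a one-tailed p-value can be computed as below. This is a sketch, assuming each list holds one score per sentence for the same sentences (for instance ALL against the benchmark and NONE against the benchmark); the section does not state which difference was taken, and the lower tail is used here only because the reported t is negative. The p-value uses a normal approximation to the t distribution, which is close only for large samples; an exact value needs the t distribution with n - 1 degrees of freedom (e.g. `scipy.stats.t`).

```python
import math
from statistics import mean, stdev


def paired_t_one_tailed(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, p) for the paired differences a - b, with p = P(T at or below t).

    p is approximated with the standard normal CDF, adequate only for large n.
    """
    if len(a) != len(b) or 2 > len(a):
        raise ValueError("need two equal-length samples with at least 2 items")
    d = [x - y for x, y in zip(a, b)]
    sd = stdev(d)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(d) / (sd / math.sqrt(len(d)))
    p = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # standard normal CDF at t
    return t, p


all_vs_bench = [0.0, 0.1, 0.2, 0.1]   # toy per-sentence scores, not the paper's data
none_vs_bench = [0.1, 0.3, 0.2, 0.4]
t, p = paired_t_one_tailed(all_vs_bench, none_vs_bench)
print(round(t, 3), round(p, 4))
```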
    </Section>
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> The premise of the experiment described here was that pairs of translations which were automatically derived from the training data would increase the number of transfer pairings found and improve the quality of translation.</Paragraph>
    <Paragraph position="1"> The results show that the combination of the word association list and title association list does in fact give us an improvement in quality of translation.</Paragraph>
    <Paragraph position="2"> We have measured the change in size of the transfer database and found that it retains more transfer patterns (transfer patterns seen only once were discarded) when the word association file is used, for instance: We have found from informal observation that an increased number of transfer patterns in the transfer database correlates with better performance, particularly if the translation correspondence includes more than one word.</Paragraph>
    <Paragraph position="3"> Whereas the WA and TA files have been judged elsewhere on the quality of the translation pairs themselves (Moore 2001), we are primarily interested in whether the data interacts in a positive way with a full-scale automatic alignment process. The result might appear disappointing at first glance, since it is barely significant. However, in our experience a gain of .04 against the benchmark represents a noticeable difference in translation quality from the user's perspective.</Paragraph>
    <Paragraph position="4"> It is important to note as well that this result was achieved even in the presence of a sizeable translation dictionary. We found that the combination of the bilingual dictionary and the structural mapping in the alignment process had already enabled a number of "domain-specific" translation correspondences, e.g. journal &lt;-&gt; log as in example (2) below. In a sense, the alignment algorithm had been able to overcome some domain-specific lexical gaps on its own.</Paragraph>
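The retention rule mentioned in passing, discarding transfer patterns seen only once, is a plain frequency threshold. A minimal sketch, assuming patterns can be represented as hashable keys (here strings); the pattern notation and function name are hypothetical.

```python
from collections import Counter


def retained_patterns(observed: list[str], min_count: int = 2) -> Counter:
    """Count transfer patterns and keep only those seen at least min_count times."""
    counts = Counter(observed)
    return Counter({p: c for p, c in counts.items() if c >= min_count})


observed = ["onglet X -> X tab", "Voir aussi -> Related Topics",
            "onglet X -> X tab", "journal -> log"]
print(retained_patterns(observed))  # Counter({'onglet X -> X tab': 2})
```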
    <Paragraph position="5"> The evaluation results give us a number of illustrations of improved transfer patterns. The only difference between the output categorized as NONE and the output categorized as ALL is the use of a transfer database trained with both the WA and TA files included.</Paragraph>
    <Paragraph position="6"> (2) Source: Le tableau ci-dessous explique la fonction des différentes options disponibles dans l'onglet Journal des transactions de la boîte de dialogue Propriétés de la base de données.</Paragraph>
    <Paragraph position="7"> Reference: This table shows the options and their functions available on the Transaction Log tab of the Database Properties dialog box.</Paragraph>
    <Paragraph position="8"> ALL translation: The table explains the function of different options available in the Transaction Log tab of the dialog Properties box of the database below.</Paragraph>
    <Paragraph position="9"> NONE translation: The table explains the function of different options available in the tab transactions Log of the dialog Properties box of the database below.</Paragraph>
    <Paragraph position="10"> The pattern which caused the improvement is the correspondence (Journal des transactions</Paragraph>
  </Section>
</Paper>