<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1017">
  <Title>Two Languages Are More Informative Than One *</Title>
  <Section position="3" start_page="131" end_page="132" type="metho">
    <SectionTitle>
2 The Linguistic Model
</SectionTitle>
    <Paragraph position="0"> The ambiguity of a word is determined by the number of distinct, non-equivalent representations into which the word can be mapped (Van Eynde et al., 1982). In the case of machine translation the ambiguity of a source word is thus given by the number of target representations for that word in the bilingual lexicon of the translation system. Given a specific syntactic context, the ambiguity can be reduced to the number of alternatives that may appear in that context. For instance, if a certain translation of a verb corresponds to an intransitive occurrence of that verb, then this possibility is eliminated when the verb occurs with a direct object. In this work we are interested only in those ambiguities that are left after applying all the deterministic syntactic constraints. For example, consider the following Hebrew sentence, taken from the daily Haaretz, September 1990: (4) Diplomatim svurim ki hitztarrfuto shel Hon Sun magdila et ha-sikkuyim l-hassagat hitqaddmut ba-sihot.</Paragraph>
    <Paragraph position="1"> Here, the ambiguous words in translation to English are 'magdila', 'hitqaddmut' and 'sihot'. To facilitate the reading, we give the translation of the sentence to English; in each case of an ambiguous selection all the alternatives are listed within curly brackets, the first alternative being the correct one. (5) Diplomats believe that the joining of Hon Sun { increases | enlarges | magnifies } the chances for achieving { progress | advance | advancement } in the { talks | conversations | calls }.</Paragraph>
    <Paragraph position="2"> We use the term a lexical relation to denote the cooccurrence relation of two (or possibly more) specific words in a sentence, having a certain syntactic relationship between them. Typical relations are between verbs and their subjects, objects, complements, adverbs and modifying prepositional phrases. Similarly, nouns are also related to their objects, to their modifying nouns in compounds and to their modifying adjectives and prepositional phrases. The relational representation of a sentence is simply the list of all lexical relations that occur in the sentence. For our purpose, the relational representation contains only those relations that involve at least one ambiguous word. The relational representation for example (4) is given in (6) (for readability we represent the Hebrew word by its English equivalent, prefixed by 'H-' to denote the fact that it is a Hebrew word): (6) a. (subj-verb: H-joining H-increase) b. (verb-obj: H-increase H-chance) c. (verb-obj: H-achieve H-progress) d. (noun-pp: H-progress H-in H-talks) The relational representation of a source sentence is reflected also in its translation to a target sentence. In some cases the relational representation of the target sentence is completely equivalent to that of the source sentence, and can be achieved just by substituting the source words with target words. In other cases the mapping between source and target relations is more complicated, as is the case for the following German example: (7) Der Tisch gefaellt mir. -- I like the table.</Paragraph>
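The relational representation described above can be sketched as a small data structure. The tuples and the lexicon below are hypothetical illustrations built from examples (4)-(6); they are not part of the original system.

```python
# A minimal sketch (not from the paper) of the relational representation in
# example (6): each lexical relation is a tuple of a syntactic relation type
# followed by the source-language words it connects.
relations = [
    ("subj-verb", "H-joining", "H-increase"),
    ("verb-obj",  "H-increase", "H-chance"),
    ("verb-obj",  "H-achieve", "H-progress"),
    ("noun-pp",   "H-progress", "H-in", "H-talks"),
]

# Hypothetical bilingual-lexicon entries for the ambiguous words of example (5).
alternatives = {
    "H-increase": ["increase", "enlarge", "magnify"],
    "H-progress": ["progress", "advance", "advancement"],
    "H-talks":    ["talks", "conversations", "calls"],
}

def is_ambiguous(word):
    # A word is ambiguous if the lexicon offers more than one translation.
    return len(alternatives.get(word, [word])) > 1

# Keep only the relations that involve at least one ambiguous word.
ambiguous_relations = [r for r in relations
                       if any(is_ambiguous(w) for w in r[1:])]
```

Here every relation in (6) survives the filter, since each one contains at least one ambiguous word.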
    <Paragraph position="3"> Here, the original subject of the source sentence becomes the object in the target sentence. This kind of mapping usually influences the translation process and is therefore encoded in components of the translation program, either explicitly or implicitly, especially in transfer based systems. Our model assumes that such a mapping of source language relations to target language relations is possible, an assumption that is valid for many practical cases.</Paragraph>
    <Paragraph position="4"> When applying the mapping of relations on one lexical relation of the source sentence we get several alternatives for a target relation. For instance, applying the mapping to example (6-c) we get three alternatives for the relation in the target sentence: (8) (verb-obj: achieve progress) (verb-obj: achieve advance) (verb-obj: achieve advancement) For example (6-d) we get 9 alternatives, since both 'H-progress' and 'H-talks' have three alternative translations.</Paragraph>
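The expansion of one source relation into its target alternatives is a cross-product over each word's candidate translations. The sketch below is a hypothetical illustration of this step; the dictionary entries are assumptions drawn from examples (5) and (8), not the paper's lexicon.

```python
from itertools import product

# Assumed bilingual-lexicon entries for the words in relations (6-c) and (6-d).
translations = {
    "H-achieve":  ["achieve"],
    "H-progress": ["progress", "advance", "advancement"],
    "H-in":       ["in"],
    "H-talks":    ["talks", "conversations", "calls"],
}

def target_alternatives(relation):
    """relation: (relation_type, source_word, ...) -> all target relations,
    one per combination of candidate translations."""
    rel_type, *words = relation
    return [(rel_type, *combo)
            for combo in product(*(translations[w] for w in words))]

# (6-c) yields the three alternatives of example (8); (6-d) yields 3 x 3 = 9.
three = target_alternatives(("verb-obj", "H-achieve", "H-progress"))
nine = target_alternatives(("noun-pp", "H-progress", "H-in", "H-talks"))
```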
    <Paragraph position="5"> In order to decide which alternative is the most probable, we count the frequencies of all the alternative target relations in very large corpora. For example (8) we got the counts 20, 5 and 1 respectively. Similarly, the target relation 'to increase chance' was counted 20 times, while the other alternatives were not observed at all. These counts are given as input to the statistical model described in the next section, which performs the actual target word selection.</Paragraph>
  </Section>
  <Section position="4" start_page="132" end_page="133" type="metho">
    <SectionTitle>
3 The Statistical Model
</SectionTitle>
    <Paragraph position="0"> Our selection algorithm is based on the following statistical model. Consider first a single relation. The linguistic model provides us with several alternatives as in example (8). We assume that each alternative has a theoretical probability Pi to be appropriate for this case. We wish to select the alternative for which Pi is maximal, provided that it is significantly larger than the others.</Paragraph>
    <Paragraph position="1"> We have decided to measure this significance by the odds ratio of the two most probable alternatives, p = p1/p2 (where p1 >= p2).</Paragraph>
    <Paragraph position="3"> We do not know the theoretical probabilities; therefore we get a bound for p using the frequencies of the alternatives in the corpus.</Paragraph>
    <Paragraph position="4"> Let p^i be the probabilities as observed in the corpus (p^i = ni/n, where ni is the number of times that alternative i appeared in the corpus and n is the total number of times that all the alternatives for the relation appeared in the corpus).</Paragraph>
    <Paragraph position="5"> For mathematical convenience we bound ln p instead of p. Assuming that the samples of the alternative relations are distributed normally, we get the following bound with confidence 1 - alpha:

      ln p >= ln(p^1/p^2) - Z * sqrt(Var(ln(p^1/p^2)))

where Z is the confidence coefficient. We approximate the variance by the delta method (e.g. Johnson and Wichern (1982)):

      Var(ln(p^1/p^2)) ~= 1/n1 + 1/n2</Paragraph>
    <Paragraph position="7"> We denote the right hand side of the bound by B_alpha(n1, n2) = ln(n1/n2) - Z * sqrt(1/n1 + 1/n2) (note that p^1/p^2 = n1/n2).</Paragraph>
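The bound B_alpha(n1, n2) can be computed directly from the two highest counts. The sketch below assumes a one-sided confidence coefficient Z = 1.645 for alpha = 0.05; this value, and the requirement that both counts be positive, are assumptions and need not match the exact parameters behind the B_alpha values reported later in the text.

```python
import math

def odds_ratio_bound(n1, n2, z=1.645):
    """Lower bound B_alpha(n1, n2) on ln(p1/p2), given the corpus counts
    n1 >= n2 > 0 of the two most frequent alternatives.
    z is the assumed one-sided confidence coefficient for alpha = 0.05."""
    return math.log(n1 / n2) - z * math.sqrt(1.0 / n1 + 1.0 / n2)

# Counts 20 and 5 for 'achieve progress' vs. 'achieve advance' in example (8):
b = odds_ratio_bound(20, 5)
```

Note that the bound grows with the absolute counts as well as with the ratio: odds_ratio_bound(200, 50) is far larger than odds_ratio_bound(4, 1), although n1/n2 is the same, because larger samples shrink the variance term.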
    <Paragraph position="8"> In sentences with several relations, we consider the best two alternatives for each relation, and take the relation for which B_alpha is largest. If this B_alpha is less than a specified threshold then we do not choose between the alternatives. Otherwise, we choose the most frequent alternative for this relation and select the target words appearing in this alternative. We then eliminate all the other alternative translations for the selected words, and accordingly eliminate all the alternatives for the remaining relations which involve these translations. In addition we update the observed probabilities for the remaining relations, and consequently the remaining B_alpha's. This procedure is repeated until all target words have been determined or the maximal B_alpha is below the threshold.</Paragraph>
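The elimination loop just described can be sketched as follows. The data layout (each relation as a dictionary from an alternative, given as (source word, target word) pairs, to its corpus count), the Z value, and the handling of zero counts (relations whose second-best alternative is unobserved are simply skipped) are simplifying assumptions, not the paper's implementation.

```python
import math

def odds_ratio_bound(n1, n2, z=1.645):
    # Lower bound on ln(p1/p2) from the two highest counts (Z assumed).
    return math.log(n1 / n2) - z * math.sqrt(1.0 / n1 + 1.0 / n2)

def select_targets(relations, threshold=-0.5):
    """Greedy selection loop sketched from the text.

    relations: list of dicts; each maps an alternative target relation,
    encoded as a tuple of (source_word, target_word) pairs, to its count.
    Returns a dict from source word to the selected target word.
    """
    chosen = {}
    remaining = [dict(r) for r in relations]
    while remaining:
        # Score each undecided relation by the bound on its best two counts.
        scored = []
        for i, alts in enumerate(remaining):
            ranked = sorted(alts.values(), reverse=True)
            if len(ranked) >= 2 and ranked[1] > 0:
                scored.append((odds_ratio_bound(ranked[0], ranked[1]), i))
        if not scored:
            break
        b, i = max(scored)
        if b < threshold:
            break  # no remaining decision is significant enough
        # Fix the target words of the winning alternative.
        winner = max(remaining[i], key=remaining[i].get)
        chosen.update(dict(winner))
        del remaining[i]
        # Eliminate alternatives inconsistent with the selections so far.
        remaining = [
            {alt: n for alt, n in alts.items()
             if all(chosen.get(s, t) == t for s, t in alt)}
            for alts in remaining
        ]
    return chosen
```

For instance, running this on relation (6-c) with the counts of example (8), plus hypothetical counts for (6-d), first fixes 'progress', which then prunes the (6-d) alternatives before 'talks' is selected.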
    <Paragraph position="9"> The actual parameters we have used so far were alpha = 0.05 and the threshold for B_alpha was -0.5.</Paragraph>
    <Paragraph position="10"> To illustrate the selection algorithm, we give the details for example (6). The highest bound for the odds ratio (B_alpha = 1.36) was obtained for the relation 'increase-chance', thus selecting the translation 'increase' for 'H-increase'. The second was B_alpha = 0.96, for 'achieve-progress'. This selected the translations 'achieve' and 'progress', while eliminating the other senses of 'H-progress' in the remaining relations. Then, for the relation 'progress-in-talks' we got B_alpha = 0.3, thus selecting the appropriate translation for 'H-talks'.</Paragraph>
  </Section>
  <Section position="5" start_page="133" end_page="133" type="metho">
    <SectionTitle>
4 The Experiment
</SectionTitle>
    <Paragraph position="0"> An experiment was conducted to test the performance of the statistical model in translation from Hebrew and German to English. Two sets of paragraphs were extracted randomly from current Hebrew and German press. The Hebrew set contained 10 paragraphs taken from foreign news sections, while the German set contained 12 paragraphs of text not restricted to a specific topic.</Paragraph>
    <Paragraph position="1"> Within these paragraphs we have (manually) identified the target word selection ambiguities, using a bilingual dictionary. Some of the alternative translations in the dictionary were omitted if it was judged that they would not be considered by an actual component of a machine translation program. These cases included very rare or archaic translations (that would not be contained in an MT lexicon) and alternatives that could be eliminated using syntactic knowledge (as explained in section 2). For each of the remaining alternatives, it was judged whether it could serve as an acceptable translation in the given context. This a priori judgment was later used to decide whether the selection of the automatic procedure was correct. As a result of this process, the Hebrew set contained 105 ambiguous words (which had at least one unacceptable translation) and the German set 54 ambiguous words.</Paragraph>
    <Paragraph position="2"> Now it was necessary to identify the lexical relations within each of the sentences. As explained before, this should be done using a source language parser, and then mapping the source relations to the target relations. At this stage of the research we still do not have the necessary resources to perform the entire process automatically, therefore we have approximated it by translating the sentences into English and extracting the lexical relations using the English Slot Grammar (ESG) parser (McCord, 1989). [Footnote: Due to some technicalities, we have also restricted the experiment to cases in which all the relevant translations of a word consist of exactly one English word, which is the most frequent situation.]</Paragraph>
    <Paragraph position="3"> [Footnote: We are currently integrating this process within GSG (German Slot Grammar) and LMT-GE (the German to English MT prototype).]</Paragraph>
    <Paragraph position="4"> Using this parser we have classified the lexical relations into rather general classes of syntactic relations, based on the slot structure of ESG. The important syntactic relations used were between a verb and its arguments and modifiers (counting as one class all objects, indirect objects, complements and nouns in modifying prepositional phrases) and between a noun and its arguments and modifiers (counting as one class all noun objects, modifying nouns in compounds and nouns in modifying prepositional phrases). The success of using this general level of syntactic relations indicates that even a rough mapping of source to target language relations would be useful for the statistical model.</Paragraph>
    <Paragraph position="5"> The statistics for the alternative English relations in each sentence were extracted from three corpora: The Washington Post articles (about 40 million words), Associated Press news wire (24 million) and the Hansard corpus of the proceedings of the Canadian Parliament (85 million words). The statistics were extracted only from sentences of up to 25 words (to facilitate parsing) which contained altogether about 55 million words. The lexical relations in the corpora were extracted by ESG, in the same way they were extracted for the English version of the example sentences (see Dagan and Itai (1990a) for a discussion on using an automatic parser for extracting lexical relations from a corpus, and for the technique of acquiring the statistics). The parser failed to produce any parse for about 35% of the sentences, which further reduced the actual size of the corpora which was used.</Paragraph>
  </Section>
  <Section position="6" start_page="133" end_page="134" type="metho">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> Two measurements, applicability and precision, are used to evaluate the performance of the statistical model. The applicability denotes the proportion of cases for which the model performed a selection, i.e. those cases for which the bound B_alpha passed the threshold.</Paragraph>
    <Paragraph position="1"> The precision denotes the proportion of cases for which the model performed a correct selection out of all the applicable cases.</Paragraph>
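The two measurements amount to a few lines of code. The encoding below is a hypothetical sketch; the decision list uses the Hebrew figures reported later in this section (105 ambiguous words, 73 decided, 67 of those correct) purely as a consistency check.

```python
def evaluate(decisions):
    """decisions: one (made_selection, correct) pair per ambiguous word.
    Returns (applicability, precision) as defined in the text."""
    applicable = [correct for made, correct in decisions if made]
    applicability = len(applicable) / len(decisions)   # decided / total
    precision = sum(applicable) / len(applicable)      # correct / decided
    return applicability, precision

# Hebrew test set: 67 correct selections, 6 incorrect, 32 undecided.
decisions = [(True, True)] * 67 + [(True, False)] * 6 + [(False, False)] * 32
applicability, precision = evaluate(decisions)
```

This reproduces the reported 70% applicability (73/105) and 92% precision (67/73).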
    <Paragraph position="2"> We compare the precision of the model to that of the "word frequencies" procedure, which always selects the most frequent target word. This naive "straw man" is less sophisticated than other methods suggested in the literature, but it is useful as a common benchmark (e.g. Sadler (1989)) since it can be easily implemented. [Footnote: The parsing process was controlled manually to make sure that we do not get wrong relational representations of the examples due to parsing errors.]</Paragraph>
    <Paragraph position="3"> The success rate of the "word frequencies" procedure can serve as a measure for the degree of lexical ambiguity in a given set of examples, and thus different methods can be partly compared by their degree of success relative to this procedure.</Paragraph>
    <Paragraph position="4"> Out of the 105 ambiguous Hebrew words, for 32 the bound B_alpha did not pass the threshold (applicability of 70%). Of the remaining 73 examples, the precision of the statistical model was 92% (67/73), while relying just on word frequencies yields 64% (47/73).</Paragraph>
    <Paragraph position="5"> Out of the 54 ambiguous German words, for 22 the bound B_alpha did not pass the threshold (applicability of 59%). Of the remaining 32 examples, the precision of the statistical model was 75% (24/32), while relying just on word frequencies yields 53% (18/32). We attribute the lower success rate for the German examples to the fact that they were not restricted to topics that are well represented in the corpus.</Paragraph>
    <Paragraph position="6"> Statistical analysis for the larger set of Hebrew examples shows that with 95% confidence our method succeeds in at least 86% of the applicable examples (using the parameters of the distribution of proportions). With the same confidence, our method improves the word frequency method by at least 18% (using confidence interval for the difference of proportions in multinomial distribution, where the four cells of the multinomial correspond to the four entries in the result table).</Paragraph>
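The "at least 86%" claim can be checked with the normal approximation for a proportion. The one-sided coefficient Z = 1.645 for 95% confidence is an assumption on our part, though it is consistent with the reported figure.

```python
import math

def precision_lower_bound(successes, n, z=1.645):
    """One-sided lower confidence bound for a proportion,
    using the normal approximation (Z = 1.645 assumed for 95%)."""
    p = successes / n
    return p - z * math.sqrt(p * (1.0 - p) / n)

# 67 correct selections out of 73 applicable Hebrew examples:
lb = precision_lower_bound(67, 73)   # about 0.86, matching "at least 86%"
```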
    <Paragraph position="7"> [Footnote: An a posteriori observation showed that in three of the six errors the selection of the model was actually acceptable, and the a priori judgment of the human translator was too severe. For example, in one of these cases the statistics selected the expression 'to begin talks' while the human translator regarded this expression as incorrect and selected 'to start talks'. If we consider these cases as correct then there are only three selection errors, giving a precision of 96%.]</Paragraph>
    <Paragraph position="8"> In the examples that were treated correctly by our method, such as the examples in the previous sections, the statistics succeeded in capturing two major types of disambiguating data. In preferring 'sign-treaty' over 'seal-treaty', the statistics reflect the relevant semantic constraint. In preferring 'peace-treaty' over 'peace-contract', the statistics reflect the lexical usage of 'treaty' in English, which differs from the usage of 'h_oze' in Hebrew.</Paragraph>
  </Section>
</Paper>