<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2080">
  <Title>TRANSLATION AMBIGUITY RESOLUTION BASED ON TEXT CORPORA OF SOURCE AND TARGET LANGUAGES</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 525 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
2 Conventional Methods of
Ambiguity Resolution
2.1 Rule-Based Translation
</SectionTitle>
    <Paragraph position="0"> In conventional methods, linguistic restrictions described in the dictionary and grammar are used to select the suitable equivalent translation or meaning. In general, these restrictions are described logically in terms of the characteristics of another expression which modifies or is modified by the expression to be processed. For example, to translate predicates (verbs and predicative adjectives), semantic restrictions are described on essential case arguments, either in the form of semantic markers that indicate features of words or in the form of terms in a thesaurus that shows a hierarchy of word concepts.</Paragraph>
    <Paragraph position="1"> Though these conventional methods have been very useful for realizing natural language processing systems, they have the following problems:  1. It is impossible to decide the most suitable equivalent translation if the input expression meets two or more restrictions.</Paragraph>
    <Paragraph position="2"> 2. Analysis fails when the input expression meets no restrictions.</Paragraph>
    <Paragraph position="3"> 3. In practice, systems depend on heuristics such as a pre-decided application order of restrictions or default equivalent translations or meanings.</Paragraph>
    <Paragraph position="4"> 4. The description of the restrictions is based on direct structural dependencies; it is therefore quite difficult to describe restrictions based on sister dependency or between expressions belonging to different sentences or paragraphs.</Paragraph>
    <Paragraph position="5"> 5. Restrictions on any dependencies cannot be thoroughly described in advance.</Paragraph>
    <Paragraph position="6"> For example, the Japanese word &amp;quot;booru&amp;quot; has two meanings: one is 'a ball (a round object used in a game or sport)' and the other is 'a bowl (a deep round container open at the top, especially used in cooking)'. When this word occurs in the following sentence, it must mean 'a bowl'.</Paragraph>
    <Paragraph position="7"> JAP: Booru-ni mizu-o ireru (booru: 'bowl' or 'ball'; -ni: dative marker; mizu: 'water'; -o: obj. marker; ireru: 'pour', 'put in' or 'fill') ENG: To pour water into a bowl. In this case, to select the meaning by logical restrictions on dependencies, it is necessary to have described even the appearance or usage of the indirect object of the verb &amp;quot;ireru&amp;quot;. Describing such detailed restrictions for all expressions may be possible, but it is quite difficult because of the effort of description and the cost of calculation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Example-Based Translation
</SectionTitle>
      <Paragraph position="0"> Besides the conventional translation method above, a machine translation system based on translation examples (pairs of source texts and their translations) has also been proposed [Nagao 84, Sato 90, Sumita 90]. This type of system, called Example-Based Machine Translation, stores a large number of bilingual translation examples as a database, and translates input expressions by retrieving the example most similar to the input from the database. This method never fails to produce output, because it selects the most similar example rather than an identical one.</Paragraph>
      <Paragraph position="1"> However, this example-based translation system needs a large-scale database of translation examples, and it is difficult to collect an adequate amount of bilingual corpora. Even if it is possible, there is no means to divide the sentences of such corpora into fragments and link them automatically, and it costs too much time and money to divide and link them manually. Besides, this method cannot achieve precise meaning interpretation either, because it selects the equivalent translation directly from the input expression and leaves meaning interpretation out of consideration.</Paragraph>
      <Paragraph position="2"> To overcome this problem, we have also proposed a new mechanism based on sentential examples in the dictionary, which utilizes the merits of both translation by logical restrictions and the example-based method by selecting the equivalent translation which has the example most similar to the input expression [Doi 92]. This mechanism guarantees that selecting an equivalent translation never fails, but the description of relations is still based only on direct structural dependencies.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2.3 Statistics-Based Translation
</SectionTitle>
    <Paragraph position="0"> Several new methods, especially for machine translation, have been proposed lately, which select a suitable equivalent translation using statistical or probabilistic information extracted from language text [Brown 90, Nomiyama 91]. Because many machine-readable texts have already been collected, it is not difficult to extract statistical information on each expression in the texts semiautomatically. Moreover, the statistical information reflects the context in which each word occurs and implies the logical restrictions based on indirect structural dependencies.</Paragraph>
    <Paragraph position="1"> Although we refer to these systems collectively as &amp;quot;statistics-based translation&amp;quot;, the statistical information used in the methods is diverse: translation probability, connectivity of words, statistics for (co-)occurrence, etc. We comment here on the characteristics and the limits of these systems.</Paragraph>
    <Paragraph position="2"> The first method uses fertility probabilities, translation probabilities and distortion probabilities [Brown 90]. Fertility means the number of words in the target language that a word of the source language produces, and distortion means the distance between the position of the word in the source language and that of the corresponding word in the target language.</Paragraph>
    <Paragraph position="3"> The method has been applied to an experimental translation system from French to English. However, since these probabilities are extracted from a large amount of text pairs that are translations of each other, this method suffers from the same difficulties as example-based translation in collecting and analyzing an adequate amount of bilingual corpora, and it is very difficult to apply this method to languages whose linguistic structures are not similar to each other, such as English and Japanese.</Paragraph>
    <Paragraph position="4"> The second method uses the statistics for occurrence in target language text [Nomiyama 91]. How frequently each expression occurs in the target language text is calculated in advance; this text needs only to belong to the same field as the source language text, and need not be a translation of it. If there is more than one possible equivalent translation, the most frequent translation is selected using this calculated data. Moreover, this method can be combined with the conventional methods of selecting equivalent translations, for it employs the frequency data exclusively when logical restrictions cannot select one out of the candidates.</Paragraph>
    <Paragraph position="5"> However, this method has one big problem. The high frequency of an expression in the target language text may not originate from the frequency of the expression in the source language text to be translated, because in general one target language expression does not correspond to only one source language expression.</Paragraph>
    <Paragraph position="6"> Suppose the following sentence is a first example: JAP: Sono saibankan-wa kooto-to nekutai-o katta.</Paragraph>
    <Paragraph position="7"> (sono: 'that'; saibankan: 'judge'; -wa: subj. marker; kooto: 'coat' or 'court'; -to: 'and'; nekutai: 'tie'; -o: obj. marker; katta: 'bought') ENG: The judge bought a coat and a tie.</Paragraph>
    <Paragraph position="8"> Figure 1 indicates the translation process through the bilingual dictionary and the statistics for co-occurrence of each pair of expressions in both Japanese and English necessary to translate the sentence. (Footnote 1: the values shown in the figures are given provisionally for understanding.) The Japanese word &amp;quot;kooto&amp;quot; has two equivalent English translations: '(over)coat' and 'court'.</Paragraph>
    <Paragraph position="9"> The correct one cannot be selected with only logical restrictions on the direct object of the Japanese verb &amp;quot;kau&amp;quot;, because we can buy both a 'coat' and a 'court'; the sentence &amp;quot;Tenisu-kooto o kau&amp;quot; = 'To buy a tennis court' is also quite acceptable. In this case, the statistics for co-occurrence in the target language (English) text indicate that the most frequent pair is 'court-judge', because the word 'court' also means a 'law court'. Using only statistical data on the target language text therefore wrongly selects 'court' as the equivalent translation of &amp;quot;kooto&amp;quot;, and the example sentence may be translated into 'The judge bought a court and a tie.'.</Paragraph>
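    <Paragraph> The failure described for the first example sentence can be reproduced with a minimal sketch of target-only selection. This is an illustration under stated assumptions, not the implementation of [Nomiyama 91]: the table TARGET_SCO, its counts, and the helper names sco and select_target_only are all hypothetical, and the counts are invented in the same provisional spirit as the values in the figures.

```python
# Hypothetical co-occurrence counts for the English (target) text.
# The numbers are invented for illustration only.
TARGET_SCO = {
    frozenset(["court", "judge"]): 50,
    frozenset(["coat", "judge"]): 2,
    frozenset(["coat", "tie"]): 30,
    frozenset(["court", "tie"]): 1,
}


def sco(table, x, y):
    """Co-occurrence statistic of an unordered pair; 0 if the pair is unseen."""
    return table.get(frozenset([x, y]), 0)


def select_target_only(candidates, context_translations, table):
    """Pick the candidate whose best co-occurrence with any translation of
    any context word is highest, looking at the target text only."""
    return max(
        candidates,
        key=lambda t: max(sco(table, t, c) for c in context_translations),
    )


# "kooto" may be 'coat' or 'court'; its context words translate to 'judge' and 'tie'.
choice = select_target_only(["coat", "court"], ["judge", "tie"], TARGET_SCO)
print(choice)  # court
```

With these assumed counts the pair 'court-judge' dominates, so the sketch picks 'court' even though the sentence is about buying clothes.</Paragraph>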
    <Paragraph position="10"> The second example is this sentence (footnote 2): JAP: Kotori no kago-ni mizu o ireta booru-o oita.</Paragraph>
    <Paragraph position="11"> (kotori: 'bird'; no: 'of'; kago: 'cage' or 'basket'; -ni: dative marker; mizu: 'water'; o: obj. marker; ireta: 'filled'; booru: 'bowl' or 'ball'; -o: obj. marker; oita: 'put')</Paragraph>
    <Paragraph position="12"> ENG: I put a bowl filled with water in the bird cage.</Paragraph>
    <Paragraph position="13"> The translation process of this sentence and the statistics for co-occurrence are shown in Figure 2. Because the pair of 'basket' and 'ball' co-occurs most frequently in the target language, the sentence may be translated into 'I put a ball filled with water in the bird basket.'.</Paragraph>
    <Paragraph position="14">  Now we propose a new method that provides reasonable criteria for selecting a suitable equivalent translation or meaning, using simple statistical data extracted from source language text in addition to the data from target language text. These source and target language texts do not have to be translations of each other. The proposed method gives us a way to select the expression with the highest frequency in the target language text that at the same time keeps a high frequency in the source language text. It thus overcomes the difficulty of the method using frequency data on the target language text only, because it does not select the expression whose frequency is highest in the target language text alone.</Paragraph>
    <Paragraph position="15"> Footnote 2: The subject phrase &amp;quot;watashi-wa&amp;quot; = 'I' is omitted in this sentence.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Using statistical data on source language text
</SectionTitle>
      <Paragraph position="0"> The method using only statistical data on the target language text may lead to a wrong equivalent translation, because in general each target language expression corresponds to more than one source language expression.</Paragraph>
      <Paragraph position="1"> The equivalent translation selection with statistics for co-occurrence in the target language text, when a source language expression $S_a$ has $n$ equivalent translations $T_{ai}$ ($i = 1, \ldots, n$) in the target language, is shown as follows:</Paragraph>
      <Paragraph position="3"> Here $S_k$ is a source language expression, $T_{ki}$ ($i = 1, \ldots, n$) are the $n$ target language equivalent translations of $S_k$, and $SCO(E_i, E_j)$ is the statistics for co-occurrence of two expressions $E_i$ and $E_j$. The method using only statistical data on the target language text selects the $T_{ai}$ which maximizes the statistics for co-occurrence in the target language text (footnote 3) as the equivalent translation of $S_a$, where the partner of the co-occurrence, $T_{bj}$, plays the part of the basis for the equivalent translation selection. The biggest problem of this method is that $T_{bj}$, which depends on both $b$ and $j$, is selected by only statistical data on the target language text. Our new method provides reasonable criteria for selecting the basis for the equivalent translation selection using the statistical data on the source language text. First the source language expression $S_b$ which maximizes the statistics for co-occurrence in the source language text (footnote 4) is selected; then the equivalent translation $T_{ai}$ which maximizes the statistics for co-occurrence in the target language text (footnote 5) is selected. The dependency relation in the source language is reflected in the translated text through this method. We call this method for equivalent translation and meaning selection the DMAX Criteria.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Dual Corpora
</SectionTitle>
      <Paragraph position="0"> The algorithm of this method is summarized as follows:  1. Prepare the source and target language texts (the target language text need not be a translation of the source language text). 2. Accumulate the statistics for co-occurrence of every expression in both texts. (Footnote 3: $T_{ai} \mid \max_{b,i,j} SCO(T_{ai}, T_{bj})$. Footnote 4: $S_b \mid \max_{b} SCO(S_a, S_b)$. Footnote 5: $T_{ai} \mid \max_{i,j} SCO(T_{ai}, T_{bj})$.)</Paragraph>
      <Paragraph position="1"> 3. When a source language expression $S_a$ has $n$ equivalent translations $T_{ai}$ ($i = 1, \ldots, n$) in the target language:</Paragraph>
      <Paragraph position="3"> (a) Select $S_b \mid \max_{b} SCO(S_a, S_b)$. (b) Select $T_{ai} \mid \max_{i,j} SCO(T_{ai}, T_{bj})$.</Paragraph>
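      <Paragraph> The two selection steps of the algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names sco and dmax_select, the dictionary layout, and every count are hypothetical, and co-occurrence is treated as an unordered pair count.

```python
def sco(table, x, y):
    """Co-occurrence statistic of an unordered pair; 0 if the pair is unseen."""
    return table.get(frozenset([x, y]), 0)


def dmax_select(s_a, context, dictionary, source_sco, target_sco):
    """Return (basis S_b, chosen translation T_ai) for the expression s_a."""
    # Step (a): the basis S_b is the context expression that co-occurs
    # most often with S_a in the SOURCE language text.
    s_b = max(context, key=lambda s: sco(source_sco, s_a, s))
    # Step (b): among the translations of S_a and S_b, take the pair with
    # the highest co-occurrence in the TARGET language text.
    pairs = [(t_a, t_b) for t_a in dictionary[s_a] for t_b in dictionary[s_b]]
    t_a, _ = max(pairs, key=lambda p: sco(target_sco, p[0], p[1]))
    return s_b, t_a


# Hypothetical bilingual dictionary and counts (invented for illustration).
DICTIONARY = {
    "kooto": ["coat", "court"],
    "saibankan": ["judge"],
    "nekutai": ["tie"],
}
SOURCE_SCO = {
    frozenset(["kooto", "nekutai"]): 40,
    frozenset(["kooto", "saibankan"]): 5,
}
TARGET_SCO = {
    frozenset(["court", "judge"]): 50,
    frozenset(["coat", "judge"]): 2,
    frozenset(["coat", "tie"]): 30,
    frozenset(["court", "tie"]): 1,
}

basis, translation = dmax_select(
    "kooto", ["saibankan", "nekutai"], DICTIONARY, SOURCE_SCO, TARGET_SCO
)
print(basis, translation)  # nekutai coat
```

Because the basis is fixed from the source text first, the high target-side count for 'court-judge' is never consulted in this sketch.</Paragraph>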
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Operation Example
</SectionTitle>
      <Paragraph position="0"> Figures 1-3 show operation examples. Figures 1 and 2 are examples of Japanese-English translation. In Figure 1, with only statistical data on the target language text, 'court' may be chosen as the equivalent translation of &amp;quot;kooto&amp;quot; because the pair 'court-judge' co-occurs most frequently in the target language. However, with DMAX Criteria, the equivalent translation of &amp;quot;kooto&amp;quot; is selected correctly.</Paragraph>
      <Paragraph position="1"> * The expression which co-occurs with &amp;quot;kooto&amp;quot; most frequently in the source language is &amp;quot;nekutai&amp;quot;. * The pair of the equivalent translation of &amp;quot;kooto&amp;quot; and that of &amp;quot;nekutai&amp;quot; which co-occurs most frequently in the target language is 'coat-tie'.</Paragraph>
      <Paragraph position="2"> * As a result, &amp;quot;kooto&amp;quot; is translated into 'coat'. The same holds for Figure 2. The pair 'basket-ball' co-occurs most frequently in the target language. But using DMAX Criteria, which gives attention first to the most frequent pair in the source language text, &amp;quot;kotori-kago&amp;quot; gains the correct equivalent translation 'cage'. Next, the pair &amp;quot;mizu-booru&amp;quot; decides 'bowl' as the equivalent translation of &amp;quot;booru&amp;quot;. In this way, the correct translation can be acquired.</Paragraph>
      <Paragraph position="3"> Figure 3 shows the translation process and the statistics for co-occurrence of another example, this time of English-Japanese translation.</Paragraph>
      <Paragraph position="4"> ENG: The ceiling of the court was cleaned quite well.</Paragraph>
      <Paragraph position="5"> JAP: Saibansho no tenjoo-wa kireini souji-sareteita.</Paragraph>
      <Paragraph position="6"> (saibansho: 'court'; no: 'of'; tenjoo: 'ceiling'; -wa: subj. marker; kireini: 'quite well'; souji-sareteita: 'be cleaned') In this case, the English words 'court' and 'clean' each have two meanings.</Paragraph>
      <Paragraph position="7"> 'court': (1) saibansho: a room or building in which law cases can be heard and judged; (2) kooto: (a part of) an area specially prepared and marked for various ball games, such as tennis. 'clean': (1) souji-suru: to clean rooms; (2) kuriiningu-suru: to clean clothes with chemicals instead of water. The pair &amp;quot;kooto-kuriiningu&amp;quot; co-occurs most frequently in the target language, so the sentence may be translated into &amp;quot;Kooto-no tenjoo-wa kireini kuriiningu-sareteita.&amp;quot;. But using DMAX Criteria, 'ceiling' is selected as the basis for the equivalent translation selection of 'court', and &amp;quot;saibansho&amp;quot; is selected as the equivalent translation of 'court' by comparing the statistics for co-occurrence of the pairs &amp;quot;tenjoo-saibansho&amp;quot; and &amp;quot;tenjoo-kooto&amp;quot;.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Calculation of Linguistic Statistics for Semantic Interpretation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <Paragraph position="0"> In language understanding systems, or machine translation systems that work through semantic expressions, one suitable meaning must be selected for an entry word out of those described in a dictionary. However, in conventional systems the meaning selection mechanism is not robust and cannot select the most suitable meaning by the logical restrictions described in the dictionaries alone. In the previous section we presented a new method for equivalent translation selection that uses statistical data on the source and target languages through a bilingual dictionary. To apply this method to meaning selection, it is necessary to calculate statistical data on the pairs of meanings in advance, but there is no means of calculating them automatically.</Paragraph>
      <Paragraph position="1"> We have already developed an interlingua-based machine translation system whose interlingua, named PIVOT, does not depend on any particular natural language [Muraki 86, Ichiyama 89, Okumura 91]. In its dictionary, as illustrated in Figure 4, expressions in the source language are mapped onto interlingua vocabulary items (CONCEPTUAL-PRIMITIVE: CP), which are in turn mapped onto equivalent translations.</Paragraph>
      <Paragraph position="2"> Then we propose a new method of computing linguistic statistics for occurrence of meanings automatically using this format of dictionary.</Paragraph>
      <Paragraph position="3"> Suppose linguistic statistics on the pairs of expressions in both source and target language texts have already been calculated. In translation, when an expression $S_i$ occurs in the source language text, an equivalent translation $T_{ijk}$ is decided through the path $S_i \Rightarrow C_{ij} \Rightarrow T_{ijk}$, and as a result the CP $C_{ij}$ is also selected from the CPs corresponding to the expression $S_i$. Therefore, the linguistic statistics on the pairs of CPs or meanings are nothing but the combined linguistic statistics on the pairs of corresponding expressions in the target language text. Thus, the linguistic statistics $\Omega$ on the pairs of meaning expressions in the dictionary can be obtained as the sum of the linguistic statistics $\omega$ on the pairs of target language expressions according to the following equation.</Paragraph>
      <Paragraph position="4"> $$\Omega(C_{ai}, C_{bj}) = \sum_{p,q} \omega(T_{aip}, T_{bjq})$$ These linguistic statistics can be added to the dictionary in advance, and we can select the meaning in the same way as equivalent translation selection.</Paragraph>
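      <Paragraph> The summation over pairs of target language expressions can be sketched as follows. This is a hedged illustration, not the PIVOT dictionary format: the CP names, the mapping CP_TO_TRANSLATIONS, the counts in TARGET_STATS, and the function meaning_statistics are all hypothetical.

```python
from itertools import product


def meaning_statistics(cp_a, cp_b, cp_to_translations, target_stats):
    """Statistic for a pair of CPs: the sum, over every pair of their
    equivalent translations, of the target-language pair statistic."""
    return sum(
        target_stats.get(frozenset([t_a, t_b]), 0)
        for t_a, t_b in product(cp_to_translations[cp_a], cp_to_translations[cp_b])
    )


# Hypothetical CP-to-translation mapping and target-side counts.
CP_TO_TRANSLATIONS = {
    "CP_LAW_COURT": ["court", "courthouse"],
    "CP_JUDGE": ["judge"],
}
TARGET_STATS = {
    frozenset(["court", "judge"]): 50,
    frozenset(["courthouse", "judge"]): 7,
}

omega = meaning_statistics("CP_LAW_COURT", "CP_JUDGE", CP_TO_TRANSLATIONS, TARGET_STATS)
print(omega)  # 57
```

Values computed this way could be stored per CP pair ahead of time, so that meaning selection reuses the same maximization as equivalent translation selection.</Paragraph>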
    </Section>
  </Section>
</Paper>