<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3044">
  <Title>Toward Memory-based Translation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Matching Expression
</SectionTitle>
    <Paragraph position="0"> To implement the ability to combine some fragments of translation examples in order to translate one sentence, we must determine the following: how to represent translation examples; what a fragment is; and how to represent the combination of fragments.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Translation Database
</SectionTitle>
      <Paragraph position="0"> The translation database is the collection of translation examples. A translation example consists of three parts:</Paragraph>
      <Paragraph position="2"> \[j5, \[nouto,n\]\]\]\]).</Paragraph>
      <Paragraph position="3"> %% Kare ha nouto wo kau.</Paragraph>
      <Paragraph position="4"> clinks(\[\[e1,j1\],\[e2,j3\],\[e3,j5\]\]). %% e1 &lt;-&gt; j1, e2 &lt;-&gt; j3, e3 &lt;-&gt; j5 Each number with prefix 'e' or 'j' in the word-dependency trees represents the ID of a subtree. Each node in a tree contains a word (in root form) and its syntactic category. A correspondence link is represented as a pair of IDs.</Paragraph>
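      <Paragraph position="5"> As a sketch, the same kind of translation example can be encoded in Python (the paper itself uses Prolog terms ewd_e/1, jwd_e/1 and clinks/1; the exact shape of the English tree and the Python encoding are our assumptions):

```python
# A word-dependency tree as a nested list: [id, [word, category], *children].
EWD = ["e1", ["buy", "v"],
       ["e2", ["he", "pron"]],
       ["e3", ["notebook", "n"]]]                  # %% He buys a notebook.

JWD = ["j1", ["kau", "v"],
       ["j2", ["ha", "p"], ["j3", ["kare", "pron"]]],
       ["j4", ["wo", "p"], ["j5", ["nouto", "n"]]]]

# Correspondence links: pairs of subtree IDs, as in clinks/1.
CLINKS = [("e1", "j1"), ("e2", "j3"), ("e3", "j5")]

def subtrees(tree):
    """Yield (id, subtree) for every subtree, so IDs can index the tree."""
    yield tree[0], tree
    for child in tree[2:]:
        yield from subtrees(child)

index = dict(subtrees(JWD))   # e.g. index["j5"][1] == ["nouto", "n"]
```
</Paragraph>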
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Translation Unit
</SectionTitle>
      <Paragraph position="0"> A word-dependency (sub)tree which has a correspondence link is translatable; e.g. e1, e2, e3, j1, j3, j5. A translatable tree from which some translatable subtrees are removed is also translatable; e.g. e1-e2, e1-e3, e1-e2-e3, j1-j3, j1-j5, j1-j3-j5. Both of them are translatable fragments. Sadler calls them translation units \[Sadler 89a\].</Paragraph>
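      <Paragraph position="1"> A minimal sketch of enumerating such fragments, assuming the example tree of Section 3.1 and restricting removal to immediate subtrees (the paper also allows removal at any depth and counts each translatable subtree on its own):

```python
from itertools import combinations

# The English tree of Section 3.1 (shape assumed) and the IDs that carry
# correspondence links.
EWD = ["e1", ["buy", "v"],
       ["e2", ["he", "pron"]],
       ["e3", ["notebook", "n"]]]
TRANSLATABLE = {"e1", "e2", "e3"}

def translation_units(tree):
    """Enumerate fragments of the form root-minus-subtrees: the tree with
    any subset of its translatable immediate subtrees removed."""
    if tree[0] not in TRANSLATABLE:
        return []
    removable = [c[0] for c in tree[2:] if c[0] in TRANSLATABLE]
    units = []
    for k in range(len(removable) + 1):
        for removed in combinations(removable, k):
            units.append("-".join([tree[0], *removed]))
    return units
```

Applied to EWD this yields e1, e1-e2, e1-e3, and e1-e2-e3, matching the fragments listed above.</Paragraph>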
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Matching Expression
</SectionTitle>
      <Paragraph position="0"> Next we will introduce the concept of a 'matching expression.' A matching expression (ME) is defined as the following:</Paragraph>
      <Paragraph position="2"> or \[r,&lt;ID&gt;, &lt;ME&gt;\] or \[a,&lt;ID&gt;,&lt;ME&gt;\] %% delete &lt;ID&gt; %% replace &lt;ID&gt; %% with &lt;ME&gt; %% add &lt;ME&gt; as a %% child of root %% node of &lt;ID&gt;  Every ID in an ME should be translatable. We assume the example in Section 3.1 and the following example.  ewd_e( fell, freud,v\] , \[el2, \['I ),prOn\]\] , \[el3, \[book,n\] , \[el4, \[a,det\] \] , \[elb, Ion,p\] , \[el6, \[politics,n\] , felT, \[international, adj\] \]\]\]\]1).</Paragraph>
      <Paragraph position="3"> %% I read a book on international %% politics.</Paragraph>
      <Paragraph position="4"> jwd_e(\[jll, \[yomu,v\] , \[j12, \[ha,p\] , \[j13, \[watashi,pron\] \]\] , \[j14, \[wo,p\] , \[j15, \[hon,n\] , \[j16, \[ta, aux\] , \[j17, \[reru,aux\] , \[j18, \[kaku,v\] , \[j19, \[nitsuite,p\] , \[j20, \[kokusaiseij i,n\] 1\]\]11\]\]\]).</Paragraph>
      <Paragraph position="5"> The translation process consists of three steps: decomposition, transfer, and composition. This process generates all translation candidates using Prolog's backtracking mechanism.</Paragraph>
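      <Paragraph position="10"> The three-step candidate generation could be mimicked with Python generators standing in for Prolog backtracking; a sketch with hypothetical step functions:

```python
def translate(swd, decompose, transfer, compose):
    """Chain the three steps, yielding every translation candidate.
    decompose and compose are assumed to yield alternatives; transfer is
    deterministic, mirroring the process described in the text."""
    for sme in decompose(swd):       # each source matching expression
        tme = transfer(sme)          # map IDs to the target side
        yield from compose(tme)      # each target word-dependency tree
```

For instance, with stub steps, list(translate("x", lambda s: ["a", "b"], lambda m: m.upper(), lambda t: [t + "!"])) enumerates both candidates.</Paragraph>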
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Decomposition
</SectionTitle>
      <Paragraph position="0"> In decomposition, the system decomposes a source word-dependency tree (SWD) into translation units, and makes a source matching expression (SME). For example,</Paragraph>
      <Paragraph position="2"> The main tasks in this step are to retrieve translation units and compare the source WD with the retrieved translation units. To retrieve translation units quickly, we use some hashing techniques. There are two programs to do the comparison task; one for English WDs and one for Japanese WDs. In the comparison of Japanese WDs, the order of subtrees is not important.</Paragraph>
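      <Paragraph position="5"> A minimal sketch of these two tasks, with hypothetical unit records: hashing by root word for retrieval, and order-insensitive comparison of Japanese child categories:

```python
from collections import defaultdict

# Hypothetical unit records: (unit_id, root_word, child_categories).
UNITS = [("e1", "buy", ("pron", "n")),
         ("e11", "read", ("pron", "n")),
         ("j1", "kau", ("p", "p"))]

# One simple hashing scheme: index units by root word, so candidates for a
# source node are fetched in one lookup instead of a database scan.
by_root = defaultdict(list)
for uid, root, cats in UNITS:
    by_root[root].append(uid)

def same_children(cats_a, cats_b, japanese=False):
    """Compare child-category sequences; for Japanese WDs subtree order
    does not matter, so compare as multisets."""
    if japanese:
        return sorted(cats_a) == sorted(cats_b)
    return list(cats_a) == list(cats_b)
```
</Paragraph>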
      <Paragraph position="3"> To reduce the search space and the number of candidates, we define replaceability between syntactic categories. If two nodes are replaceable, the system makes only a replace-command. As a result, the system does not make some matching expressions; e.g.</Paragraph>
      <Paragraph position="4"> \[e1, \[d,e3\], \[a,e1, \[e13\]\]\]</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Transfer
</SectionTitle>
      <Paragraph position="0"> In the transfer step, the system replaces every ID in the source matching expression with its corresponding ID. For example,</Paragraph>
      <Paragraph position="2"/>
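      <Paragraph position="3"> A sketch of this substitution, assuming the correspondence links of Section 3.1 plus an assumed e13 &lt;-&gt; j15 link (the pairing that would yield the target ME shown in Section 5.2):

```python
# Correspondence links as a source-to-target ID map (e13 <-> j15 assumed).
CLINKS = {"e1": "j1", "e2": "j3", "e3": "j5", "e13": "j15"}

def transfer(me):
    """Replace every ID in a source matching expression with its
    corresponding target ID, recursing into r- and a-commands."""
    out = [CLINKS[me[0]]]
    for cmd in me[1:]:
        if cmd[0] == "d":
            out.append(["d", CLINKS[cmd[1]]])
        else:                        # "r" or "a" carry a nested ME
            out.append([cmd[0], CLINKS[cmd[1]], transfer(cmd[2])])
    return out
```

Under these assumptions, transfer(["e1", ["r", "e3", ["e13"]]]) gives ["j1", ["r", "j5", ["j15"]]].</Paragraph>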
    </Section>
    <Section position="6" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Composition
</SectionTitle>
      <Paragraph position="0"> In the composition step, the system composes the target word-dependency tree according to the target matching expression. For example,</Paragraph>
      <Paragraph position="2"> %% Kare ha kokusaiseiji nitsuite %% kakareta hon wo kau.</Paragraph>
      <Paragraph position="3"> This step divides into two sub-steps: the main composing step and validity checking. In the main composing step, there is no ambiguity, with one exception. Because an add-command \[a,&lt;ID&gt;,&lt;ME&gt;\] specifies only the parent node (&lt;ID&gt;) at which to add the tree (&lt;ME&gt;), there are some choices in composing English word-dependency trees. In this step, all possibilities are generated.</Paragraph>
      <Paragraph position="4"> The validity of the composed word-dependency trees is checked using syntactic categories. Validity is checked in every parent-children unit. For example, in the above target word-dependency tree, \[v, \[p,p\]\], \[p, \[pron\]\], \[p, \[n\]\], \[n, \[aux\]\], ...</Paragraph>
      <Paragraph position="5"> are checked. A unit is valid if there is a unit which has the same category pattern in the database. A word-dependency tree is valid if all parent-children units are valid.</Paragraph>
      <Paragraph position="7"/>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Score of Translation
</SectionTitle>
    <Paragraph position="0"> To select the best translation out of all the candidates generated by the system, we introduce the score of a translation. We define it based on the score of the matching expression, because the matching expression determines the translation output. The scores of the source matching expression and the target matching expression are calculated separately.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Score of Translation Unit
</SectionTitle>
      <Paragraph position="0"> First, we will define the score of a translation unit. The score of a translation unit should reflect the correctness of the translation unit.</Paragraph>
      <Paragraph position="1"> Which translation unit is better? Two main factors are: 1. A larger translation unit is better. 2. A translation unit in a matching expression is a fragment of a source (or target) word-dependency tree, and also a fragment of a translation example. There are two environments of a translation unit: in a source (or target) tree and in a translation example. The more similar these two environments are, the better.</Paragraph>
      <Paragraph position="2"> To calculate 1, we define the size of a translation unit (TU).</Paragraph>
      <Paragraph position="3"> size(TU) = the number of nodes in TU To calculate 2, we need a measure of similarity between two environments, i.e. external similarity. To estimate external similarity, we introduce a unit called restricted environment. A restricted environment consists of the nodes one link outside of a TU normally. If corresponding nodes are same in two environments, those environments are extended one more link outside. Figure 2 illustrates restricted environments of a TU. We estimate external similarity as the best matching of two restricted environments. To find the best matching, we first determine the correspondences between nodes in two restricted environments. Some nodes have several candidates of correspondence. For example, n7 corresponds with rn6 or m7. In this case, we select the most similar node. To do this, we assume that similarity values between nodes (words) are defined as numeric values between 0 and 1 in a thesaurus. When the best matching is found, we can calculate the matching point between two environments, mpoint(TU, WD).</Paragraph>
      <Paragraph position="4"> mpoint(TU, WD) = summation of similarity values between corresponding nodes in two restricted environments at the best matching. We use this value as a measure of similarity between two environments.</Paragraph>
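      <Paragraph position="14"> A brute-force sketch of mpoint, trying every one-to-one matching of the two restricted environments (the paper does not fix a search strategy, so exhaustive search is our assumption):

```python
from itertools import permutations

def mpoint(env_tu, env_wd, sim):
    """Best matching point between two restricted environments: the maximum
    total similarity over one-to-one matchings of their nodes."""
    small, large = sorted([list(env_tu), list(env_wd)], key=len)
    best = 0.0
    for matched in permutations(large, len(small)):
        best = max(best, sum(sim(a, b) for a, b in zip(small, matched)))
    return best
```

Mirroring the n7/m6/m7 case above: with sim(n7, m6) = 0.4 and sim(n7, m7) = 0.9 (illustrative values), the best matching picks m7.</Paragraph>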
      <Paragraph position="5"> Finally, we define the score of a translation unit, score(TU, WD).</Paragraph>
      <Paragraph position="6"> score(TU, WD) = size(TU) x (size(TU) + mpoint(TU, WD)) For example, we assume that the following similarity values are defined in a thesaurus.</Paragraph>
      <Paragraph position="7"> sim(\[book,n\], \[notebook,n\], 0.8).</Paragraph>
      <Paragraph position="8"> sim(\[buy,v\], \[read,v\], 0.5).</Paragraph>
      <Paragraph position="9"> sim(\[hon,n\], \[nouto,n\], 0.8).</Paragraph>
      <Paragraph position="10"> sim(\[kau,v\], \[yomu,v\], 0.5).</Paragraph>
      <Paragraph position="11"> Then the scores of the translation units in the previous section are the following.</Paragraph>
      <Paragraph position="13"/>
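      <Paragraph position="15"> The score formula, together with the similarity values assumed above, can be sketched as follows (the worked numbers in the comment are illustrative, not the paper's):

```python
# The similarity values assumed in the text, stored symmetrically.
SIM = {frozenset([("book", "n"), ("notebook", "n")]): 0.8,
       frozenset([("buy", "v"), ("read", "v")]): 0.5,
       frozenset([("hon", "n"), ("nouto", "n")]): 0.8,
       frozenset([("kau", "v"), ("yomu", "v")]): 0.5}

def sim(a, b):
    """Thesaurus lookup; identical nodes are maximally similar."""
    if a == b:
        return 1.0
    return SIM.get(frozenset([a, b]), 0.0)

def score_tu(size, mpoint):
    """score(TU, WD) = size(TU) x (size(TU) + mpoint(TU, WD))."""
    return size * (size + mpoint)

# E.g. a 2-node unit whose environments match with total similarity 0.5
# scores 2 x (2 + 0.5) = 5.0 (illustrative numbers).
```
</Paragraph>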
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Score of Matching Expression
</SectionTitle>
      <Paragraph position="0"> The score of a matching expression is defined as the following.</Paragraph>
      <Paragraph position="1"> score(ME, WD) = (the sum of score(TU, WD) over all TUs in ME) / size(WD)^2 For example, \[j1, \[r,j5, \[j15\]\]\]</Paragraph>
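      <Paragraph position="2"> A direct transcription of this formula (the argument values in the assertion below are illustrative):

```python
def score_me(tu_scores, wd_size):
    """score(ME, WD) = (sum over TUs in ME of score(TU, WD)) / size(WD)^2.
    tu_scores is the list of score(TU, WD) values for the TUs in the ME."""
    return sum(tu_scores) / wd_size ** 2
```

For instance, one translation unit scoring 8.0 in a 4-node WD gives 8.0 / 16 = 0.5.</Paragraph>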
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Score of Translation
</SectionTitle>
      <Paragraph position="0"> Finally, we define the score of a translation as the following.</Paragraph>
      <Paragraph position="1"> score(SWD, SME, TME, TWD) = min(score(SME, SWD), score(TME, TWD)) For example, the score of the translation in the previous section is 0.612.</Paragraph>
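      <Paragraph position="2"> The final combination is a direct minimum; a one-line sketch (the component scores in the assertion are illustrative, not taken from the paper):

```python
def score_translation(sme_score, tme_score):
    """score(SWD, SME, TME, TWD) = min(score(SME, SWD), score(TME, TWD)):
    a translation is only as good as its weaker side."""
    return min(sme_score, tme_score)
```
</Paragraph>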
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Examples
</SectionTitle>
    <Paragraph position="0"> The English verb eat corresponds to two Japanese verbs, taberu and okasu. For example, (4) The man eats vegetables.</Paragraph>
    <Paragraph position="1"> Hito ha yasai wo taberu.</Paragraph>
    <Paragraph position="2"> (5) Acid eats metal.</Paragraph>
    <Paragraph position="3"> San ha kinzoku wo okasu.</Paragraph>
    <Paragraph position="4"> Figure 3 shows translation outputs based on examples (4) and (5) by MBT2. MBT2 chooses taberu for he eats potatoes and okasu for sulfuric acid eats iron.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> Although MBT2 is not a full realization of Nagao's idea, it retains some merits of the original idea.</Paragraph>
    <Paragraph position="1"> 1. It is easy to modify the system.</Paragraph>
    <Paragraph position="2"> The knowledge of the system is in the form of translation examples and thesauri. We can modify the system by adding translation examples.</Paragraph>
    <Paragraph position="3"> 2. It can do high-quality translation.</Paragraph>
    <Paragraph position="4"> The system sees as wide a scope as possible in a sentence and uses the largest translation units, so it produces high-quality translations. 3. It can translate some metaphorical sentences. In the system, semantic information is not used as constraints. As a result, the system can translate some metaphorical sentences. Demerits or problems of the system are: 1. A great deal of computation is needed. 2. Can we make good thesauri? The first problem is not serious. Parallel computation or some heuristics will overcome it. But the second problem is serious. We have to study how to construct large thesauri.</Paragraph>
    <Paragraph position="5"> Acknowledgments The authors would like to thank Mort Webster for his proofreading.</Paragraph>
  </Section>
</Paper>