<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1062"> <Title>The Transfer Phase in an English-Japanese Translation System</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 PROCESSING OF VALENCES 2-1 BASIC SCHEME </SectionTitle> <Paragraph position="0"> The same syntactic forms in English (direct objects, prepositional phrases with specific prepositions, etc.) are often expressed by different syntactic forms in Japanese. There is obviously no one-to-one correspondence between the syntactic functions of the two languages, and therefore transforming from one language to the other based simply on syntactic functions is not sufficient.</Paragraph> <Paragraph position="1"> There are two essentially different solutions for avoiding this difficulty. One solution is to set up intermediate &quot;meaning&quot; representations, through which the surface forms of the two languages are related. This scheme has been repeatedly adopted, especially by AI-oriented researchers. The other one, which we adopted here, is the scheme called &quot;lexical unit oriented transfer&quot;, where many idiosyncratic phenomena specific to individual lexical units are treated by referring to the descriptions in the dictionaries. In this approach, the selection of target surface forms is performed largely depending on lexical descriptions in the Bi-lingual Dictionary (BD), without referring to universal semantic primitives or relations.</Paragraph> <Paragraph position="2"> The interface structure adopted by GETA is called a &quot;multi-level analysis tree&quot;, which is a kind of annotated tree where various kinds of information at several levels, such as syntactic functions (SF), logical relationships (RL), morpho-syntactic categories (K) etc., are attached to each node.</Paragraph> <Paragraph position="3"> Such annotation is expressed in the form of attribute-value pairs (At GETA, &quot;attributes&quot; such as SF, RL etc. are called &quot;variables&quot;. 
We follow this convention below.) Among the variables used at GETA, VL-i (i = 1, 2: valences) and RL play important roles in every stage of translation (Analysis, Transfer and Generation). The whole process can be schematized as follows.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> (Basic Scheme) </SectionTitle> <Paragraph position="0"> (1) The valences of each source predicate are described in the analysis dictionary by using VL-i. VL-i indicates what kind of surface syntactic form is required of the element which fills the i-th argument of the predicate. Suppose that the verb &quot;reach&quot; has the following valences.</Paragraph> <Paragraph position="1"> In the AS (Analyse Syntactique), the initial string of words is converted into an annotated tree structure by referring to these lexical descriptions (see Fig. 1).</Paragraph> <Paragraph position="2"> I reached a book for him.</Paragraph> </Section> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 VCL </SectionTitle> <Paragraph position="0"> Fig. 1 Result of Analysis (2) The TL replaces the source lexical units in the trees with corresponding target lexical units. The target units, especially target predicates, have their own valences, which show in what surface forms the i-th arguments should be generated. Because different valence structures such as (a), (b) and (c) above often lead to different selections of target equivalents, the valence information is checked during the lexical transfer (see Fig. 2). In some cases, simple source predicates are paraphrased by composite target structures as in Fig. 
3.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> THE TRANSFER PHASE IN A TRANSLATION SYSTEM 385 </SectionTitle> <Paragraph position="0"> From the above scheme, though it is over-simplified in many points, we can see that the surface forms of the two languages governed by predicates are almost directly associated with each other by the descriptions in the BD.</Paragraph> <Paragraph position="1"> Furthermore, one can consider that the valences of a predicate describe surface usage patterns of the predicate, and that the BD associates such usage patterns of source predicates with different target expressions. Because GETA's multi-level analysis trees preserve information of various levels as much as possible, we can also use information other than VL-i to enrich the specifications of usage patterns.</Paragraph> <Paragraph position="2"> For example, the usage pattern of &quot;take&quot;, &quot;take the initiative in -ing&quot;, can be specified by referring to VL-i of &quot;take&quot;, the morpho-syntactic category of ARG2 (gerund), the specific lexical unit &quot;initiative&quot;, etc., and this usage pattern as a whole will be associated with appropriate Japanese expressions.</Paragraph> <Paragraph position="3"> In this way, we can naturally transfer idiomatic, semi-idiomatic, semi-semi-idiomatic, ... expressions in the source into target ones. 
This facility is extremely important for language pairs like English and Japanese, where we can hardly expect one-to-one correspondence between lexical units and where, therefore, the selection of appropriate target units is one of the most difficult problems in the whole translation process.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 2-2 DISCUSSION </SectionTitle> <Paragraph position="0"> We adopted &quot;lexical unit oriented transfer&quot; or &quot;transfer based on usage patterns&quot; instead of using any intermediate meaning representations.</Paragraph> <Paragraph position="1"> It might be worthwhile mentioning our attitude toward the latter approach.</Paragraph> <Paragraph position="2"> The meaning representation approach seems very attractive, but the researchers in this framework have encountered a great number of difficulties in designing a complete set of semantic primitives by which subtle differences in the meanings of all lexical units can be expressed. As Boitet (2) pointed out, many systems often use source lexical units as primitives in their representation schemes, though they use certain &quot;universal&quot; sets of primitive relationships (Boitet (2) classified them as &quot;hybrid&quot; systems). However, even in such hybrid systems, determining a universal set of primitive relationships, deep cases for example, is quite problematic. Moreover, we doubt whether such relationships are really useful for generating target sentences.</Paragraph> <Paragraph position="3"> Without referring to the specific verbs &quot;enter&quot; and &quot;go&quot;, we can hardly explain why we say &quot;John enters the auditorium&quot; instead of &quot;John enters into the auditorium&quot;, while we say &quot;John goes into the auditorium&quot;. As for the deep semantic case, &quot;the auditorium&quot; plays the same role. 
The only difference is that &quot;enter&quot; incorporates the meaning of &quot;into&quot; in its meaning but &quot;go&quot; doesn't. Without semantic decompositions of verbs' meanings, we cannot establish any rules on deep cases which decide whether &quot;into&quot; is necessary or not without referring to specific verbs. If the rules refer to specific verbs, the names of deep cases are not significant, because the same deep case is interpreted differently depending on individual verbs. Why don't you use ARG1, ARG2 etc. instead of AGENT, INST etc.? The case relationships are not so powerful in selecting translation equivalents, either. If we don't use semantic primitives by which alone appropriate target equivalents can be selected, we have to refer to the surrounding contexts where the source units appear, in order to choose appropriate target equivalents. Why should we reduce rich structures such as multi-level analysis trees into poor ones? We don't claim that semantic cases are completely useless, but only claim that a single-level structure based on them is not rich enough to select appropriate target equivalents and that surface-level information is also useful to specify usage patterns (or &quot;contexts where lexical units appear&quot;).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 386 J. TSUJII 3 PROCESSING OF TENSE AND ASPECT 3-1 BASIC SCHEME </SectionTitle> <Paragraph position="0"> English and Japanese have, of course, their own grammatical devices to express tense and aspect. As for aspect, for example, English has basically two surface forms, &quot;Perfective&quot; and &quot;Progressive&quot;, while Japanese has the forms &quot;PREDicate+AUXiliaries&quot;, where AUX is a sequence of auxiliary verbs such as &quot;Teiru&quot;, &quot;Tsutsuaru&quot;, &quot;Kake+Teiru&quot; etc. 
However, we should carefully distinguish between these surface forms (Grammatical Aspects) and what is really expressed by them. In the transfer phase, we should select appropriate Japanese surface forms to express what is really expressed in English. In order to do this, we set up an intermediate representation level which is deeper than the surface level. The following five variables and their values are used for this purpose.</Paragraph> <Paragraph position="1"> are expressed by auxiliary verbs which follow the predicates. The values of JSASP are such auxiliaries. These values are realized as surface auxiliaries in the GS. In some cases, more than one auxiliary is needed to express the specified DASP (see below).</Paragraph> <Paragraph position="2"> (1) ESASP (grammatical aspect) is determined in the AS.</Paragraph> <Paragraph position="3"> (2) DASP is determined for the combination of ESASP and EASP (described in the dictionary for each English predicate - lexical aspect).</Paragraph> <Paragraph position="4"> (3) An appropriate Japanese equivalent for the English predicate is selected.</Paragraph> <Paragraph position="5"> (4) JSASP is determined based on DASP and JASP of the selected Japanese predicate. (5) Appropriate auxiliaries with adequate inflections are generated in the GS and GM.</Paragraph> <Paragraph position="6"> The above scheme and the detailed correspondence among the values are illustrated in Fig. 7 and Fig. 8, respectively. (Fig. 
8 shows only the subportion for &quot;progressive forms&quot;).</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 3-2 MODIFICATIONS IN THE BASIC SCHEME </SectionTitle> <Paragraph position="0"> The basic scheme can treat the following sentences (here we look at examples of English progressive forms).</Paragraph> <Paragraph position="2"/> <Paragraph position="4"> In these examples, the same grammatical aspect in English (the progressive) is realized in Japanese by using different grammatical aspects, depending on the lexical aspects of both the English and the Japanese predicates. Note that the same DASP (TDUR1) is expressed by different auxiliaries in (EX 1) and (EX 2), because the transitive and intransitive usages of &quot;to open&quot; correspond to the Japanese verbs &quot;Akeru&quot; and &quot;Hiraku&quot;, respectively, which have different lexical aspects (Hiraku + TEIRU expresses RES, which means &quot;the door is open&quot;).</Paragraph> <Paragraph position="5"> Though it seems to work well for relatively simple sentences, the scheme has been augmented in several points in order to treat more complicated sentences. We will give just two examples of such refinements below.</Paragraph> <Paragraph position="6"> (1) The basic scheme only gives default interpretations of DASP. That is, the interpretation given in Fig. 8 is adopted only if there is no evidence which recommends another interpretation. 
Occurrences of time adverbial phrases/clauses, for example, often change the interpretation.</Paragraph> <Paragraph position="7"> (EX 6) He has broken a box.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> (DASP:= COMP) </SectionTitle> <Paragraph position="0"> He has broken boxes for two hours.</Paragraph> <Paragraph position="1"> (DASP:= TDUR2) We currently distinguish four different types of such phrases/clauses (frequentative, durative, momentary and non-momentary), and, before the determination of DASP, a specially designed subgrammar is executed to classify the time adverbials into these types. The augmented scheme reflects the properties of such adverbials in determining DASP. Another example of evidence which shifts DASP is the occurrence of special adverbs such as &quot;ever&quot;, &quot;yet&quot;, &quot;already&quot; etc. (2) English to- and ing- clauses in predicate valences are expressed by subordinate clauses (SCL) in Japanese, and we should select appropriate surface aspectual forms for the SCL's which reflect the relative time orderings among the events described by the SCL's and the main clauses.</Paragraph> <Paragraph position="2"> (EX 7) I saw him walking in the garden. .... Arui-TEIRU .... Mi-TA.</Paragraph> <Paragraph position="3"> (to walk) (to see) DASP of &quot;he walks&quot; is TDUR1, because the events &quot;I see&quot; and &quot;he walks&quot; occur simultaneously. TDUR1 for &quot;Aruku (to walk)&quot; is expressed by &quot;TEIRU&quot;, according to the rules shown in Fig. 8. (EX 8) I remembered walking in the garden.</Paragraph> <Paragraph position="4"> .... Arui-TA --- Oboe-TEIRU.</Paragraph> <Paragraph position="5"> (to walk) (to remember) DASP of &quot;I walk&quot; is COMP, because it precedes &quot;I remember&quot; in time. (EX 9) I remember to walk in the garden.</Paragraph> <Paragraph position="6"> .... 
Aruku-null AUX .... Oboe-TEIRU.</Paragraph> <Paragraph position="7"> (to walk) (to remember) DASP of &quot;I walk&quot; is UNCOMP, because it has not been completed yet.</Paragraph> <Paragraph position="8"> In order to treat the above phenomena, the valences of predicates taking to- and/or ing- clauses as arguments are augmented with specifications of the DASP of the argument clauses, and based on these specifications, the same scheme as above selects the grammatical aspects of the Japanese SCL.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> 3-3 DISCUSSION </SectionTitle> <Paragraph position="0"> We emphasized in 2 the lexical-unit-oriented nature of the Transfer Phase and claimed that a universal set of case relations is not as useful as often claimed in the literature. In contrast, we set up a set of &quot;semantic&quot; (or deep) markers for processing aspectual expressions. Why? First of all, we should notice here that, although both EASP and JASP seem to describe the properties of the real world actions which are denoted by the verbs, they are just classifications of verbs based on their linguistic behaviours in each language. When we say that the Japanese verb &quot;shinu&quot; (to die) belongs to the class (I, R), we don't claim that the action denoted by &quot;shinu&quot; is a momentary action and always happens in physically null time, but only that the Japanese verb &quot;shinu&quot; linguistically behaves in a certain specific way. This becomes much clearer when we consider the verb &quot;hiraku&quot; (&quot;to open&quot; - intransitive use), which also belongs to (I, R). 
While the verb &quot;hiraku&quot; behaves in Japanese as an instantaneous verb, the corresponding English verb &quot;to open&quot; behaves as a non-momentary verb (NMOM).</Paragraph> <Paragraph position="1"> (Note also that, though &quot;hiraku&quot; is an instantaneous verb, we can express &quot;Temporal Duration of Action&quot; (TDUR1) by using the verb in (EX 2)). As such, the classifications given by EASP and JASP are essentially language-dependent and not universal ones.</Paragraph> <Paragraph position="2"> DASP, on the other hand, is somewhat universal. Within the scheme given in 3-1, we could omit this variable by directly associating surface expressions in the BD as we did in valence transfer. That is, we could associate etc.</Paragraph> <Paragraph position="3"> respectively. However, this direct association method cannot treat the various kinds of interactions, illustrated in 3-2, between DASP interpretation and the other linguistic expressions. We need a certain level of representation through which linguistic expressions of various parts interact. Without DASP, we cannot generalize, for example, the influence of time adverbials on aspectual interpretations.</Paragraph> <Paragraph position="4"> Though transferring aspectual expressions seems to be performed without referring to individual lexical units, there are several cases where we have to refer to them. This occurs when the verbs in the two languages have slightly different &quot;meanings&quot;. The English verb &quot;to drown&quot; can be roughly paraphrased as &quot;to die or kill by immersion in liquid&quot; and, as we can see, the meaning essentially contains the concept &quot;to die&quot; or &quot;to kill&quot;. &quot;To drown&quot; behaves linguistically in almost the same manner as &quot;to die&quot;. It belongs to the verb class NMOM (completive but non-momentary). The progressive form expresses IMF (immediate future) as shown in (EX 3). 
On the other hand, the Japanese translation equivalent &quot;oboreru&quot; denotes just the real world process of one's struggling in water not to drown, and behaves as a durative and non-resultative verb. Therefore, though the two sentences denote almost the same situation in the real world, they describe it from different points of view, and the DASP of (a) and (b) are IMF and TDUR1, respectively. The transfer process is illustrated in Fig. 9. This process refers to the individual lexical units, &quot;to drown&quot; and &quot;oboreru&quot;, and transfers &quot;drown+IMF&quot; into &quot;oboreru+TDUR1&quot; as a whole. This shows that, even in the process of aspect transfer, we need lexical-unit-oriented operations.</Paragraph> <Paragraph position="5"> Moreover, though we have talked until now as if EASP and JASP were specified for each lexical unit, the aspectual properties of predicates often change according to their usages. Therefore, they should be specified for each usage pattern, and aspect transfer should be integrated into valence transfer in 2.</Paragraph> </Section> </Paper>