<?xml version="1.0" standalone="yes"?> <Paper uid="J94-4004"> <Title>Machine Translation Divergences: A Formal Description and Proposed Solution</Title> <Section position="2" start_page="0" end_page="598" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> There are many cases in which the natural translation of one language into another results in a form very different from that of the original. The existence of translation divergences (i.e., cross-linguistic distinctions) makes the straightforward transfer from source structures into target structures impractical. This paper demonstrates that a systematic solution to the divergence problem can be derived from the formalization of two types of information: (1) the linguistically grounded classes upon which lexical-semantic divergences are based; and (2) the techniques by which lexical-semantic divergences are resolved. An important result of this formalization is the provision of a framework for proving that the lexical-semantic divergence classification proposed in the current approach covers all source-language/target-language distinctions based on lexical-semantic properties. Other types of divergences and mismatches are outside the scope of this paper; these include distinctions based on purely syntactic information, idiomatic usage, aspectual knowledge, discourse knowledge, domain knowledge, or world knowledge.¹ Although other translation approaches have attempted to account for divergences, the main innovation of the current approach is that it provides a formalization of these divergences and the techniques by which they are resolved.
* Department of Computer Science, University of Maryland, A. V. Williams Building, College Park, MD 20742, USA.
¹ The reader is referred to Dorr (1993a) for a discussion of how syntactic divergences are handled. Aspectual divergences are treated by Dorr (1992a). The relation of the current framework to other types of knowledge outside of lexical semantics is discussed by Dorr and Voss (1993b).</Paragraph> <Paragraph position="1"> (c) 1994 Association for Computational Linguistics. Computational Linguistics, Volume 20, Number 4.
(1) Thematic divergence: E: I like Mary ⇔ S: María me gusta a mí 'Mary pleases me'
(2) Promotional divergence: E: John usually goes home ⇔ S: Juan suele ir a casa 'John tends to go home'
(3) Demotional divergence: E: I like eating ⇔ G: Ich esse gern 'I eat likingly'
(4) Structural divergence: E: John entered the house ⇔ S: Juan entró en la casa 'John entered in the house'
(5) Conflational divergence: E: I stabbed John ⇔ S: Yo le di puñaladas a Juan 'I gave knife-wounds to John'
(6) Categorial divergence: E: I am hungry ⇔ G: Ich habe Hunger 'I have hunger'
(7) Lexical divergence: E: John broke into the room ⇔ S: Juan forzó la entrada al cuarto 'John forced (the) entry to the room'
Figure 1. Examples of translation divergences with respect to English, Spanish, and German.
This is advantageous from a computational point of view in that it facilitates the design and implementation of the system: the problem is clearly defined in terms of a small number of divergence categories, and the solution is systematically stated in terms of a uniform translation mapping and a handful of simple lexical-semantic parameters. In addition, the formalization allows one to evaluate the status of the system. For example, given the formal description of the interlingua and target-language root words, one is able to judge whether a particular target-language sentence fully covers the concept that underlies the corresponding source-language sentence. Finally, the formalization of the divergence types and the associated solution allows one to prove certain properties about the system. For example, one might want to determine whether the system is able to handle two or more simultaneous divergences that interact in some way.
With the mechanism of the current approach, one is able to prove formally that such cases are handled in a uniform fashion.</Paragraph> <Paragraph position="2"> This paper will focus on the problem of lexical-semantic divergences and will provide support for the view that it is possible to construct a finite cross-linguistic classification of divergences and to implement a systematic mapping between the interlingual representation and the surface syntactic structure that accommodates all of the divergences in this classification. The types of divergences under consideration are those shown in Figure 1. The first divergence type is thematic: in (1), the theme is realized as the verbal object (Mary) in English but as the subject (María) of the main verb in Spanish. The second divergence type, promotional, is one of two head switching divergence types: in (2), the modifier (usually) is realized as an adverbial phrase in English but as the main verb soler in Spanish. The third divergence type, demotional, is another type of head switching divergence: in (3), the word like is realized as a main verb in English but as an adverbial modifier (gern) in German.² The fourth divergence type is structural: in (4), the verbal object is realized as a noun phrase (the house) in English and as a prepositional phrase (en la casa) in Spanish. The fifth divergence type is conflational. Conflation is the incorporation of necessary participants (or arguments) of a given action. In (5), English uses the single word stab for the two Spanish words dar (give) and puñaladas (knife-wounds); this is because the effect of the action (i.e., the knife-wounds portion of the lexical token) is conflated into the main verb in English. The sixth divergence type is categorial: in (6), the predicate is adjectival (hungry) in English but nominal (Hunger) in German. Finally, the seventh divergence type is a lexical divergence: in (7), the event is lexically realized as the main verb break in English but as a different verb forzar (literally force) in Spanish.
² The distinction between promotional and demotional divergences is not intuitively obvious at first glance. In both (2) and (3), the translation mapping associates a main verb with an adverbial satellite, or vice versa (i.e., in (2), the main verb soler is associated with the adverbial satellite usually, and in (3) the main verb like is associated with the adverbial satellite gern). The distinction between these two head switching cases will be made clearer in Section 4.3.
Bonnie J. Dorr Machine Translation Divergences</Paragraph> <Paragraph position="3"> The next section discusses the divergence classification given above, comparing the current divergence categories with those of other researchers. Section 3 formally defines the terms used to classify divergences. Section 4 uses this terminology to formalize the divergence classification and to define the solution to the divergence problem in the context of detailed examples. Finally, Section 5 discusses certain issues of relevance to the divergence problem, including the resolution of several (recursively) interacting divergence types.</Paragraph> <Paragraph position="4"> 2. Classification of Machine Translation Divergences
The divergence problem in machine translation has received increasing attention in recent literature (see, for example, Barnett et al. 1991a, 1991b; Beaven 1992a, 1992b; Dorr 1990a, 1990b; Kameyama et al. 1991; Kinoshita, Phillips, and Tsujii 1992; Lindop and Tsujii 1991; Tsujii and Fujita 1991; Whitelock 1992; related discussion can also be found in work by Melby [1986] and Nirenburg and Nirenburg [1988]). In particular, Barnett et al. (1991a) divide distinctions between the source language and the target language into two categories: translation divergences, in which the same information is conveyed in the source and target texts, but the structures of the sentences are different (as in previous work by Dorr [1990a, 1990b]); and translation mismatches, in which the information that is conveyed is different in the source and target languages (as described by Kameyama et al. [1991]).³ Although translation mismatches are a major problem that translation systems must address, they are outside the scope of the model presented here. (See Barnett et al. 1991a, 1991b; Carbonell and Tomita 1987; Meyer, Onyshkevych, and Carlson 1990; Nirenburg, Raskin, and Tucker 1987; Nirenburg and Goodman 1990; Nirenburg and Levin 1989; Wilks 1973; among others, for descriptions of interlingual machine translation approaches that take into account knowledge outside of the domain of lexical semantics.) Although researchers have only recently begun to classify divergence types systematically, the notion of translation divergences is not a new one in the machine translation community. For example, a number of researchers working on the Eurotra project have sought to solve divergent source-to-target translations, although the divergences were named differently and were resolved by construction-specific transfer rules. (For cogent descriptions of the Eurotra project, see, for example, Arnold and des Tombe 1987; Copeland et al. 1991; and Johnson, King, and des Tombe 1985.)</Paragraph> </Section></Paper>