<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1024"> <Title>Machine Translation Using Abductive Inference</Title> <Section position="1" start_page="0" end_page="145" type="abstr"> <SectionTitle> SRI International </SectionTitle> <Paragraph position="0"> Machine Translation and World Knowledge.</Paragraph> <Paragraph position="1"> Many existing approaches to machine translation take for granted that the information presented in the output is found somewhere in the input, and, moreover, that such information should be expressed at a single representational level, say, in terms of parse trees or of &quot;semantic&quot; assertions. Languages, however, not only express equivalent information by drastically different linguistic means, but also often disagree about which distinctions should be expressed linguistically at all. For example, in translating from Japanese to English, it is often necessary to supply determiners for noun phrases, and this in general cannot be done without a deep understanding of the source text. Similarly, in translating from English to Japanese, politeness considerations, which in English are implicit in the social situation and explicit only in very diffuse ways, for example in the heavy use of hypotheticals, must be realized grammatically in Japanese.</Paragraph> <Paragraph position="2"> Machine translation therefore requires that the appropriate inferences be drawn and that the text be interpreted to some depth. Recently, an elegant approach to inference in discourse interpretation has been developed at a number of sites (e.g., Charniak and Goldman, 1988; Hobbs et al., 1990; Norvig, 1987), all based on the notion of abduction, and we have begun to explore its potential application to machine translation. We argue that this approach provides the possibility of deep reasoning and of mapping between the languages at a variety of levels.1
1The authors have profited from discussions about this work with Stu Shieber, Mark Stickel, and the participants in the Translation Group at CSLI. The research was funded by the Defense Advanced Research Projects Agency under Office of Naval Research contract N00014-85-C-0013, and by a gift from the Systems Development Foundation.</Paragraph> <Paragraph position="3"> Interpretation as Abduction. Abductive inference is inference to the best explanation. The easiest way to understand it is to compare it with two words it rhymes with--deduction and induction. Deduction is when from a specific fact p(A) and a general rule (∀x) p(x) ⊃ q(x) we conclude q(A). Induction is when from a number of instances of p(A) and q(A), and perhaps other factors, we conclude (∀x) p(x) ⊃ q(x). Abduction is the third possibility. It is when from q(A) and (∀x) p(x) ⊃ q(x) we conclude p(A). Think of q(A) as some observational evidence, of (∀x) p(x) ⊃ q(x) as a general law that could explain the occurrence of q(A), and of p(A) as the hidden, underlying specific cause of q(A). Much of the way we interpret the world in general can be understood as a process of abduction.</Paragraph> <Paragraph position="4"> When the observational evidence, the thing to be interpreted, is a natural language text, we must provide the best explanation of why the text would be true.</Paragraph> <Paragraph position="5"> In the TACITUS Project at SRI, we have developed a scheme for abductive inference that yields a significant simplification in the description of interpretation processes and a significant extension of the range of phenomena that can be captured. It has been implemented in the TACITUS System (Hobbs et al., 1990; Stickel, 1989) and has been applied to several varieties of text. The framework suggests the integrated treatment of syntax, semantics, and pragmatics described below. Our principal aim in this paper is to examine the utility of this framework as a model for translation.</Paragraph>
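<Paragraph> For concreteness, the following minimal sketch shows abduction as backward chaining that is allowed to assume what it cannot prove; it is an illustration only, not the TACITUS scheme, which attaches weights and costs to assumptions and searches for the cheapest proof (Stickel, 1989).

def unify(pattern, literal):
    # Match a rule pattern like ("q", "X") against a ground literal.
    if pattern[0] != literal[0] or len(pattern) != len(literal):
        return None
    return dict(zip(pattern[1:], literal[1:]))

def substitute(pattern, binding):
    return (pattern[0],) + tuple(binding.get(arg, arg) for arg in pattern[1:])

def explain(goal, facts, rules):
    # Return a set of assumptions that, together with the facts and rules,
    # accounts for the goal literal (inference to the best explanation).
    if goal in facts:
        return set()                       # already known: nothing to assume
    for antecedent, consequent in rules:   # consequent holds if antecedent does
        binding = unify(consequent, goal)
        if binding is not None:
            # Explain the observation by explaining its possible cause.
            return explain(substitute(antecedent, binding), facts, rules)
    return {goal}                          # no fact or rule applies: assume it

facts = set()                              # no specific prior knowledge
rules = [(("p", "X"), ("q", "X"))]         # general law: p(x) implies q(x)

# To explain the observation q(A), the reasoner assumes the hidden cause p(A).
print(explain(("q", "A"), facts, rules))   # {('p', 'A')}

Deduction would run the same rule forward, from p(A) to q(A); what makes the step abductive is that p(A) is only assumed and must compete with other possible explanations. </Paragraph>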
<Paragraph position="6"> In the abductive framework, what the interpretation of a sentence is can be described very concisely. To interpret a sentence:
(1) Prove the logical form of the sentence,
    together with the constraints that predicates impose on their arguments,
    allowing for coercions,
    Merging redundancies where possible,
    Making assumptions where necessary.</Paragraph> <Paragraph position="7"> By the first line we mean &quot;prove, from the predicate calculus axioms in the knowledge base, the logical form that has been produced by syntactic analysis and semantic translation of the sentence.&quot; In a discourse situation, the speaker and hearer both have their sets of private beliefs, and there is a large overlapping set of mutual beliefs. An utterance stands with one foot in mutual belief and one foot in the speaker's private beliefs. It is a bid to extend the area of mutual belief to include some private beliefs of the speaker's. It is anchored referentially in mutual belief, and when we prove the logical form and the constraints, we are recognizing this referential anchor. This is the given information, the definite, the presupposed. Where it is necessary to make assumptions, the information comes from the speaker's private beliefs, and hence is the new information, the indefinite, the asserted. Merging redundancies is a way of getting a minimal, and hence a best, interpretation.</Paragraph> <Paragraph position="8"> An Example. This characterization, elegant though it may be, would be of no interest if it did not lead to the solution of the discourse problems we need to have solved. A brief example will illustrate that it indeed does.</Paragraph> <Paragraph position="9"> (2) The Tokyo office called.</Paragraph> <Paragraph position="10"> This example illustrates three problems in &quot;local pragmatics&quot;: the reference problem (What does &quot;the Tokyo office&quot; refer to?), the compound nominal interpretation problem (What is the implicit relation between Tokyo and the office?), and the metonymy problem (How can we coerce from the office to the person at the office who did the calling?).</Paragraph> <Paragraph position="11"> Let us put these problems aside and interpret the sentence according to characterization (1). The logical form is something like
(3) (∃e, x, o, t) call'(e, x) ∧ person(x) ∧ rel(x, o) ∧ office(o) ∧ nn(t, o) ∧ Tokyo(t)
That is, there is a calling event e by a person x related somehow (possibly by identity) to the explicit subject of the sentence o, which is an office and bears some unspecified relation nn to t, which is Tokyo.</Paragraph> <Paragraph position="12"> Suppose our knowledge base consists of the following facts: we know that there is a person John (J) who works for O, which is an office in Tokyo (T).</Paragraph> <Paragraph position="13"> (4) person(J), work-for(J, O), office(O), in(O, T), Tokyo(T)
Suppose we also know that work-for is a possible coercion relation,
(5) (∀x, y) work-for(x, y) ⊃ rel(x, y)
and that in is a possible implicit relation in compound nominals,
(6) (∀y, z) in(y, z) ⊃ nn(z, y)
Then the proof of all but the first conjunct of (3) is straightforward. We thus assume (∃e) call'(e, J), and this constitutes the new information.</Paragraph>
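<Paragraph> For concreteness, the following hand-coded sketch (an illustration of the bookkeeping only, not the TACITUS proof procedure) encodes the facts (4) and axioms (5) and (6) and checks the conjuncts of (3) under the binding x = J, o = O, t = T; the one conjunct that cannot be proved is the calling event, which is therefore assumed.

facts = {
    ("person", "J"), ("work-for", "J", "O"),
    ("office", "O"), ("in", "O", "T"), ("Tokyo", "T"),
}

# Axiom (5): work-for is a possible coercion relation rel.
facts |= {("rel", f[1], f[2]) for f in list(facts) if f[0] == "work-for"}
# Axiom (6): in is a possible implicit compound-nominal relation nn.
facts |= {("nn", f[2], f[1]) for f in list(facts) if f[0] == "in"}

# Logical form (3) under the candidate binding x=J, o=O, t=T.
logical_form = [
    ("call'", "E", "J"),   # calling event E by x
    ("person", "J"),       # selectional constraint on the caller
    ("rel", "J", "O"),     # x coercible from the explicit subject o
    ("office", "O"),
    ("nn", "T", "O"),      # implicit relation between Tokyo and the office
    ("Tokyo", "T"),
]

proved  = [lit for lit in logical_form if lit in facts]
assumed = [lit for lit in logical_form if lit not in facts]
print("proved: ", proved)   # everything except the calling event
print("assumed:", assumed)  # [("call'", 'E', 'J')] -- the new information

Finding the binding itself, rather than positing it, is the job of the abductive proof search; the point of the sketch is only that the unprovable residue is exactly the asserted content. </Paragraph>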
<Paragraph position="14"> Notice now that all of our local pragmatics problems have been solved. &quot;The Tokyo office&quot; has been resolved to O. The implicit relation between Tokyo and the office has been determined to be the in relation. &quot;The Tokyo office&quot; has been coerced into &quot;John, who works for the Tokyo office.&quot; This is of course a simple example. More complex examples and arguments are given in Hobbs et al. (1990). A more detailed description of the method of abductive inference, particularly the system of weights and costs for choosing among possible interpretations, is given in that paper and in Stickel (1989).</Paragraph> <Paragraph position="15"> The Integrated Framework. The idea of interpretation as abduction can be combined with the older idea of parsing as deduction (Kowalski, 1980, pp. 52-53). Consider a grammar written in Prolog style just big enough to handle sentence (2):</Paragraph> <Paragraph position="16"> (7) (∀i, j, k) np(i, j) ∧ v(j, k) ⊃ s(i, k)
(8) (∀i, j, k, l) det(i, j) ∧ n(j, k) ∧ n(k, l) ⊃ np(i, l)</Paragraph> <Paragraph position="17"> That is, if we have a noun phrase from &quot;inter-word point&quot; i to point j and a verb from j to k, then we have a sentence from i to k, and similarly for rule (8).</Paragraph> <Paragraph position="18"> We can integrate this with our abductive framework by moving the various pieces of expression (3) into these rules for syntax, as follows:
(9) (∀i, j, k, e, x, y, p) np(i, j, y) ∧ v(j, k, p) ∧ p'(e, x) ∧ Req(p, x) ∧ rel(x, y) ⊃ s(i, k, e)
That is, if we have a noun phrase from i to j referring to y and a verb from j to k denoting predicate p, if there is an eventuality e which is the condition of p being true of some entity x (this corresponds to call'(e, x) in (3)), if x satisfies the selectional requirement p imposes on its argument (this corresponds to person(x)), and if x is somehow related to, or coercible from, y, then there is an interpretable sentence from i to k describing eventuality e.</Paragraph> <Paragraph position="19"> (10) (∀i, j, k, l, w1, w2, z, y) det(i, j, the) ∧ n(j, k, w1) ∧ n(k, l, w2) ∧ w1(z) ∧ w2(y) ∧ nn(z, y) ⊃ np(i, l, y)
That is, if there is the determiner &quot;the&quot; from i to j, a noun from j to k denoting predicate w1, and another noun from k to l denoting predicate w2, if there is a z that w1 is true of and a y that w2 is true of, and if there is an nn relation between z and y, then there is an interpretable noun phrase from i to l denoting y. These rules incorporate the syntax in the literals like v(j, k, p), the pragmatics in the literals like p'(e, x), and the compositional semantics in the way the pragmatics expressions are constructed out of the information provided by the syntactic expressions.</Paragraph> <Paragraph position="20"> To parse with a grammar in the Prolog style, we prove s(0, N), where N is the number of words in the sentence. To parse and interpret in the integrated framework, we prove (∃e) s(0, N, e).</Paragraph> <Paragraph position="21"> An appeal of such declarative frameworks is their usability for generation as well as interpretation (Shieber, 1988). Axioms (9) and (10) can be used for generation as well. In generation, we are given an eventuality E, and we need to find a sentence with some number n of words that describes it. Thus, we need to prove (∃n) s(0, n, E). Whereas in interpretation it is the new information that is assumed, in generation it is the terminal nodes, like v(j, k, p), that are assumed. Assuming them constitutes uttering them.</Paragraph>
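<Paragraph> To make the parsing-as-deduction reading concrete, the following small sketch treats rules (7) and (8) as inferences over inter-word points and proves s(0, 4) for sentence (2); it is an illustration only, and the integrated rules (9) and (10) would thread the pragmatics literals through exactly this derivation.

# Parsing as deduction over inter-word points, for the toy grammar (7)-(8):
#   (7)  np(i, j) and v(j, k)            imply  s(i, k)
#   (8)  det(i, j), n(j, k) and n(k, l)  imply  np(i, l)
words = ["the", "Tokyo", "office", "called"]
LEX = {"the": "det", "Tokyo": "n", "office": "n", "called": "v"}

def terminal(cat, i, j):
    # Terminal literal cat(i, j): the word between points i and j has category cat.
    return j == i + 1 and LEX.get(words[i]) == cat

def np(i, l):                              # rule (8)
    return any(terminal("det", i, j) and terminal("n", j, k) and terminal("n", k, l)
               for j in range(i + 1, l) for k in range(j + 1, l))

def s(i, k):                               # rule (7)
    return any(np(i, j) and terminal("v", j, k) for j in range(i + 1, k))

print(s(0, len(words)))                    # True: "the Tokyo office called" parses

Run in the generation direction, the same rules are used with the eventuality given and the terminal literals assumed rather than looked up, which is what &quot;assuming them constitutes uttering them&quot; amounts to. </Paragraph>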
<Paragraph position="22"> Translation is a matter of interpreting in the source language (say, English) and generating in the target language (say, Japanese). Thus, it can be characterized as proving, for a sentence with N words, the expression
(11) (∃e, n) sE(0, N, e) ∧ sJ(0, n, e)
where sE is the root node of the English grammar and sJ is the root node of the Japanese.</Paragraph> <Paragraph position="23"> Actually, this is not quite true. Missing in the logical form in (3) and in the grammar of (9) and (10) are the &quot;relative mutual identifiability&quot; relations that are encoded in the syntactic structure of sentences. For example, the office in (2) should be mutually identifiable once Tokyo is identified. In the absence of these conditions, the generation conjunct of (11) only says to express something true of e, not something that will enable the hearer to identify it. Nevertheless, the framework as developed so far will allow us to address some nontrivial problems in translation.</Paragraph> <Paragraph position="24"> This point exhibits a general problem in translation, machine or human, namely, how literal a translation should be. We may think of this as a scale. At one pole is what our current formalization yields--a translation that merely says something true about the eventuality asserted in the source sentence. At the other pole is a translation that translates explicitly every property that is explicit in the source sentence. Our translation below of example (2) lies somewhere in between these two poles. Ideally, the translation should be one that will lead the hearer to the same underlying situation as an interpretation. It is not yet clear how this can be specified formally.</Paragraph> <Paragraph position="25"> The Example Translated. An idiomatic translation of sentence (2) is
(12) Tokyo no office kara denwa ga arimashita.
Tokyo's office from call Subj existed-Polite
There was a call from the Tokyo office.</Paragraph> <Paragraph position="34"> A toy grammar plus pragmatics for Japanese, corresponding to the grammar of (9)-(10), is as follows:2
2For simplicity in this example, we are assuming the words of the sentences are given; in practice, this can be carried down to the level of characters.
(14) (∀i, j, k, l, e, p) pp(i, j, e) ∧ pp(j, k, e) ∧ v(k, l, p) ∧ p(e) ⊃ s(i, l, e)
(15) (∀i, j, k, x, e, part) np(i, j, x) ∧ particle(j, k, part) ∧ part(x, e) ⊃ pp(i, k, e)
(16) (∀i, j, k, l, x, y) np(i, j, y) ∧ particle(j, k, no) ∧ np(k, l, x) ∧ no(y, x) ⊃ np(i, l, x)
(17) (∀i, j, w, x) n(i, j, w) ∧ w(x) ⊃ np(i, j, x)
Here pp(i, j, e) means that there is a particle phrase from i to j with the missing argument e, and part is a particle and the predicate it encodes.</Paragraph> <Paragraph position="26"> If we are going to translate between the two languages, we need axioms specifying the transfer relations. Let us suppose &quot;denwa&quot; is lexically ambiguous between the telephone instrument denwa1 and the calling event denwa2. This can be encoded in two axioms, (18) and (19).</Paragraph> <Paragraph position="28"> Lexical disambiguation occurs as a byproduct of interpretation in this framework, when the proof of the logical form uses one or the other of these axioms.</Paragraph> <Paragraph position="29"> &quot;Denwa ga aru&quot; is an idiomatic way of expressing a calling event in Japanese. This can be expressed by a further axiom, (20).</Paragraph> <Paragraph position="31"> The agent of a calling event is also its source (axiom 21).</Paragraph> <Paragraph position="33"> We will also need world-knowledge axioms, (22) and (23), that coarsen the granularity of the source: if John is in Tokyo when he calls, then Tokyo as well as John is the source.</Paragraph> <Paragraph position="35"> Finally, we will need axioms specifying the equivalence of the particle &quot;kara&quot; with the deep case Source,
(24) (∀y, e) Source(y, e) ≡ kara(y, e)
and the equivalence between the particle &quot;no&quot; and the implicit relation in English compound nominals,
(25) (∀x, y) nn(x, y) ≡ no(x, y)
Note that these &quot;transfer&quot; axioms encode world knowledge (22 and 23), lexical ambiguities (18 and 19), direct relations between the two languages (20 and 25), and relations between the languages and deep &quot;interlingual&quot; predicates (21 and 24).</Paragraph>
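<Paragraph> The transfer axioms (18)-(23) themselves are not reproduced above, so the following sketch is only schematic: it illustrates the claim that lexical disambiguation falls out of the proof, by preferring the sense of &quot;denwa&quot; whose conditions are already established in the interpretation and therefore require no new assumptions.

SENSES = {
    "denwa1": ("telephone-instrument",),   # the physical phone
    "denwa2": ("call-event",),             # the calling eventuality
}

def disambiguate(word_senses, context_facts):
    # Pick the sense whose conditions require the fewest new assumptions.
    def cost(sense):
        return sum(1 for lit in word_senses[sense] if lit not in context_facts)
    return min(word_senses, key=cost)

# Interpreting (12): the English half of proof (11) has already established
# a calling event, so that fact is part of the interpretation so far.
context = {"call-event"}
print(disambiguate(SENSES, context))       # denwa2

The same minimality criterion extends to the choice among transfer axioms at different levels: a proof may use a superficial particle-level equivalence such as (25) or a deep interlingual relation such as (24), whichever yields the cheaper overall proof. </Paragraph>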
<Paragraph position="36"> The proof of expression (11), using the English grammar of (9)-(10), the knowledge base of (4)-(6), the Japanese grammar and lexicon of (14)-(19), and the transfer axioms of (20)-(25), is shown in Figure 1. Boxes are drawn around the expressions that need to be assumed, namely, the new information in the interpretation and the occurrence of lexical items in the generation. The axioms occur at a variety of levels, from the very superficial (axiom 25), to very language-pair-specific transfer rules (axiom 20), to deep relations at the interlingual level (axioms 21-24). This approach thus permits mixing in one framework both transfer and interlingual approaches to translation. One can state transfer rules between two languages at various levels of linguistic abstraction, and between different levels of the respective languages. Such freedom in transfer is exactly what is needed for translation, especially for such typologically dissimilar languages as English and Japanese. It is thus possible to build a single system for translating among more than two languages in this framework, incorporating the labor savings of interlingual approaches while allowing the convenient specificities of transfer approaches.</Paragraph>
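<Paragraph> As a purely schematic companion to Figure 1 (not reproduced here), the sketch below mimics the generation half of (11): it walks top-down through the Japanese rules (14)-(17), records the terminal nodes that would have to be assumed, and reads the assumed terminals off as the uttered sentence (12). The lexical table and the event description are stand-ins for what the transfer axioms and the interpretation of the English sentence would actually supply.

E = {
    "type": "call-event",                  # denwa2, via the lexical axioms (18)-(19)
    "source": ["Tokyo", "office"],         # coarsened source, axioms (21)-(23)
}

def gen_np(nouns, assumed):
    # Rules (16)-(17): noun ("no" noun)* denoting the last noun's referent.
    words = []
    for i, noun in enumerate(nouns):
        if i > 0:
            words.append("no")             # particle "no" encodes nn, axiom (25)
        assumed.append(("n", noun))        # assume the terminal node n(i, j, noun)
        words.append(noun)
    return words

def gen_s(event, assumed):
    # Rule (14): pp pp v, both particle phrases sharing the eventuality e.
    words = gen_np(event["source"], assumed) + ["kara"]   # Source particle, axiom (24)
    assumed.append(("particle", "kara"))
    if event["type"] != "call-event":
        raise NotImplementedError("only the calling-event idiom is sketched")
    words += gen_np(["denwa"], assumed) + ["ga"]          # the "denwa ga aru" idiom, axiom (20)
    assumed.append(("particle", "ga"))
    assumed.append(("v", "arimashita"))                   # polite past of "aru"
    return words + ["arimashita"]

assumed = []
print(" ".join(gen_s(E, assumed)))         # Tokyo no office kara denwa ga arimashita

In the framework itself these choices are made by the proof of (11), with politeness, viewpoint, and the alternative renderings discussed below arising from further axioms rather than from a fixed template. </Paragraph>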
<Paragraph position="37"> We should note that other translations for sentence (2) are possible in different contexts. Two other possibilities are the following:
(26) Tokyo no office ga denwa shimashita.
Tokyo's office Subj call did-Polite
The Tokyo office made [a/the] call.
(27) Tokyo no office kara no denwa ga arimashita.
Tokyo's office from's call Subj existed-Polite
There was the call from the Tokyo office (that we were expecting).</Paragraph> <Paragraph position="40"> The difference between (12) and (26) is the speaker's viewpoint. The speaker takes the receiver's viewpoint in (12), while it is neutral between the caller and the receiver in (26). (27) is a more specific version of (12) in which the call is mutually identifiable. All of (12), (26), and (27) are polite, with the suffix &quot;-masu&quot;. Non-polite variants are also possible translations.</Paragraph> <Paragraph position="41"> On the other hand, in the following sentence
(28) Tokyo no office kara denwa shimashita.
Tokyo's office from call did-Polite
[ ] made [a/the] call from the Tokyo office.
there is a strong inference that the caller is the speaker or someone else who is very salient in the current context. The use of &quot;shimashita&quot; (&quot;did&quot;) in (26) and (28) indicates the description, from a neutral point of view, of an event of some agent in the Tokyo office causing a telephone call to occur at the recipient's end. This neutral point of view is expressed in (26). In (28), the subject is omitted and hence must be salient, and consequently the sentence is told from the caller's point of view. In (12), &quot;arimashita&quot; (&quot;existed&quot;) is used, and since the telephone call exists primarily, or only, at the recipient's end, it is assumed that the speaker, at least in point of view, is at the receiver's end.</Paragraph> <Paragraph position="44"> Although we have not done it here, it looks as though these kinds of considerations can be formalized in our framework as well.</Paragraph> <Paragraph position="45"> Hard Problems. If a new approach to machine translation is to be compelling, it must show promise of being able to handle some of the hard problems. We have identified four especially hard problems in translating between English and Japanese.</Paragraph> <Paragraph position="46"> 1. The lexical differences (that occur between any two languages).</Paragraph> <Paragraph position="47"> 2. Honorifics.</Paragraph> <Paragraph position="48"> 3. Definiteness and number.</Paragraph> <Paragraph position="49"> 4. The context-dependent &quot;information structure&quot;. The last of these includes the use of &quot;wa&quot; versus &quot;ga&quot;, the order of noun phrases, and the omission of arguments.</Paragraph> <Paragraph position="50"> These are the areas where one language's morphosyntax requires distinctions that are only implicit in the commonsense knowledge or context in the other language. Such problems cannot be handled by existing sentence-by-sentence translation systems without unnecessarily complicating the representations for each language.</Paragraph> <Paragraph position="51"> In this short paper, we can only give the briefest indication of why we think our framework will be productive in investigating the first three of these problems.</Paragraph> <Paragraph position="52"> Lexical Differences. Lexical differences, where they can be specified precisely, can be encoded axiomatically. Information required for supplying Japanese numeral classifiers can be specified similarly. Thus the equivalence between the English &quot;two trees&quot; and the Japanese &quot;ni hon no ki&quot; can be captured by axioms relating the English numeral &quot;two&quot; to the Japanese &quot;ni&quot; and supplying the classifier &quot;hon&quot;, used for long thin objects such as trees.</Paragraph>
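<Paragraph> As an illustration (the paper's own axioms for this example are not reproduced above), the classifier information can be thought of as a mapping from conceptual properties of the noun's referent to a counter word; the table below is an assumed toy lexicon, not part of the paper's formalism.

NUMERALS   = {1: "ichi", 2: "ni", 3: "san"}
CLASSIFIER = {                             # chosen by a conceptual feature of the noun
    "long-thin": "hon",                    # trees, pencils, bottles
    "flat":      "mai",                    # sheets of paper, plates
    "small":     "ko",                     # small discrete objects
}
NOUNS = {"tree": ("ki", "long-thin"), "paper": ("kami", "flat")}

def japanese_np(count, english_noun):
    noun, shape = NOUNS[english_noun]
    return NUMERALS[count] + " " + CLASSIFIER[shape] + " no " + noun

print(japanese_np(2, "tree"))              # ni hon no ki  ("two trees")

The point is only that the choice of classifier depends on conceptual properties of the referent, information the knowledge base must already contain for interpretation. </Paragraph>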
<Paragraph position="54"> Honorifics. Politeness is expressed in very different ways in English and Japanese. In Japanese it is grammaticized and lexicalized in sometimes very elaborate ways in the form of honorifics. One might think that the problem of honorifics does not arise in most practical translation tasks, such as translating computer manuals: English lacks honorifics, and in Japanese technical literature they are conventionalized. But if we are translating business letters, this aspect of language becomes very important. It is realized in English, but in a very different way. When one is writing to one's superiors, there is, for example, much more embedding of requests in hypotheticals. Consider for example the following English sentence and its most idiomatic translation:
Would it perhaps be possible for you to lend me your book?
Go-hon o kashite-itadak-e-masu ka.
Honorific-book Obj lending-receive-can-Polite?
In Japanese, the object requested is preceded by the honorific particle &quot;go&quot;, &quot;itadak-&quot; is a verb used for receiving by a lower-status person from a higher-status person, and &quot;-masu&quot; is a politeness ending for verbs. In English, by contrast, the speaker embeds the request in various modals, &quot;would&quot;, &quot;perhaps&quot;, and &quot;possible&quot;, and uses a more formal register than normal, in his choice, for example, of &quot;perhaps&quot; rather than &quot;maybe&quot;.</Paragraph> <Paragraph position="55"> The facts about the use of honorifics can be encoded axiomatically, with predicates such as HigherStatus, where this information is known. Since all knowledge in this framework is expressed uniformly in predicate calculus axioms, it is straightforward to combine information from different &quot;knowledge sources&quot;, such as syntax and the speech act situation, into single rules. It is therefore relatively easy to write axioms that, for example, restrict the use of certain verbs depending on the relative status of the agent and object, or of the speaker and hearer. For example, &quot;to give&quot; is translated into the Japanese verb &quot;kudasaru&quot; if the giver is of higher status than the recipient, but into the verb &quot;sashiageru&quot; if the giver is of lower status. Similarly, the grammatical fact about the use of the suffix &quot;-masu&quot; and the fact about the speech act situation that the speaker wishes to be polite may also be expressed in the same axiom.</Paragraph> <Paragraph position="56"> Definiteness and Number. The definiteness and number problem is illustrated by the fact that the Japanese word &quot;ki&quot; can be translated into &quot;the tree&quot; or &quot;a tree&quot; or &quot;the trees&quot; or &quot;trees&quot;. It is not so straightforward to deal with this problem axiomatically. Nevertheless, our framework, based as it is on deep interpretation and on the distinction between given and new information, provides us with what we need to begin to address the problem. A first approximation of a method for translating Japanese NPs into English NPs is as follows:
1. Resolve deep; that is, find the referent of the Japanese NP.</Paragraph> <Paragraph position="57"> 2. Does the Japanese NP refer to a set of two or more? If so, translate it as a plural; otherwise, as a singular.</Paragraph> <Paragraph position="58"> 3. Is the entity (or set) &quot;mutually identifiable&quot;? If so, translate it as a definite; otherwise, as an indefinite.</Paragraph> <Paragraph position="59"> &quot;Mutually identifiable&quot; means, first of all, that the description provided by the Japanese NP is mutually known, and secondly that there is a single most salient such entity. &quot;Most salient&quot; means that there are no other equally high-ranking interpretations of the Japanese sentence that resolve the NP in some other way. (Generic definite noun phrases are beyond the scope of this paper.)</Paragraph>
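<Paragraph> The three-step approximation above can be written down directly as a decision procedure. In the sketch below, the resolved referent set and the mutual-identifiability test are placeholders for what deep interpretation and the discourse model would supply.

def english_np(noun, referent_set, mutually_identifiable):
    # referent_set: the entities the (resolved) Japanese NP refers to.
    plural   = len(referent_set) >= 2                    # step 2: number
    definite = mutually_identifiable(referent_set)       # step 3: definiteness
    head     = (noun + "s") if plural else noun          # crude morphology
    article  = "the" if definite else ("" if plural else "a")
    return (article + " " + head).strip()

# "ki" resolved to a single, mutually known tree vs. some new trees:
print(english_np("tree", {"tree37"}, lambda s: True))     # the tree
print(english_np("tree", {"t1", "t2"}, lambda s: False))  # trees

Step 1, resolving the referent, is where the real work lies; it is exactly the abductive interpretation described earlier. </Paragraph>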
<Paragraph> Conclusion. We have sketched our solutions to the various problems in translation with a fairly broad brush in this short paper. We recognize that many details need to be worked out, and that in fact most of the work in machine translation is in working out the details. But we felt that in proposing a new formalism for translation research, it was important to stand back and get a view of the forest before moving in to examine the individual trees.</Paragraph> <Paragraph position="60"> Most machine translation systems today map the source language text into a logical form that is fairly close to the source language text, transform it into a logical form that is fairly close to a target language text, and generate the target language text. What is needed is, first of all, the possibility of doing deep interpretation when that is what is called for, and secondly the possibility of translating from the source to the target language at a variety of levels, from the most superficial to levels requiring deep interpretation and access to knowledge about the world, the context, and the speech act situation. This is precisely what the framework we have presented here makes possible.</Paragraph> </Section> </Paper>