<?xml version="1.0" standalone="yes"?> <Paper uid="C80-1064"> <Title>Linguistics Department</Title> <Section position="1" start_page="0" end_page="428" type="abstr"> <SectionTitle> ITS: INTERACTIVE TRANSLATION SYSTEM </SectionTitle> <Paragraph position="0"> At COLING78 we reported on an interactive translation system, now called ITS, which uses on-line man-machine interaction. This paper is an update on ITS with suggestions for future work.</Paragraph> <Paragraph position="1"> Summary of ITS: ITS is a second-generation machine translation system. Processing is divided into three major steps: analysis, transfer, and synthesis. Analysis is generally independent of the target language, and synthesis is nearly independent of the source language; the transfer step depends on both the source and the target language. The intermediate representation, which is produced by analysis, adjusted by transfer, and processed by synthesis, is defined by Junction Grammar [1] and consists of objects called J-trees.</Paragraph> <Paragraph position="2"> The general characteristics of ITS, namely a modular design and a full syntactic analysis as the intermediate representation, place ITS in the family of second-generation systems, which includes the Montreal and Grenoble projects.</Paragraph> <Paragraph position="3"> However, ITS has some further characteristics that must be described before the system can be well understood.</Paragraph> <Paragraph position="4"> The source texts for ITS are drawn from the rather general English of the publications of the LDS Church. These publications are not at all restricted to theological dissertations: they include articles by many authors in many styles on subjects ranging from gardening to short stories to descriptions of foreign cultures to parent-child relations. The source texts for the Montreal project, on the other hand, consist of documents from a carefully selected sub-language. 
Their first commercial system, TAUM-METEO, translates weather forecasts. Their current project, TAUM-AVIATION, is much more ambitious but is nevertheless restricted to manuals on the hydraulic systems of jet aircraft. A system designed to translate a particular sub-language can take advantage of the properties of that sub-language. Returning to general English, the specified input to ITS, we find no well-defined properties of the texts that can be used to reduce ambiguity. This is why it was decided to use on-line man-machine interaction during the actual process of translation in ITS. This human interaction is, of course, expensive, so it was decided to share the burden of interaction by analyzing the source text once and using the result of analysis as the input to several transfer-synthesis pairs. ITS is therefore a one-to-many system. Being one-to-many not only allows the overhead of analysis to be shared but also provides an element of consistency in the output across languages, because all the target-language texts are derived from the same intermediate representation produced by analysis.</Paragraph> <Paragraph position="5"> Thus, ITS is a second-generation, interactive, one-to-many translation system intended for general English text. The current target languages are Spanish, Portuguese, German, French, and (on a limited basis) Chinese.</Paragraph> <Paragraph position="6"> During the last two years, ITS has been developed to the point where it has been tested for possible commercial use. The dictionaries contain nearly thirty thousand words, not counting idioms and other multi-word units. 
The system has recently translated a series of test passages totaling over ten thousand words.</Paragraph> <Paragraph position="7"> The results of the tests were encouraging.</Paragraph> <Paragraph position="8"> The system was shown to be capable of translating rather general text. Thanks to human interaction and an extensive English grammar, nearly all the sentences of the source text receive a full Junction Grammar analysis, which contains some semantic information beyond phrase structure. The output, though far from perfect, is worth post-editing and could be considerably improved with more dictionary work.</Paragraph> <Paragraph position="9"> A major disappointment was the amount of time spent on human interaction. It was originally hoped that interaction could be restricted to analysis, since the overhead of analysis is shared by several target languages. However, some interaction was found to be required in transfer as well, and interaction in transfer is specific to one target language, so it is not shared. As a result, the average time spent per page (~50 words) of text is currently about seven minutes of analysis interaction (i.e., about thirty-five minutes shared by five languages), ten minutes of transfer interaction, and fifteen minutes of post-editing, for a total of about thirty minutes per page per language. Because these thirty minutes include a post-editing step by a human translator, the result is roughly equivalent to a first-draft translation by a human translator. The time a human translator spends per page varies considerably but averages approximately one hour.</Paragraph> <Paragraph position="10"> Thus, ITS appears to be somewhat faster than human translation and promises a degree of consistency when a large document involves several translators. 
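The per-page figures reported above can be checked with a short calculation. The Python sketch below is illustrative only: the constants are the averages reported for ITS, while the function name and the assumption that the thirty-five minutes of analysis interaction divide evenly among the target languages (with transfer interaction and post-editing paid once per language) are ours.

```python
# Illustrative check of the per-page times reported for ITS (minutes per page).
# Assumption (ours): analysis interaction is shared evenly by all target
# languages, while transfer interaction and post-editing are paid per language.
ANALYSIS_MINUTES = 35.0  # analysis interaction, done once per page
TRANSFER_MINUTES = 10.0  # transfer interaction, repeated for each language
POSTEDIT_MINUTES = 15.0  # human post-editing, repeated for each language


def minutes_per_page_per_language(n_languages: int) -> float:
    """Return the average minutes spent per page for each target language."""
    if n_languages >= 1:
        shared_analysis = ANALYSIS_MINUTES / n_languages
        return shared_analysis + TRANSFER_MINUTES + POSTEDIT_MINUTES
    raise ValueError("at least one target language is required")


if __name__ == "__main__":
    print(minutes_per_page_per_language(5))  # 32.0
    print(minutes_per_page_per_language(1))  # 60.0
```

Under these assumptions, five target languages give 32 minutes per page per language, which the text rounds to about thirty. With a single target language, the unshared analysis would raise the figure to about sixty minutes, roughly the manual average, which illustrates why ITS shares one analysis among several transfer-synthesis pairs.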
However, an examination of the total process of translation by manual methods and by the current version of ITS recently led to a decision not to attempt to commercialize ITS at this time. Some points to consider are the following: (1) Even though ITS may be slightly faster than manual translation, it is not yet fast enough to justify the effort required to build and maintain the dictionaries and the expense of maintaining a computer on which to run ITS.</Paragraph> <Paragraph position="11"> (2) The on-line interaction requires specially trained operators.</Paragraph> <Paragraph position="12"> (3) Most translators do not enjoy post-editing machine translation.</Paragraph> <Paragraph position="13"> (4) The scheduling of processing in a one-to-many system is rather tedious.</Paragraph> <Paragraph position="14"> Even though the current version of ITS is not being commercialized, the authors remain optimistic about the future of interactive machine translation.</Paragraph> <Paragraph position="15"> Interactive translation will be most useful for texts that are too general or varied to be considered part of a sub-language. The limits of fully automatic translation are well known. To date, fully automatic translation has been shown to be commercially useful only when its output is intended to be merely indicative (e.g., Russian-English MT at Rome Air Force Base) or when the system is tailored to a sub-language (e.g., TAUM-METEO). 
If the need is for high-quality translation of general text, the only possibilities seem to be (1) a highly successful large-scale AI approach, probably with a self-learning capability, or (2) an interactive approach, with a limited self-learning capability if possible.</Paragraph> <Paragraph position="16"> To be more specific about option (2), the authors foresee a translator's work station that would support two modes of operation.</Paragraph> <Paragraph position="17"> In the first mode, it would be a sophisticated but easy-to-use word processor. Multiple character sets would, of course, be available on both the video display and the typewriter-quality hard-copy device. The station would also contain a glossary that the translator could update, and the translator could even link up to a major terminology bank over the phone.</Paragraph> <Paragraph position="18"> Martin Kay (Xerox) is currently working on such a station, with his own variations.</Paragraph> <Paragraph position="19"> In the second mode, the work station would be an interactive translation system.</Paragraph> <Paragraph position="20"> Source text could be entered directly from the keyboard, or the translator could insert a diskette containing the source text as it was first entered on a word processor for publication in the source language. Eventually, OCR may become a practical input method for such a station. In any case, once the source text is entered, the station would interactively resolve ambiguities and other problems in the translation. To be attractive, the interaction would have to average under ten minutes per page for a one-to-one translation, and the output would have to be of such high quality that it could pass as a human translation with only a few minor post-editing changes per page.</Paragraph> <Paragraph position="21"> The work station would also have to be reasonably priced (less than the cost of a compact automobile). Finally, and most importantly, the work station must work! 
That is, the station must be easy enough to use that the translator will want to use it. The first mode (word processor with dictionary lookup) must allow the translator to produce a quality translation faster than by manual methods, and the second mode (interactive translation system) must allow the translator to produce a quality translation faster than by using the word processor alone.</Paragraph> <Paragraph position="22"> When a work station such as the one just described becomes available, few translators may want to try it, even if it works. Of course, if it does work, translators will have to have been involved in its development. But once a few translators venture to use it voluntarily and find that it makes them more productive and cost-effective, the pressure to use it will come from within the translator community, not from outside.</Paragraph> <Paragraph position="23"> This is one view of how the computer will be used in translation: rather than replacing human translators, computers will serve them. This view agrees with Andreyewsky's advice to translators: &quot;instead of fighting a 'win/lose' battle with the machines, we must work toward developing an 'everybody wins' frame of reference.&quot; [2] The rest of this paper consists of several independent comments on work done on ITS over the last two years that may be useful, or at least interesting, to others working in machine translation. The comments concern word selection problems, interaction, and the linguistic programming languages used in transfer.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Word Selection </SectionTitle> <Paragraph position="0"> Selecting an appropriate translation, that is, choosing replacements for source words and phrases, is much more challenging in general text than in a sub-language. 
Several years ago, it was hoped that interaction in analysis to select word senses would permit translation equivalents to be selected in transfer without further interaction. Instead, it was found that the mapping between English words and the words of multiple target languages is so variable that no single set of English word senses will satisfy all the target languages. To illustrate this, two sample words will be considered, &quot;run&quot; and &quot;line,&quot; with sample phrases taken from a corpus of LDS English. A variety of uses for a single word is manageable, but consider the problem of anticipating all possible translations, and how to distinguish among them, for more than thirty thousand words.</Paragraph> </Section> <Section position="2" start_page="0" end_page="426" type="sub_section"> <SectionTitle> Run </SectionTitle> <Paragraph position="0"> Line: &quot;draw a crazy (crooked) line on a piece of paper&quot;; &quot;they would be taught line upon line&quot;; &quot;the hard leathery lines of grandmother's face&quot;; &quot;those production lines&quot;. Interaction: The use of interaction is not a panacea. It is difficult to know when to ask a question, and this difficulty is a major reason for the excessive interaction in the current version of ITS. How does one know whether a particular ambiguity is going to make a difference in the translation, or whether it will simply be preserved? Another problem is that many sentences that are not ambiguous to a human contain ambiguities that a machine can resolve automatically only by semantic tests. Semantic tests can be made in sub-languages with some success, but it is not easy; in general text, the increased variety makes semantic tests much more difficult to make reliably. 
Consider, for example, the problem of making a general system that would naturally handle the following ambiguities:</Paragraph> <Paragraph position="1"> (1) The fact that the man knew surprised us. (the fact which he knew vs. the fact of his knowing)</Paragraph> <Paragraph position="2"> (2) I saw her shaking hands. (...her hands were shaking vs. she shook hands with someone)</Paragraph> <Paragraph position="3"> (3) I made my brother a statue. (I made a statue for him vs. I made him into a statue)</Paragraph> <Paragraph position="4"> (4) Beware of falling victims to error. (beware of becoming victims vs. beware of victims that are falling; cf. &quot;beware of falling rocks&quot;) The last example is not ambiguous to a human, but how does one resolve the ambiguity without a rule specific to &quot;falling&quot;? Hopefully, AI techniques will someday be able to resolve such ambiguities in large-scale systems as well as they now do in restricted test systems. Indeed, major advances in AI may well be essential to the acceptable functioning of the work station described above, but there is enough of Joseph Weizenbaum in me to believe that, even in the long run, human interaction will be needed to produce high-quality raw output in the machine translation of general text. A reasonable question would be: &quot;Why not guess on the unsure points during translation and let the post-editor clean them up?&quot; One answer is that raw output must be close to final form; otherwise, the post-editor will either want to start over or will not bring the text up to high quality. In other words, a post-editor will improve a text only by a certain increment, as experience with one human's translation being post-edited by another human has shown. Another answer is that interaction may well point out ambiguities that a human translator could miss, particularly one who is a native speaker of the target language. 
An empirical method of determining the appropriate mix of interaction and post-editing is to have the program attach to each potential interaction a guess with a confidence level. The translator then sets an interaction level, and no interaction occurs when the confidence level of the guess exceeds the interaction level. This technique has been partially implemented in ITS and seems informative.</Paragraph> </Section> <Section position="3" start_page="426" end_page="428" type="sub_section"> <SectionTitle> Transfer Languages </SectionTitle> <Paragraph position="0"> For those interested in comparing systems, we include two sample entries from the transfer dictionary of ITS [3] and two sample entries from the transfer dictionary of the TAUM-AVIATION system [4]. 1-3: Interact on the two translation choices given.</Paragraph> <Paragraph position="1"> 4: When &quot;i&quot; is returned from the MAP action code (&quot;ser contíguo com&quot; was chosen), call dictionary subroutine =PA, passing the 3 parameters shown.</Paragraph> <Paragraph position="2"> (=PA) 1: Associate the 3 names given with the parameters passed.</Paragraph> <Paragraph position="3"> 2: Comment. 3: Build a predicate adjective, setting the verb and predicate adjective to the values (&quot;ser&quot; and &quot;contíguo&quot;) of the respective parameters.</Paragraph> <Paragraph position="4"> 4: Move any insertions on the verb to the predicate adjective.</Paragraph> <Paragraph position="5"> 5: Find the direct object of the verb (if any).</Paragraph> <Paragraph position="6"> 6: If a direct object is found, then build a prepositional phrase modifying the predicate adjective, set the preposition, and make the direct object the object of the preposition.</Paragraph> <Paragraph position="7"> 1: Find the prepositional object. 2: If found, then execute &amp;do-group. 
3: When the object is a noun clause, set &quot;after&quot; to &quot;depois&quot;.</Paragraph> <Paragraph position="8"> 4: When the object is a gerund, set &quot;after&quot; to &quot;depois de&quot;.</Paragraph> <Paragraph position="9"> 5-8: Otherwise, interact on the three translation choices given (using disambiguation strings if present).</Paragraph> <Paragraph position="10"> 9: End of &amp;do-group.</Paragraph> <Paragraph position="11"> 10-12: If no object was found, interact on the two translation choices given (checking for disambiguation strings).</Paragraph> <Paragraph position="12"> NOTE: This transfer is referential only; no structural manipulation is performed.</Paragraph> </Section> </Section> </Paper>