File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/90/c90-3074_metho.xml
Size: 13,837 bytes
Last Modified: 2025-10-06 14:12:30
<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3074">
<Title>A Machine Translation System for the Target Language Inexpert</Title>
<Section position="2" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 Approaches to Assuring Correct Translation </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.1 Back Translation </SectionTitle>
<Paragraph position="0"> The seemingly most natural way of finding out whether the translation into the target language (henceforth forward translation) is correct is to translate the forward translation back into the source language (back translation). This approach can indeed expose wrong forward translations if the back translations differ significantly from the original, but it has its complexities and is not reliable. An MT system may either employ grammars designed to be reversible, i.e., to be used for both generation and analysis, or employ separate grammars for generation and analysis. In the former case, theoretically, whatever the forward translations are, the back translation will always produce sentences in the source language that are very close, if not identical, to the original. As a result we have no way to tell whether the forward translation is correct or not.</Paragraph>
<Paragraph position="1"> If the back translation is done in a system where separate grammars are used for generation and analysis, then the back translation itself may be incorrect just as the forward translation might be, so that a correct forward translation may be wrongly translated back, or vice versa.</Paragraph>
<Paragraph position="2"> In either case there is no guarantee of accuracy.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.2 Paraphrasing </SectionTitle>
<Paragraph position="0"> In this approach, the system generates paraphrase(s) for the original sentence, based upon the result of the syntactic and semantic analysis, before passing it on for further processing (by the transfer and/or generation component). The user checks the paraphrase and gives the system directives by either confirming or rejecting it. For example, for the input sentence (1a) The man saw the woman in the park with the telescope.</Paragraph>
<Paragraph position="1"> the system might produce the following paraphrase for the user to check: (1b) With the telescope, the man saw the woman who was in the park.</Paragraph>
<Paragraph position="2"> The problem with the paraphrasing approach is that the recovery may come late, after a considerable amount of time has been spent doing all the syntactic and semantic analysis required for generating the paraphrase. If the input sentence is long and complex the cost can be high. Furthermore, it is not unusual that the user finally accepts an appropriate paraphrase only after several trials, with all the previous effort wasted.</Paragraph>
</Section>
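The round-trip check described in section 2.1 can be made concrete with a minimal sketch. The Python below is purely illustrative and is not part of the system described in this paper; the forward and backward translators are passed in as hypothetical callables, and the similarity measure is a crude word-overlap stand-in.

    # A minimal, illustrative sketch of a back-translation consistency check.
    # `forward` and `backward` are hypothetical translation functions supplied
    # by the caller; they are not part of the system described in the paper.

    def similarity(a: str, b: str) -> float:
        """Crude lexical-overlap score in [0, 1], used only for illustration."""
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)

    def back_translation_check(source: str, forward, backward, threshold: float = 0.8) -> bool:
        """Translate into the target language, translate back, and compare."""
        target = forward(source)        # forward translation
        round_trip = backward(target)   # back translation
        # Caveat from section 2.1: with reversible grammars the round trip is
        # almost always close to the original, so a high score proves little;
        # with separate grammars the back translation itself may be wrong.
        return similarity(source, round_trip) >= threshold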
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.3 Pre-editing </SectionTitle>
<Paragraph position="0"> This is the approach adopted by many commercial MT systems (very often paired with post-editing). It requires the user to edit the input sentences before they are passed to the system for analysis and subsequent processing. The machine has a predefined translation capacity which must be known to the user, so that anything in the text which may cause difficulties for the system will be removed or rewritten. For example, for the following sentence (2a) The woman cannot bear children.</Paragraph>
<Paragraph position="1"> if the user knows the system would have difficulties resolving the ambiguous word &quot;bear&quot;, s/he can rewrite the sentence as follows (if that is what s/he wants to say): (2b) The woman cannot give birth to children.</Paragraph>
<Paragraph position="2"> To apply pre-editing, a set of rules must first be devised to set up lexical and structural constraints; then the user must keep the rules in mind and apply them consistently. This may involve expensive training of system users and impose strong restrictions on the use of the system in practice.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.4 Interactive Disambiguation </SectionTitle>
<Paragraph position="0"> Interactive disambiguation can take place at two levels. At the lexical level, whenever an ambiguous word is encountered, the user can be asked to help. (Carbonell and Tomita, 1987) gives an example of resolving word-sense ambiguity in this approach: The word &quot;pen&quot; means: 1) a writing pen 2) a play pen NUMBER?> The problem with this approach is that, since any natural language is highly polysemous, the frequent occurrence of ambiguity at the lexical level will unnecessarily prolong the translation process and easily bore the user. Moreover, for lexical items with many different senses, it may become very difficult to pinpoint one in particular from a screenful of choices.</Paragraph>
<Paragraph position="1"> At the structural level, ambiguities can also be referred to the user for disambiguation, as is done in Ntran, an English-to-Japanese prototype MT system for monolingual users (Wood and Chandler, 1988). For the sentence (3) The cursor corresponds to the puck position on the tablet.</Paragraph>
<Paragraph position="2"> Ntran asks the user to choose either 1 or 2 from two candidate readings it displays. The user must be more or less familiar with linguistics to make correct choices. This may inconvenience some users; but the more severe problem is that the number of possible interpretations of an ambiguous structure can reach the hundreds (Church and Patil, 1982), making their handling very difficult.</Paragraph>
</Section>
</Section>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Selective Confirmation </SectionTitle>
<Paragraph position="0"> The basic idea behind our approach, described in this section, is to let the machine do most of the work without human interference, and only at certain decision-making points ask for human assistance. We choose phrases as the level for user confirmation because at this level the system both avoids frequent and unintelligent questioning of the user (as is the case with the interactive disambiguation approach at the lexical level) and does not suffer from late recovery (as is the case with the paraphrase strategy).</Paragraph>
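For contrast with the phrase-level confirmation introduced above, the following minimal sketch shows the lexical-level sense menu of section 2.4, along the lines of the &quot;pen&quot; example from (Carbonell and Tomita, 1987). The function name, sense glosses and prompt wording are illustrative assumptions, not part of any of the systems cited.

    # Illustrative sketch of lexical-level interactive disambiguation.
    # Every sense of an ambiguous word is shown and the user picks one by number.

    def ask_word_sense(word, senses):
        print(f'The word "{word}" means:')
        for i, gloss in enumerate(senses, start=1):
            print(f"{i}) {gloss}")
        while True:
            choice = input("NUMBER?> ")
            if choice.isdigit() and 1 <= int(choice) <= len(senses):
                return senses[int(choice) - 1]

    # Example: ask_word_sense("pen", ["a writing pen", "a play pen"])
    # Asking this for every polysemous word quickly becomes tedious, which is
    # what motivates the phrase-level, yes/no confirmation described in this section.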
<Paragraph position="1"> As an example, let us consider the following sentence: (4) The talented conductor dated a young star.</Paragraph>
<Paragraph position="2"> The system does not ask for the user's help the first time it sees the word &quot;conductor&quot;, but waits till the analysis of the NP with &quot;conductor&quot; as its head is complete, just before the tree representation is built. At this point it asks: Does &quot;conductor&quot; here mean &quot;an official on a bus or train or tram who collects fares&quot;? (y/n) Here the system has selected one of the &quot;human being&quot; senses of the word &quot;conductor&quot; as a result of carrying out a semantic match between the modifier &quot;talented&quot; and the head noun (a tram conductor can certainly be talented). The order of selecting the &quot;bus conductor&quot; sense vs. the &quot;orchestra conductor&quot; one is arbitrary, though with more domain information and statistical considerations preference can be given to one over the other. If the user answers &quot;no&quot; the system backtracks to find another &quot;human being&quot; interpretation of the word: Does &quot;conductor&quot; here mean &quot;a person who conducts an orchestra&quot;? (y/n) The answer from the user at this point is likely to be &quot;yes&quot;. If, however, the user insists on &quot;no&quot;, the system will relax the semantic constraints (Huang, 1988) and accept the &quot;substance&quot; sense of the word, treating &quot;talented&quot; as used metaphorically.</Paragraph>
<Paragraph position="3"> Suppose the user has chosen the &quot;orchestra conductor&quot; sense. The system continues the parsing to process the finite verb &quot;dated&quot;, and upon completion of the subject-verb match, which decides a sense for the verb that suits the selected sense of &quot;conductor&quot;, it asks the user to confirm its choice: Does &quot;date&quot; here mean &quot;to go out on dates with&quot;? (y/n) If the user answers &quot;yes&quot; the system carries on to find an interpretation for the object NP, using the chosen sense of &quot;date&quot; to help disambiguate &quot;star&quot;: Does &quot;star&quot; here mean &quot;a celebrity&quot;? (y/n) Similar to the confirmations described above, when prepositional phrases are processed, confirmation is carried out after the semantic matches have resolved the attachment ambiguity of the PPs. Confirmations would also be needed at points where semantic matches are carried out to resolve coordinate conjunction ambiguity, such as that contained in the phrase &quot;the man and the woman with an umbrella&quot; (&quot;[the man] and [the woman with an umbrella]&quot; vs. &quot;[the man and the woman] with an umbrella&quot;) (Huang, 1983). At this stage only confirmations involving NPs, NP-Verb pairs and Verb-NP pairs are implemented, although in the automatic version of the system all semantic matches have been executed ((Huang, 1987) and (Wilks et al., 198~)).</Paragraph>
<Paragraph position="4"> An important consideration in designing the interactive system is the number of questions asked of the user. The fewer questions asked, the more productive the system will be, and the less bored the user will become.</Paragraph>
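The confirmation-with-backtracking behaviour walked through above can be summarized in a minimal sketch. The helper names, sense glosses and the semantic_match predicate are illustrative assumptions and do not reproduce the paper's actual implementation.

    # Illustrative sketch of phrase-level selective confirmation with backtracking.
    # Once the analysis of an NP is complete, the sense chosen for its head is
    # confirmed with the user; a "no" answer backtracks to the next candidate,
    # and if all matching candidates are rejected the semantic constraints are
    # relaxed (cf. Huang, 1988) and the remaining senses are offered.

    def confirm_head_sense(head, candidate_senses, modifiers, semantic_match):
        surviving = [s for s in candidate_senses if semantic_match(s, modifiers)]
        relaxed = [s for s in candidate_senses if s not in surviving]
        for sense in surviving + relaxed:
            answer = input(f'Does "{head}" here mean "{sense}"? (y/n) ')
            if answer.strip().lower().startswith("y"):
                return sense
        return None  # no sense accepted; the caller must decide what to do next

    # Hypothetical example, mirroring sentence (4):
    # confirm_head_sense(
    #     "conductor",
    #     ["an official on a bus or train or tram who collects fares",
    #      "a person who conducts an orchestra",
    #      "a substance that conducts electricity"],
    #     ["talented"],
    #     lambda sense, mods: "official" in sense or "person" in sense)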
<Paragraph position="5"> Ideally the system should be intelligent enough not to ask for confirmation about the word &quot;dated&quot; when processing (5) ANU dated the world's oldest rock. or (6) Ann dated the department's oldest professor. Whereas if the input sentence is (7) Ann dated the town's oldest coach. the user may not feel it unreasonable if the system asks him/her to confirm the disambiguation of &quot;dated&quot;, &quot;oldest&quot;, and &quot;coach&quot;. What makes the difference here is so-called &quot;genuine&quot; ambiguity (sentences for which more than one interpretation is meaningful): sentences (5) and (6) should have only one valid interpretation each according to our common knowledge, even though the word &quot;dated&quot; is ambiguous when standing in isolation, whereas sentence (7) might have two meaningful readings.</Paragraph>
<Paragraph position="6"> If user confirmation is required only for sentences containing genuine ambiguity, the system will become much more efficient, without endangering the quality of the translation. But deciding when a word or a structure is &quot;genuinely&quot; ambiguous may involve more computing resources than is worthwhile employing. First of all, the system has to exhaust the whole search space, both syntactic and semantic, to find out whether more than one meaningful interpretation exists, despite the fact that often the first such interpretation might well be the only one. Secondly, it can be very tricky to draw a clear line between &quot;meaningful&quot; and &quot;meaningless&quot; interpretations. For example, the sentence (8) John cannot bear children.</Paragraph>
<Paragraph position="7"> might be judged as not &quot;genuinely&quot; ambiguous because common sense dictates that since John is a male (based on world knowledge of names), it is impossible for him to give birth to children, and therefore &quot;bear&quot; can have only the interpretation &quot;to tolerate&quot; in this context. But then if you tell your granddaughter &quot;John cannot bear children because he is a male&quot;, you are using &quot;bear&quot; in the other sense, although &quot;tolerate&quot; is still an acceptable interpretation for &quot;bear&quot;.</Paragraph>
<Paragraph position="8"> For this reason, at the current stage we have not attempted to single out genuinely ambiguous sentences; instead we employ certain domain information and statistical considerations in arranging the system's lexicon, so that for sentence (4) (&quot;The talented conductor dated a young star.&quot;) it may well be the case that the user need only answer &quot;yes&quot; once to each of the questions the system asks. The worst case is when the intended meaning of the sentence is &quot;abnormal&quot; or metaphorical, for example when sentence (4) is meant to be interpreted as &quot;the talented tram conductor ascertained the age of a new celestial object&quot;. In such cases the user will have to answer the system's questions several times over the same word.</Paragraph>
<Paragraph position="9"> With richer knowledge bases and more powerful inference engines and computing facilities, it may become practical first to recognize genuinely ambiguous sentences, then to assign scores to their different interpretations, and finally to present them to the user for confirmation in the order of their scores.</Paragraph>
</Section>
</Paper>