<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1086"> <Title>DEALING WITH INCOMPLETENESS OF LINGUISTIC KNOWLEDGE IN LANGUAGE TRANSLATION TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> DEALING WITH INCOMPLETENESS OF LINGUISTIC KNOWLEDGE IN LANGUAGE TRANSLATION TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Linguistic knowledge usable for machine translation is always imperfect, and we cannot be free from uncertainty in the knowledge we have. This is especially true at the transfer stage, where the selection of a target language expression is rather subjective and optional.</Paragraph> <Paragraph position="1"> The linguistic contents of a machine translation system therefore always fluctuate and progress only gradually, and the system should be designed to allow such constant change and improvement. This paper explains in detail the transfer and generation stages of the Japanese-to-English system of the Japanese Government's machine translation project, with emphasis on the ideas used to deal with the incompleteness of linguistic knowledge for machine translation.</Paragraph> </Section> <Section position="3" start_page="0" end_page="420" type="metho"> <SectionTitle> 2. DESIGN STRATEGIES </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Annotated Dependency Structure </SectionTitle> <Paragraph position="0"> The intermediate representation we adopted for the result of analysis is the annotated dependency structure. Each node has an arbitrary number of features, as shown in Fig.
1.</Paragraph> <Paragraph position="1"> This makes it possible to access a constituent through more than one linguistic cue. The representation is therefore powerful and flexible enough for sophisticated grammatical and semantic checking, especially when the semantic analysis is not guaranteed to be complete and trial-and-error improvements are required at the transfer and generation stages.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Multiple Layer Grammar </SectionTitle> <Paragraph position="0"> We have three conceptual levels of grammar rules.</Paragraph> <Paragraph position="1"> lowest level: a default grammar, which guarantees that the translation process produces an output, although the quality of the translation is not assured. Rules at this level apply to those inputs for which no higher-layer grammar rule is applicable.</Paragraph> <Paragraph position="2"> kernel level: the main grammar, which chooses and generates the target language structure according to the semantic relations among constituents determined in the analysis stage.</Paragraph> <Paragraph position="3"> topmost level: a heuristic grammar, which attempts to produce an elegant translation of the input. Each rule is heuristic in the sense that it is word specific and applicable only to some restricted classes of inputs.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Multiple Relation Structure </SectionTitle> <Paragraph position="0"> In principle, we use a deep case dependency structure as the semantic representation, and theoretically a unique case dependency structure can be assigned to each input sentence. In practice, however, the analysis phase may fail or may assign a wrong structure. We therefore use as the intermediate representation a structure that can annotate multiple possibilities as well as multiple levels of representation. An example is shown in Fig. 2.
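The annotated dependency structure of 2.1, together with the multiple annotations described here, can be pictured as a node type like the one below. This is a minimal Python sketch, not the Mu system's actual data structure: the class `Node`, the helpers `find` and `has_relation`, and all feature names and case labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of an annotated dependency structure."""
    word: str
    # Open-ended feature set: any number of attribute/value pairs.
    features: dict[str, str] = field(default_factory=dict)
    # Candidate deep-case relations to the governor; more than one is
    # kept when the analysis could not decide between them.
    relations: list[str] = field(default_factory=list)
    children: list['Node'] = field(default_factory=list)


def find(node: Node, **cues: str) -> list[Node]:
    """Return the dependents of `node` that match every feature cue."""
    return [
        child for child in node.children
        if all(child.features.get(key) == value for key, value in cues.items())
    ]


def has_relation(node: Node, case: str) -> bool:
    """True if `case` is one of the node's candidate relations."""
    return case in node.relations


# Illustrative fragment; the features and case labels are invented.
ship = Node('ship', {'cat': 'N', 'sem': 'artifact'}, ['SUBJ'])
advance = Node('advance', {'cat': 'N', 'sem': 'action'}, ['CAUSE', 'TIME'])
increase = Node('increase', {'cat': 'V'}, children=[ship, advance])

print([n.word for n in find(increase, sem='action')])  # ['advance']
print(has_relation(advance, 'TIME'))                   # True
```

Because a constituent can be reached through any feature, and an undecided relation stays as a list of candidates, later rules can pick whichever interpretation they are able to check.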
The properties at a node are represented as a vector, which makes this complex dependency structure flexible in the sense that different interpretation rules can be applied to it.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Lexicon Driven Feature </SectionTitle> <Paragraph position="0"> Besides the transfer and generation rules that involve semantic checking functions, the grammar allows reference to lexical items in the dictionary. A lexical item describes its own special grammatical usages and idiomatic expressions, and during the transfer and generation stages these lexical rules are activated with the highest priority.</Paragraph> <Paragraph position="1"> This feature makes the system very flexible in dealing with exceptional cases: translation quality can be improved progressively by adding linguistic information and word usages to the dictionary entries.</Paragraph> </Section> <Section position="5" start_page="0" end_page="420" type="sub_section"> <SectionTitle> 2.5 Format-Oriented Description of Dictionary Entries </SectionTitle> <Paragraph position="0"> The quality of a machine translation system depends heavily on the quality of its dictionary.</Paragraph> <Paragraph position="1"> To build the machine translation dictionary, we collaborate with expert translators. We developed a format-oriented language that allows computer-naive translators to encode their expertise without any conscious programming effort.</Paragraph> <Paragraph position="2"> Although the format-oriented language lacks the expressive power needed for highly sophisticated linguistic phenomena, it covers most of the common lexical information translators may want to describe. The formatted description is automatically converted into statements in GRADE, a programming language developed by the Mu-Project.
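The idea of converting a filled-in format into rules can be illustrated with a small Python sketch. The entry layout, the `Rule` record, `convert`, and the romanized example entry are all invented for illustration; the actual dictionary format is not reproduced here, and the real converter emits GRADE statements rather than Python objects.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule record; the real converter emits GRADE."""
    head: str       # dictionary entry word
    condition: str  # semantic category required of a related noun
    target: str     # English word (usage) to generate


def convert(form: str) -> list[Rule]:
    """Convert one filled-in entry into rule records.

    Assumed layout: the first non-blank line is the entry word and
    every later line reads 'CONDITION : TARGET'.
    """
    lines = [line.strip() for line in form.splitlines() if line.strip()]
    head, rows = lines[0], lines[1:]
    rules = []
    for row in rows:
        # Rejecting malformed rows early helps keep hand-filled
        # entries consistent.
        if ':' not in row:
            raise ValueError(f'malformed row in entry {head!r}: {row!r}')
        condition, target = (part.strip() for part in row.split(':', 1))
        rules.append(Rule(head, condition, target))
    return rules


# Hypothetical entry for a romanized Japanese verb.
entry = '\n'.join(['tsukuru', 'product : produce-2', 'organization : form-1'])
for rule in convert(entry):
    print(rule)
```

The translator only fills in fixed slots; everything program-like (rule structure, validation) lives in the converter.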
We also prepared a manual explaining how to fill in the dictionary format with the linguistic data of each item. The manual guarantees a certain level of dictionary quality, which is important when many people work in parallel.</Paragraph> <Paragraph position="3"> (Due to the advance of electronic instrumentation, automated ships increase in number.)</Paragraph> <Paragraph position="5"> Fig. 1. Representation of analysis result by features.</Paragraph> <Paragraph position="7"/> </Section> </Section> <Section position="4" start_page="420" end_page="422" type="metho"> <SectionTitle> 3. ORGANIZATION OF GRAMMAR RULES FOR TRANSFER AND GENERATION STAGES </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="420" end_page="421" type="sub_section"> <SectionTitle> 3.1 Heuristic Rule First </SectionTitle> <Paragraph position="0"> Grammar rules are organized according to the principle: &quot;if a better rule exists, the system uses it; otherwise the system attempts to use a standard rule, and if that fails, it uses a default rule.&quot; The grammar therefore involves a number of stages at which heuristic rules are applied. Fig. 3 shows the processing flow of the transfer and generation stages.</Paragraph> <Paragraph position="1"> Heuristic rules are word specific, and GRADE makes it possible to define such word-specific rules, which can be invoked in many ways. For example, a word selection rule for an ordinary verb can be attached to the dictionary entry of a noun.</Paragraph> <Paragraph position="3">
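The rule ordering described in 2.2, 2.4, and this section (word-specific rules stored with a dictionary entry first, then standard kernel rules, then a default) can be sketched in Python as a simple cascade. The rule representation, the lexicon layout, the function `translate`, and the romanized example verb are assumptions made for illustration; they do not reproduce GRADE.

```python
from typing import Callable, Optional

# A rule returns a translation, or None when it does not apply.
Rule = Callable[[dict], Optional[str]]


def translate(node: dict, lexicon: dict[str, list[Rule]],
              kernel: list[Rule], default: Rule) -> str:
    """Heuristic rule first, then standard rules, then the default."""
    # Word-specific rules stored with the dictionary entry come first.
    heuristic = lexicon.get(node['word'], [])
    for layer in (heuristic, kernel):
        for rule in layer:
            result = rule(node)
            if result is not None:
                return result
    # The default layer always answers, so some output is guaranteed.
    return default(node)


# Illustrative (hypothetical) rules for the romanized verb 'suru' (do/make).
lexicon = {'suru': [lambda n: 'decide' if n.get('object') == 'decision' else None]}
kernel = [lambda n: 'perform' if n.get('object_sem') == 'action' else None]


def default(n: dict) -> str:
    return 'do'


print(translate({'word': 'suru', 'object': 'decision'}, lexicon, kernel, default))
print(translate({'word': 'suru', 'object_sem': 'action'}, lexicon, kernel, default))
print(translate({'word': 'suru'}, lexicon, kernel, default))
```

Adding a rule to a dictionary entry only ever refines the output, because the lower layers still catch every input the new rule does not cover.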
</Paragraph> </Section> <Section position="2" start_page="421" end_page="421" type="sub_section"> <SectionTitle> 3.2 Pre-transfer Rules </SectionTitle> <Paragraph position="0"> Some heuristic rules are activated just after the standard analysis of a Japanese sentence is finished, to obtain a more neutral (or target-language-oriented) analyzed structure. We call this invocation the pre-transfer loop. Semantic and pragmatic interpretations are done in the pre-transfer loop; the more heuristic rules are applied in this loop, the better the result. Figs. 5 and 6 show some examples.</Paragraph> </Section> <Section position="3" start_page="421" end_page="422" type="sub_section"> <SectionTitle> 3.3 Word Selection in Target Language by Using Semantic Markers </SectionTitle> <Paragraph position="0"> Word selection in the target language is a major problem in machine translation, because a word in the source language has many possible translations. The main principles adopted in our system are as follows. (1) Area restriction by field code, such as electrical engineering, nuclear science, medicine, and so on.</Paragraph> <Paragraph position="1"> (2) The semantic code attached to a word in the analysis phase is used to select a proper target language word or phrase.</Paragraph> <Paragraph position="2"> (3) The sentential structure in the vicinity of the word to be translated is sometimes effective in determining a proper word or phrase in the target language. Table 1 shows part of the verb transfer dictionary. An English verb is selected according to the semantic categories of the nouns related to the verb. The number i attached to verbs such as form-1 and produce-2 denotes the i-th usage of the verb.
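A table-driven choice of the kind Table 1 illustrates can be sketched in Python. This is a hypothetical rendering, not the actual table: keys combine an optional field code with the semantic category of a related noun, the `(None, None)` key stands in for the default column described next, and the verb usages are placeholders in the usage-numbered notation (e.g. form-1).

```python
from typing import Optional

# Hypothetical transfer table for one Japanese verb.  Keys are
# (field code, semantic category of a related noun); None means
# 'any field' or 'no semantic information', and (None, None) acts as
# the default column.
TABLE = {
    ('electrical', 'device'): 'fabricate-1',
    (None, 'device'): 'produce-2',
    (None, 'organization'): 'form-1',
    (None, None): 'make-1',
}


def select_verb(table: dict, field: Optional[str], noun_sem: Optional[str]) -> str:
    """Return an English verb usage, most specific entry first."""
    for key in ((field, noun_sem), (None, noun_sem), (None, None)):
        if key in table:
            return table[key]
    raise KeyError('transfer table has no default column')


print(select_verb(TABLE, 'electrical', 'device'))  # fabricate-1
print(select_verb(TABLE, 'medicine', 'device'))    # produce-2
print(select_verb(TABLE, 'medicine', None))        # make-1
```

The lookup order (field-restricted entry, then field-independent entry, then default) is an assumption that mirrors principles (1) and (2) above.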
When the semantic information of the nouns is not available, the column indicated by ~ is applied to</Paragraph> <Paragraph position="4"> produce a default translation.</Paragraph> <Paragraph position="5"> In most cases, a translation rule for a lexical item can be described in a fixed format, and we developed a number of dictionary formats specially designed to ease dictionary input by computer-naive expert translators. The expressive power of format-oriented description is, however, insufficient for a number of common verbs such as &quot;~ ~&quot; (make, do, perform, ...) and &quot;~ ~&quot; (become, consist of, provide, ...). In such cases, transfer rules can be encoded directly in GRADE. An example is shown in Fig. 7. The various usages of such a verb have to be listed together with their corresponding English sentential structures and semantic conditions.</Paragraph> </Section> <Section position="4" start_page="422" end_page="422" type="sub_section"> <SectionTitle> 3.4 Post-Transfer Rules </SectionTitle> <Paragraph position="0"> The transfer stage bridges the gap between Japanese and English expressions.</Paragraph> <Paragraph position="1"> There are still many odd structures after this stage, and the English internal representation has to be adjusted further into a more natural one. We call this part the post-transfer loop.</Paragraph> <Paragraph position="2"> An example is given in Fig. 8, where a Japanese factitive verb is first transferred to the English &quot;make&quot;, and a structural change is then made to eliminate it and obtain a more direct expression.</Paragraph> </Section> </Section> <Section position="5" start_page="422" end_page="425" type="metho"> <SectionTitle> 4.
GENERATION PROCESS </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="422" end_page="422" type="sub_section"> <SectionTitle> 4.1 Translation of Japanese Postpositions </SectionTitle> <Paragraph position="0"> Postpositions in Japanese generally express the case slots of verbs. A postposition, however, has several usages, so determining the English preposition for each postposition is quite difficult; the choice also depends on the verb that governs the noun phrase bearing the postposition.</Paragraph> <Paragraph position="1"> Table 2 shows part of a default table for determining deep and surface case labels when no higher-level rule applies. Tables of this sort are defined for all case combinations, which guarantees that at least one translation is assigned to every input. A particular usage of a preposition with a particular English verb is written in the lexical entry of that verb.</Paragraph> </Section> <Section position="2" start_page="422" end_page="423" type="sub_section"> <SectionTitle> 4.2 Determination of Global Sentential </SectionTitle> <Paragraph position="0"/> <Paragraph position="1"> The global sentential structures of Japanese and English are quite different, and correspondingly the internal structure of a Japanese sentence differs from that of an English one. The fundamental differences between the Japanese and English internal representations are absorbed at the pre-transfer, transfer, and post-transfer stages.
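The combination of a default table with verb-specific usages described in 4.1 can be sketched in Python. Keying the table by a single postposition is a simplification (Table 2 is organized by case combinations), and the names `DEFAULT_CASES`, `VERB_USAGES`, and `case_and_preposition`, as well as the entries using the Japanese postpositions de, ni, and kara, are illustrative assumptions rather than the contents of Table 2.

```python
# Illustrative default table: postposition -> (deep case, preposition).
DEFAULT_CASES = {
    'de': ('INSTRUMENT', 'with'),
    'ni': ('GOAL', 'to'),
    'kara': ('SOURCE', 'from'),
}

# Usages recorded in the lexical entries of particular English verbs.
VERB_USAGES = {
    ('depend', 'ni'): ('OBJECT', 'on'),
}


def case_and_preposition(verb: str, postposition: str) -> tuple[str, str]:
    """Deep case label and English preposition for one dependent."""
    # A usage written in the verb's own entry overrides the default.
    usage = VERB_USAGES.get((verb, postposition))
    if usage is not None:
        return usage
    # The default table is meant to be total, so every input gets
    # at least one translation (a KeyError signals a gap in the table).
    return DEFAULT_CASES[postposition]


print(case_and_preposition('go', 'ni'))      # ('GOAL', 'to')
print(case_and_preposition('depend', 'ni'))  # ('OBJECT', 'on')
```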
At the stage of English generation, however, some structural transformations are still required, for example for (a) embedded sentential structures and (b) complex sentential structures.</Paragraph> <Paragraph position="2"> We classified embedded sentential structures into four kinds.</Paragraph> <Paragraph position="3"> (i) A case slot of the embedded sentence is vacant, and the noun modified by the embedded sentence fills that slot.</Paragraph> <Paragraph position="4"> (ii) A form like &quot;N1 ~ V ~ N2&quot; → &quot;(N2 ~ N1 ~ V) N2&quot;. In this case the noun N1 must have semantic properties such as part, attribute, or action.</Paragraph> <Paragraph position="5"> (iii) The third and fourth classes are particular embedded expressions in Japanese that have connecting expressions such as &quot;~&quot; (in the case of), &quot;~&quot; (in the way that), &quot;~&quot; (in that), and so on.</Paragraph> <Paragraph position="6"> An example of the structural transformation is shown in Fig. 9, where the relative clause &quot;why...&quot; is generated after the structural transformation. Two sentences in compound and complex sentences are connected according to Table 3. An example is given in Fig. 10.</Paragraph> </Section> <Section position="3" start_page="423" end_page="425" type="sub_section"> <SectionTitle> 4.3 The Process of Sentence Generation in English </SectionTitle> <Paragraph position="0"> After the Japanese deep dependency structure has been transferred to the English one, it is converted into a phrase structure tree with all the surface words attached. The processes explained in 4.1 and 4.2 take place at this generation stage. The conversion proceeds top-down from the root node of the dependency tree to the leaves, so when a governing verb demands a noun phrase or a to-infinitive for its dependent phrase, the structure of that phrase must be changed accordingly.
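The consequence of top-down conversion, namely that a governor fixes the form of its dependent before the dependent is generated, can be sketched in Python. The `DepNode` class, the `DEMANDS` and `NOMINALS` tables, and the '-ing' fallback are hypothetical simplifications; a real rule would attach demands to particular case slots rather than to all dependents.

```python
from dataclasses import dataclass, field


@dataclass
class DepNode:
    word: str
    children: list['DepNode'] = field(default_factory=list)


# Hypothetical demands a verb makes on the form of its dependents.
DEMANDS = {'want': 'to-inf', 'avoid': 'np'}
# Hypothetical nominalizations; otherwise a gerund is formed.
NOMINALS = {'analyze': 'analysis'}


def realize(word: str, form: str) -> str:
    """Surface form of one word under the form its governor demands."""
    if form == 'to-inf':
        return 'to ' + word
    if form == 'np':
        return NOMINALS.get(word, word + 'ing')
    return word


def generate(node: DepNode, form: str = 'clause') -> str:
    """Generate top-down: the head is realized under the form demanded
    by its governor, then its own demand is passed to its dependents."""
    head = realize(node.word, form)
    child_form = DEMANDS.get(node.word, 'clause')
    return ' '.join([head] + [generate(c, child_form) for c in node.children])


print(generate(DepNode('want', [DepNode('analyze', [DepNode('data')])])))
# want to analyze data
print(generate(DepNode('avoid', [DepNode('analyze')])))
# avoid analysis
```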
Noun-to-verb and noun-to-adjective transformations are often required because of the differences between Japanese and English expressions. This process also goes down from the root node to all the leaf nodes.</Paragraph> <Paragraph position="1"/> <Paragraph position="2"/> <Paragraph position="3"> After phrase structure generation, sentential transformations such as the following are performed.</Paragraph> <Paragraph position="4"> (i) When the agent is absent, the passive transformation is applied.</Paragraph> <Paragraph position="5"> (ii) When both the agent and the object are missing, the predicative verb is nominalized and placed in subject position, and a verb phrase such as &quot;is made&quot; or &quot;is performed&quot; is supplied. (iii) When the subject phrase is a large tree, the anticipatory subject &quot;it&quot; is introduced. (iv) Repeated subject nouns are pronominalized in compound and complex sentences.</Paragraph> <Paragraph position="6"> (v) A duplicated head noun in a conjunctive noun phrase is eliminated, e.g. &quot;uniform component and non-uniform component&quot; → &quot;uniform and non-uniform components&quot;. (vi) Others.</Paragraph> <Paragraph position="7"> Another major structural transformation is required by the essential difference between a DO-language (English) and a BE-language (Japanese). In English, case slots such as tool and cause/reason, among others, very often occupy the subject position, whereas such expressions are never used in Japanese. Transformations of this kind are incorporated in the generation grammar, as shown in Fig. 11, and produce more English-like expressions. This stylistic transformation component is still very primitive.
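Two of the transformations above, the DO-language rewrite illustrated in Fig. 11 and the head-noun reduction of item (v), can be sketched in Python. The clause dictionaries, the `CAUSATIVES` and `PAST` tables, the `cause_cpo` flag (standing for causal potency), and the naive '-s' plural are illustrative assumptions, not the system's generation grammar.

```python
# Hypothetical lexical data for the DO-language rewrite.
CAUSATIVES = {'collapse': 'destroy'}
PAST = {'collapse': 'collapsed', 'destroy': 'destroyed'}


def promote_cause(clause: dict) -> dict:
    """Move a cause with causal potency into subject position."""
    verb = clause['verb']
    if clause.get('cause_cpo') and verb in CAUSATIVES:
        return {'verb': CAUSATIVES[verb], 'subject': clause['cause'],
                'object': clause['subject']}
    return clause


def realize(clause: dict) -> str:
    """Linearize a clause as subject, past-tense verb, object, cause."""
    words = [clause['subject'], PAST[clause['verb']]]
    if 'object' in clause:
        words.append(clause['object'])
    if 'cause' in clause:
        words.append('due to ' + clause['cause'])
    text = ' '.join(words)
    return text[0].upper() + text[1:] + '.'


def merge_conjuncts(left: tuple[str, str], right: tuple[str, str]) -> str:
    """Item (v): drop a duplicated head noun in a conjunction."""
    (mod1, head1), (mod2, head2) = left, right
    if head1 == head2:
        return f'{mod1} and {mod2} {head1}s'  # naive plural (assumption)
    return f'{mod1} {head1} and {mod2} {head2}'


clause = {'verb': 'collapse', 'subject': 'the buildings',
          'cause': 'the earthquake', 'cause_cpo': True}
print(realize(clause))                 # The buildings collapsed due to the earthquake.
print(realize(promote_cause(clause)))  # The earthquake destroyed the buildings.
print(merge_conjuncts(('uniform', 'component'), ('non-uniform', 'component')))
# uniform and non-uniform components
```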
We have to accumulate much more linguistic knowledge and lexical data to obtain more satisfactory English expressions.</Paragraph> <Paragraph position="8"> The buildings collapsed due to the earthquake. → The earthquake destroyed the buildings. (CPO: causal potency)</Paragraph> <Paragraph position="9"> Fig. 11. An example of structural transformation in the generation phase.</Paragraph> </Section> </Section> </Paper>