<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1152">
  <Title>A Prototype English-Japanese Machine Translation System for Translating IBM Computer Manuals</Title>
  <Section position="2" start_page="0" end_page="646" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> involved in English-Japanese machine translation for four years (1). We have developed a prototype capable of translating IBM computer manuals into Japanese.</Paragraph>
    <Paragraph position="1"> This system is based on a transfer approach in which the transfer process consists of English transformation and English-Japanese conversion. The system aims at 1) high-quality translation; 2) an easily maintained transfer component; and 3) a smaller English-Japanese terminology dictionary. The transformation rules and the conversion rules are currently being constructed and tested on the IBM manual &amp;quot;VM/SP General Information&amp;quot; (60 pages).</Paragraph>
    <Paragraph position="2"> We are focusing on the translation of IBM computer manuals for three reasons: 1) high-quality translation can be expected in a limited area; 2) English IBM manuals are presumably written as clearly as possible according to an IBM internal standard; and 3) we already had a practical English-Japanese terminology dictionary for human translators.</Paragraph>
    <Paragraph position="3"> Most MT systems developed in Europe and the U.S. deal with language pairs within the Indo-European language group (2).</Paragraph>
    <Paragraph position="4"> In English-Japanese translation, however, the two languages belong to different language groups, so a more powerful linguistic mechanism must be implemented. For instance, word order and sentence style differ, and an English word sometimes corresponds to more than one Japanese equivalent. To overcome these difficulties, an English-Japanese or Japanese-English MT system might be based on a transfer or interlingua approach with a wide range of tree-transducing capabilities and a semantic processing mechanism.</Paragraph>
    <Paragraph position="5"> 2. Overview of the System
Fig. 1 shows the overall translation process. First, in the English analysis phase, an English sentence is syntactically analysed; the output of this analysis is one or more English parse trees. Second, in the English-Japanese transfer phase, an English parse tree (an English intermediate representation) is transferred to a corresponding Japanese tree (a Japanese intermediate representation). During this transfer, the transformation component first transforms the English parse tree into an English tree in Japanese-like style, and the English-Japanese conversion component then converts this result into a Japanese tree. Finally, in the Japanese generation phase, one or more Japanese sentences are produced by operations such as generating Japanese auxiliary verbs, determining Japanese case particles, and rearranging word order.</Paragraph>
    <Paragraph position="6"> At present, the components shown in Fig. 1 are all implemented in LISP.</Paragraph>
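The data flow through the phases can be outlined as a chain of functions. This is a minimal Python sketch, not the system's LISP code: the function names, the tuple-based tree encoding, and the convention that conversion returns None for a rejected tree are assumptions made for illustration, and each phase is passed in as a stub.

```python
from __future__ import annotations

from typing import Callable

Tree = tuple  # hypothetical encoding: (category, child, ...)


def translate(sentence: str,
              analyse: Callable[[str], list[Tree]],
              transform: Callable[[Tree], Tree],
              convert: Callable[[Tree], Tree | None],
              generate: Callable[[Tree], list[str]]) -> list[str]:
    """Chain analysis, two-pass transfer, and generation (all phases are stubs)."""
    japanese_trees = []
    for english_tree in analyse(sentence):         # English analysis: one or more parse trees
        japanese_like = transform(english_tree)    # transfer pass 1: English transformation
        japanese_tree = convert(japanese_like)     # transfer pass 2: English-Japanese conversion
        if japanese_tree is not None:              # None marks a tree rejected during conversion
            japanese_trees.append(japanese_tree)
    sentences: list[str] = []
    for japanese_tree in japanese_trees:
        sentences.extend(generate(japanese_tree))  # Japanese generation
    return sentences


if __name__ == "__main__":
    # Toy stubs that only relabel the tree, to show the order of the phases.
    print(translate(
        "There are several records in the file.",
        analyse=lambda s: [("S", s)],
        transform=lambda t: ("S-JP-STYLE",) + t[1:],
        convert=lambda t: ("J-S",) + t[1:],
        generate=lambda t: [f"{t[0]}: {t[1]}"],
    ))
```

Passing the phases in as parameters keeps the sketch honest about what it does not model: each stub can later be replaced by a real component without changing the chain.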
  </Section>
  <Section position="3" start_page="646" end_page="648" type="metho">
    <SectionTitle>
3. English Analysis
</SectionTitle>
    <Paragraph position="0"> For analysing English, we use the English parser, the English analysis grammar, and the English analysis dictionary developed by G. Heidorn et al. at the IBM T.J. Watson Research Center (3). The English analysis is based on an augmented phrase structure grammar and is performed syntactically in a bottom-up and parallel manner. It aims at area-independent, high-performance, and fail-soft analysis. Area independence means that the analysis component can be ported to other application areas. The fail-soft feature is important for a practical MT system, which should provide some Japanese segments to a human translator even if the parser fails to analyse the input as a complete sentence.</Paragraph>
    <Paragraph position="1"> Because the syntactic analysis of English sometimes produces more than one parse tree, the English parser computes metric values that indicate the plausibility of each parse tree, based on the characteristics of the modifications between phrases (4). When analysis yields more than one parse tree, semantically incorrect parse trees are discarded during the English-Japanese transfer. If more than one Japanese tree remains after the transfer, the metric values, copied from the English parse trees to the corresponding Japanese trees, are used to rank the Japanese trees by plausibility. The MT system chooses the Japanese tree with the least value, namely the most plausible one.</Paragraph>
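The selection step described above can be sketched as a filter followed by a minimum: candidates rejected as semantically incorrect during transfer are dropped, and the survivor with the least metric value is chosen. The Candidate record, its field names, and the numbers in the example are hypothetical; how the parser computes the metric (4) is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    english_tree: str                     # placeholder for an English parse tree
    metric: float                         # parser's plausibility metric; smaller is more plausible
    japanese_tree: Optional[str] = None   # None: rejected as semantically incorrect in transfer


def choose_translation(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the surviving candidate with the least metric value, or None if none survived."""
    survivors = [c for c in candidates if c.japanese_tree is not None]
    if not survivors:
        return None
    # The metric copied from each English tree is used to rank its Japanese tree.
    return min(survivors, key=lambda c: c.metric)


if __name__ == "__main__":
    candidates = [
        Candidate("parse A", 2.5, "J-tree A"),
        Candidate("parse B", 1.0, None),        # smallest metric, but rejected in transfer
        Candidate("parse C", 3.0, "J-tree C"),
    ]
    print(choose_translation(candidates).japanese_tree)
```

In the example, parse B has the smallest metric but is never chosen, because the transfer has already discarded it; the metric only ranks trees that passed the semantic check.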
    <Paragraph position="2"> 4. English-Japanese Transfer
Generally, the transfer process of a transfer approach that includes semantic processing tends to become complicated and therefore difficult to maintain. However, a transfer approach seems to be the most straightforward way to implement human translators' knowledge, which includes various types of linguistic information such as specific words, syntactic structures, and semantic information.</Paragraph>
    <Paragraph position="3"> There are many expressions peculiar to English, such as 'it-that', 'too-to', and 'there-be'. Their styles are very different from Japanese ones, and they have no simple counterpart expressions in Japanese. The English-Japanese transfer component of our system is therefore divided into two separate components: an English transformation component and an English-Japanese conversion component. We call our approach a two-pass transfer method. Using English transformation rules, the English transformation component rewrites an English parse tree into a new style of English tree that is close to Japanese syntax. This tree can then easily be converted to a corresponding Japanese tree. When we expect different English expressions to be translated to the same Japanese expression, we only have to write English transformation rules instead of the E-J transfer rules of a conventional transfer approach. Moreover, when we want the MT system to produce a different Japanese expression, we need only modify some E-J conversion rules instead of a larger number of related E-J transfer rules. Consequently, the two-pass transfer method gives the transfer component modularity and maintainability.</Paragraph>
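The maintenance argument can be made concrete with a small Python sketch in which transformation rules map several English constructions onto one Japanese-like English construction, and one conversion entry maps that construction to Japanese. The construction labels and Japanese pattern names are hypothetical placeholders, and a conventional one-pass approach is assumed to need one E-J transfer rule per English construction.

```python
# Pass 1 (hypothetical labels): English transformation rules map an English
# construction to a Japanese-like English construction.
TRANSFORM = {
    "it-that": "that-subject",       # "It is required that X." -> "That X is required."
    "that-subject": "that-subject",  # already Japanese-like; unchanged
    "there-be": "exist",             # "There are X in Y." -> "X exist in Y."
    "exist": "exist",
}

# Pass 2 (hypothetical): one E-J conversion entry per Japanese-like construction.
# The values are placeholder names, not actual Japanese patterns.
CONVERT = {
    "that-subject": "J-NOMINALISED-SUBJECT",
    "exist": "J-EXISTENTIAL",
}


def two_pass(construction: str) -> str:
    """Transform first, then convert."""
    return CONVERT[TRANSFORM[construction]]


def rules_touched(japanese_pattern: str) -> tuple[int, int]:
    """Rules to edit if the Japanese output for one pattern changes:
    (conventional one-pass transfer, two-pass transfer)."""
    one_pass = sum(1 for c in TRANSFORM if two_pass(c) == japanese_pattern)
    two_pass_edits = sum(1 for p in CONVERT.values() if p == japanese_pattern)
    return one_pass, two_pass_edits


if __name__ == "__main__":
    print(rules_touched("J-EXISTENTIAL"))
```

For the existential pattern, changing the Japanese output touches two transfer rules in the one-pass design but a single conversion entry in the two-pass design; under these assumptions the gap grows with the number of English variants that share a Japanese expression.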
    <Paragraph position="4"> 4.1 English Transformation
English transformation is performed by using English transformation rules and a transformation dictionary.</Paragraph>
    <Paragraph position="5"> The transformation sometimes requires a derivative form of an English word, such as the verbal form of a noun or the adverbial form of an adjective. The transformation dictionary contains this sort of derivational data.</Paragraph>
    <Paragraph position="6"> The transformation rules are grouped according to the syntactic categories of parse-tree nodes, and each group is further divided into several sub-groups. For example, the rule group for a sentence consists of 22 sub-groups, such as an inversion-rule sub-group, an insertion-rule sub-group, and an ellipsis-rule sub-group. The following are examples of applications of the rules to sentences.</Paragraph>
    <Paragraph position="7"> It is required that you specify the assignment.</Paragraph>
    <Paragraph position="8"> -&gt; That you specify the assignment is required.</Paragraph>
    <Paragraph position="9"> There are several records in the file.</Paragraph>
    <Paragraph position="10"> -&gt; Several records exist in the file.</Paragraph>
    <Paragraph position="11"> System operation is so impaired that the IPL procedure has to be repeated.</Paragraph>
    <Paragraph position="12"> -&gt; Because system operation is very impaired, the IPL procedure has to be repeated.</Paragraph>
    <Paragraph position="13"> The routine has a relatively low usage rate.</Paragraph>
    <Paragraph position="14"> -&gt; Usage rate of the tontine is relatively low.</Paragraph>
    <Paragraph position="15"> The following are examples of applications of the rules to noun phrases.</Paragraph>
    <Paragraph position="16"> execution of the program -&gt; executing the program</Paragraph>
    <Paragraph position="17"> a disk available with ... -&gt; a disk which is available with ...</Paragraph>
    <Paragraph position="18"> The transformation is performed in a top-down manner along an English parse tree. At each node of the tree, the rule group corresponding to the syntactic type of the node is retrieved, and this rule group is applied to the sub-tree only once. In this application of the rule group, each sub-group is applied to the sub-tree in sequence, again only once. If the matching pattern of a transformation rule matches the sub-tree and its target pattern produces a new tree, the remaining rules in the sub-group are skipped and processing of the next sub-group begins. We have designed the rule groups and their sub-groups to avoid backtracking and repeated application of the same rule.</Paragraph>
    <Paragraph position="19"> A transformed English tree is converted to a corresponding Japanese tree by using conversion rules and a conversion dictionary. The functions of this process are 1) determining the appropriate Japanese syntax, equivalents, and additional linguistic data such as tense, aspect, modality, and voice; and 2) disambiguating the modifications of English phrases.</Paragraph>
    <Paragraph position="20"> Nouns in computer manuals have one or more semantic markers. For example, &amp;quot;file&amp;quot; has &amp;quot;LC&amp;quot; and &amp;quot;LE&amp;quot;, &amp;quot;program&amp;quot; has &amp;quot;LE&amp;quot;, and &amp;quot;operator&amp;quot; has &amp;quot;LE&amp;quot; and &amp;quot;HM&amp;quot;. This marker set is simple enough to be easy to maintain.</Paragraph>
    <Paragraph position="21"> In the English-Japanese conversion dictionary, the conditions for conversion are described by a combination of English syntax, semantic markers, and sometimes specific Japanese words. The conversion dictionary is divided into sub-dictionaries, such as a verb-dictionary, a noun-dictionary, and a prepositional-dictionary. Fig. 2 shows a description for the English verb &amp;quot;provide&amp;quot; in the verb-dictionary.</Paragraph>
    <Paragraph position="22"> The upper half of the description in Fig. 2 specifies that if the subject of a sentence has semantic marker &amp;quot;LE&amp;quot; or &amp;quot;UD&amp;quot; and the first object has marker &amp;quot;FA&amp;quot; or &amp;quot;AT&amp;quot;, then the Japanese case particle &amp;quot;ga&amp;quot; is chosen for the first Japanese noun phrase, the case particle &amp;quot;wo&amp;quot; for the second one, and the Japanese verb &amp;quot;sonae&amp;quot; as the proper equivalent of the English verb &amp;quot;provide&amp;quot;. &amp;quot;YI&amp;quot; and &amp;quot;PYI&amp;quot; in Fig. 2 specify the types of the corresponding Japanese sub-trees to be generated. The lower half of the description gives a rule similar to the upper one, except for an additional condition on a prepositional phrase. It specifies that if the conditions are met, the Japanese case particles &amp;quot;ga&amp;quot;, &amp;quot;ni&amp;quot;, and &amp;quot;wo&amp;quot; are used in this order and &amp;quot;teikyo&amp;quot; is selected as the appropriate Japanese verb.</Paragraph>
    <Paragraph position="23"> The verb-dictionary is used to convert an English surface case structure directly into a Japanese one on the basis of the semantic markers. This direct conversion should be more efficient than introducing deep cases to perform similar semantic processing. The conversion determines the appropriate Japanese verb, the Japanese case particles, and the Japanese syntax of a simple sentence at the same time. In some cases, when an English noun phrase has more than one Japanese equivalent, an appropriate one is also selected on the basis of these conditions. Moreover, applying these entries amounts to a semantic check of the input from the viewpoint of the computer domain: if no entry is applicable to an input simple sentence, the sentence is deemed inappropriate for computer manuals and is rejected by the system. This contributes to the disambiguation of English analysis trees.</Paragraph>
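The two entries for "provide" described above, together with the rejection behaviour, can be sketched in Python. The markers for "file", "program", and "operator" follow the text; the noun "facility" and its marker are hypothetical. The text does not spell out the prepositional-phrase condition of the lower entry, so modelling it as an HM-marked prepositional object is an assumption, as is trying the more specific entry first.

```python
from typing import Optional

# Semantic markers. "file", "program" and "operator" follow the text;
# "facility" and its marker are hypothetical.
MARKERS = {
    "file": {"LC", "LE"},
    "program": {"LE"},
    "operator": {"LE", "HM"},
    "facility": {"FA"},
}

# Two verb-dictionary entries for "provide", after the description of Fig. 2.
# Assumptions: the lower entry's prepositional-phrase condition is modelled as
# "the prepositional object has marker HM", and entries are tried most specific first.
PROVIDE_ENTRIES = [
    {"subj": {"LE", "UD"}, "obj": {"FA", "AT"}, "pp_obj": {"HM"},
     "particles": ("ga", "ni", "wo"), "verb": "teikyo"},
    {"subj": {"LE", "UD"}, "obj": {"FA", "AT"}, "pp_obj": None,
     "particles": ("ga", "wo"), "verb": "sonae"},
]


def has_marker(noun: Optional[str], wanted: set[str]) -> bool:
    """True if the noun carries at least one of the wanted markers."""
    return noun is not None and not MARKERS.get(noun, set()).isdisjoint(wanted)


def convert_provide(subj: str, obj: str, pp_obj: Optional[str] = None) -> Optional[tuple]:
    """Return (Japanese verb, case particles in order), or None when no entry
    applies, in which case the input is rejected as unsuitable for computer manuals."""
    for entry in PROVIDE_ENTRIES:
        if not (has_marker(subj, entry["subj"]) and has_marker(obj, entry["obj"])):
            continue
        if entry["pp_obj"] is not None and not has_marker(pp_obj, entry["pp_obj"]):
            continue
        return entry["verb"], entry["particles"]
    return None


if __name__ == "__main__":
    print(convert_provide("program", "facility"))              # sonae with ga, wo
    print(convert_provide("program", "facility", "operator"))  # teikyo with ga, ni, wo
    print(convert_provide("facility", "program"))              # None: rejected
```

The None result is what turns the dictionary into a semantic filter: a parse tree whose simple sentences find no entry can be discarded.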
    <Paragraph position="24"> Additional linguistic data of an English simple sentence, concerning tense, aspect, modality, and voice, are also converted to the corresponding data of a Japanese tree by using a contrast conversion table and the conversion dictionary. For example, the voice and aspect of an English sentence are changed in the Japanese sentence according to the characteristics of the verb.</Paragraph>
    <Paragraph position="25"> 4.2.3. E-J Translation of Simple Noun Phrases
One of the issues in MT is how to create and maintain a large terminology dictionary. Generally, a technical document includes a number of technical noun groups.</Paragraph>
    <Paragraph position="26"> We call a noun phrase that basically has no postmodifier a simple noun phrase (SNP); examples are &amp;quot;a procedure library&amp;quot;, &amp;quot;system-to-operator communication&amp;quot;, &amp;quot;IBM supplied licensed and nonlicensed programs&amp;quot;, and &amp;quot;page 34&amp;quot;.</Paragraph>
    <Paragraph position="27"> Our MT system provides a component for translating SNPs. Even if the terminology dictionary does not contain a long SNP as a whole entry, the SNP is successfully translated by appropriately assembling the dictionary data for all of its elements. This is possible mainly because SNPs have similar syntax in English and Japanese.</Paragraph>
    <Paragraph position="28"> The functions of the SNP translation component are to choose appropriate Japanese equivalents for various parts of speech (e.g., noun, adjective, adverb); to insert &amp;quot;no&amp;quot; between noun phrases; to reorder Japanese equivalents; to process conjunctions within a simple noun phrase; and to handle hyphenated words. These functions are achieved by using a special dictionary for translating SNPs and co-occurrence frequency data of words or semantic markers in IBM computer manuals.</Paragraph>
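Two of the functions listed above, inserting "no" and reordering, can be sketched in Python. The romanised dictionary entries, the co-occurrence counts, and the threshold are invented for illustration and are not the system's data; part-of-speech selection, conjunctions, and hyphenated words are left out.

```python
# Hypothetical SNP dictionary with romanised Japanese equivalents.
SNP_DICT = {
    "procedure": "tetsuzuki",
    "library": "raiburari",
    "system": "shisutemu",
    "program": "puroguramu",
    "page": "peeji",
}
ARTICLES = {"a", "an", "the"}

# Hypothetical co-occurrence counts of noun pairs in the manuals: pairs seen
# often enough are joined directly as compounds, other pairs get "no".
COMPOUND_COUNTS = {("procedure", "library"): 12, ("system", "program"): 30}
THRESHOLD = 5


def translate_snp(snp: str) -> str:
    """Translate a noun-only SNP word by word (English and Japanese SNPs are both head-final)."""
    words = [w for w in snp.lower().split() if w not in ARTICLES]
    # Reordering: "page 34" becomes "34 peeji" (number before the noun).
    if len(words) == 2 and words[0] == "page" and words[1].isdigit():
        return f"{words[1]} {SNP_DICT['page']}"
    out = [SNP_DICT[words[0]]]
    for prev, word in zip(words, words[1:]):
        if COMPOUND_COUNTS.get((prev, word), 0) >= THRESHOLD:
            out.append(SNP_DICT[word])             # frequent pair: compound
        else:
            out.extend(["no", SNP_DICT[word]])     # otherwise link the nouns with "no"
    return " ".join(out)


if __name__ == "__main__":
    for snp in ("a procedure library", "library procedure", "page 34"):
        print(snp, "->", translate_snp(snp))
```

Because the modifier already precedes the head in both languages, most of the work is lexical lookup plus the decision of whether to link two nouns with "no".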
    <Section position="1" start_page="647" end_page="648" type="sub_section">
      <SectionTitle>
4.3 E-J Conversion Process
</SectionTitle>
      <Paragraph position="0"> The English-Japanese conversion component subsequently converts a transformed English tree to a Japanese tree in a bottom-up and parallel manner along the English tree.</Paragraph>
      <Paragraph position="1"> First, the English-Japanese conversion dictionary is searched for all English words that are terminal symbols of the English parse tree. This is part of the English-Japanese conversion of the lowest-level sub-trees of the English tree. Each upper-level English sub-tree is then converted to a corresponding Japanese sub-tree by using the English-Japanese conversion rules together with the conversion results of its lower-level sub-trees. The category of the top node of the upper sub-tree determines which set of English-Japanese conversion rules is applied. During the conversion of sub-trees, semantic processing is performed according to the data in the English-Japanese conversion dictionary, as described earlier.</Paragraph>
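The bottom-up order described above can be sketched in Python as a post-order recursion: terminals are looked up in the dictionary, and each higher node is converted by the rule its category selects, using the already-converted children. The tuple encoding, the one-rule-per-category table, and the toy rules and dictionary entries in the example are assumptions. In particular, the toy rules flatten sub-trees to strings whereas the system builds Japanese trees, and the parallel handling of alternatives is not modelled.

```python
from typing import Callable

Tree = tuple  # hypothetical: (category, child, ...); leaves are (category, english_word)


def convert(tree: Tree,
            dictionary: dict[str, str],
            rules: dict[str, Callable[[list[Tree]], Tree]]) -> Tree:
    """Bottom-up E-J conversion: convert the children first, then the node itself."""
    category = tree[0]
    if isinstance(tree[1], str):                   # terminal: conversion-dictionary lookup
        return (category, dictionary.get(tree[1], tree[1]))
    children = [convert(child, dictionary, rules) for child in tree[1:]]
    return rules[category](children)               # the top node's category selects the rule


if __name__ == "__main__":
    # Toy data: "provide" -> "sonaeru" echoes the "sonae" equivalent of Fig. 2;
    # "function" -> "kinou" and both rules are hypothetical.
    dictionary = {"program": "puroguramu", "provide": "sonaeru", "function": "kinou"}
    rules = {
        "VP": lambda ch: ("VP", f"{ch[1][1]} wo {ch[0][1]}"),  # object before verb, marked by wo
        "S": lambda ch: ("S", f"{ch[0][1]} ga {ch[1][1]}"),    # subject marked by ga
    }
    tree = ("S", ("NP", "program"), ("VP", ("V", "provide"), ("NP", "function")))
    print(convert(tree, dictionary, rules)[1])  # puroguramu ga kinou wo sonaeru
```

Unknown terminals are passed through unchanged by dictionary.get, which is one simple way to keep a fail-soft behaviour in the sketch.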
    </Section>
  </Section>
  <Section position="4" start_page="648" end_page="648" type="metho">
    <SectionTitle>
5. Japanese Generation
</SectionTitle>
    <Paragraph position="0"> The Japanese generation component produces one or more Japanese sentences from a Japanese tree that conveys Japanese syntax, Japanese equivalents, and other information. The functions of this component are to generate Japanese auxiliary verbs; to determine appropriate Japanese equivalents of adverbs, negation, determiners, and conjunctions, including subordinate conjunctions; to position Japanese adverbial phrases in a Japanese sentence; to modify Japanese case particles; to reorder Japanese noun phrases; to insert punctuation; and to delete a duplicate Japanese subject. Japanese auxiliary verbs are generated based on Japanese verb information, such as the original form of the verb, the conjugation type of the verb, tense, aspect, voice, and modality.</Paragraph>
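Auxiliary-verb generation can be illustrated with a Python sketch that attaches the passive and past auxiliaries to a verb, given its original form and conjugation type. Only two conjugation types are covered (ichidan verbs such as "sonaeru" and suru verbs such as "teikyo suru"), output is romanised plain style, and aspect and modality are omitted; the function name and the type labels are hypothetical.

```python
def generate_verb(base: str, conj_type: str,
                  tense: str = "present", voice: str = "active") -> str:
    """Attach passive and past auxiliaries to a verb (romanised, plain style).
    Hypothetical helper covering only the 'ichidan' and 'suru' conjugation types."""
    if conj_type == "ichidan":                    # e.g. sonae-ru
        stem = base[:-2] if base.endswith("ru") else base
        form = stem + ("rareru" if voice == "passive" else "ru")
    elif conj_type == "suru":                     # e.g. teikyo suru
        form = base + (" sareru" if voice == "passive" else " suru")
    else:
        raise ValueError(f"conjugation type not covered: {conj_type}")
    if tense == "past":
        if form.endswith(" suru"):
            form = form[: -len("suru")] + "shita"  # suru becomes shita
        else:
            form = form[:-2] + "ta"               # remaining forms end in -ru and conjugate like ichidan verbs
    return form


if __name__ == "__main__":
    for tense in ("present", "past"):
        for voice in ("active", "passive"):
            print(tense, voice,
                  generate_verb("sonaeru", "ichidan", tense, voice),
                  generate_verb("teikyo", "suru", tense, voice))
```

Applying voice before tense mirrors the order of the auxiliaries in the Japanese verb complex: the passive attaches to the verb, and the past attaches to the result.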
  </Section>
</Paper>