<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1045"> <Title>A HEURISTIC APPROACH TO ENGLISH-INTO-JAPANESE MACHINE TRANSLATION</Title> <Section position="2" start_page="0" end_page="283" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Is it true that the recipe for successful machine translation lies in precise and rigid language parsing? So far many studies have been devoted to rigid and detailed natural language parsing, some of them powerful enough to detect ungrammatical sentences [1, 2, 3, 4]. Nevertheless, detailed parsing does not always lead to practically satisfying machine translation. On the other hand, actual humans, even foreign language learners, can translate fairly difficult English sentences without going into the details of parsing.</Paragraph> <Paragraph position="1"> They use only elementary grammatical knowledge and dictionaries.</Paragraph> <Paragraph position="2"> Thus we have paid attention to the heuristic methods of language learners and have devised a rather non-standard linguistic model named HPM (Heuristic Parsing Model). Here, &quot;non-standard&quot; means that the sentential constituents in HPM differ from those in widely accepted modern English grammars [5] or in phrase structure grammars [6]. In order to demonstrate the soundness of HPM, we have developed an English-into-Japanese translation system named ATHENE (Automatic Translation of Hitachi from English into Nihongo with Editing Support) (cf. Fig. 1).</Paragraph> <Paragraph position="3"> The essential features of heuristic translation can be summarized in the following three points.</Paragraph> <Paragraph position="4"> (1) To segment an input sentence into new elements named Phrasal Elements (PE) and Clausal Elements (CE); (2) to assign syntactic roles to PE's and CE's, and to restructure the segmented elements into tree forms by the inclusion relation and into list forms by the modification relation.</Paragraph> <Paragraph position="5"> (3) To permute the segmented elements, and to assign appropriate Japanese equivalents with the necessary case suffixes and postpositions. The next section presents an overview of HPM, which is followed in Sec. 3 by a rough explication of the machine translation process in ATHENE. Sec. 4 discusses the experimental results. Sec. 5 presents concluding remarks and current plans.</Paragraph> </Section> <Section position="3" start_page="283" end_page="283" type="metho"> <SectionTitle> 2. PARSING MODEL: HPM </SectionTitle> <Paragraph position="0"> To facilitate a clear understanding, an example of a parsed tree in HPM is illustrated in both Fig. 2 and Fig. 3.</Paragraph> <Paragraph position="1"> Fig. 2 (Intermediate Parsed Tree on HPM, Part 1: up to &quot;PE&quot;) and Fig. 3 (Intermediate Parsed Tree on HPM, Part 2: from &quot;PE&quot; to Sentence) show the parse of the example sentence &quot;System R, an experimental database system, was constructed to demonstrate that the usability advantages of the relational data model can be realized in a system&quot; (*1: passive, past; *2: passive, possible). 2.1 Parsed Tree: A parsed sentence is represented as a &quot;tree&quot; or &quot;list&quot; of nodes linked by pointers. Each node corresponds to a certain &quot;constituent of sentence&quot;. 
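As an illustrative sketch (our own, not taken from the paper), a node of this kind, carrying an attribute and a syntactic role and linked to other nodes both by inclusion (tree) and by modification (list), could be modeled as follows. The example sentence, the "MOD"/"ADV" codes, and all variable names are hypothetical; only WE/PE/CE/SE and SUBJ/GOV/OBJ come from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One constituent (WE, PE, CE, or SE) in an HPM-style parse."""
    kind: str                    # "WE", "PE", "CE", or "SE"
    text: str                    # surface string covered by this node
    attribute: str = ""          # e.g. a part-of-speech code such as "N" or "V"
    role: str = ""               # syntactic role, e.g. "SUBJ", "GOV", "OBJ"
    children: list = field(default_factory=list)  # inclusion relation ("tree")
    modifies: Optional["Node"] = None             # modification relation ("list")

# A toy PE-level parse of "the system quickly translates sentences".
subj = Node("PE", "the system", attribute="N", role="SUBJ")
gov = Node("PE", "translates", attribute="V", role="GOV")
obj = Node("PE", "sentences", attribute="N", role="OBJ")
mod = Node("PE", "quickly", attribute="ADV", role="MOD", modifies=gov)  # list link
sent = Node("SE", "the system quickly translates sentences",
            children=[subj, mod, gov, obj])

print([pe.role for pe in sent.children])  # ['SUBJ', 'MOD', 'GOV', 'OBJ']
```

The two link kinds are kept separate on purpose: `children` expresses which smaller constituents a node includes, while `modifies` points sideways at the single element a modifier attaches to.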
&quot;Tree&quot; is for the inclusion relation, and &quot;list&quot; is for the modification relation.</Paragraph> <Paragraph position="2"> 2.2 Constituent: Constituents of a sentence are classified into five elements: Word Element, Phrasal Element, Clausal Element, Delimiting Element, and Sentence. These elements have two values: Attribute and Syntactic Role. 2.3 Word Element (WE): WE is the smallest constituent, and is therefore an inseparable element in HPM.</Paragraph> <Section position="1" start_page="283" end_page="283" type="sub_section"> <SectionTitle> 2.4 Phrasal Element (PE): PE is composed of one or more WE's and carries a </SectionTitle> <Paragraph position="0"> part of the sentential meaning in the smallest possible form. PE's are mutually exclusive. Typical examples are: &quot;his very skillful technique (N)&quot;, &quot;would not have been done (V)&quot;, and &quot;for everyday production use (PNAL)&quot;.</Paragraph> </Section> <Section position="2" start_page="283" end_page="283" type="sub_section"> <SectionTitle> 2.5 Clausal Element (CE): CE is composed of one or more PE's and carries a </SectionTitle> <Paragraph position="0"> part of the sentential meaning in a nexus-like form. CE corresponds closely to a Japanese simple sentence of the form &quot;~ {wa/ga/wo/no} ~ {suru/dearu} [koto]&quot;. CE's allow mutual intersection. Typical examples are the underlined parts in the following: &quot;It is important for you to do so.&quot; 2.6 Sentence (SE): SE is composed of one or more CE's and is located at the bottom of a parsed tree.</Paragraph> </Section> <Section position="3" start_page="283" end_page="283" type="sub_section"> <SectionTitle> 2.7 Dependency Pattern of Verb: The verb-dependency-type code is determined by </SectionTitle> <Paragraph position="0"> simplifying Hornby's classification [7].</Paragraph> <Paragraph position="2"> 2.8 Sub-Attribute of Noun: Nouns are classified from a somewhat semantic viewpoint (cf. 
Table 2).</Paragraph> <Paragraph position="3"> 2.9 Syntactic Role (SR): SR is important for representing parsing results and for generating Japanese sentences. For example, a sequence of SR's such as &quot;SUBJ + GOV + OBJ&quot; readily implies a Japanese sentence of the form &quot;SUBJ + {ga/wa/no} + OBJ + {wo/ni} + GOV&quot;. This implication may be quite natural for language learners.</Paragraph> <Paragraph position="4"> 286 Y. NITTA et al.</Paragraph> </Section> </Section> <Section position="4" start_page="283" end_page="283" type="metho"> <SectionTitle> 3. TRANSLATION PROCESS </SectionTitle> <Paragraph position="0"> From the viewpoint of simplicity and maintainability, it might be desirable to describe all the Grammatical Data (GD) in static pattern form. Unfortunately, however, the pattern-form description lacks the flexibility to change control structures. Thus we have adopted a combination of &quot;program&quot; and &quot;pattern&quot; to describe GD.</Paragraph> <Paragraph position="1"> In the following, we describe the translation process along with examples of the grammatical data (GD) to be referred to. The essential point of the translation process is &quot;to replace certain specified node pattern sequences with others, under appropriate control by the grammatical data&quot;. This replacement process is composed of the following twelve steps: (1) Text Input: To produce the uppermost node sequence in the parsed tree.</Paragraph> <Paragraph position="2"> (2) Morphological Resolution: To reduce inflected words to their root forms.</Paragraph> <Paragraph position="3"> (3) Lexicon Retrieval and Attribute Assignment: To assign all possible attributes to &quot;WE's&quot;. 
(4) Ambiguity Resolution in Attributes: To select the most likely one from among many possibilities.</Paragraph> <Paragraph position="4"> (5) Segmentation into &quot;PE's&quot; and Attribute Assignment: To make a PE from a matched WE group and give it attribute(s).</Paragraph> <Paragraph position="5"> (6) Re-retrieval of Lexicon: To find again possible WE's or PE's, especially for &quot;separated PE's&quot; such as &quot;take ... into consideration&quot;.</Paragraph> <Paragraph position="6"> (7) Syntactic Role Assignment to PE's: To determine the Syntactic Role of PE's by referring to pattern GD as in Fig. 4.</Paragraph> <Paragraph position="7"> [Fig. 4 residue: &quot;Attr. or Synt. Role Pattern&quot;]</Paragraph> <Paragraph position="8"> (8) Segmentation into &quot;CE's&quot; and Synt. Role Assignment: To make a CE from a matched PE group and give it a Synt. Role by referring to patterns as in Fig. 5. (9) Determination of Modifying Relationships: To determine the appropriate element which the modifier PE should modify. (10) Construction of Sentence Node (SENT): To complete the whole tree with the root node, SENT.</Paragraph> <Paragraph position="9"> (11) Tree Transformation: To permute the PE's in each CE. Note that in our HPM, &quot;tree transformation&quot; is reduced to a simple repetition of permutation, which strongly resembles language learners' translation methods (Fig. 6).</Paragraph> </Section> <Section position="5" start_page="283" end_page="283" type="metho"> <SectionTitle> 4. EXPERIMENTAL RESULTS </SectionTitle> <Paragraph position="0"> A prototype machine translation system from English into Japanese named ATHENE, as sketched in Fig. 1, has been implemented. The lexicons contain nearly ten thousand words, not counting idioms and other multi-word groups; they are mainly composed of basic educational words (up to senior-high-school level in Japan) and of about a thousand computer terms. 
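As a minimal sketch of step (11) above combined with the role-to-particle mapping of Sec. 2.9: reorder an English SUBJ + GOV + OBJ role sequence into Japanese SUBJ + OBJ + GOV order, then attach case particles. The Japanese words and the simplified particle table here are our own illustrative assumptions, not the system's actual grammatical data.

```python
# Toy version of HPM's "tree transformation": permute the PE's of one CE
# from English order (SUBJ + GOV + OBJ) into Japanese order, then attach
# case particles. Particle choices are simplified from {ga/wa/no}, {wo/ni}.
PARTICLE = {"SUBJ": "wa", "OBJ": "wo"}

def to_japanese_order(pes):
    """Move the governing verb (GOV) to the end; keep the other PE's in order."""
    non_gov = [pe for pe in pes if pe["role"] != "GOV"]
    gov = [pe for pe in pes if pe["role"] == "GOV"]
    return non_gov + gov

def attach_particles(pes):
    """Emit each PE's Japanese equivalent followed by its case particle, if any."""
    out = []
    for pe in pes:
        out.append(pe["ja"])
        if pe["role"] in PARTICLE:
            out.append(PARTICLE[pe["role"]])
    return " ".join(out)

# Hypothetical PE sequence for "the system translates sentences".
pes = [{"role": "SUBJ", "ja": "shisutemu"},
       {"role": "GOV",  "ja": "honyaku-suru"},
       {"role": "OBJ",  "ja": "bun"}]

print(attach_particles(to_japanese_order(pes)))
# shisutemu wa bun wo honyaku-suru
```

The point of the sketch is that, as the paper claims, the whole "transformation" reduces to a permutation plus a particle lookup, with no deep structural rewriting.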
Our system has translated a series of test passages extracted randomly from English readers for senior high school and from computer system journals.</Paragraph> <Paragraph position="1"> The results of the tests are encouraging on the whole. The system can translate fairly complicated sentences when equipped with adequate grammatical data and idiomatic phrases. Output sentences, even though far from eloquent in style, are worth post-editing, and can be considerably improved through interactive correction of multiple meanings. Some interesting technical findings are the following: (1) The following items are sometimes syntactically ambiguous to the system. (i) ING + N (ambiguity among ADJ + SUBJ/OBJ, GOV + OBJ, and the like).</Paragraph> <Paragraph position="2"> (ii) To-infinitives (ambiguity between adjectival and adverbial use).</Paragraph> <Paragraph position="3"> (iii) Linking scope ambiguity w.r.t. &quot;and&quot;, &quot;or&quot;, &quot;of&quot; (A and B of C for D). (iv) Embedded appositional phrases.</Paragraph> <Paragraph position="4"> (2) Very long PE's (Phrasal Elements) appear occasionally (e.g. the PE node numbered 52 in Fig. 2 and Fig. 3).</Paragraph> </Section> <Section position="6" start_page="283" end_page="283" type="metho"> <SectionTitle> 5. CONCLUDING REMARKS </SectionTitle> <Paragraph position="0"> In this paper we have tried to contend that machine translation should be studied more from the heuristic side, that is, from the side of actual language learners' methodology, rather than from the side of purely rigid linguistic analysis. Researchers on the very &quot;high level&quot; linguistic analysis side, as pointed out by Boitet [8], &quot;seem too often to concentrate on interesting high level phenomena as anaphoric reference, discourse</Paragraph> <Paragraph position="1"> structure, causality and reasoning and to forget at the same time persisting and very frequent lower-level difficulties .... 
&quot; This &quot;frequent lower-level difficulty&quot; is the very problem to be solved in practical machine translation, and it is in fact solved easily by naive foreign language learners with the help of only elementary grammatical knowledge. It is worth recalling that language learners must solve the whole problem, even if incompletely; pure linguists, on the other hand, must solve their problem completely, even though it is very limited.</Paragraph> <Paragraph position="2"> In the light of this contention, we have devised a heuristic parsing model named HPM to accommodate machine translation to actual human translation methodologies, and on HPM we have constructed a machine translation system named ATHENE. Experimental translation by ATHENE shows the following advantages of our heuristic approach.</Paragraph> <Paragraph position="3"> (1) Contribution to flexibility, simplicity and maintainability in grammatical description.</Paragraph> <Paragraph position="4"> (2) Contribution to simplicity and transparency in the transformation and generation phases.</Paragraph> <Paragraph position="5"> One of the remaining problems is to extend the grammatical data heuristically, so as to raise our machine translation system from the learner's level to the expert's level. Though our system can translate fairly complex sentences, it still commits learner-level errors when encountering difficulties such as ambiguity of prepositional group modification or of word linking scope for conjunctions. Heuristic aspects of semantics are also among our current research interests. In particular, the case-grammatical idea [9] seems useful for refining our syntactic-role assignment process so as to improve the quality of the generated Japanese sentences. A kind of semantic code system (or thesaurus) will also need to be introduced into our lexicons. Space limitations of these proceedings do not allow us to describe our linguistic model HPM in detail. 
We are planning to present a more detailed version of HPM, together with later improvements, in an appropriate journal.</Paragraph> </Section> </Paper>