File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/c82-1057_metho.xml
Size: 16,304 bytes
Last Modified: 2025-10-06 14:11:29
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1057"> <Title>KNOWLEDGE REPRESENTATION AND MACHINE TRANSLATION</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> KNOWLEDGE REPRESENTATION AND MACHINE TRANSLATION </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Fujitsu Limited Nakahara-ku, Kawasaki Japan </SectionTitle> <Paragraph position="0"> This paper describes a new knowledge representation called &quot;frame knowledge representation-O&quot; (FKR-O), and an experimental machine translation system named ATLAS/I which uses FKR-O.</Paragraph> <Paragraph position="1"> The purpose of FKR-O is to store the information required for machine translation processing as flexibly as possible, and to make the translation system as expandable as possible.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Preliminary research on machine translation (MT) started soon after computers became available.</Paragraph> <Paragraph position="1"> MT research was prevalent in the USA during the early 1960s. However, the conclusions of the ALPAC report published in 1966 opposed funding for MT research and resulted in the general discontinuance of MT research in the USA 1). The effort is more concerted in countries where MT systems are more necessary than in the USA. 
For example, the use of both French and English in Canada and the multilingual use of formal documents in the EEC present pressing demands for practical MT systems.</Paragraph> <Paragraph position="2"> The SYSTRAN system (produced by Latsec Incorporated) has been applied in the following areas: 1) Private companies in Canada use it for translating engineering documents from English into French.</Paragraph> <Paragraph position="3"> 2) NASA used it to communicate with the crews of the Apollo and Soyuz spaceships, translating between Russian and English.</Paragraph> </Section> <Section position="4" start_page="0" end_page="351" type="metho"> <SectionTitle> 3) The EEC uses it for examining the feasibility of other MT systems 2). </SectionTitle> <Paragraph position="0"> Other systems currently being used include the METEO system, which translates English weather reports into French in Canada, and the WEIDNER and LOGOS systems produced by private firms in the USA. Recently, there has been a revival of interest in MT systems in the USA, partly because of significant advances being made in artificial intelligence (AI) research.</Paragraph> <Paragraph position="1"> The future development of MT systems is ensured by the total integration of high-performance computers, new man-machine interface designs, new software methodologies, and progress in knowledge engineering 3).</Paragraph> <Paragraph position="2"> The language barrier in Japan is far greater than in the EEC or Canada, because Japanese is an isolated language. There is a large demand for document translation in Japan.</Paragraph> <Paragraph position="3"> The fifth generation system project in Japan aims to build an artificial intelligence machine that will process natural languages. 
This machine will perform at least 90% of a language translation automatically, so that the cost of a translation job could be reduced by 70%.</Paragraph> </Section> <Section position="5" start_page="351" end_page="351" type="metho"> <SectionTitle> 352 S. SAWAI et al. 2. PROBLEMS AND SOLUTION </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 2.1 Methodological problems in a machine translation system </SectionTitle> <Paragraph position="0"> Basically, a machine translation system consists of three components: a dictionary (lexicon), a grammar (translation rules), and a translation program (algorithm). The major methodological problem in machine translation systems is how to separate the translation program from the grammatical rules. The advantage of this separation is that the program can be used for various languages and grammars without modification; that is, it is language-independent. However, there are practical problems in separating the grammar from the program, including difficulties in formulating complex rules for linguistic data and in avoiding large storage requirements or heavy computation loads 4).</Paragraph> </Section> <Section position="2" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 2.2 Solution by knowledge representation methodology </SectionTitle> <Paragraph position="0"> Artificial intelligence research on natural languages and knowledge representation progressed rapidly during the 1970s. In AI, &quot;knowledge representation&quot; is a combination of data structures and interpretive procedures that leads to &quot;knowledgeable&quot; behavior. A new type of machine translation system conceived by Drs. Y. Wilks and R. Schank appeared in the early 1970s. 
This type of system translates input text into a knowledge representation of semantic primitives intended to be language-independent 1).</Paragraph> <Paragraph position="1"> At present, the major knowledge representation techniques are predicate logic, procedural representations, semantic networks, production systems (PS), and frames. In procedural representations, knowledge is contained in procedures (programs). The basic idea of production systems is a database consisting of rules, called production rules, in the form of condition-action pairs. A frame is a predefined internal relation.</Paragraph> <Paragraph position="2"> This paper proposes an efficient knowledge representation method using frame techniques to solve the above-described problems in machine translation systems.</Paragraph> <Paragraph position="3"/> <Paragraph position="4"> 3. FKR-O Figure 1 shows the framework of frame knowledge representation-O (FKR-O) 5).</Paragraph> <Paragraph position="5"> In the FKR-O knowledge representation method, a production system is combined with a procedural representation and is systematized into a state transition network. Rule representation frames and control frames are provided to make system operation efficient.</Paragraph> </Section> <Section position="3" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 3.1 Rule representation frame </SectionTitle> <Paragraph position="0"> Figure 2 shows Jackendoff's semantic representation of verbs. Because Jackendoff is a linguist, he did not propose any machine translation system, but his semantic representation provides a good framework with a clear indication of the relationship between the actor and the action. 
It indicates that the verb &quot;OPEN&quot; can take two noun phrases; that the subject can be either of two noun phrases, NP (1) or NP (3); and that NP (2) is an instrument, rule representation frame used in FKR-O for the Japanese verb &quot;~&quot; (to specify).</Paragraph> <Paragraph position="1"> The frame shows: 1) the verb name, which is the name of a node in the state transition network; 2) the relationship between the verb and one or more noun phrases; 3) the conditional process to be performed after the rules are applied.</Paragraph> </Section> <Section position="4" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 3.2 Control frame </SectionTitle> <Paragraph position="0"> The FKR-O system has control frames which supervise the rule representation frames discussed in Sec. 3.1. Each pair of adjacent frames communicates through a control parameter, as shown in Fig. 3.</Paragraph> <Paragraph position="1"> The roles of the control frame are to resolve one rule frame into several subrule frames and to control the calling sequence of these frames. This is one solution which overcomes the disadvantages inherent in production system (PS) methodology, i.e., inefficiency of program execution and opaqueness of the control flow. A control frame addresses the rule frame by means of the contents of the control parameter. If the next frame is not specified, control returns to the top-level control frame called the &quot;control nucleus&quot;.</Paragraph> </Section> <Section position="5" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 3.3 Grammatical rules </SectionTitle> <Paragraph position="0"> ATLAS/I is an experimental machine translation system in which grammatical rules are specified in FKR-O representation. ATLAS/I currently has several control frames and several types of rule representation frames. These are examples: The surface case structure &quot;H4 + C7 + VO&quot; is changed to a deep case structure &quot;A' + I' + VO&quot;. 
DEMON is a procedure. A noun phrase &quot;C7&quot; is a combination of the concrete noun &quot;NC&quot; (~#) and the postposition &quot;X7&quot; (~). If the surface case structure is &quot;H4 + C7 + VO&quot;, the verb &quot;VO&quot; (open) has an agent case &quot;A'&quot; and an instrument case &quot;I'&quot;. A variable &quot;WORD&quot; designates a slot in the stack. Figure 4 shows the structure of the grammatical rules.</Paragraph> </Section> <Section position="6" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 3.4 Model of ATLAS/I </SectionTitle> <Paragraph position="0"> Figure 5 is a simplified model of ATLAS/I, which includes an input tape, an output tape, a stack, a control section, a dictionary, a register, and grammatical rules (rule representation frames and control frames). When the input tape is scanned, the stack is used as a table for temporary storage; at reduction, it is used as a table with attributes and equivalents.</Paragraph> <Paragraph position="1"> The dictionary is a table with words, attributes, and equivalents and is used as a table for lexical rules. The word &quot;~EB&quot;, for example, is stored as (p~, noun, Taro) in the dictionaries.</Paragraph> <Paragraph position="2"> The character strings &quot;2~EB~ ~P)~o-&quot;, for example, are stored in the input tape. The control parameter is set in the register. Fig. 5 Model of ATLAS/I. Notes: W means a word, A means an attribute, and E means an equivalent.</Paragraph> </Section> <Section position="7" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 3.5 Initial state of ATLAS/I Model </SectionTitle> <Paragraph position="0"> &quot;NOUN&quot; is set in the register as an initial value. Grammatical rules have pre-defined values. The input head points to the leftmost position of the input tape.</Paragraph> <Paragraph position="1"> The output tape is blank. The output head points to the leftmost position of the output tape. 
The initial value of the slots in the stack is (#X, C/), meaning the top of the sentence and the null string (C/).</Paragraph> </Section> <Section position="8" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 3.6 Flow of ATLAS/I Model 3.6.1 Phase A: sentence processing </SectionTitle> <Paragraph position="0"> In phase A, one sentence of a text is translated.</Paragraph> <Paragraph position="1"> 1) State (1): scan. The words in the input tape are scanned by the input head, and the dictionary is accessed to determine the attributes and equivalents. When found, these attributes and equivalents are stored in the stack, and then the input head advances one word to the right.</Paragraph> </Section> </Section> <Section position="6" start_page="351" end_page="351" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> When the input tape is scanned, the stack is used as a table by the control section.</Paragraph> <Paragraph position="1"> The control section scans the words of the input tape. If a period &quot;.&quot; is encountered, it stops scanning and stores (X#, @) in the stack.</Paragraph> <Paragraph position="2"> The rule representation frame that is pointed to by the control parameter is referenced, and the control causes a transition from state (1) to state (2). State (2): reduction and code generation. The control section checks whether the slots in the stack are (#X, @), (SS, a character string), and (X#, ~). If so, there is a transition from state (2) to state (4); if not, the control section checks whether the attributes in the stack match those in the input pattern of the production rule (P). 
If not, the control causes a transition to the state specified by the rule representation frame.</Paragraph> <Paragraph position="4"> If a matching rule (P) does not exist and the rule representation frame does not specify a new state, there is a transition to the default state specified by the top-level control frame called the &quot;control nucleus&quot;. If a matching rule (P) exists, the equivalents (SE) in the stack whose attributes (SA) match the attributes of the rule (P) input pattern are used as parameters of the rule (P) action function. Figure 8 is a general diagram of the organization of the grammar. The input and output patterns are organized according to the sequence of attributes. The action function of rule (P) pops the equivalents (SE) and attributes (SA) from the stack, and pushes the attributes of the rule (P) output pattern and the character strings created by this action function as the new slots (SA', SE'), whose number equals that of the rule (P) output pattern. The control returns to state (2).</Paragraph> <Paragraph position="5"> State (3): frame transition. The control checks the control frame, determines the name of the next rule representation frame, and stores this name in the register. After selection of this rule representation frame, control passes to state (2).</Paragraph> <Paragraph position="6"> Remark: The control frame determines the termination of phase A. At termination, the control causes a transition to phase B.</Paragraph> <Paragraph position="7"> State (4): accept. Control pops the character string of the slot (SS, a character string) from the stack and writes this string into the output tape.</Paragraph> <Paragraph position="8"> The text is translated in phase B. 
Control continues phase A until the input head arrives at the rightmost position of the input tape.</Paragraph> <Paragraph position="9"> All text translation processes end when the input head arrives at the rightmost position of the input tape.</Paragraph> </Section> <Section position="7" start_page="351" end_page="351" type="metho"> <SectionTitle> 4. ATLAS/I MACHINE TRANSLATION SYSTEM </SectionTitle> <Paragraph position="0"> ATLAS/I is currently used in the limited application of translating software-related reports from Japanese into English. The number of sentence translation patterns is gradually increasing through the addition of grammatical rules and vocabulary.</Paragraph> <Section position="1" start_page="351" end_page="351" type="sub_section"> <SectionTitle> 4.1 Japanese to English machine translation </SectionTitle> <Paragraph position="0"> Machine translation involves three stages: input of the original Japanese text, translation, and postediting of the translated English text (see Fig. 6).</Paragraph> <Paragraph position="1"/> <Paragraph position="2"> Fig. 6 Flow of ATLAS/I (labels: syntax analysis, case analysis, English sentence). ATLAS/I currently integrates three processes: analysis of an original Japanese sentence, structural translation from Japanese into English, and synthesis of the English sentence. The &quot;case&quot; of noun phrases, which is the relation of noun phrases to verbs, is checked while the syntax is analyzed. This case analysis is performed with reference to the production rules. When a matching rule is found, case and syntactic synthesis of the English sentence are performed, and the synthesized English sentence is printed. These production rules are defined in FKR-O. The machine translation processes achieved through the use of FKR-O are delineated in Fig. 6 by dotted lines.</Paragraph> </Section> </Section> </Paper>