<?xml version="1.0" standalone="yes"?>
<Paper uid="J85-2001">
  <Title>THE JAPANESE GOVERNMENT PROJECT FOR MACHINE TRANSLATION</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE JAPANESE GOVERNMENT PROJECT
FOR MACHINE TRANSLATION
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 OUTLINE OF THE PROJECT
</SectionTitle>
    <Paragraph position="0"> The project is funded by a grant from the Agency of Science and Technology through the Special Coordination Funds for the Promotion of Science and Technology, and was started in fiscal 1982. The formal title of the project is "Research on Fast Information Services between Japanese and English for Scientific and Engineering Literature". Its purpose is to demonstrate the feasibility of machine translation of abstracts of scientific and engineering papers between the two languages and, as a result, to establish a fast information exchange system for these papers. The project was initially scheduled to run for three years from fiscal 1982 with a budget of about seven hundred million yen, but because of the present financial pressures on the government, the term has been extended to four years, up to 1986.</Paragraph>
    <Paragraph position="1"> The project is conducted through close cooperation among four organizations. At Kyoto University, we are responsible for developing the software system for the core of the machine translation process (the grammar writing system and the execution system); the grammars for analysis, transfer, and synthesis; the detailed specification of the information to be written in the word dictionaries (for all parts of speech in the analysis, transfer, and generation dictionaries); and the working manuals for constructing these dictionaries. The Electrotechnical Laboratories (ETL) are responsible for text input and output for machine translation, morphological analysis and synthesis, and the construction of the verb and adjective dictionaries based on the working manuals prepared at Kyoto. The Japan Information Center for Science and Technology (JICST) is in charge of the noun dictionary and the compilation of special technical terms in scientific and technical fields.</Paragraph>
    <Paragraph position="2"> The Research Information Processing System (RIPS) under the Agency of Engineering Technology is responsible for completing the machine translation system, including the man-machine interfaces to the system developed at Kyoto, which allow pre- and post-editing, access to grammar rules, and dictionary maintenance.</Paragraph>
    <Paragraph position="3"> The project is not primarily concerned with the development of a final practical system; that will be developed by private industry using the results of this project.</Paragraph>
    <Paragraph position="4"> Technical know-how is already being transferred gradually to private enterprise through the participation of industry staff in the project. Software and linguistic data are also being transferred in part. Complete technical transfer will eventually take place under appropriate conditions. The Japanese source texts are abstracts of scientific and technical papers published in the monthly JICST journal Current Bibliography of Science and Technology. At present, the project processes only texts in the electronics, electrical engineering, and computer science fields. The English source texts will be abstracts from INSPEC in the same fields. The sentence structures used in abstracts tend to be more complex than those of ordinary sentences, with long nominal compounds, noun-phrase conjunctions, mathematical and physical formulas, long embedded sentences, and so on. Analyzing and translating this type of sentence is far more difficult than handling ordinary sentence patterns.</Paragraph>
    <Paragraph position="5"> However, we have not included a pre-editing stage, because we wanted to find the ultimate limits of handling this type of complex sentence structure.</Paragraph>
    <Paragraph position="6"> Our system is based on the following concepts: 1. The use of all available linguistic information, both surface and syntactic; the writing of syntactic rules that are as detailed as possible; and the development of a grammar writing system that can accommodate linguistic theories of any future level of sophistication.</Paragraph>
    <Paragraph position="7"> 2. The introduction of semantic information wherever necessary to make the syntactic analysis as accurate as possible, without overestimating the importance of semantic information: a well-balanced use of both syntax and semantics. Heavily semantics-oriented analysis is very attractive and effective for sentences within narrow limits, but a system of that type cannot cope with the complicated structures found in descriptions of the wider world, where semantic description becomes almost impossible. Copyright 1985 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/85/02091-111503.00 Computational Linguistics, Volume 11, Numbers 2-3, April-September 1985. Makoto Nagao, Jun-ichi Tsujii, Jun-ichi Nakamura: Japanese Government Project for MT.</Paragraph>
    <Paragraph position="8"> 3. Many exceptional linguistic phenomena are word-specific rather than explainable by general linguistic theory, so the system should be able to accept word-specific rules. In our system, these rules are written into the lexical entries and are given priority in the analysis, transfer, and synthesis phases. This mechanism allows the system to be upgraded step by step by accumulating linguistic facts and word-specific rules in the dictionary, and it effectively avoids deadlocks in system improvement.</Paragraph>
    <Paragraph position="9"> 4. When the analysis is imperfect, the system must still produce output, even output with an imperfect sentence structure that contains untranslated source words, rather than fail. From the post-editor's point of view, an imperfect output is far preferable to no output at all.</Paragraph>
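Concepts 3 and 4 can be illustrated together with a small sketch. It assumes a toy word-level model that is far simpler than the system described here: each word is a Python dict of features, a lexicon maps word forms to optional word-specific rules, and `analyse` stands in for full sentence analysis. All names (`translate_word`, `translate_robust`, `gloss_rule`) and the sample entries are hypothetical. Word-specific rules from the lexical entry are tried before the general rules, and when full analysis fails, the sketch falls back to word-by-word output that keeps untranslatable source words instead of failing.

```python
from typing import Callable, Optional

# A rule inspects a word (a dict of features) and returns an English rendering,
# or None when it does not apply. (Toy model, not the project's rule format.)
Rule = Callable[[dict], Optional[str]]


def gloss_rule(word: dict) -> Optional[str]:
    """Hypothetical general rule: use the default gloss stored with the word."""
    return word.get("gloss")


def translate_word(word: dict, lexicon: dict, general_rules: list) -> Optional[str]:
    """Try the word-specific rules in the lexical entry first, then the general rules."""
    entry = lexicon.get(word["form"], {})
    for rule in entry.get("rules", []) + general_rules:
        result = rule(word)
        if result is not None:
            return result
    return None


def translate_robust(
    words: list,
    lexicon: dict,
    general_rules: list,
    analyse: Callable[[list], Optional[str]],
) -> str:
    """Return the full analysis result if there is one, else degraded output."""
    full = analyse(words)
    if full is not None:
        return full
    pieces = []
    for word in words:
        rendered = translate_word(word, lexicon, general_rules)
        # Keep the untranslated source form instead of giving up.
        pieces.append(rendered if rendered is not None else word["form"])
    return " ".join(pieces)


# Hypothetical lexicon: "koto" carries one word-specific rule.
lexicon = {
    "koto": {"rules": [lambda w: "the fact that" if w.get("after_clause") else None]},
}

print(translate_word({"form": "koto", "gloss": "thing", "after_clause": True}, lexicon, [gloss_rule]))  # the fact that
print(translate_word({"form": "koto", "gloss": "thing"}, lexicon, [gloss_rule]))  # thing
# Full analysis "fails" here, so the output keeps the untranslated particle "ga".
print(translate_robust([{"form": "shisutemu", "gloss": "system"}, {"form": "ga"}],
                       lexicon, [gloss_rule], lambda ws: None))  # system ga
```

Putting the word-specific rules at the front of the rule list is what gives them priority; the general rules are only consulted when no lexical rule applies.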
    <Paragraph position="10"> Many other concepts and methods have been developed in our machine translation system; these are explained in the following sections. This paper concentrates on the main features of the Japanese-to-English translation system. The English-to-Japanese system, which is also part of our national machine translation project, is still being developed, and the results will be published shortly.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 THE GRAMMAR WRITING SYSTEM, GRADE
2.1 OBJECTIVES OF THE SOFTWARE SYSTEM
</SectionTitle>
    <Paragraph position="0"> In developing a machine translation system, it is essential that the grammar rules accurately reflect the intentions of the grammar writer; this is fundamental to achieving a good grammar system. One of the basic necessities of any machine translation system is a programming language for writing grammars, consisting of a language for specifying the grammar rules and an accompanying execution system.</Paragraph>
    <Paragraph position="1"> A powerful grammar-writing language for machine translation must fulfill the following requirements: 1. The language must allow manipulation of the linguistic characteristics of both the source and target languages.</Paragraph>
    <Paragraph position="2"> The linguistic structure of Japanese differs greatly from that of English. For instance, word-order restrictions in Japanese are relatively weak, and some syntactic components can be omitted. A grammar writer must be able to express these sorts of characteristics. 2. The grammar-writing language should preferably use the same framework for writing the grammars in the analysis, transfer, and synthesis phases, so that the grammar writer is not forced to learn several different systems for the different translation stages.</Paragraph>
    <Paragraph position="3"> With these points in mind, we developed a new software system for machine translation comprising the language used to specify the grammar rules and the execution system. We call it GRADE (GRAmmar DEscriber).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 THE STRUCTURE OF GRADE
</SectionTitle>
      <Paragraph position="0"> The data format used to express the structure of a sentence during the analysis, transfer, and generation phases has a large influence on the design of the grammar writing language. GRADE represents the sentence structure during the translation process as an annotated tree: a tree structure whose nodes are annotated by sets of property-value pairs. Grammatical rules in GRADE are described as tree-to-tree transformations on such annotated trees. This form of rule gives great expressive power to the rewriting rules used in the grammars for the analysis, transfer, and synthesis phases of the machine translation system.</Paragraph>
      <Paragraph position="1"> Annotation parts can be used to express information such as syntactic category, number, semantic markers, and other properties. They can also be used as flags to control rule application.</Paragraph>
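A minimal sketch of an annotated tree and of one tree-to-tree rule may make this concrete. It is written in Python rather than in GRADE's own notation, and the rule is hypothetical: it adds an English article node to a noun phrase annotated as definite, and records a flag annotation (`det_added`) so that the rule does not fire again on its own output. The names `Node`, `add_article`, and `leaves` are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node annotated with a set of property-value pairs."""
    props: dict
    children: list = field(default_factory=list)


def add_article(tree: Node) -> Node:
    """Hypothetical rule: NP[definite] gets a new first child Det[lex=the]."""
    if (
        tree.props.get("cat") == "NP"
        and tree.props.get("definite")
        and not tree.props.get("det_added")  # flag annotation controls reapplication
    ):
        new_props = dict(tree.props, det_added=True)  # copy; the input is not mutated
        return Node(new_props, [Node({"cat": "Det", "lex": "the"})] + tree.children)
    return tree  # no match: the tree is returned unchanged


def leaves(tree: Node) -> list:
    """Collect the lexical forms at the leaves, left to right."""
    if not tree.children:
        return [tree.props["lex"]]
    return [form for child in tree.children for form in leaves(child)]


np_tree = Node({"cat": "NP", "definite": True}, [Node({"cat": "N", "lex": "structure"})])
once = add_article(np_tree)
print(leaves(once))               # ['the', 'structure']
print(leaves(add_article(once)))  # ['the', 'structure']: the flag blocks a second "the"
```

Without the `det_added` check, applying the rule a second time would produce `['the', 'the', 'structure']`; the annotation is what makes repeated rule application safe.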
      <Paragraph position="2"> A rewriting rule in GRADE consists of a declaration part and a main part. The declaration part has the following four components: * Directory entry part, containing the grammar writer's name, the version number of the rewriting rule, and the last revision date. This part is not used at execution time. The grammar writer can access the information using the HELP facility in GRADE.</Paragraph>
      <Paragraph position="3"> * Property definition part, where the grammar writer declares the property names and their possible values.</Paragraph>
      <Paragraph position="4"> * Variable definition part, where the grammar writer declares the names of the variables.</Paragraph>
      <Paragraph position="5"> * Matching instruction part, where the grammar writer specifies the mode of application of the rewriting rule to an annotated tree.</Paragraph>
      <Paragraph position="6"> The main part specifies the transformation in the rewriting rule, and has the following three parts: * Matching condition part, which describes the conditions for the structure of trees and the property values of nodes.</Paragraph>
      <Paragraph position="7"> * Substructure operation part, which specifies the operations for the parts of the annotated tree that match the conditions written in the matching condition part.</Paragraph>
      <Paragraph position="8"> * Creation part, which specifies the structure and the property values of the transformed annotated trees.</Paragraph>
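The two-part structure of a rewriting rule can be mirrored in a small Python data structure. This is a sketch under stated assumptions, not GRADE's actual syntax or implementation: the field names, the use of Python callables for the three main-part components, and the value check are all hypothetical. The matching instruction is only stored here, because the sketch always tries the rule at the root of the tree. The property definition part is used to reject property values that were not declared.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """A tree node annotated with property-value pairs."""
    props: dict
    children: list = field(default_factory=list)


@dataclass
class RewritingRule:
    """Hypothetical Python mirror of the parts of a GRADE rewriting rule."""
    # Declaration part
    directory: dict     # writer's name, version, revision date; unused at run time
    properties: dict    # property name -> set of declared values
    variables: list     # declared variable names
    matching_mode: str  # stored only; this sketch always matches at the root
    # Main part
    condition: Callable[[Node, dict], bool]  # matching condition; binds variables
    operation: Callable[[dict], None]        # substructure operation on bound nodes
    creation: Callable[[dict], Node]         # builds the transformed tree

    def apply(self, tree: Node) -> Optional[Node]:
        bindings = {name: None for name in self.variables}
        if not self.condition(tree, bindings):
            return None  # the rule does not match
        self.operation(bindings)
        result = self.creation(bindings)
        self.check(result)
        return result

    def check(self, node: Node) -> None:
        """Reject values not listed in the property definition part."""
        for name, value in node.props.items():
            declared = self.properties.get(name)
            if declared is not None and value not in declared:
                raise ValueError(f"undeclared value {value!r} for {name!r}")
        for child in node.children:
            self.check(child)


def bind_verb(tree: Node, bindings: dict) -> bool:
    """Matching condition: the root must be a verb; bind it to variable V."""
    if tree.props.get("cat") == "V":
        bindings["V"] = tree
        return True
    return False


rule = RewritingRule(
    directory={"writer": "hypothetical", "version": 1},
    properties={"cat": {"V", "VP"}, "tense": {"past", "present"}},
    variables=["V"],
    matching_mode="root",
    condition=bind_verb,
    operation=lambda b: b["V"].props.update(tense="past"),
    creation=lambda b: Node({"cat": "VP"}, [b["V"]]),
)

out = rule.apply(Node({"cat": "V", "lex": "translate"}))
print(out.props, out.children[0].props)
# {'cat': 'VP'} {'cat': 'V', 'lex': 'translate', 'tense': 'past'}
```

Keeping the declarations and the main part in one object means that a rule carries its own vocabulary of properties, so a typo in a property value is caught when the rule fires rather than silently propagating.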
      <Paragraph position="9"> The matching condition part allows the grammar writer to specify not only a specific structure for an annotated tree but also structures that may repeat several times, structures that are optional, and structures where the order of the substructures is unrestricted.</Paragraph>
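The three kinds of flexible structure can be sketched with a small backtracking matcher over the children of a node. In this hypothetical Python version (not GRADE's notation), each child is reduced to a dict of properties. An ordered pattern is a list of (kind, predicate) pairs, where kind is "one", "opt" (optional), or "rep" (zero or more repetitions). A separate function handles unordered matching, which suits Japanese case-marked phrases whose order is relatively free. The two modes are kept in separate functions for brevity, and all function names are assumptions.

```python
from typing import Callable

Pred = Callable[[dict], bool]


def match_ordered(pattern: list, children: list) -> bool:
    """Match children against (kind, predicate) items in order, with backtracking.

    kind is "one" (exactly one child), "opt" (zero or one) or "rep" (zero or more).
    """
    if not pattern:
        return not children  # every child must be consumed
    kind, pred = pattern[0]
    rest = pattern[1:]
    # Optional and repeated items may also match nothing.
    if kind in ("opt", "rep") and match_ordered(rest, children):
        return True
    if children and pred(children[0]):
        # A repeated item stays in the pattern so it can consume more children.
        next_pattern = pattern if kind == "rep" else rest
        return match_ordered(next_pattern, children[1:])
    return False


def match_unordered(preds: list, children: list) -> bool:
    """Each predicate must match a distinct child, in any order."""
    if not preds:
        return not children
    for i, child in enumerate(children):
        remaining = children[:i] + children[i + 1:]
        if preds[0](child) and match_unordered(preds[1:], remaining):
            return True
    return False


def cat(c: str) -> Pred:
    return lambda node: node.get("cat") == c


def case_np(particle: str) -> Pred:
    return lambda node: node.get("cat") == "NP" and node.get("case") == particle


# Optional determiner, any number of adjectives, then exactly one noun.
noun_phrase = [("opt", cat("DET")), ("rep", cat("ADJ")), ("one", cat("N"))]
print(match_ordered(noun_phrase, [{"cat": "ADJ"}, {"cat": "ADJ"}, {"cat": "N"}]))  # True
print(match_ordered(noun_phrase, [{"cat": "N"}, {"cat": "ADJ"}]))                  # False

# The ga- and wo-marked phrases may appear in either order.
args = [case_np("ga"), case_np("wo")]
print(match_unordered(args, [{"cat": "NP", "case": "wo"}, {"cat": "NP", "case": "ga"}]))  # True
```

Backtracking matters for the optional and repeated items: the matcher first tries letting such an item match nothing, and only if the rest of the pattern then fails does it let the item consume the next child.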
      <Paragraph position="10"> The substructure operation part specifies operations on the parts of the annotated tree that match in the matching condition part. It allows the grammar writer to assign a property value to a node, or to assign a variable</Paragraph>
    </Section>
  </Section>
</Paper>