<?xml version="1.0" standalone="yes"?>
<Paper uid="C69-3801">
  <Title>A Pragmatic Approach to Machine Translation from Chinese to English</Title>
  <Section position="1" start_page="0" end_page="15" type="metho">
    <SectionTitle>
A PRAGMATIC APPROACH TO MACHINE TRANSLATION
FROM CHINESE TO ENGLISH
- Ching-Yi Dougherty -
</SectionTitle>
    <Paragraph position="0"> California 95060 USA. Chinese machine translation can be achieved by organizing all the necessary linguistic data in such a way that the computer can compare and retrieve them in the most economical way. We are constantly reminded that storage space in the computer is limited and that processing time is expensive. We must aim at the efficiency of the system without sacrificing accuracy.</Paragraph>
    <Paragraph position="1"> Five types of linguistic data have to be stored in the computer before a translation can be rendered: (1) a Chinese to English dictionary, (2) Chinese syntactic rules, (3) syntactic converslon rules from Chinese structure to English structure, (4) English morphological rules, and (5) the text to be translated.</Paragraph>
    <Paragraph position="2"> (1) The dictionary must be able to distinguish automatically the different meanings which a Chinese lexeme represents. Human readers can do this, and so can the machine if enough linguistic information is stored in the computer. To a certain extent the meanings of a given lexeme can be differentiated by its different syntactic functions. For example, the word i means 'one' when it is used as a numeral. A numeral is defined as the class of words which is followed by a classifier, a measure, a collective or a partitive noun, or by other numerals. It means 'once' when it is used as an adverb, which is always followed by a verb. Anything else which is not covered by these two rules will have to be listed in the dictionary either as an idiom or as a lexeme with its immediate constituent. Sometimes more refined syntactic codes are needed to distinguish the different meanings of a given verb, as in the case of tzuoh. When it is followed by an inanimate noun, it is translated as 'make'; by an abstract noun, as 'do'; and by a human noun, as 'be'. Any other constitutes which are exceptions to the rules will have to be listed in the dictionary, such as tzuoh ren 'behave', tzuoh fann 'cook', and tzuoh show 'celebrate a birthday'.</Paragraph>
    <Paragraph position="3"> For the convenience of programming, each dictionary entry,</Paragraph>
    <Paragraph position="4"> which may be a lexeme, an idiom or a constitute, has only one syntactic code and its respective meaning. After a sentence has been analyzed by the automatic parser, the syntactic function of each lexeme is determined. Together with the method of longest match in dictionary lookup, this allows the machine to choose the correct meaning of a given lexeme in a given context.</Paragraph>
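The one-code-per-entry dictionary described above might be sketched as follows. This is a minimal illustration in Python, not the PL/I program the paper has in mind. The code labels (NUM, ADV, V+INANIMATE and so on) and the function name are invented for the sketch, and only the forms and glosses come from the examples in the text.

```python
# Each entry pairs one form with exactly one syntactic code and one meaning.
# The code labels are hypothetical; the paper does not list its actual codes.
DICTIONARY = {
    ("i", "NUM"): "one",                  # numeral
    ("i", "ADV"): "once",                 # adverb, always followed by a verb
    ("tzuoh", "V+INANIMATE"): "make",     # followed by an inanimate noun
    ("tzuoh", "V+ABSTRACT"): "do",        # followed by an abstract noun
    ("tzuoh", "V+HUMAN"): "be",           # followed by a human noun
    ("tzuoh fann", "IDIOM"): "cook",      # exception listed as its own entry
    ("tzuoh show", "IDIOM"): "celebrate a birthday",
}


def meaning(form, code):
    """Return the meaning of a form once the parser has fixed its syntactic code."""
    return DICTIONARY[(form, code)]
```

Because the parser supplies the code, the lookup itself needs no further disambiguation logic: the same form with a different code is simply a different entry.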
    <Paragraph position="5"> (2) The Chinese syntactic rules are formulated in such a way that they reduce the ambiguities in parsing to a minimum. It is assumed that in good scientific writing the sentences are not ambiguous and there is only one correct way of parsing which can carry the process to the end of the sentence. Anybody who has had any experience with an automatic parser will know that this goal is hard to achieve. Several ways have been found to reduce the ambiguities, and probably there are others.</Paragraph>
    <Paragraph position="6"> One of the methods is to add more refinements to the syntactic codes. Semantic elements were introduced in the formulation of the syntactic codes, so that the constitutes must be meaningful as well as grammatical. For example, by adding the human element to the codes, only human nouns can be the agents of human verbs. If an inanimate noun occurs before a human verb, it is likely to be the goal rather than the agent of the verb. For further refinement, the element of plurality is also added to the codes, so that only plural nouns can be the agents of plural verbs. There is a class of adverbs, such as i torng 'together', huh shiang 'mutually' and bii tsyy 'to each other', which usually pluralize any verb that follows them. The second method of reducing ambiguities is to introduce higher levels to the noun and verb phrases. There are five levels of noun phrases, for example. The terminal code is called level 1. When the noun is modified by an adjective, a noun or an adjective phrase, the noun phrase is called level 2. Level 1 or 2 modified by a number and a classifier is called level 3. Modified by a determiner, it is called level 4. When the latter is modified by a pronoun, a relative clause or an apposition, it is called level 5. The noun phrase of level 5 is a closed noun phrase; nothing else can be added to it.</Paragraph>
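The agreement check implied by the human and plural elements can be sketched as a subset test. This is only an illustration under stated assumptions: representing the codes as Python sets of feature names, and the function names themselves, are choices made for this sketch and are not the paper's encoding.

```python
def can_be_agent(noun_features, verb_features):
    """A noun may be the agent only if it carries every element the verb demands."""
    return set(verb_features).issubset(noun_features)


def likely_role(noun_features, verb_features):
    """Guess the role of a noun that occurs before a verb."""
    if can_be_agent(noun_features, verb_features):
        return "agent"
    # A noun that fails the check (e.g. an inanimate noun before a human verb)
    # is more likely the goal than the agent.
    return "goal"
```

For instance, `likely_role(set(), {"human"})` gives "goal", while `likely_role({"human", "plural"}, {"plural"})` gives "agent".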
    <Paragraph position="7"> The constitute IRD (indicative expression) is formed by a closed noun phrase and a closed verb phrase. Even if the noun phrase consists of only the terminal code, this rule also applies linearly.  The third way of reducing the ambiguities is to adopt the principle of the longest match in the dictionary lookup. Take the four characters yeu yan shyue jia 'linguist' as an example; all four of them can be used freely and individually. The first three characters are both nouns and verbs. As nouns, the first two mean both 'language' and 'word', and the third means 'learning'. As verbs, the first two mean 'speak' and the third 'learn'. The fourth one means '-ist', 'family' and 'home'. The compound of the first two means 'language' and that of the last two means '-ist'. The compound of the first three characters means 'linguistics'. Sixty unwanted constitutes can result from these four characters alone. With the longest match against the dictionary entries, the four characters act as a single unit, and thus sixty ambiguities are eliminated.</Paragraph>
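The longest-match principle can be sketched as a greedy left-to-right lookup. This is a minimal illustration, assuming the text arrives as a list of single-character tokens and that dictionary entries are space-joined token strings. Both are assumptions of this sketch, since the paper encodes characters in Chinese telegraphic codes.

```python
def longest_match(tokens, dictionary):
    """Split tokens into units, always taking the longest dictionary entry."""
    units = []
    start = 0
    while start != len(tokens):
        # Try the longest candidate first and shrink until an entry is found.
        for end in range(len(tokens), start, -1):
            candidate = " ".join(tokens[start:end])
            # A single token is accepted even if unknown, so the loop always advances.
            if candidate in dictionary or end == start + 1:
                units.append(candidate)
                start = end
                break
    return units
```

With placeholder tokens c1 to c4 standing in for the four characters of 'linguist', a dictionary that lists every single character, the two-character compounds, the three-character compound and the four-character compound yields one unit, `["c1 c2 c3 c4"]`, so none of the shorter readings is ever proposed to the parser.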
    <Paragraph position="8"> (3) The syntactic conversion rules from Chinese to English are formulated on the basis of a comparative study of the structures of Chinese and English sentences which represent the same meaning. It is assumed that if the structure of the English sentence or phrase is the same as that of the Chinese sentence or phrase, no syntactic conversion is necessary. Simple lexical substitution and the application of the English morphological rules will render the correct translation. Diagram I shows that the structure of the English is the same as that of the Chinese. By placing the English equivalents in the location of the grammar codes (the grammar codes are used in parsing), the translation is already rendered in the citation forms. The last thing that needs to be done is to apply the English morphological rules to the citation forms, so that 'that' is changed to 'those', 'linguist' to 'linguists', and 'write' + '-ed' to 'wrote'. The structures of many English phrases are not the same as those of the Chinese phrases. In cases like these, some syntactic conversions, in the form of permutation, addition or deletion, are needed. The following examples will illustrate the logic on the basis of which the syntactic conversion rules are formulated.</Paragraph>
    <Paragraph position="9"> The verb phrase which consists of a prepositional phrase and a main verb calls for a simple permutation in the conversion. The code for such a verb phrase is VI*R (or VIH*R if the human element is added), where * indicates that a conversion is needed and R indicates that the conversion is a permutation of the two constituents.</Paragraph>
    <Paragraph position="11"> 'do one thing for mother'. The arrow (used in the sense of a programming language) indicates where the conversion occurs. The two items on the left side of the arrow are replaced by the two items on the right side. The constituents of these two constitutes will follow accordingly. This conversion can be easily handled by the machine.</Paragraph>
    <Paragraph position="12"> Even though the Chinese pretransitive preposition, baa or jiang, is structurally the same as other prepositions, it is deleted in the English structure. Therefore an additional conversion rule, NB*L, is needed. The L after the * indicates that the left constituent is to be deleted.</Paragraph>
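The two conversions introduced so far, R for permutation and L for deletion of the left constituent, can be sketched as one small function that reads the letter after the *. The function name and the use of plain strings for constituents are assumptions made for this sketch, and the addition rule used for passive constructions is not covered.

```python
def convert(code, left, right):
    """Apply the conversion named after '*' in a grammar code to two constituents."""
    _, star, action = code.partition("*")
    if not star:
        return [left, right]   # no '*': the English order matches the Chinese
    if action == "R":
        return [right, left]   # permutation of the two constituents
    if action == "L":
        return [right]         # the left constituent is deleted
    raise ValueError("unknown conversion: " + code)
```

For the example 'do one thing for mother', where the prepositional phrase precedes the verb phrase in Chinese, `convert("VI*R", "for mother", "do one thing")` returns `["do one thing", "for mother"]`.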
    <Paragraph position="13"> WB NN VTH 1 V-V VTH lV-V ~ NN The conversion rule for the passive construction on the other hand requires addition. For example:</Paragraph>
    <Paragraph position="15"> All the syntactic conversion rules are either unary or binary (some longer ones are forced to be binary) in order to save table space in the computer. The table of constitutes produced by the automatic parser consists of three columns: the array of the constitutes, the array of the left constituents and the array of the right constituents. By searching through the array of the constitutes, the computer will know where and which conversion rules are to be applied. For example:</Paragraph>
    <Paragraph position="17"> Another advantage of such a system is that further differentiation of meanings is made possible. It has been mentioned earlier that the meanings of a given lexeme can be differentiated by its context. When the context is as long as a clause or a phrase, it is best handled by syntactic conversion rules. For example, the character de is translated as ''s', 'of', 'who (which, when or where)' or 'the one who (which, when or where)', or is left untranslated, under different conditions. This problem cannot be solved by dictionary lookup, but it can be solved by the different constitutes de forms with other constituents.</Paragraph>
    <Paragraph position="18"> The constitute NDE, whose left constituent is a noun or verb phrase of any level or a whole clause and whose right constituent is de, can be an adjective or a noun. It is an adjective when it is followed by a noun phrase, and it is a noun when it is preceded or followed by a verb phrase. First of all, its function should be determined by the automatic parser: the noun (NNDE) is distinguished from the adjective (NDE). The internal structures of both are the same in Chinese, but they represent different meanings which are represented by different English lexemes. It should be emphasized that this is not an ad hoc attempt to write a Chinese grammar on the basis of English translation, but an attempt to incorporate semantic information into the system as demanded by the machine. Very frequently the different English translations point to the fact that a given Chinese lexeme represents different meanings in different contexts. A closer analysis of the contexts will reveal that the conditions which produce the same translation are usually consistently similar.</Paragraph>
    <Paragraph position="19"> The two sub-classes of NDE and NNDE are not sufficient to differentiate the various meanings de represents. Further refinements have to be made within each subclass of NDE or NNDE. The following pairs (NDE and NNDE) of conversion rules will illustrate this point.</Paragraph>
    <Paragraph position="20"> (a) Possession. The meaning of possession is represented by de in Chinese and 's in English when they are preceded by animate nouns (especially human nouns) and pronouns. Both the adjectival and the nominal forms can be converted the same way.</Paragraph>
    <Paragraph position="21">  (b) Part of a whole. When de is preceded by an inanimate noun or an abstract noun, it represents the meaning 'part of a whole', which is usually translated as 'of' in English. For example:  (c) A connective which represents no meaning. When de is preceded by an adjective, it has no meaning. Therefore it is deleted in the conversion.</Paragraph>
    <Paragraph position="22">  phrase to a noun represents no meaning, but it has an English equivalent in the form of a relative pronoun. The case of the pronoun depends on the constituent that precedes the de.</Paragraph>
    <Paragraph position="23"> If the verb phrase consists of an intransitive verb phrase, the noun that the NDE modifies, or that the NNDE implies, is the subject of the verb phrase. For example:  nouns. They may be where, which or when depending on their antecedents. For the computer to choose between where, who, which and when, more refinements have to be introduced to the codes of higher nodes.</Paragraph>
    <Paragraph position="24"> If the verb phrase before the de consists of a subject (or agent) and a transitive verb, the noun that the NDE modifies and the NNDE implies is the object of the verb phrase. For example:</Paragraph>
    <Paragraph position="26"> When the constitute is formed by a complete clause and de, it can only be an adjective which usually modifies the nouns such as time, place, method, instrument, manner and condition. Its conversion rule is the same as that for the intransitive verb phrase.</Paragraph>
    <Paragraph position="27"> For the millions of de phrases, ten conversion rules are sufficient to differentiate the many meanings that de represents and to render a rough but adequate translation. In fact not many conversion rules are needed. In addition to those mentioned above, there are those for comparative constructions, locative phrases and some minor ones.</Paragraph>
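The way the category of the left constituent selects a rendering of de can be sketched as a small decision function. This is an illustration only: the category labels and the "REL" placeholder are invented for the sketch, it does not reproduce the paper's ten rules, and it leaves the choice among who, which, when and where to the higher nodes, as the text says.

```python
def translate_de(left_category):
    """Choose an English rendering of de from the category of what precedes it."""
    if left_category in ("human noun", "animate noun", "pronoun"):
        return "'s"    # (a) possession
    if left_category in ("inanimate noun", "abstract noun"):
        return "of"    # (b) part of a whole
    if left_category == "adjective":
        return ""      # (c) connective with no meaning: deleted
    if left_category in ("verb phrase", "clause"):
        return "REL"   # relative pronoun, resolved later from the antecedent
    raise ValueError("no rule for: " + left_category)
```

The point of the sketch is that the decision depends on the constitute de forms with its neighbour, which is why a plain dictionary lookup on de alone cannot make it.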
    <Paragraph position="28"> (4) English morphological rules have been worked out by others.</Paragraph>
    <Paragraph position="29"> How they will be applied to machine translation from Chinese to English is a problem yet to be solved. Some of the solutions can be written into the dictionary, and some can be incorporated into the syntactic conversion rules. A great deal of research has been done in this area; the main problem is to incorporate the information into the existing system.</Paragraph>
    <Paragraph position="30"> (5) The Chinese text to be translated is encoded in the Chinese telegraphic codes. The problem of allographs (including abbreviations) is taken care of by the dictionary lookup. Allographs that represent the same lexeme are referred to that lexeme by the system. Punctuation proves to be confusing. The period is used to terminate a sentence; any other usage of the period, such as marking the end of a sub-title, should be eliminated. Otherwise every sentence can be nominalized by the automatic parser.</Paragraph>
    <Paragraph position="31"> With some knowledge of PL/I, I am tempted to say that machine translation from Chinese to English is not only possible but also easy to program. A program can be written to read the dictionary, the syntactic rules and the text, parse the sentences and store the constitutes. The program searches through the array of constitutes for the symbol * and then performs the syntactic conversions indicated by the codes that follow the *. Once this is done, the program retrieves the constituents of each constitute until the constituents are the terminal lexemes. Then the translation is already rendered in the citation forms.</Paragraph>
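The final step, expanding each constitute down to its terminal lexemes while honouring the starred codes, can be sketched over the three parallel arrays. This is a sketch in Python rather than PL/I. It assumes that a constituent is either a row number in the table or a string that is already an English citation form, and it handles only binary rules with the R and L conversions. The code names in the usage note other than VI*R are hypothetical.

```python
def render(item, constitutes, lefts, rights):
    """Expand a constitute into a list of English citation forms."""
    if isinstance(item, str):
        return [item]              # terminal lexeme: nothing left to expand
    _, _, action = constitutes[item].partition("*")
    parts = [lefts[item], rights[item]]
    if action == "R":
        parts.reverse()            # permute the two constituents
    elif action == "L":
        parts = parts[1:]          # delete the left constituent
    words = []
    for part in parts:
        words.extend(render(part, constitutes, lefts, rights))
    return words
```

With `constitutes = ["VI*R", "NB", "VT", "NN"]`, `lefts = [1, "for", "do", "one"]` and `rights = [2, "mother", 3, "thing"]`, the call `render(0, constitutes, lefts, rights)` returns the words of 'do one thing for mother' in English order. The English morphological rules would still have to be applied to these citation forms.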
    <Paragraph position="32"> It is unfortunate that machine translation was considered impossible a few years ago, and research efforts were curtailed.</Paragraph>
    <Paragraph position="33"> Correct machine translation of modern scientific writing is possible if enough organized linguistic data are stored in the computer.</Paragraph>
    <Paragraph position="34"> In the past twelve years the rigid demands imposed by the computer have accelerated the progress of linguistic research on many native languages. It is due to the demands of the automatic parser that the systematized syntactic rules were formulated. It is due to the demand that the computer choose the correct translation automatically that the study of semantics was initiated. Now the meanings of a given lexeme can be differentiated in terms of its context. With more sophisticated linguistic data on both source and target languages, more efficient programming languages, and bigger and faster computers, machine translation can be a reality in the near future.</Paragraph>
  </Section>
</Paper>