<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0111"> <Title>Realization of the Chinese BA-construction in an English-Chinese Machine Translation System</Title> <Section position="5" start_page="80" end_page="81" type="metho"> <SectionTitle> 4 a) (literal </SectionTitle> <Paragraph position="0"> translation: He, BA, just now, DE (STR), talks, again, speak, LE, one BIAN (CLS)) He repeated what he had said just now.</Paragraph> <Paragraph position="1"> Compare the following with aspect (literal translation: he, look, LE, this book) He has read the book.</Paragraph> <Paragraph position="2"> b) (GUO) (literal translation: he, look, GUO, this book) He read the book.</Paragraph> <Paragraph position="3"> c) He is looking at the book.</Paragraph> <Paragraph position="4"> Their point of view concerning this construction is also supported by some grammatical criteria for testing the verbhood of a word. For instance, most monosyllabic verbs can be duplicated as independent &quot;AA&quot; or &quot;A A&quot; structures in Chinese, for example &quot; (see, look)&quot; as &quot; &quot; or &quot; &quot;; &quot; (read)&quot; as &quot; &quot; or &quot; &quot;; &quot; (eat)&quot; as &quot; &quot; or &quot; &quot;; and &quot; (go or walk)&quot; as &quot; &quot; or &quot; &quot;; but never &quot; &quot; as &quot;* &quot; or &quot;* &quot; (some transitive verbs can be used this way without objects, but the duplicated &quot; &quot; or &quot; &quot; as a verb must have its object following it, e.g. &quot; (make checks; to guard a pass, etc.)&quot; or &quot; &quot;). Furthermore, the verb following the BA-construction is a transitive verb which in fact subcategorizes for (or still governs) the pre-posed logical object (the complement of the preposition BA), and the main verb is usually accompanied by other auxiliary constituents following or immediately preceding it. 
In other words, the verb cannot stand alone after its object is moved in front of it (see the parts in italics and in blue in 6 a), 7 a) and 7 c), and the ungrammatical sentences 6 c) and 7 d)). Besides, Chinese is a thematic language, and the theme is accordingly often placed in front of the other constituents in the sentence. In many cases, we can see that the BA-construction does place emphasis on the semantic content that it carries (see the comparisons between 6 a) and 6 b), and between 7 a) and 7 b)). We again take example (4), &quot;He repeated what he had said just now.&quot;, and show it in (6) (HU, 1991).</Paragraph> <Paragraph position="6"> structure + auxiliary constituent + V + LE) (literal translation: I BA letter carefully read LE) I have carefully read the letter.</Paragraph> <Paragraph position="7"> d) * As shown in examples (6 a, c) and (7 a, c, d), if we leave out the auxiliary constituents &quot; &quot; in (6 a), and &quot; &quot;, &quot; &quot; and &quot; &quot; in (7 a, c), both sentences (6 c and 7 d) become ungrammatical. Therefore, the syntactic structure</Paragraph> <Paragraph position="9"> where the sentence can have an optional (in many cases) NP* as subject, followed by BA and its NP complement, then followed by a transitive V and another constituent X (which might precede the verb as shown in (b), and usually is an adverb or a prepositional phrase). For our part, we adopt the view that BA is a preposition with which the patient object is shifted to the front of the main verb, and that the BA structure functions as an adjunct of the verb, like many other adjuncts that are often placed between the subject and the predicate verb (HU, 1991). 
We make this choice because treating the BA-construction as a PP makes syntactic analysis and formulation easier than treating it as a VP in a serial verb construction.</Paragraph> <Paragraph position="10"> Against this background, we will demonstrate in the following section how we formalize the BA-construction to handle its counterpart English imperative sentences in our work, and how these English sentences are finally constructed into grammatical target Chinese sentences containing the BA-structure.</Paragraph> </Section> <Section position="6" start_page="81" end_page="85" type="metho"> <SectionTitle> 3 Formalization of the BA-construction </SectionTitle> <Paragraph position="0"> The MT system we work with is oriented to the automatic translation of medical protocols selected from two sub-domains: echinococcosis (clinical practice) and molecular cloning (laboratory practice), where the predominant sentence type is the imperative sentence. Because the BA-construction is mandatory in transferring some of the information conveyed in these SL sentences, we have formalized some English sentences into Chinese counterpart sentences containing the BA-construction. To do this, we carefully compare each of the sentence pairs in both languages from a parallel bilingual corpus which we have constructed for our research. 
In this way, we obtained enough evidence to support the formalization of this special Chinese construction for our MT system.</Paragraph> <Paragraph position="1"> Though the BA-construction is a very productive structure from which we can derive many varieties in Mandarin Chinese, our observation of the corpus reveals that the variations are limited but nevertheless indispensable for formulation.</Paragraph> <Paragraph position="2"> As mentioned above, we have constructed a parallel bilingual corpus for an experimental MT system for the automatic translation of medical protocols, which come from two different sources: one is on echinococcosis, a kind of transmissible disease shared by humans and animals, and the other is on molecular cloning. Like many other scientific documents, the medical texts we collected show a high degree of homogeneity with respect to text structure and lexical usage, but we often find very long and structurally complicated sentences which are difficult to analyze or to represent formally. To narrow down the linguistic difficulty, we adopt the controlled language technique as a supporting method (CARDEY et al., 2004; WU, 2005). In other words, we first make the raw text materials simpler and easier for the computer to process, for example by standardizing the general structure of the text and the terminology, and by constraining lexical usage and sentence structures, which allows us to avoid many complex linguistic phenomena and helps us to design practical controlled writing rules. Controlled language has proved to be very feasible in machine translation in many systems, e.g. KANT (NYBERG &amp; TERUKO, 1996). With simpler and clearer input source sentences, the machine can generally produce better output target sentences. 
Finally, we carry out linguistic analysis, based on a unification-based grammar, on these well-controlled texts.</Paragraph> <Paragraph position="3"> According to our observation, the English sentences which have to be transferred into Chinese sentences containing the BA-construction are of two types, of which one is obligatory and the other is optional (with the BA-construction or not). The typical feature of these kinds of sentences is that the main verb in the sentence often indicates a kind of change or movement; therefore, in both the source and target sentence the goal or location of this change or movement is represented by a prepositional phrase, for example: 8) Insert a catheter in the cyst.</Paragraph> <Paragraph position="4"> 9) Store the tube on the ice.</Paragraph> <Paragraph position="5"> The syntactic structure of this kind of sentence in the SL can be represented as:</Paragraph> <Paragraph position="7"> and we get two basic formulae by applying predicate-argument generation for example 8 and</Paragraph> <Paragraph position="9"> &quot;_&quot; refers to the position of the verb, which may vary accordingly.</Paragraph> <Paragraph position="10"> From the aligned TL sentence, we can formulate the TL sentence as:</Paragraph> <Paragraph position="12"> in which the first PP is the BA-structure and the second PP corresponds to the PP in the SL.</Paragraph> <Paragraph position="13"> Therefore we get two corresponding formulae for examples 8 and 9 in the TL respectively:</Paragraph> <Paragraph position="15"> In fact, for example 8 the Chinese translation could leave out the second preposition &quot; ... ( )&quot;. We nevertheless keep it, because it is more convenient to lexicalize a Chinese equivalent for the English preposition &quot;in&quot; in the Chinese translation, at the cost that the result is sometimes a bit redundant in the TL, though completely grammatical and acceptable. Our principle here is that every word should have its status in the sentence. 
So whenever it is possible and, in particular, acceptable in the TL, we assign a correspondence to the SL preposition (or to other words, such as adverbs or NPs used as adjuncts) in the TL.</Paragraph> <Paragraph position="16"> By doing so, the machine can perform better in most cases. This is particularly beneficial for bi-directional MT. The correspondence of an SL preposition is mostly composed of two Chinese characters in the structure &quot;X ... Y&quot;, in which &quot;...&quot; is the position of the complement of the preposition in question. The second element &quot;Y&quot; is usually considered a noun indicating direction or location in Chinese. However, in our case, we consider it a disjoint part of the first preposition &quot;X&quot;. In other words, the &quot;X...Y&quot; structure is considered one language unit in our practice. The lexicalization of a prepositional phrase in the TL is also one of our criteria for testing whether a sentence has to be constructed with the BA-structure or not. Most importantly, this practice reduces the workload of writing too many grammatical rules for the system, for example rules for when a preposition has to be translated into Chinese and when it need not be.</Paragraph> <Paragraph position="17"> Like most English imperative sentences, the Chinese counterpart sentences start with verbs. However, in some cases, the BA-construction is also employed. Generally speaking, many of the sentences can be used in both ways: starting with a verb or starting with the BA-construction. The two variants do not differ greatly in general. However, semantically the sentences starting with a verb tend to be more narrative, while the BA-construction is firmer and more authoritative in expressing the ideas, for example: 10) Store the tube on the ice.</Paragraph> <Paragraph position="18"> a. 
(BA + N + V + PP)9 11) Aspirate the contrast medium from the cyst.</Paragraph> <Paragraph position="19"> a.</Paragraph> <Paragraph position="20"> b.</Paragraph> <Paragraph position="21"> The protocols we work with are instructions for step-by-step procedures in either clinical practice or laboratory practice, just like product use instructions, recipes and user's manuals. The semantic content of these sentences should be firmly expressed as a kind of order. Though both pairs of Chinese sentences (10 and 11) convey the same idea, the BA-construction is more expressive and natural in this case (examples 10 a) and 11 a)).</Paragraph> <Paragraph position="22"> In our corpus, we have observed that some of the English imperative sentences can be transferred into two kinds of BA-construction: obligatory and optional.</Paragraph> <Paragraph position="23"> Obligatoriness: In our work, some sentences must be constructed into the Chinese BA-structure; otherwise, the whole sentence sounds either ungrammatical (see c below) or unnatural, or even unacceptable (see b below). The grammaticality of the sentence can be tested by moving the translated SL PP to the front of the sentence in the TL (see c)), for example: As is shown in (c), if the whole sentence becomes ungrammatical after moving the PP to the front of the sentence, we classify the sentence as one that must be transferred into a TL sentence containing the BA-structure. 
We then constrain the syntactic structure to the first one as the legal structure while excluding the other two; thus the formulations are:</Paragraph> <Paragraph position="25"> (unacceptable) 9 Note: the BA is underlined; the verb is in bold font; and the object (logical) is in italic.</Paragraph> <Paragraph position="26"> 10 Red: refers to the translated SL PP in the TL.</Paragraph> <Paragraph position="28"> Notice that though the first excluded formulation in the TL shares the same structure as that of the SL, it is unacceptable in the TL. The same situation applies to the following two examples: The final legal formulations are: Leave (_, Compl1, in_Compl2, T) (BA_Compl1, _, _Compl2_ , T) The alternatives (in b) and c)) will be excluded as long as the first one (a) is a perfectly acceptable sentence. Unlike the &quot;X&quot; in examples (13 and 14), here &quot;T&quot; refers to adjuncts of TIME, which usually occupy a different position in the sentence in our case.</Paragraph> <Paragraph position="29"> Therefore our criterion for testing obligatoriness is to see what kind of grammatical performance a sentence exhibits when it is used in the forms shown in the (b)s and (c)s above, especially the (c)s. If the sentence looks unacceptable or, in particular, is ungrammatical, then it must be constructed into a TL sentence containing the BA-structure. This phenomenon is in fact closely related to the semantic content of the verb as well as of the preposition (a goal or a location) in question (we will not discuss this aspect in this paper).</Paragraph> <Paragraph position="30"> Optionality: Some sentences that we have observed can be used optionally. That is to say, we can transfer the SL sentences without employing the BA-construction, or with the BA-construction in the TL. 
In doing so, no significant loss of sentence meaning occurs (except that in some cases a semantic difference remains, in that the BA-construction expresses firmness and authority), for example:</Paragraph> <Paragraph position="32"> Here &quot;Y&quot; refers to adverbs.</Paragraph> <Paragraph position="33"> However, if the transitive verb (e.g. &quot;vortex&quot;) is used intransitively, as is often the case in our corpus, the BA-construction has to be changed to the normal sentence structure (V + (X) + PP), for example: 17) Vortex gently for a few seconds.</Paragraph> <Paragraph position="34"> The formulation for this becomes:</Paragraph> <Paragraph position="36"> The reason why we allow the alternative formulations in the second case is that these sentences are actually subcategorized for by the verbs and will not be confused with other similar syntactic structures (e.g. V + NP + PP) which do not employ the BA-construction in the TL while transferring the intended information. We demonstrate this with an example: 18 a) Puncture the cyst with the needle.</Paragraph> <Paragraph position="37"> While the machine is searching for the information concerning this sentence, two major supporting sources of information (the lexicon and the grammar rules) help it find the correct structure for transferring the sentence into the correct TL correspondence. Therefore, the machine will not mismatch the syntactic structure for this sentence by wrongly employing the BA-construction; for example, the following translation will be ruled out as a legal instruction by the information stored in both the lexicon and the grammar: b) * This is an understandable but very unnatural sentence and can be regarded as ungrammatical in the target language. Though it possesses the same structure as the other BA-constructions, the ungrammaticality is caused by the semantic content conveyed by both the verb and the preposition. 
Usually a BA-construction expresses the resultative or directional effect of the verb. However, what the PP &quot;with the needle&quot; expresses is the manner of the verb, that is, how the action is done. Semantically, it is not within the semantic scope of the BA-construction (though we can find a few contradictory examples) and thus cannot be translated into the target language by incorrectly employing the BA-construction. In our system, a prepositional phrase like &quot;with the needle&quot; is subcategorized for by the verb &quot;puncture&quot; and by the syntactic rules for this verb. To demonstrate this, we simplify the lexical and syntactic information as shown in the formula below:</Paragraph> <Paragraph position="39"> The above information tells us that the verb &quot;puncture&quot; of the source language, like the other verbs mentioned in the previous paragraphs, can have two complements, the second of which, in linear order, is headed by a preposition. The correspondence in the target language for this verb is &quot; &quot;, which takes two complements too. One corresponds to the first complement of the SL and is placed after the verb &quot; &quot;, and the other complement corresponds to the second complement but is placed in front of the verb with a preposition as its head &quot; &quot;. The simplified syntactic structures for both sentences are: SL: V_3 (_, A, P_B) TL: V_3 (P_B, _, A)</Paragraph> </Section> </Paper>
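<!--
The constituent reordering described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual unification-based implementation: the names Imperative, PP_ROLE and tl_order, the role labels "goal", "location" and "manner", and the bracketed slot notation are all hypothetical. The sketch outputs labelled slots rather than Chinese words, does not model the difference between obligatory and optional BA-constructions, and its fallback for verb and preposition pairs missing from the lexicon is an assumption made here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Imperative:
    """A controlled-language SL imperative: V (+ NP) (+ PP)."""
    verb: str
    obj: Optional[str] = None       # first complement (Compl1 / A)
    prep: Optional[str] = None      # head of the PP complement
    prep_obj: Optional[str] = None  # complement of the preposition (Compl2 / B)


# Hypothetical lexicon: the relation the PP expresses for a given verb.
PP_ROLE = {
    ("insert", "in"): "goal",
    ("store", "on"): "location",
    ("puncture", "with"): "manner",
}


def tl_order(s: Imperative) -> List[str]:
    """Return the TL constituent order as labelled slots (no Chinese words)."""
    verb = f"V[{s.verb}]"
    pp = [f"PP[{s.prep} {s.prep_obj}]"] if s.prep else []
    if s.obj is None:
        # Verb used intransitively: normal structure V + PP, no BA.
        return [verb] + pp
    role = PP_ROLE.get((s.verb, s.prep))
    if role in ("goal", "location"):
        # BA-construction: BA + object, then the verb, then the translated PP.
        return [f"BA[{s.obj}]", verb] + pp
    if role == "manner":
        # Manner PP precedes the verb and the object follows it: (P_B, _, A).
        return pp + [verb, f"NP[{s.obj}]"]
    # Assumed fallback: keep the SL order when the lexicon has no entry.
    return [verb, f"NP[{s.obj}]"] + pp
```

For instance, tl_order(Imperative("insert", "a catheter", "in", "the cyst")) yields the slots BA[a catheter], V[insert], PP[in the cyst], which mirrors the TL pattern given for example 8, while the "puncture ... with the needle" case yields the PP first, then the verb, then the object, as in the TL structure V_3 (P_B, _, A).
-->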