File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/01/n01-1019_metho.xml

Size: 18,885 bytes

Last Modified: 2025-10-06 14:07:32

<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1019">
  <Title>Information-based Machine Translation</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. Linguistic Context for Translation
</SectionTitle>
    <Paragraph position="0"> In translating different words, phrases, and expressions, different types and amounts of information from the context need to be considered. (Only the sentential context is considered here.) So far, a systematic solution to this problem has not been found. This section illustrates the extent of this problem, and the remainder of this paper describes our approach.</Paragraph>
    <Paragraph position="1"> 1.1. Expressions with to have
We examined the problem of translating the English main verb to have into Japanese. The verb to have was selected because it is quite common in colloquial English, yet it occurs in a large variety of senses, collocations, and idioms. A total of 615 distinct expressions containing the English verb to have were extracted from a 7000-sentence corpus in the &quot;international travel&quot; domain. Each English expression was manually translated into Japanese in the most general way possible.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1.2. Target-language Distinctions
</SectionTitle>
    <Paragraph position="0"> The most general translation of the construction &quot;X have Y&quot; in this domain was found to be X-ni Y-ga aru:
The copy shop next door has a fax machine.</Paragraph>
    <Paragraph position="1"> tonari-no kopiiya-ni fakkusu-ga arimasu.</Paragraph>
    <Paragraph position="2"> next-ATT copy shop-LOC fax-NOM exist
Other translations are often necessary when the target language imposes finer semantic distinctions on the state or action described. For example, if the object noun phrase refers to one or more human beings, the Japanese verb aru is replaced by iru. Similarly, the word pet, or a pet animal, as the object noun phrase triggers the translation of to have as katte-iru, a Japanese verb for keeping an animal as a pet:
We have two sons.</Paragraph>
    <Paragraph position="3"> musuko-ga futari imasu.</Paragraph>
    <Paragraph position="4"> son-NOM two-CONTR exist
Do you have pets?
anata-wa petto-wo katte-imasu-ka
you-TOP pet-ACC keep-ST-Q
Another finer target-language distinction concerns a symptom or disease as the object of to have. While many physical symptoms and minor diagnoses (e.g. pain, cavity, fever, allergy) use the default translation (X-ga aru), a serious illness or diagnosis is translated into the copula construction. Many other to have constructions with a symptom/disease object require Japanese verbs that are specific to the object noun phrase:
I have diabetes.</Paragraph>
    <Paragraph position="5"> Some verbal adjuncts can affect the translation of the to have construction, not by altering the basic sense of 'existing', but by adding information that specifies the way in which something 'exists'. One such adjunct is a prepositional phrase (PP) whose object noun phrase shares its referent with the subject of have. For example, the utterance below expresses that the speaker is holding or carrying the map, and the Japanese translation uses the verb motte-iru, literally meaning to be carrying/holding.</Paragraph>
    <Paragraph position="6"> I have the map with me.</Paragraph>
    <Paragraph position="8"> If the subject noun phrase is inanimate, the Japanese translation uses the verb tsuite-iru, which literally means to be attached.</Paragraph>
    <Paragraph position="9"> The main dish has a salad with it.</Paragraph>
    <Paragraph position="10"> meindisshu-ni-wa sarada-ga tsuite-imasu.</Paragraph>
    <Paragraph position="11"> main dish-LOC-TOP salad-NOM attach-ST
Similarly, a construction with an on-PP is translated into the Japanese construction notte-iru, which literally means to be written/placed on, and a construction with an in-PP is translated into haitte-iru, which literally means to be placed in:
Does the map have subway lines on it?</Paragraph>
    <Paragraph position="12"> Sono chizu-ni chikatetsusen-ga notte-imasu-ka.</Paragraph>
    <Paragraph position="13"> the map-LOC subway line-NOM written-on-Q
The closet has extra hangers in it.</Paragraph>
    <Paragraph position="15"> Adjunct adjectival phrases and past participles also specify the way something exists. For example, available in the have construction generally changes the translation to aite-iru, to be open or available: We have one twin room available.</Paragraph>
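The triggers described in this section can be summarized as a small decision procedure over properties of the object noun phrase and the adjuncts. The Python sketch below only illustrates that summary: the class HaveClause, its field names, and the order in which the checks run are hypothetical, and the system described in this paper stores such correspondences as translation examples rather than hand-written rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HaveClause:
    """Hypothetical flattened view of an 'X have Y' clause."""
    obj_class: str = "thing"       # semantic class of Y: "human", "pet", "thing", ...
    subj_animate: bool = True      # is X animate?
    pp: Optional[str] = None       # preposition of an adjunct PP: "with", "on", "in"
    pp_coref_subj: bool = False    # does the PP object share its referent with X?
    adj: Optional[str] = None      # adjunct adjective or participle, e.g. "available"


def japanese_verb(c: HaveClause) -> str:
    """Pick the Japanese verb for 'have'; the rule order is an assumption."""
    # Adjuncts that specify the way something exists.
    if c.pp == "with" and c.pp_coref_subj:
        return "motte-iru" if c.subj_animate else "tsuite-iru"
    if c.pp == "on":
        return "notte-iru"
    if c.pp == "in":
        return "haitte-iru"
    if c.adj == "available":
        return "aite-iru"
    # Semantic class of the object noun phrase.
    if c.obj_class == "pet":
        return "katte-iru"
    if c.obj_class == "human":
        return "iru"
    # Default translation: X-ni Y-ga aru.
    return "aru"
```

For instance, japanese_verb(HaveClause(subj_animate=False, pp="with", pp_coref_subj=True)) returns "tsuite-iru", as in The main dish has a salad with it.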
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.4. Source Language Ambiguities
</SectionTitle>
      <Paragraph position="0"> In some cases, the to have construction in English carries more than one sense, and certain linguistic contexts bring out one of the senses as the preferred meaning. For example, the construction X has a Y taste is ambiguous between X exercising a Y (personal) taste and X tasting Y. This ambiguity is usually resolved by the semantic properties of the subject noun phrase, as illustrated in the examples below:
He has simple tastes.</Paragraph>
      <Paragraph position="1"> kare-ga shinpuru-na shumi-wo shiteiru
he-NOM simple taste-ACC do-ST
This wine has a very clean taste.</Paragraph>
      <Paragraph position="2"> kono wain-wa totemo sawayaka-na aji-ga suru
this wine-TOP very refreshing taste-NOM do
When the object refers to a specific type of information, such as a number or an address, the construction is inherently ambiguous between to know (the number), to be carrying (the number), and (for the number) to exist. The construction usually carries the meaning of to know, but if it is negated, the sense of to be carrying becomes preferred, since the negative construction is more specific and only negates the proposition that the object is accessible:
Another ambiguity of to have involves the two senses to have something available and to eat, when the object noun phrase refers to an edible entity. Our corpus analysis shows that some linguistic contexts clearly prefer one of the two senses. For example, the past tense or the perfective aspect brings out the to eat sense, whereas the present tense without any aspect markers suppresses it:
In some constructions, to have functions as a support verb. In a support verb construction, the object noun phrase forms part of the verbal predicate rather than being an argument of the verb. If the target language has no equivalent support verb construction, such an expression has to be translated into the corresponding single-verb construction.</Paragraph>
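The sense preferences for taste objects and edible objects can be sketched in the same way. This is a hypothetical illustration: the function have_sense, its parameters, and the sense labels are invented here, and treating a human subject as the cue for the personal-taste reading is an assumption drawn from the two taste examples above.

```python
from typing import Optional


def have_sense(obj_class: str, subj_human: bool = False,
               tense: str = "present", aspect: Optional[str] = None) -> str:
    """Hypothetical preferred sense of 'have' from a few context features."""
    if obj_class == "taste":
        # 'X has a Y taste': assume a human subject signals personal preference.
        return "personal-taste" if subj_human else "flavor"
    if obj_class == "edible":
        # Past tense or perfective aspect brings out 'to eat'.
        if tense == "past" or aspect == "perfective":
            return "eat"
        # Present tense with no aspect marker suppresses 'to eat'.
        if tense == "present" and aspect is None:
            return "have-available"
        return "undecided"  # other combinations are not characterized above
    return "default"
```

For example, have_sense("edible", tense="past") returns "eat", and have_sense("taste", subj_human=True) returns "personal-taste".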
      <Paragraph position="3"> Idiomatic expressions in the source and target languages, and their varying degrees of &quot;fixedness&quot;, also play a role. For example, the word kentoo, the Japanese translation of a clue in I don't have a clue, requires the special verb tsuku to form the idiomatic expression kentoo-ga tsuku. As another example, the English expression Have a good one does not allow a compositional translation into a Japanese construction with a main verb plus an object.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.6. Discussion
</SectionTitle>
      <Paragraph position="0"> The data described above show that various factors contribute to the different patterns of translation. To handle these different translations correctly, it is necessary to identify the linguistic features of the context that trigger different translations, and to determine how the different features and contexts interact. In the case of the English to have construction, we identified the following surface linguistic features that can be interpreted as 'triggers' for translations other than the default translation:
(request, suggestion, etc.)
We found that some of the factors have a stronger influence on the translation than others. For example, consider the following expression:
Can I have a look at the room?
sono heya-wo mi-raremasu-ka.</Paragraph>
      <Paragraph position="1"> the room-ACC look-PTN-Q
The source-language expression contains more than one factor that can trigger a different translation. The first factor is the construction Can I have X?, which typically carries the pragmatic force of &quot;request&quot; and usually triggers the X-wo o-negai dekimasu-ka construction. At the same time, the object noun phrase a look means that the verb to have is used as a support verb. For this reason, the combination of the verb have and the object noun phrase a look has to be translated into Japanese as the verbal predicate miru. This shows that the translation preference triggered by the root string of the object noun phrase is stronger than, and should take precedence over, the preference triggered by the pragmatic force.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Information-based MT
</SectionTitle>
    <Paragraph position="0"> We argue that the sorts of complex translation correspondences that were illustrated in the previous section are best represented as translation examples, but that the transfer procedure must use qualitative linguistic constraints in order to choose the correct examples. Given the types of linguistic features that influence translation, a highly expressive linguistic representation for both input and translation examples is required. We employ typed feature structures throughout all stages of translation.</Paragraph>
    <Paragraph position="1"> Since there are complex interactions among different contextual factors, a single quantitative matching function that calculates a distance between the input and the examples is not sufficient. Multiple steps of matching are needed, each considering a small number of linguistic dimensions, with the steps executed in the appropriate order. This is best achieved with a rule-based linguistic transfer procedure that controls the example matching procedure.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1. Transfer Component Architecture
</SectionTitle>
      <Paragraph position="0"> The transfer component for information-based MT consists of two main procedures, the linguistic transfer procedure and the example matching procedure, as illustrated in Figure 1. The input to this component is the source-language typed feature structure, which is created by an analysis component that is not described further here. Similarly, the output of the transfer component is a target-language typed feature structure, from which the generation component (also not described further) produces the target-language expression.</Paragraph>
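A typed feature structure can be pictured as a type plus a set of named features whose values are atoms or further feature structures. The following is a minimal Python sketch under that assumption; the class TFS, the method get, and the feature names in the sample clause are hypothetical and do not reproduce the actual output format of the analysis component.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class TFS:
    """A minimal typed feature structure: a type plus named feature values."""
    type: str
    features: dict[str, Union[str, TFS]] = field(default_factory=dict)

    def get(self, path: str) -> Optional[Union[str, TFS]]:
        """Follow a dotted feature path such as 'obj.root'; None if absent."""
        node: Union[str, TFS] = self
        for name in path.split("."):
            if not isinstance(node, TFS) or name not in node.features:
                return None
            node = node.features[name]
        return node


# Hypothetical encoding of "The copy shop next door has a fax machine."
source = TFS("clause", {
    "pred": "have",
    "tense": "present",
    "subj": TFS("np", {"root": "copy shop", "mod": TFS("adv", {"root": "next door"})}),
    "obj": TFS("np", {"root": "fax machine", "det": "a"}),
})
```

Here source.get("obj.root") returns "fax machine", while a path through an atomic value, such as "pred.root", returns None.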
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2. Linguistic Transfer
</SectionTitle>
      <Paragraph position="0"> The linguistic transfer procedure is implemented as a rewrite grammar using the special-purpose Grammar Programming Language (GPL) (Duan et al. 2000; Franz et al. 2000a). The general role of the transfer grammar is to operate recursively on the input feature structure, and to perform source-to-target transfer by invoking the example matching procedure and by using the retrieved translation examples to construct a target-language feature structure. The transfer grammar implements the principle of &quot;large to small&quot; in covering the input feature structure. When the transfer procedure invokes the example matching procedure, it implements the principle of &quot;specific to general&quot;. Since the linguistic features interact with each other when they are combined, and since some features have more influence on the translation than others, it is necessary to specify a number of separate invocations of the example matching procedure, and to pay particular attention to their order. The invocations are arranged so that each call focuses on one or two features, making sure that both the input and the example contain the same feature(s).</Paragraph>
      <Paragraph position="1"> Different invocations of the matching procedure are ordered so that the system checks the existence of the most important factors first, gradually progressing to the least important factors.</Paragraph>
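The ordering principle can be illustrated with the Can I have a look at the room? example from Section 1.6. In this hypothetical Python sketch, each stage names the feature(s) that one matching call focuses on; an example is eligible in a stage only if its trigger features are exactly those features, with the same values as in the input, and an empty final stage retrieves the default example. The function match_in_order and the flat feature dictionaries are assumptions made for illustration.

```python
from typing import Optional

Features = dict[str, str]


def match_in_order(inp: Features, examples: list[Features],
                   stages: list[tuple[str, ...]]) -> Optional[Features]:
    """Run matching stages from the most to the least important feature(s).

    An example is eligible in a stage only if its trigger features (all keys
    except "target") are exactly the stage's features and agree with the input.
    """
    for required in stages:
        if not all(f in inp for f in required):
            continue  # the input lacks this stage's trigger; skip the call
        for ex in examples:
            triggers = set(ex) - {"target"}
            if triggers == set(required) and all(ex[f] == inp[f] for f in required):
                return ex  # first eligible example; a real matcher would rank by cost
    return None


EXAMPLES = [
    {"target": "X-ni Y-ga aru"},                                # default
    {"force": "request", "target": "X-wo o-negai dekimasu-ka"},
    {"obj_root": "look", "target": "miru"},                     # support verb
]
# Support-verb trigger first, then pragmatic force, then the default.
STAGES = [("obj_root",), ("force",), ()]
```

With the input {"force": "request", "obj_root": "look"} the first stage already returns the miru example; changing obj_root to "map" falls through to the request example, and an input with neither trigger reaches the default.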
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3. Example Matching
</SectionTitle>
      <Paragraph position="0"> The example matching procedure matches the input feature structure against the example feature structures, and it returns the most appropriate example. The architecture of this module is shown in Figure 2.</Paragraph>
      <Paragraph position="1"> When the transfer procedure invokes the example matching procedure, it specifies a set of linguistic constraints on which examples may be considered. These constraints narrow the search space from all the examples to a much smaller set. The examples that satisfy them are then matched in detail against the input feature structure. The detailed match is a recursive process over the two feature structures, based on costs for inserting, deleting, or altering features and on certain constraints for particular features. Lexical similarity is calculated from the thesaurus on the basis of the information content of the thesaurus nodes.</Paragraph>
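The filtering step and the detailed match might be sketched as follows. The paper does not give its cost values or its exact similarity formula, so this Python sketch assumes unit insertion and deletion costs and a Resnik-style similarity (the information content, -log p, of the most informative common ancestor) over a toy thesaurus; the thesaurus, the frequency counts, and all function names are hypothetical.

```python
import math
from typing import Callable, Optional, Union

FS = Union[str, dict]  # a leaf value or a dict of features

# Toy thesaurus (child -> parent) and counts that include descendants.
PARENT = {"coffee": "drink", "tea": "drink", "bread": "food",
          "drink": "ROOT", "food": "ROOT"}
FREQ = {"coffee": 10, "tea": 10, "drink": 30, "bread": 20, "food": 40, "ROOT": 100}
INSERT_COST = DELETE_COST = 1.0  # assumed unit costs


def info_content(node: str) -> float:
    return -math.log(FREQ[node] / FREQ["ROOT"])


def ancestors(node: str) -> list[str]:
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def alter_cost(a: str, b: str) -> float:
    """Cost of altering leaf a into leaf b, in [0, 1]."""
    if a == b:
        return 0.0
    if a not in FREQ or b not in FREQ:
        return 1.0  # words outside the toy thesaurus
    shared = set(ancestors(a)).intersection(ancestors(b))
    resnik = max(info_content(n) for n in shared)
    return 1.0 - resnik / math.log(FREQ["ROOT"])


def match_cost(inp: FS, ex: FS) -> float:
    """Recursive distance between two feature structures."""
    if isinstance(inp, dict) and isinstance(ex, dict):
        total = 0.0
        for key in inp.keys() | ex.keys():
            if key not in ex:
                total += INSERT_COST      # feature only in the input
            elif key not in inp:
                total += DELETE_COST      # feature only in the example
            else:
                total += match_cost(inp[key], ex[key])
        return total
    if isinstance(inp, str) and isinstance(ex, str):
        return alter_cost(inp, ex)
    return INSERT_COST + DELETE_COST      # leaf versus structure


def best_example(inp: dict, examples: list[dict],
                 constraint: Callable[[dict], bool]) -> Optional[dict]:
    """Filter by the caller's constraint, then pick the cheapest match."""
    candidates = [ex for ex in examples if constraint(ex)]
    return min(candidates, key=lambda ex: match_cost(inp, ex)) if candidates else None
```

With an input whose object root is coffee, an example with tea (sharing the ancestor drink) costs less than one with bread (sharing only ROOT), so best_example prefers the tea example.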
      <Paragraph position="2"> During example matching, the input feature structure is aligned with the example feature structure. The alignment information is used by the transfer procedure to handle differences between the input and the example. For example, if the input contains grammatical features, modifiers, adjuncts, or sub-constituents that are not in the examples, then they are transferred to the target-language representation. Similarly, if the example feature structure contains information that is not present in the input, then the transfer procedure deletes the relevant information.</Paragraph>
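The use of the alignment can be sketched on flat feature dictionaries. In this hypothetical Python sketch, align maps source-side feature names to target-side feature names; extra input information is copied unchanged, whereas the real transfer procedure would transfer it, and substitution of differing values is left out.

```python
def adapt_target(inp: dict, ex_src: dict, ex_tgt: dict,
                 align: dict[str, str]) -> dict:
    """Adapt an example's target side to the input using the alignment.

    align maps source-side feature names to target-side feature names.
    """
    tgt = dict(ex_tgt)  # never modify the stored example
    for src_feat, tgt_feat in align.items():
        in_input, in_example = src_feat in inp, src_feat in ex_src
        if in_input and not in_example:
            # Information only in the input is carried over (copied as-is here;
            # the real procedure would transfer it).
            tgt[tgt_feat] = inp[src_feat]
        elif in_example and not in_input:
            # Information only in the example is deleted from the target.
            tgt.pop(tgt_feat, None)
    return tgt
```

For an input with a past tense feature and an example whose subject carries a modifier the input lacks, the result keeps the example's predicate and object, adds the tense, and drops the modifier.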
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. Example Database
</SectionTitle>
    <Paragraph position="0"> The example database contains a large set of translation examples represented as pairs of typed feature structures in the source and target languages. Using a treebanking tool, the examples are disambiguated, and indices that show corresponding constituents are added. In addition to the type and complexity of the example feature structures, there are three methods for identifying the degree of linguistic specificity of an example: marked examples, example indices, and semantic constraints. This information is used by the transfer procedure and the matching procedure to select the best example, using the mechanism of linguistic matching constraints described above.
[Figure 2: Architecture of the Example Matching Procedure]</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1. Marked Examples
</SectionTitle>
      <Paragraph position="0"> Some of the features shown in Section 1 to influence the translation have traditionally been described as &quot;marked&quot;. Examples include negation, interrogative mood, and the presence of certain adjuncts. The transfer procedure regards examples with such features as more specific than unmarked examples, and (via the linguistic constraints passed to the matching procedure) only allows them when appropriate.</Paragraph>
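One way to picture the constraint on marked examples: an example carrying a marked feature is eligible only if the input has the same value for it, and among eligible examples the more marked one wins. The Python sketch below assumes hypothetical feature names and uses English sense labels for the number example from Section 1.4 rather than Japanese translations.

```python
from typing import Optional

MARKED = ("neg", "interrogative", "modal")  # hypothetical marked-feature names


def allowed(inp: dict, ex: dict) -> bool:
    """Every marked feature of the example must appear in the input with the same value."""
    return all(inp.get(f) == ex[f] for f in MARKED if f in ex)


def specificity(ex: dict) -> int:
    return sum(1 for f in MARKED if f in ex)


def pick(inp: dict, examples: list[dict]) -> Optional[dict]:
    """Among allowed examples, prefer the most marked (most specific) one."""
    ok = [ex for ex in examples if allowed(inp, ex)]
    return max(ok, key=specificity) if ok else None


NUMBER_EXAMPLES = [
    {"obj": "number", "sense": "know"},               # unmarked
    {"obj": "number", "neg": True, "sense": "carry"},  # marked: negation
]
```

A plain input selects the unmarked "know" example, while a negated input makes the marked "carry" example eligible and, being more specific, preferred.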
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2. Example Indices
</SectionTitle>
      <Paragraph position="0"> Examples can contain two types of indices linking a source-language sub-feature-structure with a target-language sub-feature-structure. A CORRESPOND-INDEX signals that the two constituents correspond to each other, while a REPLACE-INDEX signals that two constituents correspond to each other and can be replaced by similar constituents.</Paragraph>
      <Paragraph position="1"> The absence of such indices in a major argument phrase (such as the subject or object) indicates that the example is more specific. A CORRESPOND-INDEX is more specific than a REPLACE-INDEX, since a CORRESPOND-INDEX indicates that although the head of the constituent allows modifiers, the constituent cannot be substituted. For example, the object the bucket in the example for the idiom to kick the bucket does not contain any indices, since the idiom allows neither substitution nor modification. On the other hand, a heart attack in to have a heart attack allows modifiers (e.g. a severe heart attack), so the example for the idiomatic translation carries a CORRESPOND-INDEX.</Paragraph>
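The effect of the two index types on what an input constituent may do can be sketched as follows. The ranking (no index over CORRESPOND-INDEX over REPLACE-INDEX) follows the text; the enum Index, the function compatible, and the assumption that a REPLACE-INDEX constituent also allows modifiers are illustrative.

```python
from enum import IntEnum


class Index(IntEnum):
    """Index on an example constituent; a higher value means a more specific example."""
    REPLACE = 0     # corresponds; may be substituted (and, assumed here, modified)
    CORRESPOND = 1  # corresponds; may be modified but not substituted
    NONE = 2        # no index: fixed, e.g. 'the bucket' in 'kick the bucket'


def compatible(idx: Index, same_head: bool, extra_modifiers: bool) -> bool:
    """Can an input constituent be matched against an example constituent?"""
    if not same_head and idx != Index.REPLACE:
        return False  # only REPLACE-INDEX constituents may be substituted
    if extra_modifiers and idx == Index.NONE:
        return False  # index-less constituents admit no modifiers
    return True
```

Under these assumptions, compatible(Index.CORRESPOND, same_head=True, extra_modifiers=True) holds for a severe heart attack, whereas a modified object is rejected for the index-less the bucket.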
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3. Semantic Constraints
</SectionTitle>
      <Paragraph position="0"> The example database also contains certain semantic constraints on source-language sub-feature-structures. When an input feature structure is matched with such an example, the matching procedure checks whether the input satisfies the semantic constraint. If it does, that example is preferred, since it is more specific than examples that do not carry a semantic constraint. If the input does not satisfy the constraint, the match is rejected.</Paragraph>
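A minimal sketch of the three outcomes (satisfied and preferred, unconstrained, rejected), assuming that the input's semantic classes are available as a set of labels; the class labels, the list of example pairs, and the function names are hypothetical, with the pet constraint borrowed from Section 1.2.

```python
from typing import Optional


def constraint_score(input_classes: set[str], required: Optional[str]) -> Optional[int]:
    """Return None to reject the match, otherwise a specificity bonus."""
    if required is None:
        return 0      # no semantic constraint: usable but less specific
    if required in input_classes:
        return 1      # constraint satisfied: preferred
    return None       # constraint violated: match rejected


def choose(input_classes: set[str],
           examples: list[tuple[Optional[str], str]]) -> Optional[str]:
    """examples holds (semantic constraint or None, translation) pairs."""
    scored = [(constraint_score(input_classes, req), tgt) for req, tgt in examples]
    valid = [(s, t) for s, t in scored if s is not None]
    return max(valid, key=lambda p: p[0])[1] if valid else None


HAVE_EXAMPLES = [(None, "aru"), ("pet", "katte-iru")]
```

An object classified as a pet selects katte-iru; any other object rejects the constrained example and falls back to aru.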
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4. Sample Entry
</SectionTitle>
      <Paragraph position="0"> Figure 3 shows the example pair for the expression Can I have your name? (o-namae-wo o-negai dekimasu-ka). This example has a number of marked features: the mood of the sentence is yes-no question, the modal auxiliary can is present, and the subject does not contain an index. The transfer procedure uses these features to ensure that the example is only used to translate appropriate input.</Paragraph>
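A hypothetical Python encoding of the Figure 3 entry, restricted to the marked features named in the text. The dictionary layout and feature names are invented, and the object's index is omitted because the text does not state it; the check treats the index-less subject as fixed, following Section 3.2.

```python
# Hypothetical encoding of the Figure 3 entry; only the marked features
# named in the text are shown, and the layout is invented.
CAN_I_HAVE_YOUR_NAME = {
    "source": {
        "mood": "yes-no-question",
        "modal": "can",
        "pred": "have",
        "subj": {"root": "I"},      # no index: the subject is fixed
        "obj": {"root": "name"},    # index not given in the text; omitted
    },
    "target": "o-namae-wo o-negai dekimasu-ka",
}


def applies(entry: dict, inp: dict) -> bool:
    """Check the marked features that restrict where the entry may be used."""
    src = entry["source"]
    return (inp.get("mood") == src["mood"]
            and inp.get("modal") == src["modal"]
            and inp.get("subj", {}).get("root") == src["subj"]["root"])
```

Under this check, a declarative input, a different modal, or a different subject all rule the entry out, while the object is left to the detailed match.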
    </Section>
  </Section>
</Paper>