<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1163">
  <Title>Machine Translation by Interaction between Paraphraser and Transfer</Title>
  <Section position="3" start_page="0" end_page="4" type="metho">
    <SectionTitle>
2 Sandglass Translation Model
</SectionTitle>
    <Paragraph position="0"> Figure 1 shows our paradigm for a translation model. In the conventional MT model, the process load and the information used to deal with it are maximized in the transfer module; we instead propose that they be minimized in the transfer, in consideration of language portability and task portability.</Paragraph>
    <Paragraph position="1"> This translation approach is effective in MT where neither the source nor the target language is English. Although a large number of bilingual corpora are currently available, most of them pair English with another language.</Paragraph>
    <Paragraph position="2"> This suggests that it is not useful to apply bilingual-corpus-based approaches to situations not involving English. Moreover, conventional approaches based on hand-written rules are also unsuccessful due to a lack of bilingual speakers for non-English pairs.</Paragraph>
    <Paragraph position="3"> We also assume that reducing bilingual processing costs is crucial for constructing multilingual MT. Although both interlingual MT and MT with a controlled language satisfy this requirement, our MT paradigm has the advantage that it does not require the design of an interlingua or controlled language, which can be a critical problem.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.1 Modularity and paraphrasing strategy
</SectionTitle>
      <Paragraph position="0"> The Sandglass translation model has a source language paraphraser (hereafter the paraphraser) and a bilingual language transfer (hereafter the transfer), which are highly modular with respect to each other so that they can be developed as independently as possible. One of our aims in this model is to develop a general-purpose paraphraser that can also be used in other NLP applications.</Paragraph>
      <Paragraph position="1"> When the system has this modularity, the paraphraser does not need to consider the knowledge or translation capability of the transfer. However, the paraphraser then has trouble planning a paraphrasing strategy, since the purpose of paraphrasing in this model is to help a transfer with little knowledge. One might propose, as a solution, generating all possible paraphrases, transferring them into the target language, and selecting the best one among the successful outputs. We believe that, although this strategy works, it is not practical due to its computational cost: in many cases, many local paraphrases are possible for one input, which can lead to a combinatorial explosion in paraphrase generation. Moreover, this strategy causes a more serious problem in speech translation, which requires real-time computation.</Paragraph>
      <Paragraph position="2"> As an alternative, we propose the following strategy for planning paraphrases. We first put a controller between the paraphraser and the transfer. The controller communicates with both the paraphraser and the transfer and exchanges information between them on the sentence to be translated. As opposed to a one-way information path from the paraphraser to the transfer, this bi-directional information flow enables cooperation by allowing each module to provide its counterpart with information on what is possible and what is impossible.</Paragraph>
      <Paragraph position="3"> This kind of process is not necessary in the typical MT model, since each module is responsible for completing its mission and is never allowed to give up: if one module gives up its mission, the entire translation process also fails. On the contrary, our model (sometimes) allows the transfer to give up generating the target language. In the typical MT model, the transfer's full responsibility continuously enlarges the transfer knowledge, which is one of that model's critical problems. To avoid such a bloated transfer, we propose shifting the responsibility for generating the target language from the transfer process to monolingual processes.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="4" type="sub_section">
<SectionTitle>
2.2 Interaction between paraphrasing and transfer
</SectionTitle>
      <Paragraph position="0"> Figure 2 illustrates our translation strategy. The translator mainly consists of the paraphraser and the transfer, with a controller located between the two modules in order to control the information flow. This model has the following characteristics: (1) the paraphraser and the transfer are equivalent in terms of process sequences, i.e., the process flow is not an assembly-line type, and (2) the knowledge for paraphrasing and that for transferring are separated so that the paraphraser and the transfer are responsible for monolingual and bilingual processing, respectively.</Paragraph>
      <Paragraph position="1"> The translation process proceeds as follows. The output of word segmentation and part-of-speech (POS) tagging is first passed, through the controller, to the transfer. Assume that this sequence of morphemes fails to transfer (process &lt;1&gt; in the figure).</Paragraph>
      <Paragraph position="4"> In this case, the transfer may obtain information on the failed input morphemes that is useful for the paraphrasing strategy, such as similar morpheme sequences that can be transferred or parts of the input that are impossible to transfer. (For simplicity, other parts of the translator are hidden in the figure.)</Paragraph>
      <Paragraph position="5"> Our transfer can obtain expressions similar to the input, if any exist, when it fails. In this example, the transfer finds that the input is similar to a sequence in its knowledge, i.e., it understands that the input can be transferred if one of its morphemes is paraphrased into another. Accordingly, the transfer provides this information to the paraphraser as a paraphrasing hint (process &lt;2&gt;). The paraphraser then attempts to use this suggestion prior to other paraphrasing trials. If it has such knowledge, it paraphrases based on the transfer hint and returns this paraphrase to the transfer (process &lt;3&gt;). Again, the transfer carries out a new trial, and this time it succeeds in translation (process &lt;4&gt;). Finally, the target language expression is passed to the subsequent processes (process &lt;5&gt;).</Paragraph>
      <Paragraph position="6"> Among the possibilities other than those shown in the figure, if the transfer cannot find any similar expression, the paraphraser attempts to rewrite the input by utilizing its own paraphrasing knowledge. Similarly, if the paraphraser cannot accept the rewriting hint that the transfer suggests, it likewise falls back on its own knowledge.</Paragraph>
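The interaction loop above (processes 1 through 5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: ToyTransfer, ToyParaphraser, and the romanized Japanese morphemes are hypothetical stand-ins, and the trial bound is an assumption (the paper does not state a fixed limit).

```python
class Result:
    def __init__(self, success, target=None, hints=None):
        self.success = success
        self.target = target
        self.hints = hints or []

class ToyTransfer:
    """Knows a single template; on a near miss it suggests a
    single-morpheme replacement as a paraphrasing hint."""
    def attempt(self, morphemes):
        if morphemes == ["nani", "desu", "ka"]:
            return Result(True, target="what is it")
        return Result(False, hints=[("kai", "ka")])

class ToyParaphraser:
    def apply(self, morphemes, hints):
        for old, new in hints:
            if old in morphemes:
                return [new if m == old else m for m in morphemes]
        return None        # the hint is not in its own knowledge

def translate(morphemes, transfer, paraphraser, max_trials=5):
    """Mediate between the transfer and the paraphraser, as the
    controller does, until translation succeeds or both give up."""
    current = morphemes
    for _ in range(max_trials):
        result = transfer.attempt(current)       # processes 1 and 4
        if result.success:
            return result.target                 # process 5
        current = paraphraser.apply(current, result.hints)  # 2 and 3
        if current is None:
            return None                          # zero output
    return None
```

The key design point this sketch captures is that a transfer failure is not fatal: it produces a hint that drives the next paraphrasing trial.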
    </Section>
    <Section position="3" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
2.3 Paraphraser
</SectionTitle>
      <Paragraph position="0"> Currently, our paraphraser can deal with six paraphrasing types: (1) verification of the transfer's suggestion, (2) input segmentation, (3) reduction of honorific expressions (Ohtake and Yamamoto, 2001), (4) simplification of functional words (Yamamoto, 2001), (5) chunking of noun phrases, and (6) deletion of minor elements. Paraphrasing is conducted in this order. If one of the pattern conditions in this paraphrasing knowledge is matched, the paraphraser then finishes and returns its paraphrase; no other paraphrase is pursued.</Paragraph>
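A minimal sketch of this first-match ordering follows; the strategy functions are hypothetical stand-ins (each returns a paraphrase string or None), and only a toy honorific rule with romanized words actually fires here.

```python
def paraphrase(utterance, strategies):
    """Try the strategies in priority order; the first one that
    returns a paraphrase wins, and no later strategy is attempted."""
    for strategy in strategies:
        result = strategy(utterance)
        if result is not None:
            return result
    return None                    # no paraphrasing knowledge matched

# Toy stand-ins for the first three of the six strategies.
def verify_transfer_hint(u):
    return None                    # (1) no hint from the transfer here

def segment_input(u):
    return None                    # (2) no segmentation rule fires

def reduce_honorifics(u):
    # (3) e.g. collapse a (romanized) honorific form to a plain form
    if "gozaimasu" in u:
        return u.replace("gozaimasu", "desu")
    return None

ordered = [verify_transfer_hint, segment_input, reduce_honorifics]
```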
      <Paragraph position="1"> (1) As the first type of the paraphrasing, the paraphraser verifies the paraphrasing hint that the transfer suggests, if any. In our model, all of the suggested paraphrasing rules are formed as single-morpheme replacements, most of which are functional words. Therefore, the paraphraser has a list of these types of rephrasing rules in advance to verify the suggestion. We have built a list that contains 175 replacement patterns.</Paragraph>
      <Paragraph position="2">  (Until what time is it?) In the above two examples, a sentence-final particle and an auxiliary verb are replaced, respectively. These slight differences should be merged before bilingual processing in order to restrict unnecessary combinations in the target language.</Paragraph>
      <Paragraph position="3"> (2) If the verification fails, the paraphraser then attempts to split the input utterance according to pre-defined segmentation rules. This is necessary because we are dealing with spoken language, which has no clear sentence boundaries. The 30 segmentation rules are defined by checking sequences of either words or POS tags. For example, in many cases, if there is a sentence-final particle, the input is segmented after that word. In the following example, a segmentation border is indicated by the symbol &amp;quot;;&amp;quot;.</Paragraph>
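The sentence-final-particle rule just described can be sketched as follows; the tag name sentence_final_particle and the romanized words are illustrative assumptions, not the actual rule set.

```python
def segment(tagged):
    """Split a tagged utterance (list of (word, pos) pairs) after any
    sentence-final particle that is not the last morpheme."""
    segments, current = [], []
    for i, pair in enumerate(tagged):
        current.append(pair)
        is_last = (i + 1 == len(tagged))
        if pair[1] == "sentence_final_particle" and not is_last:
            segments.append(current)   # the ";" border from the text
            current = []
    if current:
        segments.append(current)
    return segments
```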
      <Paragraph position="4">  (How much? That one.) It is possible to regard the above two examples as single sentences, so in general it is difficult in Japanese speech to determine whether to segment them or not. However, this is not a problem in the proposed method because our segmentation is conducted only if the transfer fails to deal with the input as a single sentence. (3) Honorific expressions appear very frequently in Japanese speech. These expressions involve many variations for expressing one sense, so they should be unified before the transfer to avoid the large increase in unnecessary bilingual knowledge that would otherwise be expected. Our paraphraser for honorifics, proposed by Ohtake and Yamamoto (2001), reduces such variations to as few as possible. We have 212 paraphrasing patterns for honorific expressions. (4) There are likewise many variations in Japanese verbal expressions, so we again need to reduce variation. Spoken-style expressions are the targets of paraphrasing here, and they are replaced by written- or normal-style expressions. The target phenomena and the effects of this paraphraser have been discussed in Yamamoto (2001). The paraphraser we use involves 302 patterns.</Paragraph>
      <Paragraph position="6"> (I think it may be a cold.) (5) Noun phrases are chunked here according to simple pattern matching over the lexicon or POS: if two or more nouns are consecutive, with or without a possessive-like particle &amp;quot;58,&amp;quot; we regard them as one noun phrase. This process is necessary because we parse input utterances in neither the paraphraser nor the transfer, and the transfer only sees POS sequences. We expect this chunking to make our poor, template-based transfer more robust against input variations. However, we place this process at a low priority in the paraphrasing order because applying it unconditionally is considered troublesome, especially in spoken language. A chunk is illustrated below. (6) As the final paraphrasing measure, the paraphraser deletes relatively unimportant parts of the input expressions, such as adverbs of manner and degree, as well as particles expressing topical markers. Changing the POS sequence of the input changes the search space in the transfer knowledge. In the following two examples, two particles and an adverb are deleted, respectively. Currently, we have 22 patterns of this type.</Paragraph>
      <Paragraph position="7">  (Perhaps it takes ten minutes.)</Paragraph>
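The noun-phrase chunking in (5) can be sketched as follows. The tag names and the romanization of the possessive-like particle as "no" are illustrative assumptions.

```python
def chunk_nouns(tagged):
    """Merge runs of two or more nouns, optionally joined by the
    possessive-like particle (romanized here as "no"), into one
    noun-phrase token."""
    out, i, n = [], 0, len(tagged)
    while i != n:
        words, j = [], i
        while j != n:
            word, pos = tagged[j]
            if pos == "noun":
                words.append(word)
                j += 1
            elif (pos == "particle" and word == "no" and words
                  and j + 1 != n and tagged[j + 1][1] == "noun"):
                words.append(word)   # keep the particle inside the NP
                j += 1
            else:
                break
        if len(words) >= 2:
            out.append((" ".join(words), "noun_phrase"))
            i = j
        else:
            out.append(tagged[i])    # no chunk starting here
            i += 1
    return out
```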
    </Section>
    <Section position="4" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
2.4 Transfer knowledge construction
</SectionTitle>
      <Paragraph position="0"> Our transfer knowledge is constructed as follows. Because our principle requires that the bilingual processing and its efforts should be reduced as much as possible, our bilingual knowledge is primitive and easy to construct automatically. Our knowledge sources are a sentence-aligned text corpus between Japanese and Chinese, a Japanese-Chinese dictionary where one source word may correspond to many target words, and a Japanese analyzer. Note that we used neither a Chinese analyzer nor tagging in the Chinese corpus.</Paragraph>
      <Paragraph position="1"> Our transfer process is based on a word-template transfer technique, and we conducted automatic word alignment to build its knowledge.</Paragraph>
      <Paragraph position="2"> We first analyzed all Japanese sentences in the corpus with the freely available morphological analyzer JUMAN. We then checked, by string matching, whether each source language content word has a corresponding target word. If this alignment succeeds, both the source and target language words are tagged with the same ID number. When more than one translation in the dictionary can be aligned, the longest word on the target side is selected for alignment.</Paragraph>
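The alignment step can be sketched as follows, under the assumption of a toy romanized dictionary; the function name, data, and slot notation below are illustrative, not the actual resources or format.

```python
def build_template(src_words, tgt_sentence, dictionary, next_id=0):
    """For each source content word, find a dictionary translation that
    occurs in the target sentence by string matching; prefer the longest
    candidate. Aligned words become ID-numbered variable slots, and a
    repeated target word is replaced at every occurrence."""
    src_out = []
    tgt_out = tgt_sentence
    for word in src_words:
        candidates = [t for t in dictionary.get(word, []) if t in tgt_out]
        if candidates:
            best = max(candidates, key=len)    # longest target side wins
            slot = "#%d" % next_id
            src_out.append(slot)
            tgt_out = tgt_out.replace(best, slot)
            next_id += 1
        else:
            src_out.append(word)               # unaligned: keep as-is
    return src_out, tgt_out
```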
      <Paragraph position="3"> One source language word may correspond to a target word that appears more than once.</Paragraph>
      <Paragraph position="4"> For example, a translation of the Japanese question &amp;quot;773961487037&amp;quot; is &amp;quot;32333234&amp;quot;. We can deal with this result by accepting multiple correspondences, e.g., &amp;quot;&lt;32#538&gt; 33&lt;32#538&gt; 34&amp;quot;, where &lt;***&gt; is a word boundary and #538 is an (example) ID number.</Paragraph>
      <Paragraph position="5"> Hereafter, we call these sentence sets templates and the aligned parts in a sentence variables.</Paragraph>
    </Section>
    <Section position="5" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
2.5 Transfer
</SectionTitle>
      <Paragraph position="0"> The transfer module converts the source language input into corresponding target language expressions by using the templates. The process consists of two parts, i.e., template retrieval and template matching. (JUMAN is available at http://www-nagao.kuee.kyoto-u.ac.jp/nl-resource/juman-e.html.)</Paragraph>
      <Paragraph position="1"> The process first searches for templates satisfying similarity to the input expression. In order to judge similarity between the input and the templates, we only use the POS sequences of the input. If the retrieval succeeds, i.e., templates are found that have the same POS sequence, we then compare, word by word, the input and each retrieved template. If a word is a variable in the template, this comparison always succeeds. If there is no template retrieved, the transfer reports to the paraphraser (through the controller) that the retrieval process has failed. In this case, the paraphraser is required to somehow change the input sentence in terms of POS sequences.</Paragraph>
      <Paragraph position="2"> Suppose that some templates are retrieved but matching fails, implying that some lexicons are different. Although this case is a transfer failure as well, the transfer has information on which parts of the input sentence failed to transfer, and such information could be a key for paraphrasing. Therefore, information on unmatched parts is also returned to the paraphraser with the result of the transfer failure. If multiple templates are retrieved and all of them fail in matching, all of the unmatched parts are returned in parallel.</Paragraph>
      <Paragraph position="3"> If both the template retrieval and the template matching succeed, the transfer process has finished successfully. The input sentence is converted into the target language, and the transfer passes the result to the controller for the following processes. If more than one target language expression is produced due to multiple successes in template matching, all of them are returned in parallel, and the following processes determine the best result.</Paragraph>
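The two-stage procedure described in this subsection can be sketched as follows; the template triple format and the variable convention (a leading "#") are illustrative assumptions, not the system's actual data structures.

```python
def transfer(words, pos_seq, templates):
    """templates: list of (pos_seq, src_words, tgt_template), where
    src_words entries starting with '#' are variables.
    Returns (outputs, hints): successful targets, or unmatched parts
    that the paraphraser can use as rewriting keys."""
    retrieved = [t for t in templates if t[0] == pos_seq]
    if not retrieved:
        return [], None    # retrieval failed: the POS sequence must change
    outputs, hints = [], []
    for _, src_words, tgt in retrieved:
        mismatches = [
            (i, src_words[i], words[i])
            for i in range(len(words))
            if not src_words[i].startswith("#") and src_words[i] != words[i]
        ]
        if mismatches:
            hints.append(mismatches)       # matching failed: report keys
        else:
            # fill each variable slot with the corresponding input word
            out = tgt
            for i, s in enumerate(src_words):
                if s.startswith("#"):
                    out = out.replace(s, words[i])
            outputs.append(out)
    return outputs, hints
```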
    </Section>
  </Section>
  <Section position="4" start_page="4" end_page="4" type="metho">
    <SectionTitle>
3 Preliminary Experiment
</SectionTitle>
    <Paragraph position="0"> We conducted a preliminary experiment to evaluate the translation capability under the current paraphrasing skills. Although there are many items that should be evaluated in MT, our first interest in the prototype is how much the paraphraser supports poor transfer knowledge and how small the acceptable transfer knowledge can be.</Paragraph>
    <Paragraph position="1"> The transfer knowledge contains a bilingual dictionary of approximately 51,000 source language lexical entries, as well as up to 233,000 utterances in the domain of travel situations and their translations. For evaluation, we use 1,000 utterances, each 10 or fewer morphemes long, selected at random and unseen by the transfer.</Paragraph>
    <Paragraph position="2"> The prototype is programmed in Perl, and the running time with the maximum transfer knowledge is 0.555 seconds per utterance on a Pentium III 600 MHz processor. The ratios of fully- and partially-translated utterances at several transfer knowledge sizes are plotted in figure 3. For comparison purposes, translation performance without the paraphrasing process is also illustrated in the figure. The experiments were conducted three times under each condition.</Paragraph>
    <Paragraph position="3"> We can understand the importance of paraphrasing by observing the performance gaps of approximately 20% to 40% between full output and no paraphrasing. The paraphraser improves performance regardless of the knowledge size.</Paragraph>
    <Paragraph position="4"> The gaps are not trivial, so the experiments confirmed that the paraphraser plays an important role in the interaction process.</Paragraph>
    <Paragraph position="5"> The figure also shows that only 30% of the unseen input is translated using the POS-sequence-based maximum templates. Considering that all inputs are 10 morphemes or fewer, this low performance implies the necessity of acquiring the remaining 70% of the knowledge by somehow generalizing the existing 30%. The current paraphrasing knowledge - a collection of human intuition - can cover 40% of the inputs, while it seems difficult to cover the same or a higher level with only automatically acquired information from corpora.</Paragraph>
    <Paragraph position="6"> Figure 4 shows the average number of paraphrasing trials. It would be a major problem in this design if there were many interaction loops between the paraphraser and the transfer, but we found that such worries are unwarranted in the current system. However, it is necessary to be careful in this measure, since we need to add more functions to the paraphraser in order to avoid zero output.</Paragraph>
  </Section>
  <Section position="5" start_page="4" end_page="4" type="metho">
    <SectionTitle>
4 Related Works
</SectionTitle>
    <Paragraph position="0"> It is important to reduce the burden on the transfer to realize multilingual MT. In this sense, MT using a controlled language, such as the KANT system (Mitamura et al., 1991), follows principles similar to ours. We believe that multilingual MT systems should not place the obligation of generating the target language on the transfer module. Difficult or ambiguous input can be checked in document translation, whereas in speech translation it should somehow be resolved before the transfer module, since real-time dialog conversation is a requirement. Although we cannot find an MT model in which an interactive (that is, feedback) approach between two sub-modules is implemented, several types of interactive models have been discussed in natural language generation systems.</Paragraph>
    <Paragraph position="1"> In the Igen system (Rubinoff, 1992), which has a similar interactive operation, the Formulator module provides feedback to the Conceptualizer module with information on how much of the content can be covered by a particular word choice. With these annotations, the Conceptualizer can then determine which choice satisfies its secondary goals.</Paragraph>
    <Paragraph position="2"> Two similar works paraphrase source input for MT. One is the work of Shirai et al. (1993), where they proposed a pre-editing approach for a Japanese-English MT system ALT-J/E. The other is the work of Yoshimi and Sata (1999), where they presented an approach to rewriting English newspaper headlines for the English-Japanese MT system Power E/J. The significant difference between their approaches and ours is the model design, i.e., whether the paraphraser and the transfer are sequential or integrated. Moreover, the purposes of paraphrasing are also different: in the pre-editing system it is for expediting the transfer and in the newspaper headline system it is for reducing peculiarities in the headline; on the other hand, our paraphraser's purpose is to support poor transfer knowledge.</Paragraph>
  </Section>
</Paper>