File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/03/w03-0318_metho.xml
Size: 19,887 bytes
Last Modified: 2025-10-06 14:08:19
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0318"> <Title>Input Sentence Splitting and Translating</Title> <Section position="3" start_page="3" end_page="3" type="metho"> <SectionTitle> 2 Proposed Split-and-Translate: Method-T </SectionTitle> <Paragraph position="0"> An MT system sometimes fails to translate an input, for example, due to a failure in parsing the sentence or in retrieving examples. Such failures occur particularly when the input is long. In such a case, splitting the input may allow each portion to be translated successfully. Therefore, one idea is to arrange the translations of the split portions in the same order as in the source sentence and to regard the arrangement as a translation of the entire input sentence. Particularly in dialogue, sentences tend not to have complicated nested structures, and many long sentences can be split into mutually independent portions. Therefore, if the splitting positions and the translations of the split portions are adequate, there is a relatively high possibility that this simple arrangement provides an adequate translation of the complete input.</Paragraph> <Paragraph position="1"> In the example below, a Japanese sentence (1-j) has potentially adequate splitting positions such as (1-j').</Paragraph> <Paragraph position="2"> The arrangement of the English translations of the portions (1-e) is an adequate translation.</Paragraph> <Paragraph position="3"> (1-j) sou desu ka ee kekkou desu jaa tsuin de o negai shi masu (1-j') sou desu ka |ee |kekkou desu |jaa |tsuin de o negai shi masu (1-e) i see |yes |that's fine |then |a twin please</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.1 Criteria </SectionTitle> <Paragraph position="0"> When a sentence is split and translated, some portions can be translated while others cannot. We call the number of words in the portions that cannot be translated the fault-length.
It is natural to use (X) as a criterion for judging the quality of split-and-translate results.</Paragraph> <Paragraph position="1"> (X) The smaller the fault-length is, the better the result is.</Paragraph> <Paragraph position="2"> Let a partial-translation be the translation of a portion that can be translated. A split-and-translate result can contain several partial-translations; the partial-translation-count is the number of partial-translations. (Y) is also a natural criterion for judging the quality of a split-and-translate result.</Paragraph> <Paragraph position="3"> (Y) The smaller the partial-translation-count is, the better the result is.</Paragraph> <Paragraph position="4"> Many current MT methods produce not only target sentences but also scores. Depending on the translation method, a score can represent a parsing cost, a distance between sentences, a word correspondence probability, some other quantity, or a combination of these. If the score correlates with translation quality, we can use it as a confidence factor of the translation, and thus as another criterion for split-and-translate results. To derive the reliability of a complete split-and-translate result from the confidence factors, the scores of all partial-translations are combined. We call this combined score the combined-reliability. How the scores are combined depends on their mathematical characteristics. The third criterion (Z) is therefore added. (Footnote to example (1-j): its English translation in a corpus is &quot;I see. Then, fine, we'll take a twin.&quot;)</Paragraph> <Paragraph position="5"> (Z) The higher the combined-reliability is, the better the result is.</Paragraph> <Paragraph position="6"> From the above considerations, the proposed method uses these criteria to judge the quality of split-and-translate results, with the following priority. 1. 
The smaller the fault-length is, the better the result is.</Paragraph> <Paragraph position="7"> 2. When criterion-1 does not distinguish the results, the smaller the partial-translation-count is, the better the result is. 3. When neither criterion-1 nor criterion-2 distinguishes the results, the higher the combined-reliability is, the better the result is.</Paragraph> <Paragraph position="8"> The case where translation can be performed without splitting meets these criteria. In this case, the fault-length is 0, the partial-translation-count is 1, and the combined-reliability equals the score of the complete translation, which is the translation the MT system should use; therefore, this result is the best.</Paragraph> <Paragraph position="9"> Criterion-3 has a low priority. If an MT system has no confidence factor, only criterion-1 and criterion-2 are used.</Paragraph> <Paragraph position="10"> These three criteria are based on the output of an MT system, that is, on how well the MT system can translate the portions. Split portions are translated, and the partial-translation results are evaluated to select the best split positions (the algorithm is discussed in section 6). As the proposed split-and-translate method is based on these criteria only, it assumes no parsing process and depends on neither a particular language nor a particular MT method.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.2 Example </SectionTitle> <Paragraph position="0"> Below, we show an example of selecting a result from candidates based on criterion-1 and criterion-2.</Paragraph> <Paragraph position="1"> (2-j) hai wakari mashi ta sore to ne choushoku na n desu kedomo dou nat teru n deshou ka (2-j') hai |wakari mashi ta |sore to ne |choushoku na n desu kedomo |dou nat teru n deshou ka For a Japanese input (2-j), there are many candidate splitting points such as those in (2-j'). 
We consider three splittings: (2-a), (2-b) and (2-c).</Paragraph> <Paragraph position="2"> (2-a) hai wakari mashi ta |sore to ne choushoku na n desu kedomo dou nat teru n deshou ka (2-b) hai wakari mashi ta sore to ne choushoku na n desu kedomo |dou nat teru n deshou ka (2-c) hai wakari mashi ta |sore to ne choushoku na n desu kedomo |dou nat teru n deshou ka Suppose the partial translations corresponding to these candidates are as follows, with the fault-lengths and partial-translation-counts calculated for each.</Paragraph> <Paragraph position="3"> (2-a') hai wakari mashi ta => yes i see sore to ne choushoku na n desu kedomo dou nat teru</Paragraph> <Paragraph position="5"> hai wakari mashi ta => yes i see sore to ne choushoku na n desu kedomo => and i breakfast dou nat teru n deshou ka => what happened to it fault-length = 0 partial-translation-count = 3 (2-a) and (2-c) are better than (2-b) based on criterion-1, and (2-a) is better than (2-c) based on criterion-2, so the ranking is (2-a), (2-c), (2-b). (Footnote to example (2-j): its English translation in a corpus is &quot;I see. And also how about breakfast?&quot;)</Paragraph> </Section> </Section> <Section position="4" start_page="3" end_page="3" type="metho"> <SectionTitle> 3 Pre-Process-Splitting for Method-N </SectionTitle> <Paragraph position="0"> For splitting input sentences as a pre-process of MT systems, we draw on a previous study of pre-process-splitting. Many pre-process-splitting methods are based on word-sequence characteristics. Among them, we use the method of Takezawa (1999), a pre-process-splitting method based on N-grams of part-of-speech subcategories.</Paragraph> <Paragraph position="1"> This method is derived from that of Lavie (1996) and modified especially for Japanese.</Paragraph> <Paragraph position="2"> The function of this method is to infer where the splitting positions are. Splitting positions are defined as positions at which we can put periods. 
For each position, to calculate the plausibility that the position is a splitting position, we consider three words in total: the two preceding words and the one following word. Part-of-speech and conjugation-type are used as word characteristics. When the plausibility is higher than a given threshold, the position is regarded as a splitting position. The threshold is selected manually to tune the performance on a training set. Equation [1], which is not preserved in this text, shows how to calculate the plausibility; in it, a boundary symbol indicates a boundary of sentences, and C(N-gram) means the appearance count of the N-gram in a training set.</Paragraph> <Paragraph position="3"> It has also been reported that, for Japanese, three heuristics based on part-of-speech and conjugation-type improve the performance. The heuristics state that the positions before and after particular parts of speech with particular conjugation types must or must not be splitting positions.</Paragraph> </Section> <Section position="5" start_page="3" end_page="15" type="metho"> <SectionTitle> 4 Applying Split-and-Translate to MT </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> Systems </SectionTitle> <Paragraph position="0"> We apply the two split-and-translate methods to an MT system, D. Applying method-N to an MT system is straightforward. 
When applying method-T, we use the confidence factor of the MT system for criterion-3, treating it as an optional criterion.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.1 D </SectionTitle> <Paragraph position="0"> D (Sumita, 2001) is an example-based MT (EBMT) system whose language resources are [i] a bilingual corpus, in which sentences are aligned beforehand; [ii] a bilingual dictionary, which is used for word alignment and for generating target sentences; and [iii] thesauri of both languages, which are used for aiding word alignment and for incorporating the semantic distance between words into the word sequence distance.</Paragraph> <Paragraph position="1"> D retrieves the example whose source sentence is most similar to the input from a bilingual corpus. For this purpose, DP-matching is used, which gives the distance between word sequences, dist, as well as the matched portions between the input and the example. dist is calculated according to equation [2]. The counts of insertion (I), deletion (D), and substitution operations are summed. Then, this total is normalized by the sum of the lengths of the source and example sequences. Each substitution is counted as the semantic distance between the two substituted words, SEMDIST, which is defined using a thesaurus and ranges from 0 to 1.</Paragraph> <Paragraph position="2"> D's result is quite good when a similar example is retrieved, but very bad otherwise. Therefore, we usually set a threshold. If there is no example whose dist is within the given threshold, we must give up performing translation.</Paragraph> <Paragraph position="3"> In an experiment using the Basic Travel Expression Corpus (BTEC, described as BE-corpus in Takezawa, 2002), D's translation quality is very high. The experiment also shows a clear correlation between dist and the quality of translation. In other words, the accuracy decreases as dist increases. 
In particular, the longer the input sentences are, the more difficult it is for D to find examples with a small dist.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.2 Applying Method-T to D </SectionTitle> <Paragraph position="0"> As there is a correlation between dist and the translation quality, we can make use of dist as a confidence factor.</Paragraph> <Paragraph position="1"> To compute the combined-reliability, each partial translation is weighted by the number of words in its source portion. That is, for each partial translation, its dist is multiplied by the length of its source portion, and the resulting values are summed.</Paragraph> <Paragraph position="2"> in Japanese-to-English translation. We used a Japanese-and-English bilingual corpus, BTEC, as the training set for D and the Japanese part of BTEC as the training set for the pre-process-splitting method of method-N. BTEC is a collection of Japanese sentences and their English translations usually found in phrase-books for foreign tourists. The statistics of the corpus is shown in , the threshold for dist is 1/3.</Paragraph> <Paragraph position="3"> For the pre-process-splitting method of method-N, the following combinations of parameters were used: 1) whether the heuristics for Japanese are used or not; 2) the threshold of splitting plausibility. The best results were selected from among the combinations in subsections 5.3 and 5.5.</Paragraph> <Paragraph position="4"> The target is Japanese-to-English translation in this experiment. We extracted a test set from the Bilingual Travel Conversation Corpus of Spoken Language (TC-corpus, Takezawa, 2002). All of the contents of TC-corpus are transcriptions of spoken dialogues between Japanese and English speakers through human interpreters. The test set of this experiment consists of 330 Japanese sentences from TC-corpus, excluding sentences spoken by the interpreters. The average length of the sentences in the test set is 11.4 words. 
Therefore, the test sentences used in this experiment are much longer than the sentences in the training set, BTEC.</Paragraph> <Paragraph position="5"> In this experiment, each translation result is graded into one of four ranks (described below) by a bilingual human translator who is a native speaker of the target language, American English: (A) Perfect: no problem in either information or grammar; (B) Fair: easy to understand, with some unimportant information missing or flawed grammar; (C) Acceptable: broken but understandable with effort; (D) Nonsense: important information has been translated incorrectly (Sumita, 1999).</Paragraph> <Paragraph position="6"> In addition to the four ranks, we use FAIL, or F, to indicate that there is no output sentence.</Paragraph> </Section> <Section position="4" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.2 Translation without Splitting </SectionTitle> <Paragraph position="0"> The test set was first translated by D without splitting. The coverage of the output is limited: D cannot yield results for 127 of the 330 sentences. The average length of these 127 sentences is 15.6 words. Afterward, we used these 127 sentences as the test set for the split-and-translate methods.</Paragraph> </Section> <Section position="5" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.3 Pre-Process-Splitting Quality </SectionTitle> <Paragraph position="0"> Before evaluating the translation quality of the split-and-translate methods, we measured the quality of the pre-process-splitting method of method-N on the 127 sentences. The positions where periods were manually inserted were regarded as the correct splitting positions. In the manual splitting process, periods were placed at positions considered both grammatically and semantically adequate. There were 60 splitting positions, and 79 sentences, accounting for 62% of the 127 sentences, had no splitting position. 
Table 2 shows how many sentences have each number of splitting positions.</Paragraph> <Paragraph position="1"> Table 2. Number of splitting positions in a sentence vs. number of sentences: 0 positions, 79 sentences; 1 position, 37 sentences; 2 positions, 10 sentences; 3 positions, 1 sentence. The evaluation measure is based on how closely the result of the method corresponds to the correct solution, that is, on recall and precision. The result was good: the method inferred 65 positions in total, of which 55 are correct and 10 are incorrect, that is, recall is 91.7% (55/60) and precision is 84.6% (55/65).</Paragraph> <Paragraph position="2"> We also conducted an experiment on method-T used only as a sentence splitting method, taking the partial-translation boundaries as splitting positions. The result was bad: the method inferred 277 positions in total, of which 28 are correct and 249 are incorrect, that is, recall is 46.7% (28/60) and precision is 10.1% (28/277). Although method-T prefers a smaller number of splittings, it over-splits when most of the translations of long portions fail.</Paragraph> <Paragraph position="3"> The results show that the performance of method-N is much better than that of method-T when the goal is only to split sentences.</Paragraph> </Section> <Section position="6" start_page="3" end_page="15" type="sub_section"> <SectionTitle> 5.4 Translation Quality of Method-T </SectionTitle> <Paragraph position="0"> Applying method-T to D, we translated the 127 sentences. Table 3 shows the results: the number of sentences at each evaluation rank and the cumulative rate for each rank together with all better ranks. As shown in the table, the rate of output is 100%, and the rate of success, which means that the rank is A, B or C, is 42.5%.</Paragraph> <Paragraph position="1"> There are correlations between quality ranks and fault-length or partial-translation-count. 
When the ratio of the fault-length to the entire input length is greater than 40% or the partial-translation-count is greater than 4, no result is successful.</Paragraph> <Paragraph position="2"> Furthermore, we can observe a correlation between success rate and dist-in-splitting in Figure 1. dist-in-splitting is defined by equation [4], an extension of dist, and ranges from 0 to 1. These correlations can give us a confidence factor on split-and-translate results. For method-N, the condition that is good for sentence splitting quality is not good for split-and-translate quality. Under the parameter setting that gave the recall of 91.7% and the precision of 84.6%, the rate of output was 41.7% and that of success 6.3%. According to the correct splitting solution, among the 127 sentences that D fails to translate without splitting, 79 sentences have no splitting position. Therefore, a splitting that is good in terms of recall and precision yields a low rate of output and a low rate of success. Put simply, when the threshold is smaller, precision is worse, but the rate of output and that of success are larger. However, the rates are much lower than those of method-T's results.</Paragraph> </Section> <Section position="7" start_page="15" end_page="15" type="sub_section"> <SectionTitle> 5.6 Summary of Experiments </SectionTitle> <Paragraph position="0"> Table 5 shows the summary of experiments. Though method-N is better in sentence splitting quality, method-T is better in split-and-translate quality.</Paragraph> </Section> </Section> <Section position="6" start_page="15" end_page="15" type="metho"> <SectionTitle> 6 Concluding Remarks </SectionTitle> <Paragraph position="0"> We have proposed a split-and-translate method and shown its effect through experiments. However, much more work remains to be accomplished.</Paragraph> <Paragraph position="1"> To Improve Accuracy: The proposed method is based on three criteria. 
Although we have shown one combination of the criteria, there may be better combinations. Another possibility might be to integrate our method with a pre-process-splitting method, for example, by giving higher priority to the splitting positions that the latter method suggests, which can also be used to improve the efficiency discussed below.</Paragraph> <Paragraph position="2"> For Efficiency: Let N be the length of an input sentence. A naive implementation must search for the solution among 2^(N-1) combinations, while trying (N+1)N/2 kinds of partial translations. However, there are several ways to optimize the algorithm. For example, it can be regarded as a shortest path problem, where each portion is an arc and portions without translations have high costs. There are effective algorithms for the shortest path problem. In addition, when the quality of translation correlates with fault-length, partial-translation-count, and dist-in-splitting, as observed in subsection 5.4, candidates can be pruned by placing constraints on these factors.</Paragraph> </Section> </Paper>
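The shortest-path view mentioned under "For Efficiency" can be illustrated with a small dynamic-programming sketch. This is only a sketch under stated assumptions, not the implementation used in the experiments: the `Translator` interface, the function name `best_split`, and the toy `dist` values are hypothetical. Each portion is an arc whose cost is the triple (fault-length, partial-translation-count, length-weighted dist), and triples are compared lexicographically to mirror the priority of the three criteria. Because dist is a distance, the third component is treated here as a cost, where lower is better.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical MT interface (an assumption, not from the paper): returns
# (translation, dist) for a word sequence, or None when translation fails.
Translator = Callable[[List[str]], Optional[Tuple[str, float]]]

# (fault-length, partial-translation-count, length-weighted dist)
Cost = Tuple[int, int, float]


def best_split(words: List[str], translate: Translator):
    """Return (cost, portions) with the lexicographically smallest cost."""
    n = len(words)
    # best[j] = (cost, start index i of the last portion, its translation)
    best: List[Optional[Tuple[Cost, int, Optional[str]]]] = [None] * (n + 1)
    best[0] = ((0, 0, 0.0), 0, None)
    for j in range(1, n + 1):
        for i in range(j):  # every portion words[i:j]: N(N+1)/2 in total
            result = translate(words[i:j])
            if result is None:
                # Untranslated words only add to the fault-length.
                arc: Cost = (j - i, 0, 0.0)
                text = None
            else:
                text, dist = result
                # One more partial translation; dist weighted by length.
                arc = (0, 1, dist * (j - i))
            prev = best[i][0]
            cost = (prev[0] + arc[0], prev[1] + arc[1], prev[2] + arc[2])
            if best[j] is None or cost < best[j][0]:
                best[j] = (cost, i, text)
    # Follow the back-pointers to recover the portions in source order.
    portions = []
    j = n
    while j > 0:
        _, i, text = best[j]
        portions.append((" ".join(words[i:j]), text))
        j = i
    portions.reverse()
    return best[n][0], portions


# Toy example table: translations taken from example (2-j) above,
# dist values made up purely for illustration.
TOY = {
    "hai wakari mashi ta": ("yes i see", 0.0),
    "sore to ne choushoku na n desu kedomo": ("and i breakfast", 0.25),
    "dou nat teru n deshou ka": ("what happened to it", 0.125),
}


def toy_translate(portion: List[str]) -> Optional[Tuple[str, float]]:
    return TOY.get(" ".join(portion))


if __name__ == "__main__":
    sentence = ("hai wakari mashi ta sore to ne choushoku na n desu kedomo "
                "dou nat teru n deshou ka").split()
    print(best_split(sentence, toy_translate))
```

The sketch still tries all (N+1)N/2 portions, but it avoids enumerating the 2^(N-1) splittings, because adding the same arc cost to two prefixes preserves their lexicographic order. Pruning by constraints on fault-length or partial-translation-count, as suggested above, could be added by skipping arcs whose accumulated cost exceeds a chosen bound.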