File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/p98-1070_metho.xml
Size: 23,051 bytes
Last Modified: 2025-10-06 14:14:54
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1070"> <Title>Splitting Long or Ill-formed Input for Robust Spoken-language Translation</Title> <Section position="4" start_page="421" end_page="422" type="metho"> <SectionTitle> 2 Translation strategy of TDMT </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="421" end_page="421" type="sub_section"> <SectionTitle> 2.1 Transfer knowledge </SectionTitle> <Paragraph position="0"> TDMT produces a translation result by mimicking the example judged most semantically similar to the input string, based on the idea of Example-Based MT. Since it is difficult to store enough example sentences to translate every input, TDMT translates by combining examples of partial expressions, which are represented by transfer knowledge patterns. Transfer knowledge in TDMT is compiled from translation examples. The following EJ transfer knowledge expression indicates that the English pattern &quot;X at Y&quot; corresponds to several possible Japanese expressions:</Paragraph> <Paragraph position="2"> The first possible translation pattern is &quot;Y' de X'&quot;, with example set ((present, conference)..).</Paragraph> <Paragraph position="3"> We will see that this pattern is likely to be selected to the extent that the input variable bindings are semantically similar to the sample bindings, where X =&quot;present&quot; and Y =&quot;conference&quot;. X' is the transfer result of X.</Paragraph> <Paragraph position="4"> The source expression of the transfer knowledge is expressed by a constituent boundary pattern, which is defined as a sequence that consists of variables and symbols representing constituent boundaries (Furuse, 1994). A variable corresponds to some linguistic constituent.</Paragraph> <Paragraph position="5"> A constituent boundary is expressed by either a functional word or a part-of-speech bigram marker. 
When no functional surface word divides the expression into two constituents, a part-of-speech bigram is employed as a boundary marker, formed by hyphenating the part of speech of the left constituent's last word and that of the right constituent's first word.</Paragraph> <Paragraph position="6"> For instance, the expression &quot;go to Kyoto&quot; is divided into two constituents, &quot;go&quot; and &quot;Kyoto&quot;. The preposition &quot;to&quot; can be identified as a constituent boundary. Therefore, in parsing &quot;go to Kyoto&quot;, we use the pattern &quot;X to Y&quot;.</Paragraph> <Paragraph position="7"> The expression &quot;I go&quot; can be divided into two constituents, &quot;I&quot; and &quot;go&quot;, which are a pronoun and a verb, respectively. Since there is no functional surface word between the two constituents, pronoun-verb can be inserted as a boundary marker into &quot;I go&quot;, giving &quot;I pronoun-verb go&quot;, which now matches the general transfer knowledge pattern &quot;X pronoun-verb Y&quot;.</Paragraph> </Section> <Section position="2" start_page="421" end_page="422" type="sub_section"> <SectionTitle> 2.2 Left-to-right parsing </SectionTitle> <Paragraph position="0"> In TDMT, possible source language structures are derived by applying the constituent boundary patterns of transfer knowledge source parts to an input string in a left-to-right fashion (Furuse, 1996), based on a chart parsing method.</Paragraph> <Paragraph position="1"> An input string is parsed by combining active and passive arcs while shifting the processed string from left to right. 
In order to limit the combinations of patterns during pattern application, each pattern is assigned its linguistic level, and for each linguistic level, we specify the linguistic sublevels permitted to be used in the assigned variables.</Paragraph> <Paragraph position="3"> sive arc and each active arc in &quot;I go to Kyoto&quot;.</Paragraph> <Paragraph position="4"> A processed string is indicated by &quot;~&quot;. A passive arc is created from a content word, as shown in (a), or from a combination of patterns for which all of the variables are instantiated, as in (c), (e), and (f). An active arc, which corresponds to an incomplete substructure, is created from a combination of patterns some of which have uninstantiated variables as right-hand neighbors to the processed string, as in (b) and (d).</Paragraph> <Paragraph position="5"> If the processed string creates a passive arc for a substring and the passive arc satisfies the left-most part of an uninstantiated variable in the pattern of active arcs for the left-neighboring substring, the variable is instantiated with the passive arc. Suppose that the processed string is &quot;Kyoto&quot; in &quot;I go to Kyoto&quot;. The passive arc (e) is created, and it instantiates Y of the active arc (b). Thus, by combining (b) and (e), the structure of &quot;I go to Kyoto&quot; is composed as in (f). If a passive arc is generated in such an operation, the creation of a new arc by variable instantiation is repeated. If a new arc can no longer be created, the processed string is shifted to the right-neighboring string. 
If the whole input string can be covered with a passive arc, the parsing succeeds.</Paragraph> </Section> <Section position="3" start_page="422" end_page="422" type="sub_section"> <SectionTitle> 2.3 Disambiguation </SectionTitle> <Paragraph position="0"> The left-to-right parsing determines the best structure and best transferred result locally by performing structural disambiguation using semantic distance calculations, in parallel with the derivation of possible structures (Furuse, 1996). The best structure is determined when a relative passive arc is created. Only the best substructure is retained and combined with other arcs. The best structure is selected by computing, for every possible combination, the total of the partial semantic distance values. The structure with the smallest total distance is chosen as the best structure.</Paragraph> <Paragraph position="1"> The semantic distance is calculated according to the relationship of the positions of the words' semantic attributes in the thesaurus (Sumita, 1992).</Paragraph> </Section> </Section> <Section position="5" start_page="422" end_page="426" type="metho"> <SectionTitle> 3 Splitting strategy </SectionTitle> <Paragraph position="0"> If long or ill-formed input is parsed only by applying stored patterns, the parsing often fails and generates no results.</Paragraph> <Paragraph position="1"> Our strategy for parsing such input is to split the input into units, each of which can be parsed and translated. The strategy is explained as items (A)-(F) in this section.</Paragraph> <Section position="1" start_page="422" end_page="422" type="sub_section"> <SectionTitle> 3.1 Concatenation of neighboring substructures </SectionTitle> <Paragraph position="0"> The splitting is performed during left-to-right parsing as follows: (A) Neighboring passive arcs can be concatenated to create a larger passive arc.</Paragraph> <Paragraph position="1"> (B) A passive arc which concatenates neighboring passive 
arcs can be further concatenated with the right-neighboring passive arc.</Paragraph> <Paragraph position="2"> These items enable two neighboring substructures to compose a structure even if there is no stored pattern which combines them. Figure 2 shows structure composition from neighboring substructures based on these items. α, β, and γ are structures of neighboring substrings. The triangles express substructures composed only from stored patterns. The boxes express substructures produced by concatenating neighboring substructures. δ is composed from its neighboring substructures, i.e., α and β. In addition, ε is composed from its neighboring substruc-</Paragraph> </Section> <Section position="2" start_page="422" end_page="423" type="sub_section"> <SectionTitle> 3.2 Splitting input into well-formed parts and ill-formed parts </SectionTitle> <Paragraph position="0"> Item (C) splits input into well-formed parts and ill-formed parts, and enables parsing in cases where the input is ill-formed or the translation rules are insufficient. The well-formed parts either match stored patterns or consist of one content word. 
The ill-formed parts, which consist of one functional word or one part-of-speech bigram marker, are split from the well-formed parts.</Paragraph> <Paragraph position="1"> (C) In addition to content words, boundary markers, namely, any functional words and inserted part-of-speech bigram markers, also create a passive arc and compose a substructure.</Paragraph> <Paragraph position="2"> (2) &quot;They also have tennis courts too plus a disco&quot; (3) &quot;Four please two children two adults&quot; Suppose that the substrings of utterance (2), &quot;they also have tennis courts too&quot; and &quot;a disco&quot;, can each create a passive arc, and that the system has not yet learned a pattern to which the preposition &quot;plus&quot; is relevant, such as &quot;X plus Y&quot; or &quot;plus X&quot;.</Paragraph> <Paragraph position="3"> Also, suppose that the substrings of utterance (3), &quot;four please&quot; and &quot;two children two adults&quot;, can each create a passive arc, that the part-of-speech bigram marker &quot;adverb-numeral&quot; is inserted between these substrings, and that the system does not know a pattern &quot;X adverb-numeral Y&quot; to combine a sentence for X and a noun phrase for Y.</Paragraph> <Paragraph position="4"> By item (C), utterances (2) and (3) can be parsed in these situations as shown in Figure 4.</Paragraph> </Section> <Section position="3" start_page="423" end_page="423" type="sub_section"> <SectionTitle> 3.3 Structure preference </SectionTitle> <Paragraph position="0"> Although the splitting strategy improves the robustness of the parsing, heavy dependence on the splitting strategy should be avoided. Since a global structure has more syntactic and semantic relations than a set of fragmental expressions, the translation of a global expression generally tends to be better than the translation of a set of fragmental expressions. 
Accordingly, the splitting strategy should be used as a backup function.</Paragraph> <Paragraph position="1"> Figure 5 shows three possible structures for &quot;go to Kyoto&quot;. (a) is a structure relevant to pattern &quot;X to Y&quot; at the verb phrase level. In (b), the input string is split into two substrings, &quot;go&quot; and &quot;to Kyoto&quot;. In (c), the input string is split into three substrings, &quot;go&quot;, &quot;to&quot;, and &quot;Kyoto&quot;. The number shown at the vertex of a triangle is the sum of distance values for that structure.</Paragraph> <Paragraph position="2"> Among these three, (a), which does not use splitting, is the best structure. Item (D) is stipulated to give low priority to structures including split substructures.</Paragraph> <Paragraph position="3"> (D) When a structure is composed by splitting, a large distance value is assigned.</Paragraph> <Paragraph position="4"> In the TDMT system, the distance value in each variable varies from 0 to 1. We experimentally assigned the distance value of 5.00 to one application of splitting, and 0.00 to a structure including only one word or one part-of-speech bigram marker (see footnote 2). Suppose that the substructures in Figure 5 are assigned the following distance values. The total distance value of (a) is 0.33. Splitting is applied once in (b) and twice in (c). Therefore, the total distance value of (b) is 0.00 + 0.33 + 5.00 × 1 = 5.33, and that of (c) is 0.00 + 0.00 + 0.00 + 5.00 × 2 = 10.00. 
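The totals above can be reproduced with a short sketch. It is illustrative only: Python is used for convenience (the TDMT system itself is written in LISP), and the names SPLIT_PENALTY, total_distance, and candidates are hypothetical, not part of TDMT. It assumes, as item (D) states, that each application of splitting adds 5.00 to the sum of the partial distance values.

```python
# Illustrative sketch only; all names here are hypothetical, not TDMT's API.
SPLIT_PENALTY = 5.00  # distance assigned to one application of splitting (item D)


def total_distance(part_distances, splits):
    """Sum the partial distance values and add the penalty once per split."""
    return round(sum(part_distances) + SPLIT_PENALTY * splits, 2)


# Candidate structures for "go to Kyoto" (Figure 5), using the values in the text.
candidates = {
    "a": total_distance([0.33], 0),              # one pattern, no splitting
    "b": total_distance([0.00, 0.33], 1),        # "go" | "to Kyoto"
    "c": total_distance([0.00, 0.00, 0.00], 2),  # "go" | "to" | "Kyoto"
}

# The structure with the smallest total distance is preferred.
best = min(candidates, key=candidates.get)
print(candidates, best)
```
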
(a) is selected as the best structure because it gives the smallest total distance value.</Paragraph> </Section> <Section position="4" start_page="423" end_page="424" type="sub_section"> <SectionTitle> 3.4 Translation output </SectionTitle> <Paragraph position="0"> The results gained from a structure corresponding to a passive arc can be transferred, and a partial translation result can then be generated.</Paragraph> <Paragraph position="1"> The translation result of a split structure is formed as follows: (E) The complete translation result is formed by concatenating the partial translation results of each split unit.</Paragraph> <Paragraph position="2"> A punctuation mark such as &quot;,&quot; can be inserted between partial translation results to make the complete translation result clear, although we cannot expect punctuation in an input utterance. The EJ translation result of utterance (1) is as follows: &quot;certainly sir | for how many people please&quot; is translated as &quot;hai, nan-nin desuka&quot;. Strings such as functional words and part-of-speech bigram markers have no target expression, and are transferred as follows: (Footnote 2: These values were tentatively assigned by comparing the splitting performance for several values, and are valid only for the present TDMT system.)</Paragraph> <Paragraph position="3"> pression, is transferred to a string as &quot;..&quot;, which means an incomprehensible part.</Paragraph> <Paragraph position="4"> The EJ translation results of utterances (2) and (3) are as follows. &quot;|&quot; denotes a splitting position.</Paragraph> <Paragraph position="5"> &quot;they also have tennis courts too | plus | a disco&quot; is translated as &quot;douyouni tenisu-kooto ga mata ari-masu, .., disuko&quot;. Utterance (3) is split as &quot;four please | adverb-numeral | two children two adults&quot;. The splitting strategy based on items (A)-(F) in Section 3 can be introduced to frameworks such as TDMT, which utilize left-to-right parsing and a score for a substructure. 
We discuss the effect of splitting by showing experimental results of the TDMT system's JE, EJ, Japanese-to-Korean (JK), and Korean-to-Japanese (KJ) translations (see footnote 3). The TDMT system, whose domain is travel conversations, can currently handle multilingual translation. The present vocabulary size is about 13,000 words in JE and JK, about 7,000 words in EJ, and about 4,000 words in KJ. The number of training sentences is about 2,900 in JE and EJ, about 1,400 in JK, and about 600 in KJ.</Paragraph> </Section> <Section position="5" start_page="424" end_page="424" type="sub_section"> <SectionTitle> 4.1 Null-output elimination </SectionTitle> <Paragraph position="0"> It is crucial for a machine translation system to output some result even when the input is ill-formed or the translation rules are insufficient.</Paragraph> <Paragraph position="1"> Items (C) and (D) in Section 3 split input into well-formed parts and ill-formed parts so that well-formed parts can cover the input as widely as possible. (Footnote 3: In the experimental results referred to later in this section, the input does not consist of strings but of correct morpheme sequences. This enables us to focus on the evaluation of our splitting method by excluding cases where the morphological analysis fails.)</Paragraph> <Paragraph position="2"> Since a content word and a pattern can be assigned some transferred results, some translation result can be produced if the input has at least one well-formed part.</Paragraph> <Paragraph position="3"> Table 1 shows how the splitting improves the translation performance of TDMT. More than 1,000 sentences, all new data for the system, were tested in each kind of translation. With splitting, there was no null output: the output rate was 100% in every kind of translation. So, by using the splitting method, TDMT can eliminate null output unless the morphological analysis gives no result or the input includes no content word. 
The splitting also improves the parsing success rate and the understandability of the output in every kind of translation.</Paragraph> <Paragraph position="4"> The output rates of the JK and KJ translations were low without splitting because the number of sample sentences is smaller than that for the JE and EJ translations. However, the splitting compensated for the shortage of sample sentences and raised the output rate to 100%.</Paragraph> <Paragraph position="5"> Since Japanese and Korean are linguistically close, the splitting method increases the understandable results for JK and KJ translations more than for JE and EJ translations.</Paragraph> </Section> <Section position="6" start_page="424" end_page="425" type="sub_section"> <SectionTitle> 4.2 Utterance splitting into sentences </SectionTitle> <Paragraph position="0"> In order to gain a good translation result for an utterance including more than one sentence, the utterance should be split into proper sentences. The distance calculation mechanism aims to split an utterance into sentences correctly. (4) &quot;Yes that will be fine at five o'clock we will remove the bed&quot; For instance, splitting is necessary to translate utterance (4), which includes more than one sentence. The candidates for (4)'s structure are shown in Figure 6. The total distance value of (a) is 0.00 + 1.11 + 5.00 × 1 = 6.11, that of (b) is 0.00 + 0.00 + 1.11 + 5.00 × 2 = 11.11, and that of (c) is 0.83 + 0.00 + 0.42 + 5.00 × 2 = 11.25. As (a) has the smallest total distance, it is chosen as the best structure, and this agrees with our intuition.</Paragraph> <Paragraph position="1"> We have checked the accuracy of utterance splitting by using 277 Japanese utterances and 368 English utterances, all of which included more than one sentence. Table 2 shows the success rates for splitting the utterances into sentences. 
Although TDMT can also use the pattern &quot;X boundary Y&quot;, in which X and Y are at the sentence level, to split the utterances, the proposed splitting method increases the success rates for splitting the utterances in both languages.</Paragraph> </Section> <Section position="7" start_page="425" end_page="425" type="sub_section"> <SectionTitle> 4.3 Translation after speech recognition </SectionTitle> <Paragraph position="0"> Speech recognition sometimes produces inaccurate results from an actual utterance, and the erroneous parts often yield ill-formed translation inputs. However, our splitting method can also produce some translation results from such misrecognized inputs and improve the understandability of the resulting speech translation.</Paragraph> <Paragraph position="1"> Table 3 shows an example of a JE translation of a recognition result including a substitution error. The underlined words are misrecognized parts. &quot;youi (preparation)&quot; in the utterance is replaced with &quot;yori (postposition)&quot;.</Paragraph> <Paragraph position="2"> Table 4 shows an example of a JE translation of a recognition result including an insertion error. &quot;wo&quot; has been inserted into the utterance after speech recognition. The translation of the speech recognition result is the same as that of the utterance except for the addition of &quot;..&quot;; &quot;..&quot; is the translation result for &quot;wo&quot;, which is a postposition mainly marking an object.</Paragraph> <Paragraph position="3"> Table 5 shows an example of the EJ translation of a recognition result including a deletion error. &quot;'s&quot; in the utterance is deleted after speech recognition. In the translation of this result, &quot;..&quot; appears instead of &quot;wa&quot;, which is a postposition marking the topic. &quot;..&quot; is the translation for the marker &quot;pronoun-adverb&quot;, which has been inserted between &quot;that&quot; and &quot;all&quot;. 
The recognition result is split into three parts: &quot;yes that&quot;, &quot;pronoun-adverb&quot;, and &quot;all correct&quot;. Although the translations in Tables 3, 4, and 5 might be slightly degraded by the splitting, the meaning of each utterance can be communicated with these translations.</Paragraph> <Paragraph position="4"> We have tested the effect of splitting on JE speech translation using 47 erroneous recognition results of Japanese utterances. These utterances have been used as example utterances by the TDMT system. Therefore, for utterances correctly recognized, the translations of the recognition results should succeed. The erroneous recognition results were collected from an experimental base using the method of Shimizu (1996).</Paragraph> <Paragraph position="5"> Table 6 shows the numbers of sentences at each level, based on the extent to which the meaning of an utterance can be understood from the translation result. Without the splitting, only 19.1% of the erroneous recognition results are wholly or partially understandable. The splitting method increases this rate to 57.4%. Failures in spite of the splitting are mainly caused by the misrecognition of key parts such as predicates.</Paragraph> </Section> <Section position="8" start_page="425" end_page="426" type="sub_section"> <SectionTitle> 4.4 Translation time </SectionTitle> <Paragraph position="0"> Since our splitting method is performed under left-to-right parsing, translation efficiency is not a serious problem.</Paragraph> <Paragraph position="2"> [Table 5 fragment: recognition result &quot;Yes that all correct&quot;; translation &quot;Hai sore .. mattaku tadashii desu.&quot;]</Paragraph> <Paragraph position="3"> We have compared EJ translation times in the TDMT system for two cases.</Paragraph> <Paragraph position="4"> One was without the splitting method, and the other was with it. 
Table 7 shows the translation time of English sentences with an average input length of 7.1 words, and of English utterances consisting of more than one sentence with an average input length of 11.4 words. The translation times of the TDMT system, which is written in LISP, were measured on a Sparc10 workstation. The time difference between the two cases is small. This shows that the translation efficiency of TDMT is maintained even when the splitting method is introduced.</Paragraph> </Section> </Section> <Section position="6" start_page="426" end_page="426" type="metho"> <SectionTitle> 5 Concluding remarks </SectionTitle> <Paragraph position="0"> We have proposed an input-splitting method for translating spoken language that includes many long or ill-formed expressions. Experimental results have shown that the proposed method improves TDMT's performance without degrading the translation efficiency. The proposed method is applicable not only to TDMT but also to other frameworks that utilize left-to-right parsing and a score for a substructure. One important future research goal is the achievement of a simultaneous interpretation mechanism for application to a practical spoken-language translation system.</Paragraph> <Paragraph position="1"> The left-to-right mechanism should be maintained for that purpose. Our splitting method meets this requirement, and can be applied to multilingual translation because of its universal framework.</Paragraph> </Section> </Paper>