Machine Translation with Inferred Stochastic Finite-State Transducers

2. Finite-State Transducers

A finite-state transducer, $\mathcal{T}$, is a tuple $\langle \Sigma, \Delta, Q, q_0, F, \delta \rangle$, in which $\Sigma$ is a finite set of source symbols, $\Delta$ is a finite set of target symbols ($\Sigma \cap \Delta = \emptyset$), $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is a set of final states, and $\delta \subseteq Q \times \Sigma \times \Delta^* \times Q$ is a set of transitions.(1) A translation form $\phi$ in $\mathcal{T}$ for a pair $(s,t) \in \Sigma^* \times \Delta^*$ is a sequence of transitions(2)

$$\phi = (q_0, s_1, \tilde{t}_1, q_1)(q_1, s_2, \tilde{t}_2, q_2) \cdots (q_{|s|-1}, s_{|s|}, \tilde{t}_{|s|}, q_{|s|}), \quad q_{|s|} \in F, \;\; \tilde{t}_1 \tilde{t}_2 \cdots \tilde{t}_{|s|} = t \qquad (1)$$

A translation pair $(s,t)$ belongs to the translation $T(\mathcal{T})$ if there is at least one translation form in $\mathcal{T}$ associated with the pair $(s,t)$.

A rational translation is the set of all translation pairs of some finite-state transducer $\mathcal{T}$. This definition of a finite-state transducer is similar to the definition of a regular or finite-state grammar $G$. The main difference is that in a finite-state grammar, the set of target symbols $\Delta$ does not exist, and the transitions are defined on $Q \times \Sigma \times Q$. A translation form is the transducer counterpart of a derivation in a finite-state grammar, and the concept of rational translation is reminiscent of the concept of (regular) language, defined as the set of strings associated with the derivations in the grammar $G$.

Rational translations exhibit many properties similar to those shown for regular languages (Berstel 1979). One of these properties can be stated as follows (Berstel 1979):

Theorem 1
A translation $T \subseteq \Sigma^* \times \Delta^*$ is rational if and only if there exist an alphabet $\Gamma$, a regular language $L \subseteq \Gamma^*$, and two morphisms $h_\Sigma : \Gamma^* \to \Sigma^*$ and $h_\Delta : \Gamma^* \to \Delta^*$ such that $T = \{(h_\Sigma(z), h_\Delta(z)) \mid z \in L\}$.

As will be discussed later, this theorem directly suggests the transducer inference methods proposed in this article.

(1) By $\Sigma^*$ and $\Delta^*$ we denote the sets of finite-length strings on $\Sigma$ and $\Delta$, respectively.
(2) To simplify the notation, we will remove the superscript $\phi$ from the components of a translation form if no confusion is induced.

3. Statistical Translation Using Finite-State Transducers

In the statistical translation framework, the translation of a given source string $s$ is a target string(3)

$$\hat{t} = \operatorname*{argmax}_{t \in \Delta^*} \Pr(t \mid s) = \operatorname*{argmax}_{t \in \Delta^*} \Pr(s, t)$$

The joint distribution $\Pr(s,t)$ can be modeled by a stochastic finite-state transducer: a tuple $\mathcal{T} = \langle \Sigma, \Delta, Q, q_0, p, f \rangle$, in which $Q$, $\Sigma$, and $\Delta$ are as in the definition of a finite-state transducer and $p$ and $f$ are two functions, $p : Q \times \Sigma \times \Delta^* \times Q \to [0,1]$ (transition probabilities) and $f : Q \to [0,1]$ (final-state probabilities), such that for every $q \in Q$,

$$f(q) + \sum_{(s', \tilde{t}, q') \in \Sigma \times \Delta^* \times Q} p(q, s', \tilde{t}, q') = 1$$

A stochastic finite-state transducer has an underlying characteristic finite-state transducer: the set of transitions of $\mathcal{T}$ is the set of tuples $(q, s', \tilde{t}, q')$ with probabilities greater than zero, and the set of final states is the set of states with nonzero final-state probabilities. The probability of a translation pair $(s,t) \in \Sigma^* \times \Delta^*$ is the sum of the probabilities of all the translation forms of $(s,t)$ in $\mathcal{T}$:

$$\Pr_{\mathcal{T}}(s, t) = \sum_{\phi \in d(s,t)} \Pr_{\mathcal{T}}(\phi) \qquad (2)$$

where $d(s,t)$ denotes the set of translation forms of $(s,t)$ and the probability of a translation form $\phi$ (as defined in equation (1)) is

$$\Pr_{\mathcal{T}}(\phi) = f(q_{|s|}) \cdot \prod_{i=1}^{|s|} p(q_{i-1}, s_i, \tilde{t}_i, q_i) \qquad (3)$$

that is, the product of the probabilities of all the transitions involved in $\phi$, times the final-state probability of the last state reached.

We are interested only in transducers without useless states, that is, those in which for every state in $\mathcal{T}$ there is a path leading to a final state. If we further assume that $\sum_{(s,t) \in \Sigma^* \times \Delta^*} \Pr_{\mathcal{T}}(s,t) = 1$, then $\Pr_{\mathcal{T}}$ is a probability distribution on $\Sigma^* \times \Delta^*$, which will be called the stochastic translation defined by $\mathcal{T}$.(4)

(3) For the sake of simplicity, we will denote $\Pr(X = x)$ as $\Pr(x)$ and $\Pr(X = x \mid Y = y)$ as $\Pr(x \mid y)$.
(4) This concept is similar to the stochastic regular language for a stochastic regular grammar. In that case, the probability distribution is defined on the set of finite-length strings rather than on the set of pairs of strings.
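To make these definitions concrete, the following sketch shows one possible in-memory representation of a stochastic finite-state transducer together with a dynamic-programming computation of equation (2). The class and all names are illustrative assumptions, not code from the paper.

```python
from collections import defaultdict

class SFST:
    """A minimal stochastic finite-state transducer: each transition reads one
    source symbol and emits a (possibly empty) tuple of target words."""

    def __init__(self, initial):
        self.initial = initial
        self.trans = defaultdict(list)  # state -> [(src, tgt_tuple, next_state, prob)]
        self.final = {}                 # state -> final-state probability f(q)

    def add(self, q, src, tgt, q_next, prob):
        self.trans[q].append((src, tuple(tgt), q_next, prob))

    def pair_prob(self, s, t):
        """Pr_T(s, t): the sum over all translation forms of (s, t), equation (2),
        computed by dynamic programming instead of explicit path enumeration."""
        s, t = tuple(s), tuple(t)
        # chart maps (target prefix length k, state q) to the summed probability
        # of all partial translation forms reaching q after emitting t[:k]
        chart = {(0, self.initial): 1.0}
        for sym in s:
            nxt = defaultdict(float)
            for (k, q), pr in chart.items():
                for src, tgt, q2, p in self.trans[q]:
                    if src == sym and t[k:k + len(tgt)] == tgt:
                        nxt[(k + len(tgt), q2)] += pr * p
            chart = nxt
        return sum(pr * self.final.get(q, 0.0)
                   for (k, q), pr in chart.items() if k == len(t))
```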
Finally, the translation of a source string $s \in \Sigma^*$ by a stochastic finite-state transducer is defined as

$$\hat{t} = \operatorname*{argmax}_{t \in \Delta^*} \Pr_{\mathcal{T}}(t \mid s) = \operatorname*{argmax}_{t \in \Delta^*} \Pr_{\mathcal{T}}(s, t) \qquad (4)$$

A stochastic finite-state transducer also has stochastic source and target regular languages, given by the marginal distributions

$$\Pr(s) = \sum_{t \in \Delta^*} \Pr_{\mathcal{T}}(s, t), \qquad \Pr(t) = \sum_{s \in \Sigma^*} \Pr_{\mathcal{T}}(s, t)$$

In practice, these source or target regular languages are obtained by dropping the target or the source symbols, respectively, from each transition of the finite-state transducer.

The following theorem naturally extends Theorem 1 to the stochastic framework (Casacuberta, Vidal, and Picó 2004):

Theorem 2
A translation is stochastic rational if and only if there exist an alphabet $\Gamma$, a stochastic regular language $L$ on $\Gamma^*$, and two morphisms $h_\Sigma : \Gamma^* \to \Sigma^*$ and $h_\Delta : \Gamma^* \to \Delta^*$ such that, for every $(s,t) \in \Sigma^* \times \Delta^*$,

$$\Pr_{\mathcal{T}}(s, t) = \sum_{z \in \Gamma^* : \; h_\Sigma(z) = s, \; h_\Delta(z) = t} \Pr_L(z) \qquad (5)$$

3.1 Search with Stochastic Finite-State Transducers

The search for an optimal $\hat{t}$ according to equation (4) is a computationally hard problem (Casacuberta and de la Higuera 2000). In practice, an approximate solution can be obtained (Casacuberta 2000) on the basis of the following approximation to the probability of a translation pair (the Viterbi score of a translation):

$$V_{\mathcal{T}}(s, t) = \max_{\phi \in d(s,t)} \Pr_{\mathcal{T}}(\phi) \qquad (6)$$

with the corresponding approximate translation

$$\tilde{t} = \operatorname*{argmax}_{t \in \Delta^*} V_{\mathcal{T}}(s, t) \qquad (7)$$

This computation can be carried out efficiently (Casacuberta 1996) by solving the following recurrence by means of dynamic programming:

$$V(0, q) = \begin{cases} 1 & \text{if } q = q_0 \\ 0 & \text{otherwise} \end{cases}$$
$$V(i, q) = \max_{(q', s_i, \tilde{t}, q) \in \delta} V(i-1, q') \cdot p(q', s_i, \tilde{t}, q), \quad 1 \le i \le |s| \qquad (8)$$

and taking $\max_{q \in Q} V(|s|, q) \cdot f(q)$. Finally, the approximate translation $\tilde{t}$ is obtained as the concatenation of the target strings associated with the translation form $\tilde{\phi}$ corresponding to the optimal sequence of states involved in the solution to equation (8); that is, $\tilde{t} = \tilde{t}_1 \tilde{t}_2 \cdots \tilde{t}_{|s|}$.

Figure 1 shows a simple example in which Viterbi score maximization (7) leads to a suboptimal result. In that transducer, the probability $\Pr_{\mathcal{T}}$ of the pair una camera doppia / a double room is $(1.0 \cdot 0.3 \cdot 1.0) + (1.0 \cdot 0.3 \cdot 1.0) = 0.6$. This is greater than the probability $\Pr_{\mathcal{T}}$ of the pair una camera doppia / a room with two beds, $1.0 \cdot 0.4 \cdot 1.0 = 0.4$. However, the Viterbi score $V_{\mathcal{T}}$ for the first pair is $1.0 \cdot 0.3 \cdot 1.0 = 0.3$, which is lower than the Viterbi score $V_{\mathcal{T}}$ for the second pair, $1.0 \cdot 0.4 \cdot 1.0 = 0.4$. Therefore this second pair will be the approximate result given by equation (7).

The computational cost of the iterative version of this algorithm is $O(|s| \cdot |Q| \cdot B)$, where $B$ is the (average) branching factor of the finite-state transducer.
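A compact implementation of this approximate search, reusing the hypothetical SFST sketch above, could look as follows; it keeps, for each state, the best Viterbi score and the target prefix emitted so far.

```python
def viterbi_translate(fst, s):
    """Approximate translation of equation (7): maximize the Viterbi score of
    equation (6) via the dynamic-programming recurrence (8). Runs in
    O(|s| * |Q| * B) time, with B the average branching factor."""
    best = {fst.initial: (1.0, ())}       # state -> (score V(i, q), emitted target)
    for sym in s:
        nxt = {}
        for q, (score, out) in best.items():
            for src, tgt, q2, p in fst.trans[q]:
                if src != sym:
                    continue
                cand = (score * p, out + tgt)
                if q2 not in nxt or cand[0] > nxt[q2][0]:
                    nxt[q2] = cand
        best = nxt
    # close surviving paths with the final-state probabilities f(q)
    scored = [(score * fst.final.get(q, 0.0), out) for q, (score, out) in best.items()]
    top = max(scored, default=(0.0, ()))
    return top[1] if top[0] > 0.0 else None
```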
4. A Method for Inferring Finite-State Transducers

Theorems 1 and 2 establish that any (stochastic) rational translation $T$ can be obtained as a homomorphic image of a certain (stochastic) regular language $L$ over an adequate alphabet $\Gamma$. The proofs of these theorems are constructive (Berstel 1979; Casacuberta, Vidal, and Picó 2004) and are based on building a (stochastic) finite-state transducer $\mathcal{T}$ for $T$ by applying certain morphisms $h_\Sigma$ and $h_\Delta$ to the symbols of $\Gamma$ that are associated with the rules of a (stochastic) regular grammar that generates $L$. This suggests the following general technique for learning a stochastic finite-state transducer, given a finite sample $A$ of string pairs $(s,t) \in \Sigma^* \times \Delta^*$:

1. Each training pair $(s,t)$ in $A$ is transformed into a string $z$ over an extended alphabet $\Gamma$, yielding a sample $S$ of strings ($S \subset \Gamma^*$).
2. A (stochastic) regular grammar $G$ is inferred from $S$.
3. The extended symbols labeling the rules of $G$ are transformed back into source/target symbol pairs, yielding the transducer $\mathcal{T}$.

This technique, which is very similar to that proposed in García, Vidal, and Casacuberta (1987) for the inference of regular grammars, is illustrated in Figure 2. The first transformation is modeled by a labeling function $\mathcal{L} : \Sigma^* \times \Delta^* \to \Gamma^*$; the last one by an inverse labeling $\mathcal{L}^{-1}$, implemented through the morphisms $h_\Sigma$ and $h_\Delta$.

Without loss of generality, we assume that the method used in the second step consists of the inference of $n$-grams with final states (Ney, Martin, and Wessel 1997), which are particular cases of stochastic regular grammars. This simple method automatically derives, from the strings in $S$, both the structure of $G$ (i.e., the rules: states and transitions) and the associated probabilities. Since $\mathcal{L}^{-1}$ is typically the inverse of $\mathcal{L}$, the morphisms $h_\Sigma$ and $h_\Delta$ needed in the third step are determined by the definition of $\mathcal{L}$.

Figure 2. Basic scheme for the inference of finite-state transducers. $A$ is a finite sample of training pairs. $S$ is the finite sample of strings obtained from $A$ using $\mathcal{L}$. $G$ is a grammar inferred from $S$ such that $S$ is a subset of the language $L(G)$ generated by the grammar $G$. $\mathcal{T}$ is a finite-state transducer whose translation $T(\mathcal{T})$ includes the training sample $A$.

A key point in this approach is its first step, that is, how to conveniently transform a parallel corpus into a string corpus. In general, there are many possible transformations, but if the source-target correspondences are complex, the design of an adequate transformation can become difficult. As a general rule, the labeling process must capture these source-target word correspondences and must allow for a simple implementation of the inverse labeling needed in the third step.

A very preliminary, nonstochastic version of this finite-state transducer inference technique was presented in Vidal, García, and Segarra (1989). An important drawback of that early proposal was that the methods proposed for building the $\Gamma^*$ sentences from the training pairs did not adequately cope with the dependencies between the words of the source sentences and the words of the corresponding target sentences. In the following section we show how this drawback can be overcome using statistical alignments (Brown et al. 1993). The resulting methodology is called grammatical inference and alignments for transducer inference (GIATI).

A related approach was proposed in Bangalore and Riccardi (2000b). In that case, the extended symbols were also built according to previously computed alignments, but the order of target words was not preserved. As a consequence, that approach requires a postprocess to try to restore the target words to a proper order.

4.1 Statistical Alignments

The statistical translation models introduced by Brown et al. (1993) are based on the concept of alignment between source and target words (statistical alignment models). Formally, an alignment of a translation pair $(s,t) \in \Sigma^* \times \Delta^*$ is a function $a : \{1, \ldots, |t|\} \to \{0, \ldots, |s|\}$. The particular case $a(j) = 0$ means that position $j$ in $t$ is not aligned with any position in $s$. The set of all possible alignments between $t$ and $s$ is denoted by $\mathcal{A}(s,t)$, and the probability of translating a given $s$ into $t$ by an alignment $a$ is $\Pr(t, a \mid s)$. Thus, an optimal alignment between $s$ and $t$ can be computed as

$$\hat{a} = \operatorname*{argmax}_{a \in \mathcal{A}(s,t)} \Pr(t, a \mid s)$$

Different approaches for estimating $\Pr(t, a \mid s)$ were proposed in Brown et al. (1993). These approaches are known as models 1 through 5. Adequate software packages are publicly available for training these statistical models and for obtaining good alignments between pairs of sentences (Al-Onaizan et al. 1999; Och and Ney 2000). An example of Spanish-English sentence alignment is given below:

Example 1
¿Cuánto cuesta una habitación individual por semana ?
how (2) much (2) does (3) a (4) single (6) room (5) cost (3) per (7) week (8) ? (9)

Each number within parentheses in the example represents the position in the source sentence that is aligned with the (position of the) preceding target word. A graphical representation of this alignment is shown in Figure 3.

Figure 3. Graphical representation of the alignment between a source (Spanish) sentence (¿Cuánto cuesta una habitación individual por semana ?) and a target (English) sentence (How much does a single room cost per week ?). Note the correspondence between the Spanish cuesta and the English does and cost. Note also that the model does not allow for alignments between sets of two or more source words and one target word.
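For illustration, the bracketed notation of example 1 can be parsed into an explicit alignment with a few lines of code (a hypothetical helper, not part of the paper's tooling):

```python
import re

def parse_aligned(target_line):
    """Parse 'word (pos)' notation into the target words and the alignment a,
    represented as a list with a[j-1] = aligned source position (1-based)."""
    words, align = [], []
    for word, pos in re.findall(r"(\S+)\s+\((\d+)\)", target_line):
        words.append(word)
        align.append(int(pos))
    return words, align

words, a = parse_aligned(
    "how (2) much (2) does (3) a (4) single (6) "
    "room (5) cost (3) per (7) week (8) ? (9)")
# a[4] == 6: the fifth target word, 'single', is aligned with the sixth
# source word, 'individual'
```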
4.2 First Step of the GIATI Methodology: Transformation of Training Pairs into Strings

The first step of the proposed method consists of a labeling process ($\mathcal{L}$) that builds a string of certain extended symbols from each training string pair and its corresponding statistical alignment. The main idea is to assign each word from $t$ to the corresponding word from $s$ given by the alignment $a$. Sometimes, however, this assignment would violate the sequential order of the words in $t$. To illustrate the GIATI methodology we will use example 2, a small Italian-English sample:

Example 2
una camera doppia / a double room
una camera / a room
la camera singola / the single room
la camera / the room

In the first pair of this example, the English word double could be assigned to the third Italian word (doppia) and the English word room to the second Italian word (camera). This would imply a "reordering" of the words double and room, which is not appropriate in our finite-state framework.

Given $s$, $t$, and $a$ (source and target strings and associated alignment, respectively), the proposed transformation $z = \mathcal{L}(s, t, a)$ is defined as follows: each word from $t$ is joined with the corresponding word from $s$ given by the alignment $a$ if the target word order is not violated. Otherwise, the target word is joined with the first source word that does not violate the target word order.

The application of $\mathcal{L}$ to example 2 generates the following strings of extended symbols:

(una , a) (camera , λ) (doppia , double room)
(una , a) (camera , room)
(la , the) (camera , λ) (singola , single room)
(la , the) (camera , room)
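The assignment rule just described can be sketched as follows; the helper name, the tuple encoding of extended symbols, and the alignment used in the usage example are illustrative assumptions:

```python
def giati_label(source, target, align):
    """First GIATI step: build the extended-symbol string z = L(s, t, a).
    align[j] is the 1-based source position aligned with target word j+1
    (0 = unaligned). Each target word is attached to its aligned source word
    unless that would reorder the target; in that case it is attached to the
    first source word that preserves the target word order."""
    attached = [[] for _ in source]   # target words attached to each source position
    frontier = 1                      # leftmost source position keeping t in order
    for j, word in enumerate(target):
        pos = max(align[j], frontier) # align[j] < frontier would reorder t
        attached[pos - 1].append(word)
        frontier = pos
    return [(w, tuple(ts)) for w, ts in zip(source, attached)]

# First pair of example 2, assuming the alignment a = (1, 3, 2)
# (a -> una, double -> doppia, room -> camera):
z = giati_label("una camera doppia".split(), "a double room".split(), [1, 3, 2])
# z == [('una', ('a',)), ('camera', ()), ('doppia', ('double', 'room'))]
```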
As a more complicated example, the application of this transformation to example 1 generates the following string:

(¿ , λ) (Cuánto , how much) (cuesta , does) (una , a) (habitación , λ) (individual , single room cost) (por , per) (semana , week) (? , ?)

In this case the unaligned token ¿ has an associated empty target string, and the target word cost, which is aligned with the source word cuesta, is associated with the nearby source word individual. This avoids a "reordering" of the target string and entails an (apparently) lower degree of nonmonotonicity. This is achieved, however, at the expense of letting the method generalize from word associations that can be considered improper from a linguistic point of view (e.g., (cuesta , does), (individual , single room cost)). While this would certainly be problematic for general language translation, it proves not to be so harmful when the sentences to be translated come from limited-domain languages.

Obviously, other transformations are possible. For example, after the application of the above procedure, successive isolated source words (without any target word) can be joined to the first following extended word that has target word(s) assigned. Let $z = \mathcal{L}(s, t, a)$ be a transformed string obtained from the above procedure; applying this additional joining step to $z$ defines a second transformation, $z' = \mathcal{L}'(s, t, a)$. The application of $\mathcal{L}'$ to example 2 leads to

(una , a) (camera doppia , double room)
(una , a) (camera , room)
(la , the) (camera singola , single room)
(la , the) (camera , room)

Although many other, more sophisticated transformations can be defined following the above ideas, only the simple $\mathcal{L}$ will be used in the experiments reported in this article.

4.3 Second Step of the GIATI Methodology: Inferring a Stochastic Regular Grammar from a Set of Strings

Many grammatical inference techniques are available to implement the second step of the proposed procedure. In this work, (smoothed) $n$-grams are used. These models have proven quite successful in areas such as language modeling (Clarkson and Rosenfeld 1997; Ney, Martin, and Wessel 1997).

Figures 4 and 5 show the (nonsmoothed) bigram models inferred from the samples $\mathcal{L}(A)$ and $\mathcal{L}'(A)$, respectively, of example 2. Note that the generalization achieved by the first model is greater than that of the second. The probabilities of the $n$-grams are computed from the corresponding counts in the training set of extended strings. The probability of an extended word $z_i$ given the sequence of extended words $z_{i-n+1}, \ldots, z_{i-1}$ is estimated as

$$p(z_i \mid z_{i-n+1} \cdots z_{i-1}) = \frac{c(z_{i-n+1} \cdots z_i)}{c(z_{i-n+1} \cdots z_{i-1})}$$

where $c(\cdot)$ is the number of times an event occurs in the training set. To deal with unseen $n$-grams, the back-off smoothing technique from the CMU Statistical Language Modeling (SLM) Toolkit (Rosenfeld 1995) has been used.

Figure 4. Bigram model inferred from the sample $\mathcal{L}(A)$ of example 2.
Figure 5. Bigram model inferred from the sample $\mathcal{L}'(A)$ of example 2.
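An unsmoothed relative-frequency bigram estimator over extended symbols, matching the count ratio above, could be sketched as follows (the paper instead applies the SLM Toolkit's back-off smoothing):

```python
from collections import Counter

def bigram_model(extended_corpus):
    """Estimate p(z_i | z_{i-1}) = c(z_{i-1} z_i) / c(z_{i-1}) from a corpus of
    extended-symbol strings. '<s>' and '</s>' are assumed begin/end markers;
    the end marker plays the role of the model's final states."""
    context, joint = Counter(), Counter()
    for z in extended_corpus:             # z as produced by giati_label above
        seq = ["<s>"] + list(z) + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            context[prev] += 1
            joint[(prev, cur)] += 1
    return {(prev, cur): c / context[prev] for (prev, cur), c in joint.items()}
```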
The (smoothed) $n$-gram model obtained from the set of extended strings is represented as a stochastic finite-state automaton (Llorens, Vilar, and Casacuberta 2002). The states of the automaton are the observed $(n-1)$-grams. For each $n$-gram $(z_{i-n+1}, \ldots, z_i)$ observed in training, there is a transition from the state $(z_{i-n+1}, \ldots, z_{i-1})$ to the state $(z_{i-n+2}, \ldots, z_i)$, labeled with $z_i$ and carrying the probability $p(z_i \mid z_{i-n+1} \cdots z_{i-1})$. The back-off smoothing method supplied by the SLM Toolkit is represented by the states corresponding to $k$-grams ($k < n$) and by special transitions between $k$-gram states and $(k-1)$-gram states (Llorens, Vilar, and Casacuberta 2002). The final-state probability is computed as the probability of a transition with an end-of-sentence mark.

4.4 Third Step of the GIATI Methodology: Transforming a Stochastic Regular Grammar into a Stochastic Finite-State Transducer

In order to obtain a finite-state transducer from a grammar inferred from $\mathcal{L}(A)$ (or $\mathcal{L}'(A)$), each extended symbol labeling a rule of the grammar is transformed back, through the morphisms $h_\Sigma$ and $h_\Delta$, into the source word and the (possibly empty) target word string from which it was built. This construction is illustrated in Figures 6 and 7 for the bigrams of Figures 4 and 5, respectively. Note that in the second case, the construction entails the trivial addition of a few states that did not exist in the corresponding bigram. As previously discussed, the first transformation ($\mathcal{L}$) definitely leads to a greater translation generalization than the second ($\mathcal{L}'$) (Casacuberta, Vidal, and Picó 2004). The probabilities associated with the transitions and the final states of the finite-state transducer are the same as those of the original stochastic regular grammar.

Figure 6. A finite-state transducer built from the $n$-gram of Figure 4.
Figure 7. A finite-state transducer built from the $n$-gram of Figure 5.

Since we are using $n$-grams in the second step, an automaton transition that reads an extended symbol $z = (s', \tilde{t})$ becomes a transducer transition that reads the source word $s' = h_\Sigma(z)$ and emits the target string $\tilde{t} = h_\Delta(z)$. The transitions associated with back-off are labeled with a special source symbol (not in the source vocabulary) and with an empty target string. The number of states is the overall number of $k$-grams ($k < n$) that appear in the training set of extended strings plus one (the unigram state). The number of transitions is the overall number of $k$-grams ($k \le n$) plus the number of states (back-off transitions). The actual number of these $k$-grams depends on the degree of nonmonotonicity of the original bilingual training corpus. If the corpus is completely monotone, this number is approximately the same as the number of $k$-grams in the source or target parts of the training corpus. If the corpus is not monotone, the vocabulary of extended strings becomes large, and the number of $k$-grams can be much larger than the number of training source or target $k$-grams. As a consequence, an interesting property of this type of transformation is that the source and target languages embedded in the final finite-state transducer are more constrained than the corresponding $n$-gram models obtained from either the source or the target strings, respectively, of the same training pairs (Casacuberta, Vidal, and Picó 2004).

While $n$-grams are deterministic (hence nonambiguous) models, the finite-state transducers obtained after the third-step inverse transformation ($h_\Sigma$, $h_\Delta$) are often nondeterministic and generally ambiguous; that is, there are source strings that can be parsed through more than one path. This is in fact a fundamental property, directly coming from expression (5) of Theorem 2, on which the whole GIATI approach is essentially based. As a consequence, all the search issues discussed in Section 3.1 apply to GIATI-learned transducers.
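Putting the second and third steps together for the unsmoothed bigram case, one possible sketch of the inverse transformation is given below. It reuses the hypothetical SFST, giati_label, bigram_model, and viterbi_translate helpers from the earlier sketches, and it omits the back-off transitions a smoothed model would add:

```python
def build_transducer(bigram_probs, start="<s>"):
    """Third GIATI step for an unsmoothed bigram model: every automaton
    transition labeled with an extended symbol z = (src_word, tgt_words)
    becomes a transducer transition reading h_Sigma(z) = src_word and
    emitting h_Delta(z) = tgt_words; the probability is kept unchanged.
    Transitions to the end-of-sentence mark become final-state probabilities."""
    fst = SFST(start)
    for (q, z), p in bigram_probs.items():
        if z == "</s>":
            fst.final[q] = p             # end-of-sentence mark -> f(q)
        else:
            src_word, tgt_words = z      # split the extended symbol
            fst.add(q, src_word, tgt_words, z, p)  # bigram state = last symbol read
    return fst

# End-to-end toy run on example 2 (alignments assumed as before):
corpus = [giati_label(s.split(), t.split(), a) for s, t, a in [
    ("una camera doppia", "a double room", [1, 3, 2]),
    ("una camera", "a room", [1, 2]),
    ("la camera singola", "the single room", [1, 3, 2]),
    ("la camera", "the room", [1, 2]),
]]
fst = build_transducer(bigram_model(corpus))
print(viterbi_translate(fst, "una camera doppia".split()))  # ('a', 'double', 'room')
```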
5. Experimental Results

Translation tasks of different levels of difficulty were selected to assess the capabilities of the proposed inference method in the framework of the EuTrans project (ITI et al. 2000): two Spanish-English tasks (EuTrans-0 and EuTrans-I), an Italian-English task (EuTrans-II), and a Spanish-German task (EuTrans-Ia). The EuTrans-0 task, with a large semiautomatically generated training corpus, was used for studying the convergence of transducer learning algorithms for increasingly large training sets (Amengual et al. 2000). In this article it is used to get an estimate of the performance limits of the GIATI technique by assuming an unbounded amount of training data. The EuTrans-I task was similar to EuTrans-0 but with a more realistically sized training corpus. This corpus was defined as a first benchmark in the EuTrans project, and therefore results with other techniques are available. The EuTrans-II task, with a quite small and highly spontaneous natural training set, was a second benchmark of the project. Finally, EuTrans-Ia was similar to EuTrans-I, but with a higher degree of nonmonotonicity between corresponding words in input/output sentence pairs.

Tables 1, 4, and 7 show some important features of these corpora. As can be seen in these tables, the training sets of EuTrans-0, EuTrans-I, and EuTrans-Ia contain nonnegligible amounts of repeated sentence pairs. Most of these repetitions correspond to simple and/or usual sentences such as good morning, thank you, and do you have a single room for tonight. The repetition rate is quite significant for EuTrans-0, but it was explicitly reduced in the more realistic benchmark tasks EuTrans-I and EuTrans-Ia. It is worth noting, however, that no repetitions appear in any of the test sets of these tasks. While repetitions can be helpful for probability estimation, they are completely useless for inducing the transducer structure. Moreover, since no repetitions appear in the test sets, the estimated probabilities will not be as useful as they could be if test data repetitions exhibited the same patterns as those in the corresponding training materials.

In all the experiments reported in this article, the approximate optimal translations (equation (7)) of the source test strings were computed, and the word error rate (WER), the sentence error rate (SER), and the bilingual evaluation understudy (BLEU) metric were used as assessment criteria. The WER is the minimum number of substitution, insertion, and deletion operations needed to convert the word string hypothesized by the translation system into a given single reference word string (ITI et al. 2000). The SER is the result of a direct comparison between the hypothesized and reference word strings as a whole. The BLEU metric is based on the n-grams of the hypothesized translation that occur in the reference translations (Papineni et al. 2001). The BLEU metric ranges from 0.0 (worst score) to 1.0 (best score).
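As a reference, the WER so defined is a length-normalized edit distance; a straightforward sketch of it (a hypothetical helper, not the paper's evaluation code) is:

```python
def wer(hyp, ref):
    """Word error rate: minimum number of substitutions, insertions, and
    deletions turning `hyp` into the single reference `ref`, divided by the
    reference length."""
    d = list(range(len(ref) + 1))         # edit distances for the empty hypothesis
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            cur = min(d[j] + 1,           # delete h
                      d[j - 1] + 1,       # insert r
                      prev + (h != r))    # substitute (or match)
            prev, d[j] = d[j], cur
    return d[-1] / max(len(ref), 1)

print(wer("a double room please".split(), "a double room".split()))  # 0.333...
```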
5.1 The Spanish-English Translation Tasks

A Spanish-English corpus was semiautomatically generated in the first phase of the EuTrans project (Vidal 1997). The domain of the corpus involved typical human-to-human communication situations at the reception desk of a hotel. A summary of this corpus (EuTrans-0) is given in Table 1 (Amengual et al. 2000; Casacuberta et al. 2001). From this (large) corpus, a small subset of ten thousand training sentence pairs (EuTrans-I) was randomly selected in order to approach more realistic training conditions (see also Table 1). From these data, completely disjoint training and test sets were defined. It was guaranteed, however, that all the words in the source test sentences were contained in both training sets (closed vocabulary).

Table 1. The Spanish-English corpus. There was no overlap between training and test sentences, and the test set did not contain out-of-vocabulary words with respect to any of the training sets.

Results for the EuTrans-0 and EuTrans-I corpora are presented in Tables 2 and 3, respectively. The best results obtained using the proposed technique were 3.1% WER for EuTrans-0 and 6.6% WER for EuTrans-I. These results were achieved using the statistical alignments provided by model 5 (Brown et al. 1993; Och and Ney 2000) and smoothed 11-grams and 6-grams, respectively.

These results were obtained using the first type of transformation described in Section 4.2 ($\mathcal{L}$); the second type ($\mathcal{L}'$) produced slightly worse results. However, $\mathcal{L}'$ is interesting because many of the extended symbols obtained in the experiments involve very good relations between some source word groups and target word groups, which could be useful by themselves. Consequently, more research work has to be done with this second type of transformation.

The results on the (benchmark) EuTrans-I corpus can be compared with those obtained using other approaches. GIATI outperforms other finite-state techniques under similar experimental conditions (with a best result of 8.3% WER, using another transducer inference technique called OMEGA [ITI et al. 2000]). On the other hand, the best result achieved by the statistical templates technique (Och and Ney 2000) was 4.4% WER (ITI et al. 2000). However, this result cannot be compared exactly with that achieved by GIATI, because the statistical templates approach used an explicit (automatic) categorization of the source and the target words, while only the raw word forms were used in GIATI. Although GIATI is compatible with different forms of word categorization, the required finite-state expansion is not straightforward, and some work is still needed to actually take advantage of categorization with this technique.

5.2 The Italian-English Task

The Italian-English translation task of the EuTrans project (ITI et al. 2000) consisted of spoken person-to-person telephone communications in the framework of a hotel reception desk. A text corpus was collected with the transcriptions of dialogues of this type, along with the corresponding (human-produced) translations. A summary of the corpus used in the experiments (EuTrans-II) is given in Table 4. There was a small overlap of seven pairs between the training set and the test set, but in this case the vocabulary was not closed (there were 107 words in the test set that did not exist in the training-set vocabulary). The processing of out-of-vocabulary words was very simple in this experiment: if the word started with a capital letter, the translation was the source word itself; otherwise it was the empty string.
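That rule amounts to a one-line heuristic; a hypothetical rendering:

```python
def translate_oov(word):
    """Out-of-vocabulary rule used for EuTrans-II: capitalized words (likely
    proper names) are passed through unchanged; all others map to the empty
    string."""
    return word if word[:1].isupper() else ""
```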
Table 4. The EuTrans-II corpus. There was a small overlap of seven pairs between the training and test sets, but 107 source words in the test set were not in the (training-set-derived) vocabulary.

The same translation procedure and evaluation criteria used for EuTrans-0 and EuTrans-I were used for EuTrans-II. The results are reported in Table 5.

Table 5. Results with the standard EuTrans-II corpus. The underlying regular models were smoothed n-grams (Rosenfeld 1995) for different values of n.

This corpus contained many long sentences, most of which were composed of rather short segments connected by punctuation marks. Typically, these segments can be monotonically aligned with corresponding target segments using a simple dynamic programming procedure (prior segmentation) (ITI et al. 2000). We explored computing the statistical alignments within each pair of segments rather than over the entire sentences. Since the segments were shorter than the whole sentences, the alignment probability distributions were better estimated. In the training phase, extended symbols were built from these alignments, and the strings of extended symbols corresponding to the segments of the same original string pair were concatenated. Test sentences were used directly, without any kind of segmentation.

The translation results using prior segmentation are reported in Table 6. These results were clearly better than those of the corresponding experiments with nonsegmented training data.

Table 6. Results with the standard EuTrans-II corpus. The underlying regular models were smoothed n-grams (Rosenfeld 1995) for different values of n. The training set was (automatically) segmented using a priori knowledge, and the statistical alignments were constrained to lie within each parallel segment.

The accuracy of GIATI in the EuTrans-II experiments was significantly worse than that achieved in EuTrans-I, and the best performance was obtained with a lower-order n-gram. One obvious reason for this behavior is that this corpus is far more spontaneous than the first one and, consequently, has a much higher degree of variability. Moreover, the training data set is about three times smaller than the corresponding data of EuTrans-I, while the vocabularies are three to four times larger. The best result achieved with the proposed technique on EuTrans-II was 24.9% WER, using prior segmentation of the training pairs and a smoothed bigram model. This result was comparable to the best among all those reported in RWTH Aachen and ITI (1999). The previously mentioned statistical templates technique achieved 25.1% WER in this case.
In this application, in which categories are not as important as in EuTrans-I, statistical templates and GIATI achieved similar results.

5.3 The Spanish-German Task

The Spanish-German translation task is similar to EuTrans-I, but here the target language is German instead of English. It should be noted that Spanish syntax differs significantly more from German syntax than it does from English syntax, and therefore the corresponding corpus exhibited a higher degree of nonmonotonicity. The features of this corpus (EuTrans-Ia) are summarized in Table 7. There was no overlap between training and test sets, and the vocabulary was closed.

The translation results are reported in Table 8. As expected from the higher degree of nonmonotonicity of the present task, these results were somewhat worse than those achieved with EuTrans-I. This is consistent with the larger number of states and transitions of the EuTrans-Ia models: the higher degree of word reordering handled by these models is achieved at the expense of a larger number of extended words.

The way GIATI transducers cope with these monotonicity differences can be illustrated more explicitly by estimating how many target words are produced after some delay with respect to the source. While directly determining (or even properly defining) the actual production delay for each individual (test) word is not trivial, an approximation can be derived indirectly from the number of target words that are preceded by sequences of λ symbols (from target-empty transitions) in the parsing of a source test text with a given transducer. This has been done for the EuTrans-I and EuTrans-Ia test sets with GIATI transducers learned with n = 6. On average, the EuTrans-I transducer needed to introduce delays ranging from one to five positions for approximately 15% of the English target words produced, while the transducer for EuTrans-Ia had to introduce similar delays for about 20% of the German target words produced.
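One plausible way to operationalize this estimate (the paper does not spell out the exact bookkeeping) is to scan the per-transition emissions along each parsed path and count the target words that follow runs of empty emissions:

```python
from collections import defaultdict

def delay_histogram(path_emissions):
    """path_emissions: the target string (tuple of words, possibly empty)
    emitted by each transition along a parsed path, one per source word.
    Target words emitted right after a run of d empty (lambda) transitions
    are counted as produced with delay d."""
    hist = defaultdict(int)
    run = 0
    for tgt in path_emissions:
        if not tgt:
            run += 1            # a target-empty transition: the delay grows
        else:
            hist[run] += len(tgt)
            run = 0
    return dict(hist)           # delay -> number of target words
```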
5.4 Error Analysis

The errors reported in the previous sections can be attributed to four main factors:

1. correct translations that differ from the given (single) reference;
2. wrong alignments of training pairs;
3. insufficient or improper generalization of n-gram-based GIATI learning;
4. wrong results of the approximate, Viterbi-score-based search.

An informal inspection of the target sentences produced by GIATI in all the experiments reveals that the first three factors are responsible for the vast majority of errors. Table 9 shows typical examples for the results of the EuTrans-I experiment with 6-gram-based GIATI transducers.

The first three examples correspond to correct translations that have been wrongly counted as errors (factor 1). Examples 4 and 5 are probably due to alignment problems (factor 2); in fact, more than half of the errors reported in the EuTrans-I experiments are due to misuse or misplacement of the English word please. Examples 6-8 can also be considered minor errors, probably resulting from factors 2 and 3. Examples 9 and 10 are clear undergeneralization errors (factor 3); these errors could have been easily overcome through an adequate use of bilingual lexical categorization. Finally, examples 11 and 12 are more complex errors that can be attributed to (a combination of) factors 2, 3, and 4.