<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0602"> <Title>Text-Translation Alignment: Three Languages Are Better Than Two</Title> <Section position="2" start_page="0" end_page="2" type="metho"> <SectionTitle> 1 Trilingual Alignments </SectionTitle> <Paragraph position="0"> There are various ways in which the concept of alignment can be formalized. Here, we choose to view alignments as mathematical relations between linguistic entities: Given two texts, A and B, seen as sets of linguistic units: A = {a1, a2, ..., am} and B = {b1, b2, ..., bn}, we define a binary alignment XAB as a relation on A ∪ B: XAB = {(a1, b1), (a2, b2), (a2, b3), ...}. The interpretation of XAB is: (a, b) belongs to XAB if and only if some translation equivalence exists between a and b, total or partial.</Paragraph> <Paragraph position="1"> This definition of alignment, inspired by Kay and Röscheisen (1993), can be naturally extended to accommodate any number of versions of a text. In general, we will say that, given N versions of a text A1, ..., AN, an N-lingual alignment XA1...AN is a relation on A1 ∪ ... ∪ AN.</Paragraph> <Paragraph position="2"> Clearly, an N-lingual alignment can be obtained by combining pairwise bilingual alignments. For example, with three texts A, B and C, and three alignments XAB, XBC and XCA, one can easily obtain the trilingual alignment XABC as XAB ∪ XBC ∪ XCA. In fact, in all that follows, we refer interchangeably to trilingual alignments as unique relations or as triples of bilingual alignments. 
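The union just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a linguistic unit as a (text name, index) tuple and the function names trilingual_union and project are hypothetical and do not come from the paper.

```python
# Hypothetical representation: each unit is a (text_name, index) tuple,
# and an alignment is a set of couples of such units.

def trilingual_union(x_ab, x_bc, x_ca):
    """Combine three bilingual alignments into a single relation."""
    return set(x_ab) | set(x_bc) | set(x_ca)


def project(x_abc, text_1, text_2):
    """Keep only the couples that link a unit of text_1 with a unit of text_2."""
    wanted = {text_1, text_2}
    return {(u, v) for (u, v) in x_abc if {u[0], v[0]} == wanted}


x_ab = {(("A", 1), ("B", 1)), (("A", 2), ("B", 2)), (("A", 2), ("B", 3))}
x_bc = {(("B", 1), ("C", 1)), (("B", 2), ("C", 2))}
x_ca = {(("C", 1), ("A", 1))}

x_abc = trilingual_union(x_ab, x_bc, x_ca)
```

With these toy alignments, project(x_abc, "A", "B") gives back x_ab, which is the sense in which a trilingual alignment can be treated as a triple of bilingual ones.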
Conversely, any smaller-degree alignment can be extracted as a subset of an N-lingual alignment, by projecting the relation onto a given &quot;plane&quot;.</Paragraph> <Paragraph position="3"> Another thing that becomes apparent as soon as more than two languages are involved is that text-translation alignments appear to be equivalence relations, which means that they generally display the properties of reflexivity, symmetry and transitivity: * reflexivity: Any word or sequence of words aligns with itself, which is natural insofar as we extend the notion of &quot;translation&quot; so as to include the translation from one language to itself...</Paragraph> <Paragraph position="4"> * symmetry: if a in language A is aligned with b in language B, then we expect b to align with a. In other words, alignment is not &quot;directional&quot;.</Paragraph> <Paragraph position="5"> * transitivity: if a aligns with b, and if b itself aligns with c, then a aligns with c.</Paragraph> <Paragraph position="6"> Although there are limits to the applicability of these mathematical properties to real-life translations, the case of transitivity is particularly interesting, as we will see later on.</Paragraph> <Paragraph position="7"> Translation equivalences can be viewed at different levels of resolution, from the level of documents to those of structural divisions (chapters, sections, etc.), paragraphs, sentences, words, morphemes and eventually, characters. In general, it seems quite clear that the smaller the units, the more interesting an alignment is likely to be (although we can question the interest of a character-level alignment). However, in the experiments described here, we focus on alignment at the level of sentences, for a number of reasons. First, sentence alignments have so far proven their usefulness in a number of applications, e.g. 
bilingual lexicography (Langlois, 1996; Klavans and Tzoukermann, 1995; Dagan and Church, 1994), automatic translation verification (Macklovitch, 1995; Macklovitch, 1996) and the automatic acquisition of knowledge about translation (Brown et al., 1993). Also, the sentence alignment problem has been widely studied, and we could even say that at this point in time, a certain consensus exists regarding how the problem should be approached.</Paragraph> <Paragraph position="8"> On the other hand, not only is the computation of finer-resolution alignments, such as phrase- or word-level alignments, a much more complex operation, it also raises a number of difficult problems related to evaluation (Melamed, 1998), which we wanted to avoid, at least at this point. Finally, we believe that the concepts, methods and results discussed here can be applied just as well to alignments at other levels of resolution.</Paragraph> </Section> <Section position="3" start_page="2" end_page="4" type="metho"> <SectionTitle> 2 A General Method for Aligning Multiple Versions of a Text </SectionTitle> <Paragraph position="0"> Existing alignment algorithms that rely on the optimality principle and dynamic programming to find the best possible sentence alignment, such as those of Gale and Church (1991), Brown et al. (1991), Simard et al. (1992), Chen (1993), Langlais and El-Bèze (1997), Simard and Plamondon (1998), etc., can be naturally extended to deal with three texts instead of two, or more generally to deal with N texts. While the resolution of the bilingual problem is analogous to finding an optimal path in a rectangular matrix, aligning N texts is analogous to the same problem, this time in an N-dimensional matrix.</Paragraph> <Paragraph position="1"> Normally, these methods produce alignments in the form of parallel segmentations of the texts into equal numbers of segments. 
These segmentations are such that 1) segmentation points coincide with sentence boundaries and 2) the k-th segment of one text and the k-th segment of all others are mutual translations. We refer to such alignments as non-crossing alignments (see Figure 1 for an example).</Paragraph> <Paragraph position="2"> La vraie question posée par cette controverse est la suivante: P qu'est-ce que la pensée? P Elle mystifie l'humanité (seule, apparemment, à pouvoir penser) depuis des millénaires. P Des ordinateurs qui ne pensent pas ont cependant réorienté la question et éliminé diverses réponses. P un programme manipule seulement des symboles, mais le cerveau leur donne un sens. P Behind this debate lies the question, What does it mean to think? P The issue has intrigued people (the only entities known to think) for millennia. P Computers that so far do not think have given the question a new slant and struck down many candidate answers. P A definitive one remains to be found. P Is the Brain's Mind a Computer Program? P NO.</Paragraph> <Paragraph position="3"> A program merely manipulates symbols, whereas a brain attaches meaning to them. P by John R. Searle P This definition covers a subset of alignments as defined in Section 1. It is therefore always possible to represent a non-crossing alignment as an equivalence relation. However, the converse is not true: in particular, and as their name suggests, one cannot explicitly represent inversions with such alignments, i.e. situations where the order of sentences is not preserved across translation. In spite of this limitation, these alignments cover the vast majority of situations encountered in real-life texts, at least at the level of sentences (Gale and Church, 1991). There is a problem with extending this class of alignment algorithms to deal with the general N-dimensional case, however: the computational complexity of the algorithm increases multiplicatively with each new language. 
For instance, the space and time complexity of the trilingual version of the Gale and Church (1991) program would be O(n^3) for three texts of n sentences each. The use of such an algorithm quickly becomes prohibitive (for example: 1,000,000 computation steps for texts of 100 sentences each). Of course, in the case of bilingual alignment, it is common practice to restrict the search, for instance to a narrow corridor along the main &quot;diagonal&quot; (see Simard and Plamondon (1998), for example). But even with such heuristics, it is quite clear that in general, the search-space will grow multiplicatively with each new language.</Paragraph> <Paragraph position="4"> Nevertheless, the idea of aligning multiple versions of a text simultaneously is intuitively appealing: while the alignment operation will no doubt be more complex than with two languages, every new version brings some additional information, which we should be able to make good use of (see Figure 2). Therefore, we will want to find a way to overcome the complexity issues.</Paragraph> <Paragraph position="5"> We know that the multilingual alignment problem is related to a number of other sequence</Paragraph> <Paragraph position="7"> two - In the face of uncertainty regarding correspondences between b2, c1 and c2: the absence of evidence for (a2, c1) or (a1, b2) correspondences suggests rejecting (b2, c1), while a similar reasoning supports (b2, c2).</Paragraph> <Paragraph position="8"> comparison problems, with applications in various domains. In particular, molecular biologists are concerned with relating sequences of nucleotides (in DNA or RNA molecules) and of amino acids (in proteins) (Sternberg, 1996).</Paragraph> <Paragraph position="9"> The methods used to attack these problems are very similar to those used in translation alignment, and rely largely on dynamic programming. 
In practice, researchers in molecular biology have observed that, insofar as the input sequences are not excessively dissimilar, the greater the number of sequences, the better the alignments obtained. Therefore, numerous strategies have been proposed to alleviate the complexity issues related to multiple sequence comparison (Chan et al., 1992). One common heuristic approach is to reduce the search-space, either in width (i.e. by concentrating the search around the &quot;diagonal&quot;), or in depth (i.e. by first segmenting the input sequences at judicious points, and then aligning the subsequences). Of course, these strategies are also widely used in text-translation alignment.</Paragraph> <Paragraph position="10"> However, the most widespread approach is to construct multiple alignments by iteratively combining smaller-degree alignments. While these methods are not generally optimal, they still produce good results in most situations.</Paragraph> <Paragraph position="11"> More importantly, for a given number of sequences, they usually work in quadratic time and space. The general idea is to first compare sequences two-by-two, so as to measure their pairwise similarity; based on the result of this operation, an order of alignment is determined -- typically, the most similar pairs will be aligned first; the final multiple alignment is produced by gradually combining alignments (see, for example, Barton and Sternberg (1987)).</Paragraph> <Paragraph position="12"> This approach can be directly adapted to the trilingual text alignment problem. The idea is simple: given three versions of a text A, B and C, in three different languages, we first determine which of the three pairs AB, BC or AC is the most &quot;similar&quot;. Let us suppose that this is the AB pair. 
We then align this pair, using whatever bilingual alignment program we have at hand, producing XAB; we then align text C with this alignment, thus producing XABC.</Paragraph> <Paragraph position="13"> To implement this idea, we need to answer two questions: First, how to measure the similarity between different versions of a text? And second, what does it mean to align a text with an alignment? There are certainly numerous possible answers to the first question. But actually, statistical alignment methods such as those derived from Gale and Church (1991) provide us with a simple solution: to find the best alignment, these methods explore different alignment hypotheses, and select the one with the highest probability with regard to a certain statistical model of translation. Therefore, at the end of the operation, a statistical alignment program has at its disposal an overall score for the best alignment, in the form of a global probability.</Paragraph> <Paragraph position="14"> In practice, we observe that this score is a good indicator of the similarity between two texts.</Paragraph> <Paragraph position="15"> For instance, Gale and Church used this score to identify dubious regions in their alignments. (Footnote 1: Also recall that the dynamic programming approach to text alignment actually derives from a classic algorithm to measure the &quot;edit distance&quot; between two strings (Wagner and Fischer, 1974).) Therefore, to determine the most similar pair of texts, we propose to compute the bilingual alignments XAB, XBC and XAC, and to compare the final alignment scores. Of course, for this exercise to be meaningful, we must make sure that the scores associated with the bilingual alignments are indeed comparable. In general, if the same alignment method is used with comparable translation models for all pairs of languages, this should not be a problem.</Paragraph> <Paragraph position="16"> Once the most similar pair of versions has been identified, say A and B, and we have computed a bilingual alignment for that pair, we are ready to tackle the problem of aligning the remaining text C with the XAB alignment. 
In practice, this will amount to aligning the elements of C (in our case, sentences) with individual &quot;couples&quot; of the XAB relation: whenever we align some sentence c ∈ C with a sentence a ∈ A, then this implies that c must also be aligned with all other sentences to which a is related within the transitive closure of XAB. In other words, this alignment method is &quot;inherently transitive&quot;.</Paragraph> <Paragraph position="17"> In practice, the alignment of XAB and C is dealt with just like a bilingual alignment: the XAB alignment is viewed as a sequence of items, and dynamic programming is used to find the best alignment with the sentences of C. The only real difference lies in how individual &quot;triples&quot; are scored. Here again, we turn to molecular biology, where experience seems to show that the &quot;joint similarity&quot; of multiple items can be measured as the linear combination of all pairwise comparisons:</Paragraph> <Paragraph position="19"> This sort of combination supposes that all binary scoring functions s(ai, aj) are comparable (Carrillo and Lipman, 1988). Once again, this will not be a problem if we plan to use analogous translation models for all language pairs.</Paragraph> <Paragraph position="20"> To sum up, given three versions of a text A, B and C, we propose the following trilingual alignment method: 1. Compute initial bilingual alignments XAB, XBC and XAC; 2. Using the final alignment score, identify the most similar pair (say, AB); 3. Align the remaining text (C) with the initial alignment of the retained pair (XAB); the result is a trilingual alignment XABC. The computational complexity of this method is essentially the same as that of the underlying bilingual alignment method, both in terms of time and space. 
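The pair selection of step 2 and the scoring of triples used in step 3 can be sketched as follows. This is a minimal illustration, not the trial implementation: the scoring formula itself is missing from this copy of the text, so the sketch assumes the plain unweighted sum of pairwise scores that Section 5 describes, and it assumes that a higher score means a more similar pair. The names joint_score, most_similar_pair and toy_score are hypothetical, and toy_score is only a stand-in for a real bilingual scoring function s(ai, aj).

```python
from itertools import combinations


def joint_score(items, pair_score):
    """Joint similarity of several items: sum of all pairwise scores (assumed unit weights)."""
    return sum(pair_score(x, y) for x, y in combinations(items, 2))


def most_similar_pair(final_scores):
    """Pick the pair of texts whose bilingual alignment obtained the best final score.

    final_scores maps a pair of text names to a score; higher is assumed better.
    """
    return max(final_scores, key=final_scores.get)


def toy_score(x, y):
    # Toy stand-in for s(ai, aj): penalize differences in length (in characters).
    return -abs(len(x) - len(y))


triple = ("Il pleut.", "It rains.", "Llueve.")
triple_score = joint_score(triple, toy_score)

best = most_similar_pair({("A", "B"): 0.9, ("B", "C"): 0.7, ("A", "C"): 0.5})
```

Because the triple score is built only from bilingual scores, no new translation model is needed, which is consistent with the remark that the pairwise scoring functions must be comparable.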
In practice, aligning three texts this way takes about the same amount of memory as aligning one pair, and about four times as much computation time.</Paragraph> </Section> <Section position="4" start_page="4" end_page="4" type="metho"> <SectionTitle> 3 Evaluation </SectionTitle> <Paragraph position="0"> We have implemented a trilingual sentence-alignment program called trial, based on the approach presented in Section 2 and on a bilingual sentence-alignment program called sfial, which implements a modified version of the method of Simard et al. (1992). In sfial, we essentially combine into a statistical framework two criteria: the length-similarity criterion proposed by Gale and Church (1991) and a &quot;graphemic resemblance&quot; criterion based on the existence of cognate words between languages. This method was chosen because it is simple, it requires a minimum of language-specific knowledge, and because it is representative of the kind of approaches that are typically used for this task, at least for aligning between closely-related languages such as German, English, French, Spanish, etc. Furthermore, in a recent sentence-alignment &quot;competition&quot; held within the ARCADE project (Langlais et al., 1998), the three top-ranking systems relied at least partially on cognates, and two of them were derived directly from the Simard et al. (1992) method.</Paragraph> <Paragraph position="1"> To test the performance of the trial program, we needed a performance metric and a test corpus. 
Following the work of the ARCADE project, we decided to measure performance in terms of alignment recall, precision and F-measure, computed on the basis of sentence lengths (measured in terms of characters).</Paragraph> <Paragraph position="2"> In our experience, this set of metrics is the most generally useful.</Paragraph> <Paragraph position="3"> Our test corpus was The Gospel According to John, in English (New International [...] [Table 1: ... alignments produced by sfial and trial, on The Gospel According to John, French, English and Spanish versions.]</Paragraph> <Paragraph position="4"> [...] and Spanish (Reina Valera version). All versions were obtained via the Bible Gateway (http://uvw.gospelcom.net). For the needs of the evaluation, we manually segmented all three versions of the text into sentences, and then produced reference sentence alignments, using the Manual system (Simard, 1998c). This corpus and its preparation are described in more detail in Simard (1998a).</Paragraph> <Paragraph position="5"> The test-corpus was submitted to both sfial and trial; the results of this experiment are reproduced in Table 1. The Spanish-French pair was identified by trial as being the most similar (not surprisingly, English-Spanish was the most dissimilar). Since the alignment of the most similar pair is used as the basis of the trilingual alignment, the results obtained by sfial and trial for this pair are identical. On the other hand, for the two other pairs, the trial method seems to improve the quality of the alignments, but the gains are minimal.</Paragraph> <Paragraph position="6"> A close examination of the results quickly reveals what is going on here: As mentioned earlier, our trilingual alignment method is &quot;inherently transitive&quot;; in fact, it naturally produces alignments which are transitively closed. In doing so, it sometimes runs into some natural limitations of the applicability of transitivity to real-life translations. 
Take the following example: suppose that the word weak in an English text is rendered in French as sans force (&quot;without strength&quot;) and in Spanish as sin fortaleza. [Table 2: ... volving more than a single pair of sentences in the alignment produced by trial.]</Paragraph> <Paragraph position="7"> A transitively closed trilingual alignment will contain the correct correspondences (sans, sin) and (force, fortaleza), but also the correspondences (sans, fortaleza) and (force, sin), which are superfluous. Such contractions and expansions happen all the time in real-life translations, not only at the level of words, but at the level of sentences as well. As a result, transitively closed alignments of three texts or more will usually display a lower precision than bilingual alignments.</Paragraph> <Paragraph position="8"> To compensate for this &quot;transitivity noise&quot;, we decided to apply a final post-processing step: for each pair of languages, whenever the trilingual alignment produced by trial connects two pairs of sentences or more, we evaluate the impact of re-segmenting the corresponding region of text (in other words, we perform a local bilingual alignment). Typically, this operation can be carried out in near-linear time and space.</Paragraph> <Paragraph position="9"> Table 2 shows the impact of this procedure on the trial alignments of The Gospel According to John (the initial bilingual alignment is not submitted to re-alignment, and so the results for the French-Spanish pair are not reproduced here).</Paragraph> <Paragraph position="10"> What we observe is a significant improvement of precision, and a slight decrease in recall. Compared to the sfial bilingual alignment, the overall improvement (F-measure) is approximately 1%: all figures being in the 0.95 area, this corresponds to a 20% reduction in the total error. 
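As a quick check of this arithmetic, assuming for illustration that the F-measure rises from about 0.95 to about 0.96 (rounded values, not the exact figures of the tables), the relative reduction in error is:

\[ \frac{(1 - 0.95) - (1 - 0.96)}{1 - 0.95} = \frac{0.05 - 0.04}{0.05} = 0.20 \]
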
Therefore, it would indeed seem that our final post-processing is effective, and that in the end, &quot;three languages are better than two&quot;.</Paragraph> </Section> <Section position="5" start_page="4" end_page="7" type="metho"> <SectionTitle> 4 Optimizations </SectionTitle> <Paragraph position="0"> In addition to all the usual optimizations to bilingual alignment methods, various things can be done to reduce computation times in the trilingual alignment method of Section 2: for instance, individual bilingual scores from step 1 can be recorded in memory, to be later re-used in step 3. Also, if multiple processors are available, the three initial alignments of step 1 can be done in parallel. By combining these optimizations, it is possible to align three texts in less than twice the time necessary to align a single pair.</Paragraph> <Paragraph position="1"> Another possible optimization is to initially segment the three texts in parallel, so as to perform step 3 on smaller pieces. Of course, this idea is not new, but what makes it particularly appealing for trilingual alignment, in addition to the usual reduction in the needed time and space, is the potential for further improvements in the quality of the resulting alignments: In the method outlined above, we have chosen to base the re-alignment on the initial alignment that connected the two most similar versions of the text. In reality, nothing proves that this similarity is &quot;evenly distributed&quot; on the totality of the texts. In fact, if we segmented the input texts at arbitrary points, we might very well discover that the most similar pair of languages is not always the same. If this is the case, then we could improve our results by doing the re-alignment in small chunks, each time basing the re-alignment on the pair of languages that locally displays the best similarity.</Paragraph> <Paragraph position="2"> On the other hand, this approach also carries its own risks. 
Indeed, by pre-segmenting the three texts in parallel, we will be fixing points in the alignment a priori, namely those points at the boundaries between segments. This is why it is crucially important to select segmentation points judiciously: we will want these to lie in areas where all three initial alignments agree and each displays a high level of confidence.</Paragraph> <Paragraph position="3"> In practice, such &quot;points of agreement&quot; between the initial bilingual alignments can be found by computing their transitive closure, i.e.</Paragraph> <Paragraph position="4"> by adding to the union of the three alignments all couples whose existence is predicted by transitivity (a simple procedure for this can be found in Hopcroft and Ullman (1979)). From such transitively closed trilingual alignments emerge &quot;islands of correspondence&quot;, i.e. groups of sentences that are all related to one another. In between these islands lie natural segmentation points, which can be viewed as points of agreement between the three initial alignments.</Paragraph> <Paragraph position="5"> We also found that to obtain the best possible segmentation of the texts, it was necessary to select among such points of agreement only those lying between pairs of islands of correspondence for which we have a high degree of confidence. To measure this &quot;confidence&quot;, we currently use two criteria: first, the number of sentences of each language in the surrounding islands; and second, the alignment program's own scoring function. The first criterion is based on the simple observation that most alignment errors happen when the translation diverges from the usual pattern of &quot;one sentence translates to one sentence&quot; (Simard, 1998b); so we only consider points of agreement lying between &quot;1-to-1-to-1&quot; islands. 
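The islands of correspondence, and the 1-to-1-to-1 test of the first criterion, can be sketched as connected components of the union of the three bilingual alignments. Note the substitution: the sketch uses a union-find structure, which yields the same groups as the transitive closure but is not the procedure of Hopcroft and Ullman (1979) cited above. The (text name, index) representation of sentences and the names islands and is_one_to_one_to_one are hypothetical.

```python
from collections import defaultdict


def islands(pairs):
    """Group sentences into islands: the connected components of the alignment relation."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps the trees shallow
            u = parent[u]
        return u

    for u, v in pairs:
        parent[find(u)] = find(v)  # merge the islands of u and v

    groups = defaultdict(set)
    for u in list(parent):
        groups[find(u)].add(u)
    return list(groups.values())


def is_one_to_one_to_one(island, texts=("A", "B", "C")):
    """True when the island holds exactly one sentence of each text."""
    return sorted(u[0] for u in island) == sorted(texts)


# Toy union of three bilingual alignments over (text_name, index) sentences.
pairs = {
    (("A", 1), ("B", 1)), (("B", 1), ("C", 1)), (("A", 1), ("C", 1)),
    (("A", 2), ("B", 2)), (("A", 2), ("B", 3)), (("B", 2), ("C", 2)),
}
found = islands(pairs)
flags = sorted(is_one_to_one_to_one(island) for island in found)
```

In this toy case the first island is 1-to-1-to-1, while the second contains two sentences of B and would therefore not be retained as a neighbour of a segmentation point.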
The second criterion is based on the observation by Gale and Church (1991) that good alignments usually coincide with high scoring regions of text.</Paragraph> <Paragraph position="6"> To sum up, our optimized trilingual alignment method follows these lines: Given three versions of a text A, B and C, 1. Compute initial bilingual alignments XAB, XBC and XAC; 2. Segment the texts: (a) Identify points of agreement between XAB, XBC and XAC, by computing the transitive closure X*ABC of XAB ∪ XBC ∪ XAC; (b) Among these points, select those points that lie between pairs of 1-1-1 triples within which individual bilingual alignment scores do not exceed some threshold T; (c) Segment A, B and C at these points, thus producing sub-segments A1...An, B1...Bn and C1...Cn.</Paragraph> <Paragraph position="7"> 3. Jointly align each triple of segments (Ai, Bi, Ci) as with the trial method (Section 2), and obtain the final trilingual alignment as the union of all partial alignments: XABC = ⋃i XAiBiCi. This optimization was implemented into the trial program, thus producing a program we call trial++. The Gospel According to John, in French, English and Spanish, was then submitted to this new program. To a certain degree, the results of this experiment were a disappointment, since they turned out to be virtually identical to those obtained with the trial program. [Table 3 (columns: step, trial, trial++): ... individual computation steps of trial and trial++ (French, Spanish and English versions of The Gospel According to John).]</Paragraph> <Paragraph position="8"> A closer examination reveals what is going on: in 201 out of the 279 segments produced by the pre-segmentation procedure, trial++ chose to base the re-alignment on the same alignment as trial, i.e. the Spanish-French. No differences could therefore be expected between the two programs on these segments. Of the 78 remaining segments, 60 contained exactly one sentence per language, so not much improvement could be expected for those either. 
In the end, only 18 segments remained where trial and trial++ had the potential to produce different alignments; but even here, both programs produced virtually identical results.</Paragraph> <Paragraph position="9"> The main difference between trial and trial++ was in execution times. Table 3 shows the times required for each of the computation steps of the two programs. What we observe is that pre-segmentation reduces execution times significantly, without hampering the quality of the alignments. We can therefore consider that this is a useful step, especially if we are dealing with long texts. (Footnote 2: Also worth noting here is that pre-segmentation is currently carried out by a Perl script. With a proper C implementation, execution times for this step would probably become negligible.)</Paragraph> </Section> <Section position="6" start_page="7" end_page="9" type="metho"> <SectionTitle> 5 A Disturbing Experiment... </SectionTitle> <Paragraph position="0"> We mentioned earlier that to compute trilingual alignments, directly extending dynamic programming bilingual alignment methods was not a realistic approach from the point of view of computational complexity. However, it seems that the optimization described in Section 4 for pre-segmenting the three texts into small segments before performing the trilingual alignment could actually help resolve the problem: if we manage to segment the input texts into small enough chunks, then a cubic-order</Paragraph> <Paragraph position="1"> algorithm may not be so problematic after all.</Paragraph> <Paragraph position="2"> To test this conjecture, we implemented the following method into a program called trial-- (the origin of the name will become clear in a moment): Given three versions of a text A, B and C, 1. Compute initial bilingual alignments XAB, XBC and XAC; 2. Pre-segment the texts, as in Section 4; 3. 
Align each triple of sub-segments (Ai, Bi, Ci), using dynamic programming to find the optimal alignment XAiBiCi in the Ai x Bi x Ci space; 4. Obtain the final trilingual alignment as the union of all partial alignments: XABC = ⋃i XAiBiCi.</Paragraph> <Paragraph position="4"> Once again, and as in the trial and trial++ methods, we finish up with a final bilingual alignment pass to compensate for &quot;transitivity noise&quot;.</Paragraph> <Paragraph position="5"> This new program was successful in aligning The Gospel According to John, using amounts of time and memory comparable to the trial++ program. However, the resulting alignments were quite different, as can be seen in the performance figures in Table 4.</Paragraph> <Paragraph position="6"> The performance of this new program on The Gospel According to John turns out to be not only poorer than that of the trial program, but also poorer than that of the sfial program! What is going on here? After all, we would expect trial-- to be the optimal method, of which all others are heuristic approximations. Although we have no definitive answer to this question, we see two different lines of explanation. The first and most obvious possibility is that our initial assumption, namely that three languages are better than two, is false. In other words: aligning three texts is at least as hard as aligning two, and possibly harder. Of course, this would also imply that the results obtained with the trial and trial++ methods were mere accidents. Although this explanation for the failure of the trial-- approach clearly contradicts our initial intuitions, we cannot entirely reject it. We do not, however, pursue this line any further. (Besides, it doesn't go any further!) The second line of explanation leads us straight to the scoring function of our alignment methods. As in the trial and trial++ methods, the scoring function used in trial-- is the sum of all pairwise (bilingual) alignment scores. 
It could simply be that this way of measuring &quot;trilingual fit&quot; is inadequate.</Paragraph> <Paragraph position="7"> However, we believe that our problems run deeper. To begin with, it could be argued that what a trilingual alignment program really needs is a true &quot;trilingual translation model&quot;; it is not at all clear that three bilingual translation models are an adequate substitute for this. Even if it were, we know that there are numerous problems with the &quot;length&quot; and &quot;cognate&quot; stochastic models on which the sfial scoring function is based. For instance, we know that while these models usually describe the phenomena observed in translations adequately, they are not necessarily as good when it comes to things that are not translations.</Paragraph> <Paragraph position="8"> While these weaknesses do not appear to cause too many problems when computing bilingual alignments, it would not be surprising that a third language is all it takes for incoherences to creep in and performance to degrade. If this is indeed the case, then this is one more argument in favor of the trial approach: by treating multilingual alignments the same way as bilingual alignments, this approach may let us get away with poor translation models, at least until we come up with something better! Conclusion and Future Work. We have shown how an existing bilingual text alignment method could be adapted to align three versions of a text simultaneously. The computational complexity of the resulting trilingual alignment method is the same as that of the underlying bilingual method, and various optimizations are possible. In experiments on English, French and Spanish versions of The Gospel According to John, this approach produced sentence alignments significantly better than those obtained using a bilingual alignment program.</Paragraph> <Paragraph position="9"> All tests reported here were conducted on a single, relatively small corpus of text. 
The contradictory results reported in Section 5 highlight the need for more work with regard to evaluation. Such an evaluation exercise would normally imply putting together a much larger and more varied test corpus, segmented and aligned by hand, a process which is known to be costly.</Paragraph> <Paragraph position="10"> However, since the goal is to measure improvements relative to existing bilingual alignment methods, an interesting alternative would be to perform a relative evaluation instead: the programs could then be tested on a much larger test-corpus, and the performance of each system would be measured on only those portions of text where the alignments differ. The details of such an evaluation need to be worked out.</Paragraph> <Paragraph position="11"> Also, it remains to be seen how the trial approach would adapt to the general multi-lingual case (three languages or more) on the one hand, and to the more challenging problem of finer-grained alignments on the other hand. Here again, we will likely encounter numerous complications, most notably regarding questions of evaluation. Ongoing work in the word-alignment track of the ARCADE project is likely to bring interesting results regarding this question (Langlais et al., 1998).</Paragraph> <Paragraph position="12"> Finally, and probably more importantly, working with more than two languages has highlighted weaknesses in the modeling of translation that underlies our alignment methods. Much work remains to be done in this direction.</Paragraph> </Section></Paper>