<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1408"> <Title>Bilingual concordancers and translation memories: A comparative evaluation</Title> <Section position="2" start_page="0" end_page="5" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recent years have witnessed a number of significant changes in the translation market.</Paragraph> <Paragraph position="1"> Largely as a result of globalization, there has been a considerable increase in the volume of text to be translated. New types of text, such as Web pages, have also appeared and require translation.</Paragraph> <Paragraph position="2"> The increased demand for translation has been accompanied by another trend: deadlines for translation jobs have grown shorter. This is in part because companies want to get their products onto the shelves in all corners of the world as quickly as possible. In addition, electronic documents such as Web pages may have content that needs to be updated frequently. Companies want to be sure that their sites reflect the latest information, so translators are under pressure to work very quickly to ensure that the up-to-date information is reflected in all language versions of the site.</Paragraph> <Paragraph position="3"> Furthermore, it has been observed that in today's market, there is currently a shortage of human translators (e.g. Sprung 2000:ix; Shadbolt 2002:3031; Allen 2003:300).</Paragraph> <Paragraph position="4"> The increase in volume coupled with shorter turnaround times has resulted in an immense pressure on existing translators to work more quickly, while still maintaining high quality in their work. However, these two demands of high quality and fast turnaround are likely to be at odds with one another. 
Therefore, some translators are trying to balance the need for high quality with the need for increased productivity by turning to electronic resources and tools.</Paragraph> <Paragraph position="5"> One type of language resource that has become popular is the bilingual parallel corpus, which is essentially a collection of texts in one language (e.g. English) alongside their translations into another language (e.g. French). The two sets of texts must be aligned, which means that links are made between corresponding sections (e.g.</Paragraph> <Paragraph position="6"> sentences, paragraphs) in the two languages.</Paragraph> <Paragraph position="7"> Bilingual parallel corpora can contain a wealth of useful information for translators, but some type of tool is needed in order to exploit these resources. There are two main types of tool that can be used to search for and retrieve information from a bilingual parallel corpus: a bilingual concordancer (BC) and a translation memory (TM). While these two types of tool have some common goals and features, they also have a number of differences.</Paragraph> <Paragraph position="8"> As we will see in the upcoming sections, BCs can be considered to be &quot;old technology&quot; and they are not well known in the translation industry outside of academic circles. In contrast, TMs have garnered a significant amount of attention in the translation industry of late; they are very much in vogue and are considered to be leading-edge technology. Nevertheless, a number of translators have expressed frustration and disappointment when trying to apply TMs in certain contexts. It is possible that some of the frustration experienced by translators using TMs in certain situations could be alleviated by using BCs instead. 
The aim of this paper is to conduct a comparative analysis of the two types of technology in an effort to determine the strengths and weaknesses of each, and thereby to identify those situations where translators would be best served by using a TM and those where they may be better off using a BC. (Note that while the same corpus data can be used with both types of tool, it is usually necessary to pre-process the corpus in a different way in order to render it readable by different tools.)</Paragraph> <Paragraph position="9"> Following the introduction, the paper is divided into four main parts. Part 2 provides some</Paragraph> <Paragraph position="10"> background information, including a general description of how the two types of tool work, with reference to two specific tools - ParaConc and Trados - that are representative of the categories of BC and TM respectively. Part 3 contains a brief assessment of the place occupied by these tools within the translation industry today. Part 4 contains a more detailed comparative analysis of the features and associated advantages and disadvantages of each type of tool. Finally, Part 5 concludes with some general recommendations about which translation situations warrant the use of each type of tool.</Paragraph> <Paragraph position="11"> 2 General introduction to BCs and TMs The general aim of both a BC and a TM is to allow a translator to consult, and if appropriate to &quot;reuse&quot;, relevant sections of previously translated texts. 
In the following sections, BCs and TMs will be described with reference to ParaConc and Trados, which are representative examples of these respective categories of tool.</Paragraph> <Section position="1" start_page="1" end_page="5" type="sub_section"> <SectionTitle> 2.1 ParaConc: an example of a BC </SectionTitle> <Paragraph position="0"> BCs, such as ParaConc, are fairly straightforward tools: they allow translators to search through bilingual parallel corpora to find information that might help them to complete a new translation. For example, if a translator encounters a word or expression that he does not know how to translate, he can look in the bilingual parallel corpus to see if this expression has been used before, and if so, how it was dealt with. (In fact, ParaConc could more properly be termed a multilingual concordancer, since it is possible to consult texts in up to four languages at once. However, in the context of this paper, we will refer to it as a BC and discuss its use for comparing texts in two languages.)</Paragraph> <Paragraph position="1"> To use ParaConc, the source and target texts must first be aligned, which means that corresponding text segments are linked together. A detailed description of alignment techniques is beyond the scope of this paper; however, alignment is a non-trivial matter. Problems can arise, for example, if a single source text sentence has been translated by multiple target language sentences, or vice versa, or if information has been omitted from or added to the target text (e.g. to handle cultural references).</Paragraph> <Paragraph position="2"> A semi-automatic alignment utility is included in the program to prepare texts that are not already pre-aligned. The initial part of the alignment process is carried out in three stages: first the texts are aligned based on headings, if any are present in the texts, then alignment is carried out at the paragraph level, and finally at the sentence level. The software uses the formatting information in files to carry out alignment of headings and paragraphs. 
Alignment at the sentence level is achieved by applying the Gale-Church algorithm (Gale and Church 1993). To make adjustments to the alignment, the user can examine the aligned segments and either merge or split particular segments, as necessary. One important thing to note is that the aligned units remain situated within the larger surrounding text.</Paragraph> <Paragraph position="3"> Once the texts are aligned, the translator can consult the corpus. By choosing the basic search command, the translator can retrieve all examples of a word or phrase (or part of a word) from the corpus. As shown in Figure 1, the search term &quot;head&quot; has been entered and all instances of &quot;head&quot; from the English corpus are displayed in the upper pane (here in a KWIC format). The corresponding text segments from the French corpus are shown in the lower pane.</Paragraph> <Paragraph position="4"> The concordance lines can be sorted in various ways (e.g., primarily 1st left and secondarily 1st right) in order to group similar phrases together and therefore make it easier for a translator to spot linguistic patterns. Clicking on a concordance line in the upper pane will highlight that line and also the corresponding text segment in the lower pane. Double-clicking on a line will bring up a window containing the segment within a larger context. Suggested translations for the English &quot;head&quot; can be highlighted by positioning the cursor in the lower French results pane and clicking on the right mouse button. A possible translation of &quot;head&quot; such as &quot;tête&quot; can be entered. 
The program then simply highlights all instances of &quot;tête&quot; in the French results window, which can then be displayed (and sorted).</Paragraph> <Paragraph position="5"> It is also possible to use a utility that presents a list of &quot;hot&quot; words in the French results pane, including possible translations. Some or all of the words listed can be selected and they will then be highlighted in the results.</Paragraph> <Paragraph position="6"> Finally, more complex search commands can also be used if desired. Some of the possible advanced search options are: Text search, Regular expression search, Tag (part-of-speech) search, Batch search, and various heading-sensitive and context-sensitive searches. Of particular interest to translators is a Parallel search, which allows the user to enter both an English and a French search word and to retrieve only those occurrences that match both (e.g. only instances where &quot;head&quot; is translated by &quot;tête&quot; and not by &quot;chef&quot;). There are a number of potential limitations that are often associated with BCs: 1) the limited degree of automation; 2) the nature of the search item; and 3) the nature of the matching process. With regard to degree of automation, when using a BC, it is up to the translator to decide what word or expression to look up, and he then has to manually type this into the search engine.</Paragraph> <Paragraph position="7"> In terms of the nature of the search item, BCs are generally designed to search only for words or very short phrases. It is true that, in principle, a BC could be used to search for an entire sentence or paragraph; however, the fact that the search pattern must be manually entered tends to discourage this type of use because it would be extremely time-consuming and error-prone (e.g. typos).</Paragraph> <Paragraph position="8"> Finally, BCs are sometimes criticized because of the nature of the matching process that they use. 
By default, these tools search through the corpus for occurrences that match the entered search pattern precisely. For example, if the translator enters the search pattern &quot;flatbed colour scanner&quot; into the concordancer, it will retrieve only those occurrences that match that pattern exactly. It will not retrieve an example that contains differences in punctuation, spelling or morphology (e.g. &quot;flat-bed color scanners&quot;). However, as noted in section 2, some BCs, such as ParaConc, have added more advanced search features to improve the flexibility of searching.</Paragraph> <Paragraph position="9"> 2.2 Trados: an example of a TM Like a BC, a TM is a tool designed to help translators identify and retrieve information from a bilingual parallel corpus. However, one of the motivating factors in developing TMs was to overcome some of the seeming limitations of BCs as described in section 2.1.1. Consequently, TMs are more automated, can search for longer segments, and employ fuzzy matching techniques.</Paragraph> <Paragraph position="10"> The data contained in a conventional TM, such as Trados, are organized in a very precise way, which differs somewhat from the way in which data are stored for use with a BC. Trados divides each text into small units known as segments, which usually correspond to sentences or sentence-like units (e.g., titles, headings, list items, table cells). The source text segments are linked to their corresponding target text segments and the resulting aligned pair of segments is known as a translation unit (TU). Each TU is extracted from the larger text and stored individually in a database. It is this database of TUs, not the original complete text, that is later searched for matches. When a TM, such as Trados, is first acquired, its database is empty. It is up to the translator to stock the database. 
This can be done interactively by having the translator add each newly translated segment to the database as he works his way through the text, or it can be done by taking previously translated texts and aligning them using the accompanying automatic alignment program. It is important to note, however, that in order to ensure that the automatic alignment has been done correctly, manual verification may be required.</Paragraph> <Paragraph position="11"> When a translator receives a new text to translate, he begins by opening this new text in the Trados environment. Trados proceeds to divide this new text into segments. Once this has been accomplished, the tool starts at the beginning of the new source text and automatically compares each segment to the contents of the TM database.</Paragraph> <Paragraph position="12"> If it finds a segment that it &quot;remembers&quot; (i.e., a segment that matches one that has been previously translated and stored in the TM database), it retrieves the corresponding TU from the database and shows it to the translator, who can refer to this previous translation and adopt or modify it for use in the new translation.</Paragraph> <Paragraph position="13"> Of course, language is flexible, which means that the same idea can be expressed in a number of different ways (e.g., 'The filename is invalid' / 'This file does not have a valid name').</Paragraph> <Paragraph position="14"> Consequently, a translator cannot reasonably expect to find many exact matches for complete segments in the TM. However, it is highly likely that there will be segments in a new source text that are similar to, but not exactly the same as, segments that are stored in the TM. For this reason, Trados also employs a feature known as fuzzy matching. 
As shown in Figure 2, a fuzzy match is able to locate segments in the TM that are an</Paragraph> <Paragraph position="15"> approximate or partial match for the segment in the new source text. (Note that Trados is actually a suite of tools that includes, among other things, an automatic aligner, a terminology manager and a TM.)</Paragraph> <Paragraph position="16"> interrupted by the application.</Paragraph> <Paragraph position="17"> FR: L'opération a été interrompue par l'application.</Paragraph> <Paragraph position="18"> Figure 2. Fuzzy match retrieved from the TM.</Paragraph> <Paragraph position="19"> If more than one potential match is found for any given segment, these are ranked by the system according to the degree of similarity between the new segment to be translated and the previously translated segment found in the database. Note that the similarity in question is a superficial similarity (e.g., the number/length of character strings that the two segments have in common) and not a semantic similarity (thus &quot;gone&quot; and &quot;went&quot; will not count as similar despite the similarity in meaning of the two words). The match that the system perceives as being most similar to the new source segment is automatically pasted into the new target text. The translator can accept this proposal as is, edit it as necessary, or reject it and ask to see other candidates (if any were found).</Paragraph> <Paragraph position="20"> Trados also works in conjunction with termbases; however, it is important to note that these need to be manually pre-stocked by translators with specialized terms and their equivalents. By searching in the termbase - if one exists - Trados can locate matches at the term level and present them to the translator.</Paragraph> <Paragraph position="21"> Nevertheless, there is still a level of linguistic repetition that falls between full sentences and specialized terms - repetition at the level of expression or phrase. 
This is in fact the level where linguistic repetition will occur most often.</Paragraph> <Paragraph position="22"> Until recently, Trados permitted phrase or expression searching only through a feature that resembled a BC. In other words, a translator could manually select an expression, and Trados would search through the database of TUs to find examples. In the most recent version of Trados (v6.5), however, an auto-concordance function has been added, which, when activated, will automatically go on to search for text fragments when no segment-level match is found.</Paragraph> <Paragraph position="23"> Once the translator is satisfied with the translation for a given segment - which can be taken directly from Trados, adapted from a Trados match, or created by the translator from scratch - the newly created TU can be added to the TM database and the translator can move on to the next segment. In this way, the database grows as the translator works. Trados can also be networked so that multiple translators can search and contribute to the same TM.</Paragraph> <Paragraph position="24"> 3 BCs and TMs in the translation industry A literature survey indicates that BCs and TMs are both widely used in academic settings for translator training. A long list of researchers (e.g. Bernardini 2002; Hansen and Teich 2002; Palumbo 2002; Pearson 2000; Tagnin 2002; Zanettin 1998) have shown that using BCs in conjunction with parallel bilingual corpora can help students with a range of translation-related tasks, such as identifying more appropriate target language equivalents and collocations; coming to grips with difficult grammatical points (e.g. prepositions, verb tenses, negative prefixes); identifying the norms, stylistic preferences and discourse structures associated with different text types; and uncovering important conceptual information.</Paragraph> <Paragraph position="25"> With regard to TMs, meanwhile, many translator trainers (e.g. 
Austermühl 2001; Bowker 2002; DeCesaris 1996; Kenny 1999; L'Homme 1999) are now using TMs for tasks such as getting students to analyze and evaluate different translation solutions; helping students to learn more about inter- and intra-textual features by examining source texts and evaluating their characteristics in an effort to determine whether or not they can be usefully translated with the help of a TM; and conducting longitudinal studies of students' progress over the course of their training program. In contrast to the academic setting, where both BCs and TMs are well known and widely used, the situation in the professional setting is somewhat different: TMs are very popular, but the existence of BCs does not seem to be widely known.</Paragraph> <Paragraph position="26"> For example, TMs are discussed frequently in the professional association literature. According to newsletters/programmes circulated to members, translators' associations such as the American Translators Association or the Association of Translators and Interpreters of Ontario have provided their members with opportunities (e.g.</Paragraph> <Paragraph position="27"> demonstrations, workshops, professional development seminars) to learn about TMs.</Paragraph> <Paragraph position="28"> In addition, some professional translators' associations, such as the Ordre des traducteurs, terminologues et interprètes agréés du Québec, also publish magazines aimed at language professionals, and in recent years, these have included a number of discussions on TMs (e.g.</Paragraph> <Paragraph position="29"> Bédard 1995, 1998; Arrouart and Bédard 2001; Lanctôt 2001).</Paragraph> <Paragraph position="30"> In those same publications, however, considerably less attention has been paid to BCs: only one event focusing on these tools was reported (Evans 2002).</Paragraph> <Paragraph position="31"> This raises the question as to why BCs appear to have received a less enthusiastic welcome in the professional world 
than have TMs. One factor that may have led to a difference in uptake of these two tools is the ease of access to such tools.</Paragraph> <Paragraph position="32"> It should be noted that BCs have long been known in fields such as language teaching or second-language learning (e.g. Johns 1986, Mindt 1986, Barlow 2000), but it is only more recently that their potential as translation aids has been recognized. Academics working in the field of translation are often involved in, or have colleagues who are involved in, language teaching, and as such they may have gained exposure to BCs in this way. Many of the existing BCs were initially developed by academics who work in language training, often as a means of helping their own students. This means that while such tools are generally very reasonably priced and may be easily accessible within the academic community, they are sometimes not widely advertised or distributed to the professional translation community because the people who have created these tools have full-time teaching jobs. In contrast, tools such as TMs, which have typically been developed in the private sector by companies that have professional full-time programmers, technical support staff and generous advertising budgets, are more actively marketed to working translation professionals. The fact that BCs do not seem to be well advertised in the professional setting may explain, in part, why translators and translators' associations seem to be more aware of the existence of TMs than they are of BCs. This situation may change in the future, however. As noted above, the use of BCs in translator training institutes has become firmly established since the late 1990s. This means that, at present, most of the translators in the workforce will have received their education during a time when BCs were not part of the translator training curriculum. 
However, over the coming years, the number of BC-savvy graduates will increase and they will bring to the workforce their knowledge of BCs. They will be able to share their experience with their colleagues and employers and gradually, more and more companies will have translators on staff who have an understanding of such tools.</Paragraph> </Section> </Section> </Paper>