Subsentential Translation Memory for  
Computer Assisted Writing and Translation 
Jian-Cheng Wu 
Department of Computer Science 
National Tsing Hua University 
101, Kuangfu Road, Hsinchu, 300, 
Taiwan, ROC 
D928322@oz.nthu.edu.tw 
Thomas C. Chuang 
Department of Computer Science 
Van Nung Institute of Technology
No. 1 Van-Nung Road 
Chung-Li Tao-Yuan, Taiwan, ROC
tomchuang@cc.vit.edu.tw 
Wen-Chi Shei, Jason S. Chang
Department of Computer Science 
National Tsing Hua University 
101, Kuangfu Road, Hsinchu, 300, 
Taiwan, ROC 
jschang@cs.nthu.edu.tw 
 
Abstract 
This paper describes TotalRecall, a translation memory database developed to encourage authentic and idiomatic usage in second language writing. TotalRecall is a bilingual concordancer that supports search queries in English or Chinese for relevant sentences and their translations. Although initially intended for learners of English as a Foreign Language (EFL) in Taiwan, it is a gold mine of texts in English and Mandarin Chinese. TotalRecall is particularly useful for those who write in or translate into a foreign language. We exploited and structured existing high-quality translations from two bilingual corpora, drawn from the Taiwan-based Sinorama Magazine and the Official Records of the Hong Kong Legislative Council, to build a bilingual concordance. Novel approaches were taken to provide high-precision bilingual alignment at the subsentential and lexical levels. A browser-based user interface was developed for ease of access over the Internet. Users can search for words, phrases, or expressions in English or Mandarin. The Web-based user interface also records user actions, providing data for further research.
1 Introduction 
Translation memory has been found to be a more effective alternative to machine translation for translators, especially when they work with batches of similar texts. This is particularly true of so-called delta translation, in which successive versions of publications that need continuous revision, such as an encyclopaedia or a user’s manual, are translated. In another area of language study, researchers in English Language Teaching (ELT) have increasingly looked to concordancers of very large corpora as a new resource for translation and language learning. Concordancers have been indispensable to lexicographers, but now language teachers and students also embrace them to foster data-driven, student-centered learning.
A bilingual concordance, in a way, meets the needs of two communities: computer assisted translation (CAT) and computer assisted language learning (CALL). A bilingual concordancer is like a monolingual concordancer, except that each sentence is followed by its translation in a second language. “Existing translations contain more solutions to more translation problems than any other existing resource.” (Isabelle 1993). The same can be argued for language learning: existing texts offer the learner more answers than any teacher or reference work does.
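To make this definition concrete, here is a minimal Python sketch of a bilingual concordance lookup, assuming the corpus has already been aligned into English-Chinese sentence pairs. The names `BilingualPair` and `concordance`, and the two toy sentence pairs, are hypothetical and invented for illustration; this is not TotalRecall's implementation. English queries are matched as whole words ignoring case, while queries containing Chinese characters are matched as plain substrings, because Chinese text has no spaces between words.

```python
import re
from typing import List, NamedTuple


class BilingualPair(NamedTuple):
    """One aligned sentence pair (hypothetical record type for this sketch)."""
    english: str
    chinese: str


def concordance(corpus: List[BilingualPair], query: str) -> List[BilingualPair]:
    """Return the pairs whose side in the query's language contains `query`."""
    if re.search(r"[\u4e00-\u9fff]", query):
        # Chinese is written without spaces, so match the query as a substring.
        return [pair for pair in corpus if query in pair.chinese]
    # English: match whole words only, ignoring case.
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [pair for pair in corpus if pattern.search(pair.english)]


# Toy corpus, invented for illustration.
corpus = [
    BilingualPair("We must take the lead.", "我們必須帶頭。"),
    BilingualPair("She took a deep breath.", "她深深吸了一口氣。"),
]
for pair in concordance(corpus, "take the lead"):
    print(pair.english, "|", pair.chinese)  # We must take the lead. | 我們必須帶頭。
```

Each hit carries its translation with it, which is what distinguishes a bilingual concordance from a monolingual one.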
However, it is important to give translators and learner writers alike easy access, so that they can find relevant and informative citations quickly. For instance, the English-French concordance system TransSearch provides a familiar interface (Macklovitch et al. 2000): the user types in the expression in question, a list of citations comes up, and it is easy to scroll down until a useful translation appears, much like using a search engine. TransSearch exploits sentence alignment techniques (Brown et al. 1990; Gale and Church 1990) to facilitate bilingual search at the granularity of sentences.
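Length-based sentence alignment of the kind cited here assumes that a sentence and its translation have correlated lengths, and uses dynamic programming to find the most probable sequence of 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 matches ("beads"). The following is a minimal sketch in the spirit of Gale and Church's character-length model, not TransSearch's code. The constants `C`, `S2` and `PRIORS` are values of the kind used with that method on European-language data and are only placeholders here; English-Chinese character ratios are far from 1, so they would need re-estimation. The function names `match_cost` and `align` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Placeholder constants (assumed); re-estimate them for English-Chinese data.
C = 1.0    # expected target characters per source character
S2 = 6.8   # variance of the length ratio
PRIORS: Dict[Tuple[int, int], float] = {  # prior probability of each bead type
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)


def match_cost(len1: int, len2: int, prior: float) -> float:
    """-log(prior) - log P(|delta|), where delta is a normalized length difference."""
    # Normalizing by the mean length (a common implementation choice) keeps
    # 1-0 and 0-1 beads from dividing by zero.
    mean = (len1 + len2 / C) / 2.0
    delta = 0.0 if mean == 0 else (len2 - len1 * C) / math.sqrt(mean * S2)
    # Two-tailed standard-normal probability of a deviation of at least |delta|.
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(prior) - math.log(max(tail, 1e-300))


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Minimum-cost sequence of beads covering both sentence lists in order."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for (di, dj), prior in PRIORS.items():
                if i + di > n or j + dj > m:
                    continue
                len1 = sum(len(s) for s in src[i:i + di])
                len2 = sum(len(t) for t in tgt[j:j + dj])
                total = cost[i][j] + match_cost(len1, len2, prior)
                if total < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = total
                    back[i + di][j + dj] = (di, dj)
    # Trace the best path back from (n, m) to (0, 0).
    beads: List[Bead] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


src = ["a" * 10, "b" * 10, "c" * 20]
tgt = ["x" * 20, "y" * 20]
print(align(src, tgt))  # [([0, 1], [0]), ([2], [1])]
```

In the usage example, the two short source sentences are merged into a single 2-1 bead because their combined length matches the first target sentence; the resulting sentence pairs are the units a sentence-level concordancer retrieves.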
In this paper, we describe a bilingual concordancer that facilitates search and visualization at a finer granularity. TotalRecall exploits subsentential and word alignment to provide a new kind of bilingual concordancer. Through its interactive interface and its clustering of short subsentential bilingual citations, it helps translators and non-native speakers find ways to translate or express themselves in a foreign language.
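The idea of subsentential citations can be illustrated with a generic technique, assumed here rather than taken from TotalRecall: given word alignment links for a sentence pair, project a query phrase onto the smallest target span its words are linked to, reject the span if any word in it is also linked outside the phrase (the consistency check familiar from phrase-based statistical MT), and group citations by the extracted translation. This minimal sketch assumes links are given as (source index, target index) pairs and that the Chinese side is already segmented into words. The names `project_span` and `cluster_citations`, and the toy aligned corpus, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)
AlignedPair = Tuple[List[str], List[str], Set[Link]]  # English tokens, Chinese words, links


def project_span(links: Set[Link], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Smallest target span [t_start, t_end) linked to source tokens [start, end).

    Returns None if no source token in the span is aligned, or if some target
    word in the projected span is linked to a source token outside it.
    """
    targets = [t for s, t in links if start <= s < end]
    if not targets:
        return None
    t_start, t_end = min(targets), max(targets) + 1
    for s, t in links:
        if (t_start <= t < t_end) and not (start <= s < end):
            return None
    return t_start, t_end


def cluster_citations(query: List[str], corpus: List[AlignedPair]) -> Dict[str, List[int]]:
    """Group sentence-pair indices by the translation the query phrase aligns to."""
    wanted = [q.lower() for q in query]
    n = len(wanted)
    clusters: Dict[str, List[int]] = defaultdict(list)
    for idx, (src, tgt, links) in enumerate(corpus):
        for i in range(len(src) - n + 1):
            if [w.lower() for w in src[i:i + n]] == wanted:
                span = project_span(links, i, i + n)
                if span is not None:
                    # Chinese words are joined without spaces.
                    clusters["".join(tgt[span[0]:span[1]])].append(idx)
                break  # consider only the first occurrence in each sentence
    return dict(clusters)


# Toy word-aligned corpus, invented for illustration.
corpus: List[AlignedPair] = [
    (["we", "must", "take", "the", "lead"], ["我們", "必須", "帶頭"],
     {(0, 0), (1, 1), (2, 2), (3, 2), (4, 2)}),
    (["they", "take", "the", "lead", "again"], ["他們", "再度", "領先"],
     {(0, 0), (1, 2), (2, 2), (3, 2), (4, 1)}),
    (["you", "take", "the", "lead", "now"], ["你", "現在", "帶頭"],
     {(0, 0), (1, 2), (2, 2), (3, 2), (4, 1)}),
]
print(cluster_citations(["take", "the", "lead"], corpus))
# {'帶頭': [0, 2], '領先': [1]}
```

Grouping by the extracted translation lets a user see at a glance that the same English phrase is rendered in different ways, with the supporting sentence pairs behind each rendering.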
2 Aligning the corpus 
Central to TotalRecall are a bilingual corpus and a set of programs that perform the bilingual analyses needed to turn the corpus into a translation memory database. Currently, we are working with
