<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1005"> <Title>Translating Names and Technical Terms in Arabic Text</Title> <Section position="2" start_page="1" end_page="34" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> It is challenging to translate names and technical terms from English into Arabic. Translation is usually done phonetically: different alphabets and sound inventories force various compromises.</Paragraph> <Paragraph position="1"> For example, Peter Streams may come out as bytr strymz. This process is called transliteration. We address here the reverse problem: given a foreign name or loanword in Arabic text, we want to recover the original in Roman script. For example, an input like bytr strymz should yield an output like Peter Streams. Arabic presents special challenges due to unwritten vowels and phonetic-context effects. We present results and examples of use in an Arabic-to-English machine translator.</Paragraph> <Paragraph position="2"> Introduction Translators must deal with many problems, and one of the most frequent is translating proper names and technical terms. For language pairs like Spanish/English, this presents no great challenge: a phrase like Antonio Gil usually gets translated as Antonio Gil. However, the situation is more complicated for language pairs that employ very different alphabets and sound systems, such as Japanese/English and Arabic/English. Phonetic translation across these pairs is called transliteration. Knight and Graehl (1997) present a computational treatment of Japanese/English transliteration, which we adapt here to Arabic.</Paragraph> <Paragraph position="3"> Arabic text, like Japanese, frequently contains foreign names and technical terms that are translated phonetically.
Here are some examples from newspaper text: [Footnote 1: The romanization of Arabic orthography used here consists of the following consonants: ! (alif), b, t, th, j, H, x, d, dh, r, z, s, sh, S, D, T, Z, @ (ayn), G (Gayn), f, q, k, l, m, n, =h, w, y, ' (hamza). !, w, and y also indicate long vowels. !' and !+ indicate hamza over alif and hamza under alif, respectively.]</Paragraph> <Paragraph position="4"> It is not trivial to write an algorithm for turning English letter sequences into Arabic letter sequences, and indeed, two human translators will often produce different Arabic versions of the same English phrase. There are many complexity-inducing factors. Some English vowels are dropped in Arabic writing (but not all). Arabic and English vowel inventories are also quite different: Arabic has three vowel qualities (a, i, u), each of which has short and long variants, plus two diphthongs (ay, aw), whereas English has a much larger inventory of as many as fifteen vowels and no length contrast. Consonants like English D are sometimes dropped. An English S sound frequently turns into an Arabic s, but sometimes into z. English P and B collapse into Arabic b; F and V likewise collapse into f. Several English consonants have more than one possible Arabic rendering: English K may be Arabic k or q, and English T may be Arabic t or T (T is a pharyngealized t, a separate letter in Arabic). Human translators nevertheless accomplish this task with relative ease, and spelling variations are for the most part acceptable.</Paragraph> <Paragraph position="5"> In this paper, we are concerned with a more difficult problem: given an Arabic name or term that has been translated from a foreign language, what is the transliteration source? This task challenges even good human translators:</Paragraph> <Paragraph position="7"> (Answers appear later in this paper.)</Paragraph> <Paragraph position="8"> Among other things, a human or machine translator must imagine sequences of dropped English vowels and must keep an open mind about Arabic letters like b and f.
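To see why the reverse direction explodes, consider a toy Python sketch that enumerates every English spelling that might underlie a romanized Arabic word. The letter table, the set of dropped-vowel fillers, and the function name `candidates` are hypothetical illustrations built only from the collapses listed above (P and B to b, F and V to f, S to s or z, K to k or q). This is not the model used in this paper, and the sketch makes no attempt to rank its output.

```python
from itertools import product

# Hypothetical toy table (not the paper's model): each romanized Arabic letter
# maps to English spellings it may stand for, following the collapses in the
# text (English P and B -> b, F and V -> f, K -> k or q, S -> s or z).
ARABIC_TO_ENGLISH: dict[str, list[str]] = {
    "b": ["P", "B"],
    "f": ["F", "V"],
    "k": ["K"],
    "q": ["K"],
    "t": ["T"],
    "T": ["T"],
    "s": ["S"],
    "z": ["S", "Z"],
    "r": ["R"],
    "m": ["M"],
    "y": ["I", "E", "EE", "Y"],  # y often marks a long vowel
}

# Assumed English vowels that may have been dropped between two written
# letters ("" means no vowel was dropped at that position).
DROPPED_VOWELS = ["", "A", "E", "O"]


def candidates(arabic: str) -> list[str]:
    """Enumerate English spellings that could underlie a romanized Arabic word."""
    # Letters missing from the toy table fall back to their uppercase form.
    letter_options = [ARABIC_TO_ENGLISH.get(ch, [ch.upper()]) for ch in arabic]
    slots: list[list[str]] = []
    for i, options in enumerate(letter_options):
        slots.append(options)
        if i != len(letter_options) - 1:  # optional vowel after all but the last letter
            slots.append(DROPPED_VOWELS)
    # Cartesian product over all slots; dict.fromkeys drops duplicate spellings
    # while keeping first-seen order.
    return list(dict.fromkeys("".join(parts) for parts in product(*slots)))


print("PETER" in candidates("bytr"))      # True
print("STREAMS" in candidates("strymz"))  # True
```

Even this tiny table yields hundreds of distinct candidates for the four-letter input bytr, most of them implausible, so a practical back-transliterator needs some way to score and rank candidates rather than simply list them.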
We call this task back-transliteration. Automating it has great practical importance in Arabic-to-English machine translation, as borrowed terms are the largest source of text phrases that do not appear in bilingual dictionaries. Even if an English term is listed, all of its possible Arabic variants typically are not. Automation is also important for machine-assisted translation, in which the computer may suggest several translations that a human translator has not imagined.</Paragraph> </Section> </Paper>