PROSPECTS FOR COMPUTER-ASSISTED DIALECT ADAPTION
David J. Weber 
UCLA and Summer Institute of Linguistics
William C. Mann 
USC Information Sciences Institute 
When a document is to appear in several dialects or
closely related languages, there are many practical reasons
for adapting it from one to another rather than preparing
separate translations. However, manual adaption can be
tedious and error-prone, and it requires a bidialectal adaptor
(often unavailable) and/or a qualified linguist (very
expensive, if available at all). Computer-aided adaption might
be an alternative, but is it feasible to write a computer program
that contributes enough to be worth the bother and cost?
For the source dialect (SD), the program is provided with a root
dictionary and a suffix dictionary. Each entry contains the string
of characters by which that morpheme is recognized, its
morphological category, its morphophonemic properties, ..., and,
for roots, the *Q form. For each target dialect (TD), it is
provided with a suffix dictionary, a list of the regular sound
changes (RSCs) which, when applied to a *Q root, yield the
correct TD reflex, and a list of pairs of roots that are not
cognate in the SD and TD.
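The data described above can be pictured as a few simple records. The sketch below is only an illustration. The field and class names (`Entry`, `SourceDialect`, `TargetDialect`, `matches_at`) are hypothetical. Modelling RSCs as ordered (old, new) substitutions is also our own assumption, not the program's actual format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One dictionary entry (hypothetical field names)."""
    form: str                                  # string by which the morpheme is recognized
    category: str                              # morphological category
    properties: FrozenSet[str] = frozenset()   # morphophonemic properties
    q_form: Optional[str] = None               # *Q form; filled in for roots only


@dataclass
class SourceDialect:
    roots: List[Entry]
    suffixes: List[Entry]


@dataclass
class TargetDialect:
    suffixes: List[Entry]
    rscs: List[Tuple[str, str]]    # assumed encoding: ordered (old, new) substitutions on *Q forms
    noncognates: Dict[str, str]    # SD root form -> unrelated TD root form


def matches_at(entries: List[Entry], text: str, pos: int) -> List[Entry]:
    """Return the entries whose recognition string occurs in `text` at `pos`."""
    return [e for e in entries if text.startswith(e.form, pos)]
```

For example, with suffix entries for `rka`, `yka:` and `n`, `matches_at(sd.suffixes, "aywarkayka:n", 4)` finds only the `rka` entry. The category labels used in such a test are hypothetical.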
This paper describes an experiment in computationally
assisting the adaption of text from one dialect of central
Peruvian Quechua (a polysynthetic, agglutinative American
Indian language) to several others. The overall results are
extremely encouraging: computer-aided dialect adaption is
feasible and has important advantages over entirely manual
methods.
Below we describe the dialect situation, the data and
processes of the experimental program, and a field test of
text produced by the program.
Six dialects, differing phonologically, lexically, and
grammatically, were involved. The rich diversity of
differences is dominated by a few kinds of systematic
difference. The program treats these classes of difference
separately rather than by a single method (such as string
substitution); this requires a detailed analysis of the
source-dialect text.
Examples of the kinds of differences involved:
Phonological: the reflexes of four proto-Quechua (*Q)
phonemes (*ĉ, *č, *ʎ, *ñ), written in the government-mandated
orthography, are: Panao: tr ch ll ñ; Huallaga: ch ch ll ñ;
Dos de Mayo: ch ts l n; Llata: ch s l n; Yanahuanca: ... l n;
and Junín: tr ch l n.
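The phonological correspondences above can be read as a table that rewrites a *Q form into each dialect's spelling. This is a minimal sketch. The *Q phonemes are written here as ĉ, č, ʎ, ñ, which is our own assumption. Yanahuanca is left out because its first two reflexes cannot be read. The test forms `ĉaʎa` and `čiñi` are hypothetical strings, not attested roots.

```python
# Orthographic reflexes of the four *Q phonemes, in the order listed above.
# Writing the *Q phonemes as ĉ, č, ʎ, ñ is an assumption of this sketch.
Q_PHONEMES = ("ĉ", "č", "ʎ", "ñ")
REFLEXES = {
    "Panao":       ("tr", "ch", "ll", "ñ"),
    "Huallaga":    ("ch", "ch", "ll", "ñ"),
    "Dos de Mayo": ("ch", "ts", "l",  "n"),
    "Llata":       ("ch", "s",  "l",  "n"),
    "Junín":       ("tr", "ch", "l",  "n"),
}


def reflex(q_form: str, dialect: str) -> str:
    """Rewrite each *Q phoneme in `q_form` as its reflex in `dialect`."""
    # str.translate maps every character in one pass, so an output such as
    # "ch" is never itself rewritten by a later rule.
    table = str.maketrans(dict(zip(Q_PHONEMES, REFLEXES[dialect])))
    return q_form.translate(table)
```

For instance, the hypothetical form `ĉaʎa` comes out as `challa` in Huallaga and `trala` in Junín.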
Lexical: 'to get well/recover from an illness' is
expressed with the root allchaka:- in Huallaga, aliya:- in
Llata, and kachaka:- in Junín.
Grammatical: 1) Morphological: a suffix may be present
in one dialect and absent in another; the forms or properties
of corresponding suffixes may differ across dialects; and there
are different systems of indicating plurality within the
verb: in some dialects there are 3-5 distinct pluralizers
occurring in different "slots" and conditioned by what other
suffixes occur in the word, while in others there is only one
pluralizer, which has a fixed position. 2) Syntactic: the
complements of phasal verbs ('begin', 'finish', ...) are
subordinated as adverbial clauses in some dialects but as
infinitival (object) clauses in others.
A TD root dictionary is computationally derived by 1)
applying the RSCs to the *Q form of each SD root and 2)
substituting the TD root for non-cognate root pairs.
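The two-step derivation of the TD root dictionary can be sketched as follows. The sketch assumes that each RSC is an ordered plain string substitution. The names `apply_rscs` and `derive_td_roots` are hypothetical. The non-cognate pair in the test is the Huallaga/Junín example given above, treating Huallaga as the SD for illustration only. The root `challa` with *Q form `ĉaʎa` is invented.

```python
from typing import Dict, List, Tuple


def apply_rscs(q_form: str, rscs: List[Tuple[str, str]]) -> str:
    """Apply each regular sound change, in order, as a plain substitution."""
    for old, new in rscs:
        q_form = q_form.replace(old, new)
    return q_form


def derive_td_roots(sd_roots: Dict[str, str],
                    rscs: List[Tuple[str, str]],
                    noncognates: Dict[str, str]) -> Dict[str, str]:
    """Map each SD root to its TD root.

    sd_roots:    SD root -> its *Q form
    noncognates: SD root -> TD root, for pairs that are not cognate
    """
    # Step 1: the RSCs give the TD reflex of every SD root's *Q form.
    td_roots = {sd: apply_rscs(q, rscs) for sd, q in sd_roots.items()}
    # Step 2: non-cognate pairs override whatever the RSCs produced.
    td_roots.update(noncognates)
    return td_roots
```

Letting the non-cognate list override the RSC output keeps the sound changes simple. They need not be correct for roots that are not related at all.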
Text is adapted word by word, first analyzing SD words
and then synthesizing TD words. (An early pencil-and-paper
experiment suggested that for Quechua, word-by-word
methods could effect approximately 95% of the required
changes.) After orthographic adjustments, a simple,
recursive, exhaustive search attempts to decompose each
word into a root and zero or more suffixes by matching the
word's characters to the character strings of dictionary
entries, subject to the constraints of a built-in morphology.
Tests are applied 1) during the search, to test the suitability
of a matching suffix as the immediate successor to what
precedes, and 2) after the whole word is matched to
morphemes, to test the overall suitability of that sequence of
morphemes. These tests keep the number of possible
decompositions within manageable limits. Word decomposition
is complicated by various morphophonemic processes.
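The search described above might look roughly like this. The sketch leaves out the morphophonemic processes just mentioned. The two tests are passed in as functions, because the built-in morphology itself is not given here. The name `decompose`, the (form, label) lexicon layout and the constraints in the example are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Lexicon = Sequence[Tuple[str, str]]   # (recognition string, morpheme label)


def decompose(word: str,
              roots: Lexicon,
              suffixes: Lexicon,
              can_follow: Callable[[List[str], str], bool],
              is_well_formed: Callable[[List[str]], bool]) -> List[List[str]]:
    """Find every analysis of `word` as a root plus zero or more suffixes.

    Assumes every recognition string is non-empty; an empty form would
    make the recursion loop forever.
    """
    results: List[List[str]] = []

    def extend(pos: int, labels: List[str]) -> None:
        if pos == len(word):
            if is_well_formed(labels):               # test 2: the whole sequence
                results.append(list(labels))
            return
        for form, label in suffixes:
            # test 1: is this suffix a suitable successor to what precedes?
            if word.startswith(form, pos) and can_follow(labels, label):
                labels.append(label)
                extend(pos + len(form), labels)
                labels.pop()                         # backtrack and try the next suffix

    for form, label in roots:
        if word.startswith(form):
            extend(len(form), [label])
    return results
```

Example: with a root `aywa` and suffixes `rka`, `yka:`, `n`, the word `aywarkayka:n` has exactly one analysis, `*aywa-RKA-YKA:-3`. That holds under two made-up constraints: no suffix may repeat, and the word must end in the person marker.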
Pluralization differences are accommodated by 1)
tagging as plural each decomposition that contains a plural
morpheme, 2) deleting that plural morpheme, and 3)
inserting the appropriate pluralizer for the TD word.
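The three pluralization steps can be sketched on the morpheme sequences of Figure 1. In Figure 1, RKA and :RI disappear when plurality is handled, and the TD pluralizer YA: reappears before the person and NA suffixes. The inventories and the slot rule below are hypothetical. They were fitted only to those two examples and are not a real description of any dialect.

```python
from typing import List, Tuple

# Hypothetical inventories, fitted only to the two Figure 1 examples.
SD_PLURALS = {"RKA", ":RI"}             # SD plural morphemes
TD_PLURAL = "YA:"                       # the TD's single pluralizer
TD_PLURAL_PRECEDES = {"NA", "3", "3P"}  # assumed fixed slot: just before these


def handle_plurality(morphs: List[str]) -> Tuple[List[str], bool]:
    """Steps 1-2: tag the decomposition as plural and delete SD pluralizers."""
    is_plural = any(m in SD_PLURALS for m in morphs)
    return [m for m in morphs if m not in SD_PLURALS], is_plural


def repluralize(morphs: List[str], is_plural: bool) -> List[str]:
    """Step 3: insert the TD pluralizer in its (assumed) fixed slot."""
    if not is_plural:
        return list(morphs)
    for i, m in enumerate(morphs):
        if m in TD_PLURAL_PRECEDES:
            return morphs[:i] + [TD_PLURAL] + morphs[i:]
    return morphs + [TD_PLURAL]
```

Applied to `*aywa-RKA-YKA:-3`, this yields `*aywa-YKA:-3` tagged plural, and then `*aywa-YKA:-YA:-3`.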
Synthesis of a TD word proceeds by 1) replacing the
SD root with the corresponding TD root from the
computationally derived TD root dictionary, 2) selecting the
correct allomorph for each morpheme, 3) concatenating
the allomorphs, and 4) making orthographic adjustments.
Examples are shown in Figure 1.
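The four synthesis steps can be sketched as follows. The allomorph table and the conditioning rule are hypothetical. The default forms follow the surface shapes in Figure 1. The single conditioned rule (RKU as `rka` before :RI) is taken from the SD forms in Figure 1, purely to show how context can pick an allomorph. The orthographic step assumes that a long vowel `V:` is spelled as a doubled vowel.

```python
import re
from typing import Dict, List, Optional

# Hypothetical allomorph table: morpheme -> default surface form.
DEFAULT_ALLOMORPH = {"YKA:": "yka:", "YA:": "ya:", "3": "n", "3P": "n",
                     "RKU": "rku", ":RI": ":ri", "NA": "na", "PAQ": "paq"}
# Hypothetical conditioned allomorphy: (morpheme, following morpheme) -> form.
CONDITIONED = {("RKU", ":RI"): "rka"}


def select_allomorph(morph: str, following: Optional[str]) -> str:
    return CONDITIONED.get((morph, following), DEFAULT_ALLOMORPH[morph])


def synthesize(morphs: List[str], td_roots: Dict[str, str]) -> str:
    root, suffixes = morphs[0], morphs[1:]
    pieces = [td_roots[root]]                             # 1) root substitution
    for i, m in enumerate(suffixes):                      # 2) allomorph selection
        nxt = suffixes[i + 1] if i + 1 < len(suffixes) else None
        pieces.append(select_allomorph(m, nxt))
    word = "".join(pieces)                                # 3) concatenation
    return re.sub(r"([aeiou]):", r"\1\1", word)           # 4) orthography: V: -> VV
```

For example, `synthesize(["*aywa", "YKA:", "YA:", "3"], {"*aywa": "aywa"})` concatenates `aywayka:ya:n` and spells it `aywaykaayaan`.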
Many words have multiple decompositions, but this is
tolerable because synthesizing the alternative decompositions
of one SD word usually yields identical TD words. Nonidentical
alternatives for one SD word are left to the choice of the
human editor/checker.
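The policy for ambiguous words can be stated in a few lines. Each decomposition is synthesized. If every decomposition gives the same TD word, that word is accepted. Otherwise the distinct candidates are passed on to the editor. The function name and return shape are hypothetical, and any synthesizer can be plugged in.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def adapt_word(decompositions: Sequence[List[str]],
               synthesize: Callable[[List[str]], str]) -> Tuple[Optional[str], List[str]]:
    """Synthesize every decomposition; accept the result only if all agree."""
    outputs = sorted({synthesize(d) for d in decompositions})
    if len(outputs) == 1:
        return outputs[0], []
    # No analysis at all, or several distinct TD words: defer to the human editor.
    return None, outputs
```

With `"".join` standing in for a synthesizer, the two decompositions `ab-c` and `a-bc` collapse to the single word `abc`. The two decompositions `x` and `y` yield two candidates for the editor.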
SD orthographic form: (1) aywarkaykaan
The word analyzer produces, in succession:
length converted: (2) aywarkayka:n
segmentation: (3) aywa-rka-yka:-n
morphophonemic form: (4) *aywa-RKA-YKA:-3
plurality handled: (5) (*aywa-YKA:-3)+PL
The word synthesizer produces, in succession:
re-pluralization: (6) *aywa-YKA:-YA:-3
allomorph selection: (7) aywa-yka:-ya:-n
TD orthographic form: (8) aywaykaayaan
'they are going'
Figure 1

About 40 pages of text were adapted into each of 5
target dialects for use in the field test. Sampling indicates
that the computer correctly changed about 760 morphemes
per 1000 words of text; in the worst case, native speakers
suggested about 190 additional changes per 1000 words. The
computer converted text that otherwise would have been
at best only marginally intelligible to a speaker of another
dialect into text that was, with a few exceptions, fully
comprehensible. Thus the program brings a text being adapted
close enough to the TD that it can be edited and corrected by a
native speaker of the TD without much coaching or reference
to the SD text.
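The two sampling figures can be combined into a rough ratio. This assumes, as our own reading, that the roughly 190 changes suggested per 1000 words were all the changes the program missed. On that assumption, the program made about four fifths of the required changes. The ratio mixes a sampled figure with a worst-case one, so it is only approximate:

$$\frac{760}{760 + 190} = \frac{760}{950} = 0.80$$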
Since there is inevitably a non-trivial residue of
changes that are infeasible for the computer, its output requires
subsequent manual correction and editing. Therefore, rather
than striving to make the program do everything imaginable, it
is wise to have it make the overwhelming majority of systematic,
"low-level" changes and not unduly complicate the program to
accommodate too much. The test identified many relatively
infrequent changes not handled by the present program; for
most of them, computational adaption is not feasible. These
are discussed in a version of this paper which has been
published by Notes on Linguistics, SIL. It is available from
The International Linguistics Center, 7500 Camp Wisdom
Road, Dallas TX 75236, for $.75, as Special Publication 1,
Prospects for Computer-Assisted Dialect Adaption.
Conclusion: A computer can contribute significantly to 
adaption between dialects or closely related languages. 
Figure 1 (second example):
(9) aywarkaarinanpaq
(10) aywarka:rinanpaq
(11) aywa-rka-:ri-na-n-paq
(12) *aywa-RKU-:RI-NA-3P-PAQ
(13) (*aywa-RKU-NA-3P-PAQ)+PL
(14) *aywa-RKU-YA:-NA-3P-PAQ
(15) aywa-rku-ya:-na-n-paq
(16) aywarkuyaananpaq
'in order that they go'
