Morphological Interfaces to Dictionaries

Michael Maxwell
Linguistic Data Consortium
3600 Market St, Suite 810
Philadelphia, PA 19104
Maxwell@ldc.upenn.edu

William Poser
Linguistics, University of Pennsylvania
3600 Market St, Suite 501
Philadelphia, PA 19104
billposer@alum.mit.edu

Abstract

Languages with complex morphologies present difficulties for dictionary users. One solution to this problem is to use a morphological parser for lookup of morphologically complex words, including fully inflected words, without the user needing to explicitly know the morphology. We discuss the sorts of morphologies which cause the greatest need for such an interface.

1 Introduction

When it comes to dictionaries, not all languages are created equal. Quite apart from the fact that more effort has been put into lexicography for some languages than for others, languages vary in how they lend themselves to word lookup. Generations of English-speaking students have been told, when they were uncertain how to spell a word, to look it up in the dictionary. How this is supposed to help a student who does not already know how to spell it has been the source of much distress for those same students.

For English, the chief obstacle to dictionary lookup is orthography.1 But for other languages, the structure of the language itself is the problem, and in particular the language's morphology. Unless the dictionary user is explicitly familiar with the morphology, determining the citation form for a given word can be quite difficult.

One solution, at least for electronic dictionaries, is to create an interface that uses a morphological parser to find the root of the full word provided by the user, and then automatically look up that form (or its citation form), thereby shifting the need to explicitly know the morphology (and the choice of citation form) from the user to the computer. Such interfaces are described in Breidt and Feldweg 1997, Prószéky and Kis 2002, Streiter, Knapp, Voltmer et al. 2004, etc.

But building a morphological parser is a non-trivial task, so a simpler solution where possible would be preferable. In this paper, we discuss the sort of morphology that makes a parser interface especially desirable.

1 Orthography also presents difficulties for dictionary lookup in languages which are written non-alphabetically, and for which the lexical entries therefore cannot be alphabetized. We do not discuss this issue in this paper; specialized front ends to dictionaries have been built for lookup in such languages; see e.g. Bilac, Baldwin and Tanaka 2002.
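As a rough illustration of the kind of interface described in the Introduction, the following Python sketch places a stand-in "parser" between the user and the dictionary. Everything in it is an assumption made for illustration: the names SUFFIXES, TOY_DICTIONARY, candidate_stems and lookup are hypothetical, the suffix list is a toy subset of English, and a real system would replace candidate_stems with a genuine morphological parser.

```python
# Toy stand-in for a morphological front end to a dictionary.
# All names and data here are illustrative assumptions, not an actual system.

SUFFIXES = ["ing", "es", "ed", "s"]  # tiny, incomplete list of English suffixes

TOY_DICTIONARY = {
    "walk": "to move on foot",
    "make": "to create",
    "run": "to move quickly on foot",
}


def candidate_stems(word):
    """Yield the word itself, then guesses obtained by stripping one suffix."""
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: len(word) - len(suffix)]
            yield stem                      # walking -> walk
            yield stem + "e"                # making  -> make (undo e-deletion)
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                yield stem[:-1]             # running -> run (undo doubling)


def lookup(word, dictionary=TOY_DICTIONARY):
    """Return (citation_form, definition) for the first candidate found, else None."""
    for stem in candidate_stems(word.lower()):
        if stem in dictionary:
            return stem, dictionary[stem]
    return None


if __name__ == "__main__":
    for query in ["walks", "walking", "making", "running", "xyzzy"]:
        print(query, "->", lookup(query))
```

The point of the design is only that the user types the form as encountered in a text, and the program, not the user, is responsible for arriving at the citation form.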
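2 Morphology and Citation Forms

We need to clarify here that we are concerned with how difficult dictionary lookup is for the average user, that is, for users who may not be overtly familiar with the morphology of the language. Linguists and (say) language teachers are often familiar enough with the morphology that they can compute the citation form of an arbitrary inflected form, but many other users will not be able to do so.

Certain kinds of morphology can make it difficult for the average user to find the citation form of an inflected word. Usually this is inflectional morphology, simply because forms related by derivational morphology are often given separate listings. However, some languages have productive and regular derivational morphology, so that forms related by derivational affixation may not in fact be listed. Furthermore, in some languages (such as the Athabaskan languages below) the derivational and inflectional morphology interact so as to make finding the citation form especially difficult. Finally, the boundary between inflectional and derivational morphology is not always clear, whether to the linguist or to the end user.

We now turn to the specifics of how morphology can impede dictionary lookup. For languages with any degree of morphology, one form of the paradigm of a given part of speech is usually chosen as the citation form. Problems may arise for words which lack the chosen form (e.g. pluralia tantum words, such as scissors). In any case, users must generally be told what form to look for.

Of course, for languages with only a small amount of productive morphology, it does not take much sophistication to compute the citation form for an inflected form. For English, the citation form of an inflected verb is generally found by stripping off a handful of suffixes (and sometimes undoing a spelling rule). Irregular verbs present a complication, but their frequency makes them unlikely candidates for lookup, except by language learners. At any rate, irregular words can be placed in dictionaries as minor entries, separately alphabetized from the major entry, and cross-referencing the latter. In practice, users may not even need to know how to remove the suffixes, since when searching for walks or walking they will find walk, and generally make the connection.

If a language is exclusively suffixing, not even a large amount of inflectional affixation stands in the way of lookup. If the user cannot figure out the citation form of a word, the user can simply look up the first few letters to find the entry. Thus, even languages like Turkish or Quechua offer little problem for lookup. (Nevertheless, for some users, it may not be obvious that the citation form thus found corresponds to the inflected word; see e.g. Corris, Manning, Poetsch et al. 2004: 47.)

More problematic for lookup is prefixation.2 Since dictionary words are usually alphabetized from the beginning of the word to the end (left to right in most writing systems), the user would have to strip off prefixes before doing lookup. An obvious work-around would be to alphabetize words in (exclusively) prefixing languages from right to left. Alternatively, the dictionary could provide an index alphabetized from right to left, where the user could find the citation form, and look up that form in the main part of the dictionary. To our knowledge, this solution has not been employed, although this may be due to the paucity of exclusively prefixing languages.

The reverse alphabetization solution would not work for languages which employ both prefixing and suffixing, such as Tzeltal (Mayan). But even here the situation is not bad if the number of prefixes is small, as in fact it is in Tzeltal: the common prefixes are h-/k-, a-/aw-, and s-/y-, and stripping them off probably does not present much of a problem to users of the Vocabulario tzeltal de Bachajon (Slocum and Gerdel 1965).

The real problem for languages having both prefixes and suffixes arises when the language has a large number of prefixes, or when the language productively employs compounding or incorporation, which can have the same effect for dictionary lookup as productive prefixation.

2 If the citation form is prefixed, this may also cause problems for alphabetization, since many words may fall into the same section of the alphabet. This problem is well-known, but is not the focus of our discussion.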
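The right-to-left index suggested for exclusively prefixing languages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and prefixed English words stand in for a prefixing language purely for readability. Headwords are sorted by their reversed spelling, so a prefixed form lands next to the headword with which it shares the longest ending.

```python
from bisect import bisect_left

# Illustrative sketch of a right-to-left ("reverse") alphabetized index.
# Function names and the sample headwords are assumptions for illustration.


def build_reverse_index(headwords):
    """Return the headwords sorted by their reversed spelling."""
    return sorted(headwords, key=lambda w: w[::-1])


def shared_ending_length(a, b):
    """Count how many characters the two words share, counting from the end."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def lookup_by_ending(index, word):
    """Return the headword sharing the longest ending with `word`, or None."""
    if not index:
        return None
    keys = [w[::-1] for w in index]
    pos = bisect_left(keys, word[::-1])
    # The best match in a sorted list is always adjacent to the insertion point.
    neighbours = index[max(0, pos - 1): pos + 1]
    return max(neighbours, key=lambda h: shared_ending_length(h, word))


if __name__ == "__main__":
    index = build_reverse_index(["do", "tie", "happy", "cover"])
    print(index)
    for query in ["undo", "unhappy", "recover"]:
        print(query, "->", lookup_by_ending(index, query))
```

As the text notes, this device helps only when the material to be ignored is all at the front of the word; it offers nothing once suffixes or stem changes are also involved.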
German is a notorious example of the difficulties occasioned by compounding, and Nahuatl an example of a language having incorporation. In Nahuatl, indefinite direct objects can often be incorporated into the verb: chi:lkwa 'eat chilis' consists of the verb stem kwa 'to eat', preceded by the incorporated noun chi:l 'chili'. The naïve user may succeed in finding the incorporated noun in a printed dictionary, but may be lost deciphering the rest of the word, since kwa is not a noun suffix in Nahuatl.3

A greater difficulty for the average dictionary user is non-concatenative morphology, such as infixation, partial reduplication, and templatic morphology. In Tagalog, for example, there is an affix -um- marking actor focus, which is infixed following a word-initial consonant (Schachter and Otanes 1972). Furthermore, the Tagalog imperfective aspect is indicated by partial reduplication. The word bumibili is a form of the verb root bili 'to buy', where the reduplication is bi-, and the infix -um- is stuck into the middle of this reduplicated syllable.

In some cases, the user can (or should!) be expected to understand this and deal with converting bumibili, say, to the appropriate citation form. And in fact dictionary writers often assist by providing partly inflected forms: in the case of Tagalog, for instance, citation forms generally include the focus affixes. But as the complexity of the morphology increases, relying on the user to guess the citation form for an inflected form becomes less of an option. At the same time, explicitly including fully inflected forms in the dictionary becomes cumbersome, even impossible.

In the following subsections, we detail difficulties caused by the particular morphologies of Semitic and Athabaskan languages.

3 In practice, this problem in Nahuatl is ameliorated by the fact that incorporation is not highly productive. Therefore most cases of incorporation should arguably be listed in the dictionary.

2.1 Semitic Languages

Arabic, like most other Semitic languages, employs templatic morphology, in which affixes composed of vowels can be interdigitated between the consonants of the root. Affixation can also modify the root consonants, frequently by gemination. For example, the typical Arabic root ktb appears in inflected forms as diverse as katab, kattab, ktatab, ktanbab, and kutib (Spencer 1991). Some of this morphology is derivational, and some inflectional, but it all poses a problem for users.

Moreover, Arabic is ordinarily written without many of the vowels. While this eases the problem caused by the interdigitated vowels, it means that the dictionary user may have more difficulty distinguishing root consonants from affixal consonants, since the vowels are not present in the written form to help parse the word.

Traditional Arabic dictionaries have been root based; that is, the headword of a lexical entry is the root, with all derivational and inflectional morphology removed. Listed derived forms appear as subentries under a given root (and inflected forms which must be listed are generally given as variant forms within the subentry for a given derived form).

Because of the difficulties that Arabic derivational and inflectional morphology pose for the average user, so-called alphabetic dictionaries have become increasingly popular. In an alphabetic dictionary, derived forms serve as headwords, with alphabetization done over the entire set of lexemes, whether root or stem.

Root-based dictionaries and alphabetic dictionaries each have strengths and weaknesses. A root-based dictionary has all the information on related forms in one place, rather than scattering it throughout the dictionary as is the case for an alphabetic dictionary. On the other hand, a root-based dictionary requires a much more explicit understanding of Arabic morphology than many users possess. Even so, finding the citation form of an irregular plural or irregular verb in an alphabetic dictionary can be a daunting task.

In summary, Arabic morphology forces the dictionary writer to choose between a root-based format and an alphabetic format; both approaches have their disadvantages. Similar problems obtain for other languages with templatic morphologies. Fortunately, these problems can be overcome by interposing a morphological parser between the user and the electronic dictionary.

2.2 Athabaskan Languages

The difficulties that Athabaskan languages pose for dictionary lookup have been detailed in Poser 2002; we give an outline of the problem for one such language, Carrier.

Like other Athabaskan languages, Carrier is predominantly prefixing, with verbs carrying numerous prefixes. Each verb can have tens or even hundreds of thousands of forms. But the sheer number of verb forms is not all that different from other agglutinating languages such as Finnish or Turkish. The real problem is that Carrier prefixes are a mixture of inflectional and derivational morphemes, with the derivational affixes often appearing outside of inflectional affixes.

Furthermore, it is not infrequently the case that there is a prefix which is obligatory in combination with a root in a certain meaning. In effect, Athabaskan languages have discontinuous verb stems.4 For instance, the Carrier verb 'to be red' consists of the root k'un with the valence prefix l- immediately preceding the root and the prefix d- several positions to the left, giving forms like:

dilk'un 'you (sg.) are red'
duzk'un 'I am red'
hudulk'un 'they are red'
hudutilk'un 'they will be red'

Note that some subject markers follow the d- while others precede it. Also notice that morphophonology sometimes collapses two affixes into a single segment (s + l → z in duzk'un).

4 These are somewhat analogous to English verb-particle combinations such as bring a matter up, in which the verbal inflection (and often the direct object) intervenes between the verb root and the particle. But the intervening inflectional morphology in Athabaskan is vastly more complex than that of English.

For dictionaries, the implication is that there is in general no contiguous or invariant portion of the verb that can serve as the citation form. The morphology is primarily prefixal, but the existence of extensive stem variation and some suffixation means that the stem is not a good citation form, and that sorting forms from right to left will not keep related forms close together. Worse, the phonological material that contributes to the basic meaning of the word is not, in general, contiguous. This means that any citation form will not be easily extracted by an unsophisticated user. Moreover, no simple sorting will keep related forms together.

Worse, many verb roots are highly abstract, so that a form can only be given an English translation on the basis of the root together with one or more prefixes. Examples are found in the classificatory verbs. For example, the verb root meaning 'to give' takes distinct derivational affixes depending on the type of object being handled: ball-shaped objects, names, houses, non-count objects, long rigid objects, contents of open containers, liquids, fluffy stuff; and these classifiers may not be adjacent to the root.

In light of the difficulty of dictionary lookup in Athabaskan languages, one approach has been to list, for each verb, a single form, as in the major dictionary of Navajo (Young and Morgan 1987). However, this requires the user to be able to analyze a verb form and convert it to the citation form. This is a non-trivial task even for fluent native speakers; it is difficult or impossible for language learners. Indeed, the problem of dictionary use for Navajo is so acute that the Diné (Navajo) College has instituted a semester course, Navajo Grammar and Applied Linguistics, which is largely devoted to teaching college-level native speakers of Navajo how to use the dictionary of their own language.

The other major approach to dictionary making in Athabaskan languages is to list individual morphemes. In order to use such a dictionary, the user must be able to analyze the word into root and affixes. But the root may have many shapes. For example, the root meaning 'to go around in a boat' takes a number of distinct forms. Although there is a pattern to these changes, it is complex and often irregular. The resulting difficulty for dictionary lookup should be obvious.

A root-based lexicon has been published for Navajo (Young, Morgan and Midgette 1992). It has the virtue of being comprehensive, and of avoiding duplication. For example, the meaning of a verb root can be explained only once, in the entry for the root, rather than in each of many entries for forms derived from that root.

The problem with this approach is that it requires even more grammatical knowledge on the part of the user than traditional Athabaskan dictionaries, together with the use of an elaborate process for analyzing a form, looking up its components, and constructing the meaning of the form from its components. As a result, while analytic dictionaries are useful to linguists, most people, including both language learners and native speakers, find them very difficult.

In sum, the morphological structure of Athabaskan languages forces difficult choices on the dictionary writer, and results in a steep learning curve for the user. Again, this is the sort of language structure where a morphological interface can make a crucial difference.

3 Conclusion

We have outlined ways in which the structure of a language can make a morphological parser as a front end for dictionary lookup attractive.

There are more uses to such technology than just dictionary lookup. If the morphological engine is a transducer, it can be used for generation as well as for parsing. Such a bidirectional engine can be used to generate the paradigm of any stem. While this is of little interest to native speakers, it may be of great assistance to language learners.

Another application would be to provide what amounts to a virtual interlinear text with morpheme glosses for any text in electronic form. To be sure, this text would not be disambiguated, unless a knowledgeable user performed the disambiguation, or some automatic disambiguator (a tagger) was provided. Nevertheless, interlinear text, even in an ambiguous form, would be useful for linguists and perhaps language learners.

In sum, a morphological transducer connected to an electronic dictionary provides a valuable aid for both native speakers and language learners.
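The bidirectional use of a single morphological description can be illustrated with a deliberately tiny Python sketch. It is not a finite-state transducer and makes no claim about how a real engine would be built: RULES, generate and analyze are hypothetical names, and the rule table covers only a few regular English verb suffixes with no spelling rules. The same table drives both paradigm generation and analysis, and the analyses come back undisambiguated.

```python
# Toy illustration of one rule table used in both directions.
# RULES, generate and analyze are illustrative assumptions only.

RULES = [
    ("base", ""),
    ("3sg", "s"),
    ("past", "ed"),
    ("prog", "ing"),
]


def generate(stem):
    """Generation direction: return the (toy) paradigm of a stem."""
    return {tag: stem + suffix for tag, suffix in RULES}


def analyze(form):
    """Parsing direction: return every (stem, tag) pair that could yield `form`."""
    analyses = []
    for tag, suffix in RULES:
        if form.endswith(suffix) and len(form) > len(suffix):
            stem = form[: len(form) - len(suffix)]
            analyses.append((stem, tag))
    return analyses


if __name__ == "__main__":
    print(generate("walk"))
    # Several analyses are returned; choosing among them is left to the user
    # or to a separate disambiguation step.
    print(analyze("walked"))
```

Because analysis is simply generation run backwards over the same rules, every analysis returned regenerates the input form, which is the property that makes a single description usable for both lookup and paradigm display.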
4 Acknowledgements

Our thanks to Tim Buckwalter and Mohamed Maamouri of the Linguistic Data Consortium and Jonathan Amith for their comments on earlier versions of this paper.

References  

Bilac, S., T. Baldwin, et al. (2002). Bringing the Dictionary to the User: the FOKS system. COLING-2002. 

Breidt, E. and H. Feldweg (1997). "Accessing Foreign Languages with COMPASS." Machine Translation Journal, special issue on New Tools for Human Translators 12: 153-174.

Corris, M., C. Manning, et al. (2004). "How Useful and Usable are Dictionaries for Speakers of Australian Indigenous Languages?" International Journal of Lexicography 17: 33-68.

Poser, W. J. (2002). Making Athabaskan Dictionaries Usable. Proceedings of the Athabaskan Languages Conference. G. Holton. Fairbanks, Alaska Native Language Center, University of Alaska: 136-147.

Prószéky, G. and B. Kis (2002). Context-Sensitive Electronic Dictionaries. COLING-2002.

Schachter, P. and F. T. Otanes (1972). Tagalog Reference Grammar. Berkeley, University of California Press. 

Slocum, M. and F. Gerdel (1965). Vocabulario tzeltal de Bachajon. Mexico, Summer Institute of Linguistics. 

Spencer, A. (1991). Morphological theory : an introduction to word structure in generative grammar. Oxford, UK ; Cambridge, Mass., Basil Blackwell. 

Streiter, O., J. Knapp, et al. (2004). Bridging the Gap between Intentional and Incidental Vocabulary Acquisition. ALLC/ACH 2004, Göteborg University, Sweden.

Young, R. W., W. Morgan, et al. (1992). Analytical Lexicon of Navajo. Albuquerque, University of New Mexico Press. 

Young, R. W. and W. Morgan, Sr. (1987). The Navajo Language: a Grammar and Colloquial Dictionary. Albuquerque, University of New Mexico Press.
