GLOSSER-RuG: in Support of Reading 
John Nerbonne and Petra Smit 
Vakgroep Alfa-informatica
Rijksuniversiteit Groningen
nerbonne@let.rug.nl, smit@let.rug.nl
Abstract 
This paper reports on ongoing work on a CALL system to facilitate
foreign language learning: GLOSSER-RuG. The system is particularly
dependent on advanced morphological analysis. Following a brief
introduction to the project, the paper describes the architecture of
GLOSSER-RuG. Then we describe in detail the main components/modules
that are part of the implemented prototype. Finally, implementation
issues and details involving the user interfaces of the tool are
discussed. We outline the design of an integrated system to support
the reading of French text by Dutch speakers.
1 Introduction 
This paper reports on our ongoing research towards a computer-assisted
language learning (CALL) tool, GLOSSER-RuG. Within a few months, a
first prototype was operational. This demonstrates that useful
language-learning and language-assistance systems are presently within
reach given the availability of key components such as morphological
analysis software and online dictionaries. In the case of GLOSSER-RuG,
these were morphological analysis software made available by Rank
Xerox, Grenoble (Chanod and Tapanainen 1995; Daniel Bauer and Zaenen
1995) and an online French-Dutch dictionary provided by Van Dale
Lexicografie (VanDale 1993). The system integrates previously existing
software modules, and supplies the minimal additional ones together
with interfaces in order to support the reading of French text by
Dutch speakers.

Following a brief introduction to and motivation for the project, the
paper describes the architecture of GLOSSER-RuG. We describe the main
components/modules that are part of this prototype, including
implementation and the user interface.
1.1. Motivation 
Zaenen and Nunberg (1995) note that, even as fully automatic machine
translation has receded as a reasonable mid-term goal for natural
language processing, several goals have emerged which are less
ambitious, but useful and attainable. These focus less on eliminating
language barriers and more on assisting people in learning and
understanding the wide range of languages in current use. It is still
the case that language differences form a substantial barrier to the
free flow of ideas and technologies: ideas are effectively accessible
only to those in command of the language they are expressed in. But
since an ever increasing number of people encounter texts
electronically, automated methods of language processing may be
brought to bear on this problem. GLOSSER-RuG is designed to help
people who know a bit of French but cannot read it quickly or
reliably. It allows a native speaker of Dutch to learn more about
French morphology, it removes the tedious task of thumbing through the
dictionary, and it gives examples from corpora.
GLOSSER-RuG may also be contrasted with more traditional
computer-assisted language learning (CALL) software (Last 1992), which
has focused primarily on providing exercises, answer keys, and links
to grammar explanations. GLOSSER-RuG, on the other hand, focuses on
providing assistance to novice readers whether these are actively
involved in educational programs or not, and the focus is clearly on
the level of the word, including the grammatical information
associated with inflectional endings. We therefore regard traditional
CALL software as complementary in purpose.
2 Design
We envision a user of intermediate level in French (school level, not
university level). While the user reads a text, s/he can select with a
mouse an unknown or unfamiliar word. The program makes available:

• the internal structure of the word, incl. the grammatical
  information encoded in morphology;
• the dictionary entry of the word in a bilingual French-Dutch
  dictionary; and
• other examples of the word from corpora.

A user interface allows the range of information to be tailored to
individual preference. The usefulness of the first two sorts of
information is evident. We chose to include the third sort as well
because corpora seemed likely to be valuable in providing examples
more concretely and certainly more extensively than other sources.
They may provide a sense of collocation or even nuances of meaning.

The realization of these design goals required extensive knowledge
bases about French morphology and lexicon.
• Most crucially, the morphological knowledge base provides the link
  between the inflected forms found in texts and the "citation forms"
  found in dictionaries (Sproat 1992). LEMMATIZATION recovers citation
  forms from inflected forms and is a primary task of morphological
  analysis. A substantial morphological knowledge base is likewise
  necessary if one is to provide information about the grammatical
  significance of morphological information.

  The only effective means of providing such a knowledge base is
  through morphological analysis software. Even if one could imagine
  storing all the inflected forms of a language such as French, the
  information associated with those forms is available today only from
  analysis software. The software is needed to create the store of
  information.

  Even apart from this, people occasionally create new words. Analysis
  programs can provide information about these, since most are formed
  according to very general and regular morphological processes.

• Obviously, the quality of the online dictionary is absolutely
  essential. The only feasible option is to use an existing
  dictionary. Our investigative user studies indicate that the
  dictionary is the most important factor in user satisfaction.

• The essential design questions vis-à-vis the corpus were (i) how
  large must the corpus be in order to guarantee a high expectation
  that the most frequent words would be found; and (ii) what sort of
  access techniques are needed on a corpus of the requisite size given
  that access must succeed within at most a very few seconds.

  We were further concerned to use texts from a variety of genres, and
  we attempted (with very limited success) to find bilingual
  French-Dutch texts. To date we have only the Bible and the Treaty of
  Maastricht in bilingual form.
2.1. Morphological Analysis
As we have seen, morphological analysis is necessary if one wishes to
access an online dictionary. Since large-coverage analysis packages
represent very major development efforts, GLOSSER-RuG was fortunate in
having access to Locolex, a state-of-the-art system provided by Rank
Xerox. Some examples of its analyses:
• vont as aller+IndP+PL+P3+FinV;
• bien as bien+Masc+SG+Noun, and bien+Adv; and
• chats as chat+Masc+PL+Noun.
The information from the morphological parse enables a dictionary
lookup, and the grammatical information is directly useful to readers.
But there are also examples of words which could have different
grammatical readings.
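
The analysis strings above follow a simple pattern: a base form followed by tags separated by "+". As a minimal sketch, assuming this surface format is all one has to work with (the Locolex programming interface is not described here, and the names Analysis, parse_analysis and part_of_speech are hypothetical), such strings can be split into a lemma and its tags in Python:

```python
from typing import NamedTuple, Tuple


class Analysis(NamedTuple):
    """One reading of a word: its base form plus grammatical tags."""
    lemma: str
    tags: Tuple[str, ...]


def parse_analysis(analysis: str) -> Analysis:
    """Split a string such as 'chat+Masc+PL+Noun' into lemma and tags."""
    lemma, *tags = analysis.split("+")
    return Analysis(lemma, tuple(tags))


def part_of_speech(analysis: Analysis) -> str:
    """Assumption: the last tag names the word class (Noun, Adv, FinV)."""
    return analysis.tags[-1] if analysis.tags else ""


# The word 'bien' has two readings with different word classes.
readings_of_bien = [parse_analysis(s)
                    for s in ("bien+Masc+SG+Noun", "bien+Adv")]
classes_of_bien = [part_of_speech(r) for r in readings_of_bien]
```

Treating the last tag as the word class matches the three examples given, but it is an assumption about the tagset rather than a documented property of it.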
2.2 Dictionary 
GLOSSER-RuG was likewise fortunate in obtaining the use of an online
version of the Van Dale dictionary Hedendaags Frans. Van Dale is the
premier publisher of Dutch dictionaries.

In Hedendaags Frans, for example, the word baiser can be a noun as
well as a verb, and the dictionary therefore contains the following
information (the actual data structures are different, and
confidential):

entry 1
<LEMMA> baiser
<GRAM> masculine noun
<TRANS> kus [a kiss]
entry 2
<LEMMA> baiser
<GRAM> transitive verb
<TRANS> kussen [to kiss]
Figure 1: USER INTERFACE GLOSSER-RuG. On the left is a text; on the
right, from the top, are windows for morphological analysis,
dictionary, and further examples.
Cases like these suggest a potentially crippling problem for the
GLOSSER-RuG concept: if words are in general ambiguous, then providing
morphological analyses for them may be too tiresome to be of genuine
use to language learners. A long list of potential analyses is of very
little use. Since indeed most words are multiply ambiguous, a problem
looms.
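
To make the ambiguity concrete, the sketch below stores the two baiser entries from Section 2.2 in a small in-memory lexicon keyed by lemma. The real Van Dale data structures are confidential, so the Entry class and the build_lexicon and lookup functions are purely illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a bilingual dictionary entry."""
    lemma: str
    gram: str
    trans: str


ENTRIES = [
    Entry("baiser", "masculine noun", "kus"),
    Entry("baiser", "transitive verb", "kussen"),
]


def build_lexicon(entries: Iterable[Entry]) -> Dict[str, List[Entry]]:
    """Group entries by lemma so that homographs stay together."""
    lexicon: Dict[str, List[Entry]] = {}
    for entry in entries:
        lexicon.setdefault(entry.lemma, []).append(entry)
    return lexicon


def lookup(lexicon: Dict[str, List[Entry]], lemma: str) -> List[Entry]:
    """Return every entry for a lemma; an unknown lemma gives []."""
    return lexicon.get(lemma, [])


lexicon = build_lexicon(ENTRIES)
candidates = lookup(lexicon, "baiser")  # two entries: noun and verb
```

A lookup by lemma alone returns both entries, which is exactly the situation in which the reader would have to choose between candidate readings.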
2.3 Disambiguation
The solution to this problem is disambiguation: to find the right
entry in the dictionary, a part-of-speech (POS) disambiguator is
applied before morphological analysis in order to obtain the
contextually most plausible morphological analysis. For example, in
the sentence Bon, donne-moi un baiser 'Good, give me a kiss', the
disambiguator should return a tag for the word baiser indicating
[masculine] noun, and in the sentence Il ne peut pas baiser 'He can't
kiss' the word baiser should be assigned a tag indicating verb
[infinitive]. The combination of POS disambiguator and morphological
analysis suffices to provide the contextually most likely analysis
nearly all the time. Stochastic POS disambiguation is implemented in
the Rank Xerox Locolex package.
2.4 Corpus 
The results of disambiguation and morphological analysis serve as
input not only to dictionary lookup but also to corpus search. The
current implementation of this search uses only string matching to
find further tokens. Our design calls for LEXEME-based search,
however, and a preliminary version of this has also been implemented.

In order to determine the size of corpus needed, we experimented with
a frequency list of the 10,000 most frequent words. A corpus of 2 MB
contained 85% of these, and a corpus of 6 MB 100%. Our goal is 100%
coverage of the words found in Hedendaags Frans, and 100% coverage of
the most frequent 20,000 words, and we are close to it. The current
corpus size is 8 MB.
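
The coverage figures above amount to asking what fraction of a frequency list occurs at least once in the corpus. The following Python sketch shows one way such a figure could be computed; the tokenizer and the function names are assumptions made for illustration and are not taken from the GLOSSER-RuG implementation.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(frequent_words: Iterable[str], corpus_text: str) -> float:
    """Fraction of the frequency list found at least once in the corpus."""
    words = list(frequent_words)
    if not words:
        return 0.0
    seen = set(tokenize(corpus_text))
    found = sum(1 for word in words if word in seen)
    return found / len(words)


# Toy data: three of the four listed words occur in the tiny corpus.
toy_coverage = coverage(["le", "chat", "chien", "dort"], "Le chat dort.")
```

On real data the frequency list would hold the 10,000 or 20,000 most frequent words and the corpus text would run to several megabytes, but the calculation is the same.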
As the corpus grows, the time for incremental search likewise grows
linearly. When the average search time grew to several seconds (on a
70 MIPS UNIX server), it became apparent that some sort of indexing
was needed. This was implemented and is described in (van Slooten
1995). The indexed lookup is most satisfactory: not only has the
absolute time dropped an order of magnitude, but the time appears to
be constant when corpus size is varied between 1 and 10 MB.

Lexeme-based search looks not only for further occurrences of the same
string, but also for inflectional variants of the word. If the
selected word is livre+Masc+SG+Noun, the search should find other
tokens of this and also tokens of the plural form livres. This is made
possible by lemmatizing the entire corpus in a preprocessing step, and
retaining the results in an index of lemmata.
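
A lemma index of this kind can be sketched in a few lines of Python. The toy lemma table below stands in for the morphological analyser, and all names are hypothetical; the sketch only illustrates why a search for livre also finds livres once the corpus has been lemmatized in advance.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: a tiny lookup table replaces real morphological analysis.
TOY_LEMMAS = {"livre": "livre", "livres": "livre"}


def lemmatize(token: str) -> str:
    """Map a token to its lemma; unknown tokens are their own lemma."""
    lowered = token.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def build_lemma_index(tokens: List[str]) -> Dict[str, List[int]]:
    """Record, for every lemma, the token positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, token in enumerate(tokens):
        index[lemmatize(token)].append(position)
    return dict(index)


tokens = "Le livre est ici . Les livres sont là .".split()
index = build_lemma_index(tokens)
# Both the singular and the plural token are found via the lemma.
hits = [tokens[position] for position in index["livre"]]
```

Because the index is built once, offline, a query costs a single dictionary access regardless of how many inflected forms the lemma has.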
2.5 User Interface
The text the user is reading is displayed in the main window. Each of
the three sorts of information is displayed in a separate window:
MORPHOLOGY, the results of morphological analysis; DICTIONARY, the
French-Dutch dictionary entry; and EXAMPLES, the examples of the word
found in corpora search. See Figure 1 for an example.

In case the disambiguator/morphological analyser cannot decide which
analysis is more likely, the user is allowed to select the one s/he is
interested in (this feature can be toggled off for users who prefer
fewer choices).

With pedagogical software there is a danger of assuming too much
expertise on the part of users. In GLOSSER-RuG this danger could take
the form of displaying further unknown words in either the dictionary
or the examples windows. To obviate this at least partially, both of
these windows have been made sensitive to GLOSSER-RuG's search. Thus,
if, e.g., corpus search turns up examples with further unknown words,
these may be submitted to GLOSSER-RuG for analysis, look-up and
examples.¹
2.6 Summary of Design 
The prototype was designed to consist of the following modules: a
disambiguator, a morphological analyser, a dictionary lookup and a
corpora search, as shown in Figure 2. Corpus lemmatization and
indexation based on lemmata are done offline. In the next section we
will illustrate these modules in more detail.
3 A session with GLOSSER-RuG 
The present section steps through the various modules in order to
illustrate the system more concretely and in order to motivate some
further design decisions.
3.1 An Example
Suppose the user selects a word in a text, for example écrit in the
sentence:
...La colère était écrit sur son visage...
¹ This is a point at which input from traditional language pedagogy
could be very useful, especially reading material that has been
screened and edited to be accessible to a particular level.
Figure 2: ARCHITECTURE GLOSSER-RuG.
3.2 Preprocessing
The program must first extract from the text the sentence in which the
word occurs. It does this on the basis of punctuation, paying special
attention to the occurrence of abbreviations and titles (e.g., dr.),
whose periods do not mark the end of a sentence.
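
The preprocessing step can be illustrated with a short Python sketch. It scans outwards from the selected word to the nearest sentence-final punctuation mark and skips periods that follow a known abbreviation or title. The abbreviation list, the function names and the character-offset interface are all assumptions made for this illustration; the actual GLOSSER-RuG routine is not documented here.

```python
import re

# Hypothetical list; a real system would need a much longer one.
ABBREVIATIONS = {"dr", "mme"}


def is_boundary(text: str, i: int) -> bool:
    """True if the punctuation mark at text[i] ends a sentence."""
    if text[i] in "!?":
        return True
    if text[i] != ".":
        return False
    # Look at the word directly before the period.
    match = re.search(r"(\w+)$", text[:i])
    word = match.group(1).lower() if match else ""
    return word not in ABBREVIATIONS


def sentence_around(text: str, offset: int) -> str:
    """Return the sentence containing the character at 'offset'."""
    start = 0
    for i in range(offset - 1, -1, -1):
        if is_boundary(text, i):
            start = i + 1
            break
    end = len(text)
    for i in range(offset, len(text)):
        if is_boundary(text, i):
            end = i + 1
            break
    return text[start:end].strip()


text = ("Le dr. Martin est venu. "
        "La colère était écrit sur son visage. Il partit.")
selected = sentence_around(text, text.index("écrit"))
```

The period after "dr" is not treated as a boundary, so selecting a word in the first sentence returns the whole of "Le dr. Martin est venu." rather than a fragment.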
3.3 The morphological analyser
After this so-called preprocessing, the morphological analyser is
called to get the morphological information of the selected word, i.e.
the lexeme and possible tags according to the result of the
morphological analysis.
Morfologische analyse van het woord "écrit"
[Morphological analysis of the word "écrit"]
écrit+Masc+SG+Adj
écrit+Masc+SG+Noun
écrire+IndP+SG+P3+FinV
écrire+Masc+SG+PaPrt
Geselecteerde morfologische analyse van het woord:
[Selected morphological analysis of the word:]
+PaPrt => écrire+Masc+SG+PaPrt
Figure 3: THE MORPHOLOGICAL ANALYSIS RANK XEROX Locolex.
As the example shows, the morphological analyser gives four possible
[grammatical] readings of the selected word and two base forms
[lexemes]. It should be noted that the preprocessing phase isn't
necessary for the morphological analyser.
3.4 Disambiguator
As mentioned in the previous section, the morphological analysis
information might not be enough to get the right entry in the
dictionary. In this example there is more than one possible base form
of the selected word, namely:
entry 1
<LEMMA> écrit
<GRAM> masculine noun
<TRANS> geschrift
entry 2
<LEMMA> écrire
<GRAM> verb
<TRANS> schrijven
[abbreviated]
In order to get the right entry, in this case entry 2, one has to
consider the whole sentence. Research on POS-tagging has proved it to
be a good method to disambiguate a sentence. The disambiguator assigns
every word of the sentence a tag. In this example the disambiguator
chooses the écrire+Masc+SG+PaPrt reading as the most likely one, as
shown in Figure 3.
3.5 Dictionary Lookup 
After disambiguation the lexeme with the most likely tag is used to
get the right entry of the selected word in the dictionary.
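
The path from a disambiguated tag to a dictionary entry can be sketched as follows in Python, using the four readings of écrit and the two abbreviated entries shown earlier. The mapping from tags to word classes and the in-memory dictionary are assumptions for illustration only; the real lookup works on the confidential Van Dale files.

```python
from typing import Dict, List, Optional, Tuple

ANALYSES = [
    "écrit+Masc+SG+Adj",
    "écrit+Masc+SG+Noun",
    "écrire+IndP+SG+P3+FinV",
    "écrire+Masc+SG+PaPrt",
]

# Assumption: the final tag decides which word class to look up.
TAG_TO_CLASS = {"Noun": "noun", "FinV": "verb", "PaPrt": "verb"}

# Hypothetical dictionary keyed by (lemma, word class).
DICTIONARY: Dict[Tuple[str, str], str] = {
    ("écrit", "noun"): "geschrift",
    ("écrire", "verb"): "schrijven",
}


def select_reading(analyses: List[str], chosen_tag: str) -> Optional[str]:
    """Return the first reading whose final tag is the chosen one."""
    for analysis in analyses:
        if analysis.split("+")[-1] == chosen_tag:
            return analysis
    return None


def translation_for(reading: str) -> Optional[str]:
    """Look up the translation for the lemma and class of a reading."""
    lemma, *tags = reading.split("+")
    word_class = TAG_TO_CLASS.get(tags[-1]) if tags else None
    if word_class is None:
        return None
    return DICTIONARY.get((lemma, word_class))


reading = select_reading(ANALYSES, "PaPrt")
translation = translation_for(reading) if reading else None
```

With the PaPrt tag chosen by the disambiguator, the lemma is écrire and the verb entry is retrieved; had the Noun tag been chosen, the same code would return the entry for écrit.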
3.6 Dealing with Inaccuracies 
Although the disambiguator is very accurate, it doesn't always assign
the right tag to a word. Consider for example the sentence

Je pense que tu as l'as de pique [I think you've got the ace of
spades]

According to the morphological analyser the selected word as has two
base forms, namely avoir, indicating a verb [avoir+IndP+SG+P2+Avoir],
and as, indicating a noun [as+Masc+INVPL+Noun]. To choose the right
base form, one consults the disambiguator, but it selects the 'verb'
tag instead of the wanted 'noun' tag. In this case the dictionary
lookup module will fetch the wrong entry, namely that of avoir. In
order to get the right entry, namely as, the alert user of GLOSSER-RuG
can override the decision of the disambiguator. The user can select
the other ('wanted') tag, push the search button, and accordingly get
the right dictionary entry and corpora examples on the screen.
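
The override logic is simple enough to state as code. In this Python sketch, which uses hypothetical function names and is not taken from the GLOSSER-RuG source, the disambiguator's choice is used unless the user has explicitly picked another reading from the list offered by the morphological analyser.

```python
from typing import List, Optional


def effective_reading(readings: List[str],
                      automatic: str,
                      user_choice: Optional[str] = None) -> str:
    """Prefer the user's selection; fall back on the disambiguator."""
    if user_choice is None:
        return automatic
    if user_choice not in readings:
        raise ValueError("user choice is not one of the offered readings")
    return user_choice


def lemma_of(reading: str) -> str:
    """The base form is the part before the first '+'."""
    return reading.split("+")[0]


readings = ["avoir+IndP+SG+P2+Avoir", "as+Masc+INVPL+Noun"]
automatic = readings[0]  # the disambiguator wrongly prefers the verb

default_lemma = lemma_of(effective_reading(readings, automatic))
overridden_lemma = lemma_of(
    effective_reading(readings, automatic, user_choice=readings[1]))
```

Without an override the lookup would go to avoir; once the user selects the noun reading, the lemma passed on to dictionary lookup and corpora search is as.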
Figure 4: THE DICTIONARY LOOKUP VAN DALE Hedendaags Frans.
The dictionary lookup process is straightforward. The exact structure
of the dictionary source files is confidential, but it is
well-structured and allows uncomplicated access. The right file is
opened and searched until a match with the lexeme occurs. If a match
is found, the information for this lexeme is printed in readable form
on the screen.

If the user reads a French word in the dictionary output and wants to
get the dictionary entry of this particular word, s/he can select this
word in the dictionary output and push the search button. The selected
word is then morphologically analysed and, if possible, disambiguated;
with the lexeme another dictionary lookup takes place, and the
information found is placed in another DICTIONARY window on the
screen.
3.7 Corpora Search 
The selected word and its lexeme also form the input for the Corpora
Search module. This component uses indexed files (van Slooten 1995).
The index is set up in two parts. The first part is an index to
generate a key for every word. This index is used for all files in the
corpus.² This key is then used in the second part, where for every
file in the corpus two extra index files are generated. These files
contain information about the position of words by their key in the
corpus file up to a certain maximum (e.g. 50) of occurrences. As the
index consists of two parts, so does the lookup.

The first part is to get all the keys of words starting with a
particular string from the first index. Then these keys can be used to
search in the second index, one index file for each corpus, for
occurrences of the word denoted by these keys.
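
The two-part scheme can be sketched in Python as follows. This is a simplified in-memory model under stated assumptions: the on-disk layout described in (van Slooten 1995) is not reproduced, the two extra index files per corpus file are collapsed into one mapping, and the file names and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

MAX_OCCURRENCES = 50  # upper bound on stored positions per word and file


def build_key_index(files: Dict[str, List[str]]) -> Dict[str, int]:
    """Part one: assign one integer key per word, shared by all files."""
    keys: Dict[str, int] = {}
    for tokens in files.values():
        for token in tokens:
            keys.setdefault(token, len(keys))
    return keys


def build_position_index(tokens: List[str], keys: Dict[str, int],
                         limit: int = MAX_OCCURRENCES) -> Dict[int, List[int]]:
    """Part two: per file, store token positions by key, up to 'limit'."""
    positions: Dict[int, List[int]] = defaultdict(list)
    for position, token in enumerate(tokens):
        slot = positions[keys[token]]
        if len(slot) < limit:
            slot.append(position)
    return dict(positions)


def search(prefix: str, keys: Dict[str, int],
           per_file: Dict[str, Dict[int, List[int]]]) -> Dict[str, List[int]]:
    """Collect keys of words starting with 'prefix', then scan each file."""
    wanted = [key for word, key in keys.items() if word.startswith(prefix)]
    hits: Dict[str, List[int]] = {}
    for name, positions in per_file.items():
        found = sorted(p for key in wanted for p in positions.get(key, []))
        if found:
            hits[name] = found
    return hits


files = {
    "lutrin.txt": "on se presse d' écrire".split(),
    "chabert.txt": "il a écrit une lettre écrite hier".split(),
}
keys = build_key_index(files)
per_file = {name: build_position_index(tokens, keys)
            for name, tokens in files.items()}
hits = search("écri", keys, per_file)
```

Capping the stored positions keeps each per-file index small, at the price of not listing every occurrence of very frequent words, which is acceptable when only a handful of examples is shown to the reader.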
If the Corpora Search Module has as input écrit [the selected word]
and écrire [the base form], the following examples (among others) will
be found:
² The corpora texts are collected from different sites on the WWW.
Le Lutrin; Poème héroï-comique - Boileau
http://www.ensmp.fr/~scherer/literacy/BOILEAU.LE_LUTRIN
ligne : 221
"... Le sort, dit le prélat, vous servira de loi. Que l'on tire au
billet ceux que l'on doit élire. Il dit, on obéit, on se presse
d'écrire. Aussitôt trente noms, sur le papier tracés, Sont au
fond d'un bonnet par billets entassés .... "
Le Colonel Chabert - H. de Balzac, 1832
http://web.cnam.fr/ABU/abu_server.html?pub/ABU/anteABU/chabert.t
Figure 5: SOME CORPORA EXAMPLES.
As in the DICTIONARY window, it is also possible to select another
French word in the Corpora output and push the Search button. The
morphological analysis and disambiguation of this selected word and
the dictionary entry will accordingly be displayed in the relevant
windows.
4 Final Remarks 
The integration of existing morphological processing tools has led to
a powerful CALL tool. The tool provides dictionary lookup, gives
examples from corpora and displays morphological information, all
online. Other languages could easily be implemented in the overall
skeleton of GLOSSER-RuG. Although development of the GLOSSER-RuG
prototype is still ongoing, these first results look very promising.
The prototype was sufficiently advanced in February for Groningen
communications students to conduct an investigative user study.
Although we will report on this separately, it indicated user
interest. In the near future we plan to index the corpora on the basis
of lexemes. Later we wish to extend the software with, for example, a
teaching and diagnostic module so that the tool matures into real CALL
software.
5 Acknowledgements 
The work is supported by grant number 343 to the University of
Groningen from the EU Copernicus program. The Copernicus partners
consulted on a common design, in particular Lauri Karttunen (Rank
Xerox, Grenoble, France), Elena Paskaleva (Linguistic Modelling
Laboratory, Bulgarian Academy of Sciences, Bulgaria), Gabor Proszeky
(MorphoLogic, Hungary), Tiit Roosmaa (Tartu University, Estonia),
Maria Stambolieva (Institute of Bulgarian Language, Bulgarian Academy
of Sciences), and Ülle Viks (Institute of the Estonian Language,
Estonia). Auke van Slooten designed and programmed the corpus indexing
and search routines. Lauri Karttunen (Grenoble) advised on the use of
morphology and Gertjan van Noord (Groningen) on TCL/TK.
References
Jean-Pierre Chanod and Pasi Tapanainen. 1995. Creating a tagset,
  lexicon and guesser for a French tagger. In Proceedings of the ACL
  SIGDAT workshop on "From Texts To Tags: Issues In Multilingual
  Language Analysis", pages 58-64, University College Dublin, Ireland.
Frederique Segond Daniel Bauer and Annie Zaenen. 1995. Locolex:
  Translation rolls off your tongue. In Proceedings of the conference
  of the ACH-ALLC'95, Santa Barbara, USA.
R. Last. 1992. Computers and language learning: Past, present - and
  future? In C. Butler, editor, Computers and Written Texts, pages
  227-245, Oxford: Blackwell.
Richard Sproat. 1992. Morphology and Computation. MIT Press.
Auke van Slooten. 1995. Searching and quoting examples of word-usage
  in French language corpus. Technical report, Rijksuniversiteit
  Groningen.
VanDale. 1993. Handwoordenboek Frans-Nederlands + Prisma, 2e druk.
  Van Dale Lexicografie b.v.
Annie Zaenen and Geoff Nunberg. 1995. Communication technology,
  linguistic technology and the multilingual individual. In Toine
  Andernach, Mark Moll, and Anton Nijholt, editors, CLIN V: Papers
  from the Fifth CLIN Meeting, pages 1-12, Enschede. Taaluitgeverij.
