Helyette: Inflectional Thesaurus for Agglutinative Languages 
I MORPHOLOGE 
1=6 u. 56-58. I/3 
H- 1011 Budapest 
Hungary 
G~ibor Pr6$zP..ky 1,2 & I~szI6 Tihalnyi \],3 
OPKM COMP. CENTRE 
Honv(~l u. 19. 
H- 1055 Budapest 
Hungary 
e-mall:h6109pro@ella.hu 
1. Introduction 
In the environment of word-processors thesauri serve the 
user's convenience in choosing the best suitable syno- 
nym of a word. Words in text of agglutinative languages 
occur almost always as inflected forms, thus finding 
them directly in a stem vocabulary is impossible. H01y0ltu, 
the inflectional thesaurus coping with this problem is 
introduced in the paper. 
2. Synonym dictionary with 
morphological knowledge 
The inflectional thesaurus is a tool which (1) first per- 
forms the morphological segmentation of the input word- 
form, then (2) finds its stem's lexical base(s), (3) stores 
the suffix sequence situated on the right of the actual 
stem-allomorph, (4) offers the synonyms for the lexical 
base(s), and (5) generates the new word-form consisting 
of the adequate allomorph of the chosen stem and the 
adequate allomorph of the above suffix-sequence. 
Both the morphological analysis and synthesis steps 
are done by the Humor ~igh-speed unification morphol- 
ogy) method described by Pr6sz~ky and Tihanyi (1992, 
1993). The possible roots and the suffixes following 
them are temporarily stored, and H01y0ft0 performs the 
morphological synthesis on the basis of the new 
(synonym) root and the internal code of the stored suffix 
sequence. For more details, see Example 1. 
3. Implementation details 
The morphological framework behind Holyotto relies on 
unification morphology. Both the thesaurus and the mor- 
phologicaVgenerator (as a stand-alone tool) are fully im- 
plemented for Hungarian. The synonym system consists 
of 40.000 headwords, the stem dictionary of the mor- 
phological analyzer/generator contains 80.000 stems, 
suffix dictionaries contain all the inflectional suffixes and 
the productive derivational morphemes of present-day 
Hungarian. With the help of these dictionaries more than 
1.000.000.000 well-formed Hungarian word-forms can 
be analyzed or generated, and approximately 
500.000.000 synonyms are handled. The whole soft- 
ware package is written in C programming language. The 
morphological analyzer based on Humor needs 800 
3 INSTITUTE FOR LINGUISTICS OF H.A.S 
Szfnl~z u. 5-9. 
H- 1014 Budapest 
Hungary 
e-mall:h 1243tih@ella.hu 
KBytes disk space and less than 90 KBytes of core 
memory. The first version of the inflectional thesaurus 
Helvitto needs 1.6 MBytes disk space and runs under 
MS-Windows. 
References 
\[Pr6sz~ky and Tihanyi, 1992\] G&bor Pr6sz~ky and L~sd6 
Tihanyi. A Fast Morphological Analyzer for Lemmatiz- 
ing Corpora of Agglutinative Languages. In: Ferenc 
Kiefer, G(tbor Kiss and J~lia Pajzs (eds.) Papers in 
Computational Lexicography h COMPLEX-92, 
pages 265-278, Linguistics Institute, Budapest, 1992. 
\[Prhsz~ky and Tihanyi, 1993\] G~or Pr6sz~ky and L~szl6 
Tihanyi. Humor: High-speed Unification Morphology 
and Its Applications for Agglutinative Languages. La 
tribune des industries de la langue, No.10., pages 
28-29, ORL, Paris, 1993. 
WORD-FORM TO BE REPLACED: 
kup~irnra \[onto my drinking cups l \] 
MORPHOLOGICAL ANALYSB: 
kup~ +irn+ra 
SLE'FIX SEQUENCE TO BE STORED: 
+ PERS- 1SG-PL + SUB 
BASE-FORM OF rrs STEM: 
kupa \[drinking cuPl \] 
THE SYNONYM CHOSEN: 
kehely \[drinking cup2 \] 
TO BE SYI~S~ZED: 
kehely +PERS-ISG-PL+SUB 
ALLOMOP.PrlS OF ~ NEW STEM: 
{kehely, kelyh} 
ALLOMORPHS OF ~ ~IX ARRAY: 
{+ffn+ra, +irn+re, +aim+ra, 
+elm+re, + jairn + ra, + jeim + re} 
MORPt-~LOGICAL SYHTI-ESIS: 
kelyh +eim+re 
REPLACIV, G WORD-FORM: 
kelyheimre \[onto my drinking cups2\] 
Example 1. 
473 
