RULE-BASED INFLEXIONAL ANALYSIS 
Zbigniew Jurkiewicz
University of Warsaw, Institute of Informatics 
Palac Kultury i Nauki, 00-901 Warszawa, P.O.Box 1210, Poland
This paper presents a system for the representation and use
of inflexional knowledge for the Polish language. By inflexional
knowledge we mean information about the rules of inflection and
deflection for regular words, together with a list of exceptions.
Such knowledge can be successfully manipulated by a rule-based
system. The research is part of a larger undertaking aimed at
the construction of a system able to converse in Polish with
a casual user.
The problem we are concerned with may be stated as
follows. For each word in the input sentence the system should
find its basic form and the dictionary information connected with
it.
The simplest approach to this problem is to store all
forms of words in a forms dictionary, which associates them
with their basic forms. This method is acceptable for small
sets of words, but it places too great a strain on system
resources for bigger dictionaries. The way to minimize resource
usage is to exploit regularities in the inflection.
Each language possesses some regularities in its inflexion.
The extent of these regularities differs between languages.
The number of different inflectional forms may differ as well,
e.g. an average Polish verb can have about 100 forms. This
forced us to think seriously about using regularities even in
lexical components for small subsets of
language. We view the inflectional analysis system as
composed of two parts:
- an exception dictionary with all forms taken as it-
- a mechanism exploiting regularities to obtain the necessary
efficiency in search and to save resources.
We based our mechanism on the analysis of endings. The
ending is defined as the part of a word which is changed while
reducing the word to its basic (dictionary) form. The Polish
language is characterized by many rules of deflection which
may be applicable to a given ending. A single word may be
interpreted in as many ways as there are endings we can
distinguish in it, multiplied by the number of applicable rules
for each ending. Therefore each candidate ending must be
confirmed by checking the result in the dictionary of basic forms
after applying the proposed deflection rule.
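The confirmation step described above can be sketched as follows. This is a
minimal illustration in Python, not the original LISP system. The rule table,
the tiny dictionary of basic forms and the function name `analyse` are all
assumptions introduced for the example.

```python
# Hypothetical toy data (not taken from the paper): each ending maps to the
# replacements that some deflection rule proposes for it.
RULES = {
    "a": ["", "o"],         # "kota" -> "kot", "okna" -> "okno"
    "u": [""],              # "domu" -> "dom"
    "y": ["", "a"],         # "kobiety" -> "kobieta"
    "ami": ["", "o", "a"],  # "kobietami" -> "kobieta"
}

# Hypothetical dictionary of basic forms with their categories.
BASIC_FORMS = {"kot": "noun", "dom": "noun", "kobieta": "noun", "okno": "noun"}


def analyse(word):
    """Return every (basic form, category) confirmed by the dictionary."""
    results = []
    # Try every candidate ending: one letter long, two letters long, etc.
    for length in range(1, len(word)):
        stem, ending = word[:-length], word[-length:]
        # Try every deflection rule applicable to this candidate ending.
        for replacement in RULES.get(ending, []):
            candidate = stem + replacement
            # A candidate counts only if it is a known basic form.
            if candidate in BASIC_FORMS:
                results.append((candidate, BASIC_FORMS[candidate]))
    return results


print(analyse("kobietami"))
```

Most combinations of ending and rule are rejected by the dictionary lookup,
which is why the lookup is needed to confirm a candidate.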
The described knowledge was written down in the rule-based
system "FORS". "FORS" is a rather classical forward-driven rule
system with some degree of extensibility. It is written in
the programming language LISP and is composed of three parts:
- facts, represented as list structures stored in a
fully indexed data base;
- rules of the form
condition => action action ...
- control mechanism for choosing and applying rules.
Each condition is a sequence of patterns of facts
which must be asserted in the data base for the rule to be
applicable.
Patterns may contain typed variables. The type of a
variable is identified by a one-letter prefix. The prefix must
be a non-alphanumeric character. A variable type may be defined
by providing matching functions for this type.
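One way to picture typed variables is a registry that maps each prefix
character to a matching function. The sketch below is written in Python rather
than LISP and makes several assumptions: the names `MATCHERS`,
`define_variable_type` and `match`, the use of "?" as the prefix of an ordinary
variable, and the representation of facts as nested tuples are all
hypothetical.

```python
# Registry of variable types: prefix character -> matching function.
MATCHERS = {}


def define_variable_type(prefix, matcher):
    """Register a variable type under a one-character, non-alphanumeric prefix."""
    if len(prefix) != 1 or prefix.isalnum():
        raise ValueError("prefix must be a single non-alphanumeric character")
    MATCHERS[prefix] = matcher


def match_plain(name, value, bindings):
    """Ordinary variable: bind on first use, compare on later uses."""
    if name in bindings:
        return bindings if bindings[name] == value else None
    return {**bindings, name: value}


# "?" as the prefix of an ordinary variable is an assumption of this sketch.
define_variable_type("?", match_plain)


def match(pattern, fact, bindings):
    """Match a pattern against a fact; return new bindings or None on failure."""
    if isinstance(pattern, (list, tuple)):
        if not isinstance(fact, (list, tuple)) or len(pattern) != len(fact):
            return None
        for sub_pattern, sub_fact in zip(pattern, fact):
            bindings = match(sub_pattern, sub_fact, bindings)
            if bindings is None:
                return None
        return bindings
    # A string whose first character is a registered prefix is a variable.
    if isinstance(pattern, str) and pattern[:1] in MATCHERS:
        return MATCHERS[pattern[0]](pattern[1:], fact, bindings)
    # Anything else is a constant and must be equal to the fact element.
    return bindings if pattern == fact else None


print(match(("ENTRY", "?base", "?cat"), ("ENTRY", "kot", "noun"), {}))
```

A new variable type is added by registering one more matching function, which
is the kind of extensibility the text describes.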
Inflexional knowledge is represented in "FORS" as follows.
Each dictionary entry is represented as a fact of the following
form:
(ENTRY (BASIC-FORM) (CATEGORY) (OTHER PARAMETERS))
The word currently processed is saved as:
(RECEIVED (WORD))
The exceptions are represented as rules of the form
(RECEIVED (WORD-FORM)) (ENTRY (BASIC-FORM) ...)
=> (ANSWER ...)
The rules for deflection by ending replacement are stored as
(RECEIVED *VAR-(ENDING1)) (ENTRY *VAR-(ENDING2) ...)
=> (ANSWER ...)
The prefix * is used for variables typed "suffixed". All
variables in "FORS" get values by matching to fact elements.
For a suffixed variable without a value, the value is assigned
after cutting the given ending from the fact element (if possible,
otherwise the matching fails). When matching a suffixed variable
which already has some value, the final value is obtained by
concatenating the given suffix to its value.
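The two cases of suffixed matching can be sketched in Python as below. This is
one reading of the description, not the original LISP code: the function names
`match_suffixed` and `deflect`, the variable name "VAR" and the list of basic
forms are assumptions, and the bound case is interpreted as comparing the
concatenated value with the element being matched.

```python
def match_suffixed(var, ending, element, bindings):
    """Match a suffixed variable against one string element of a fact.

    Unbound variable: cut the ending from the element and bind the rest,
    failing when the element does not end with that ending.
    Bound variable: concatenate the ending to the bound value and compare
    the result with the element.
    """
    if var not in bindings:
        if not element.endswith(ending):
            return None
        stem = element[:len(element) - len(ending)]
        return {**bindings, var: stem}
    return bindings if bindings[var] + ending == element else None


def deflect(word, ending1, ending2, basic_forms):
    """Apply one ending-replacement rule; return the confirmed basic form."""
    # First pattern: the received word with ending1 cut off binds the stem.
    bindings = match_suffixed("VAR", ending1, word, {})
    if bindings is None:
        return None
    # Second pattern: the stem plus ending2 must be a dictionary entry.
    for entry in basic_forms:
        if match_suffixed("VAR", ending2, entry, bindings) is not None:
            return entry
    return None


print(deflect("okna", "a", "o", ["kot", "okno"]))
```

Under this reading a single variable carries the stem from the received word
to the dictionary entry, so one rule expresses a whole ending replacement.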
There may exist many competing rules for a recognized
ending. Also, for a given word several allowed endings
may be identified (e.g. one letter long, two letters long,
etc.). The control component in "FORS" makes it possible to
specify the sequencing between such competing rules. In the
current version, the set of rules for regular endings is divided
into groups according to the ending in the
(RECEIVED ...)
pattern. We associate a node with each such group. The nodes
form a directed graph, called the control graph. We associate a
node with the exception rules group too. One node is selected as
the starting node. The arcs in this graph specify a (partial)
order between nodes, thus defining the sequencing between groups
of rules. All nodes must be accessible from the starting node (in
other terms, the control graph must be a directed acyclic connected
graph).
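The two conditions on the control graph, no cycles and every node accessible
from the starting node, can be checked mechanically. The Python sketch below
assumes the graph is stored as a dictionary from a node name to the list of
its successors. That representation, the node names and the function names are
hypothetical.

```python
def reachable_from(graph, start):
    """Return the set of nodes accessible from the starting node."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def is_acyclic(graph):
    """Depth-first search that fails when it meets a node still being visited."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for successor in graph.get(node, []):
            state = colour.get(successor, WHITE)
            if state == GREY:
                return False  # back arc: the graph has a cycle
            if state == WHITE and not visit(successor):
                return False
        colour[node] = BLACK
        return True

    return all(colour[node] != WHITE or visit(node) for node in graph)


def is_valid_control_graph(graph, start):
    """Acyclic, and every node is accessible from the starting node."""
    nodes = set(graph) | {s for successors in graph.values() for s in successors}
    return is_acyclic(graph) and reachable_from(graph, start) == nodes


# Hypothetical control graph: the exception group first, then longer endings
# before shorter ones.
CONTROL = {"exceptions": ["ami", "a"], "ami": ["a"], "a": []}
print(is_valid_control_graph(CONTROL, "exceptions"))
```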
The system works in cycles. At each cycle it reads the
next word from the input sentence and tries to find a rule
applicable to this word. Rules are tried according to the order
defined by the control graph, starting from the starting node.
For each node, the rules associated with it are checked
until one is found with satisfied conditions. This rule is then
run and the next cycle begins. If no rule was applicable, the
system goes to one of the successor nodes, guided by the analysed word.
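One cycle of this control scheme can be sketched in Python as follows. The
sketch rests on several assumptions that are not in the text: rules are
modelled as functions returning a basic form or None, the successor is chosen
as the first one whose name is an ending of the word, and the node names, the
exception table and the dictionary are toy data.

```python
# Hypothetical toy data.
BASIC_FORMS = {"kot", "okno", "kobieta", "pies"}
EXCEPTIONS = {"psa": "pies"}  # irregular form -> basic form


def replace_ending(old, new):
    """Build a rule that swaps an ending and confirms the result in the dictionary."""
    def rule(word):
        if word.endswith(old):
            candidate = word[:len(word) - len(old)] + new
            if candidate in BASIC_FORMS:
                return candidate
        return None
    return rule


# Groups of rules attached to the nodes of the control graph.
RULES_AT = {
    "exceptions": [lambda word: EXCEPTIONS.get(word)],
    "ami": [replace_ending("ami", ""), replace_ending("ami", "a"),
            replace_ending("ami", "o")],
    "a": [replace_ending("a", ""), replace_ending("a", "o")],
}
SUCCESSORS = {"exceptions": ["ami", "a"], "ami": ["a"], "a": []}


def choose_by_ending(word, candidates):
    """Assumed guidance: take the first successor whose name ends the word."""
    for candidate in candidates:
        if word.endswith(candidate):
            return candidate
    return None


def analyse_word(word, start="exceptions"):
    """One cycle: walk the control graph until some rule applies."""
    node = start
    while node is not None:
        for rule in RULES_AT.get(node, []):
            result = rule(word)
            if result is not None:
                return result  # the first rule with satisfied conditions is run
        node = choose_by_ending(word, SUCCESSORS.get(node, []))
    return None


print([analyse_word(w) for w in ["psa", "kobietami", "okna", "domu"]])
```

Placing the exception group at the starting node means irregular forms are
answered before any ending rule is tried.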
The advantages of representing inflectional knowledge in
such a form are many. The system is modular, because each rule
is independent of all others. Therefore rules may be added
or deleted at will, allowing additional sources of knowledge
to be tried.
The behaviour of the system is easily observable by a
non-programmer (in linguistic terms such as rules, endings etc.).
The set of rules may be adjusted to a given application,
especially for small systems with specialised dictionaries.
The independent control component makes it possible to experiment
with different rule groupings in the search for minimal
resource usage. Grouping according to the concluded
syntactic category may make it possible to exploit syntactic
expectations provided by the parser. So far we have succeeded in
incorporating only the most popular deflection rules (about 600
of them). We are going to incorporate some additional phonetic
rules to take care of alterations. This could hopefully diminish
the number of deflection rules.
