File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/82/c82-2034_abstr.xml

Size: 6,401 bytes

Last Modified: 2025-10-06 13:46:02

<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2034">
  <Title>RULE-BASED INFLEXIONAL ANALYSIS</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
RULE-BASED INFLEXIONAL ANALYSIS
</SectionTitle>
    <Paragraph position="0"> Palac Kultury i Nauki, 00-901 Warszawa, P.O. Box 1210, Poland. This paper presents a system for the representation and use of inflexional knowledge for the Polish language. By inflexional knowledge we mean information about the rules of inflection/deflection for regular words, together with a list of exceptions. Such knowledge can be successfully manipulated by a rule-based system. The research is part of a larger undertaking aimed at the construction of a system able to converse in Polish with a casual user.</Paragraph>
    <Paragraph position="1"> The problem we are concerned with may be stated as follows. For each word in the input sentence, the system should find its basic form and the dictionary information connected with it.</Paragraph>
    <Paragraph position="2"> The simplest approach to this problem is to store all forms of words in a forms dictionary, which associates them with their basic forms. This method is acceptable for small sets of words, but it places too big a strain on system resources for bigger dictionaries. The way to minimize resource usage is to exploit regularities in the inflection.</Paragraph>
    <Paragraph position="3"> Each language possesses some regularities in its inflexion. The extent of these regularities differs between languages. The number of different inflectional forms may differ as well, e.g. an average Polish verb can have about 100 forms. This forced us to think seriously about using regularities even in lexical components for small subsets of language. We view the inflectional analysis system as composed of two parts: - an exception dictionary with all forms taken as is, - a mechanism exploiting regularities to obtain the necessary efficiency in search and to save resources.</Paragraph>
    <Paragraph position="4"> We based our mechanism on the analysis of endings. The ending is defined as the part of a word which is changed while reducing the word to its basic (dictionary) form. The Polish language is characterized by many rules of deflection, several of which may be applicable to a given ending. A single word may be interpreted in as many ways as there are endings we can distinguish in it, multiplied by the number of applicable rules for each ending. Therefore each such candidate ending must be confirmed by checking the result in the dictionary of basic forms after applying the proposed deflection rule.</Paragraph>
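The candidate-and-confirm procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule table, the tiny dictionary of basic forms, and the function name `analyse` are hypothetical and not taken from the original system, and the three Polish rules shown are a simplified toy sample.

```python
# Hypothetical deflection rules: ending -> replacements that may restore
# the basic (dictionary) form. Toy sample, not the rules of the paper.
DEFLECTION_RULES = {
    "y": ["a"],          # e.g. "kawy" -> "kawa"
    "ami": ["a", ""],    # e.g. "kawami" -> "kawa", "domami" -> "dom"
    "u": ["", "o"],      # e.g. "domu" -> "dom", "oknu" -> "okno"
}

# Hypothetical dictionary of basic forms with attached information.
BASIC_FORMS = {
    "kawa": "noun, feminine",
    "dom": "noun, masculine",
    "okno": "noun, neuter",
}


def analyse(word):
    """Return every (basic form, info) pair confirmed by the dictionary."""
    confirmed = []
    # Try every ending we can distinguish: the last 1, 2, ... letters.
    for length in range(1, len(word)):
        stem, ending = word[:-length], word[-length:]
        # Each ending may have several applicable deflection rules.
        for replacement in DEFLECTION_RULES.get(ending, []):
            candidate = stem + replacement
            # A candidate counts only if the dictionary confirms it.
            if candidate in BASIC_FORMS:
                confirmed.append((candidate, BASIC_FORMS[candidate]))
    return confirmed


print(analyse("kawami"))
print(analyse("domu"))
```

Note that the number of candidates grows with both the number of distinguishable endings and the number of rules per ending, which is why the dictionary check is needed to discard false interpretations.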
    <Paragraph position="5"> The described knowledge was written down in the rule-based system &quot;FORS&quot;. &quot;FORS&quot; is a rather classical forward-driven rule system with some degree of extensibility. It is written in the programming language LISP and is composed of three parts: - facts, represented as list structures stored in a fully indexed data base; - rules of the form condition =&gt; action action ...</Paragraph>
    <Paragraph position="6"> - control mechanism for choosing and applying rules.</Paragraph>
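A minimal Python sketch of such a three-part organisation (facts, rules, control) is given here for illustration. It is an assumption-laden toy, not the original LISP code: a plain set stands in for the fully indexed data base, the "?" variable syntax is invented for the example, and the names `match`, `satisfy`, `report` and `run` are hypothetical.

```python
# Facts are tuples kept in a set; a plain set replaces the fully
# indexed data base for brevity (an assumption of this sketch).
facts = {("RECEIVED", "domu")}


def match(pattern, fact, bindings):
    """Match one pattern against one fact; return new bindings or None."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):            # "?" marks a variable (assumed syntax)
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def satisfy(patterns, bindings):
    """Yield every binding set under which all patterns are asserted."""
    if not patterns:
        yield bindings
        return
    for fact in sorted(facts):
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from satisfy(patterns[1:], extended)


def report(bindings):
    """Example action: replace the RECEIVED fact by a SEEN fact."""
    facts.discard(("RECEIVED", bindings["?w"]))
    facts.add(("SEEN", bindings["?w"]))


# A rule has the form: condition => action action ...
RULES = [([("RECEIVED", "?w")], [report])]


def run():
    """Control mechanism: fire the first applicable rule until none applies."""
    fired = True
    while fired:
        fired = False
        for condition, actions in RULES:
            bindings = next(satisfy(condition, {}), None)
            if bindings is not None:
                for action in actions:
                    action(bindings)
                fired = True
                break


run()
print(facts)
```

The control loop here simply takes the first applicable rule in list order; any other selection policy could be substituted without touching the facts or the rules.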
    <Paragraph position="7"> Each condition is a sequence of patterns of facts, which must be asserted in the data base for the rule to be applicable. Patterns may contain typed variables. The type of a variable is identified by a one-letter prefix. The prefix must be a non-alphanumeric character. A variable type may be defined by providing matching functions for this type.</Paragraph>
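One way to picture typed variables with user-supplied matching functions is the Python sketch below. It is a hedged illustration only: the registry `MATCHERS`, the function `define_type`, and the two prefixes "?" and "#" are assumptions made for this example and do not come from the original system.

```python
# Registry of variable types: one-character prefix -> matching function.
MATCHERS = {}


def define_type(prefix, matcher):
    """Register a variable type under a one-letter, non-alphanumeric prefix."""
    if len(prefix) != 1 or prefix.isalnum():
        raise ValueError("prefix must be one non-alphanumeric character")
    MATCHERS[prefix] = matcher


def match_any(name, element, bindings):
    """Plain variable: bind to anything, but stay consistent once bound."""
    if bindings.setdefault(name, element) != element:
        return None
    return bindings


def match_number(name, element, bindings):
    """Numeric variable: accept only elements made of digits."""
    if not element.isdigit():
        return None
    return match_any(name, element, bindings)


define_type("?", match_any)      # "?" and "#" are illustrative choices
define_type("#", match_number)


def match_pattern(pattern, fact):
    """Match a pattern against a fact, dispatching on variable prefixes."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for item, element in zip(pattern, fact):
        matcher = MATCHERS.get(item[:1])
        if matcher is None:
            if item != element:          # constant: must be equal
                return None
        else:
            bindings = matcher(item, element, bindings)
            if bindings is None:
                return None
    return bindings


print(match_pattern(("LENGTH", "?word", "#n"), ("LENGTH", "domu", "4")))
print(match_pattern(("LENGTH", "?word", "#n"), ("LENGTH", "domu", "four")))
```

Because the prefix must be non-alphanumeric, ordinary constants such as LENGTH can never be mistaken for variables, which keeps the dispatch on the first character unambiguous.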
    <Paragraph position="8">  The prefix ~ is used for variables typed &quot;suffixed&quot;. All variables in &quot;FORS&quot; get values by matching to fact elements. For a suffixed variable without a value, the value is assigned after cutting a given ending from the item element (if possible; otherwise the matching fails). When matching a suffixed variable which already has some value, the final value is obtained by concatenating the given suffix to its value. There may exist many competing rules for a recognized ending. Also, for a given word several allowed endings may be identified (e.g. one letter long, two letters long, etc.). The control component in &quot;FORS&quot; allows one to specify the sequencing between such competing rules. In the current version, the set of rules for regular endings is divided into groups according to the ending in the (RECEIVED, * * ) pattern. We associate a node with each such group. The nodes form a directed graph, called the control graph. We associate a node with the exception rules group too. One node is selected as the starting node. The arcs in this graph specify a (partial) order between nodes, thus defining the sequencing between groups of rules. All nodes must be accessible from the starting node (in other terms, the control graph must be a directed acyclic connected graph). The system works in cycles. At each cycle it reads the next word from the input sentence and tries to find a rule applicable to that word. Rules are tried according to the order defined by the control graph, beginning at the starting node. For each node, the rules associated with it are checked until one is found with satisfied conditions. This rule is then run and the next cycle begins. If no rule was applicable, the system goes to one of the successor nodes, guided by the analysed word. The advantages of representing inflectional knowledge in such a form are many. The system is modular, because each rule is independent of all others.
Therefore rules may be added or deleted at will, allowing additional sources of knowledge to be tried.</Paragraph>
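The cycle described above, in which rule groups are attached to nodes of a control graph and tried from the starting node onward, might look roughly like the following Python sketch. Everything concrete in it is an assumption made for illustration: the node names, the choice of the exception group as starting node, the helper names `exception` and `deflect`, the shallow shape of the graph, and the toy dictionary.

```python
# Hypothetical dictionary of basic forms used to confirm candidates.
BASIC_FORMS = {"kawa", "dom", "pies"}


def exception(form, basic):
    """Build a rule that recognises exactly one irregular form."""
    return lambda word: basic if word == form else None


def deflect(ending, replacement):
    """Build a rule that swaps an ending and confirms the result."""
    def rule(word):
        if not word.endswith(ending):
            return None
        candidate = word[:len(word) - len(ending)] + replacement
        return candidate if candidate in BASIC_FORMS else None
    return rule


# Each node holds one group of rules; node names are illustrative.
RULES = {
    "exceptions": [exception("psa", "pies")],
    "ami": [deflect("ami", "a"), deflect("ami", "")],
    "u": [deflect("u", "")],
    "y": [deflect("y", "a")],
}

# Arcs of the control graph: node -> successors, in order of preference.
ARCS = {"exceptions": ["ami", "u", "y"], "ami": [], "u": [], "y": []}
START = "exceptions"


def analyse(word):
    """Walk the control graph from START until some rule applies."""
    node = START
    while node is not None:
        for rule in RULES[node]:
            basic = rule(word)
            if basic is not None:
                return node, basic
        # No rule applied: pick a successor, guided by the analysed word.
        node = next((n for n in ARCS[node] if word.endswith(n)), None)
    return None


for word in ["psa", "domami", "kawy", "xyz"]:
    print(word, analyse(word))
```

Since the arcs live in a table separate from the rules, the grouping and ordering can be changed without editing any rule, which is the kind of experimentation the independent control component is meant to allow.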
    <Paragraph position="9"> The behaviour of the system is easily observable by a non-programmer (in linguistic terms such as rules, endings, etc.). The set of rules may be adjusted to a given application, especially for small systems with specialised dictionaries. The independent control component allows one to experiment with different rule groupings in the search for minimal resource usage. Grouping according to the concluded syntactic category may allow the system to exploit syntactic expectations provided by the parser. So far, we have succeeded in incorporating only the most popular deflection rules (about 600 of them). We are going to incorporate some additional phonetic rules to take care of alterations. This could hopefully diminish the number of deflection rules.</Paragraph>
  </Section>
</Paper>