<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1014"> <Title>A High-level Morphological Description Language Exploiting Inflectional Paradigms</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> A number of researchers have proposed the use of inheritance in representing aspects of natural language (e.g., [HUDSON84], [EVANS89], [DAELEMANS90], [PUSTEJOVSKY91]). The work described here is most similar in spirit to the work of [CALDER89] and [RUSSELL91], who also apply principles of defeasible inheritance to the domain of computational morphology. Calder's word formation rules make use of string equations, an elegant and powerful declarative device which, while more expressive than our (deliberately) constrained word formation and orthographic rules, may be less amenable to efficient compilation and appears geared towards an in-memory lexicon. By disallowing recursion in our form rules, limiting each form rule to at most one affixation operation, and encoding directionality within our orthographic patterns, we are able to compile rules into transition networks in a straightforward manner, reducing the need for extensive run-time unification. In our experience to date, these language limitations have not interfered with the concise capture of morphological behavior. Indeed, our separation of orthographic rules and form rules allows us to capture orthographic generalizations that Calder (1989) cannot. 
Furthermore, whereas Calder's system &quot;disallows the possibility of inheritance of partial derived string forms,&quot; we have found that the inheritance of intermediate stems contributes considerably to the descriptive power of our formalism.</Paragraph> <Paragraph position="1"> Russell et al. ([RUSSELL91]) have developed language extensions to the PATR-II style unification grammar formalism which allow for multiple default inheritance in the description of lexical entries. Multiple inheritance is a useful tool for partitioning syntactic, semantic, and morphological classes of behavior. However, while we have encountered occasional cases in which a word appears to derive variants from multiple paradigms, we have so far opted to preserve the simplicity of a single inheritance hierarchy in PDL, utilizing extra lexical stems to accommodate such variants when they arise.</Paragraph> <Paragraph position="2"> Byrd and Tzoukermann ([BYRD88]) note that their French word grammar contains 165 verb stem rules and another 110 affix rules, and they question the relative value of storing rules versus inflected forms. This is a concern of ours as well, as we wish to minimize the number of run-time &quot;false alarms&quot;, potential stems generated during morphological analysis which do not actually exist in the lexicon. Our model of the French verb inflections uses 81 form rules and 17 orthographic rules. We have tried to design our paradigms to minimize the number of inflected stems that must be stored in the lexicon, while at the same time avoiding rules that would contribute to a proliferation of false alarms during analysis. 
We believe that the use of lexically overridable intermediate forms is a key to striking this balance.</Paragraph> <Paragraph position="3"> For the purpose of acquiring morphological information about unknown words in a corpus, however, it is useful to have a single canonical form (citation form) for each paradigm, from which all inflected forms in the paradigm can be derived. Thus we have opted to extend our language with the notion of &quot;acquisition-only&quot; paradigms. These paradigms are essentially the same as those used for recognition; however, they include extra form rules (typically stem change rules) to reduce all lexical stems within a paradigm to a single citation stem. The inheritance provisions of PDL make it very easy to add such paradigms.</Paragraph> <!-- Proc. of COLING-92, Nantes, Aug. 23-28, 1992 --> <Paragraph position="4"> However, any lemma created during the acquisition procedure using an acquisition-only paradigm must be mapped to its equivalent lemma based on the corresponding recognition-time paradigm. This involves generating the extra lexical stems required by the recognition-time paradigm, so that these stems, in addition to the citation stem, can be stored directly in the lexicon.</Paragraph> <Paragraph position="5"> Several traditionally problematic aspects of German morphology have proved problematic for our formalism as well, and we have adopted extensions to the language to accommodate them. Modeling the stem changes involving German &quot;Umlautung&quot; ([TROST90]) has required the addition of a variable mapping function to the specification of orthographic rules. German separable prefixes are handled via the use of an affix variable, which retains the value of the separable prefix for later unification with the separable-prefix feature of potential lexical stems. 
German compounding remains impossible to capture within our current form rules, as they are constrained to a single &lt;stem&gt; component. While we expect to store most compounds directly in the lexicon, we are looking into heuristics for analyzing compounds that minimize the number of probes needed into our secondary storage lexicon.</Paragraph> </Section> </Paper>