<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2182"> <Title>Formal Description of Multi-Word Lexemes with the Finite-State Formalism IDAREX</Title> <Section position="3" start_page="0" end_page="1036" type="metho"> <SectionTitle> 2 Restricted Variability of MWLs </SectionTitle> <Paragraph position="0"> Basically, we recognize four types of variability (see also Fleischer 1982, Brundage et al. 1992, Engelke 1994) that a description of MWLs, both for NLP and for human use, should cover. Though part of the variability of MWLs may follow from their semantic properties, as argued in recent work (e.g. Nunberg et al. 1994), it is difficult to establish such a relationship on a large scale, and many idiosyncratic characteristics of individual MWLs remain that need to be represented explicitly.</Paragraph> <Section position="1" start_page="0" end_page="1036" type="sub_section"> <SectionTitle> Morphological Variation </SectionTitle> <Paragraph position="0"> Morphological Variation: Particular words in the MWL may undergo certain inflections.</Paragraph> <Paragraph position="1"> For instance, in G:durchschlagender Erfolg ('sweeping success', lit. rubbing-off success), noun and adjective can vary in case and in number, and comparative and superlative forms are possible for the adjective, whereas G:grüne Welle ('phased traffic lights', lit. green wave) may vary only in case; it cannot vary in number or in adjective comparison without losing its idiomatic meaning.</Paragraph> <Paragraph position="2"> Lexical Variation: One or more words can be substituted by other terms without changing the overall meaning of the MWL.</Paragraph> <Paragraph position="3"> For instance, in F:perdre la tête ('to lose one's mind', lit. to lose the head), the noun can be substituted by la boule (lit. ball, coll. head) or les pédales (lit. pedals) without losing its idiomatic meaning, but not by la tronche (lit. slice, coll.
head).</Paragraph> <Paragraph position="4"> Modification: One of the MWL's constituents can be modified while preserving the idiomatic meaning. For instance, in G:den (schönen) Schein wahren ('to keep up appearances', lit. the (nice) pretence preserve) the presence or absence of the adjective does not change the meaning at all, whereas in G:das Handtuch werfen ('to throw in the towel', lit. the towel throw) any modification would evoke the literal meaning.</Paragraph> <Paragraph position="5"> Structural Flexibility: This includes phenomena such as passivization, topicalization, scrambling and raising constructions.</Paragraph> <Paragraph position="6"> For instance, whereas in German standard word order variation applies to all verbal MWLs, topicalisation of lexically fixed components is possible only rarely, as in G:den Vogel dabei hat dann Jan abgeschossen ('finally, Jan surpassed everyone', lit. the bird with it has then Jan shot).</Paragraph> </Section> </Section> <Section position="4" start_page="1036" end_page="1037" type="metho"> <SectionTitle> 3 IDAREX: Encoding Idioms As </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1036" end_page="1036" type="sub_section"> <SectionTitle> Regular EXpressions </SectionTitle> <Paragraph position="0"> The IDAREX formalism and the corresponding FSC Finite State Compiler have been developed at Rank Xerox Research Centre by L. Karttunen, P. Tapanainen and G. Valetto (see footnote 2).</Paragraph> </Section> <Section position="2" start_page="1036" end_page="1036" type="sub_section"> <SectionTitle> 3.1 Morphological Variation </SectionTitle> <Paragraph position="0"> Because IDAREX uses a two-level morphology, words can be described either by their base form at the lexical level or by an inflected form at the surface level.
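The two-level view can be illustrated with a short Python sketch. It is an illustration under simplifying assumptions, not the FSC implementation. Each analysed text word carries its surface string together with a lexical analysis (base form plus morphological tags). A single-word description constrains either the surface form or the base form plus a set of required tags. The class names Token and WordSpec and the tag names N, Sg and Pl are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Token:
    """An analysed text word, seen at both levels (hypothetical model)."""
    surface: str               # inflected form as it occurs in the text
    lemma: str                 # base form at the lexical level
    tags: FrozenSet[str]       # morphological features, e.g. {"N", "Sg"}


@dataclass(frozen=True)
class WordSpec:
    """Description of one word, constraining the surface or lexical level."""
    surface: Optional[str] = None           # exact inflected form only
    lemma: Optional[str] = None             # any inflection of this base form
    required: FrozenSet[str] = frozenset()  # features the token must carry

    def matches(self, tok: Token) -> bool:
        if self.surface is not None and tok.surface != self.surface:
            return False
        if self.lemma is not None and tok.lemma != self.lemma:
            return False
        # Every required feature must be present in the token's analysis.
        return self.required.issubset(tok.tags)


if __name__ == "__main__":
    sg = Token("Welle", "Welle", frozenset({"N", "Sg"}))
    pl = Token("Wellen", "Welle", frozenset({"N", "Pl"}))
    by_surface = WordSpec(surface="Welle")
    by_lemma = WordSpec(lemma="Welle", required=frozenset({"N"}))
    print(by_surface.matches(sg), by_surface.matches(pl))  # True False
    print(by_lemma.matches(sg), by_lemma.matches(pl))      # True True
```

A surface description fixes one inflected form. A lexical description admits every inflection whose analysis carries the required tags.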
The surface form is preceded by a colon and restricts occurrences of the word to exactly this form, e.g. :Welle</Paragraph> <Paragraph position="1"> Footnote 2: For a more detailed description of the formalism see Karttunen and Yampol (1993), Tapanainen (1994), Karttunen (1995), and Segond and Tapanainen (1995).</Paragraph> <Paragraph position="2"> The lexical form is followed by an IDAREX morphological variable specifying morphological features of the word, and a colon, e.g. durchschlagend A:</Paragraph> <Paragraph position="3"> This represents any occurrence of the word with the specified morphological properties. The morphological variable can be very general, such as 'A' for any adjectival use, or more specific, such as Abs0 for adjectives that may not be used in comparative form and Nsg to restrict nouns to the singular, as in grün Abs0: Welle Nsg: This way, the restricted morphosyntactic flexibility of MWLs can be expressed very elegantly.</Paragraph> </Section> <Section position="3" start_page="1036" end_page="1036" type="sub_section"> <SectionTitle> 3.2 Modification </SectionTitle> <Paragraph position="0"> MWL modifications with particular words are represented as optional expressions in parentheses, as in :den (:schönen) :Schein The definition of word-class variables makes it possible to express lexically unrestricted modifications of an MWL, such as the insertion of any number of adverbs (the Kleene star operator indicates that the item may occur any number of times): perdre V: ADV* :la :tête On the basis of simpler word-class variables, more complex ones may be defined for syntactic categories such as NP, ADVP or PP.</Paragraph> </Section> <Section position="4" start_page="1036" end_page="1037" type="sub_section"> <SectionTitle> 3.3 Lexical and Structural Variation </SectionTitle> <Paragraph position="0"> The formalism provides a set of RE operators to combine the descriptions of single words.
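Single-word descriptions can be combined into an MWL pattern, using optional items for modification and the Kleene star for unrestricted insertions. A small backtracking matcher in Python sketches this. It is an illustration under stated assumptions, not the finite-state compilation that FSC performs. Tokens are simplified to hypothetical (surface, lemma, word class) triples, and the helper names surf, lex and any_of_class are our own labels. The matcher checks whether a candidate token sequence matches the whole pattern.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical token model: (surface form, lemma, word class).
Token = Tuple[str, str, str]
Pred = Callable[[Token], bool]


def surf(form: str) -> Pred:
    """Like ':form': matches exactly this surface form."""
    return lambda t: t[0] == form


def lex(lemma: str, word_class: str) -> Pred:
    """Like 'lemma V:': any inflection of lemma in the given word class."""
    return lambda t: t[1] == lemma and t[2] == word_class


def any_of_class(word_class: str) -> Pred:
    """Like a word-class variable such as ADV: any word of that class."""
    return lambda t: t[2] == word_class


@dataclass(frozen=True)
class Item:
    pred: Pred
    quant: str = "1"  # "1": exactly once, "?": optional '( )', "*": Kleene star


def match(pattern: Sequence[Item], toks: Sequence[Token]) -> bool:
    """True if the entire token sequence is matched by the pattern."""

    def go(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(toks)  # pattern used up: all tokens consumed?
        item = pattern[i]
        # Can the current item consume token j? (j never exceeds len(toks).)
        ok = j != len(toks) and item.pred(toks[j])
        if item.quant == "1":
            return ok and go(i + 1, j + 1)
        if item.quant == "?":
            # Either take the optional word or skip it.
            return (ok and go(i + 1, j + 1)) or go(i + 1, j)
        # "*": consume one more token and stay on this item, or move on.
        return (ok and go(i, j + 1)) or go(i + 1, j)

    return go(0, 0)


# :den (:schönen) :Schein
den_schein = [Item(surf("den")), Item(surf("schönen"), "?"), Item(surf("Schein"))]
# perdre V: ADV* :la :tête
perdre = [Item(lex("perdre", "V")), Item(any_of_class("ADV"), "*"),
          Item(surf("la")), Item(surf("tête"))]

if __name__ == "__main__":
    print(match(den_schein, [("den", "der", "DET"), ("Schein", "Schein", "N")]))  # True
    print(match(perdre, [("perd", "perdre", "V"), ("vraiment", "vraiment", "ADV"),
                         ("la", "le", "DET"), ("tête", "tête", "N")]))  # True
```

A finite-state compiler such as FSC would instead turn a pattern like this into an automaton, which avoids backtracking altogether. The recursive version only spells out what '( )' and '*' mean over a sequence of analysed words.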
Square brackets '[ ]' and the bar '|' are used to describe lexical variants and alternations of more complex sequences, such as word order variation in German.</Paragraph> <Paragraph position="1"> For instance, for the French example above we … In addition, IDAREX allows the definition of macros to capture generalisations on the syntactic level. Any position in the macro that we want to instantiate differently for each use is indicated by a numbered parameter such as $1. Instantiations of parameters can be single words in lexical or surface form, variables, operators or other macros.</Paragraph> <Paragraph position="2"> For example, instead of explicitly writing the complicated RE above, we define a word order macro WOV1Arg that may be used for all German verbal MWLs having no additional idiom-external arguments: WOV1Arg:</Paragraph> <Paragraph position="4"> In addition, we define auxiliary macros fix(i), because we want to instantiate the parameter $1, which stands for the lexically fixed components of the MWL, with expressions of variable length: fix5: $1 $2 $3 $4 $5; fix2: $1 $2; etc.</Paragraph> <Paragraph position="5"> Using this word order macro, the MWLs den (schönen) Schein wahren and die Ohren spitzen ('to prick up one's ears', lit. the ears sharpen) can now both be expressed very simply according to the same schema as</Paragraph> <Paragraph position="7"> Further macros are defined for German for MWLs with a reflexive or particle verb, to express scrambling of an idiom-external PP complement or topicalisation. In French, macros describe, for example, the verb complex for MWLs involving a reflexive verb.</Paragraph> </Section> </Section> </Paper>