<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1018"> <Title>cation. A progress report.&quot; in: Ross Steele / Terry Threadgold (Eds.): Language</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Lexical knowledge for multilingual generation </SectionTitle> <Paragraph position="0"> Within a multilingual generation system, it seems necessary to keep the dictionary as modular as possible, separating information that pertains to different levels of linguistic description 3. We assume that the system's lexical knowledge is stored in the following types of &quot;specialized dictionaries&quot;:
* semantic: inventory of possible lexicalizations of a concept in a given language;
* syntactic: one inventory of realization classes per language, providing information about number, type and realization of the arguments of a given lexeme;
* morphological: one inventory of inflectional classes per language.</Paragraph> <Paragraph position="1"> Since none of these levels of description is completely independent, the dictionaries should be linked to each other by means of cross-references and reference to class membership. Templates and mechanisms allowing for explicit inheritance of shared properties, e.g. redundancy rules, will be used within each of the layers. These mechanisms give access to the knowledge about the linguistic &quot;behaviour&quot; of lexemes needed in the process of lexicalization 4.</Paragraph> <Paragraph position="2"> 3 For more details on the dictionary structure see \[HEID/MOMMA 1989\].</Paragraph> <Paragraph position="3"> 2 Approaches to the description of collocations</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Contributions from lexicography </SectionTitle> <Paragraph position="0"> The tradition of British Contextualism 5 defines collocations on the basis of statistical assumptions about the probability of the cooccurrence of two lexemes.
Particularly frequent combinations of lexical units are regarded as collocations.</Paragraph> <Paragraph position="1"> A more detailed definition can be found in the work of Franz Josef Hausmann (1985:119): &quot;One partner determines, another is determined. In other words: collocations have a basis and a cooccurring collocate.&quot; 6 This determination manifests itself in so far as a given basis does not allow all of the collocates that would be possible according to general semantic cooccurrence conditions, but only a certain subset: in French, retenir son admiration, retenir sa haine, sa joie are possible, but *retenir son désespoir is not. The choice of collocates depends strongly on the lexeme that has been chosen as the basis; knowledge about possible collocations can be only partly derived from knowledge about general semantic properties of lexemes. Therefore general cooccurrence rules or selectional restrictions (e.g. using semantic markers) are not adequate for the choice of collocates in the process of lexicalization.</Paragraph> <Paragraph position="2"> 4 Possibly including classifications according to semantically motivated lexeme classes and a modelling of paradigmatic relations between lexemes, such as hyponymy or synonymy. 5 The term &quot;collocation&quot; was introduced into linguistic discussion by John R. Firth (1951:94). 6 Translation by the authors. We use the terms basis and collocate in the sense of \[HAUSMANN 1985\]; HAUSMANN's original terms are Basis and Kollokator.</Paragraph> <Paragraph position="3"> These considerations lead to two proposals for the structuring of the lexical knowledge used in a generator:
* Heuristic for the lexicalization process: &quot;First the basis is lexicalized, then the collocate, depending on which lexeme has been chosen as the basis.&quot;
* Knowledge about the possibility of combining lexemes in collocations should be stored in the lexicalization dictionary (where lexicalization candidates for concepts are provided), and specifically in the entries for the bases.</Paragraph> <Paragraph position="4"> The following table shows in terms of categories 7 what can be a possible collocate for a particular basis 8:
basis: possible collocates
noun: noun, verb, adjective
verb: adverb
adjective: adverb
7 Unlike British Contextualism (cf. the recent \[SINCLAIR 1987\]) we assume that bases and collocates are of one of the following categories: noun, verb, adjective or adverb.</Paragraph> <Paragraph position="5"> 8 For substantive-verb-collocations, the classification as basis and collocate is opposed to the usual syntactic description according to head and modifier; this has consequences for the lexicalization process: while it is usually possible to first lexicalize the heads of phrases, then the modifiers (e.g. substantive [head, basis] &lt; adjective [modifier, collocate]), the choice of verbs depends on their nominal complements (which are modifiers, but which have to be considered as bases of collocations). This means that nouns have to be lexicalized before verbs, e.g.
Pläne schmieden, but not *gute Vorsätze schmieden.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Lexical functions of the Meaning-Text-Theory as a tool for the description of collocations </SectionTitle> <Paragraph position="0"> In MTT, developed by Mel'čuk and coworkers, there exist about 60 &quot;lexical functions&quot; which describe regular dependencies between lexical units of a language. In MTT, lexical functions are understood as cross-linguistically constant operators (f), whose application to a lexeme (&quot;keyword&quot;, L) yields other lexemes (v). Mel'čuk (1984:6), (1988:31f) uses the following notation: f(L) = v. The result of the application of a lexical function to a given lexeme can be another &quot;one-word&quot; lexeme, or a collocation, an idiom or even an interjection.</Paragraph> <Paragraph position="1"> The parallelism between the collocation definition used in this paper and the notion of lexical function is that both start from the principle that collocates depend upon the respective bases (in MTT, v is a function of L). Therefore lexical functions seem to be a useful device for the description of collocations in a generation lexicon.</Paragraph> <Paragraph position="2"> In the following, we only consider lexical functions which, when applied to a lexeme word, yield collocations 9; Table 1 gives some examples of such lexical functions, together with a definitional gloss, taken from \[STEELE/MEYER 1988\] 10. 9 It should be investigated to what extent the category of v is predictable for every f, according to the category of L. For instance, fs of groups 1 and 2 specified in the table below, applied to nouns, yield substantive+verb-collocations, those of groups 3 and 4 yield substantive+adjective-collocations, and those of groups 5 and 6 return substantive+substantive-collocations.
10 Lexical functions of group 2 normally occur together with those from group 1; ABLE only occurs in combination with other lexical functions.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Generating Collocations </SectionTitle> <Paragraph position="0"> We propose that every lexeme entry in the lexicalization dictionary contains slots for lexical functions, whose fillers are possible collocates; within a slot/filler-notation such as the one used in Polygloss, a (partial) lexical entry, e.g. for problem, could be represented in the following way: [...] It might be possible to predict the types of lexical functions applicable to a given lexeme from its membership in a semantic class. Syntactic properties of bases and collocates are accessible through reference to the realization lexicon.</Paragraph> <Paragraph position="1"> \[MEL'CUK/POLGUERE 1987\]:271f themselves stress the advantage of describing collocations with lexical functions within language generation and machine translation: they give the example of OPER (*QUESTION*) 11, realized as [...]</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Lexicon structure and possible generalizations </SectionTitle> <Paragraph position="0"> On the basis of the analysis of some entries in \[MEL'CUK et al. 
1984\] and of material we have analysed within Polygloss 12, it seems possible to generalize over some regularities in collocation formation for members of semantically homogeneous lexeme classes.</Paragraph> <Paragraph position="1"> 11 Here *QUESTION* refers to a concept that stands for the language-specific items. 12 Manuals for PC-Networks that have been provided in machine-readable form in German and French by IBM; cf. \[RAAB 1988\].</Paragraph> <Paragraph position="2"> An example: the following default assumptions can be made for nouns expressing information handled by a computer (we assume semantic classes *I-NOUNSG* and *I-NOUNSF* for [...] Some exceptions, however, have to be stated explicitly, as illustrated by the example of French nouns expressing personal attitudes, treated in \[MEL'CUK et al. 1984\]: *PA* = { admiration, colère, désespoir, enthousiasme, envie, étonnement, haine, joie, mépris, respect }</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 The generation of paraphrases </SectionTitle> <Paragraph position="0"> One of the aims in the development of the &quot;how-to-say&quot;-component of a generation system is to ensure that variants (i.e. true paraphrases) can be generated for one and the same semantic structure.</Paragraph> <Paragraph position="1"> This involves two types of knowledge: more 'static' knowledge about interchangeability of realization variants (synonymous items, information about paraphrase relations between certain constructions or between collocations) and more 'procedural' knowledge about heuristics guiding the choice between candidates. The 'static' knowledge should be represented declaratively.</Paragraph> <Paragraph position="2"> It can be divided into information about syntactic variants (e.g. participle form vs. relative clause) and information about lexicalization variants.
In [...]</Paragraph> <Paragraph position="4"> [...] express paraphrase relations between certain types of collocations. Ideally these rules can be set up for pairs of lexical functions, without consideration of concrete lexemes. Examples: [...] John was enthused by this discovery.</Paragraph> <Paragraph position="5"> Within a generation system, such descriptions can be used to state paraphrase relations between collocational lexicalization candidates. The choice between candidates depends on parameters, amongst which the following ones seem to be essential:
* syntactic &quot;behaviour&quot; of the lexemes building up a collocation 13
  - in relation to roles in the frame structure to be realized;
  - in relation to the thematic structure of the intended utterance;
* [...] (&quot;avoid repetition&quot;, &quot;avoid deep embedding&quot; etc.)
13 We plan to investigate to what extent it is possible to describe the syntactic form of certain collocations with general rules. This is possible e.g. for OPER, FUNC, LABOR, i.e. for lexical functions yielding collocations of the type of &quot;Funktionsverbgefüge&quot;.
In the following, we give an example for the lexicalization possibilities that can be described with the proposed device: given the following (rudimentary) semantic representation 14: mental process : *BE-HAPPY*</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> :BEARER *PIERRE* :CAUSE *NEWS*, </SectionTitle> <Paragraph position="0"> there should be available the following information about collocations with joie as a basis 15: mettre qn en joie, remplir qn de joie = la joie s'empare de qn, la joie saisit qn, la joie naît dans le coeur de qn = qn se met en joie. The choice between INCEP and CAUS depends on whether (and how) the causality is to be expressed.
The choice between INCEP OPER and INCEP FUNC depends on whether the realization of *PIERRE* or of *NEWS* should become the subject.</Paragraph> <Paragraph position="1"> Here constraints caused by the syntax of the utterance to be generated play an important role: in a relative clause, for example, the antecedent has already been introduced. This fact limits the choice: -- se mit en joie (= CAUS FUNC). This example shows that the heuristic &quot;lexicalize bases first, then collocates&quot; interacts with constraints stemming e.g. from syntax; these constraints can also be produced by a text structuring component (decisions about topic, thematic order etc.). The modular design of the lexicon supports generation of variants by giving access to all information needed at the appropriate choice points. 14 mental process is meant to be a concept type; :BEARER and :CAUSE are semantic relations; *BE-HAPPY*, *PIERRE* and *NEWS* are concepts. 15 In simplified notation. The first two examples are roughly equivalent to English make someone happy, fill someone with joy, the latter ones to please someone.</Paragraph> </Section> </Paper>