<?xml version="1.0" standalone="yes"?> <Paper uid="W94-0311"> <Title>Semantic Lexicons: the Cornerstone for Lexical Choice in Natural Language Generation</Title> <Section position="3" start_page="0" end_page="92" type="metho"> <SectionTitle> 2 The Issue of Lexical Choice </SectionTitle> <Paragraph position="0"> There is a debate in NLG concerning the place of lexical choice in the generation process. Should lexical choice take place at the level of the &quot;planning component&quot; or the &quot;realization component&quot;? Even for generators which do not have a &quot;traditional&quot; two-component architecture, actions are still sequential and lexical choice takes place after some &quot;planning&quot;.</Paragraph> <Paragraph position="1"> Lexical choice relates to lexicalization in the sense of not only needing to pick up the right words or expressions but also of needing to &quot;realize&quot; them or lexicalize them. We would argue on one hand that lexicalization does not constitute an autonomous module within the process of generation, and on the other hand that lexical choice is not the sole prerogative of either the &quot;planning&quot; or the &quot;realization&quot; component. The reason is that a concept cannot be seen in isolation (the choice of a particular concept will trigger some other related concepts) and when lexicalized, the syntactico-semantics of the lexical item will impose some constraints on the further possible choice of concepts to be lexicalized (thus constraining the set of concepts triggered by the previous one). 
In other words, in the process of production, a lexical choice can influence a conceptual choice and vice versa.</Paragraph> <Paragraph position="2"> Thus, in terms of NLG, lexical choice has some influence at the level of both &quot;planning&quot; and &quot;realization&quot;.</Paragraph> <Section position="1" start_page="91" end_page="91" type="sub_section"> <SectionTitle> 7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994 </SectionTitle> <Paragraph position="0"> Moreover, if we want to generate in an incremental way, it follows that a strict distinction between these two components can no longer hold, and that we must attempt either to bridge the gaps between them (Meteer, 1992) or to generate in a partly parallel fashion.</Paragraph> <Paragraph position="1"> In this paper, we take the view that lexical semantics should be integrated into the design of the lexicon used in an NLG system, in order to perform the right lexicalizations. We define lexicalization as a complex dynamic process by which we find the appropriate lexicalized items for utterances, in order to fulfill communicative goals. 
In fact, we think that we move back and forth between concepts and lexical items, and we believe that it is through incremental (re)lexicalizations and (re)conceptualizations that we arrive at well-formed linguistic realizations (Viegas, 1993).</Paragraph> <Paragraph position="2"> In the following, after a brief overview of the issue of lexical choice, we focus on the treatment of collocations, which poses the problem of complex lexicalizations and motivates the need to take into account, in the process of lexicalizing, several concepts and several lexical items at once.</Paragraph> </Section> <Section position="2" start_page="91" end_page="91" type="sub_section"> <SectionTitle> 2.1 Different Approaches </SectionTitle> <Paragraph position="0"> Roughly speaking, the issue of lexical choice has been investigated mainly along two different lines: a conceptual-based approach (mainly in the AI tradition) and a linguistic-based approach. 1 Despite these efforts, lexical choice remains a burning issue. We agree with McKeown and Swartout (1988) when they say that: &quot;... 
 a truly satisfactory theoretical approach for lexical choice has yet to be developed.&quot; However, like some leading researchers in generation, we argue that it is of paramount importance to know first what kind of information should be coded in the lexicon, which means paying more attention to &quot;the nature of words&quot; (McDonald, 1988) and having a &quot;real knowledge of [the] lexical semantics&quot;, as was pointed out by Marcus (1987): &quot;In some important sense, [the] systems have no real knowledge of lexical semantics ....</Paragraph> <Paragraph position="1"> They use fragments of linguistic structure which eventually have words as their frontiers, but have little or no explicit knowledge of what these words mean.&quot; In this article, we will not review the issue of lexical choice; it is enough to say that the lexical semantic component for lexical representation is still basically unused and that there is a need to tackle that issue if we want to give some new and promising impetus to the study of lexical choice.</Paragraph> <Paragraph position="2"> 1 Robin's report (1990) presents a good survey on &quot;Lexical Choice in NLG&quot;. See also Reiter (1991) and Nogier and Zock (1992) for a comprehensive study of the evolution of the field.</Paragraph> </Section> <Section position="3" start_page="91" end_page="92" type="sub_section"> <SectionTitle> 2.2 The Treatment of Collocations </SectionTitle> <Paragraph position="0"> There is much divergence of opinion on just what the defining criteria for collocations are. 
One can minimally define a collocation as the distribution of an object or element in relation to other objects or elements, as dictionaries do; needless to say, apart from remaining vague, at best this does not provide any clue for finding them operationally.</Paragraph> <Paragraph position="1"> There are three main approaches to the study of collocations, namely, lexicographic, statistical and linguistic: in each of these, the term collocation is used differently.</Paragraph> <Paragraph position="2"> The traditional approach to collocations has been lexicographic. Here dictionaries provide information about what is unpredictable or idiosyncratic. Benson (1989) synthesizes Hausmann's studies on collocations (Hausmann, 1979), calling expressions such as commit murder, compile a dictionary, inflict a wound, etc. &quot;fixed combinations, recurrent combinations&quot; or &quot;collocations&quot;. In Hausmann's terms a collocation is composed of two elements, a base (&quot;Basis&quot;) and a collocate (&quot;Kollokator&quot;); the base is semantically autonomous whereas the collocate cannot be semantically interpreted in isolation. In other words the set of lexical collocates which can combine with a given base is not predictable and collocations must therefore be listed in dictionaries.</Paragraph> <Paragraph position="3"> In recent years, there has been a resurgence of statistical approaches applied to the study of natural languages. Sinclair (1991) states that &quot;a word which occurs in close proximity to a word under investigation is called a collocate of it ... Collocation is the occurrence of two or more words within a short space of each other in a text&quot;. 
The problem is that with such a definition of collocations, even when improved, one identifies not only collocations but also free-combining pairs that frequently appear together, such as lawyer-client or doctor-hospital, as pointed out by Smadja (1993).</Paragraph> <Paragraph position="4"> There has been no real focus on collocations from a linguistic perspective. The lexicon has been broadly sacrificed by both English-speaking schools and continental European schools. The scientific agenda of the former was, until recently, largely dominated by syntactic issues, whereas the latter were more concerned with pragmatic aspects of natural languages.</Paragraph> <Paragraph position="5"> The focus has been largely on grammatical collocations such as adapt to, aim at, look for. Lakoff (1970) distinguishes a class of expressions which cannot undergo certain operations, such as nominalization or causativization: the problem is hard; *the hardness of the problem; *the problem hardened.</Paragraph> </Section> <Section position="4" start_page="92" end_page="92" type="sub_section"> <SectionTitle> 7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994 </SectionTitle> <Paragraph position="0"> Restrictions on the application of certain syntactic operations can help define collocations such as hard problem, for example. One specific proposal for how to treat collocations in a linguistic model is developed in Mel'čuk's work on lexical functions (Mel'čuk, 1988).</Paragraph> <Paragraph position="1"> In this theory, lexical knowledge is encoded in an entry of the Explanatory Combinatorial Dictionary, each entry being divided into three zones: the semantic zone (a semantic network representing the meaning of the entry in terms of more primitive words), the syntactic zone (the grammatical properties of the entry) and the lexical combinatorics zone (containing the values of the Lexical Functions (LFs)) 2. 
LFs are central to the study of collocations and can be defined as follows: a lexical function F is a correspondence which associates a lexical item L, called the key word of F, with a set of lexical items F(L), the value of F (Mel'čuk, 1988).</Paragraph> <Paragraph position="2"> The LF Magn, for example, applies to different categories to deliver collocational values, expressing an intensity:</Paragraph> <Paragraph position="4"> The Mel'čukian approach is very interesting as it provides a model of production well suited for generation, with its different strata, and also a lot of lexical-semantic information. It nevertheless suffers from three main problems (Heylen et al., 1993). First, all the collocational information must be listed in a static way, because the theory does not provide any predictable calculus of the possible expressions which can collocate with each other semantically. Second, it is sometimes difficult to assign the right lexical functions to newly analyzed lexical items; if we take the example of assigning an LF to an Adj-Noun structure, it involves knowing something about the semantic relation which exists between adjective and noun. 
Bloksma et al. (1993) state that &quot;It is precisely this information which in many cases proves extremely difficult to establish, simply because it is just not entirely clear what semantic processes are involved in the union of adjective and noun&quot;.</Paragraph> <Paragraph position="5"> Finally, sometimes LFs are too general to be useful, as shown in the following examples:</Paragraph> <Paragraph position="7"> In these cases, superscripts and subscripts are needed to restrict the scope of the LF: they enhance the precision of the LFs, making them sensitive to meaning aspects of the lexical items on which they operate, thus constraining overgeneration of multiple values; yet this also shows that the set of LFs described is not sufficient.</Paragraph> <Paragraph position="8"> 2 See Iordanskaja et al. (1991) and Ramos et al. (1994) concerning the use of MTT (Meaning-Text Theory) and of LFs in NLG, respectively.</Paragraph> <Paragraph position="9"> By contrast, our general thesis is that there is no single definition of what a collocation is; rather, collocational behavior emerges from a theory of what the range of connections and relations between lexical items can be. We claim that much of the allegedly idiosyncratic and language-specific collocation in language is in fact predictable from a sufficiently rich theory of lexical organization. This is not to say that there is no need for specific lexical encoding of some idioms and phrases, but that there is seldom any attempt made to bridge the gap between conventional semantic selection and the peripheral phenomena of collocations and fixed expressions. We will make the distinction between the following kinds of combinations: Free-Combining Words: (a big stick; a wonderful man; there is an old man at the door) Semantic Collocations: (a fast car; a long book; to start a car) Idiosyncratic Lexical Co-occurrences: (a heavy smoker vs. un grand fumeur (French); un grand/gros mangeur (French) vs. 
un gran/*gordo comelon (Spanish)) Idioms: (to kick the bucket; take advantage of). Formally, this takes us from the purely compositional constructions of &quot;free-combining words&quot; to the non-compositional structures of idioms. The vast space between these two extremes can still be explained in terms of compositional principles, with mechanisms from GLT such as type coercion and subselection (Pustejovsky, 1991, 1993), as we shall see below. Idiosyncrasies, of course, should be listed in the lexicon, yet we believe that we can reduce the set of what are conventionally considered idiosyncrasies by differentiating &quot;true&quot; idiosyncrasies (which cannot be derived or generated) from expressions which, since they are compositional in nature, behave predictably, and which we call semantic collocations.</Paragraph> </Section> </Section> <Section position="4" start_page="92" end_page="93" type="metho"> <SectionTitle> 3 Generative Lexicon Theory </SectionTitle> <Paragraph position="0"> The Generative Lexicon Theory (GLT) (Pustejovsky, 1991, 1994c) can be said to take advantage of both linguistic and conceptual approaches, providing a framework which arose from the integration of linguistic studies and of techniques found in AI. 
</Paragraph> <Section position="1" start_page="93" end_page="93" type="sub_section"> <SectionTitle> 7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994 </SectionTitle> <Paragraph position="0"> GLT can be briefly characterized as a system which involves four levels of representation, connected by a set of generative devices accounting for a compositional interpretation of words in context, namely: the argument structure, which specifies the predicate argument structure for a word and the conditions under which the variables map to syntactic expressions; the event structure, giving the particular event types such as S (state), P (process) or T (transition); the qualia structure, distributed among four roles, FORM (formal), CONST (constitutive), TELIC and AGENT (agentive); and the inheritance structure, which involves two different kinds of mechanisms: (a) the fixed inheritance mechanism, which is basically a fixed network of the traditional isa relationship found in AI, enriched with the different roles of the qualia structure; (b) the projective inheritance mechanism, which can be intuitively characterized as a way of triggering semantically related concepts which define for each role the projective conclusion space (PCS).</Paragraph> <Paragraph position="1"> For instance, in the PCS of the telic and agentive roles of book we will find at least the following predicates: read, reissue, annotate, ... and write, print, bind, ... (respectively) 3.</Paragraph> <Paragraph position="2"> The most important of the generative devices connecting these four levels is a semantic operation called type coercion, which &quot;captures the semantic relatedness between syntactically distinct expressions&quot; (Pustejovsky, 1994a). Another notion introduced is that of lexical conceptual paradigms (LCPs), as formalized in (Pustejovsky, 1994b). 
We will say that the aim of an LCP is to capture the conceptual regularities across languages in terms of cognitive invariants, like &quot;physical-object&quot;, &quot;aperture&quot;, &quot;natural kind&quot; and alternations such as &quot;container/containee&quot;, etc. Moreover, the possible syntactic projections are associated with LCPs. For instance, one can say &quot;I left a leaflet in/inside the book at the page I want you to read&quot; as book is an information-phys_obj-container, whereas one cannot say &quot;I put the book in the top of the table&quot; as &quot;the top of the table&quot; is a surface and not a container 4.</Paragraph> <Paragraph position="3"> In the following, we will focus on two basic mechanisms of GLT, which allow us to bridge the word usage gap, that is, on a scale of lexical specificity, from free-combining words to idioms. These are: (1) Reference to the qualia structure: By giving ev[...]cific semantic functions, we are encoding the &quot;semantic basis&quot; of word usage information with a lexical item. This gives rise to semantic collocations. (2) Cospecification: This is the basic means of encoding specific usage information in the form of either coherent argument subtypes or already lexicalized phrases, giving rise to idiosyncrasies and idioms, respectively.</Paragraph> <Paragraph position="4"> [...] categorization in relation with cognition.</Paragraph> </Section> </Section> <Section position="5" start_page="93" end_page="96" type="metho"> <SectionTitle> 4 Adjectival Semantic Collocations within GLT </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="93" end_page="94" type="sub_section"> <SectionTitle> 4.1 The Semantics of Nominals </SectionTitle> <Paragraph position="0"> We illustrate these theoretical notions with some examples for nominals 5, paying particular attention to &quot;covert-relational nominals&quot; 6, that is, those exhibiting a logical polysemy. 
We only present partial entries, which nevertheless exhibit semantic information distributed among the qualia, thus allowing the prediction of semantic collocations, as will be shown in 4.2.</Paragraph> <Paragraph position="1"> We give some realizations for beer and writer and discuss their representations 7: South African Breweries Ltd., or SAB, the country's largest producer of beer, was hit by a strike at seven of its 11 breweries around the country.</Paragraph> <Paragraph position="2"> &quot;I am a beer-drinker with a running problem,&quot; one hash lapel button reads.</Paragraph> <Paragraph position="4"> Ms. Rifkind is a writer and editor living in New York.</Paragraph> <Paragraph position="5"> Mr. Ferguson is an editorial writer for Scripps Howard</Paragraph> <Paragraph position="7"> nominals, including nominalizations, see Pustejovsky and Anick (1988).</Paragraph> <Paragraph position="8"> 6 We use &quot;covert&quot; to differentiate traditional relational nominals (such as friend, father, cousin) from the class of nouns which exhibit a polysemous behaviour (such as book, door, record).</Paragraph> <Paragraph position="9"> 7 We mainly use the approach to typed feature structures as described in Carpenter (1992). We cannot develop here the way the information is inherited in the partial lexical entries presented.</Paragraph> </Section> <Section position="2" start_page="94" end_page="94" type="sub_section"> <SectionTitle> 7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994 </SectionTitle> <Paragraph position="0"> The argument structure of nouns encodes arguments which are to be taken as logical parameters providing type information for lexical items, as discussed in Pustejovsky (1994a). The predicates &quot;drink&quot;, &quot;produce&quot;, and &quot;write&quot; are the defaults we find in the qualia of beer and writer respectively. 
It is still possible to create the semantic space which these predicates belong to, through the projective inheritance mechanism.</Paragraph> <Paragraph position="1"> In the cases of covert-relational nominals, exhibiting semantic polysemy, we argue that they actually have well-defined calculi. If we look at examples (1): (1) a. This book is heavy to carry around. (physical object)</Paragraph> <Paragraph position="2"> b. I read an angry book. (text) c. This book is great! (text and/or physical object) (1a) and (1b) illustrate the polysemy between the physical object and the notion of text, whereas (1c) can refer to either one or both aspects within the same sentence.</Paragraph> <Paragraph position="3"> Traditional approaches, from transformational grammars to classical Montague grammars, account for this lexical ambiguity by postulating different entries per lexical item. These fail to capture the core semantics of the lexical items, leaving the complementary 8 senses unrelated. Following Pustejovsky (1994b) we suggest that covert relational nominals should have a relational structure, thus capturing polysemy within the lexical structure.</Paragraph> <Paragraph position="4"> For the purpose of clarity we only give a partial representation of book below:</Paragraph> <Paragraph position="6"> Briefly, this states that book inherits from the relational information-physical_object-container-LCP, although imposing additional constraints of its own, represented here as the arguments, namely ARG1 and ARG2. Moreover, we have specified two defaults for the telic and agentive roles, each referring to one aspect of book, either text or physical_object. The 8 Weinreich (1964) makes the distinction between contrastive and complementary ambiguity. 
A noun such as record exhibits the former type between the readings written statement of facts and gramophone record or disc, and the latter between the complementary interpretations of physical object and musical content.</Paragraph> <Paragraph position="7"> sorts publisher, writer, printer are organized hierarchically with individual as a common super-type.</Paragraph> <Paragraph position="8"> This nominal representation enables us to capture all the complementary nominal &quot;polysemous&quot; senses as expressed in the sentences: The writer began his third book (writing); my sister began &quot;The Bostonians&quot; (reading); the binder finished the books for CUP (binding), etc. The values of these qualia are typed and are accessible to semantic operations in composition with other phrases. One aspect of nominal representation to be captured with this formalism is the paradigmatic behavior of a lexical item in the syntax; capturing it helps in understanding the processes involved in lexical selection tasks. In the next section, we address the issue of selection within the NP, and show the utility of having qualia structure associated with nouns and adjectives for compositional purposes, focusing on semantic collocations.</Paragraph> </Section> <Section position="3" start_page="94" end_page="95" type="sub_section"> <SectionTitle> 4.2 Adj-Noun Interpretation </SectionTitle> <Paragraph position="0"> Within the approach taken here, adjectives, depending on their types, will be able to modify not only the arguments of the argument structure of the nouns (ARGSTR), but also the arguments inside the agentive and the telic roles. As the information in the qualia is specific to the noun and as the same adjective can modify different roles, it is possible to deal with the polysemous behavior of adjectives and to provide a generative explanation of semantic collocations.</Paragraph> <Paragraph position="1"> Very briefly, an adjective selects for a particular type, an event or an object. 
When it modifies an object, it selects for a particular semantic type (person, vehicle, information, etc.). When it takes an event, it can be restricted to a special type (process, event, transition) or role (agentive or telic). If the noun does not have in its argument structure the type required by the adjective, generative mechanisms can exploit the richness of typing of the qualia and generate the required type (Pustejovsky, 1994a), if it is available in the qualia and if common sense knowledge is respected. In this case, the adjective will only modify one part of the qualia (i.e. of the denotation) of the noun.</Paragraph> <Paragraph position="2"> Consider, for example, the French adjectives intelligent (clever) and triste (sad) in (2). We give, for each example, the English literal translation (lit. tr.): (2) a. un homme intelligent/triste; (lit. tr. a clever/sad man) b. des yeux intelligents/tristes; (lit. tr. clever/sad eyes) → eyes which show the cleverness/sadness of the person in question c. un livre intelligent/triste; (lit. tr. clever/sad book) → book which shows the cleverness/sadness of the person who writes the book</Paragraph> </Section> <Section position="4" start_page="95" end_page="96" type="sub_section"> <SectionTitle> 7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994 </SectionTitle> <Paragraph position="0"> d. un livre intelligent/triste → book which causes the *cleverness/sadness of the person who reads it e. un sapin triste; (lit. tr. a sad fir-tree) → *fir-tree that causes the sadness of the person who . . .</Paragraph> <Paragraph position="1"> f. une voiture triste; (lit. tr. a sad car) → *car that causes the sadness of the person who constructs it g. une robe triste; (lit. tr. a sad dress) → *dress that causes the sadness of the person who wears it These adjectives select for an object of type person (as shown in (2a)): triste [ARG1 = [1] person, QUALIA = [change_state-LCP, FORM = triste($e^{S}$, [1])]]
In (2b-d), despite the apparent violation of types, the modification is possible, because the qualia of the noun make explicit different relations between the type person selected by the adjective (Type-Adj) and the noun (N), as:</Paragraph> <Paragraph position="3"> It must be clear that this kind of modification is only possible if the relations are defined in the qualia.</Paragraph> <Paragraph position="4"> The sentence (2e), for example, is semantically difficult, as the word sapin, as a natural kind, has no telic or agentive roles (independently of particular contexts). The modification must also respect very general common sense knowledge: in (2d) and (2g), the readings *a book that causes the cleverness of the person who reads it (2d) and *a dress that causes the sadness of the person who wears it (2g) are blocked by common sense principles, like: (i) cleverness cannot be communicated, unlike sadness; (ii) there must be a direct causal link between the event expressed in the telic/agentive role and the sadness of the individual. In our societies, no such link relates sadness to wearing a particular dress or building a car.</Paragraph> <Paragraph position="5"> Take now the case of long. This adjective, in one of its senses, modifies an event of type transition, whose temporal duration it indicates, as shown in the examples (3): (3) a. le long voyage (the long trip) b. un long livre (a long book) → whose reading/writing is long It will therefore receive the following entry:</Paragraph> <Paragraph position="7"> (3b) is therefore possible because events are defined in the qualia of the noun livre. 
Again, un long sapin has no event reading, because there is no event available in the qualia of the noun sapin.</Paragraph> <Paragraph position="8"> The adjectives ancient and former are also event submodifiers, distinguished by the role they modify.</Paragraph> <Paragraph position="9"> Ancient is a relative adjective that submodifies the agentive role of the modified noun: ancient [FORM = ...]</Paragraph> <Paragraph position="10"> In this view, ancient stories (in example (3)) are stories which were narrated in the past, so: $\mathit{distant\_past}(e^{T}) \land \mathit{narrate}(e^{T}, x, \mathit{stories})$ By contrast, the English adjective former is a property modifier and can only modify the telic role of the noun: former [QUALIA = [change_state-LCP, FORM = past($e^{P}$)]] A former architect is a person who performed his job in the past 9, so:</Paragraph> <Paragraph position="11"> $\mathit{past}(e^{P}) \land \mathit{perform\_the\_job\_of\_architect}(e^{P}, x)$ In French, two adjectives with the same meaning, past, can modify these two roles: ancien and vieux, which will receive the following feature structure (which does not deal with the absolute sense):</Paragraph> </Section> <Section position="5" start_page="96" end_page="96" type="sub_section"> <SectionTitle> 7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994 </SectionTitle> <Paragraph position="0"> That is not to say that these two adjectives will be ambiguous in context. 
We show elsewhere (Bouillon and Viegas, 1994) that the interpretation of the adjective can be influenced by the context or by morphological and syntactic constraints such as the position of the French adjective, the type of the determiner or the typography (hyphen or quotes).</Paragraph> <Paragraph position="1"> Within this approach, semantic collocations can therefore be computed in the same way as other Adj-Noun constructions and do not need to be listed in the dictionary.</Paragraph> </Section> </Section> <Section position="6" start_page="96" end_page="96" type="metho"> <SectionTitle> 5 Perspectives for NLG </SectionTitle> <Paragraph position="0"> With GLT, we can dynamically generate the set of possible semantic collocations. This can be done incrementally, as we make available the set of possible choices at run-time, a set which will be constrained by the situational and/or contextual environment.</Paragraph> <Paragraph position="1"> Suppose that we are generating Adj-Noun constructions from logical forms. From a structure like the following: $\exists y, x, e^{T}\ [\mathit{livre}(y) \land \mathit{lire}(e^{T}, x, y) \land \mathit{long}(e^{T})]$ we can generate two expressions: the non-collocational one un livre long à lire (lit. tr. a book long to read) and the collocational one un long livre (a long book), because the entries of the noun and the adjective in GLT specify that this combination is possible.</Paragraph> <Paragraph position="2"> In contrast, we will not be able to generate from the logical form below une robe triste (a sad dress) with the meaning of a dress which makes me sad, because this NP is blocked by the common sense principles mentioned in the previous section.</Paragraph> <Paragraph position="3"> $\exists y, x, e^{T}, e^{S}\ [\mathit{robe}(y) \land \mathit{porter}(e^{T}, x, y) \land \mathit{causer}(e^{T}, e^{S}) \land \mathit{triste}(e^{S}, x)]$ That is not to say that we can generatively predict all collocations. Take the examples of Adj-Noun collocations involving grand and gros with nouns denoting activities: (4) a. un grand/gros mangeur (a big eater) b. 
un grand/gros fraudeur (a big smuggler) c. un *grand/gros client (a big client) d. un grand/*gros fumeur (a heavy smoker) e. un grand/*gros professeur (a great professor) Here, grand and gros are intensifiers of the predicate in the telic. Un grand fumeur, for example, will receive the following interpretation: $\lambda x[\mathit{fumeur}(x) \ldots [\mathrm{Telic}(x) = \lambda y \lambda e^{P}[\mathit{fumer}(e^{P}, x, y{:}\,\mathit{tabac}) \land \mathit{grand}(e^{P})]]]$ We can predict that gros is an intensifier of the quantitative aspect of the predicate while grand will modify both qualitative (4e) and quantitative aspects (4abcd), depending on the salience of these aspects in the predicate (we can assume that a professor is generally judged by the quality of his courses, while a smoker is judged by the quantity of the smoking). What we cannot do is predict which adjective will be preferred for the quantitative aspects.</Paragraph> <Paragraph position="4"> To deal with this set of idiosyncratic lexical co-occurrences and idioms, we must take the concept of collocational information a step further, with a theory of cospecification. This takes advantage of linguistic, statistical and lexicographic approaches (see 2.2), but also adds the dimension of semantic typing, focusing on collocations as they relate to sortal selection.</Paragraph> <Paragraph position="5"> For instance, the cospecification associated with the predicate we find in the telic of book, namely read, has encoded sortal pairs, providing the privileged environment (or associations) for that word:</Paragraph> <Paragraph position="7"> In the cases of grand fumeur versus gros mangeur, we know that the telic roles of fumeur and mangeur (fumer and manger) are predicates denoting activities of type process, on which we can apply a scale (très peu ... beaucoup ... énormément ...). The adjective which will express a point on the scale with a specific noun will be specified in the cospecifications (as below). 
In fact, both grand and gros can generally be understood, sometimes with a clear preference for one of them, depending on the term being modified. This preference is modelled as a partial ordering ($\sqsubseteq$) over a type hierarchy $\langle \mathit{Cospec}, \sqsubseteq \rangle$, encoded in the cospecifications.</Paragraph> <Paragraph position="9"/> </Section> </Paper>
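<!--
Editorial sketch, not part of the original paper. The two mechanisms discussed in Sections 4.2 and 5 (an event-selecting adjective such as "long" needs an event in the qualia of the noun; the choice between "grand" and "gros" is read off the cospecifications) can be illustrated roughly as follows. Everything here is an assumption made for illustration: the Noun class, the LEXICON table, the three function names, and the reduction of the partial ordering to integer ranks are hypothetical, and "écrire" is our French rendering of the agentive default "write" given for book. Only the predicates lire, fumer and manger and the acceptability judgments for fumeur and mangeur come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Noun:
    lemma: str
    telic: Optional[str] = None      # default predicate of the TELIC role
    agentive: Optional[str] = None   # default predicate of the AGENTIVE role
    # Cospecification, reduced here to a rank per intensifying adjective
    # (lower rank = preferred; equal ranks = no stated preference).
    cospec: Dict[str, int] = field(default_factory=dict)


# Toy entries (hypothetical encoding of the examples discussed in the paper).
LEXICON = {
    "livre": Noun("livre", telic="lire", agentive="écrire"),
    "sapin": Noun("sapin"),  # natural kind: no telic or agentive role
    "fumeur": Noun("fumeur", telic="fumer", cospec={"grand": 0}),
    "mangeur": Noun("mangeur", telic="manger", cospec={"grand": 0, "gros": 0}),
}


def event_readings(noun: Noun) -> List[Tuple[str, str]]:
    """Return the (role, predicate) pairs an event-selecting adjective can modify."""
    roles = [("telic", noun.telic), ("agentive", noun.agentive)]
    return [(role, pred) for role, pred in roles if pred is not None]


def licenses_event_adjective(noun_lemma: str) -> bool:
    """An adjective such as 'long' needs at least one event in the noun's qualia."""
    return bool(event_readings(LEXICON[noun_lemma]))


def preferred_intensifiers(noun_lemma: str) -> List[str]:
    """Return the top-ranked intensifiers listed in the noun's cospecification."""
    cospec = LEXICON[noun_lemma].cospec
    if not cospec:
        return []
    best = min(cospec.values())
    return sorted(adj for adj, rank in cospec.items() if rank == best)
```

Under these assumptions, licenses_event_adjective("livre") holds because lire and écrire are available in the qualia, while licenses_event_adjective("sapin") fails, mirroring "un long livre" versus the missing event reading of "un long sapin". The common sense filtering that blocks "une robe triste" is deliberately left out of the sketch.
-->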