<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1055"> <Title>Lexical Choice Criteria in Language Generation</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In natural language generation (NLG), a semantic representation of some kind -- possibly enriched with pragmatic attributes -- is successively transformed into one or more linguistic utterances. No matter what particular architecture is chosen to organize this process, one of the crucial decisions to be made is lexicalization: selecting words that adequately express the content that is to be communicated and, if represented, the intentions and attitudes of the speaker. Nirenburg and Nirenburg [1988] give this example to illustrate the lexical choice problem: If we want to express the meaning &quot;a person whose sex is male and whose age is between 13 and 15 years&quot;, then candidate realizations include: boy, kid, teenager, youth, child, young man, schoolboy, adolescent, man. The criteria influencing such choices remain largely in the dark, however.</Paragraph> <Paragraph position="1"> As it happens, the problem of lexical choice has not been a particularly popular one in NLG. For instance, Marcus [1987] complained that most generators don't really choose words at all; McDonald [1991], amongst others, lamented that lexical choice has attracted very little attention in the research community. Implemented generators tend to provide a one-to-one mapping from semantic units to lexical items, so that the task of lexical choice becomes a nonissue; their producers occasionally acknowledge this as a shortcoming (e.g., [Novak, 1991, p. 666]). 
For many applications, this is indeed a feasible scheme, because the sub-language under consideration can be sufficiently restricted such that a direct mapping from content to words does not present a drawback -- the generator is implicitly tailored towards the type of situation (or register) in which it operates. But in general, with an eye on more expressive and versatile generators, this state of affairs calls for improvement.</Paragraph> <Paragraph position="2"> Why is lexical choice difficult? Unlike many other decisions in generation (e.g., whether to express an attribute of an object as a relative clause or an adjective), the choice of a word very often carries implicatures that can change the overall message significantly -- if in some sentence the word boy is replaced with one of the alternatives above, the meaning shifts considerably. Also, often there are quite a few similar lexical options available to a speaker, whereas the number of possible syntactic sentence constructions is more limited. To solve the choice problem, first of all the differences between similar words have to be represented in the lexicon, and the criteria for choosing among them have to be established. In the following, I give a tentative list of choice criteria, classify them into constraints and preferences, and outline a (partly implemented) model of lexicalization that can be incorporated into language generators.</Paragraph> </Section> </Paper>