File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/j02-4007_abstr.xml
Size: 3,213 bytes
Last Modified: 2025-10-06 13:42:23
<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-4007">
<Title>Human Variation and Lexical Choice</Title>
<Section position="2" start_page="0" end_page="546" type="abstr">
<SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> A major task in natural language generation (NLG) is lexical choice, that is, choosing lexemes (words) to communicate to the reader the information selected by the system's content determination module. From a semantic perspective, lexical choice algorithms are based on models of word meanings, which state when a word can and cannot be used; of course, lexical choice algorithms may also consider syntactic constraints and pragmatic features when choosing words.</Paragraph>
<Paragraph position="1"> Such models assume that it is possible to specify what a particular word means to a particular user. However, both the cognitive science literature and recent experiments carried out in the SUMTIME project at the University of Aberdeen, of which the current authors are a part, suggest that this may be difficult to do because of variation among people, that is, because the same word may mean different things to different people.</Paragraph>
<Paragraph position="2"> More precisely, although people may agree at a rough level about what a word means, they may disagree about its precise definition and, in particular, about which objects or events the word can be applied to. This means that it may be impossible, even in principle, to specify precise word meanings for texts with multiple readers, and indeed for texts with a single reader, unless the system has access to an extremely detailed user model.</Paragraph>
<Paragraph position="3"> A corpus study in our project also showed that there were differences in which words individuals used (in the sense that some words were used only by a subset of the authors) and in how words were orthographically realized (spelled).</Paragraph>
<Paragraph position="4"> This suggests that it may be risky for NLG systems (and indeed human authors) to depend for communicative success on the reader interpreting words exactly as the system intends. This in turn suggests that NLG systems should perhaps be cautious in using very detailed lexical models, and that it may be useful to add some redundancy to texts in case the reader does not interpret a word as expected.</Paragraph>
<Paragraph position="5"> This is especially true in applications in which each user reads only one generated text; if users read many generated texts, then perhaps over time they will learn about and adapt to the NLG system's lexical usage. Human variability also needs to be taken into account by natural language processing (NLP) researchers performing corpus analyses; such analyses should not assume that everyone uses identical rules when making linguistic decisions.</Paragraph>
<Paragraph position="6"> * Department of Computing Science, University of Aberdeen, Aberdeen AB24 3UE, UK. E-mail: ereiter@csd.abdn.ac.uk
† Department of Computing Science, University of Aberdeen, Aberdeen AB24 3UE, UK. E-mail:</Paragraph>
</Section>
</Paper>