File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/j02-2001_abstr.xml
Size: 6,121 bytes
Last Modified: 2025-10-06 13:42:23
<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-2001">
<Title>Near-Synonymy and Lexical Choice</Title>
<Section position="2" start_page="0" end_page="107" type="abstr">
<SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> A word can express a myriad of implications, connotations, and attitudes in addition to its basic "dictionary" meaning. And a word often has near-synonyms that differ from it solely in these nuances of meaning. So, in order to find the right word to use in any particular situation--the one that precisely conveys the desired meaning and yet avoids unwanted implications--one must carefully consider the differences between all of the options. Choosing the right word can be difficult for people, let alone for present-day computer systems.</Paragraph>
<Paragraph position="1"> For example, how can a machine translation (MT) system determine the best English word for the French bévue when there are so many possible similar but slightly different translations? The system could choose error, mistake, blunder, slip, lapse, boner, faux pas, boo-boo, and so on, but the most appropriate choice is a function of how bévue is used (in context) and of the difference in meaning between bévue and each of the English possibilities. Not only must the system determine the nuances that bévue conveys in the particular context in which it has been used, but it must also find the English word (or words) that most closely convey the same nuances in the context of the other words that it is choosing concurrently. An exact translation is probably impossible, for bévue is in all likelihood as different from each of its possible translations as they are from each other. That is, in general, every translation possibility will omit some nuance or express some other, possibly unwanted, nuance. Thus, faithful translation requires a sophisticated lexical-choice process that can determine which of the near-synonyms provided by one language for a word in another language is the closest or most appropriate in any particular situation. More generally, a truly articulate natural language generation (NLG) system also requires a sophisticated lexical-choice process: the system must be able to reason about the potential effects of every available option.</Paragraph>
<Paragraph position="2"> Consider, too, the possibility of a new type of thesaurus for a word processor that, instead of merely presenting the writer with a list of similar words, actually assists the writer by ranking the options according to their appropriateness in context and to how well they meet general preferences set by the writer. Such an intelligent thesaurus would greatly benefit many writers and would be a definite improvement over the simplistic thesauri in current word processors.</Paragraph>
<Paragraph position="3"> What is needed is a comprehensive computational model of fine-grained lexical knowledge.</Paragraph>
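To make the kind of ranking such an intelligent thesaurus would perform concrete, here is a minimal sketch in Python. The nuance inventory, the numeric strengths, and the mismatch measure are all invented for illustration; they are not the representations developed in this article.

```python
# Hypothetical sketch: rank near-synonyms of "error" against a writer's
# stated preferences. The nuance annotations and values are invented for
# illustration only.

NEAR_SYNONYMS = {
    # word: {nuance: strength in [0, 1]}
    "error":   {"formality": 0.6, "severity": 0.5, "blame": 0.4},
    "mistake": {"formality": 0.4, "severity": 0.4, "blame": 0.5},
    "blunder": {"formality": 0.3, "severity": 0.8, "blame": 0.8},
    "slip":    {"formality": 0.3, "severity": 0.2, "blame": 0.2},
    "boo-boo": {"formality": 0.0, "severity": 0.1, "blame": 0.1},
}

def rank(preferences, candidates=NEAR_SYNONYMS):
    """Order candidate words by how little their nuance profile
    deviates from the writer's preferred profile."""
    def mismatch(word):
        nuances = candidates[word]
        return sum(abs(nuances.get(n, 0.0) - want)
                   for n, want in preferences.items())
    return sorted(candidates, key=mismatch)

# A writer wants a formal word for a mild, largely blameless error:
print(rank({"formality": 0.9, "severity": 0.2, "blame": 0.1}))
# -> ['slip', 'error', 'boo-boo', 'mistake', 'blunder']
```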
<Paragraph position="4"> Yet although synonymy is one of the fundamental linguistic phenomena that influence the structure of the lexicon, it has been given far less attention in linguistics, psychology, lexicography, semantics, and computational linguistics than the equally fundamental and much-studied polysemy. Whatever the reasons--philosophy, practicality, or expedience--synonymy has often been thought of as a "non-problem": either there are synonyms, but they are completely identical in meaning and hence easy to deal with, or there are no synonyms, in which case each word can be handled like any other. But our investigation of near-synonymy shows that it is just as complex a phenomenon as polysemy and that it inherently affects the structure of lexical knowledge.</Paragraph>
<Paragraph position="5"> The goal of our research has been to develop a computational model of lexical knowledge that can adequately account for near-synonymy and to deploy such a model in a computational process that could "choose the right word" in any situation of language production. Upon surveying current machine translation and natural language generation systems, we found none that performed this kind of genuine lexical choice. Although major advances have been made in knowledge-based models of the lexicon, present systems are concerned more with structural paraphrasing and a level of semantics allied to syntactic structure. None captures the fine-grained meanings of, and differences between, near-synonyms, nor the myriad of criteria involved in lexical choice. Indeed, the theories of lexical semantics upon which present-day systems are based do not even account for indirect, fuzzy, or context-dependent meanings, let alone near-synonymy. And, frustratingly, no one yet knows how to implement, in a computational system, the theories that do more accurately predict the nature of word meaning (for instance, those in cognitive linguistics; see Hirst [1995]).</Paragraph>
<Paragraph position="6"> In this article, we present a new model of lexical knowledge that explicitly accounts for near-synonymy in a computationally implementable manner. The clustered model of lexical knowledge groups each set of near-synonyms under a common, coarse-grained meaning and provides a mechanism for representing finer-grained aspects of denotation, attitude, style, and usage that differentiate the near-synonyms in a cluster. We also present a robust, efficient, and flexible lexical-choice algorithm based on the approximate matching of lexical representations to input representations. The model and algorithm are implemented in a sentence-planning system called I-Saurus, and we give some examples of its operation.</Paragraph>
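As a rough sketch of how a clustered representation and approximate matching could fit together: the class layout, nuance labels, and scoring function below are our own simplifications for illustration, not the actual representations or algorithm used in I-Saurus.

```python
# Simplified sketch of the clustered model: one cluster holds a set of
# near-synonyms under a shared coarse-grained denotation, and each member
# carries fine-grained distinctions (denotation, attitude, style, usage).
# All names, fields, and scores here are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class WordEntry:
    """A near-synonym plus its fine-grained distinctions, encoded here
    as nuance labels with strengths in [0, 1]."""
    word: str
    distinctions: dict = field(default_factory=dict)

@dataclass
class Cluster:
    """Near-synonyms grouped under one coarse-grained core denotation."""
    core_denotation: str
    members: list

    def choose(self, preferences):
        """Approximately match each member's representation against the
        input preferences and return the best-scoring member."""
        def score(entry):
            return sum(1.0 - abs(entry.distinctions.get(d, 0.0) - want)
                       for d, want in preferences.items())
        return max(self.members, key=score)

# Hypothetical cluster for the "error" near-synonyms:
error_cluster = Cluster(
    core_denotation="a deviation from what is correct",
    members=[
        WordEntry("error",   {"style:formality": 0.6, "attitude:blame": 0.4}),
        WordEntry("blunder", {"style:formality": 0.3, "attitude:blame": 0.8,
                              "denotation:severity": 0.8}),
        WordEntry("slip",    {"style:formality": 0.3, "denotation:severity": 0.2}),
    ],
)

# An NLG system (or an MT system translating "bévue") states what it
# wants to convey, and the cluster returns the closest member:
best = error_cluster.choose({"attitude:blame": 0.9, "denotation:severity": 0.8})
print(best.word)  # -> "blunder"
```
</Section>
</Paper>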