<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-2001">
  <Title>(c) 2002 Association for Computational Linguistics. Near-Synonymy and Lexical Choice</Title>
  <Section position="5" start_page="111" end_page="115" type="metho">
    <SectionTitle>
8 A rigorous justification of this point would run to many pages, especially for near-synonyms. For
</SectionTitle>
    <Paragraph position="0"> example, it would have to be argued that the verb sleep and the adjective asleep are not merely near-synonyms that just happen to differ in their syntactic categories, even though the sentences Emily sleeps and Emily is asleep are synonymous or nearly so.</Paragraph>
    <Paragraph position="1">  Computational Linguistics Volume 28, Number 2 MISTAKE, ERROR. Fehler is a definite imperfection in a thing which ought not to be there. In this sense, it translates both mistake and error. Irrtum corresponds to mistake only in the sense of 'misunderstanding', 'misconception', 'mistaken judgment', i.e. which is confined to the mind, not embodied in something done or made. [footnote:] Versehen is a petty mistake, an oversight, a slip due to inadvertence. Missgriff and Fehlgriff are mistakes in doing a thing as the result of an error in judgment.</Paragraph>
    <Paragraph position="2"> Figure 2 An entry (abridged) from Dictionary of German Synonyms (Farrell 1977).</Paragraph>
    <Paragraph position="3"> impair (3) blunder, error bévue (3-2) blunder (due to carelessness or ignorance) faux pas (3-2) mistake, error (which affects a person adversely socially or in his/her career, etc) bavure (2) unfortunate error (often committed by the police) bêtise (2) stupid error, stupid words gaffe (2-1) boob, clanger Figure 3 An entry (abridged) from Using French Synonyms (Batchelor and Offord 1993). The parenthesized numbers represent formality level from 3 (most formal) to 1 (least formal). of lexical knowledge is not very well understood as yet. Lexicographers, for instance, whose job it is to categorize different uses of a word depending on context, resort to using mere "frequency" terms such as sometimes and usually (as in Figure 1). Thus, we cannot yet make any claims about the influence of context on near-synonymy. In summary, to account for near-synonymy, a model of lexical knowledge will have to incorporate solutions to the following problems: * The four main types of variation are qualitatively different, so each must be separately modeled.</Paragraph>
    <Paragraph position="4"> * Near-synonyms differ in the manner in which they convey concepts, either with emphasis or indirectness (e.g., through mere suggestion rather than denotation).</Paragraph>
    <Paragraph position="5"> * Meanings, and hence differences among them, can be fuzzy.</Paragraph>
    <Paragraph position="6"> * Differences can be multidimensional. Only for clarity in our above explication of the dimensions of variation did we try to select examples that highlighted a single dimension. However, as Figure 1 shows, blunder and mistake, for example, actually differ on several denotational dimensions as well as on stylistic and attitudinal dimensions.</Paragraph>
    <Paragraph position="7">  * Differences are not just between simple features but involve concepts that relate roles and aspects of the situation.</Paragraph>
    <Paragraph position="8"> * Differences often depend on the context.</Paragraph>
    <Paragraph position="9"> 3. Near-Synonymy in Computational Models of the Lexicon  Clearly, near-synonymy raises questions about fine-grained lexical knowledge representation. But is near-synonymy a phenomenon in its own right warranting its own [Figure 4 caption: A simplistic hierarchy of conceptual schemata with connections to their lexical entries for English and German.]</Paragraph>
    <Paragraph position="10"> special account, or does it suffice to treat near-synonyms the same as widely differing words? We will argue now that near-synonymy is indeed a separately characterizable phenomenon of word meaning.</Paragraph>
    <Paragraph position="11"> Current models of lexical knowledge used in computational systems, which are based on decompositional and relational theories of word meaning (Katz and Fodor 1963; Jackendoff 1990; Lyons 1977; Nirenburg and Defrise 1992; Lehrer and Kittay 1992; Evens 1988; Cruse 1986), cannot account for the properties of near-synonyms. In these models, the typical view of the relationship between words and concepts is that each element of the lexicon is represented as a conceptual schema or a structure of such schemata. Each word sense is linked to the schema or the conceptual structure that it lexicalizes. If two or more words denote the same schema or structure, all of them are connected to it; if a word is ambiguous, subentries for its different senses are connected to their respective schemata. In this view, then, to understand a word in a sentence is to find the schema or schemata to which it is attached, disambiguate if necessary, and add the result to the output structure that is being built to represent the sentence. Conversely, to choose a word when producing an utterance from a conceptual structure is to find a suitable set of words that "cover" the structure and assemble them into a sentence in accordance with the syntactic and pragmatic rules of the language (Nogier and Zock 1992; Stede 1999).</Paragraph>
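The word-concept linking just described can be sketched concretely. The toy lexicon and routines below are purely illustrative (all schema names and sense labels are invented); they show only the shape of the model: understanding is schema lookup plus disambiguation, and generation is covering a conceptual structure with words.

```python
# A minimal sketch of the conventional word-to-concept linking described
# above. All names (schemata, sense labels) are illustrative, not drawn
# from any actual system.

LEXICON = {
    # word sense: the concept schema it lexicalizes
    "error": "Generic-Error",
    "mistake": "Generic-Error",   # words denoting one schema all link to it
    "bank_1": "Financial-Institution",
    "bank_2": "River-Edge",       # ambiguity: one subentry per sense
}

def understand(word):
    """Find the schema(ta) a surface word is attached to; disambiguation
    in context would then pick one of these."""
    senses = [w for w in LEXICON if w == word or w.startswith(word + "_")]
    return [LEXICON[s] for s in senses]

def choose_words(conceptual_structure):
    """Generation: find a set of words that 'cover' the input concepts."""
    chosen = []
    for concept in conceptual_structure:
        candidates = [w for w, c in LEXICON.items() if c == concept]
        if candidates:
            chosen.append(candidates[0])  # naive choice among covering words
    return chosen
```

Note that under this view `error` and `mistake` are indistinguishable once chosen, which is exactly the limitation the section goes on to discuss.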
    <Paragraph position="12"> A conceptual schema in models of this type is generally assumed to contain a set of attributes or attribute-value pairs that represent the content of the concept and differentiate it from other concepts. An attribute is itself a concept, as is its value. The conceptual schemata are themselves organized into an inheritance hierarchy, taxonomy, or ontology; often, the ontology is language-independent, or at least language-neutral, so that it can be used in multilingual applications. Thus, the model might look [Figure 5 caption: One possible hierarchy for the various English and French words for untrue assertions. Adapted from Hirst (1995).]</Paragraph>
    <Paragraph position="13"> like the simplified fragment shown in Figure 4. In the figure, the rectangles represent concept schemata with attributes; the arrows between them represent inheritance. The ovals represent lexical entries in English and German; the dotted lines represent their connection to the concept schemata.</Paragraph>
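The attribute-value schemata and inheritance hierarchy just described can be sketched as follows. The ontology fragment and its attribute names are hypothetical, loosely echoing the error example used later in the paper.

```python
# Sketch of concept schemata as attribute-value structures organized in
# an inheritance hierarchy. Concept and attribute names are invented.

ONTOLOGY = {
    "Thing":         {"parent": None, "attrs": {}},
    "Activity":      {"parent": "Thing", "attrs": {"actor": "Person"}},
    "Generic-Error": {"parent": "Activity",
                      "attrs": {"deviation-from": "Proper-Course"}},
}

def inherited_attrs(concept):
    """Collect a schema's attributes, including those inherited from its
    ancestors; a child's own value shadows an inherited one."""
    attrs = {}
    while concept is not None:
        node = ONTOLOGY[concept]
        for name, value in node["attrs"].items():
            attrs.setdefault(name, value)  # keep the most specific value
        concept = node["parent"]
    return attrs
```

Here `inherited_attrs("Generic-Error")` yields both its own `deviation-from` slot and the `actor` slot inherited from `Activity`.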
    <Paragraph position="14">  Following Frege's (1892) or Tarski's (1944) truth-conditional semantics, the concept that a lexical item denotes in such models can be thought of as a set of features that are individually necessary and collectively sufficient to define the concept. Such a view greatly simplifies the word-concept link. In a text generation system, for instance, the features amount to the necessary applicability conditions of a word; that is, they have to be present in the input in order for the word to be chosen. Although such models have been successful in computational systems, they are rarely pushed to represent near-synonyms. (The work of Barnett, Mani, and Rich [1994] is a notable exception; they define a relation of semantic closeness for comparing the denotations of words and expressions; see Section 9.) They do not lend themselves well to the kind of fine-grained and often fuzzy differentiation that we showed earlier to be found in near-synonymy, because, in these models, except as required by homonymy and absolute synonymy, there is no actual distinction between a word and a concept: each member of a group of near-synonyms must be represented as a separate concept schema (or group of schemata) with distinct attributes or attribute values. For example, Figure 5 shows one particular classification of the fib group of near-synonyms in English and French.</Paragraph>
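The necessary-and-sufficient-features view implies a simple test for lexical choice: a word is applicable only if all of its defining features appear in the input. A minimal sketch, with invented feature names loosely based on the fib group mentioned above:

```python
# Sketch of lexical choice under a necessary-and-sufficient-features
# view of denotation. Feature names are illustrative only.

WORD_FEATURES = {
    "lie": {"untrue-assertion", "intent-to-deceive"},
    "fib": {"untrue-assertion", "intent-to-deceive", "trivial"},
}

def applicable(word, input_features):
    """A word's defining features are its necessary applicability
    conditions: all must be present in the input."""
    return WORD_FEATURES[word].issubset(input_features)
```

Under this regime, choosing fib over lie requires the input to contain the extra feature `trivial`; a fuzzy or merely suggested nuance has no place in the test.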
    <Paragraph position="15">  A similar proliferation of concepts would be required for various error clusters (as shown earlier in Figures 1, 2, and 3).</Paragraph>
    <Paragraph position="16"> 9 This outline is intended as a syncretism of many models found in the interdisciplinary literature and is not necessarily faithful to any particular one. For examples, see the papers in Evens (1988) (especially Sowa [1988]) and in Pustejovsky and Bergler (1992) (especially Nirenburg and Levin [1992], Sowa [1992], and Burkert and Forster [1992]); for a theory of lexico-semantic taxonomies, see Kay (1971). For a detailed construction of the fundamental ideas, see Barsalou (1992); although we use the term schema instead of frame, despite Barsalou's advice to the contrary, we tacitly accept most elements of his model. For bilingual aspects, see Kroll and de Groot (1997).</Paragraph>
    <Paragraph position="17"> 10 We do not claim that a bilingual speaker necessarily stores words and meanings from different languages together. In this model, if the concepts are taken to be language independent, then it does not matter if one overarching hierarchy or many distinct hierarchies are used. It is clear, however, that cross-linguistic near-synonyms do not have exactly the same meanings and so require distinct concepts in this model.</Paragraph>
    <Paragraph position="18">  Edmonds and Hirst Near-Synonymy and Lexical Choice Although some systems have indeed taken this approach (Emele et al. 1992), this kind of fragmentation is neither easy nor natural nor parsimonious. Hirst (1995) shows that even simple cases lead to a multiplicity of nearly identical concepts, thereby defeating the purpose of a language-independent ontology. Such a taxonomy cannot efficiently represent the multidimensional nature of near-synonymic variation, nor can it account for fuzzy differences between near-synonyms. And since the model defines words in terms of only necessary and sufficient truth-conditions, it cannot account for indirect expressions of meaning and for context-dependent meanings, which are clearly not necessary features of a word's meaning.</Paragraph>
    <Paragraph position="19"> Moreover, a taxonomic hierarchy emphasizes hyponymy, backgrounding all other relations, which appear to be more important in representing the multidimensional nature of fine-grained word meaning. It is not even clear that a group of synonyms can be structured by hyponymy, except trivially (and ineffectively) as hyponyms all of the same concept.</Paragraph>
    <Paragraph position="20"> The model also cannot easily or tractably account for fuzzy differences or the full-fledged concepts required for representing denotational variation. First-order logic, rather than the description logic generally used in ontological models, would at least be required to represent such concepts, but reasoning about the concepts in lexical choice and other tasks would then become intractable as the model was scaled up to represent all near-synonyms.</Paragraph>
    <Paragraph position="21"> In summary, present-day models of the lexicon have three kinds of problems with respect to near-synonymy and fine-grained lexical knowledge: the adequacy of coverage of phenomena related to near-synonymy; engineering, both in the design of an efficient and robust lexical choice process and in the design of lexical entries for near-synonyms; and the well-known issues of tractability of reasoning about concepts during natural language understanding and generation.</Paragraph>
    <Paragraph position="22"> Nevertheless, at a coarse grain, the ontological model does have practical and theoretical advantages in efficient paraphrasing, lexical choice, and mechanisms for inference and reasoning. Hence, to build a new model of lexical knowledge that takes into account the fine-grainedness of near-synonymy, a logical way forward is to start with the computationally proven ontological model and to modify or extend it to account for near-synonymy. The new model that we will present below will rely on a much more coarsely grained ontology. Rather than proliferating conceptual schemata to account for differences between near-synonyms, we will propose that near-synonyms are connected to a single concept, despite their differences in meaning, and are differentiated at a subconceptual level. In other words, the connection of two or more words to the same schema will not imply synonymy but only near-synonymy. Differentiation between the near-synonyms--the fine tuning--will be done in the lexical entries themselves.</Paragraph>
  </Section>
  <Section position="6" start_page="115" end_page="117" type="metho">
    <SectionTitle>
4. Near-Synonymy and Granularity of Representation
</SectionTitle>
    <Paragraph position="0"> To introduce the notion of granularity to our discussion, we first return to the problem of defining near-synonymy.</Paragraph>
    <Paragraph position="1"> Semanticists such as Ullmann (1962), Cruse (1986), and Lyons (1995) have attempted to define near-synonymy by focusing on "propositional" meaning. Cruse, for example, contrasts cognitive synonyms and plesionyms; the former are words that, when intersubstituted in a sentence, preserve its truth conditions but may change the expressive meaning, style, or register of the sentence or may involve different idiosyncratic collocations, whereas intersubstituting the latter changes the truth conditions but still yields semantically similar sentences (e.g., misty : foggy). Although these definitions are important for truth-conditional semantics, they are not very helpful for us, because plesionymy is left to handle all of the most interesting phenomena discussed in Section 2. Moreover, a rigorous definition of cognitive synonymy is difficult to come up with, because it relies on the notion of granularity, which we will discuss below.</Paragraph>
    <Paragraph position="2"> Lexicographers, on the other hand, have always treated synonymy as near-synonymy. They define synonymy in terms of likeness of meaning, disagreeing only in how broad the definition ought to be. For instance, Roget followed the vague principle of "the grouping of words according to ideas" (Chapman 1992, page xiv). And in the hierarchical structure of Roget's Thesaurus, word senses are ultimately grouped according to proximity of meaning: "the sequence of terms within a paragraph, far from being random, is determined by close, semantic relationships" (page xiii). The lexicographers of Webster's New Dictionary of Synonyms define a synonym as "one of two or more words ... which have the same or very nearly the same essential meaning. ... Synonyms can be defined in the same terms up to a certain point" (Egan 1942, pages 24a-25a). Webster's Collegiate Thesaurus uses a similar definition that involves the sharing of elementary meanings, which are "discrete objective denotations uncolored by ... peripheral aspects such as connotations, implications, or quirks of idiomatic usage" (Kay 1988, page 9a). Clearly, the main point of these definitions is that near-synonyms must have the same essential meaning but may differ in peripheral or subordinate ideas. Cruse (1986, page 267) actually refines this idea and suggests that synonyms (of all types) are words that are identical in "central semantic traits" and differ, if at all, only in "peripheral traits." But how can we specify formally just how much similarity of central traits and dissimilarity of peripheral traits is allowed? That is, just what counts as a central trait and what as a peripheral trait in defining a word? To answer this question, we introduce the idea of granularity of representation of word meaning. 
By granularity we mean the level of detail used to describe or represent the meanings of a word. A fine-grained representation can encode subtle distinctions, whereas a coarse-grained representation is crude and glosses over variation. Granularity is distinct from specificity, which is a property of concepts rather than representations of concepts. For example, a rather general (unspecific) concept, say Human, could have, in a particular system, a very fine-grained representation, involving, say, a detailed description of the appearance of a human, references to related concepts such as Eat and Procreate, and information to distinguish the concept from other similar concepts such as Animal. Conversely, a very specific concept could have a very coarse-grained representation, using only very general concepts; we could represent a Lexicographer at such a coarse level of detail as to say no more than that it is a physical object.</Paragraph>
    <Paragraph position="3"> Near-synonyms can occur at any level of specificity, but crucially it is the fine granularity of the representations of their meanings that enables one to distinguish one near-synonym from another. Thus, any definition of near-synonymy that does not take granularity into account is insufficient. For example, consider Cruse's cognitive synonymy, discussed above. On the one hand, at an absurdly coarse grain of representation, any two words are cognitive synonyms (because every word denotes a "thing"). But on the other hand, no two words could ever be known to be cognitive synonyms, because, even at a fine grain, apparent cognitive synonyms might be further distinguishable by a still more fine-grained representation. [footnote:] What's the difference between a violin and a fiddle? No one minds if you spill beer on a fiddle. Thus, granularity is essential to the concept of cognitive synonymy, as which pairs of words are cognitive synonyms depends on the granularity with which we represent their propositional meanings. The same is true of Cruse's plesionyms. So in the end, it should not be necessary to make a formal distinction between cognitive synonyms and plesionyms.</Paragraph>
    <Paragraph position="4"> Both kinds of near-synonyms should be representable in the same formalism.</Paragraph>
    <Paragraph position="5"> By taking granularity into account, we can create a much more useful definition of near-synonymy, because we can now characterize the difference between essential and peripheral aspects of meaning. If we can set an appropriate level of granularity, the essential meaning of a word is the portion of its meaning that is representable only above that level of granularity, and peripheral meanings are those portions representable only below that level.</Paragraph>
    <Paragraph position="6"> But what is the appropriate level of granularity, the dividing line between coarse-grained and fine-grained representations? We could simply use our intuition--or rather, the intuitions of lexicographers, which are filtered by some amount of objectivity and experience. Alternatively, from a concern for the representation of lexical knowledge in a multilingual application, we can view words as (language-specific) specializations of language-independent concepts. Given a hierarchical organization of coarse-grained language-independent concepts, a set of near-synonyms is simply a set of words that all link to the same language-independent concept (DiMarco, Hirst, and Stede 1993; Hirst 1995). So in this view, near-synonyms share the same propositional meaning just up to the point in granularity defined by language dependence. Thus we have an operational definition of near-synonymy: If the same concept has several reasonable lexicalizations in different languages, then it is a good candidate for being considered a language-independent concept, its various lexicalizations forming sets of near-synonyms in each language.</Paragraph>
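The operational definition just given can be sketched as a table of word-concept links. The links below are illustrative, assembled loosely from the error examples elsewhere in the paper, and are not a claim about actual lexical data.

```python
# Sketch of the operational definition of near-synonymy: words of several
# languages that link to one coarse-grained concept. Links are invented
# for illustration.

LINKS = [
    ("en", "error",   "Generic-Error"),
    ("en", "mistake", "Generic-Error"),
    ("en", "blunder", "Generic-Error"),
    ("de", "Fehler",  "Generic-Error"),
    ("de", "Irrtum",  "Generic-Error"),
    ("fr", "gaffe",   "Generic-Error"),
    ("en", "hat",     "Headgear"),
]

def near_synonyms(concept, language):
    """Words of one language linked to the same language-independent
    concept form a set of near-synonyms."""
    return [w for (lang, w, c) in LINKS if lang == language and c == concept]

def good_candidate(concept, minimum_languages=2):
    """A concept with reasonable lexicalizations in several languages is a
    good candidate for a language-independent concept."""
    languages = {lang for (lang, w, c) in LINKS if c == concept}
    # true when the concept is lexicalized in at least minimum_languages
    return min(len(languages), minimum_languages) == minimum_languages
```

Here `Generic-Error` qualifies (three languages lexicalize it), while the single-language `Headgear` entry does not.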
    <Paragraph position="7">  Granularity also explains why it is more difficult to represent near-synonyms in a lexicon. Near-synonyms are so close in meaning, sharing all essential coarse-grained aspects, that they differ, by definition, in only aspects representable at a fine grain. And these fine-grained representations of differences tend to involve very specific concepts, typically requiring complex structures of more general concepts that are difficult to represent and to reason with. The matter is only made more complicated by there often being several interrelated near-synonyms with interrelated differences.</Paragraph>
    <Paragraph position="8"> On the other hand, words that are not near-synonyms--those that are merely similar in meaning (dog : cat) or not similar at all (dog : hat)--could presumably be differentiated by concepts at a coarse-grained, and less complex, level of representation.</Paragraph>
  </Section>
  <Section position="7" start_page="117" end_page="124" type="metho">
    <SectionTitle>
5. A Model of Fine-Grained Lexical Knowledge
</SectionTitle>
    <Paragraph position="0"> Our discussion of granularity leads us to a new model of lexical knowledge in which near-synonymy is handled on a separate level of representation from coarse-grained concepts.</Paragraph>
    <Section position="1" start_page="117" end_page="122" type="sub_section">
      <SectionTitle>
5.1 Outline of the Model
</SectionTitle>
      <Paragraph position="0"> Our model is based on the contention that the meaning of an open-class content word, however it manifests itself in text or speech, arises out of a context-dependent combination of a basic inherent context-independent denotation and a set of explicit differences [footnote:] EuroWordNet's Inter-Lingual-Index (Vossen 1998) links the synsets of different languages in such a manner, and Resnik and Yarowsky (1999) describe a related notion for defining word senses cross-lingually.</Paragraph>
      <Paragraph position="1"> to its near-synonyms. (We don't rule out other elements in the combination, but these are the main two.) Thus, word meaning is not explicitly represented in the lexicon but is created (or generated, as in a generative model of the lexicon [Pustejovsky 1995]) when a word is used. This theory preserves some aspects of the classical theories--the basic denotation can be modeled by an ontology--but the rest of a word's meaning relies on other nearby words and the context of use (cf. Saussure). In particular, each word and its near-synonyms form a cluster.</Paragraph>
      <Paragraph position="2">  The theory is built on the following three ideas, which follow from our observations about near-synonymy. First, the meaning of any word, at some level of granularity, must indeed have some inherent context-independent denotational aspect to it--otherwise, it would not be possible to define or "understand" a word in isolation of context, as one in fact can (as in dictionaries). Second, nuances of meaning, although difficult or impossible to represent in positive, absolute, and context-independent terms, can be represented as differences, in Saussure's sense, between near-synonyms. That is, every nuance of meaning that a word might have can be thought of as a relation between the word and one or more of its near-synonyms. And third, differences must be represented not by simple features or truth conditions, but by structures that encode relations to the context, fuzziness, and degrees of necessity.</Paragraph>
      <Paragraph position="3"> For example, the word forest denotes a geographical tract of trees at a coarse grain, but it is only in relation to woods, copse, and other near-synonyms that one can fully understand the significance of forest (i.e., that it is larger, wilder, etc.). The word mistake denotes any sort of action that deviates from what is correct and also involves some notion of criticism, but it is only in relation to error and blunder that one sees that the word can be used to criticize less severely than these alternatives allow. None of these differences could be represented in absolute terms, because that would require defining some absolute notion of size, wildness, or severity, which seems implausible. So, at a fine grain, and only at a fine grain, we make explicit use of Saussure's notion of contrast in demarcating the meanings of near-synonyms.</Paragraph>
      <Paragraph position="4"> Hence, the theory holds that near-synonyms are explicitly related to each other not at a conceptual level but at a subconceptual level--outside of the (coarser-grained) ontology. In this way, a cluster of near-synonyms is not a mere list of synonyms; it has an internal structure that encodes fine-grained meaning as differences between lexical entries, and it is situated between a conceptual model (i.e., the ontology) and a linguistic model.</Paragraph>
      <Paragraph position="5"> Thus the model has three levels of representation. Current computational theories suggest that at least two levels of representation, a conceptual-semantic level and a syntactic-semantic level, are necessary to account for various lexico-semantic phenomena in computational systems, including compositional phenomena such as paraphrasing (see, for instance, Stede's [1999] model). To account for fine-grained meanings and near-synonymy, we postulate a third, intermediate level (or a splitting of the conceptual-semantic level). Thus the three levels are the following: * A conceptual-semantic level.</Paragraph>
      <Paragraph position="6"> * A subconceptual/stylistic-semantic level.</Paragraph>
      <Paragraph position="7"> * A syntactic-semantic level.</Paragraph>
      <Paragraph position="8"> 13 It is very probable that many near-synonym clusters of a language could be discovered automatically by applying statistical techniques, such as cluster analysis, on large text corpora. For instance, Church et al. (1994) give some results in this area.</Paragraph>
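The footnote's suggestion, discovering candidate near-synonym clusters statistically, can be illustrated with a toy similarity grouping. The co-occurrence vectors below are fabricated for the example; a real system would derive them from corpus counts.

```python
# Toy sketch of statistical cluster discovery: group words by the cosine
# similarity of their (here, made-up) co-occurrence vectors.

import math

VECTORS = {
    "error":   [8, 1, 0],
    "mistake": [7, 2, 0],
    "hat":     [0, 1, 9],
}

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(word):
    """The distributionally most similar other word: a candidate
    near-synonym."""
    others = [w for w in VECTORS if w != word]
    return max(others, key=lambda w: cosine(VECTORS[word], VECTORS[w]))
```

With these fabricated vectors, `nearest("error")` is `mistake`, while `hat` is far from both, mirroring the intuition that distributional similarity can propose cluster membership but not the fine-grained differences within a cluster.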
      <Paragraph position="9"> [Figure 6 caption: A clustered model of lexical knowledge.] So, taking the conventional ontological model as a starting point, we cut off the ontology at a coarse grain and cluster near-synonyms under their shared concepts rather than linking each word to a separate concept. The resulting model is a clustered model of lexical knowledge. On the conceptual-semantic level, a cluster has a core denotation that represents the essential shared denotational meaning of its near-synonyms. On the subconceptual/stylistic-semantic level, we represent the fine-grained differences between the near-synonyms of a cluster in denotation, style, and expression. At the syntactic-semantic level, syntactic frames and collocational relations represent how words can be combined with others to form sentences.</Paragraph>
      <Paragraph position="10"> Figure 6 depicts a fragment of the clustered model. It shows how the clusters of the near-synonyms of error, order, person, and object in several languages could be represented in this model. In the figure, each set of near-synonyms forms a cluster linked to a coarse-grained concept defined in the ontology: Generic-Error, Generic-Order, Person, and Object, respectively. Thus, the core denotation of each cluster is the concept to which it points. Within each cluster, the near-synonyms are differentiated at the subconceptual/stylistic level of semantics, as indicated by dashed lines between the words in the cluster. (The actual differences are not shown in the figure.) The dashed lines between the clusters for each language indicate similar cross-linguistic differentiation between some or all of the words of each cluster. Not all words in a cluster need be differentiated, and each cluster in each language could have its own "vocabulary" for differentiating its near-synonyms, though in practice one would expect an overlap in vocabulary. The figure does not show the representation at the syntactic-semantic level. We can now describe the internal structure of a cluster in more detail, starting with two examples. [Figure 7 caption: The core denotation and some of the peripheral concepts of the cluster of error nouns. The two large regions, bounded by the solid line and the dashed line, show the concepts (and attitudes and styles) that can be conveyed by the words error and blunder in relation to each other.]</Paragraph>
      <Paragraph position="11"> Figure 7 depicts the cluster of error nouns (error, mistake, blunder, ...); it is explicitly based on the entry from Webster's New Dictionary of Synonyms shown in Figure 1. The core denotation, the shaded region, represents an activity by a person (the actor) that is a deviation from a proper course.</Paragraph>
      <Paragraph position="12">  In the model, peripheral concepts are used to represent the denotational distinctions of near-synonyms. The figure shows three peripheral concepts linked to the core concept: Stupidity, Blameworthiness, and Misconception. The peripheral concepts represent that a word in the cluster can potentially express, in relation to its near-synonyms, the stupidity of the actor of the error, the blameworthiness of the actor (of different degrees: low, medium, or high), and misconception as cause of the error. The representation also contains an expressed attitude, Pejorative, and the stylistic dimension of Concreteness. (Concepts are depicted as regular rectangles, whereas stylistic dimensions and attitudes are depicted as rounded rectangles.) The core denotation and peripheral concepts together form a directed graph of concepts linked by relations; [footnote:] Specifying the details of an actual cluster should be left to trained knowledge representation experts, who have a job not unlike a lexicographer's. Our model is intended to encode such knowledge once it is elucidated.</Paragraph>
      <Paragraph position="13">  Figure 8 The core denotation and peripheral concepts of the cluster of order verbs. The two large regions, bounded by the solid line and the dashed line, show the concepts that can be conveyed by the words order and enjoin in relation to each other.</Paragraph>
      <Paragraph position="14"> the individual concepts and relations are defined in the ontology. But although all of the near-synonyms in the cluster will convey the concepts in the core denotation, the peripheral concepts that will be conveyed depend on each near-synonym. This is depicted by the two large regions in the figure (bounded by the solid line and the dashed line), which each contain the concepts, styles, and attitudes conveyed by their associated near-synonyms, blunder and error, respectively. Thus, error conveys a degree of Blameworthiness compared to the higher degree that blunder conveys; error does not convey Stupidity whereas blunder does; blunder can also express a Pejorative attitude toward the actor, but error does not express any attitude; and error and blunder differ stylistically in their degree of Concreteness. Notice that the attitude connects to the concept Person, because all attitudes must be directed toward some entity in the situation. Stylistic dimensions such as Concreteness, on the other hand, are completely separate from the graph of concepts. Also, the indirectness of expression of each of the peripheral concepts by each of the near-synonyms is not shown in this diagram (but see below). The Appendix gives the complete representation of this cluster in the formalism of our model.</Paragraph>
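One way to see how such a cluster differs from a mere list of synonyms is to sketch it as data: a shared core plus per-word distinctions. The notation below is our own illustrative simplification of the error cluster just described, not the model's actual formalism (the Appendix gives that).

```python
# Sketch of a near-synonym cluster: one shared core denotation plus
# per-word fine-grained distinctions. Slot names loosely follow the
# error-cluster discussion; values are simplified.

ERROR_CLUSTER = {
    "core": "Generic-Error",              # shared, conceptual-semantic level
    "synonyms": {                         # subconceptual/stylistic level
        "error":   {"Blameworthiness": "medium", "Concreteness": "low"},
        "blunder": {"Blameworthiness": "high", "Stupidity": True,
                    "attitude": "Pejorative", "Concreteness": "high"},
    },
    # the syntactic-semantic level (frames, collocations) is omitted here
}

def differences(cluster, word_a, word_b):
    """Dimensions on which two near-synonyms of one cluster differ,
    as pairs of their respective values (None when not conveyed)."""
    a = cluster["synonyms"][word_a]
    b = cluster["synonyms"][word_b]
    dims = sorted(set(a) | set(b))
    return {d: (a.get(d), b.get(d)) for d in dims if a.get(d) != b.get(d)}
```

Calling `differences(ERROR_CLUSTER, "error", "blunder")` recovers exactly the contrasts described in the text: a lower versus higher degree of Blameworthiness, Stupidity and the Pejorative attitude conveyed only by blunder, and a stylistic contrast in Concreteness.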
      <Paragraph position="15"> Similarly, Figure 8 depicts the cluster of order verbs (order, enjoin, command, ...), including three of its peripheral concepts and one stylistic dimension. In this cluster, the core represents a communication by a person (the sayer) to another person (the sayee) of an activity that the sayee must perform. The core includes several concepts that are not actually lexicalized by any of the words in the cluster (e.g., the sayer of the order) but that nevertheless have to be represented because the peripheral concepts refer to them. (Such concepts are indicated by dashed rectangles.) The peripheral concepts represent the idea that a near-synonym can express the authority of the sayer (with possible values of Official or Peremptory), a warning to the sayee, and the imperativeness of the activity (with possible values of low, medium, or high). The figure shows the difference between order (the region bounded by the solid line) and enjoin (the region bounded by the dashed line).</Paragraph>
    </Section>
    <Section position="2" start_page="122" end_page="122" type="sub_section">
      <SectionTitle>
5.2 Core Denotation
</SectionTitle>
      <Paragraph position="0"> The core denotation of a cluster is the inherent context-independent (and in this formulation of the model, language-neutral) denotation shared by all of its near-synonyms.</Paragraph>
      <Paragraph position="1"> The core denotation must be specified at a level of granularity sufficient to form a useful cluster of near-synonyms (i.e., at the right level of granularity so that, for instance, human and person fall into the same cluster, but dwarf and giant do not; see Section 4).</Paragraph>
      <Paragraph position="2"> A core denotation is represented as a directed graph of concepts linked by relations. The graph can be of arbitrary size, from a single concept (such as Generic-Error) up to any number of interrelated concepts (as shown in Figures 7 and 8). It must be specified in enough detail, however, for the peripheral concepts to also be specified. For instance, in the error cluster, it was not possible to use the simple concept Generic-Error, because the peripheral concepts of the cluster refer to finer-grained aspects of the concept (the actor and the deviation); hence we used a finer-grained representation of the concept.</Paragraph>
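      <Paragraph> As an illustration of this representation, a core denotation can be encoded as a directed graph whose nodes are concepts and whose labeled edges are relations. The sketch below is our own simplification, not the paper's implemented formalism; the names Activity, Person, Deviation, ACTOR, and ATTRIBUTE are loosely based on the error cluster.

```python
# Illustrative sketch only: a core denotation as a directed graph.
# Nodes are concepts; labeled edges are relations. The concept and
# relation names are our own simplifications of the error cluster.

def make_core_denotation():
    """Build a toy core denotation for the error cluster."""
    nodes = {"Activity", "Person", "Deviation"}
    edges = [
        ("Activity", "ACTOR", "Person"),        # the actor who errs
        ("Activity", "ATTRIBUTE", "Deviation"), # deviation from what is right
    ]
    return {"nodes": nodes, "edges": edges}

def concepts_shared(core):
    """Every near-synonym in the cluster conveys all core concepts."""
    return sorted(core["nodes"])
```

Every word in the cluster inherits this graph; only the peripheral concepts attached to it vary from word to word.</Paragraph>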
    </Section>
    <Section position="3" start_page="122" end_page="122" type="sub_section">
      <SectionTitle>
5.3 Peripheral Concepts
</SectionTitle>
      <Paragraph position="0"> Peripheral concepts form the basic vocabulary of fine-grained denotational distinctions. They are used to represent non-necessary and indirect aspects of word meaning.</Paragraph>
      <Paragraph position="1"> That is, they are concepts that might be implied, suggested, emphasized, or otherwise conveyed when a word is used, but not always. For instance, in differentiating the error words, a lexicographer would first decide that the basic peripheral concepts required might be 'stupidity', 'blameworthiness', 'criticism', 'misconception', 'accidentalness', and 'inattention'. Then the lexicographer would proceed to distinguish the near-synonyms in terms of these concepts, for instance, by specifying that blunder involves a higher degree of blameworthiness than error.</Paragraph>
      <Paragraph position="2"> More formally, peripheral concepts are structures of concepts defined in the same ontology that core denotations are defined in. In fact, every peripheral concept in a cluster must &amp;quot;extend&amp;quot; the core denotation in some way, because, after all, peripheral concepts represent ideas related to the core meaning of a cluster of near-synonyms.</Paragraph>
      <Paragraph position="3"> But peripheral concepts are represented separately from the core denotation.</Paragraph>
      <Paragraph position="4"> Moreover, since peripheral concepts are defined in the ontology, they can be reasoned about, which, in principle, makes the formalism robust to variation in representation. That is, if a lexicographer used, say, 'responsibility' to define mistake and 'blameworthiness' to define blunder, the words could still be compared, because inference would find a connection between 'responsibility' and 'blameworthiness'. See Section 6.1 below for more discussion on this point.</Paragraph>
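      <Paragraph> The kind of inference alluded to here can be pictured with a toy graph search; the sketch below is our own, and the ontology edges are invented for illustration. If 'responsibility' and 'blameworthiness' are linked in the ontology, a simple search finds the connection.

```python
from collections import deque

# Invented ontology fragment for illustration only: directed edges
# loosely reflect the relationships mentioned later in the paper
# (responsibility leads to blameworthiness, which leads to criticism).
ONTOLOGY = {
    "Responsibility": ["Blameworthiness"],
    "Blameworthiness": ["Criticism"],
}

def connected(a, b):
    """Breadth-first search for a relation path from concept a to b."""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in ONTOLOGY.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

A real system would of course weight such paths rather than treat any connection as equally strong.</Paragraph>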
    </Section>
    <Section position="4" start_page="122" end_page="124" type="sub_section">
      <SectionTitle>
5.4 Distinctions between Near-Synonyms
</SectionTitle>
      <Paragraph position="0"> Following Hirst (1995), we would like to represent differences explicitly as first-class objects (so that we can reason about them during processing). While we don't adopt an explicit formalism, for reasons of practicality of representation, our implicit formalism provides a method for computing explicit differences as needed (as we'll see in Section 6.1). [Table 2 note: ... is a variable that refers to the actor of the error as specified in the core denotation, and in the denotational distinction high is a fuzzy set of values in the range [0, 1].]</Paragraph>
      <Paragraph position="1"> Thus we associate with each near-synonym in a cluster a set of distinctions that are taken to be relative within the cluster; the cluster establishes the local frame of reference for comparing them. So a word's set of distinctions implicitly differentiates the word from its near-synonyms. In other words, if one considers the peripheral concepts, attitudes, styles, and so on, to be dimensions, then the set of distinctions situates a word in a multidimensional space relative to its near-synonyms. We define three types of distinction below: denotational, expressive, and stylistic.</Paragraph>
      <Paragraph position="2"> A denotational distinction corresponds to a particular peripheral concept and specifies a value on that dimension, which can be binary (i.e., is or isn't expressed), continuous (i.e., takes a possibly fuzzy value in the range [0, 1]), or discrete (i.e., takes a conceptual structure as a value). Now, in Section 2.3.1 we observed that indirectness forms a continuum (suggestion, implication, denotation), and, following the method used by lexicographers in near-synonym guides, points on the continuum are modulated up or down by a strength, which can take the values weak, medium, or strong. To also account for context dependence at least as well as lexicographers do, we include a measure of the frequency with which the peripheral concept is conveyed by the word. This can take one of five values (never, seldom, sometimes, often, always). When the problem of context dependence is better understood, this part of the formalism will need to be changed.</Paragraph>
      <Paragraph position="3"> Thus, a denotational distinction of a word w is a quadruple of components as follows: w: (frequency strength indirectness concept) The first part of Table 2 gives some examples for the distinctions of Figures 7 and 8. Since an attitude can be directed toward potentially any entity in a situation, an expressive distinction must include a reference to the entity. As for the attitude itself, we take a conservative approach, for now, and define only three possible attitudes: favorable, neutral, and pejorative. Thus, an expressive distinction has the following form: w: (frequency strength attitude entity) Frequency and strength have the same role as above. The entity is actually a reference (i.e., a variable) to one of the concepts specified in the core denotation or peripheral concepts. The second part of Table 2 gives an example.</Paragraph>
      <Paragraph position="4"> Although we take a simple approach to representing stylistic distinctions, that does not imply that style is easy to capture. Style is one of the most difficult of lexical phenomena to account for, since it affects the text at a pragmatic level and is highly influenced by context. Since there is as yet no comprehensive theory of style, our approach is similar to past approaches, such as those of DiMarco and Hirst (1993), Stede (1993), and Hovy (1988).</Paragraph>
      <Paragraph position="5"> Unlike the denotational distinctions discussed above, stylistic features have a global or absolute quality to them. We can compare all words, whether or not they are near-synonyms, on various stylistic dimensions, such as formality and concreteness. Because style is a global aspect of text, a certain style can be (and should be) achieved by more than just lexical choice; structural choices are just as important (DiMarco and Hirst 1993). Hence, in defining a set of stylistic dimensions, we must look for global stylistic features that can be carried not only by words but also by syntactic and larger text structures. Our stylistic dimensions include, but are not limited to, formality, force, concreteness, floridity, and familiarity.</Paragraph>
      <Paragraph position="6"> Stylistic variation also differs from the other types of variation in being related solely to the lexeme itself and not to its denotation or conceptual meaning (though in a deeper sense style is certainly related to meaning). So in representing stylistic distinctions we don't have to make any reference to entities or other aspects of the core denotation or peripheral concepts in a cluster. Thus, we represent a stylistic distinction as follows: w: (degree dimension) where degree can take a value of low, medium, or high (though more values could easily be added to increase the precision). The third part of Table 2 gives two examples.</Paragraph>
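      <Paragraph> The three kinds of distinction defined above can be sketched as simple tuples. This is a hedged illustration of the notation, not the implemented formalism; the example values for blunder are our own, and V1 is a hypothetical variable naming the actor in the core denotation.

```python
from collections import namedtuple

# Field names follow the tuple notation in the text; the example
# values below are illustrative, not taken from the paper's Table 2.
Denotational = namedtuple("Denotational", "frequency strength indirectness concept")
Expressive = namedtuple("Expressive", "frequency strength attitude entity")
Stylistic = namedtuple("Stylistic", "degree dimension")

# blunder often strongly implies a high degree of Blameworthiness
blunder_blame = Denotational("often", "strong", "implication",
                             ("Blameworthiness", "high"))
# blunder can express a pejorative attitude toward the actor (V1)
blunder_attitude = Expressive("sometimes", "medium", "pejorative", "V1")
# blunder is stylistically concrete
blunder_style = Stylistic("high", "Concreteness")
```

A word's entry in a cluster is then just a set of such tuples, interpreted relative to the other near-synonyms in the cluster.</Paragraph>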
    </Section>
  </Section>
  <Section position="8" start_page="124" end_page="138" type="metho">
    <SectionTitle>
6. Lexical Similarity
</SectionTitle>
    <Paragraph position="0"> It is not sufficient merely to represent differences between near-synonyms; we must also be able to use these representations effectively. For lexical choice, among other tasks, we need to be able to compare the similarities of pairs of near-synonyms. For example, in a transfer-based MT system, in order to translate the French word bavure into English, we need to compare the similarities of at least the three pairs bavure : error, bavure : mistake, and bavure : blunder and choose the English word whose meaning is closest to bavure, subject to any constraints arising from the context. And in text generation or interlingual MT, we need to be able to compare the similarities of each of several near-synonyms to a particular semantic representation or conceptual structure in order to choose the one that is closest to it in meaning.</Paragraph>
      <Paragraph position="1"> Now, the general problem of measuring the semantic distance between words or concepts has received much attention. This century, Wittgenstein (1953) formulated the notion of family resemblance--that several things can be related because they overlap with respect to a set of properties, no property being common to all of the words--which Rosch (1978) then used as the basis for the prototype theory of meaning. Recent research in computational linguistics has focused more on developing methods to compute the degree of semantic similarity between any two words, or, more precisely, between the simple or primitive concepts denoted by any two words.</Paragraph>
      <Paragraph position="2"> There are many different similarity measures, which variously use taxonomic lexical hierarchies or lexical-semantic networks, large text corpora, word definitions in machine-readable dictionaries or other semantic formalisms, or a combination of these (Dagan, Marcus, and Markovitch 1993; Kozima and Furugori 1993; Pereira, Tishby, and Lee 1993; Church et al. 1994; Grefenstette 1994; Resnik 1995; McMahon and Smith 1996; Jiang and Conrath 1997; Schütze 1998; Lin 1998; Resnik and Diab 2000; Budanitsky 1999; Budanitsky and Hirst 2001, 2002). Unfortunately, these methods are generally unhelpful in computing the similarity of near-synonyms because the measures lack the required precision. First, taxonomic hierarchies and semantic networks inherently treat near-synonyms as absolute synonyms in grouping near-synonyms into single nodes (e.g., in WordNet). In any case, as we argued in Section 3, taxonomies are inappropriate for modeling near-synonyms. Second, as we noted in Section 2.2, standard dictionary definitions are not usually fine-grained enough (they define the core meaning but not all the nuances of a word) and can even be circular, defining each of several near-synonyms in terms of the other near-synonyms. And third, although corpus-based methods (e.g., Lin's [1998]) do compute different similarity values for different pairs of near-synonyms of the same cluster, Church et al. (1994) and Edmonds (1997) show that such methods are not yet capable of uncovering the more subtle differences in the use of near-synonyms for lexical choice.</Paragraph>
    <Paragraph position="3"> But one benefit of the clustered model of lexical knowledge is that it naturally lends itself to the computation of explicit differences or degrees of similarity between near-synonyms. Although a fully effective similarity measure for near-synonyms still eludes us, in this section we will characterize the problem and give a solution to one part of it: computing the similarity of individual lexical distinctions.</Paragraph>
    <Section position="1" start_page="125" end_page="127" type="sub_section">
      <SectionTitle>
6.1 Computing the Similarity of Near-Synonyms
</SectionTitle>
      <Paragraph position="0"> In the clustered model of lexical knowledge, a difference between two near-synonyms is encoded implicitly in two sets of relative distinctions. From two such sets of distinctions, one can compute, or build, an explicit representation of the difference between two near-synonyms. Thus, the difference between, or similarity of, two near-synonyms depends on the semantic content of their representations on the subconceptual/stylistic level (cf. Resnik and Diab [2000], in which similarity is computed according to the structure, rather than content, of lexical conceptual structure representations of verbs; see Jackendoff [1983] and Dorr [1993]).</Paragraph>
      <Paragraph position="1"> Comparing two sets of distinctions is not straightforward, however, because near-synonyms often differ on seemingly incommensurate dimensions. That is, the distinctions of one near-synonym will often not match those of another near-synonym, leaving no basis for comparison. For instance, in Figure 9, bavure and mistake align on only two of five denotational dimensions (Blameworthiness and Criticism), and this assumes that each of the near-synonyms was represented using the exact same peripheral concepts to begin with (i.e., both with Blameworthiness rather than, say, one with Blameworthiness and the other with a closely related concept such as Responsibility). [Footnote 15: By primitive concepts, we mean named concepts, or concepts that can be lexicalized by a single word, even though they may be defined in terms of other concepts in an ontology.]</Paragraph>
      <Paragraph position="2"> [Figure 9 caption: A structure that explicitly represents the difference between bavure and mistake. The separate structures were merged, and where they differed, the two values are shown within square brackets separated by a /.]</Paragraph>
      <Paragraph position="3"> Can one even compare an error that is caused by a misconception to an error that is stupid? (See Figure 3 for bavure.) When several dimensions are commensurate, how should one compute similarity? Consider the near-synonyms of forest: Is it possible to decide whether a &amp;quot;large and wild&amp;quot; tract of trees is closer to a &amp;quot;small wild&amp;quot; one or to a &amp;quot;medium-sized non-wild&amp;quot; one? In other words, how much of a change in the size of a forest will compensate for an opposite change in its wildness? Part of the solution lies in the fact that the dimensions of any cluster are never actually completely incommensurate; usually they will have interrelationships that can be both modeled in the ontology and exploited when comparing the representations of words. For instance, in the cluster of near-synonyms of forest, the wildness of a tract of trees is related to its size and distance from civilization (which one can infer from one's knowledge about forests and wildlife; e.g., most wildlife tries to avoid people); so there may be a way of comparing a &amp;quot;wild&amp;quot; tract of trees to a &amp;quot;large&amp;quot; tract of trees. And in the error cluster, the dimensions are related in similar ways because of their semantics and pragmatics (e.g., responsibility leads to blameworthiness, which often leads to criticism, and stupidity often leads to a pejorative attitude). Certainly these interrelationships influence both what can be coherently represented in a cluster and how similar near-synonyms are. And such relationships can be represented in the knowledge base, and hence reasoned about; a complete model, however, is beyond the scope of this article.</Paragraph>
      <Paragraph position="4"> The interaction of the dimensions within a cluster is not yet very well studied, so for a partial solution, we make the simplifying assumptions that the dimensions of a cluster are independent and that each can be reduced to a true numeric dimension.</Paragraph>
      <Paragraph position="5"> [Footnote 16:] Certainly, numeric values are necessary at some level of representation. As we've seen, nuances of meaning and style are not always clear-cut but can be vague, fuzzy, and continuously variable. Using a numerical method would seem to be the most intuitive way of computing similarity, which we have to do to compare and choose appropriate lexical items.</Paragraph>
      <Paragraph position="6"> [Footnote:] Two distinctions are commensurate if both are of the same type and: if they are stylistic, then they involve the same stylistic dimension; if they are expressive, then they refer to the same entity; and if they are denotational, then they involve the same peripheral concept.</Paragraph>
    </Section>
    <Section position="2" start_page="127" end_page="128" type="sub_section">
      <SectionTitle>
6.2 Computing the Similarity of Distinctions
</SectionTitle>
      <Paragraph position="0"> Given our simplifications from above, a word's set of distinctions situates it in a numeric multidimensional space. Consider a function Sim: D x D -> [0, 1], for computing the similarity of two commensurate lexical distinctions taken from the set D of all possible distinctions that can be represented in a particular cluster. A value of 0 means that the distinctions are completely different (or can't even be compared), and a value of 1 means that they are equivalent (though not necessarily identical, as two equivalent distinctions might be structurally different).</Paragraph>
      <Paragraph position="1"> Hence, each type of distinction requires its own similarity function: one for denotational distinctions, one for expressive distinctions, and one for stylistic distinctions. Each of the similarity functions must compare the values that the pair of distinctions has on each of their components (see Section 5.4). To arrive at a final numerical value, we must reduce each component to a real-valued dimension and assign each symbolic value for that component to a numeric position on the line. Edmonds (1999) gives complete details of the formulas we developed.</Paragraph>
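      <Paragraph> The reduction of symbolic component values to numeric positions can be sketched as follows; the scales below are our own assumptions for illustration, not the formulas of Edmonds (1999).

```python
# Assumed numeric positions for symbolic values (illustrative only).
FREQ = {"never": 0.0, "seldom": 0.25, "sometimes": 0.5,
        "often": 0.75, "always": 1.0}
DEGREE = {"low": 0.0, "medium": 0.5, "high": 1.0}

def component_sim(a, b, scale):
    """Similarity of two symbolic values on one real-valued dimension."""
    return 1.0 - abs(scale[a] - scale[b])

def stylistic_sim(d1, d2):
    """Sketch of Sim for stylistic distinctions (degree, dimension).
    Incommensurate distinctions (different dimensions) get 0."""
    if d1[1] != d2[1]:
        return 0.0
    return component_sim(d1[0], d2[0], DEGREE)
```

A full similarity function would combine such per-component values across frequency, strength, indirectness, and the concept itself.</Paragraph>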
      <Paragraph position="2"> There is, however, a remaining interesting problem: How does one compute the degree of similarity of two conceptual structures? Denotational distinctions sometimes involve complex structures of concepts, and these structures must be somehow compared to determine their numeric degree of similarity. For instance, we might need to decide how similar a high degree of blameworthiness is to a moderate degree of blameworthiness, or to blameworthiness. Or, we might need to decide how similar official authority is to peremptory authority, or how similar arbitrary power is to peremptory authority (where arbitrariness is a kind of peremptoriness and authority is a kind of power). Computing this type of similarity is clearly different from, but related to, the problem of computing the similarity of primitive concepts (or words). We have to consider not only the content but also the structure of the representations. We are not aware of any research on the general problem of computing the similarity of arbitrary conceptual structures, though some related work has been done in the area of description logics. Cohen, Borgida, and Hirsh (1992), for example, formalize a &amp;quot;least common subsumer&amp;quot; operation that returns the largest set of commonalities between two descriptions. And Resnik and Diab (2000) use a technique, attributed to Lin, of decomposing a structure into feature sets. Edmonds (1999) describes a technique for simultaneously traversing a pair of conceptual structures under the assumption that the structures will be &amp;quot;similar&amp;quot; because they are commensurate. Still, a good solution to this problem remains an open issue.</Paragraph>
    </Section>
    <Section position="3" start_page="128" end_page="129" type="sub_section">
      <SectionTitle>
7.1 Architectures for Lexical Choice
</SectionTitle>
      <Paragraph position="0"> The clustered model of lexical knowledge is applicable to both the lexical-analysis and lexical-choice phases of a machine translation system. Figure 10 shows that during analysis, fine-grained lexical knowledge of the source language is accessed, in conjunction with the context, to determine possibilities of what is expressed in the source language text. Then, depending on the type of MT system (i.e., transfer or interlingual), the appropriate target language words can be chosen: The possibilities become preferences for choice. Recovering nuances of expression from source text is currently an open problem, which we do not explore further here (but see Edmonds [1998] for some preliminary work). In this section we concentrate on the second phase of MT and show that robust, efficient, flexible, and accurate fine-grained lexical choice is a natural consequence of a clustered model.</Paragraph>
      <Paragraph position="1"> Lexical choice, as we see it, is more than a problem of mapping from concepts to words, as the previous section might have implied; it is a problem of selecting words so as to meet or satisfy a large set of possibly conflicting preferences to express certain nuances in certain ways, to establish the desired style, and to respect collocational and syntactic constraints. So lexical choice--genuine lexical choice--is making choices between options rather than merely finding the words for concepts, as was the case in many early text generation systems (for instance, BABEL [Goldman 1975], MUMBLE [McDonald 1983], and TEXT [McKeown 1985]). This kind of lexical choice is now thought to be the central task in text generation (or, at least, sentence generation), because it interacts with almost every other task involved. Indeed, many recent text generation systems, including MOOSE (Stede 1999), ADVISOR II (Elhadad, McKeown, and Robin 1997), and Hunter-Gatherer (Beale et al. 1998), among others (see Reiter and Dale's [1997] survey), adopt this view, yet their lexical-choice components do not account for near-synonymy. Without loss of generality, we will look at fine-grained lexical choice in the context of one of these systems: Stede's MOOSE (1999).</Paragraph>
      <Paragraph position="2"> The input to MOOSE is a &amp;quot;SitSpec,&amp;quot; that is, a specification of a situation represented on the conceptual-semantic level as a graph of instances of concepts linked by relations. MOOSE outputs a complete well-formed &amp;quot;SemSpec,&amp;quot; or semantic specification on the syntactic-semantic level, from which the Penman sentence realization system can generate language.</Paragraph>
      <Paragraph position="3"> MOOSE processes the input in two stages. It first gathers all of the lexical items (as options for choice) whose conceptual-semantic representation covers any part of the SitSpec. Then it chooses a set of lexical items that satisfy Stede's three criteria for sentence planning: the input SitSpec is completely covered (and so is completely lexicalized without redundancy); a well-formed SemSpec can be built out of the partial SemSpecs associated with each of the chosen lexical items; and as many of the preferences are satisfied as possible. MOOSE supports preferences, but only those that require structural decisions, such as choosing a causative over an inchoative verb alternation. The main goal of Stede's work was to account for structural paraphrase in sentence generation, not near-synonymy.</Paragraph>
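      <Paragraph> The two stages can be caricatured as a tiny covering search. This is a schematic sketch of our own, not MOOSE's actual algorithm, and it ignores SemSpec well-formedness and preference satisfaction entirely.

```python
from itertools import combinations

def choose(sitspec, options):
    """Pick lexical items that cover the SitSpec completely and without
    redundancy. sitspec is a set of node names; options maps each word
    to the set of SitSpec nodes its denotation covers (stage 1's output)."""
    words = list(options)
    for r in range(1, len(words) + 1):
        for combo in combinations(words, r):
            covered = [n for w in combo for n in options[w]]
            # complete cover, each node lexicalized exactly once
            if set(covered) == sitspec and len(covered) == len(sitspec):
                return combo
    return None
```

In the real system the choice among the many valid covers is driven by the preferences, which is exactly the problem taken up below.</Paragraph>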
      <Paragraph position="4"> In the general case of sentence planning, given a set of input constraints and preferences, a sentence planner will make a large number of decisions of different types--lexical, syntactic, and structural--each of which has the potential to satisfy any, some, or all of the input preferences (while trying to satisfy the constraints, of course). It is unlikely that any particular set of preferences can all be satisfied simultaneously, so some kind of conflict resolution strategy is required in order to manage the decision-making task. It is not within the scope of this paper to develop solutions to this general problem (but see Nirenburg, Lesser, and Nyberg [1989], Wanner and Hovy [1996], Elhadad, McKeown, and Robin [1997], and Stede [1999] for a variety of solutions).</Paragraph>
      <Paragraph position="5"> Instead, we will discuss the following two new issues that arise in managing the interactions between lexical choices that a clustered model brings out: * We will argue for a unified model for representing any type of preference for lexical choice.</Paragraph>
      <Paragraph position="6"> * We will describe a two-tiered model of lexical choice that is the consequence of a clustered model of lexical knowledge.</Paragraph>
      <Paragraph position="7"> Then, we will end the section with a brief description of our software implementation of the model, called I-Saurus.</Paragraph>
    </Section>
    <Section position="4" start_page="129" end_page="131" type="sub_section">
      <SectionTitle>
7.2 Constraints and Preferences
</SectionTitle>
      <Paragraph position="0"> Simple systems for lexical choice need only to make sure that the denotations of the words chosen in response to a particular input exactly match the input. But when we use fine-grained aspects of meaning, the lexical-choice process, and so, in turn, its input, will ultimately be more complex. But with so many possibilities and options, choosing from among them will necessarily involve not only degrees of satisfying various criteria, but also trade-offs among different criteria. Some of the criteria will be hard constraints (i.e., a SitSpec), ensuring that the basic desired meaning is accurately conveyed, and others will be preferences.</Paragraph>
      <Paragraph position="1"> The main difference between a constraint and a preference is that a preference is allowed to be satisfied to different degrees, or even not at all, depending on the decisions that are made during sentence planning. A preference can be satisfied by different types of decision made during sentence planning. [Footnote 17: A SemSpec is a fully lexicalized sentence plan in Penman's Sentence Plan Language (SPL). SPL is defined in terms of the Penman Upper Model, a model of meaning at the syntactic-semantic level, which ensures that the SemSpec is well-formed linguistically. Penman can thus turn any SemSpec into a well-formed sentence without having to make any open-class lexical decisions (Penman Natural Language Group 1989).] And because conflicts and trade-offs might arise in the satisfaction of several preferences at once, each preference must have an externally assigned importance factor.</Paragraph>
      <Paragraph position="2"> Many types of preference pertain to lexical choice, including emphasizing an aspect of an entity in a situation, using normal words or a certain dialect, using words with a particular phonology (e.g., words that rhyme), using different near-synonyms for variety or the same word as before for consistency, and so on. All should be formalizable in a unified model of preference, but we have started with three types corresponding to the subconceptual level of the clustered model: denotational (or semantic), expressive, and stylistic preferences.</Paragraph>
      <Paragraph position="3"> Denotational preferences are distinct from denotational constraints, but there is no theoretical difference between the nature of a &amp;quot;preferred&amp;quot; meaning and that of a &amp;quot;constrained&amp;quot; meaning. Hence, we can represent both in the same SitSpec formalism. Thus, a denotational preference is a tuple consisting of a partial SitSpec and a preferred method of expression, which takes a value on the continuum of indirectness (see Section 5.4). An expressive preference requests the expression of a certain attitude toward a certain entity that is part of the situation. Thus, an expressive preference is a tuple consisting of a reference to the entity and the stance that the system should take: favor, remain neutral, or disfavor. A stylistic preference, for now, is simply a value (of low, medium, or high) on one of the stylistic dimensions. We will see some examples in Section 8.</Paragraph>
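      <Paragraph> The three preference types can be written down as simple tagged tuples. This is a sketch under our own naming assumptions; the SitSpec fragment and the entity variable V1 are hypothetical examples.

```python
# Hedged sketch of the unified preference representation; field layouts
# follow the tuple descriptions in the text, names are our own.

def denotational_pref(partial_sitspec, indirectness):
    return ("denotational", partial_sitspec, indirectness)

def expressive_pref(entity, stance):
    assert stance in ("favor", "neutral", "disfavor")
    return ("expressive", entity, stance)

def stylistic_pref(degree, dimension):
    assert degree in ("low", "medium", "high")
    return ("stylistic", degree, dimension)

prefs = [
    denotational_pref({"concept": "Misconception"}, "imply"),
    expressive_pref("V1", "disfavor"),
    stylistic_pref("low", "Formality"),
]
```

Each preference would additionally carry the externally assigned importance factor mentioned above.</Paragraph>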
      <Paragraph position="4"> Ideally, the lexical-choice process would be able to simultaneously satisfy all of the input preferences by choosing appropriate near-synonyms from appropriate clusters. But if none of the available options will satisfy, to any degree, a particular preference, then that preference is trivially impossible to satisfy (by lexical choice). But even when there are options available that satisfy a particular preference, various types of conflicts can arise in trying to satisfy several preferences at once, making it impossible to use any of those options. At the level of clusters, for instance, in choosing a particular cluster in order to satisfy one preference, we might therefore be unable to satisfy another preference that can be satisfied only by a different, competing cluster: We might choose the cluster of the err verbs (to err, to blunder) because of the simplicity or directness of its syntax: John erred; but we would not be able simultaneously to satisfy a preference for implying a misconception by choosing, say, mistake from the cluster of error nouns: John made a mistake.</Paragraph>
      <Paragraph position="5"> Similar trade-offs occur when choosing among the near-synonyms of the same cluster. Such lexical gaps, where no single word can satisfy all of the input preferences that a cluster can potentially satisfy, are common. For instance, in English, it's hard to talk about a mistake without at least some overtones of criticism; in Japanese one can: with ayamari instead of machigai (Fujiwara, Isogai, and Muroyama 1985). There is also no near-synonym of error in English that satisfies preferences to imply both stupidity and misconception; blunder satisfies the former but not the latter, and mistake vice versa. Similarly, there is no formal word for an untrue statement (i.e., a lie) that also expresses that the lie is insignificant; fib is an option, but it is not a formal word. And there is no word for a tract of trees that is both large and not wild; forest has the former property, woods the latter.</Paragraph>
      <Paragraph position="6"> 18 A preference is like a floating constraint (Elhadad, McKeown, and Robin 1997) in that it can be satisfied by different types of decision in sentence planning but differs in that it may be satisfied to different degrees.</Paragraph>
      <Paragraph position="7"> Two separate simultaneous choices might also conflict in their satisfaction of a single preference. That is, the preference might be satisfied by one choice and negatively satisfied by another choice. For instance, one word might be chosen in order to express a favorable attitude toward a participant in a particular situation, while another word, chosen in order to satisfy some other preference, inadvertently expresses a pejorative attitude toward the same person.</Paragraph>
      <Paragraph position="8"> And of course, stylistic decisions can often conflict (e.g., if one has to choose both formal and informal words).</Paragraph>
      <Paragraph position="9"> Our solution to resolving such lexical gaps and conflicting preferences is to use an approximate matching algorithm that attempts to satisfy collectively as many of the preferences as possible (each to the highest degree possible) by choosing, on two tiers, the right words from the right clusters.</Paragraph>
      <Paragraph position="10">  We will describe this model in Section 7.3.</Paragraph>
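      <Paragraph> The approximate matching idea can be sketched for a single cluster as a weighted scoring of candidates. This is our own illustration; in the model, the satisfaction degrees would come from comparing each word's distinctions against the preferences, and the importance factors are supplied with the input.

```python
# Hedged sketch of approximate matching over one cluster: each preference
# has an importance factor, each candidate word satisfies each preference
# to some degree in [0, 1], and we pick the best weighted total.

def best_word(candidates, satisfaction, importance):
    """satisfaction[(word, pref)] is a degree in [0, 1];
    importance[pref] is the preference's weight."""
    def score(word):
        return sum(importance[p] * satisfaction.get((word, p), 0.0)
                   for p in importance)
    return max(candidates, key=score)
```

For example, if implying a misconception is weighted more heavily than implying stupidity, mistake can outscore blunder even though blunder fully satisfies the stupidity preference.</Paragraph>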
      <Paragraph position="11"> But what happens when it is impossible to simultaneously satisfy two preferences under any circumstances? We have assumed up to now that the set of preferences in the input is consistent or well-formed. This is often a reasonable assumption. In the context of MT, for instance, we can assume that a &amp;quot;good&amp;quot; analysis stage would output only well-formed expressions free of incompatibilities. But two preferences may be incompatible, and we would like our system to be able to detect such situations. For instance, preferences for both low and high severity are incompatible; not only is it impossible for a word to simultaneously express both ideas, but if the system were to attempt to satisfy both, it might output a dissonant expression such as &amp;quot;I (gently) chided Bill for his (careless) blunder&amp;quot; (the preference to harshly criticize Bill is satisfied by blunder, and the preference to gently criticize Bill is satisfied by chide). (Of course, a dissonant expression is not always undesirable; it might be used for special effect.) This kind of incompatibility is easy to detect in our formalism, because peripheral concepts are explicitly modeled as dimensions. There are, of course, other types of incompatibility, such as denotational and contextual incompatibilities, but we won't discuss them further here (see Edmonds [1999]).</Paragraph>
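      <Paragraph> Because peripheral concepts are modeled as dimensions, the incompatibility check can be sketched in a few lines. This is our own simplification, treating a denotational preference as a pair of a dimension and a value.

```python
# Two preferences conflict if they place different values on the same
# explicitly modeled dimension (e.g., Severity low vs. Severity high).

def incompatible(p1, p2):
    (dim1, val1), (dim2, val2) = p1, p2
    return dim1 == dim2 and val1 != val2
```

Denotational and contextual incompatibilities would need richer checks than this dimension-and-value comparison.</Paragraph>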
    </Section>
    <Section position="5" start_page="131" end_page="134" type="sub_section">
      <SectionTitle>
7.3 Two-Tiered Lexical Choice
</SectionTitle>
      <Paragraph position="0"> We believe that it is important to separate the processes for making these two types of decision--even though they must interact--because of their different choice criteria and effects. The former type involves choosing between options of differing coarse-grained semantic content and resulting syntactic structure (i.e., paraphrases): clusters have different core denotations, after all. Here, issues of syntactic and semantic style are involved, as one can choose how the semantic content is to be incorporated. On the other hand, the latter type of decision involves options that might have subtle semantic and stylistic differences but result in the same syntactic structure (though collocational and subcategorization structure can vary). [footnote:] A complementary approach is to paraphrase the input and hence explicitly express a preferred implication or mitigate an unwanted implication (for instance, by generating insignificant lie when fib is too informal). A sentence planner, like MOOSE, is designed to generate such structural paraphrases, so we have concentrated on the lexical issues here.</Paragraph>
      <Paragraph position="2"> In other words, lexical choice is a two-tiered process that must find both the appropriate set of cluster options and the appropriate set of lexical items (one from each chosen cluster option) whose contributing SemSpec fragments can be unified into a complete well-formed SemSpec. Of course, many possible SemSpecs can usually be generated, but the real problem is to find the combination of cluster options and lexical items that globally satisfies as many of the input preferences as possible.</Paragraph>
      <Paragraph position="3"> For instance, Figure 11 depicts the state of processing the SitSpec for the utterance by John of an untrue statement just before lexical choice occurs. There are four cluster options (denoted by the suffix C): say C and tell-a-lie C match subgraphs of the SitSpec rooted at say1, untruth C matches the graph rooted at lie1, and John C matches john1. Now, the system could choose the tell-a-lie C cluster and the John C cluster, which fully cover the SitSpec, and then choose the words John and lie to come up with John lies, or the system could choose John and prevaricate for John prevaricates. The system could also choose the say C, untruth C and John C clusters, and then the words tell, fib, and John, to end up with John tells a fib. These alternatives--there are many others--are different in structure, meaning, and style. Which best satisfies the input preferences, whatever they may be? We can formally define fine-grained lexical choice (within sentence planning) as follows. Given an input SitSpec S and a set of compatible preferences P, the goal is to find a set C of i cluster options and a word w_i from each cluster option c_i ∈ C such that</Paragraph>
      <Paragraph position="5"> (1) every node of S is covered by the core denotation of exactly one cluster option in C; (2) the partial SemSpecs of the chosen words w_i can be combined into a well-formed SemSpec SP; and (3) Satisfaction(P, SP) is maximized.</Paragraph>
      <Paragraph position="7"> The first criterion ensures complete coverage without redundancy of the input SitSpec, so the desired meaning, at a coarse grain, is expressed; the second ensures that a SemSpec can be constructed that will lead to a grammatical sentence; and the third ensures that the preferences are collectively satisfied as much as is possible by any sentence plan. The third criterion concerns us here; the first two are dealt with in MOOSE.</Paragraph>
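The first criterion (complete coverage without redundancy) can be illustrated with a small check over the cluster options of Figure 11. The dict-based encoding of options and node names is our own, used only for this sketch.

```python
# A minimal sketch (not the system's code) of the coverage criterion: the
# chosen cluster options must together cover every node of the SitSpec
# exactly once -- no node left uncovered, and no node covered twice.

def covers_exactly_once(sitspec_nodes, chosen_options):
    """Each option is a dict whose 'covers' list names the SitSpec nodes
    that the option's core denotation matches."""
    covered = []
    for option in chosen_options:
        covered.extend(option['covers'])
    # multiset equality: same nodes, each appearing the same number of times
    return sorted(covered) == sorted(sitspec_nodes)
```

For the SitSpec of Figure 11, choosing tell-a-lie C (covering say1 and lie1) together with John C (covering john1) passes the check, whereas say C plus John C alone leaves lie1 uncovered.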
      <Paragraph position="8"> As we said earlier, a thorough understanding of the interacting decisions in lexical choice is still an open problem, because it is context dependent. Our present solution is simply to assume that no complex interaction between decisions takes place. So, assuming that each option has an associated numeric score (the degree to which it satisfies all of the preferences), we can simply choose the set of options that maximizes the sum of the scores, subject to the other constraints of building a proper sentence plan. Thus, we do not provide a solution to how the context affects the combination of the scores. So, given a sentence plan SP and a set of preferences P, we have Satisfaction(P, SP) = Σ_{w ∈ SP} WSat(P, w) (2)
Figure 11
The state of processing just before lexical choice on the input for John tells a lie. Four clusters have become options; each is shown with its core denotation and near-synonyms. Solid arrows in the SitSpec indicate relations between instances of concepts. Solid arrows in the cluster options relate concepts in the core denotations. Dashed arrows link SitSpec nodes to the cluster options that cover subgraphs rooted at the nodes.</Paragraph>
      <Paragraph position="9"> where WSat is the degree to which w satisfies the preferences P (see Equation (3)).</Paragraph>
      <Paragraph position="10"> The most preferred sentence plan SP′ is thus the one whose set of word choices maximizes Satisfaction(P, SP′). This function accounts for trade-offs in the satisfaction of preferences, because it finds the set of words that collectively satisfy as many of the preferences as possible, each to the highest degree possible.</Paragraph>
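Under the no-interaction assumption above, plan-level satisfaction reduces to a sum, and plan selection to an argmax over candidate plans. The toy sketch below (our own encoding, with an illustrative score table) shows the idea.

```python
# A toy sketch (ours, not the system's code) of Satisfaction(P, SP): the sum
# of the word-level scores WSat(P, w) over the words in a sentence plan, and
# selection of the plan that maximizes that sum.

def satisfaction(prefs, plan, wsat):
    """Sum WSat(P, w) over all words w in the plan."""
    return sum(wsat(prefs, w) for w in plan)

def most_preferred_plan(prefs, plans, wsat):
    """Pick the candidate plan with the highest collective satisfaction."""
    return max(plans, key=lambda plan: satisfaction(prefs, plan, wsat))
```

Because the score is a plain sum, a plan with more words can beat a shorter one if its words collectively satisfy the preferences better, which is exactly the trade-off behavior described in the text.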
      <Paragraph position="11"> Each word in SP has to be chosen from a distinct cluster. Thus, given
* a particular cluster c in the set of all cluster options
* a list W of candidate near-synonyms of c, ordered according to a prespecified criterion (some candidates of the cluster might have already been ruled out because of collocational constraints)
* a set P_c ⊆ P of compatible preferences, each of which can potentially be satisfied by a word in c, with associated importances Imp: P → [0, 1]
find the first candidate w′ ∈ W such that WSat(P, w′) is maximized.</Paragraph>
      <Paragraph position="12">  We use an approximate-matching algorithm to compute WSat(P, w). Under the simplification that its value depends on the degree to which w individually satisfies each of the preferences in P_c, the algorithm computes WSat(P, w) by combining the set of scores Sat(p, w) for all p ∈ P_c. Various combination functions are plausible, including simple functions, such as a weighted average or a distance metric, and more complex functions that could, for instance, take into account dependencies between preferences.</Paragraph>
      <Paragraph position="13">  Deciding on this function is a subject for future research that will empirically evaluate the efficacy of various possibilities. For now, we define WSat as a weighted average of the individual scores, taking into account the importance factors: WSat(P, w) = WSat(P_c, w) = Σ_{p ∈ P_c} (Imp(p) / |P_c|) Sat(p, w) (3)</Paragraph>
      <Paragraph position="15"> For a given preference p ∈ P_c, the degree to which p is satisfied by w, Sat(p, w), is reducible to the problem of computing similarity between lexical distinctions, for which we already have a solution (see Equation (1)). Thus, Sat(p, w) = Sim(d(p), d(w)) (4) where d(p) is a kind of pseudo-distinction generated from p to have the same form as a lexical distinction, putting it on an equal footing with d(w), and d(w) is the distinction of w that is commensurate with d(p), if one exists.</Paragraph>
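Read this way, the word-level score amounts to a short computation. The sketch below is our own: it realizes WSat as an importance-weighted average, normalizing by the total importance (one of the several plausible combination functions the text mentions, not necessarily the paper's exact normalization), and the toy sat and imp values in the usage are illustrative.

```python
# A sketch (not the system's code) of WSat as an importance-weighted average
# of the per-preference scores Sat(p, w). sat(p, w) stands in for the
# similarity-based score Sim(d(p), d(w)) of Equation (4).

def wsat(prefs, word, sat, imp):
    """Weighted average of sat(p, word) over the compatible preferences,
    weighted by the importance factors imp[p]."""
    if not prefs:
        return 0.0
    total_imp = sum(imp[p] for p in prefs)
    return sum(imp[p] * sat(p, word) for p in prefs) / total_imp
```

With all importances equal, this reduces to the plain average of the individual satisfaction scores; raising one preference's importance pulls the word's overall score toward its satisfaction of that preference.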
    </Section>
    <Section position="6" start_page="134" end_page="135" type="sub_section">
      <SectionTitle>
7.4 Implementation: I-Saurus
</SectionTitle>
      <Paragraph position="0"> I-Saurus, a sentence planner that splits hairs, extends Stede's MOOSE (1999) with the modifications formalized above for fine-grained lexical choice. It takes a SitSpec and a set of preferences as input, and outputs a sentence plan in Penman's SPL, which Penman generates as a sentence in English. (Section 8 provides an example.) Now, finding the best set of options could involve a lengthy search process. An exhaustive search through all possible sentence plans to find the one that maximizes Satisfaction(P, SP) can be very time-inefficient: In the relatively small example given in Section 8, there are 960 different sentence plans to go through. To avoid an exhaustive search, we use the following heuristic, adopted from Stede (1999): In order to find the globally preferred sentence plan, make the most preferred local choices. That is, whenever a (local) decision is made between several options, choose the option with the highest score. Thus, we postulate that the most preferred sentence plan will be one of the first few sentence plans generated, though we offer no proof beyond our intuition that complex global effects are relatively rare, which is also a justification for the simplifications we made above.</Paragraph>
      <Paragraph position="1"> Figure 12 gives an algorithm for two-tiered lexical choice embedded in MOOSE's sentence planner. The main additions are the procedures Next-Best-Cluster-Option and Next-Best-Near-Synonym. [footnote:] For instance, we might want to consider a particular preference only after some other preference has been satisfied (or not), or only to resolve conflicts when several words satisfy another preference to the same degree.</Paragraph>
      <Paragraph position="4"> Build-Sentence-Plan(node, P)
(1) c ← Next-Best-Cluster-Option(node, P)
    if we've tried all the options then return &amp;quot;fail&amp;quot;
(2) w ← Next-Best-Near-Synonym(c, P)
    if we've tried all the near-synonyms in c then backtrack to (1)
    p ← partial SemSpec of w
    if p has external variables then
        for each external variable v in p
            s ← Build-Sentence-Plan(node bound to v, P)
            if s = &amp;quot;fail&amp;quot; then backtrack to (2)
            else attach s to p at v
    return p
Figure 12
The sentence-planning algorithm. This algorithm outputs the most preferred complete well-formed SemSpec for a subgraph rooted at a given node in the SitSpec. (Note, however, that this version of the algorithm does not show how complete coverage or well-formedness is ensured.) Next-Best-Cluster-Option moves through the cluster options that cover part of the SitSpec rooted at node in order of preference. As we said above, structural decisions on this tier of lexical choice are outside the scope of this article, but we can assume that an algorithm will in due course be devised for ranking the cluster options according to criteria supplied in the input. (In fact, MOOSE can rank options according to preferences to foreground or background participants, in order to make them more or less salient, but this is only a start.) Next-Best-Near-Synonym steps through the near-synonyms for each cluster in order of preference as computed by WSat(P, w).</Paragraph>
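To make the control flow of Figure 12 concrete, here is a runnable Python paraphrase. It is a sketch under our own toy data structures: real cluster options, partial SemSpecs, and unification are stubbed out, and the options_for and rank_words callables stand in for Next-Best-Cluster-Option and Next-Best-Near-Synonym.

```python
# A Python paraphrase of Build-Sentence-Plan (a sketch, not the system's
# code). Backtracking to steps (1) and (2) is realized by the two nested
# loops: a failed recursive call advances to the next near-synonym, and
# exhausting a cluster's near-synonyms advances to the next cluster option.

def build_sentence_plan(node, prefs, options_for, rank_words):
    for cluster in options_for(node, prefs):        # step (1)
        for word in rank_words(cluster, prefs):     # step (2)
            plan = {'word': word, 'children': []}
            failed = False
            for v in cluster['external']:           # external variables
                sub = build_sentence_plan(v, prefs, options_for, rank_words)
                if sub is None:                     # sub-plan failed
                    failed = True                   # backtrack to (2)
                    break
                plan['children'].append(sub)
            if not failed:
                return plan
    return None                                     # tried all options: fail
```

Because the two iterators are assumed to yield options in order of preference, the first complete plan returned is the greedy locally-most-preferred plan that the heuristic of Section 7.4 postulates.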
    </Section>
    <Section position="7" start_page="135" end_page="138" type="sub_section">
      <SectionTitle>
7.5 Summary
</SectionTitle>
      <Paragraph position="0"> The two-tiered lexical-choice algorithm (and sentence-planning algorithm) developed in this section is as efficient as any algorithm developed to date for a conventional model of lexical knowledge (without near-synonyms), because it can find the appropriate cluster or clusters just as easily as the latter can find a word: a cluster in our model corresponds to an individual word in the conventional model. And choosing a near-synonym from a cluster is efficient, because there are normally only a few of them per cluster. The system does not have to search the entire lexicon. The full complexity of representing and using fine-grained lexical differences is partitioned into small clusters. The process is also robust, ensuring that the right meaning (at a coarse grain) is lexicalized even if a &amp;quot;poor&amp;quot; near-synonym is chosen in the end. And when the right preferences are specified in the input, the algorithm is accurate in its choice, attempting to meet as many preferences as possible while also satisfying the constraints.</Paragraph>
      <Paragraph position="1"> 8. Example
A formal evaluation of I-Saurus would require both a substantial lexicon of clusters and a large test suite of input data correlated with the desired output sentences. Building such a suite would be a substantial undertaking in itself. [Table 3 note:] For each candidate, we show the satisfaction scores (Sat) for each individual preference and the total satisfaction score (WSat): fib scores highest.</Paragraph>
      <Paragraph position="2"> Barring this, we could evaluate I-Saurus as an MT system, in terms of coverage and of quality (intelligibility, fidelity, and fluency). Unfortunately, I-Saurus is but a prototype with a small experimental lexicon, so we can only show by a few examples that it chooses the most appropriate words given a variety of input preferences.</Paragraph>
      <Paragraph position="3"> Returning again to the situation of John and his lie (Figure 11), consider the set of four simultaneous preferences shown in the top part of Table 3. The bottom part shows the scores for each candidate in the untruth C cluster. If this cluster option were chosen, then I-Saurus would choose the noun fib, because it simultaneously and maximally satisfies all of the preferences, as shown by the score WSat({1, 2, 3, 4}, fib) = 3.00. But note that if fib were not available, then the second-place lie would be chosen, leaving unsatisfied the preference to express insignificance.</Paragraph>
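Within the chosen cluster, the selection itself is just an argmax over the candidates' total scores. In the sketch below, fib's 3.00 is the score reported in Table 3; the other candidates' scores are illustrative placeholders, ordered only so that lie comes second, as the text describes.

```python
# A minimal sketch of within-cluster choice: pick the near-synonym with the
# highest total satisfaction score. The score table is illustrative except
# for fib's 3.00 (from Table 3).

def choose_near_synonym(candidates, scores):
    """Return the candidate word with the highest overall score."""
    return max(candidates, key=lambda w: scores[w])
```

Removing the top-scoring candidate from the candidate list models the fallback behavior: with fib unavailable, the second-place lie is chosen instead.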
      <Paragraph position="4"> Now, for a whole sentence, consider the SitSpec shown in Table 4. For this, I-Saurus can generate 960 different sentence plans, including plans that realize the sentences John commands an alcoholic to lie and John orders a drunkard to tell a fib. I-Saurus can be so prolific because of the many possible combinations of the near-synonyms of the six clusters involved: John C (one near-synonym), alcoholic C (ten near-synonyms), order C (six near-synonyms), say C (two near-synonyms), untruth C (six near-synonyms), and tell-a-lie C (four near-synonyms).</Paragraph>
      <Paragraph position="5"> The bottom part of Table 4 shows the variety of output that is possible when each individual preference and various combinations of simultaneous preferences (cases i-x) are input to the system. (The numbered preferences are listed at the top of the table.) So for example, if we input preference 3 (high formality), the system outputs John enjoins an inebriate to prevaricate. The output appears stilted in some cases because no other parameters, such as desired verb tense, were given to Penman and because the system has no knowledge of collocational constraints. Of course, we could have defined many other preferences (and combinations of preferences), but we chose these particular ones in order to show some of the interesting interactions that occur among the cluster options during processing; they are not meant to be representative of what a user would normally ask of the system.</Paragraph>
      <Paragraph position="6"> ii 1, 9 John commands a drunk to fib.</Paragraph>
      <Paragraph position="7"> iii 3, 6 John enjoins a drunkard to prevaricate.</Paragraph>
      <Paragraph position="8"> iv 6, 10 John commands a drunkard to tell an untruth.</Paragraph>
      <Paragraph position="9"> v 3, 9 John enjoins an inebriate to fib.</Paragraph>
      <Paragraph position="10"> vi 3, 7, 9 John commands an inebriate to fib.</Paragraph>
      <Paragraph position="11"> vii 3, 8, 9 John orders an inebriate to fib.</Paragraph>
      <Paragraph position="12"> viii 3, 6, 8, 9 John orders a drunkard to fib.</Paragraph>
      <Paragraph position="13"> ix 3, 5 John enjoins a tippler to tell a prevarication.</Paragraph>
      <Paragraph position="14"> x 3, 5, 11 John enjoins a tippler to tell a prevarication.</Paragraph>
      <Paragraph position="15"> Consider, for instance, case iv. Here, one cluster can satisfy preference 6 (pejorative attitude), and another cluster can satisfy preference 10 (misconception), but neither cluster can satisfy both preferences on its own. So the system chooses drunkard, because it is pejorative, and untruth, because it implies a misconception. No other combination of choices from the two clusters could have simultaneously satisfied both preferences. And finally, consider case v, which illustrates a clash in the satisfaction of one of the preferences. Fib is chosen despite the fact that it is informal, because it is the only word that implies an insignificant lie. But the system compensates by choosing two other formal words: enjoin and inebriate. If we add a preference to this case to imply that John has official authority (case vi), then I-Saurus chooses command instead of enjoin, further sacrificing high formality.</Paragraph>
    </Section>
  </Section>
</Paper>