File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-1413_metho.xml

Size: 16,545 bytes

Last Modified: 2025-10-06 14:07:27

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1413">
  <Title>The hyperonym problem revisited: Conceptual and lexical hierarchies in language generation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Representations used in language processing owe much to the tradition of 'semantic networks', which nowadays have been successfully formalized and organized especially around one particular kind of link between nodes: the ISA-link, which connects entities to subordinate entities. This link is, by definition, the root of the so-called 'hyperonym problem' (see footnote 1): When a speaker utters a word, she presumably needs to retrieve a lemma from her mental lexicon, and the 'applicability conditions' of the lemma automatically render the lemma's hyperonyms also applicable, thus raising the question of how the choice among a set of more or less specific words is made.</Paragraph>
    <Paragraph position="1"> In this paper, I briefly review approaches to the hyperonym problem in psycholinguistics, natural language generation, and lexical semantics. In doing so, I will refer to different branches of NLG according to their roots and main motivations. (Footnote 1: Alternatively called 'hypernym' in many publications; 'hyperonym' seems preferable, as the Greek root is 'hyper' (super) + 'onoma' (name).)</Paragraph>
    <Paragraph position="2"> Generally acknowledged are the two poles of 'cognition-inspired' and 'engineering-inspired' language production: Cognition-inspired work (CI-NLG, for short) seeks to build models that replicate performance data and explain phenomena of human language production with the help of psychological experiments; engineering-inspired work (EI-NLG) seeks to build programs that provide linguistic output to some particular computer application. These goals are extremely different, and it seems that the gap between the respective methodologies will persist for quite some time. In between the two, however, I would situate a third category, which may be called 'linguistics-inspired'. For this branch, here abbreviated as LI-NLG, the primary motivation lies neither in modelling human performance nor in efficiently performing a technical application; rather, LI-NLG seeks production models that replicate 'competence data', i.e., that account for observed linguistic regularities, without committing to statements about the human production process.</Paragraph>
    <Paragraph position="3"> Arguing that progress hinges on a better understanding of the structure of the mental vocabulary, which includes a clear picture of the nature of the ISA-link, I will sketch a framework of distinct (but related) conceptual and lexical hierarchies, which offers possibilities to account for at least some of the phenomena to be discussed.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="93" type="metho">
    <SectionTitle>
2 The hyperonym problem
</SectionTitle>
    <Paragraph position="0"> Following the psycholinguistics literature, the hyperonym problem is regarded as an aspect of lemma retrieval. Roelofs [1996, p. 308] describes a 'lemma' as a representation of the meaning and the syntactic properties of a word, and the task of lemma retrieval as a crucial step in the process of grammatical encoding, where building of a phrasal, clausal, or sentential structure requires the syntactic information that lemmas contain. Thus abstracting from the other steps of language production (formulation, articulation) as well as from possible influences of context, the task is confined to retrieving a lemma that corresponds to the conceptual specification that is represented in some adequate way. For the psycholinguist, the general problem is that of convergence from an under-specified conceptual representation to one word that the speaker utters. Levelt [1989, p. 201] characterizes the hyperonym problem: "There is one particularly nasty convergence problem that has not been solved by any theory of lexical access.</Paragraph>
    <Paragraph position="1"> I will call it the hyperonym problem [...]: When lemma A's meaning entails lemma B's meaning, B is a hyperonym of A. If A's conceptual conditions are met, then B's are necessarily also satisfied. Hence, if A is the correct lemma, B will (also) be retrieved." The relation of hyperonymy is generally regarded as transitive: If A is a hyperonym of B, and B is a hyperonym of C, then A is a hyperonym of C. Following common practice, we call A a direct hyperonym of B, while it is only an indirect hyperonym of C. The same holds for the inverse relation, hyponymy.</Paragraph>
    <Paragraph position="2"> For CI-NLG, which is concerned with finding models that resolve the convergence problem with the impressive speed displayed by human speakers, the hyperonym problem is important because it serves to put implemented models of spreading activation to the test. For EI-NLG, on the other hand, it can usually be ignored, as most of today's practical applications either do not require the production of a more general word (i.e., there is a one-to-one mapping from concept to word) or can rely on fairly simple mechanisms that avoid lexical repetitions by choosing from a fixed, pre-defined set of near-synonyms. For LI-NLG, the challenge of the hyperonym problem is to explain how a sentence can be paraphrased by others that replace a word by a hyperonym, and why speakers select from candidate hyperonyms in different situations of utterance. More concretely, given a conceptual specification (in a wide sense, including contextual parameters and communicative goals), the task is to find the best candidate from a set of valid paraphrases, here especially on the grounds of replacing content words with hyperonyms.</Paragraph>
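The transitivity of hyperonymy and the distinction between direct and indirect hyperonyms, as described above, can be illustrated with a minimal Python sketch. The toy taxonomy and the function names are assumptions made for illustration only; they do not come from any of the cited systems.

```python
# Hypothetical toy taxonomy: each word maps to its direct hyperonym.
DIRECT_HYPERONYM = {
    "poodle": "dog",
    "dog": "mammal",
    "mammal": "animal",
}


def hyperonyms(word: str) -> list:
    """Return all hyperonyms of `word`, nearest first, by following direct links."""
    chain = []
    current = word
    while current in DIRECT_HYPERONYM:
        current = DIRECT_HYPERONYM[current]
        chain.append(current)
    return chain


def is_hyperonym(general: str, specific: str) -> bool:
    """True if `general` is a direct or indirect hyperonym of `specific`."""
    return general in hyperonyms(specific)


def is_direct_hyperonym(general: str, specific: str) -> bool:
    """True only if `general` is exactly one link above `specific`."""
    return DIRECT_HYPERONYM.get(specific) == general


if __name__ == "__main__":
    # Every word in this chain is applicable whenever "poodle" is applicable,
    # which is exactly what makes the choice among them a problem.
    print(hyperonyms("poodle"))
    print(is_hyperonym("animal", "poodle"), is_direct_hyperonym("animal", "poodle"))
```

In this sketch, the list returned by `hyperonyms` is the set of competitors that a lemma retrieval model has to suppress or choose among.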
  </Section>
  <Section position="6" start_page="93" end_page="94" type="metho">
    <SectionTitle>
3 Psycholinguistic production models
</SectionTitle>
    <Paragraph position="0"> Language production models developed in psycholinguistics are nowadays couched in neural network theory. Under debate are the computational properties of the networks, i.e., the modes of activation spreading, the existence of feedback, of inhibitory links, etc. The main methodological concern is to construct the models in such a way that they account for data gathered in human speech production experiments, often involving production errors, which can shed light on the underlying mechanisms.</Paragraph>
    <Paragraph position="1"> A central point of contention is the question whether the meaning of concepts and/or words is represented in a decomposed fashion or not. Here, the hyperonym problem is sometimes used as evidence by proponents of non-decompositional models. Roelofs [1996], for instance, argues that if a number of nodes representing semantic features are the basis for lexical access, in lemma retrieval it becomes extremely difficult to control the activation spread in such a way that only the most specific lexical unit that combines these features gets selected.</Paragraph>
    <Paragraph position="2"> Roelofs concludes that a non-decompositional model is to be favoured: When lemma retrieval starts with activation of the 'lexical concept' FATHER, rather than with the features MALE and PARENT, the output word will be father, without the danger of being outranked by a higher activation of parent (or person, or entity,</Paragraph>
    <Paragraph position="3"> presumably).</Paragraph>
    <Paragraph position="4"> This line is continued in a recent comprehensive theory of speech production by Levelt, Roelofs, and Meyer [1999]. The focus of this theory is more on the side of articulation, but their approach to (non-)decomposition and hyperonyms follows the basic assumption just sketched. The model consists of three layers of nodes: a layer of concept nodes with labelled concept links, a layer of lemma nodes, and a layer of word form nodes that include morphological information. When a lexical concept is activated, the mechanism of activation spreading ensures that the directly connected lemma receives the highest activation, and not a lemma associated with a hyperonym of the lexical concept (which is connected by an ISA-link).</Paragraph>
    <Paragraph position="5"> Working out the mechanics to ensure this behaviour is important for the implementation, but from the particular viewpoint of word choice, approaches of this kind are not very explanatory. Levelt et al. [1999, p. 4] state that "there is not the slightest evidence that speakers tend to produce hyperonyms of intended target words." But when lexical access starts with an appropriately activated lexical concept, the problem is effectively moved away, into the realm of conceptualization. The authors acknowledge the need for a component that establishes a 'perspective' by selecting a specific set of words, but have not incorporated such a component into their model. Thus, why and how the lexical concept receives its activation, and where the intention of using a word arises from, is not covered by the theory. For these questions, we have to turn to work in natural language generation.</Paragraph>
  </Section>
  <Section position="7" start_page="94" end_page="95" type="metho">
    <SectionTitle>
4 Hyperonyms in NLG systems
</SectionTitle>
    <Paragraph position="0"> In contrast to psycholinguistics-inspired work, the vast majority of natural language generation systems use computations based on symbol manipulation, often connected with symbolic knowledge representation and reasoning techniques. Here, the hyperonym problem, as one aspect of the general task of lexical choice, arises only in systems that employ a sufficiently rich model of the lexicon and the concept-lexicon link, involving some sort of hierarchy information.</Paragraph>
    <Paragraph position="1"> As pointed out above, from an application-oriented perspective (i.e., in EI-NLG) it is often sufficient to work with rather limited mechanisms that largely eschew the lexical choice task.</Paragraph>
    <Paragraph position="2"> The earliest and very influential device for performing lexical choice, Goldman's [1975] discrimination net, hard-wires the sequence of choice points leading to a specific lexical item, which is in fact the general strategy taken in the majority of NLG systems: if you have a choice,</Paragraph>
    <Paragraph position="3"> then prefer the most specific term.</Paragraph>
    <Paragraph position="4"> The most substantial criticism of the prefer-the-specific heuristic has been voiced in the work of Reiter [1991]. One of his examples is a system answering the question 'Is Terry a woman?'</Paragraph>
    <Paragraph position="5"> Even if the system has the specific knowledge that Terry is a bachelor, the response 'No, Terry is a bachelor' would not be appropriate here; the less specific 'No, Terry is a man' is better since it does not prompt the hearer to draw any conclusions as to the particular relevance of Terry's marital status for the present conversation. Reiter's main point is to distinguish the knowledge a generation system has at its disposal from the communicative goals followed in producing an utterance. The latter are explicitly represented in his system as a list of attributes 'to communicate about an entity', which is a subset of the overall knowledge the system has of that entity. In the Terry example, the goal is to inform the hearer that Terry has the attributes {Human, Age-status:adult, Sex:Male}.</Paragraph>
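The idea of separating what the system knows from what it wants to communicate can be sketched in a few lines of Python. This is a deliberately simplified illustration and not Reiter's actual algorithm: the lexicon, the attribute `Marital-status:unmarried`, and the selection rule (cover the goal, add as little as possible) are assumptions introduced here.

```python
from typing import Optional

# Hypothetical lexicon: word -> attributes that using the word conveys.
LEXICON = {
    "person": {"Human"},
    "man": {"Human", "Age-status:adult", "Sex:Male"},
    "bachelor": {"Human", "Age-status:adult", "Sex:Male", "Marital-status:unmarried"},
}

# What the system knows about Terry (superset of what it wants to say).
TERRY_KNOWN = {"Human", "Age-status:adult", "Sex:Male", "Marital-status:unmarried"}
# The to-communicate attributes from the Terry example in the text.
TERRY_GOAL = {"Human", "Age-status:adult", "Sex:Male"}


def choose_word(known: set, to_communicate: set) -> Optional[str]:
    """Pick a word that is true of the entity, covers the goal, and adds the least."""
    candidates = [
        word
        for word, conveyed in LEXICON.items()
        if conveyed.issubset(known) and to_communicate.issubset(conveyed)
    ]
    if not candidates:
        return None
    # Fewest attributes beyond the goal means fewest unintended implicatures.
    return min(candidates, key=lambda w: len(LEXICON[w] - to_communicate))


if __name__ == "__main__":
    print(choose_word(TERRY_KNOWN, TERRY_GOAL))
```

Under these assumptions, both 'man' and 'bachelor' are true of Terry and cover the goal, but 'bachelor' conveys an extra attribute, so the less specific word is selected.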
    <Paragraph position="6"> In the KL-ONE-style [Brachman, Schmolze 1985] knowledge representation used by Reiter, concepts can be marked as 'basic-level' in the sense of [Rosch 1978]. Thus, on the taxonomic path Tweety (instance-of) Robin - Bird - Vertebrate - Animal - Object, the concept Bird is a basic-level one, which leads to a preference for using the corresponding lexical item when referring to some kind of bird (i.e., some concept or instance subsumed by it). Around the same time as Rosch's work, Cruse [1977] (who in turn was building on earlier research by Roger Brown in the 1960s) had pointed out that the failure to use items of "inherently neutral specificity" (a notion that closely corresponds to the basic level) results in unwanted conversational implicatures: the hearer will surmise the existence of some reason why the neutral term could not be used in the specific situation of utterance.</Paragraph>
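The basic-level preference on the taxonomic path above can be sketched as follows. The path and the basic-level marking of Bird are taken from the text; the list representation and the function name are assumptions for illustration and say nothing about how KL-ONE or Reiter's system store this information.

```python
# Taxonomic path from the text, most specific first (the instance Tweety is omitted).
PATH = ["Robin", "Bird", "Vertebrate", "Animal", "Object"]
# Concepts marked as basic-level, as described for Bird in the text.
BASIC_LEVEL = {"Bird"}


def preferred_concept(concept: str) -> str:
    """Return the nearest basic-level concept at or above `concept`.

    If no basic-level concept subsumes it, fall back to the concept itself.
    """
    start = PATH.index(concept)
    for candidate in PATH[start:]:
        if candidate in BASIC_LEVEL:
            return candidate
    return concept


if __name__ == "__main__":
    print(preferred_concept("Robin"))
    print(preferred_concept("Vertebrate"))
```

This captures only the default preference; it does not model the contexts in which a speaker departs from the basic level.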
    <Paragraph position="7"> But using the basic level is not mandatory, of course.</Paragraph>
    <Paragraph position="8"> Given a suitable context where attention is directed to particular attributes of entities, a speaker moves to a more specific or sometimes to a more general level. Reiter's mechanism of to-communicate attributes tries to capture this: Covering these attributes with a suitable term can override the preference for the basic level. Other kinds of preferences are also accounted for, such as favouring shorter rather than longer words, which typically (but not always) coincides with the basic-level preference. Reiter notes that humans also employ some preferences that cannot be explained with the parameters investigated so far. He gives the example [Reiter 1991, p. 248] of a speaker pointing the hearer to a cow and a horse with the utterance 'Look at the animals / mammals / vertebrates'. None of the terms is basic-level or significantly shorter than the others, yet there is a clear order of 'normality' in the sequence of the three candidates.</Paragraph>
    <Paragraph position="9"> In my own work on lexical choice in the 'Moose' generator [Stede 1999], I used language-neutral conceptual hierarchies and the subsumption relation, inter alia, to account for the fact that different languages occasionally display preferences for different levels of specificity. For example, in bilingual instructional text we find a regular correspondence between the general English 'to remove' and numerous more specific German lexemes (abziehen, abnehmen, herausdrehen, ...); this might very well be a genre-specific tendency. Furthermore, Moose employs a model of lexical connotations that can override the general preference for a more specific lexical item. For example, when referring to a POODLE in a derogatory manner, Moose can choose the appropriately connotated word mutt, which requires moving up the taxonomy to the DOG concept, where a range of near-synonyms (differing in their connotations) are attached.</Paragraph>
    <Paragraph position="10"> Another reason for considering hyperonyms in the lexical choice process is to avoid repeated usage of the same term when referring to some object multiple times.</Paragraph>
    <Paragraph position="11"> In the present Moose implementation, all more general words are inherited down to the concept-to-be-lexicalized, and the preference mechanism selects one of them (in the absence of any decisive factors, it chooses the most specific word). This mechanism is certainly not cognitively adequate (it was not intended to be) and also not particularly efficient: The range of candidates under consideration should be constrained beforehand. In conclusion, NLG systems employ a mixture of constraints and preferences in their approaches to hyperonymy. The factors used by various systems in the choice process are:</Paragraph>
    <Paragraph position="12"> * User's vocabulary and knowledge (e.g., [McKeown et al. 1993])
    * Successful reference, i.e., discrimination from other candidate entities (e.g., [Dale, Reiter 1995])
    * Basic-level and entry-level effects, conversational implicatures
    * Length of words
    * Stylistic features such as formality, positive/negative attitude
    * Language, genre
    * Givenness of item, avoiding repetition or "saying the very obvious"
    Not surprisingly, there is no generator yet that would incorporate all these factors within a single system. It is not clear which general lexical items should be inherited down to the concept-to-be-lexicalized and enter the preferential choice mechanism; it is also not clear how exactly the various preferences would interact and which would take precedence in a particular situation of utterance.</Paragraph>
  </Section>
</Paper>