<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1195">
  <Title>Tony.Veale@UCD.ie</Title>
  <Section position="3" start_page="0" end_page="1" type="metho">
    <SectionTitle>
2 Exploring the Space of LMH Concepts
</SectionTitle>
    <Paragraph position="0"> Creative discovery requires that we give structure to the space of possible concepts that we plan to explore. This is made somewhat easier if we consider the meaning of conceptual structures to be grounded in a semiotic system of meaning-creating oppositions. Given a starting structure, knowledge of allowable oppositions can then be used to transform this starting point into a variety of different conceivable structures, some of which may be novel and possess value on a particular utility scale.</Paragraph>
    <Paragraph position="1"> The notion of opposition employed here is much broader than that of antonymy. For our purposes, contextual oppositions exist between terms that compete to fill a given dimension of the same concept. For instance, Greek and Hindu can each be used to differentiate the concept deity along a culture dimension, and so, in the context of deity, they are opposed. However, unlike antonymy, this contextual opposition does not constitute part of the meaning of either term. WordNet is a rich source of explicit antonymous oppositions, but contextual oppositions must be inferred from the structure of the ontology itself and from existing compounds. Fortunately, WordNet contains many instances of literal modifier-head terms, such as &quot;pastry crust&quot; and &quot;Greek alphabet&quot;. The concepts denoted by these compound terms, or LMH concepts for short, have the lexical form M-H (such as pizza-pie or prairie-dog) and express their literality in two ways. First, they must be stored in the WordNet ontology under an existing sense of the lexeme H; for instance, pizza-pie is stored under the hypernym pie. Second, the gloss for the concept M-H must contain the lexeme M or some synonym of it. Thus, while Greek-alphabet is an LMH concept (it literally is a kind of alphabet, and it is literally Greek), neither monkey-bread (which is not literally a kind of bread) nor Dutch-courage (which is not literally Dutch) is an LMH concept.</Paragraph>
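The two literality criteria can be sketched as a small predicate over a toy ontology. This is a minimal sketch, not the author's implementation: the tables `HYPERNYMS`, `GLOSSES` and `MODIFIER_VARIANTS` are hypothetical stand-ins for WordNet (the glosses are invented for illustration), senses are named after their lexemes, "stored under H" is read as "H is reachable by following hypernym links", and the gloss test is reduced to a whole-word match.

```python
import re

# Hypothetical toy ontology (NOT real WordNet data): sense -> direct hypernyms.
HYPERNYMS = {
    "Greek-alphabet": {"alphabet"},
    "alphabet": {"character-set"},
    "monkey-bread": {"fruit"},
    "Dutch-courage": {"courage"},
}

# Hypothetical glosses, invented for this illustration.
GLOSSES = {
    "Greek-alphabet": "the alphabet used for writing the Greek language",
    "monkey-bread": "the edible fruit of the baobab tree",
    "Dutch-courage": "courage resulting from intoxication",
}

# Hypothetical synonyms or variants that may stand in for a modifier M in a gloss.
MODIFIER_VARIANTS = {"Greek": {"Hellenic"}}


def ancestors(sense):
    """Return every sense reachable from `sense` by following hypernym links."""
    seen, stack = set(), [sense]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def gloss_mentions(gloss, word):
    """Case-insensitive whole-word search for `word` in `gloss`."""
    return re.search(r"\b" + re.escape(word) + r"\b", gloss, re.IGNORECASE) is not None


def is_lmh(compound):
    """Check both literality criteria for a compound written as 'M-H'."""
    m, h = compound.split("-", 1)
    # Criterion 1: the compound is stored under (a sense of) the head H.
    if h not in ancestors(compound):
        return False
    # Criterion 2: its gloss mentions M or a synonym/variant of M.
    gloss = GLOSSES.get(compound, "")
    return any(gloss_mentions(gloss, w) for w in {m} | MODIFIER_VARIANTS.get(m, set()))


for c in ["Greek-alphabet", "monkey-bread", "Dutch-courage"]:
    print(c, is_lmh(c))  # True, False, False respectively
```

Here monkey-bread fails the first criterion (it sits under fruit, not bread) and Dutch-courage fails the second (its gloss never mentions Dutch).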
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.1 A Framework for Creativity
</SectionTitle>
      <Paragraph position="0"> We use the terminology of Wiggins (2003) to frame our discussion of creative exploration.</Paragraph>
      <Paragraph position="1"> Wiggins, following earlier work by Boden (1990), formalizes the creative exploration process using the following abstractions: C, the realm of concepts that is being explored; R, the set of rules for forming concepts and, conversely, for deconstructing existing ones; T, the transformational rules that generate new concepts via R; and E, the evaluation mechanism that ascribes value or utility to these new concepts. (To avoid later confusion with set notation, we denote WordNet senses not as synsets but as italicized terms.) In applying these terms to creativity in WordNet, we introduce the following refinements: C*, a subset of C that serves as a starting point for creative exploration; R*, the subset of R needed to construct and deconstruct LMH compounds in C*; and T*, the subset of T needed to hypothesize new LMH concepts for R* to construct. So for our current purposes, we define C* as the set of LMH concepts in WordNet, and R* as the compositional criteria used to identify and decompose existing LMH entries and to construct new ones by concatenating an appropriate M and H term pair. However, to define T*, we first need to consider how taxonomic differentiation is used to create LMH concepts in the first place.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1" end_page="2" type="metho">
    <SectionTitle>
3 Domain Differentiation
</SectionTitle>
    <Paragraph position="0"> LMH concepts exist in WordNet to differentiate more general concepts in meaningful taxonomic ways. For instance, the LMH concepts Greek-alphabet, Hebrew-alphabet and Roman-alphabet each serve to differentiate the concept alphabet.</Paragraph>
    <Paragraph position="1"> This is a useful ontological distinction that contributes to the definition of individual letter concepts like Alpha, Beta and Gimel. We can represent this specialization pattern via a differentiation set $D_{alphabet}$ that contains these modifiers: $\{Greek, Hebrew, Roman\} \subseteq D_{alphabet}$.</Paragraph>
    <Paragraph position="3"> More generally, the differentiation set of a concept H comprises the set of all concepts M such that the LMH concept M-H is in C*. Thus we have: $D_H = \{M \mid M\text{-}H \in C^*\}$.</Paragraph>
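    <Paragraph position="5"> We use $\mathcal{D}$ to denote the set of all differentiation sets that are implied by C*, allowing us to define the absolute affinity between two modifier terms $c_1$ and $c_2$ as the number of concepts that $c_1$ and $c_2$ can both differentiate: $A_{abs}(c_1, c_2) = |\{x \in \mathcal{D} : c_1 \in x \wedge c_2 \in x\}|$. We thus define the relative affinity between two modifier terms as the proportion of the concepts differentiated by either term that are differentiated by both: $A_{rel}(c_1, c_2) = \frac{A_{abs}(c_1, c_2)}{|\{x \in \mathcal{D} : c_1 \in x \vee c_2 \in x\}|}$.</Paragraph>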
    <Paragraph position="7"> A relative affinity of 1.0 means that both terms differentiate exactly the same concepts in WordNet. It follows that the higher the relative affinity between $c_1$ and $c_2$, the more likely it is that a concept differentiated by one term can also be meaningfully differentiated by the other, while the higher the absolute affinity, the more reliable this likelihood estimate becomes. Affinity thus provides an effective basis for formulating the transformation rules in T*.</Paragraph>
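Differentiation sets and both affinity measures can be computed directly from a list of LMH pairs. This is a minimal sketch, assuming that relative affinity is the number of concepts differentiated by both modifiers divided by the number differentiated by either, which gives 1.0 exactly when both terms differentiate the same concepts. The sample `C_STAR` is a small hypothetical slice built from compounds named in this section, not the full WordNet set, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical slice of C*: LMH compounds as (modifier, head) pairs.
C_STAR = [
    ("Greek", "alphabet"), ("Hebrew", "alphabet"), ("Roman", "alphabet"),
    ("Greek", "deity"), ("Hindu", "deity"), ("Roman", "deity"),
    ("Hebrew", "calendar"), ("Jewish", "calendar"),
]


def differentiation_sets(pairs):
    """Map each head H to its differentiation set D_H = {M | M-H in C*}."""
    dsets = defaultdict(set)
    for m, h in pairs:
        dsets[h].add(m)
    return dict(dsets)


D = differentiation_sets(C_STAR)


def a_abs(c1, c2, dsets=D):
    """Absolute affinity: how many concepts both modifiers differentiate."""
    return sum(1 for mods in dsets.values() if c1 in mods and c2 in mods)


def a_rel(c1, c2, dsets=D):
    """Relative affinity: shared concepts over concepts differentiated by either."""
    either = sum(1 for mods in dsets.values() if c1 in mods or c2 in mods)
    return a_abs(c1, c2, dsets) / either if either else 0.0


print(a_abs("Greek", "Roman"), a_rel("Greek", "Roman"))    # 2 1.0
print(a_abs("Greek", "Hebrew"), a_rel("Greek", "Hebrew"))  # 1 0.3333333333333333
```

In this slice Greek and Roman differentiate exactly the same concepts (alphabet and deity), so their relative affinity is 1.0.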
    <Paragraph position="8"> We should naturally expect near-synonymous modifiers to have a strong affinity for each other. For instance, Jewish and Hebrew act as near-synonyms because the WordNet compounds Jewish-calendar and Hebrew-calendar are themselves synonymous. This is clearly a form of contextual synonymy, since Jewish and Hebrew do not mean the same thing in general. Nonetheless, their affinity can be used to generate new compounds that add value to WordNet as synonyms of existing terms, such as Jewish-alphabet, Hebrew-religion, and so on.</Paragraph>
    <Paragraph position="9"> Recall that literal compounds represent a yoking together of two or more ontological branches. In exploring the space of novel compounds, it will be important to recognize which branches most naturally form the strongest bonds. Another variant of affinity can be formulated for this purpose: the affinity between two domains $H_1$ and $H_2$ is the number of modifiers that can differentiate both, $A(H_1, H_2) = |D_{H_1} \cap D_{H_2}|$.</Paragraph>
    <Paragraph position="11"> For example, $A(sauce, pizza) = 2$, since in WordNet the modifier overlap between the pizza and sauce domains is {anchovy, cheese}.</Paragraph>
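The domain variant of affinity is the size of the overlap between two differentiation sets, and ranking all domain pairs by it shows which branches bond most strongly. In this minimal sketch, `DSETS` is hypothetical toy data chosen only so that the pizza/sauce overlap reproduces the {anchovy, cheese} example; it is not WordNet's full set of differentiation sets, and `domain_affinity` and `strongest_bonds` are illustrative names.

```python
from itertools import combinations

# Hypothetical toy differentiation sets (illustrative only, not WordNet's full sets).
DSETS = {
    "pizza": {"anchovy", "cheese", "pepperoni"},
    "sauce": {"anchovy", "cheese", "tomato", "curry", "chocolate"},
    "alphabet": {"Greek", "Hebrew", "Roman"},
}


def domain_affinity(h1, h2, dsets=DSETS):
    """Size of the modifier overlap between the domains h1 and h2."""
    return len(dsets[h1].intersection(dsets[h2]))


def strongest_bonds(dsets=DSETS):
    """Rank every pair of domains by modifier overlap, strongest first."""
    pairs = [((h1, h2), domain_affinity(h1, h2, dsets))
             for h1, h2 in combinations(sorted(dsets), 2)]
    return sorted(pairs, key=lambda pair: -pair[1])


print(domain_affinity("sauce", "pizza"))  # 2
print(strongest_bonds()[0])  # (('pizza', 'sauce'), 2)
```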
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
4 Creative Exploration in the LMH Space
</SectionTitle>
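    <Paragraph position="0"> We consider as an exploratory starting point any LMH concept M-H in C*. We can transform this into another concept M'-H by replacing M with any M' for which $A_{abs}(M, M') \geq 1$, that is, any M' that differentiates at least one concept H' that M also differentiates.</Paragraph>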
    <Paragraph position="2"> This formulation may suggest a large range of values of M'. However, these candidates can be sorted by $A_{rel}(M, M')$, which estimates the probability that a given M'-H will later be validated as useful. One rule in T* can now be formulated for our further consideration: (5) if M-H, M-H' and M'-H' are all in C*, then hypothesize the new compound M'-H. For instance, applying (5) to Greek-deity, with H' = alphabet, allows M' to be either Hebrew or Roman, leading to the creation of the LMH concepts Hebrew-deity and Roman-deity. One of these, Roman-deity, already exists in C*, but the other, Hebrew-deity, is novel in a way that Boden terms psychologically creative, or P-Creative, inasmuch as it is neither in C* nor in $C_w$.</Paragraph>
    <Paragraph position="4"> Hebrew-deity may thus be of some value as a hypernym for existing WordNet concepts like Yahwe and Jehovah.</Paragraph>
    <Paragraph position="5"> Rule (5) is a general principle for ontological exploration in the space of compound terms. Consider the compound software-engineering, which, following (5), is suggested by the joint existence in WordNet of the concepts software-engineer, automotive-engineer and automotive-engineering. While this particular addition could be predicted from the application of simple morphology rules, the point here is that a single exploration principle like (5) can obviate the need for a patchwork of such simple rules.</Paragraph>
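The exploration step can be sketched as a generate-and-rank loop over C*. This is a minimal sketch, not the author's implementation. It assumes rule (5) reads "hypothesize M'-H whenever M-H, M-H' and M'-H' are all in C*", which is the pattern behind the software-engineering example. The sample `C_STAR` is a hypothetical slice built from compounds named in the text, and `apply_rule_5` is an illustrative name. Candidates already in C* are skipped. Each remaining candidate is ranked by the highest relative affinity between a modifier that suggested it and its new modifier.

```python
from collections import defaultdict

# Hypothetical slice of C* as (modifier, head) pairs, taken from examples in the text.
C_STAR = {
    ("automotive", "engineer"), ("automotive", "engineering"), ("software", "engineer"),
    ("Greek", "alphabet"), ("Hebrew", "alphabet"), ("Roman", "alphabet"),
    ("Greek", "deity"), ("Roman", "deity"),
}


def differentiation_sets(pairs):
    """Map each head H to D_H = {M | M-H in C*}."""
    dsets = defaultdict(set)
    for m, h in pairs:
        dsets[h].add(m)
    return dsets


def a_rel(c1, c2, dsets):
    """Concepts differentiated by both modifiers over those differentiated by either."""
    both = sum(1 for mods in dsets.values() if c1 in mods and c2 in mods)
    either = sum(1 for mods in dsets.values() if c1 in mods or c2 in mods)
    return both / either if either else 0.0


def apply_rule_5(c_star):
    """Hypothesize M'-H whenever M-H, M-H' and M'-H' are all in C*, ranked by A_rel."""
    dsets = differentiation_sets(c_star)
    scores = {}
    for m, h in c_star:
        for h2, mods in dsets.items():
            if h2 == h or m not in mods:
                continue
            for m2 in mods - {m}:
                if (m2, h) in c_star:
                    continue  # already an LMH concept, so nothing new to add
                score = a_rel(m, m2, dsets)
                scores[(m2, h)] = max(scores.get((m2, h), 0.0), score)
    # Highest affinity first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for (m, h), score in apply_rule_5(C_STAR):
    print(f"{m}-{h}\t{score:.2f}")
```

In this slice Roman-deity is also suggested but is skipped because it is already in C*, while Hebrew-deity and software-engineering survive as novel hypotheses.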
    <Paragraph position="6"> Of course, one can imagine rules other than (5) that exploit the regularities inherent in WordNet definitions. For instance, consider the sense gasoline-bomb, which WordNet glosses as &quot;a crude incendiary bomb made of a bottle filled with flammable liquid and fitted with a rag wick&quot;. Once we determine which definite description in the gloss the modifier conforms to (in this case, &quot;flammable liquid&quot;), we can find other modifiers that also match this description. Thus, the new concepts methanol-bomb and butanol-bomb can be generated, and from these the creative concept alcohol-bomb can be generalized. However, each such strategy raises its own issues, so for now we consider a T* comprising (5) only.</Paragraph>
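This gloss-based strategy can be sketched in three steps. First, find the description in the gloss that the modifier satisfies. Second, collect the other words that satisfy the same description. Third, generalize to a hypernym they share. This is a minimal sketch under stated assumptions. The `SATISFIES` table stands in for whatever test decides that a word conforms to a description. Both it and `HYPERNYM` are hypothetical toy data rather than WordNet's, and `gloss_variants` is an illustrative name.

```python
# Hypothetical toy data: descriptions each word satisfies, and its direct hypernym.
SATISFIES = {
    "gasoline": {"flammable liquid", "fuel"},
    "methanol": {"flammable liquid", "alcohol"},
    "butanol": {"flammable liquid", "alcohol"},
    "water": {"liquid"},
}
HYPERNYM = {"gasoline": "fuel", "methanol": "alcohol", "butanol": "alcohol", "water": "liquid"}

GASOLINE_BOMB_GLOSS = ("a crude incendiary bomb made of a bottle filled with "
                       "flammable liquid and fitted with a rag wick")


def gloss_variants(modifier, head, gloss):
    """Return (new compounds, generalized compound or None) for modifier-head."""
    # Step 1: descriptions in the gloss that the modifier satisfies; keep the longest.
    matched = [d for d in SATISFIES.get(modifier, set()) if d in gloss]
    if not matched:
        return [], None
    description = max(matched, key=len)
    # Step 2: other words that satisfy the same description become new modifiers.
    others = sorted(w for w, ds in SATISFIES.items() if description in ds and w != modifier)
    new = [f"{w}-{head}" for w in others]
    # Step 3: if all new modifiers share one hypernym, generalize to it.
    parents = {HYPERNYM.get(w) for w in others}
    general = None
    if len(parents) == 1 and None not in parents:
        general = f"{parents.pop()}-{head}"
    return new, general


print(gloss_variants("gasoline", "bomb", GASOLINE_BOMB_GLOSS))
# (['butanol-bomb', 'methanol-bomb'], 'alcohol-bomb')
```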
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.1 The Evaluation Mechanism E
</SectionTitle>
      <Paragraph position="0"> For purposes of ascribing value or ontological utility to a new LMH concept M'-H, the concept must first be placed into one of the following categories: a) M'-H already exists in C* and is thus ascribed zero value as an addition to C*.</Paragraph>
      <Paragraph position="1"> b) M'-H does not exist in C* but does exist in $C_w$, and thus corresponds to an existing non-literal concept (such as monkey-bread). While it may have value if given a purely literal reading, it cannot be added to $C_w$ without creating ambiguity, and so has zero value.</Paragraph>
      <Paragraph position="3"> c) Using R*, M'-H can be seen to describe a non-empty class of existing concepts in $C_w$, and would thus have value either as a synonym (when this class has a single member) or as a new organizing super-type (when it has several members). In this case, we say that M'-H has been internally validated against $C_w$.</Paragraph>
      <Paragraph position="7"> d) Using a textual analysis of a large corpus such as the World-Wide-Web, M'-H is recognized to have a conventional meaning in C even if it is not described in $C_w$. In this case, we say that M'-H has been externally validated for $C_w$. The fact that M'-H is novel to the system but not to the historical context of the web suggests that it is merely a psychologically creative, or P-Creative, invention in the sense of Boden (1990).</Paragraph>
      <Paragraph position="10"> e) M'-H is recognized to have a hypothetical or metaphoric value within a comprehension framework such as conceptual blending theory (e.g., see Veale et al. 2000) or mental space theory. In this case, M'-H may truly be a historically creative, or H-Creative, invention in the sense of Boden (1990).</Paragraph>
      <Paragraph position="11"> In general, a new compound has value if its existence is suggested by, but not recognized by, the lexical ontology. As noted in the introduction, this value can be realized in a variety of ways, e.g., by automatically suggesting new knowledge-base additions to the lexical ontologist, or by providing potentially creative expansions for a user query in an information retrieval system (see Veale, 2004).</Paragraph>
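Placing a candidate into the categories above can be sketched as an ordered decision procedure. This is a minimal sketch under stated assumptions. The categories are checked in the order (a) to (e), an order the text does not prescribe. The internal support set, the web hit count and the `blend_ok` flag are hypothetical inputs computed elsewhere; `blend_ok` stands in for a judgment from a comprehension framework such as conceptual blending. The function name `evaluate` is illustrative.

```python
def evaluate(compound, c_star, c_w, support=frozenset(), web_hits=0, blend_ok=False):
    """Place a candidate compound M'-H into one of the categories (a)-(e)."""
    if compound in c_star:
        return "a", "zero value: already an LMH concept"
    if compound in c_w:
        return "b", "zero value: clashes with an existing non-literal concept"
    if support:
        role = "synonym" if len(support) == 1 else "new organizing super-type"
        return "c", f"internally validated; value as a {role}"
    if web_hits > 0:
        return "d", "externally validated; P-Creative"
    if blend_ok:
        return "e", "hypothetical or metaphoric reading; possibly H-Creative"
    return None, "no evidence of value"


C_STAR = {"Roman-deity"}
C_W = {"Roman-deity", "monkey-bread"}  # hypothetical slice of WordNet's concepts

print(evaluate("rain-god", C_STAR, C_W, support={"Thor", "Parjanya", "Rain giver"}))
# ('c', 'internally validated; value as a new organizing super-type')
```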
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Validating New Concepts
</SectionTitle>
      <Paragraph position="0"> The evaluation strategies (c) and (d) above suggest two ways of validating the results of new compound creation: a WordNet-internal approach that uses the structure of the ontology itself to provide evidence for a compound's utility, and a WordNet-external approach that instead looks to an unstructured archive like the web. In both cases, a new compound is validated by assembling a support set of precedent terms that argue for its meaningfulness.</Paragraph>
      <Paragraph position="1"> The internal support-set for a new compound M-H is the set of all WordNet words w that have: (i) at least one sense that is a hyponym of a sense of H; and (ii) a sense that contains M or some variant of it in its gloss. For instance, the novel compound &quot;rain god&quot; is internally validated by the word set {&quot;Thor&quot;, &quot;Parjanya&quot;, &quot;Rain giver&quot;}. When w is polysemous, two distinct senses may be used, reflecting the fact that M-H may be metonymic in construction. For instance, the compound &quot;raisin-wine&quot; can be validated internally by the polysemous word &quot;muscatel&quot;, since one sense of &quot;muscatel&quot; is a kind of wine, and another, a kind of grape, has a WordNet gloss containing the word &quot;raisin&quot;. From this perspective, a &quot;raisin wine&quot; can be a wine made from the same grapes that raisins are made from.</Paragraph>
      <Paragraph position="2"> Likewise, the compound &quot;Jewish robot&quot; can be validated by simultaneously employing both senses of &quot;Golem&quot; in WordNet, which defines &quot;Golem&quot; as either a Jewish mythical being or as a robotic automaton.</Paragraph>
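The internal support set can be sketched as a scan over words that tests the two conditions independently, so that different senses of a polysemous word may satisfy each one. This is a minimal sketch under stated assumptions. `LEXICON` is a hypothetical toy lexicon with invented glosses and only direct hypernyms. The prefix match used for "M or some variant of it" is a crude stand-in for proper morphological matching.

```python
import re

# Hypothetical toy lexicon (NOT real WordNet data): word -> [(direct hypernyms, gloss), ...].
LEXICON = {
    "muscatel": [
        ({"wine"}, "sweet fortified wine made from muscat grapes"),
        ({"grape"}, "a grape variety also dried to make raisins"),
    ],
    "golem": [
        ({"mythical-being"}, "an animated figure of clay in Jewish folklore"),
        ({"robot", "automaton"}, "a mechanical figure that moves by itself"),
    ],
    "Thor": [({"god"}, "Norse god of thunder and rain")],
    "Apollo": [({"god"}, "Greek god of light and music")],
}


def mentions(gloss, word):
    """Crude variant match: `word` at the start of a token, e.g. raisin -> raisins."""
    return re.search(r"\b" + re.escape(word) + r"\w*", gloss, re.IGNORECASE) is not None


def internal_support(m, h, lexicon=LEXICON):
    """Words with a sense directly under H and a (possibly other) sense whose gloss mentions M."""
    return {
        word
        for word, senses in lexicon.items()
        if any(h in hypers for hypers, _ in senses)
        and any(mentions(gloss, m) for _, gloss in senses)
    }


print(internal_support("raisin", "wine"))  # {'muscatel'}
print(internal_support("Jewish", "robot"))  # {'golem'}
```

Because the two `any` checks run over all senses separately, muscatel qualifies through its wine sense for (i) and its grape sense for (ii), mirroring the metonymic case described above.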
      <Paragraph position="3"> Creative products arise when conceptual ingredients from different domains are effectively blended (see Veale and O'Donoghue, 2000). It follows that a creative product can be validated by deblending it into its constituent parts and determining whether there is a precedent for combining elements of these types, if not these specific elements. We can thus exploit this notion of deblending to provide internal validation for new compounds. For instance, the WordNet gloss for pizza lists &quot;tomato sauce&quot; as an ingredient. This suggests we can meaningfully understand a compound of the form M-pizza if there exists a compound M-sauce that can be viewed as a replacement for this ingredient. Generalizing from this, we can consider a new compound M-H to be internally validated if the gloss of H mentions some concept H' for which M-H' is in C*. It follows then that the novel compounds apple-pizza, chocolate-pizza, taco-pizza, and curry-pizza will all be internally validated as meaningful (if not necessarily enjoyable) varieties of pizza.</Paragraph>
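The deblending test can be sketched as a lookup. Collect the words of the head's gloss, then check which of them, used as heads, already combine with the new modifier in C*. This is a minimal sketch. The abbreviated pizza gloss and the sample `C_STAR` are hypothetical toy data, whitespace tokenization of the gloss is a simplifying assumption, and `deblend_support` is an illustrative name.

```python
# Hypothetical toy data: an abbreviated gloss and a slice of C* as (modifier, head) pairs.
GLOSS = {"pizza": "open pie of bread dough topped with tomato sauce and cheese"}
C_STAR = {
    ("apple", "sauce"), ("chocolate", "sauce"), ("taco", "sauce"),
    ("curry", "sauce"), ("tomato", "sauce"), ("Newburg", "sauce"),
}


def deblend_support(m, h, glosses=GLOSS, c_star=C_STAR):
    """Ingredients H' named in the gloss of H for which M-H' is already in C*."""
    ingredients = set(glosses.get(h, "").lower().split())
    return {h2 for m2, h2 in c_star if m2 == m and h2 in ingredients}


for m in ["apple", "chocolate", "taco", "curry", "granite"]:
    print(f"{m}-pizza", sorted(deblend_support(m, "pizza")))
```

An empty result, as for the hypothetical granite-pizza, means no ingredient of pizza has a precedent compound with that modifier.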
      <Paragraph position="4"> In contrast, the external validation set for a compound M-H is the set of distinct documents that contain the compound term &quot;M H&quot;, as acquired using a web search engine. For instance, given the WordNet concepts naval-engineer, software-engineer and naval-academy, rule (5) generates the hypothesis software-academy, which cannot be validated internally yet retrieves over 1000 web documents that attest to its validity. This web strategy is motivated by Keller and Lapata's (2003) finding that the number of documents containing a novel compound reliably predicts the human plausibility scores for the compound.</Paragraph>
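The external validation set can be sketched as a phrase search over a document collection. In this minimal sketch, an in-memory dictionary of hypothetical documents stands in for a web search engine, and no real search API is assumed. A case-insensitive whole-phrase match stands in for the engine's phrase query. Such a match cannot confirm that the phrase carries the hypothesized sense. `external_support` and `externally_validated` are illustrative names.

```python
import re

# Hypothetical stand-in for web search results: document id -> text.
DOCUMENTS = {
    1: "Enrolment at the software academy opens in May.",
    2: "A new Software Academy trains developers.",
    3: "The naval academy admitted its first class.",
}


def external_support(m, h, documents=DOCUMENTS):
    """IDs of distinct documents containing the phrase 'M H' (case-insensitive)."""
    phrase = re.compile(r"\b" + re.escape(f"{m} {h}") + r"\b", re.IGNORECASE)
    return {doc_id for doc_id, text in documents.items() if phrase.search(text)}


def externally_validated(m, h, min_docs=1, documents=DOCUMENTS):
    """Accept the compound when at least `min_docs` distinct documents contain it."""
    return len(external_support(m, h, documents)) >= min_docs


print(sorted(external_support("software", "academy")))  # [1, 2]
```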
      <Paragraph position="5"> Nonetheless, external validation in this way is by no means a robust process. Since web documents are not sense-tagged, one cannot be sure that a compound occurs with the sense that it is hypothesized to have. Indeed, it may not even occur as a compound at all, but as a coincidental juxtaposition of terms from different phrases or sentences. Finally, even if the compound is found with the correct syntactic and semantic form, one cannot be sure that the usage is not that of a non-native, second-language learner.</Paragraph>
      <Paragraph position="6"> These possibilities can be diminished by requiring a large enough sample set, but this has the effect of setting the evidential bar too high for truly creative compounds. Another solution, however, lies in the way that the results of external validation are actually used, as we shall see later. Many of the compounds that are validated either by internal or external means will be synonyms of existing WordNet terms. As such, their creative value lies not in an innovative combination of ideas, but in a creative use of paraphrasing. The nature of (5) makes it straightforward to determine which is the case.</Paragraph>
      <Paragraph position="7"> In general, when M-H' and M'-H' are synonymous in WordNet, the new compound M'-H generated by (5) will be a synonym of the existing compound M-H. For instance, from the combination of applied-science, engineering-science and applied-mathematics, we can generate from (5) the new compound engineering-mathematics. This compound cannot be validated internally, but it retrieves more than 300,000 documents from the web, which is enough to attest to its meaningfulness. Now, since applied-science and engineering-science are synonymous in WordNet, we can conclude that engineering-mathematics and applied-mathematics are themselves synonymous also.</Paragraph>
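The synonymy test can be sketched as a comparison of synset membership. This is a minimal sketch. `SYNSET_OF` is a hypothetical table mapping compounds to synset identifiers, where compounds that share an identifier are synonyms. `paraphrase_of` is an illustrative name; it returns the existing compound M-H that a new M'-H paraphrases, or None.

```python
# Hypothetical synset table: compounds sharing an id are synonyms.
SYNSET_OF = {
    "applied-science": 1, "engineering-science": 1,
    "applied-mathematics": 2,
    "naval-engineer": 3, "software-engineer": 4,
}


def paraphrase_of(m, m_new, h, h_shared, synset_of=SYNSET_OF):
    """If M-H' and M'-H' are synonyms, the new compound M'-H paraphrases M-H."""
    old = synset_of.get(f"{m}-{h_shared}")
    new = synset_of.get(f"{m_new}-{h_shared}")
    if old is not None and old == new:
        return f"{m}-{h}"
    return None


print(paraphrase_of("applied", "engineering", "mathematics", "science"))  # applied-mathematics
print(paraphrase_of("naval", "software", "academy", "engineer"))  # None
```

A None result means the new compound is not a mere paraphrase under this test, as with software-academy.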
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.3 Creativity in the Validation Gap
</SectionTitle>
      <Paragraph position="0"> The difference between internal and external validation strategies can be illuminating. Internal validation verifies a compound on the basis that it could meaningfully exist, while external validation verifies it on the basis that it actually does exist in a large corpus. Therefore, if a compound can be validated externally but not internally, it suggests that the concept may be P-Creative. In contrast, if the compound can be validated internally but not externally, it suggests that the compound may be H-Creative and represent a genuine historical innovation (if only a lexical one, and of minor proportions).</Paragraph>
      <Paragraph position="1"> For instance, the new compound &quot;sea dance&quot; (analogous to &quot;rain dance&quot;) cannot be validated internally, yet can be found in over 700 internet documents. It thus denotes a P-Creative concept. In contrast, the compound &quot;cranial vein&quot; yields no documents from a web query (on AltaVista), yet can be internally validated by WordNet via the word-concept Diploic-Vein, a blood vessel that serves the soft tissue of the cranial bones.</Paragraph>
      <Paragraph position="2"> Likewise, the compounds &quot;chocolate pizza&quot;, &quot;taco pizza&quot; and many more from the yoking of $D_{pizza}$ and $D_{sauce}$ can all be validated externally via hundreds of different web occurrences, and so represent P-Creative varieties of pizza. However, compounds like &quot;Newburg pizza&quot; (a pizza made with lobster sauce) and &quot;wine pizza&quot; (a pizza made with wine sauce) can only be validated internally and are thus candidates for H-Creative innovation.</Paragraph>
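The validation-gap reading can be sketched as a small classifier over the two validation outcomes. This is a minimal sketch. External validation is taken to signal P-Creativity whether or not internal validation also succeeds, as with chocolate pizza. Internal-only validation marks an H-Creative candidate. The example outcomes are transcribed from the cases discussed in the text rather than recomputed, and `creativity_class` is an illustrative name.

```python
def creativity_class(internal, external):
    """Classify a compound by which validation strategies support it."""
    if external:
        return "P-Creative"  # the web shows the concept already exists
    if internal:
        return "H-Creative candidate"  # meaningful, yet no web precedent found
    return "unvalidated"


# (internally validated, externally validated) as reported in the text.
CASES = {
    "sea dance": (False, True),
    "cranial vein": (True, False),
    "chocolate pizza": (True, True),
    "Newburg pizza": (True, False),
}

for compound, (internal, external) in CASES.items():
    print(f"{compound}: {creativity_class(internal, external)}")
```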
    </Section>
  </Section>
  <Section position="6" start_page="2" end_page="2" type="metho">
    <SectionTitle>
5 Large-Scale Evaluation
</SectionTitle>
    <Paragraph position="0"> A large-scale evaluation of these ideas was conducted by exhaustively applying the T* rule of (5) to the noun taxonomy of WordNet 1.7. To better show the effect of affinity between modifiers, Table 1 ranks the results by affinity, reporting the outcome of each assessment for each affinity level.</Paragraph>
    <Paragraph position="1"> Conflations are terms that can exist both as compounds and as conflated lexical atoms. For instance, while the compound &quot;bull dog&quot; may not exist in WordNet, its conflation &quot;bulldog&quot; does. Compound discovery is thus a useful means of re-expanding these conflations when it is meaningful to do so.</Paragraph>
    <Paragraph position="2"> As one might expect, lower affinity levels allow greater numbers of new compounds to be created. Interestingly, however, Table 1 suggests that as the affinity threshold is raised and the number of compounds lowered, the creativity of these compounds increases, as measured by the relative proportion of H-Creative terms that are generated. Generating compound terms in a lexical ontology is a creative process that demands rigorous validation if the ontology is not to be corrupted. Of the two strategies discussed here, external validation is undoubtedly the weaker, as one should be loath to add new compounds to WordNet on the basis of web evidence alone. However, external validation does serve to illustrate the soundness of internal validation, since 99.51% of internally validated concepts (at $A_{abs} = 1$) are shown to exist on the web. It follows that the absence of external validation provides a very conservative basis for assessing H-Creativity. Web validation is therefore perhaps better used as a means of rejecting creative products than as a means of discovering them. In fact, when web validation is used as a reverse barometer in this way, the inevitable errors that arise from it serve only to make the creative process more selective.</Paragraph>
  </Section>
</Paper>