<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0707">
  <Title>Towards a Representation of Idioms in WordNet</Title>
  <Section position="3" start_page="52" end_page="53" type="intro">
    <SectionTitle>
3 Constructions
</SectionTitle>
    <Paragraph position="0"> First, some idiomatic constructions are simply too complex to be integrated into WordNet and must be excluded at this point. We have in mind constructions of the kind studied by (Fillmore et al., 1988) and (Jackendoff, 1997). Examples are the more the merrier and she can't write a letter, let alone a novel. These structures comprise discontinuous constituents and morpheme chunks that are governed by special syntactic and semantic rules. Thus, the X-er the Y-er allows the insertion of a wide variety of adjectives. Fillmore et al. discuss let alone and show that describing its syntactic properties requires a large number of facts that are absent from the standard grammar. A full account of these constructions goes far beyond the lexical level, and therefore we need to exclude them, at least for now, from a database like WordNet, which does not include much syntax and whose relational semantics cannot accommodate the kind of semantic facts observed by Fillmore et al. and Jackendoff.</Paragraph>
    <Paragraph position="2"> 4 Idioms as a kind of polysemy
By contrast, the second kind of idiomatic structure is unproblematic for WordNet. WordNet contains not only simple verbs and nouns but also more complex verb and noun phrases like show the way and academic gown. Strings like stepping stone, kick the bucket, hit the bottle, and come out of the closet therefore correspond to categories already represented in the database, and can be included when they are considered as particular manifestations of polysemy. Polysemy in WordNet is represented by membership of the polysemous string in different synonym sets; synonym sets (synsets) in WordNet represent concepts that are lexicalized by one or more strings (synonyms). In other words, a synset contains different word forms with the same meaning, and a word form with more than one meaning appears in as many different synsets as it has meanings.</Paragraph>
    <Paragraph position="3"> For example, the string fish occurs as a verb in two different synsets, and thus has two distinct senses in WordNet. One expresses the concept &quot;catch, or try to catch, seafood&quot;; the other sense is &quot;seek indirectly,&quot; as in the phrases fish for compliments and fish for information. Note that such a representation does not attempt to answer the question of whether the second sense of fish is indeed an &quot;extended&quot; one, but simply treats the two as different meanings of the same word form.</Paragraph>
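The representation just described, in which a word form has as many senses as it has synset memberships, can be sketched in a few lines of Python. This is a toy illustration only: the synset identifiers and the members of each synset are assumptions made for the example, and the functions are hypothetical helpers, not part of WordNet or of any WordNet library.

```python
# Toy model: each synset is a set of word forms that share one meaning.
# Identifiers and members are illustrative assumptions, not WordNet data.
TOY_SYNSETS = {
    "fish.v.catch": {"fish"},
    "fish.v.seek": {"fish", "angle"},
    "search.v": {"search", "look for"},
}


def synsets_of(word_form, synsets=TOY_SYNSETS):
    """Return the sorted identifiers of every synset that contains word_form."""
    return sorted(sid for sid, members in synsets.items() if word_form in members)


def sense_count(word_form, synsets=TOY_SYNSETS):
    """A word form has as many senses as it has synset memberships."""
    return len(synsets_of(word_form, synsets))
```

With this toy data, sense_count("fish") is 2, and nothing in the structure records whether the second sense is an extension of the first; the two memberships are simply stored side by side.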
    <Paragraph position="4"> Figurative senses can be seen as homophones rather than polysemes in that there is no discernible relation between the &quot;literal&quot; and the &quot;extended&quot; senses. WordNet does not formally distinguish between polysemy and homophony but treats these two phenomena of multiple meanings alike under the label of polysemy.</Paragraph>
    <Paragraph position="5"> In all cases of polysemy, membership in two different synsets entails a different location in the semantic network and relatedness to distinct concepts for each sense. Thus, the first sense of fish is a subordinate of catch and is further related to more semantically specified senses (troponyms) including flyfish, net fish, trawl, and shrimp. The second, arguably extended, sense has as its superordinate concept the synset containing the strings search and look for. The different locations in the network of the two senses of fish, together with the difference in the kinds of noun objects they select, are the sort of information exploited in NLP applications, and they will suffice in most cases to distinguish the two senses in cases where the senses are homophones rather than polysemes.</Paragraph>
    <Paragraph position="6"> Some phrases consisting of more than one word can be treated in a similar manner. For example, the idiomatic verb phrases kick the bucket, chew the fat, and take a powder can be considered as single units. Their constituents never occur in an order different from the cited one because these idioms are syntactically completely frozen. They do not tolerate the insertion of an adjective or adverb, nor do they undergo passivization, clefting, or any movement transformation that would change the order of the individual strings.1 The system therefore needs only to recognize the string that is part of the lexicon. If the strings kick, bucket, powder, fat, etc., occur outside of the idiom order, they do not receive the idiomatic interpretation and must be considered as carrying different meanings.</Paragraph>
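Recognizing a frozen idiom as a fixed string, with only the verb allowed to vary in its inflection, might look like the following minimal sketch. The mini-lexicon, the hand-listed verb forms, and the function name are all assumptions made for illustration; a real system would presumably use a morphological analyser rather than enumerated forms.

```python
# Hypothetical mini-lexicon: the fixed tail of each frozen idiom is mapped to
# the inflected forms of its verb (hand-listed here for illustration only).
FROZEN_IDIOMS = {
    ("the", "bucket"): {"kick", "kicks", "kicked", "kicking"},
    ("the", "fat"): {"chew", "chews", "chewed", "chewing"},
    ("a", "powder"): {"take", "takes", "took", "taken", "taking"},
}


def find_frozen_idioms(tokens):
    """Return (start_index, matched_tokens) for each frozen idiom in tokens.

    Only contiguous matches in the cited order count, so a passive or an
    inserted modifier ("kicked the old bucket") is not matched.
    """
    lowered = [t.lower() for t in tokens]
    hits = []
    for i, verb in enumerate(lowered):
        for tail, verb_forms in FROZEN_IDIOMS.items():
            end = i + 1 + len(tail)
            if verb in verb_forms and tuple(lowered[i + 1:end]) == tail:
                hits.append((i, tuple(lowered[i:end])))
    return hits
```

For instance, find_frozen_idioms("He kicked the bucket".split()) returns [(1, ("kicked", "the", "bucket"))], while "The bucket was kicked" yields no match, so kick and bucket would there be left to their other senses.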
    <Paragraph position="7"> Some compound nouns have extended senses as well, such as stepping stone, straight arrow, and square shooter. We classify these as instances of non-literal language, because the head (the rightmost noun) is not the superordinate concept for the figurative reading: a stepping stone is not a kind of stone; a straight arrow is not a type of arrow, and a square shooter is not a specific shooter. By contrast, nouns like limestone, gravestone, and gemstone, and sharpshooter and trapshooter are linked to their superordinate senses, one or more senses of stone and shooter, respectively; similarly, a broad arrow is a subordinate of arrow.
1 Only the verb changes in that it shows the usual inflectional endings; this should not pose a major problem for English idioms where the verb is virtually always the first constituent in a Verb Phrase (VP) idiom and can thus be easily recognized.</Paragraph>
    <Paragraph position="9"> Many NLP applications using WordNet for determining discourse coherence, finding malapropisms (Hirst and St-Onge, 1998), and word sense disambiguation (Voorhees, 1998; Leacock and Chodorow, 1998) identify related word senses by means of links such as those between super- and subordinates. When searching a text, such systems could easily recognize (and discard as potentially related senses) figurative compounds such as stepping stone and straight shooter because these are not linked to nouns corresponding to their heads.2 Moreover, literal and figurative senses are often in very different WordNet files: an arrow (and its hyponyms broad arrow and butt shaft) are classified as noun.artifacts, while a straight arrow is found in the noun.person file.</Paragraph>
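The check that a compound is or is not linked to its head noun can be sketched as a walk up the superordinate links. The link table below is a toy built from the examples in the text; the parents given for stone, arrow, stepping stone, and straight arrow are assumptions for illustration, not WordNet data, and the function names are hypothetical.

```python
# Toy superordinate links (child mapped to parent). The entries are
# illustrative assumptions modelled on the examples in the text.
TOY_HYPERNYMS = {
    "limestone": "stone",
    "gravestone": "stone",
    "broad arrow": "arrow",
    "arrow": "artifact",
    "stone": "object",
    "stepping stone": "means",
    "straight arrow": "person",
}


def hypernym_chain(entry, links=TOY_HYPERNYMS):
    """Follow the links upward and return the ancestors in order."""
    chain = []
    seen = {entry}
    while entry in links:
        entry = links[entry]
        if entry in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(entry)
        chain.append(entry)
    return chain


def is_linked_to_head(compound, head, links=TOY_HYPERNYMS):
    """True if the head noun is among the compound's superordinates."""
    return head in hypernym_chain(compound, links)
```

Under these assumed links, is_linked_to_head("broad arrow", "arrow") is True and is_linked_to_head("straight arrow", "arrow") is False, which is the signal an application could use to set the figurative compound aside.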
    <Paragraph position="10"> Frozen VP idioms and metaphoric noun compounds can be integrated into the WordNet database and distinguished from literally referring expressions in many cases. But much of what is commonly considered to be figurative language presents more serious problems for a semantic network like WordNet and applications relying on its particular design. The remainder of this paper will be devoted to a discussion of the third category of idioms, which includes verb phrases like learn the ropes and hide one's light under a bushel. These cannot automatically be integrated into WordNet, but we offer some proposals for adding them to the database. The integration into WordNet of many idioms that do not fall into one of the categories discussed above is problematic for a variety of reasons.</Paragraph>
  </Section>
</Paper>