File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/w98-0707_metho.xml
Size: 12,538 bytes
Last Modified: 2025-10-06 14:15:09
<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0707"> <Title>Towards a Representation of Idioms in WordNet</Title> <Section position="4" start_page="53" end_page="56" type="metho"> <SectionTitle> 6 Formal problems </SectionTitle> <Paragraph position="0"> First, there are formal problems. Some idiom strings have surface forms that do not conform to any of the syntactic categories included in WordNet. For example, many idioms must occur with a negation: the VP give a hoot loses its (figurative) meaning in the absence of negation; the same is true for the VP hold a candle to. The negation must therefore be considered part of the idioms. But a verb phrase headed by negation is not a constituent recognized in WordNet.</Paragraph> <Paragraph position="1"> Footnote 2: In this respect, idiomatic compounds resemble exocentric compounds like low-life and sea~ata, which are not kinds of lives or latum, either, nor are they found in the vicinity of these concepts in the semantic net.</Paragraph> <Paragraph position="2"> Consider also the string eat one's cake and have it, too: here, two verb phrases are conjoined and are often followed by an adverb. Moreover, the second clause contains a pronoun coreferent with the noun in the first clause. Again, such a string does not fit in with WordNet's entries. Some idioms are entire sentences. Wild horses could not make me do that and the cat's got your tongue are not compatible with any of WordNet's noun, verb, adjective, or adverb components. WordNet does not contain sentences, and at present we see no way of integrating these into the lexical database. 
The problem should be addressed in the future, because an NLP system would simply attempt to treat each constituent in these idioms separately, with undesirable consequences.</Paragraph> <Paragraph position="3"> In some cases, idioms whose syntactic shape does not correspond to any of the categories in WordNet could nevertheless be accommodated when they are synonymous with strings that are represented in an existing synset. For example, the negation-headed phrase not in a pig's eye and the clauses when hell freezes over and when the cows come home are all synonymous with never, which is included among WordNet's adverbs. If such strings are completely frozen, as they tend to be, they can be included as synonymous members of existing WordNet synsets, and the fact that they do not conform to any of WordNet's syntactic categories can be ignored.</Paragraph> <Paragraph position="4"> Such idioms do not pose problems for automatic processing because they do not admit of any phrase-internal variation or modification.</Paragraph> <Paragraph position="5"> Another formal (syntactic) problem pertains to the fact that the fixed parts of many VP idioms are discontinuous. For example, a number of expressions contain nouns that resemble inalienable possessions, such as body parts, and a possessive adjective that is bound to the subject. Examples are hide one's light under a bushel, blow one's stack, and flip one's wig. In other idioms with a similar structure, the possessive is not bound to the subject but refers to another noun (get someone's number). And expressions like cook one's goose allow for both bound and unbound genitives.</Paragraph> <Paragraph position="7"> These idioms cannot be treated as single strings because the genitive slot can be filled by any of the possessive adjectives, or by a noun in the case of the unbound genitive. One solution would be to enter these strings into the lexicon with a placeholder, such as a metacharacter, in place of the genitive. 
This would make for a somewhat infelicitous entry. But a rule could be added to a preprocessor for a syntactic tagger that allowed the placeholder to be substituted with either a pronoun from a finite list (for the bound cases) or any noun from WordNet (for the unbound cases); the preprocessor would then be able to recognize the idiom as a unit and match the actual string to the WordNet entry. Currently, we do not have a preprocessor that is able to recognize discontinuous constituents, but given the large number of VP idioms and their frequency in the language, the development of such a tool seems desirable. Footnote 3: A related phenomenon is that of phrasal verbs, many of which allow particle movement. In cases where the verb head and the particle are not contiguous, they cannot currently be joined by the preprocessor, and they are therefore not matched to an entry in WordNet.</Paragraph> <Paragraph position="8"> 7 What kinds of concepts are these? In the previous section, we considered idioms whose syntactic form does not comply with any of the categories N(P), V(P), Adj(P), or Adverbial(P) represented in WordNet, or whose syntax poses problems for the creation of a neat dictionary entry. However, such idioms could easily be added to the lexical database when they are synonymous with strings that fit into WordNet's design and organization. But many such syntactically idiosyncratic idiom strings raise a second problem, which has to do with their conceptual-semantic rather than their syntactic nature. They express concepts that cannot be fitted into WordNet's web structure either as members of existing synsets or as independent concepts, because there are no other lexicalized concepts to which they can be linked via any of the WordNet relations. In fact, if one examines idioms and their glosses in an idiom dictionary, one quickly realizes that almost all idioms express complex concepts that cannot be paraphrased by means of any of the standard lexical or syntactic categories. Consider such examples as fish or cut bait, cook one's/somebody's goose, and drown one's sorrows/troubles. These idioms carry a lot of highly specific semantic information that would probably get lost if they were integrated into WordNet and attached to more general concepts.</Paragraph> <Paragraph position="9"> The problems for WordNet posed by syntactically or semantically idiosyncratic idioms would be reduced if these could be broken up, that is, if the individual content words in the idioms could be treated as referring expressions and be assigned meanings that are similar to concepts already represented in the lexicon. Some traditional dictionaries decompose a number of such idioms and attempt to give interpretations to their individual parts. This may seem justifiable particularly in cases where the idioms are syntactically variable, indicating that speakers assign meanings to some of their components.</Paragraph> <Paragraph position="10"> For example, the American Heritage Dictionary defines one sense of the noun ice as &quot;extreme unfriendliness or reserve.&quot; This entry seems motivated by the apparent semantic transparency of the noun (in contrast to strings like bucket in kick the bucket, which seems to have no referent at all, let alone a transparent one). But synsets of the kind {ice, extreme unfriendliness or reserve} seem undesirable for a computationally viable dictionary like WordNet, because ice cannot be used freely and compositionally with the proposed meaning. This is evident in sentences like the following: (a) I felt/resented his unfriendliness/reserve/*ice.
(b) His unfriendliness/reserve/*ice melted away.</Paragraph> <Paragraph position="11"> (c) Our laughter broke the *unfriendliness/reserve/ice.</Paragraph> <Paragraph position="12"> A language generation system (or a learner of English) relying on WordNet's lexicon could not be blocked from producing the ungrammatical sentences above if it exploited the close similarity in meaning and usage of the members of the synset. Moreover, automatic attempts at word sense disambiguation that rely on syntactic taggers could probably not identify the correct sense of ice in the phrase break the ice, because they could not recognize that the noun is part of an idiom if the dictionary entry contains this noun in isolation, outside of its idiomatic context. Only when an entry for ice lists the specific environment (break and the definite determiner) can a program recognize the idiom and assign the proper meaning.</Paragraph> <Paragraph position="14"> Consider a second example. The American Heritage Dictionary contains a sense of ropes that is glossed as &quot;specialized procedures or details.&quot; This sense of ropes is the one in the expressions know/learn/get/teach the ropes.</Paragraph> <Paragraph position="15"> To assume a compositional reading here seems more justified than in the case of ice, because this idiom is more flexible than break the ice and can undergo some internal modification as well as passivization (he never learnt the ropes/he taught Fred the ropes/Fred was taught the ropes). Moreover, ropes co-occurs with more than just one verb. In fact, the verbs for which it can serve as an argument are compatible with the meaning assigned to ropes by the American Heritage Dictionary.
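A minimal Python sketch of the verb restriction just described, offered as an illustration under stated assumptions: a hypothetical lexicon entry pairs the idiomatic ropes with the four verbs attested above (know, learn, get, teach), and a check accepts a sentence only when one of their forms occurs at most two words before the ropes. The names ROPES_VERBS and licensing_verb, the hand-listed inflections, and the two-word window are all assumptions for this example, not part of WordNet or of any existing tagger; the window is chosen only to cover the passive and indirect-object variants cited above.

```python
import re

# Hypothetical entry: the idiomatic sense of "ropes" ("specialized procedures
# or details") is licensed only by these verbs. Inflected forms are listed by
# hand (an assumption); a real system would use a lemmatizer instead.
ROPES_VERBS = {
    "know": ("know", "knows", "knew", "known", "knowing"),
    "learn": ("learn", "learns", "learned", "learnt", "learning"),
    "get": ("get", "gets", "got", "gotten", "getting"),
    "teach": ("teach", "teaches", "taught", "teaching"),
}

# Map every listed form back to its lemma.
FORM_TO_LEMMA = {form: lemma for lemma, forms in ROPES_VERBS.items() for form in forms}


def licensing_verb(sentence, window=2):
    """Return the lemma of the verb licensing idiomatic 'the ropes', or None.

    Looks at most `window` words back from 'the ropes', nearest word first,
    so that 'taught Fred the ropes' and 'was taught the ropes' are covered.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i in range(len(tokens) - 1):
        if tokens[i] == "the" and tokens[i + 1] == "ropes":
            for tok in reversed(tokens[max(0, i - window):i]):
                if tok in FORM_TO_LEMMA:
                    return FORM_TO_LEMMA[tok]
    return None


for s in ["He never learnt the ropes.", "He taught Fred the ropes.",
          "Fred was taught the ropes.", "I forgot the ropes.", "Tell me the ropes."]:
    print(s, ":", licensing_verb(s))
```

Under these assumptions the first three sentences yield learn, teach, and teach, while I forgot the ropes and Tell me the ropes yield None. The verb list is what blocks such overgenerated strings; widening the window would admit more internal modification at the cost of more false matches.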
A word sense disambiguation system that relied on the semantics of the contexts of the ambiguous word (such as the verbs a noun co-occurs with) would probably choose the correct sense of ropes, because the contexts of &quot;specialized procedures&quot; or &quot;details&quot; do not seem to overlap with the contexts in which ropes is found with the sense of &quot;strong cords.&quot; Yet despite their shared verb contexts, the distribution of ropes is far narrower than that of specialized procedures or details. Again, a language generation system or a learner of English might overgenerate and produce incomprehensible sentences like I forgot the ropes or Tell me the ropes. Therefore, an optimal solution might be to enter the idiom as a string but with a placeholder instead of the verb; a separate rule in the lexicon would list the verbs that are compatible with the idiomatic reading of the string. The proposed solution for idioms like teach/learn/get the ropes and for those that contain a possessive genitive might suggest a huge amount of work. However, a survey of English idioms suggests that most are frozen and could therefore simply be entered as entire strings, without the need for specifying a list of selected verbs.</Paragraph> <Paragraph position="16"> Another type of VP idiom that does not readily fit into WordNet is one whose meaning can be glossed as be or become Adj. These idioms have the form of a VP but express states: hide one's light under a bushel and hold one's tongue mean &quot;be modest&quot; and &quot;be quiet,&quot; respectively; flip one's wig, blow one's stack/a fuse, and hit the roof/ceiling all mean &quot;become angry,&quot; and get the axe means &quot;be fired/dismissed.&quot; Similarly, the phrase one's heart goes out (to) can be glossed by means of the verb feel and the adjective phrase &quot;sorry or sympathetic (for).&quot; Such idioms pose a problem for integration into WordNet, not because of their form but because of the kinds of concepts they express.
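To make the mismatch concrete, the idioms and glosses just listed can be tabulated as a small Python mapping from each idiom string to a (copular verb, adjective phrase) pair. The names STATE_IDIOMS and group_by_gloss are hypothetical, and grouping by gloss is an illustration only, not a WordNet operation: it shows that several idioms converge on a gloss such as become angry, which consists of a verb plus an adjective rather than a single verb concept.

```python
from collections import defaultdict

# Idioms and their glosses as given in the text, stored as
# (copular verb, adjective phrase) pairs. Slash alternatives such as
# "blow one's stack/a fuse" are expanded into separate entries.
STATE_IDIOMS = {
    "hide one's light under a bushel": ("be", "modest"),
    "hold one's tongue": ("be", "quiet"),
    "flip one's wig": ("become", "angry"),
    "blow one's stack": ("become", "angry"),
    "blow a fuse": ("become", "angry"),
    "hit the roof": ("become", "angry"),
    "hit the ceiling": ("become", "angry"),
    "get the axe": ("be", "fired/dismissed"),
    "one's heart goes out (to)": ("feel", "sorry or sympathetic (for)"),
}


def group_by_gloss(idioms):
    """Collect idioms that share the same copula-plus-adjective gloss."""
    groups = defaultdict(list)
    for idiom, (verb, adjective) in idioms.items():
        groups[f"{verb} {adjective}"].append(idiom)
    return dict(groups)


for gloss, members in group_by_gloss(STATE_IDIOMS).items():
    print(gloss, "=", members)
```

The become angry key collects five idioms, the largest group in this list; every key is a verb-plus-adjective phrase rather than a single lexicalized verb.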
In WordNet, verbs (including copular verbs) and adjectives are strictly separated because they express distinct kinds of concepts. This separation is of course desirable and even necessary when one deals with non-idiomatic language, where the meaning of a phrase or sentence is composed of the meanings of its individual parts. Copular or copula-like verbs like be and feel combine with a large number of adjectives, and there is no point in entering specific combinations into a lexicon.[4] While the separation of verbs and the adjectives they select accounts for the large number of possible combinations allowed in the language, it also means that there exist no concepts like &quot;feel sorry/sympathetic (for)&quot; or &quot;become angry&quot; in WordNet, and idioms like one's heart goes out (to) and hit the roof are presently excluded from the lexicon. Yet these strings need to be added if the lexicon is to serve NLP applications that process real texts, where idiomatic language is pervasive. Expressions of the kind listed above can simply be added as subordinates of be without causing a change in the structure of the lexicon. They would stretch the meaning of troponymy, the manner relation that organizes the verb lexicon, in that it is somewhat odd to state that &quot;to be angry is to be in some manner.&quot; However, this seems to be the only way to accommodate such idioms, which express concepts of a kind not found in the literal language.</Paragraph> </Section> </Paper>