File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/86/c86-1001_metho.xml
Size: 18,832 bytes
Last Modified: 2025-10-06 14:11:48
<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1001"> <Title>Lexicon-Grammar: The Representation of Compound Words</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Compound adverbs </SectionTitle> <Paragraph position="0"> We call adverb any circumstantial complement, including sentential phrases, as in the following examples: (1) The show took place (nightly, at night, during a busy night, the night Bob missed his plane). By compound adverbs, or frozen or idiomatic adverbs, we mean adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional. In (1), at night is a compound adverb: the lack of compositionality is apparent from lexical restrictions such as *at day, *at afternoon, *at evening, and from the impossibility of inserting material that is a priori plausible, syntactically and semantically: *at (coming, present) night, *at (cold, dark) night, as opposed to during the (coming, present) night, during a (cold, dark) night. 5. Note that words or roots are often considered as units in most attempts to devise semantic representations.</Paragraph> <Paragraph position="1"> Notice that nightly can also be considered as a frozen compound, though not constituted of words but of a word and a suffix. Again, lack of compositionality stems from the observation that daily, weekly, monthly, yearly, etc., which are compounds of the same formal type, have a regular formation, in the sense that their interpretation is homogeneous. Thus, nightly is an isolated case, as opposed to an open series of identical forms with a different interpretation.</Paragraph> <Paragraph position="2"> The two other adverbs of (1) are free forms. Thus, the determiners Det and modifiers (Adj and Relclause) of: during Det Adj night Relclause can vary freely (within semantic constraints). 
In the same way, the event associated with the sentence S in the form: the night (E, that) S can be expressed by a large variety of unconstrained forms. Frozen or compound adverbs constitute the simplest case of compound forms because they do not allow variations of their components. As mentioned above, in at night no adjective is authorized. Moreover, one cannot insert a determiner: *at (a, this) night; the plural is forbidden: *at nights; and no relative clause can be appended: *at night (that, which) was agreed on.</Paragraph> <Paragraph position="3"> Such observations are general, and apply to many adverbs of varied form and lexical content: It rained cats and dogs (*many cats and dogs, *big cats and dogs, *cat and dog); from time to time (*from times to times, *from a time to another time, *from long time to long time). Consequently, these compound adverbs could be identified by a simple recognition procedure, for they do not require any lemmatization or syntactic analysis to be reduced to a dictionary form, as is the case with verb forms for example.</Paragraph> <Paragraph position="4"> A lexical study of compound adverbs has been performed in French and a systematic inventory has been compiled from various dictionaries. Running texts have been examined as well. It is interesting to note that whereas in current dictionaries there are about 1,500 one-word adverbs, most of them in -ment (-ly), we have found over 5,000 compound adverbs. These compound adverbs have been classified according to their syntactic shape. The syntactic forms are described at the elementary level of sequences of parts of speech. We use symbols with obvious interpretations such as Prep, Det, Adj, N, V, Conj (for conjunction) and W for a variable ranging over verb complements, etc. We write:</Paragraph> <Paragraph position="6"> The examples discussed so far are entirely frozen. 
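A minimal sketch of the simple recognition procedure mentioned above for entirely frozen adverbs, written in Python. The lexicon FROZEN_ADVERBS and the function find_frozen_adverbs are hypothetical names introduced for illustration, and the three entries are only the examples quoted in the text; the sketch assumes plain string search with word boundaries and no lemmatization.

```python
import re

# Hypothetical mini-lexicon of entirely frozen adverbs (examples from the text).
FROZEN_ADVERBS = ["at night", "cats and dogs", "from time to time"]


def find_frozen_adverbs(text, lexicon=FROZEN_ADVERBS):
    """Return sorted (start_offset, entry) pairs for each frozen adverb found."""
    hits = []
    for entry in lexicon:
        # Word boundaries keep "at night" from matching inside "at nights"
        # or inside "that night".
        pattern = r"\b" + re.escape(entry) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), entry))
    return sorted(hits)


if __name__ == "__main__":
    print(find_frozen_adverbs("It rained cats and dogs at night."))
```

Because the entries allow no variation, no analysis beyond this search is needed: the forbidden plural at nights is simply not matched.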
Hence, as a practical matter, they can be located in a text by using the search function available for strings in any text editor system. There are however more complex examples that require deeper analysis. Consider for example the idiomatic adverb in the sentence: Max proposed solutions from the top of his hat. It is largely frozen: no other determiner is allowed, no adjectives can be appended to either noun, etc., but the person of the possessive adjective Poss may vary. This possessive adjective must refer to the subject of the sentence, and varies accordingly: *Max proposed ideas from the top of your hat; *My sister proposed ideas from the top of his hat; Bob and Max proposed ideas from the top of their hat(s). In this case, the recognition procedure is no longer a simple string matching operation, since a variable slot must be dealt with inside the fixed string. More general matching rules are required here 6. Once this compound adverb has been identified in a text to be processed, it can be given an interpretation, for example in terms of a simple adverb such as leisurely or lightly, and the referential information carried by Poss can then be ignored. However, one can easily construct particular discourses where the obligatory coreference relation involved will disambiguate some analysis. Thus, not only must the variation of Poss be accounted for at the lexical level, but its referential information has to be kept for possible use in a parser. Other compound adverbs offer different degrees of variation. There are cases where one part of the adverb is frozen and another part is entirely free: Max organized a party in honor of Bob; Max hid the car at the far end of the parking lot. The parts in honor, at the far end are frozen. For example, they do not allow modifiers. 
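The variable slot discussed above for from the top of Poss hat can be illustrated with a pattern that fixes the frozen words and leaves one position open. This is only a sketch under stated assumptions: the list POSS_FORMS, the pattern TOP_OF_HAT and the function match_top_of_hat are hypothetical names, the list of possessives is illustrative, and a regular expression stands in for the more general matching rules (the text mentions PROLOG rules).

```python
import re

# Possessive forms accepted in the Poss slot (illustrative, not exhaustive).
POSS_FORMS = ["my", "your", "his", "her", "its", "our", "their"]

# Fixed string with a single variable slot; "hats?" covers the optional
# plural noted in the text for plural subjects.
TOP_OF_HAT = re.compile(
    r"\bfrom the top of (" + "|".join(POSS_FORMS) + r") hats?\b",
    re.IGNORECASE,
)


def match_top_of_hat(sentence):
    """Return the possessive filling the Poss slot, or None if no match.

    The possessive is returned rather than discarded, so that a parser
    can later check its coreference with the subject.
    """
    m = TOP_OF_HAT.search(sentence)
    return m.group(1).lower() if m else None


if __name__ == "__main__":
    print(match_top_of_hat("Max proposed solutions from the top of his hat"))
```

The sketch does not itself reject a sentence whose possessive fails to agree with the subject; it only keeps the referential information available, which is the point made in the text.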
The parts of N are free, for we observe variations such as: Max organized a party in his honor; Max hid the car at the far end, I think, of the parking lot. Consider the adverbials: for the sake of ruining things, for the sake of Bob, for God's sake. We call the combination for--sake frozen, since the noun sake does not occur elsewhere than in adverbial phrases with the preposition for: it cannot be the subject or object of any verb. On the other hand, the modifiers of sake are quite varied and regular from the point of view of the syntax of noun modifiers 7. There are also cases of seemingly free adverbs which require an ad hoc treatment. For example, dates such as: Monday March 13, 1968 at 9 p.m.</Paragraph> <Paragraph position="7"> are described in a natural way by a finite automaton.</Paragraph> <Paragraph position="8"> Technical or specialized families of adverbs come close to being frozen adverbs: (2) They elected Bob on the (first, second) ballot; (3) Max ate his noodles in a bowl. The special semantic relations that hold between the adverbial complement and the rest of the sentence are limited. There are few verbs such as to eat which combine with in a bowl and which have the non-locative interpretation of (3). The usual interpretation is that found in: Max put his noodles in a bowl. 6. PROLOG rules are particularly well adapted to recognizing such frozen forms (P. Sabatier 1980).</Paragraph> <Paragraph position="9"> 7. There are nonetheless restrictions on them: *for a heavenly sake. Entering frozen adverbs into a lexicon-grammar raises many new questions. The bulk of adverbs can be described by means of the following type of derivation (Z.S. Harris 1976): Bob left; That Bob left occurred at 9 : Bob left, this occurred at 9 :: Bob left at 9, and support verbs play a crucial role here. However, there are cases where no general support verb is found and where adverbs have to be considered as a part of the elementary sentence. 
Consider the adverb in: Bob sang at the top of his voice. It is syntactically and semantically analogous to free adverbs such as noisily, powerfully. For these two free adverbs, a derivational source involving the adjective is available: The way Bob sang was (noisy, powerful). This is not the case for at the top of his voice, which is practically limited to modifying the verbs of saying. Moreover, the obligatory coreference link of his leads to a representation where this adverb is not analyzed. Thus two semantically similar types of adverbs have to be represented quite differently in the lexicon-grammar. All the situations just exemplified with adverbs are quite common, and are also encountered with nouns, adjectives and verbs. The paradox of representation they lead to can only be solved by introducing a complex level of semantic equivalence for the entries of the lexicon-grammar. 2. Compound nouns. Compound nouns form the bulk of the lexicon of languages. Language creativity is largely associated with the growth of technical vocabularies, which consist mainly of technical nouns. Compound nouns number in the millions for European languages. They are usually built from the vocabulary of simple words by means of grammatical rules which may involve grammatical words. By definition, their meaning is noncompositional. The compound nouns can be described in terms of the sequence of their grammatical categories, in the same way as for adverbs (M. Gross, D. Tremblay 1985). We have for example: Det N =: the moon; Adj N =: crude oil, real estate; N of N =: stroke of luck, board of (governors, regents); Det N of Det N =: the talk of the town; N N =: test tube, color TV. Such nouns can become quite complex in various technical fields. In general, compound nouns allow variations of determiners and modifiers, but many situations are encountered: - the moon is a frozen combination (definite article + noun) which behaves like a proper name, because of its uniqueness of reference. 
It cannot be modified by adjectives without losing its reference: *the (big, yellow) moon; - crude oil takes restricted determiners. Since it is a mass noun, there are difficulties in accepting its plural. It can be modified by adjectives and nouns as in (cheap, high quality) crude oil, but these cannot modify oil: *crude, (cheap, high quality) oil; - stroke of luck has unrestricted determiners and modifiers, but no insertion is allowed immediately before or next to of; in particular, luck cannot be modified: *stroke of good luck 8; 8. Stroke of bad luck would be a different compound word, whose relation to stroke of luck is only etymological.</Paragraph> <Paragraph position="10"> - board of governors can be modified in several ways: board and governors take separate determiners and modifiers: the powerful boards of the twelve governors of my bank. Such a compound noun comes close to being a free form. It is the limited number of second nouns such as director, governor or regent that suggests we are dealing with a compound noun. Also, the meaning of these phrases is noncompositional in the sense that they have a legal or institutional meaning that their components do not clearly have. The variations of form we have enumerated can be partly handled by attaching a finite automaton to a given entry, and this automaton will describe the main grammatical changes allowed. The adjunction of free relative clauses to compound nouns may require a different treatment. The kinds of variation of compound nouns are so numerous that determining whether a given nominal construction is a compound noun or not almost requires an original demonstration. Thus, automatizing the construction of a lexicon is an activity that will present severe limitations.</Paragraph> <Paragraph position="11"> Determining the support verbs for compound nouns does not seem to raise other problems than those encountered with simple nouns. 
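The idea of attaching a finite automaton to an entry can be sketched as follows for stroke of luck, using the description given above: free determiners and modifiers in front of stroke, and no insertion next to of. The input format (a list of (word, category) pairs from some prior tagging step) and the names ARCS, FINAL_STATES and accepts are assumptions made for this illustration; they are not part of the lexicon-grammar itself.

```python
# Each token is a (word, category) pair; the tagging step is assumed.

def is_det(tok):
    return tok[1] == "Det"


def is_adj(tok):
    return tok[1] == "Adj"


def word(*forms):
    """Build a test that accepts any of the given word forms."""
    return lambda tok: tok[0].lower() in forms


# Transition table: state -> list of (test, next_state) arcs.
ARCS = {
    0: [(is_det, 1), (is_adj, 1), (word("stroke", "strokes"), 2)],
    1: [(is_adj, 1), (word("stroke", "strokes"), 2)],
    2: [(word("of"), 3)],    # nothing may be inserted before "of"
    3: [(word("luck"), 4)],  # nor between "of" and "luck"
}
FINAL_STATES = {4}


def accepts(tokens):
    """Return True if the token sequence is an allowed variant of the entry."""
    state = 0
    for tok in tokens:
        for test, target in ARCS.get(state, []):
            if test(tok):
                state = target
                break
        else:
            return False  # no arc matches this token
    return state in FINAL_STATES


if __name__ == "__main__":
    ok = [("a", "Det"), ("real", "Adj"), ("stroke", "N"), ("of", "Prep"), ("luck", "N")]
    bad = [("stroke", "N"), ("of", "Prep"), ("good", "Adj"), ("luck", "N")]
    print(accepts(ok), accepts(bad))
```

An entry such as board of governors, whose two nouns take separate determiners and modifiers, would need a larger table of the same kind.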
REMARK Compound nouns raise other questions in some languages: - in German, where no blanks occur between components, segmentation is a problem; - in French (G. Gross 1985), where the spelling of the plural is in general not standardized, extra variations have to be expected. Compound modifiers. Adjectives, noun complements and relative clauses can be complex and yet apply to free nouns. From the point of view developed here, that is, the representation in terms of sequences of grammatical categories allowing for efficient matching procedures with texts, they do not differ from adverbs and nouns.</Paragraph> <Paragraph position="12"> Examples are: The table is as clean as a new pin; The book is up to date; Bob is the world's (best, worst) teacher; They discussed it, on a take it or leave it basis.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. Compound verbs </SectionTitle> <Paragraph position="0"> Compound verbs, or frozen sentences as we have termed them (M. Gross 1982), can be described as sequences of categories. We write N_i for variable noun phrases and C_i for frozen noun phrases. For subjects, i = 0; for complements, i = 1, 2. Examples are: (1) N_0 V C_1 =: Bob hit the jackpot; (2) N_0 V N_1 Prep C_2 =: Bob took your project into account; (3) N_0 V C_1 Prep C_2 =: Bob took the bull by the horns; (4) N's C_0 V C_1 =: Bob's dream came true. We outlined in 1 the description of a lexicon-grammar of French verbs and the reasons why compound verbs had to be separated from simple ones.</Paragraph> <Paragraph position="1"> Systematic search through dictionaries (monolingual, bilingual, and specialized) has yielded close to 20,000 compound verbs belonging to the same level of language as the 12,000 simple verbs. A syntactic classification has been built for them (Figure 3).</Paragraph> <Paragraph position="2"> Compound verbs are the most complex forms that have to be entered into a lexicon 9. 
The compounds discussed previously were simple because by and large they were topologically connected, that is, either their parts could not be separated by any extraneous linguistic material or else the inserted material could be easily described (i.e. by means of a finite automaton). 9. There are however a limited number of frozen discourses such as: It was for all the world as if S, which need an extra level of complexity (L. Danlos 1985).</Paragraph> <Paragraph position="3"> In the case of compound verbs, the various parts of each utterance remain syntactically independent. Thus, the verbs of (1)-(4) can take any tensed form, as in: At that time, Bob will be hitting the jackpot. Sentential inserts can separate a verb from its complements: Bob hit, it seems to me, the jackpot. In example (2), the direct complement N_1 is free and general; hence, sentential structures can separate the verb from its second (frozen) complement: Bob took the fact that Jo was absent yesterday into account. Notice that parts of compound verbs may be recognized directly, for example the jackpot, or into account, but these parts may be ambiguous, whereas the full utterances can rarely be confused with free forms 10.</Paragraph> <Paragraph position="4"> 10. As a matter of fact, when an utterance is found to be ambiguous, with one analysis as a frozen form and the other as a free form, ignoring competing free forms altogether is a good parsing strategy.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. Some conclusions </SectionTitle> <Paragraph position="0"> How to organize the lexicon of compound utterances is an open question. From a computational point of view, many solutions are available for the lookup of a compound term: (i) In classical algorithms in which left-to-right analysis is essential, the compound term could be viewed as an extension of the first major element met while scanning the sentence. 
For example, the adjective long is the first such element of the compound adverb in the long run. Among many other possibilities, the program, pausing on the word long, would test the occurrence of the and in to the left of long, and the occurrence of run to the right. Notice that the left-to-right constraint has to be somewhat relaxed in order to test both left and right contexts of long.</Paragraph> <Paragraph position="1"> (ii) In a futuristic view of parsing involving parallel computing, one might envision several levels of lexicon. At the first level, long on the one hand and run on the other would lead to two sets of constructions whose intersection would contain the compound in the long run; the latter can then be searched for in the input text. For compound verbs, one would have to synthesize a matching utterance, rather than simply looking it up. Such a procedure can always be simulated sequentially.</Paragraph> <Paragraph position="2"> In all cases, the representation of utterances which we have used, namely the sequences of syntactic categories, allows for the separation of the lexicon of compound forms into classes for which direct access can be provided. In this way, dictionary lookup can be sped up 11. REMARK In favor of left-to-right analysis one could point to the fact that complex terms can often be abbreviated and that abbreviations are mostly right truncations. In such situations the remaining part (the leftmost part) of the truncated term must carry the information that describes the right context in order to allow reconstruction of the reduced part. There are however examples where abbreviations are carried out on the left part of a term (e.g. a programming language, a language).</Paragraph> <Paragraph position="3"> Preliminary figures have shown that compound terms form the</Paragraph> <Paragraph position="4"> essential part of a lexicon-grammar. 
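The lookup strategies (i) and (ii) described above can be approximated sequentially with a word index. The sketch below is an illustration under simplifying assumptions (whitespace tokenization, a toy lexicon, and the hypothetical names COMPOUNDS, build_index, candidates and lookup): each word points to the set of compounds that contain it, so that intersecting the sets for long and run isolates in the long run, which is then searched for in the input.

```python
from collections import defaultdict

# Hypothetical toy lexicon of compound entries.
COMPOUNDS = ["in the long run", "from time to time", "at night"]


def build_index(compounds):
    """Map each word to the set of compounds that contain it."""
    index = defaultdict(set)
    for entry in compounds:
        for w in entry.split():
            index[w].add(entry)
    return index


def candidates(index, *words):
    """Compounds containing all the given words (set intersection)."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


def lookup(text, index):
    """Return the indexed compounds that occur contiguously in the text."""
    tokens = text.lower().split()
    found = set()
    for tok in tokens:
        # Each word met while scanning proposes the compounds it belongs to;
        # both its left and right contexts are then tested.
        for entry in index.get(tok, ()):
            parts = entry.split()
            n = len(parts)
            if any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
                found.add(entry)
    return found


if __name__ == "__main__":
    idx = build_index(COMPOUNDS)
    print(candidates(idx, "long", "run"))
    print(lookup("In the long run it works", idx))
```

Punctuation handling and the synthesis step suggested for compound verbs are left out of this sketch.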
It is also interesting to observe that compound terms force both the linguist and the computer specialist to adopt a much more abstract view of language: - semantically, by definition, compound utterances cannot be decomposed into simple utterances; in other terms, meaning is not compositional for compounds. Hence, in a certain sense, one has to recognize that meaning has not much to do with words; - syntactically, it has become a rather general habit to attach properties to individual words. In the case of compounds this mode of representation is no longer possible: why privilege one part of a compound with marks rather than some other part? For example, there is no reason to attach the Passive marking to the verb rather than to either of the complements of the utterance to put the cart before the horse. Lexicon-grammar representations eliminate such questions by delocalizing the syntactic information and by attaching it to the full sentence. In this sense, compound expressions provide a powerful motivation for representing lexical and syntactic phenomena in the form of a lexicon-grammar.</Paragraph> <Paragraph position="5"> 11. The same use of sequences of syntactic categories is found in string grammar (Z.S. Harris 1961), which has proven to be quite efficient in syntactic recognition (N. Sager 1981, M. Salkoff 1973, 1979).</Paragraph> </Section> </Paper>