<?xml version="1.0" standalone="yes"?> <Paper uid="J79-1031"> <Title>American Journal of Computational Linguistics Microfiche 31 A CASE-DRIVEN PARSER FOR NATURAL LANGUAGE</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> (NOT (SHOULD-BE HUMAN ABSTRACT)) </SectionTitle> <Paragraph position="0"> is put in, which will at first fail if a human or an abstract noun is the candidate, but will pass if nothing else seems to fit either. This simple test runs into problems with certain sentences. (6) I paid the money for my mother's release. (7) I paid the money for my mother.</Paragraph> <Paragraph position="1"> (8) I paid the money for the prostitute.</Paragraph> <Paragraph position="2"> It will initially force the exchange case to reject &quot;for my mother's release&quot; in (6) because it is abstract, but later on it will accept it since all of the other cases flagged by &quot;for&quot; will also reject it. Sentence (7) is ambiguous, but &quot;mother&quot; is almost certainly in the beneficiary role here, so again the test works correctly by rejecting the exchange case. Sentence (8) is also ambiguous, but our interpretation would usually be that &quot;prostitute&quot; is in the exchange case here. The system will, however, assign it the beneficiary case as it did in (7). Additional work must be done on case tests if this paradigm is to be useful.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Verb Specific Cases </SectionTitle> <Paragraph position="0"> Many verbs have special constructs or cases which are not used with most other verbs. These irregularities are handled by writing special functions to find these cases. A few examples will illustrate. The verb &quot;to be&quot; has eight meanings in this system. The third meaning is &quot;to have the property. . 
&quot;, as in sentence (9). (9) The house is red.</Paragraph> <Paragraph position="1"> This meaning is the one being used if an adjective phrase immediately follows the verb. An adjective phrase in this position is therefore a special case of the verb &quot;to be&quot;, and there is a special function, ADJ-LIST, which looks for it.</Paragraph> <Paragraph position="2"> The sixth meaning of &quot;to be&quot; is &quot;to be from . . .&quot;, as in sentence (10).</Paragraph> <Paragraph position="3"> (10) The lady is from Ouagadougou.</Paragraph> <Paragraph position="4"> This could be interpreted as an example of the source case, which is the case that &quot;from&quot; usually flags; but what the sentence really means is that the lady has been living in Ouagadougou. This is, therefore, not the source case, but another special case of &quot;to be&quot;.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 2.4 Verb Definitions </SectionTitle> <Paragraph position="0"> The verb is treated as the focal point of the sentence. A verb can have multiple meanings. The system discovers which meaning is intended by looking at the rest of the sentence. In so doing, it builds a structure representing a parse of the sentence. As stated above, each verb has associated with it a case frame, which is a set of cases of the verb: some obligatory, some optional, and some conditionally optional. These cases are embedded in a form on the property list of the verb. Consider the verb &quot;to order.&quot; Its dictionary entry is as follows: Under the indicator V-MEAN there is a form beginning &quot;(IF ((AGENT . . .&quot;. IF is a function which takes an even, but otherwise variable, number of arguments, each pair representing a meaning of the verb. The first element of each pair is a set of cases to be looked for, and the second is the structure to be built if they are found. It is in the first element of the pair that the complexity lies. 
Let us look at it more closely.</Paragraph> <Paragraph position="1"> The list of cases is, in fact, a list of triples. The first element of the triple is a form to be EVALed. It is usually looking for a case, but any form is admissible. The second element of the triple is an atom: a register name. If the first form EVALs to a non-NIL value, the value is put into this register. In our example, for instance, the first triple is: (AGENT (MUST-BE HUMAN)) AG (OPT (GETR PASSIVE) 'SOMEONE) The function of the first form is to find the agent of the sentence. If it succeeds, this agent is put into register AG.</Paragraph> <Paragraph position="2"> The third element of the triple indicates what to do on failure. If it is the atom &quot;OBL&quot;, this indicates that the case was obligatory; so if it was not found, IF should fail on this meaning of the verb. If the atom is &quot;OPT&quot;, then the case is optional, the register is left empty, and IF continues with this meaning. The third possibility is that this third element is a form, in which case it is EVALed. If it returns &quot;OBL&quot; or &quot;OPT&quot;, then the result is as described above. If it returns anything else, then that is put into the register, and IF continues with this meaning of the verb. The third element of the first triple for &quot;to order&quot; is (OPT (GETR PASSIVE) 'SOMEONE). OPT is a very simple function which, if its first argument is non-NIL, returns its second argument. Otherwise it returns &quot;OBL&quot;. (GETR PASSIVE) is true if the sentence is in the passive voice. The first triple can be read as follows: Look for an agent which must be human. If you find one, put it in register AG. Otherwise, if the sentence is passive, make SOMEONE the agent. Otherwise fail.</Paragraph> <Paragraph position="3"> The second triple is simpler. 
It merely says: If you find an animate patient, then put it in register PA, else fail.</Paragraph> <Paragraph position="4"> The third triple is equally simple: it is not looking for a case, but a to-complement.1 If these three elements are found in the sentence, then the system will look no further, but assume that it has found the correct meaning of the verb. It will EVAL the second form of the pair, in this case: (BUILDQ (&quot;&lt;==&gt;&quot; ? + (&quot;&lt;--&quot; ORDER +)) TPS Toe) which builds the basic structure for the sentence.</Paragraph> <Paragraph position="5"> BUILDQ takes a variable number of arguments. The first is a kind of template with slots in it. The rest of the arguments fill the slots. The &quot;+&quot; denotes a slot which is filled by the contents of a register. NOUN-PUT returns the structure of the noun phrase associated with the noun in this register. The &quot;?&quot; is filled by the application of the function NOUN-PUT to the contents of a register. Finally, the &quot;#&quot; (see Appendix) indicates that a form is to be EVALed, and the result put into</Paragraph> <Paragraph position="7"> the slot. The slots are filled in order by the second, third, etc., arguments. It should be noted that the form of BUILDQ has been strongly motivated by its use in Woods' ATN [11].</Paragraph> <Paragraph position="8"> So in this case:</Paragraph> <Paragraph position="10"> where GETR returns the contents of a register. Programming details do not belong in a paper of this kind.</Paragraph> <Paragraph position="11"> All of the code is in Taylor [8] for those interested. In the following example, then, function names and excessive details will be, on the whole, left out. A detailed account of the basic algorithm and control structure will be given. We will look at a simple sentence. 
More complex structures such as relative and subordinate clauses are treated in much the same way as their parent sentences. Consider the sentence: (11) The man beside the window played the piano for Mary.</Paragraph> <Paragraph position="12"> As stated above, the first step in the process is a partial parse using an ATN. The structural description usually derived from this parse is incomplete. That is, no decisions are made about what modifies what, what meaning of the verb is being used, etc. The basic idea behind the ATN is to find the verb, but while it is doing this, it seems useful to chop the sentence up into its parts. There are problems with just how this chopping should be done, but with most sentences it is straightforward. The ATN parse returned for sentence (11) will have the It is on this preliminary parse that the program works. First, the main verb is found, and a function is invoked which controls the top-level back-up. This function EVALs the form on the property list of the verb under the indicator V-MEAN. This form for PLAY is a very long one, and is given in the appendix. The form in question is a call to IF, whose mechanism has been briefly described above. In this instance IF has ten arguments, indicating that there are five meanings to the verb PLAY in the system. The first meaning is &quot;to play a musical instrument.&quot; The first case looked for is the AGENT. This agent should be a musician, and must be human. 
This search is initiated by EVALing the first form in the first triple of the first argument to IF: (AGENT (AND (SHOULD-BE MUSICIAN) (MUST-BE HUMAN))) AGENT is fairly complex, but basically it looks for a component of the ATN parse (in future called the &quot;p-parse&quot;, for partial-parse) which is in an appropriate position to be an agent, and which passes the test (the argument to AGENT). By 'appropriate position' is meant, for instance, that if the sentence is in the active voice, the agent is probably the first noun phrase in the sentence.</Paragraph> <Paragraph position="13"> For this situation, AGENT immediately finds &quot;the man&quot; as the obvious candidate, and it applies the test (AND (SHOULD-BE MUSICIAN) (MUST-BE HUMAN)). Now, unless something special has been put on the property list of MAN previously, the (SHOULD-BE MUSICIAN) part of the test will fail. (There are two levels of tests in this system: SHOULD-BE tests and MUST-BE tests. This mechanism is very useful for forcing a verb like PLAY to look very hard for a musician to play an instrument -- but to accept any human if it fails at first. This is especially powerful for resolving anaphoric references.) Thus AGENT fails, which invokes the third element of the AGENT triple: (OPT (GETR PASSIVE) 'SOMEONE). This may be read as: AGENT is optional if the sentence is in the passive voice, in which case put SOMEONE in as the agent; otherwise AGENT is obligatory. Since the sentence is not passive, AGENT is obligatory. As the AGENT case was not found, this first meaning of PLAY fails. IF then goes on to the next pair of arguments. This pair is designed to pick up the meaning of PLAY as in &quot;to play music.&quot; Note that the test on AGENT is just like the previous one, which means failure here as well. 
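The way IF steps through its pairs and triples, and the reason the first meaning fails here, can be sketched in a few lines. What follows is a minimal sketch in Python, not the system's LISP code: the names run_if, find_case and ORDER_MEANINGS, the dictionary layout of the sentence, and the reduction of the entry for &quot;to order&quot; to two triples are all assumptions made for illustration.

```python
# Hypothetical sketch in Python; the original system is written in LISP.
OBL = 'OBL'   # case is obligatory: this meaning of the verb fails
OPT = 'OPT'   # case is optional: the register is left empty


def opt(condition, default):
    """Like OPT in the text: the default if the condition holds, else 'OBL'."""
    return default if condition else OBL


def find_case(slot, required_prop):
    """Build a case finder that accepts a candidate carrying required_prop."""
    def finder(sentence):
        candidate = sentence.get(slot)
        if candidate is not None and required_prop in candidate[1]:
            return candidate[0]
        return None
    return finder


def run_if(meanings, sentence):
    """Try each (triples, build) pair in order; return the first structure built."""
    for triples, build in meanings:
        registers = {}
        meaning_failed = False
        for finder, register, on_failure in triples:
            value = finder(sentence)
            if value is None:
                # The third element is an atom, or a form to be evaluated.
                outcome = on_failure(sentence) if callable(on_failure) else on_failure
                if outcome == OBL:
                    meaning_failed = True
                    break
                if outcome == OPT:
                    continue  # leave the register empty and carry on
                value = outcome
            registers[register] = value
        if not meaning_failed:
            return build(registers)
    return None  # no meaning fits: IF fails


# A reduced, assumed entry for 'to order' (agent and patient triples only).
ORDER_MEANINGS = [
    (
        [
            (find_case('agent', 'HUMAN'), 'AG',
             lambda sentence: opt(sentence['passive'], 'SOMEONE')),
            (find_case('patient', 'ANIMATE'), 'PA', OBL),
        ],
        lambda registers: ('ORDER', registers.get('AG'), registers.get('PA')),
    ),
]

soldier = ('soldier', {'HUMAN', 'ANIMATE'})
active = {'passive': False, 'agent': ('general', {'HUMAN', 'ANIMATE'}), 'patient': soldier}
passive = {'passive': True, 'agent': None, 'patient': soldier}
no_agent = {'passive': False, 'agent': None, 'patient': soldier}

print(run_if(ORDER_MEANINGS, active))
print(run_if(ORDER_MEANINGS, passive))
print(run_if(ORDER_MEANINGS, no_agent))
```

In this sketch the passive sentence with no agent gets SOMEONE in register AG, while the active sentence with no agent makes the meaning fail. That second outcome is the situation just described for the first two meanings of PLAY.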
The program moves on to the third meaning of to PLAY: &quot;to play a sport.&quot; Here the test on AGENT is</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> (AND (SHOULD-BE SPORTS-MAN) (MUST-BE HUMAN)). Once again, </SectionTitle> <Paragraph position="0"> providing MAN does not have SPORTS-MAN on its property list, this attempt fails. The program therefore goes on to the fourth meaning, which is designed to pick up the ergative usage of &quot;to play&quot; as in &quot;The music played from the room.&quot; Since the test for this meaning is (MUST-BE MUSIC), this meaning will also fail. On to the fifth, and last, meaning, which is a sort of catch-all.</Paragraph> <Paragraph position="1"> It is the meaning of &quot;to play&quot; as in &quot;to entertain oneself.&quot; Here the test on AGENT is (MUST-BE ANIMATE). &quot;The man&quot; passes this test, since MAN has the property ANIMATE. Since AGENT is the only case looked for, this meaning is taken to be the correct one, and the following structure is built by the call to IF has completed its job. It has found what it takes to be the correct meaning of the verb. Now the rest of the sentence must be processed. The second element of every top-level list in the p-parse is a flag which is initially NIL, but which is turned on when that part of the sentence is considered to be correctly dealt with. In our example, so far only two parts are flagged: the first noun phrase, &quot;the man&quot;, and the verb phrase. The function which takes care of the rest of the sentence simply goes down the p-parse checking these flags. If it finds one which is NIL it works on that part of the sentence until it either succeeds, or fails -- causing back-up.</Paragraph> <Paragraph position="2"> For this example, then, the first phrase it comes upon needing work is the prepositional phrase: &quot;beside the window&quot;. As mentioned above, 
there is a master-table in the system which associates each preposition with the cases it may flag. BESIDE flags the cases: LOCATION and DESCRIPTIVE. All of the cases but DESCRIPTIVE are cases of the verb. DESCRIPTIVE is a special case which is used for preposition phrases which modify nouns.</Paragraph> <Paragraph position="3"> When the list of cases associated with a preposition is retrieved, there is a question as to which case to try first.</Paragraph> <Paragraph position="4"> For this there is a foregrounding routine, with several criteria for foregrounding: First of all, in the dictionary definition of the verb, the user may specify that a certain preposition trigger a particular case program. Since there is no such specification for &quot;to play&quot; in the current dictionary, nothing happens here. Secondly, on the property list of each verb is kept a record of which prepositions flagged which cases in the previous sentences. The cases associated with the preposition in question (if there are any) are foregrounded, so that they will be tried first (the most recent case first, etc.). Finally, if DESCRIPTIVE is one of the cases in the list of cases for this preposition, and if a noun phrase or a prepositional phrase immediately precedes the phrase in question, and if the noun in that noun phrase or prepositional phrase is not a proper noun, then DESCRIPTIVE is put at the front of the list, and is thus tried first.</Paragraph> <Paragraph position="5"> This seemingly obscure rule for foregrounding the DESCRIPTIVE case is just a heuristic. If the tests associated with each case are good enough, it makes no difference to the final outcome if the foregrounding is done or not. In some instances, however, if the DESCRIPTIVE case is not tried first, it will never be tried. In our example, for instance, it is the man who is beside the window (DESCRIPTIVE case); he did not play the piano beside the window (LOCATION case). 
But it is perfectly feasible for him to have played it beside the window (if we know nothing about the location of the piano). Therefore either of the cases will succeed. It is only the position of the prepositional phrase that indicates which case is correct.</Paragraph> <Paragraph position="6"> Continuing with our example: the DESCRIPTIVE case is foregrounded, and so the descriptive case function, DESC, is invoked with the phrase &quot;beside the window&quot; as its argument. Since the descriptive case almost always involves a prepositional phrase modifying the noun phrase or prepositional phrase immediately before it, DESC first checks to see if &quot;beside the window&quot; is a possible descriptor of &quot;the man&quot;. Since we do not have a data base to check to see if there is a man beside a window, our check must be a general one. Most nouns have a size associated with them under the indicator OBJ-SIZE. This is a very crude breakdown of physical objects into eleven size categories. &quot;The world&quot; is size 10 and &quot;a pin&quot; is size 0. (These sizes should be able to be changed by classifiers, adjectives, or modifying phrases. A toy elephant is probably not the same size as an elephant. This feature is currently not implemented.) The check for &quot;beside&quot; is merely used to rule out things like &quot;the pin beside Canada.&quot; Because abstract nouns have no size information, sentences like &quot;He had a thought beside the ocean&quot; are not ambiguous. In any event, &quot;beside the window&quot; is found to be a likely modifier of &quot;the man&quot;, and DESC succeeds. Since &quot;beside&quot; is a locative preposition, DESC returns the structure: A form is stacked which will put this structure into the main sentence structure if the rest of the sentence can be handled. 
Just where it is placed is determined by DESC. Since the prepositional phrase modifies &quot;the man&quot;, it will be put in as follows:</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> &lt;-DEFINITE- THE </SectionTitle> <Paragraph position="0"> So the prepositional phrase &quot;beside the window&quot; is flagged as completed, and the next unflagged phrase, &quot;the piano&quot;, is picked up. Here we run into problems. Where does &quot;the piano&quot; fit into the structure? What does it modify? What is its case? There are relatively few ways a noun phrase can be used at this point. It could be an example of the TIME case, as in &quot;I came home this morning&quot;, but &quot;piano&quot; fails the TIME-test. It could be a classifier, but the phrase following it would have to be a noun phrase for this to be the case. So failure has occurred.</Paragraph> <Paragraph position="1"> Something has gone wrong. IF must have chosen the wrong meaning of the verb. The program must back up.</Paragraph> <Paragraph position="2"> All the parts of the sentence flagged as used are unflagged, and back-up occurs into IF again. Here it is found that there are no meanings of the verb left to try. One of the meanings that was rejected earlier must have been the correct one. So IF fails entirely, and the program enters the top-level back-up mechanism.</Paragraph> <Paragraph position="3"> There are two possible reasons failure has occurred: 1) Either the program did not look back far enough in an attempt to resolve an anaphoric reference, or 2) The tests were too severe (i.e., the SHOULD-BE tests caused failure when they should not have). The anaphoric part of the system has not been explained yet, but as there were no pronouns in the sentence, the first reason can be ruled out. In order to weaken the tests, a flag is set to shut off the SHOULD-BE tests. That is, all SHOULD-BE tests will succeed in future. 
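The effect of shutting off the SHOULD-BE tests can be illustrated with a small sketch. It is written in Python rather than the system's LISP, and it rests on several assumptions: the table MEANINGS keeps only two of the five meanings of PLAY, the names passes, try_meanings and parse are invented for the illustration, and the failure to place a leftover noun phrase is modelled simply as rejecting a meaning that has no patient slot when a patient is present.

```python
# Hypothetical sketch in Python; names and data layout are illustrative only.

# Each meaning: (name, tests on the agent, tests on the patient or None).
# A test is a (level, property) pair, where level is 'SHOULD' or 'MUST'.
MEANINGS = [
    ('play-instrument',
     [('SHOULD', 'MUSICIAN'), ('MUST', 'HUMAN')],
     [('MUST', 'MUSICAL-INSTRUMENT')]),
    ('entertain-oneself',
     [('MUST', 'ANIMATE')],
     None),  # catch-all meaning: no patient case is looked for
]


def passes(props, tests, should_be_enabled):
    """Apply the tests; SHOULD-BE tests always succeed once shut off."""
    for level, prop in tests:
        if level == 'SHOULD' and not should_be_enabled:
            continue
        if prop not in props:
            return False
    return True


def try_meanings(agent, patient, should_be_enabled):
    """Return the first meaning that accounts for the whole sentence, else None."""
    for name, agent_tests, patient_tests in MEANINGS:
        if not passes(agent, agent_tests, should_be_enabled):
            continue
        if patient_tests is None:
            if patient is not None:
                continue  # a leftover noun phrase cannot be placed: back up
            return name
        if patient is not None and passes(patient, patient_tests, should_be_enabled):
            return name
    return None


def parse(agent, patient):
    """First pass with SHOULD-BE tests on; on total failure, retry with them off."""
    result = try_meanings(agent, patient, True)
    if result is None:
        result = try_meanings(agent, patient, False)
    return result


MAN = {'HUMAN', 'ANIMATE'}
PIANO = {'MUSICAL-INSTRUMENT'}

print(try_meanings(MAN, PIANO, True))   # strict pass: nothing fits
print(parse(MAN, PIANO))                # weakened pass finds a meaning
print(parse(MAN, None))                 # no patient: the catch-all is enough
```

Under these assumptions the strict pass rejects every meaning for &quot;the man ... played the piano&quot;, and only the second pass, with SHOULD-BE tests disabled, accepts the musical-instrument meaning.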
The process begins again with IF. The beginning is the same, but this time the first invocation of AGENT will succeed, because the test (AND (SHOULD-BE MUSICIAN) (MUST-BE HUMAN)) succeeds. The structure it returns is put in the register AG. IF continues with the second triple of parameters, and the form (PATIENT (MUST-BE MUSICAL-INSTRUMENT)) is EVALed. Now, PATIENT is very similar to AGENT: it looks in the appropriate place in the sentence for the patient of the verb. It then applies its TEST to it. In an active sentence, such as our example, the candidate for PATIENT is the first noun phrase after the verb. &quot;The piano&quot; is found, and since it passes the test (MUST-BE MUSICAL-INSTRUMENT), PATIENT returns &quot;the piano&quot; as the patient of the sentence.</Paragraph> <Paragraph position="4"> Once again it seems that the correct meaning of the verb has been found, therefore IF EVALs the BUILDQ associated with that meaning. The following structure is built: It now remains to try to clean up the unflagged parts of the sentence. The first one, again, is &quot;beside the window&quot;, and exactly the same thing is done as was done previously: it is decided that &quot;beside the window&quot; is a locative descriptor of &quot;the man&quot;, and this decision is stacked for later action. The only other part of the sentence to be handled is &quot;for Mary.&quot; As with &quot;beside&quot;, the cases associated with &quot;for&quot; are returned from the CASE-TABLE. They are: DURATION, BENEFICIARY, EXCHANGE, and IND-SUBJ. (IND-SUBJ has not been implemented yet.) Assuming that there have been no relevant previous sentences, the foregrounding of cases will have no effect on this ordering. The DURATION case is tried first. DURATION is a particularly simple case. Basically it checks to see that the noun phrase in the prepositional phrase has the property TIME under the flag N-PROP. &quot;Mary&quot; fails this test, and DURATION is rejected. The next case is BENEFICIARY. 
The only test for this case is that the noun phrase be animate. &quot;Mary&quot; passes this test since it has the SUPERSET WOMAN and WOMAN has the N-PROP ANIMATE. Therefore BENEFICIARY succeeds and returns: (&lt;-BENEFICIARY- (NPR MARY)). Unlike &quot;beside the window&quot;, this phrase is a case of the verb. Because all cases of the verb (but AGENT and PATIENT) are considered to be essentially parallel with respect to the verb, they are put into the structure at the same level, that of the verb symbol &quot;&lt;--&quot;, and their order is arbitrary. A form is stacked to put the above structure into the main sentence structure in the correct location.</Paragraph> <Paragraph position="5"> Next the p-parse is checked for any unused phrases. None are found, and the program terminates by placing the two forms into the structure, which is returned as the &quot;meaning&quot; of the sentence. A gloss of this structure might be: the man, who has location &quot;beside the window&quot;, in the past did something which caused the piano to emit sound. The beneficiary of his action</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. Other Examples </SectionTitle> <Paragraph position="0"> A few examples of sentences handled by the system are given here. Space constraints do not allow us to include parses for all the sentences, but the remainder are in Taylor [8].</Paragraph> <Paragraph position="1"> The man with the wife who is bigger than he goes to Vienna with a woman who is smaller than he.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> &lt;-DEFINITE- THE </SectionTitle> <Paragraph position="0"> Fred loved the 1 woman before he came to Canada. Many cases can appear as embedded sentences as well as prepositional phrases. 
Pronoun references within sentences can be resolved.</Paragraph> <Paragraph position="1"> Fred played Jack tennis.</Paragraph> <Paragraph position="2"> Some verbs allow the co-agent case to appear in this form. The music played loudly from the small room.</Paragraph> <Paragraph position="3"> There is a small pen in that box. The &quot;there is&quot; construct is a special case of &quot;to be.&quot; The house with the piano in it was given to Fred by his wife.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> 5. Anaphoric References </SectionTitle> <Paragraph position="0"> Anaphoric references are resolved in the case analysis part of the system. As the system is developed around a specific domain or data base, these routines will be modified to give them more power. Currently they work solely by looking at the previous sentences.</Paragraph> <Paragraph position="1"> Resolution of anaphoric references fits very well into a case system. Since a pronoun is only encountered in a search for a particular case, this gives the anaphoric routines a great deal of information about what kind of referent to look for. Here we will give just a brief outline of a fairly intricate procedure. When a pronoun is found in the sentence, it triggers a call to the function ANAPHORIC. ANAPHORIC takes four arguments: 1. A list of cases to look for, 2. A test that the referent must pass, 3. A number indicating how far back in the history to look, 4. The pronoun referenced. The search is breadth first, in that the program tries very hard to find the referent in the earliest possible sentence. The test is an arbitrary form. 
SHOULD-BE and MUST-BE elements of the test are shut off on failure as they are in the rest of the back-up procedure.</Paragraph> <Paragraph position="2"> Say, for instance, that the system is given the sentence: (12) He played the piano. The call to AGENT would be the form:</Paragraph> </Section> <Section position="11" start_page="0" end_page="0" type="metho"> <SectionTitle> (AGENT (AND (SHOULD-BE MUSICIAN) (MUST-BE HUMAN))). Since the </SectionTitle> <Paragraph position="0"> obvious candidate for the agent is a pronoun, ANAPHORIC would be invoked. Its TEST would be:</Paragraph> </Section> <Section position="12" start_page="0" end_page="0" type="metho"> <SectionTitle> (AND (SHOULD-BE MUSICIAN) (MUST-BE HUMAN)). </SectionTitle> <Paragraph position="0"> ANAPHORIC would look back through the parses and p-parses of the recent sentences, which are kept as global variables, looking for a noun phrase that will pass this test. As it becomes more and more desperate it will make the test less strict. Since &quot;he&quot; is the pronoun, ANAPHORIC is smart enough to insist that the referent be male.</Paragraph> <Paragraph position="1"> Most pronoun references within a sentence itself can also be resolved. For instance: (13) Fred went to London so he could visit the queen. (14) Jack took you up in airplane.</Paragraph> <Paragraph position="2"> References to events and places can also be handled: (15) It was unfortunate that the children were killed. (16) I went to France. Fred lives there.</Paragraph> <Paragraph position="3"> The resolution of locational references (&quot;here&quot; and &quot;there&quot;) is a difficult problem. By treating &quot;there&quot; as a pronoun whose referent must be a location, &quot;there&quot; is handled fairly well by the system. &quot;Here&quot; is much more difficult, since its resolution is highly context dependent.</Paragraph> <Paragraph position="4"> Another difficult problem is illustrated by sentence (17). (17) Mary 
was aboard the Titanic when she sank.</Paragraph> <Paragraph position="5"> This sentence is ambiguous: Mary could have sunk in a swimming pool while she was on the Titanic, but this is probably not the intended meaning. If &quot;to sink&quot; is defined with a test like</Paragraph> </Section> <Section position="13" start_page="0" end_page="0" type="metho"> <SectionTitle> (SHOULD-BE BOAT) </SectionTitle> <Paragraph position="0"> it will pick up &quot;Titanic&quot; correctly. Its first candidate is &quot;Mary&quot;, however; thus if the test does ra out, the system will choose her as its initial Qm-fi;S~ This illustrates a difficulty with the current system's anaphoric routines. The first candidate found which passes the test is chosen, rather than all of the candidates being looked at, and the most likely accepted. In summary, then, what we have implemented is a powerful parser for English sentences. It employs case frames to discover the intended meaning of the verb, then continues to use case in its analysis of the rest of the sentence. Each case has one or more tests associated with it, and each verb can add further tests to the cases in its case frames. These tests are gradually weakened on failure, giving the careful user complete control over the back-up.</Paragraph> <Paragraph position="1"> The system is carefully structured to allow easy extension or modification. As more world knowledge is added to the system, the tests on the cases, and in the case frames, can be made to employ this knowledge, thus making them more selective.</Paragraph> <Paragraph position="2"> The structure building routines are completely general, allowing the user to return any structure he desires within the constraints of the general knowledge he puts into the system. We feel that this system illustrates the simplicity, flexibility, and expressive power of case in applications in computational linguistics.</Paragraph> </Section> </Paper>