File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/78/t78-1027_metho.xml
Size: 26,866 bytes
Last Modified: 2025-10-06 14:11:12
<?xml version="1.0" standalone="yes"?> <Paper uid="T78-1027"> <Title>WITH A SPOON IN HAND THIS MUST BE THE EATING FRAME</Title> <Section position="4" start_page="187" end_page="187" type="metho"> <SectionTitle> 2 THE CLUE INTERSECTION METHOD </SectionTitle> <Paragraph position="0"> Rather than immediately presenting my scheme, let me start by showing the problems with an alternative possibility, which I will call the &quot;clue intersection&quot; method. This alternative is by no means a straw man, as one researcher has in fact explicitly suggested it (Fahlman 1977), and I for one find it a very natural way of thinking about the problem.</Paragraph> <Paragraph position="1"> The idea behind this method is that we are given certain clues in the story about the nature of the correct frame, and to find the frame we simply intersect the possible frames associated with each clue. To see how this might work, let us take a close look at the following example: As Jack walked down the aisle he put a can of tunafish in his basket.</Paragraph> <Paragraph position="2"> The clues here are things like &quot;aisle&quot;, &quot;tunafish&quot;, etc. Of course, I do not mean to say that the English words are the clues, but rather the concepts which underlie the words. I will assume that we go from one to the other via an independent parsing algorithm. (However, this assumes that there is no vicious interaction between frame determination and disambiguation.</Paragraph> <Paragraph position="3"> Given that disambiguation depends on prior frame determination (see (Hayes 1977) for numerous examples), this may be incorrect.) So the input to the frame determiner will be something like: The details of the representation do not figure in the paper, and those which do are fairly uncontroversial. An exception here is the use of specific predicates like BASKET or AISLE. 
We will return to this point in the conclusion.</Paragraph> <Paragraph position="4"> Given this representation, we can imagine one method of finding the appropriate frame. Our clues are the various predicates in the input, such as AISLE, BASKET, etc. Indexed under each of them will be pointers to those places where it comes up. Under AISLE we might find CHURCH, THEATER, and SUPERMARKET, while BASKET will have</Paragraph> </Section> <Section position="5" start_page="187" end_page="188" type="metho"> <SectionTitle> LITTLE-RED-RIDING-HOOD and SUPERMARKET. The </SectionTitle> <Paragraph position="0"> point is that none of these clues will be unambiguous, but when we take the intersection the only thing which will be left is SUPERMARKET.</Paragraph> <Paragraph position="1"> There are, however, problems with this view of things. For one thing, it ignores what I will call the &quot;clue selection&quot; problem. Put in the plainest fashion, the difficulty here is deciding exactly what clues we will hand over to the clue resolution component, and in what order. So in the last example I selected some of the content of the sentence to hand over to the clue resolver, in particular AISLE and BASKET. This seemed reasonable given that they do tend to suggest &quot;supermarket&quot;, as desired. But there is more information in the sentence. It was Jack who did all of this. Why not intersect what we know about Jack, or about WALK, with all of the rest? Or again, suppose something ever so slightly odd happens, such as the basket hitting a screwdriver which is on the floor. SCREWDRIVER will have various things indexed under it, but more likely than not the intersection with the rest of the items mentioned above will give us the null set. For that matter, is there any reason to intersect only things in the same sentence? The answer here is clearly no, since there are many examples which require just the opposite.</Paragraph> <Paragraph position="2"> Jack was walking down an aisle. 
He was pushing his basket.</Paragraph> <Paragraph position="3"> But if we do not stop at sentence boundaries, where do we stop? It is ridiculous to go through the entire story collecting clues and then do a grand intersection at the end.</Paragraph> <Paragraph position="4"> A reasonably natural solution to the clue selection problem would start with the observation that usually we already have a general frame.</Paragraph> <Paragraph position="5"> When new clues come in, we see if they are compatible with what we already believe. If so, fine. If not, we see if the clue suggests a different context frame. If not (as with, say, WALK, which occurs so often as to be unsuggestive), then nothing more need be done. If there are newly suggested context frames, they should be investigated. This will be done for every predicate. Now the clue intersection method is compatible with this idea, but in its broad outline we are moving closer to what I have been characterizing as the Minsky proposal.</Paragraph> <Paragraph position="6"> Furthermore, there are some problems with the clue intersection method which go beyond the merely suggestive. Consider the following example: Jack took a can of tunafish from the shelf.</Paragraph> <Paragraph position="7"> Then he turned on a light.</Paragraph> <Paragraph position="8"> After the first line the intersection method should leave us undecided between KITCHEN and SUPERMARKET. The next line should resolve the issue, but how does it do so? It must have something to do with the fact that normally a shopper at a store would not be the person to turn lights on or off, while it would be perfectly normal for Jack to do it in what presumably is his own kitchen. But this sort of reasoning is not easily modeled by clue intersection, because it would seem to depend on making inferences which are themselves dependent on having the context frames available. 
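The clue intersection method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from the literature: the index is a plain dictionary from predicates to sets of frame names, and only the AISLE and BASKET entries follow the text; the entries for SCREWDRIVER, TUNAFISH, SHELF and LIGHT are hypothetical, invented for the example.

```python
from functools import reduce
from typing import Dict, List, Set

# Hypothetical clue index: predicate -> context frames indexed under it.
# AISLE and BASKET follow the text; the other entries are invented.
CLUE_INDEX: Dict[str, Set[str]] = {
    "AISLE": {"CHURCH", "THEATER", "SUPERMARKET"},
    "BASKET": {"LITTLE-RED-RIDING-HOOD", "SUPERMARKET"},
    "SCREWDRIVER": {"WORKSHOP", "GARAGE"},
    "TUNAFISH": {"KITCHEN", "SUPERMARKET"},
    "SHELF": {"KITCHEN", "SUPERMARKET", "LIBRARY"},
    "LIGHT": {"KITCHEN", "SUPERMARKET", "THEATER"},
}


def intersect_clues(clues: List[str]) -> Set[str]:
    """Return the frames shared by the index entries of every clue."""
    # A clue with no index entry contributes the empty set.
    candidate_sets = [CLUE_INDEX.get(clue, set()) for clue in clues]
    if not candidate_sets:
        return set()
    # Copy so callers never receive (and mutate) a set stored in the index.
    return set(reduce(lambda acc, s: acc.intersection(s), candidate_sets))


print(intersect_clues(["AISLE", "BASKET"]))                     # {'SUPERMARKET'}
print(intersect_clues(["AISLE", "BASKET", "SCREWDRIVER"]))      # set()
print(sorted(intersect_clues(["TUNAFISH", "SHELF", "LIGHT"])))  # ['KITCHEN', 'SUPERMARKET']
```

With these assumed entries, AISLE and BASKET intersect to SUPERMARKET alone, one stray clue such as SCREWDRIVER empties the result, and the tunafish story leaves both KITCHEN and SUPERMARKET standing, because LIGHT is plausibly indexed under both; nothing in the intersection itself can separate them.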
That is to say, before we can rule out SUPERMARKET, we need some piece of information from the SUPERMARKET frame which will enable us to say that Jack should not be turning on a light, given that he is cast in the role of SHOPPER in that frame.</Paragraph> <Paragraph position="9"> Interestingly enough, Fahlman (who, as I noted earlier, is a proponent of the clue intersection method) had a major role in the evolution of the Minsky proposal which I advocate. As such it behoves us to consider why he then rejected the idea in (Fahlman 1977). His primary reason is his observation that frequently in vision one does not have any single clue which could serve as the basis for the first guess at the appropriate frame. Rather, it would seem that one has a multitude of very vague features, each one of which could belong to a wide variety of objects or scenes. To select one of them for a first guess would be quite arbitrary and would involve one in an incredible amount of backtracking. It would seem much more plausible to simply do an intersection on the clues and in this way weed out the obvious implausibilities.</Paragraph> <Paragraph position="10"> While this analysis of the situation in vision is quite plausible, I believe that high-level vision is still in a sufficiently rudimentary state that these conclusions need not be taken as anything near the final word.</Paragraph> <Paragraph position="11"> Furthermore, even if it were proved that vision does need an intersection-type process, I can easily believe that the process which goes on in vision is not the same as that which goes on in language. For one thing, in vision there is a natural cut-off for clue selection - the single scene. For another, within the scene there is a natural metric on the likelihood of two features belonging to the same frame - distance. 
Whether or not these in fact work in vision, they do suggest why someone primarily worried about the vision problem would not see clue selection as the problem it appears to be in language.</Paragraph> </Section> <Section position="6" start_page="188" end_page="189" type="metho"> <SectionTitle> 3 DIFFERENT KINDS OF INDICES </SectionTitle> <Paragraph position="0"> As I have already said, the scheme I believe can surmount the difficulties presented in the last section is a variant on one proposed by Minsky, and elaborated by Fahlman (1974) and Kuipers (1975). The basic idea is that one major feature or clue is used to select an initial frame. Other facts are then interpreted in light of this frame. If they fit, fine. If not, then another frame must be found which either complements or replaces the original frame. In the previous proposals the original frame contained information about alternate frames to be tried in case of certain types of incompatibilities. This may or may not work in vision (which was the primary concern of those mentioned earlier); however, I shall drop this part of the theory. In discourse there are simply too many ways a frame can be inappropriate to make this feasible. For example, it stretches credibility to believe that SUPERMARKET would suggest looking at KITCHEN in case the shopper turns on the lights.</Paragraph> <Paragraph position="1"> So let us consider a very simple example.</Paragraph> <Paragraph position="2"> Jack walked over to the phone. He had to talk to Bill.</Paragraph> <Paragraph position="3"> It seems reasonable to assume that we guess, even before the second sentence, that Jack will make a call. To anticipate this we must have TELEPHONING indexed under TELEPHONE. When we see the first line we first try to integrate it into what we already know. Since there will be nothing there to integrate it into, we try to construct something. 
To do this we look to see what we have indexed under TELEPHONE, find TELEPHONING, and try that out. Indeed it will work quite well, since one of the things under TELEPHONING is that the AGENT must be in the proximity of the phone, and Jack just accomplished that. Hence we are able to integrate (AT JACK-1 TELEPHONE-1) into the TELEPHONING frame, and everything is fine.</Paragraph> <Paragraph position="4"> Nothing is ever really this simple, however, and even in this example, which has been selected for its comparative simplicity, there are complications. I suspect most people have assumed in the course of this example that Jack is in a room, and perhaps have even gone so far as to assume he is at home. Nothing in the story says so, of course, and if the next line went on to say that Jack put a dime into the phone we would quickly revise our theory.</Paragraph> <Paragraph position="5"> To account for our tendency to place Jack in a room, we must have a second index under TELEPHONE which points to places where phones are typically found. (A possible alternative is to have this stated under TELEPHONING, but this would make it difficult to use the information in cases where no call is actually being made, since in such cases TELEPHONING, even if hypothesized, would not stay around long.) So we will hypothesize two kinds of indices, an ACTION index and a LOCATION index.</Paragraph> <Paragraph position="6"> This distinction should mirror the intuitive difference between placing an object in a typical locale and placing an action in a typical sequence. Other distinctions of this sort exist and may well lead to the introduction of other such index types: locating objects and actions in time, for example. 
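The two index types on TELEPHONE can be pictured as a small lookup table. The sketch below is a hedged illustration in Python rather than the frame notation of this paper: it assumes each predicate maps an index type (ACTION or LOCATION) to an ordered list of candidate frames, that the first entry is the one tried first, and that the rest are kept as untried suggestions. The table contents follow the telephone example (TELEPHONING, ROOM, PUBLIC-LOC); the function name `suggest` and the type alias `IndexTable` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# predicate -> index type -> candidate frames, in order of preference.
IndexTable = Dict[str, Dict[str, List[str]]]

# Hypothetical contents following the TELEPHONE example: ROOM is listed
# before PUBLIC-LOC, so the room frame is tried first.
INDICES: IndexTable = {
    "TELEPHONE": {
        "ACTION": ["TELEPHONING"],
        "LOCATION": ["ROOM", "PUBLIC-LOC"],
    },
}


def suggest(predicate: str, index_type: str,
            indices: IndexTable = INDICES) -> Tuple[Optional[str], List[str]]:
    """Return the frame to try first and the still-untried alternatives.

    A predicate or index type with no entries yields (None, []).
    """
    candidates = indices.get(predicate, {}).get(index_type, [])
    if not candidates:
        return None, []
    return candidates[0], candidates[1:]


# Jack walked over to the phone: TELEPHONE is the suggestive predicate.
print(suggest("TELEPHONE", "ACTION"))    # ('TELEPHONING', [])
print(suggest("TELEPHONE", "LOCATION"))  # ('ROOM', ['PUBLIC-LOC'])
print(suggest("GO", "ACTION"))           # (None, [])
```

Because index types are just keys in this sketch, adding another one, for instance for locating objects and actions in time, would not change the lookup itself.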
However, I would anticipate that the total number is small (under 10, say).</Paragraph> <Paragraph position="7"> To illustrate how these index types might hook up to TELEPHONE, I will use a slightly extended version of the frame representation introduced in (Charniak 1977) and (Charniak forthcoming). From the point of view of this paper nothing is dependent on this choice. It is simply to give us a specific notation with which to work.</Paragraph> <Paragraph position="8"> (TELEPHONE (OBJECT) ;The frame describes an OBJECT ;(and not, say, an event).</Paragraph> <Paragraph position="9"> VARS:(THING) ;I only introduce one variable ... ;THING which is bound to the ;token in the story representing the phone</Paragraph> <Paragraph position="11"> ;If we instantiate the ROOM frame then the ;HOME-PHONE variable in it should be bound ;to the token which is bound to THING.</Paragraph> <Paragraph position="12"> ;Similarly for PUBLIC-LOC and PAY-PHONE.</Paragraph> <Paragraph position="13"> ACTION: ((TELEPHONING (PHONE . THING))) ...) ;Other portions of the frame would ;describe its appearance, etc.</Paragraph> <Paragraph position="14"> We will not be able to integrate the first line of our story into any other frame, so we will hypothesize the TELEPHONING frame and either the room frame or the public place frame. Given my subject data on what people assume, the room frame is placed, and hence tried, first. This will cause the creation of two new statements which serve to specify the frames now active, and their</Paragraph> <Paragraph position="16"> The syntax here is the name of the frame followed by dotted pairs (VARIABLE . BINDING). 
Earlier I used a place notation for simplicity, e.g.,</Paragraph> </Section> <Section position="7" start_page="189" end_page="189" type="metho"> <SectionTitle> (TELEPHONE TELEPHONE-1) </SectionTitle> <Paragraph position="0"> In fact this would be converted internally to the dotted pair format:</Paragraph> <Paragraph position="2"> (1975) calls &quot;slots&quot;. They are also equivalent (to a first approximation) to KRL slots such as</Paragraph> <Paragraph position="4"> So we are hypothesizing 1) an instance of telephoning, where the only thing we know about it is the telephone involved, and 2) a room (ROOM-1) which at the moment is only furnished with a telephone. Note that this assumes that in our room frame we have an explicit slot for a telephone. This is equivalent to assuming that rooms typically have phones in them.</Paragraph> <Paragraph position="5"> We can now integrate the fact that Jack is at the phone into the telephoning frame, assuming that this state is explicitly mentioned there (i.e., we know that as part of telephoning the AGENT must be AT the TELEPHONE). With this added, our TELEPHONING statement will now be:</Paragraph> <Paragraph position="7"> When the second line comes in we must see how it fits into the TELEPHONING frame, but that is a problem of integration. The frame determination problem is over for this example.</Paragraph> </Section> <Section position="8" start_page="189" end_page="189" type="metho"> <SectionTitle> 4 CONSTRAINTS ON THE HYPOTHESIS OF NEW FRAMES </SectionTitle> <Paragraph position="0"> Early on we noted that it was only necessary to worry about a new frame if we received information which did not fit in the old ones.</Paragraph> <Paragraph position="1"> Then, when we introduced the two kinds of indices, we noted that we wanted to place events in a sequence of events, and objects in their typical locale. 
This immediately suggests that when we get an unintegratable action we use the ACTION index on the predicate, while for objects we would use the LOCATION index. However, this is not general enough in at least two ways.</Paragraph> <Paragraph position="2"> For one thing, often we will have a non-integratable action where it is not the action frame, but rather the objects involved in the action, which suggest the appropriate frame. Our example of someone going over to a phone is a case in point. Here GO tells us nothing, but TELEPHONE is quite suggestive. To handle this, the search for ACTION indices must include those which are on OBJECT frames describing the tokens involved in the action. So, since Jack is going to something which is a telephone, we look on the ACTION index of TELEPHONE.</Paragraph> <Paragraph position="3"> We must also extend our analysis to handle states. If we are told that Jack is in a restaurant, we must activate RESTAURANTING. In our current analysis (RESTAURANT (THING .</Paragraph> <Paragraph position="4"> RESTAURANT-1)) will not do this, since it is an OBJECT frame and hence will only be looking for LOCATIONs in which the restaurant will fit. Hence in this case the IN frame must act like the GO frame in looking for ACTION indices in which it might fit. More generally, any state which is typically modified by an action should cause us to look for ACTION indices. So IN or STICKY-ON would do so, but SOLID or AGE would not. (But if in the case at hand we are told that something did change the SOLID status, then we would treat it like an action, as in &quot;In the morning the water in the pond was solid&quot;.)</Paragraph> <Paragraph position="5"> Up to this point, then, the frame selection process looks like this: 1) When a statement comes in, try to integrate it into the frames which are already active.</Paragraph> <Paragraph position="6"> In general this can require inference, and a major open problem is how much inference one performs before giving up. 
If the integration is successful, then go on to the next statement.</Paragraph> <Paragraph position="7"> 2) If the statement is a description of an object (i.e., an OBJECT frame), then use the LOCATION index on the frame to find a frame which incorporates the statement. Keep track of any suggested LOCATION frames not yet tried.</Paragraph> <Paragraph position="8"> 3) If the statement is an action or changeable state, then look for an ACTION frame into which the action (or state) can be integrated. First look on the frame for the action (or state) and then on the object frames describing the arguments of the action (or state). Again, keep track of any remaining ones.</Paragraph> <Paragraph position="9"> There must be a complicated process by which we test frames for consistency with what we know about the story already. If a frame is not consistent, we must invoke an even more complicated process of deciding which is more believable: previous hypotheses about the story, or the current frame. I have nothing to say on this aspect of the problem.</Paragraph> <Paragraph position="10"> There is, however, one type of example which raises some doubts about the above algorithm. These examples mention some object with associated ACTION frames, but only in connection with states which do not demand an ACTION frame for their integration. For example: The car was green. Jack had to be home by three.</Paragraph> <Paragraph position="11"> In this example the above algorithm will not consider DRIVING, because GREEN will not demand that we look at the action index associated with its argument (the car). (Even if it did, nothing would happen, because the fact that the car is green would not integrate into DRIVING.) However, much to my surprise, when I gave this example to people they did not get the DRIVING frame either. Yet with a modified example they do.</Paragraph> <Paragraph position="12"> The steering wheel was green. 
Jack had to be home by three.</Paragraph> <Paragraph position="13"> This is most mysterious. One suggestion (Lehnert, personal communication) is that to &quot;see&quot; the steering wheel the &quot;viewer&quot; must be in the car, which in turn suggests driving (since IN would demand action integration). This may indeed be correct, but we must then explain why, in the first example, the fact that the viewer must be NEAR the car does not have the same effect. In any case, however, these examples are sufficiently odd that it seems inadvisable to mold a theory around them.</Paragraph> </Section> <Section position="9" start_page="189" end_page="189" type="metho"> <SectionTitle> 5 MORE COMPLEX INDICES </SectionTitle> <Paragraph position="0"> There is one way in which the telephone example makes the problem look simpler than it is. In the case of TELEPHONE it seems reasonable to have a direct link between the object TELEPHONE and the context frame TELEPHONING. In other cases this is not so clear. For example, we earlier considered the example: The woman waved as the man on the stage sawed her in half.</Paragraph> <Paragraph position="1"> Here it would seem that the notion of sawing a person in half is the crucial concept which leads us to magic, although the woman's apparent lack of concern and the fact that the entire thing is happening on a stage certainly help reinforce this idea. But presumably the output of our parser will simply state that we have here an incident of SAWING. Does this mean that we have under SAWING a pointer to MAGIC-PERFORMANCE? At first glance this seems odd at best. Some other examples where the same problem arises are: The ground shook.</Paragraph> <Paragraph position="2"> (EARTHQUAKE) (Example due to J. DeJong) There were tin cans and streamers tied to the car. (WEDDING) There were pieces of the fuselage scattered on the ground. (AIRPLANE ACCIDENT) In the final analysis the real problem here is one of efficiency. 
If, for example, we attach EARTHQUAKE to EARTH, then we will be looking at it in many circumstances when it is not applicable. (The alternative of attaching it to SHAKE is little better, and possibly worse, since it would not handle &quot;Jack felt the earth MOVE beneath him&quot; - assuming the average person gets EARTHQUAKE out of this also.) One way to cut down the number of false suggestions is to complicate the indices we have on each frame. So far they have simply been lists of possibilities. Suppose we make them discrimination nets. So, under SAWING we would have various tests. On one branch would appear MAGIC-PERFORMANCE, but we would only get to it after many tests, one of which would check whether the thing sawed was a person. In much the same way, the discrimination net for EARTH could enquire about the action or state which caused us to access it. If it were a MOVE with the EARTH as the thing moved, then the net would suggest EARTHQUAKE.</Paragraph> <Paragraph position="3"> Note, however, that if there were few enough things attached to SAWING, our net would not save significant time. Even if we were to access the MAGIC-PERFORMANCE frame directly, the first thing we would do is check that the thing proposed for the SAWED-PERSON variable was indeed a person. The net only saves time when a single test in the net rules out a number of frames. At the present time I have not thought of enough frames associated with SAWING to make this worthwhile. But as I suspect this is primarily due to lack of work on my part, I will assume that discrimination nets will be required.</Paragraph> <Paragraph position="4"> If we allow a discrimination net to ask arbitrary questions, there will be the problem that it may ask questions which are not yet answered in the story. However, a reasonable restriction which would prevent this would go as follows: Suppose statement A causes us to look at frames on an index of B. 
The discrimination net may only enquire about the predicate of A (EARTH looks to see if A was a MOVE), and about what object frames describe the arguments of A or B (SAW looks to see if the thing sawed was a PERSON).</Paragraph> </Section> <Section position="10" start_page="189" end_page="191" type="metho"> <SectionTitle> 6 OTHER USES OF FRAME DETERMINATION </SectionTitle> <Paragraph position="0"> Earlier I noted that integrating a statement into a frame requires inference. Here I would like to point out that a modification of the above ideas would be helpful in this process as well.</Paragraph> <Paragraph position="1"> Consider the following: Jack went to a restaurant. The menu was in Chinese. &quot;What will I do now?&quot; thought Jack. Our rules here will get us to RESTAURANTING after the first line. But if we are to understand the significance of the last line, we must realize the import of line two: Jack can't read the menu. It would seem unlikely that RESTAURANTING would ask about the language of the menu; hence sentence two cannot be immediately integrated into RESTAURANTING. More reasonable would be to know that if something is in a foreign language it cannot be read, and that one normally reads the menu so one can order. Only the second of these can plausibly be included in RESTAURANTING.</Paragraph> <Paragraph position="2"> Given our algorithm, the following will occur.</Paragraph> <Paragraph position="3"> The second line will become something like (IN-LANGUAGE MENU-1 CHINESE). Since the statement is not integrated, we look to see if there is an ACTION pointer on IN-LANGUAGE. Indeed there is, and it will be to the following rule:</Paragraph> </Section> <Section position="11" start_page="191" end_page="191" type="metho"> <SectionTitle> (READ (MOTIVATIONAL-ACTIVITY) VARS : ... 
EVENT: (AND (SEE READER READING-MATERIAL) (IN-LANGUAGE READING-MATERIAL LANGUAGE ) (KNOW READER LANGUAGE) ) ENABLES (KNOW-CONTENTS READER READING-MATERIAL) ) </SectionTitle> <Paragraph position="0"> Early on I commented that the only controversial aspect of my representation was the use of very specific predicates (BASKET, AISLE, TELEPHONE, etc.) rather than a breakdown into more primitive concepts. We might, for example, define AISLE as a path which is bounded on each side by things which are considered pieces of furniture (e.g., shelves or chairs). The problem with using a primitive representation here is that while it is somewhat plausible to have SUPERMARKET and CHURCH indexed under AISLE, indexing them under PATH or some other component of the primitive definition is much less plausible. However, we can circumvent this problem by the use of discrimination nets, just as we did to get EARTHQUAKE from MOVE and EARTH. But it should be noted that by using this method we are eliminating one of the benefits of a primitive analysis - we can no longer assume that we can get our information in a piecemeal fashion and come out with the same analysis. In particular, we must get &quot;aisle&quot;, or else we must get all of its components at the same time. If we do not, then the discrimination net will fail to notice that we do not have just any old path; we have an AISLE. Given this restriction, the primitive and non-primitive analyses come out pretty much the same. A primitive decomposition just becomes a long name for a higher-level concept. 
Or to turn this around, the use of high-level descriptions is not so controversial after all - it is simply a short name for a primitive decomposition.</Paragraph> <Paragraph position="1"> In effect, we are saying here that the typical significance of something being in a certain language is whether a person can read it or not.</Paragraph> <Paragraph position="2"> This will cause us to activate the READ frame.</Paragraph> <Paragraph position="3"> Initially there is little else we can do, since at this point we do not even know who is trying to read. However, when we try to integrate READ we will be successful, and we will have further bound READER to JACK-1. At this point (and this is the modification required) we should return to READ and note that we can assume Jack does not know Chinese and hence will not be able to read the menu.</Paragraph> </Section> </Paper>