<?xml version="1.0" standalone="yes"?> <Paper uid="T78-1025"> <Title>SEMANTIC PRIMITIVES IN LANGUAGE AND VISION</Title> <Section position="1" start_page="0" end_page="182" type="abstr"> <SectionTitle> SEMANTIC PRIMITIVES IN LANGUAGE AND VISION </SectionTitle> <Paragraph position="0"> Colchester, England.</Paragraph> <Paragraph position="1"> The purpose of this brief note is to argue that, whatever the justification of semantic primitives for language understanding may be [see Wilks 1977], there is no reason to believe that it relates to vision in any strong sense.</Paragraph> <Paragraph position="2"> By &quot;semantic primitives&quot; I mean the general sort of item proposed within Artificial Intelligence (AI) by Wilks (1972, 1977) and Schank (1973), and within linguistics by Fodor and Katz (1963) and Jackendoff (1975), among many others in both cases. The generality of these items is essential to my argument, and I shall not count as semantic primitives items used for special tasks, whether or not those tasks are related to vision, as are the visual description primitives of Johnson-Laird (1977).</Paragraph> <Paragraph position="3"> Spatial versus visual What follows is highly naive and speculative: it will rest largely upon the opposition of linguistic knowledge to spatial and visual knowledge respectively. I take it for granted that the latter two are not necessarily connected, and so to establish that we need spatial knowledge to understand language (to name a task at random) does not establish that we need visual knowledge. The lack of necessary connexion is shown by such hackneyed examples as the person blind from birth, who has no visual, but a great deal of spatial, knowledge.</Paragraph> <Paragraph position="4"> One initial reason for distinguishing the two is the great deal of argumentation in linguistics in recent years that falls under the general heading localism.
This thrust of argumentation has sought to establish the central role of spatial concepts in linguistics, and among its best known proponents are Anderson (1971), Fillmore (1977) and Jackendoff (1975). One strand in this view is to argue that temporal expressions are in general reducible, in some sense, to spatial ones: that in ten minutes (a time expression) is dependent on the spatial sense of such forms as in five miles. This is a very difficult and general debate: there is contrary evidence from cultures where space is indicated by time (The airport is about ten minutes away), and there is a strong philosophical tradition, centred round Kant, that our sense of time is logically prior to our sense of space. That is to say, we could conceive of structuring our experience without the concept of space, but not without that of time because, if we could not know that one event preceded another, then we could probably not know anything at all; not even mathematics, if that consists at bottom in sequences of operations. Michotte's famous experiments on the willingness of subjects to attach the word cause to moving pictures of pairs of &quot;striking billiard balls&quot; are sometimes cited as providing a visual basis for causality (Clarke &amp; Clark 1977), although the notion of causality may well in fact make no sense without the concept of time. We could assert (wrongly, as it happens) that lightning causes thunder without the aid of a spatial concept, but not without a temporal one.</Paragraph> <Paragraph position="5"> The logical or linguistic priority of space to time is by no means a settled matter, and neither therefore is the thesis of localism. I have argued that the role of the visual in language is not necessarily supported by the need for spatial knowledge, and so the status of the latter need not be discussed.
Nonetheless, I have questioned the self-evident truth of localism, just in case anyone should think that, if it were true, it would support the centrality of visual knowledge in language understanding.</Paragraph> <Paragraph position="6"> Let us now, as the brief substance of this paper, look at three arguments that might be put forward to support the dependence, or interdependence, of linguistic and visual knowledge. Evolutionary arguments This comes in phylogenetic and ontogenetic forms. The former is the ingenious argument (Gregory 1970) that, since the human race has been able to see for many times more millennia than it has been able to speak or write, it might seem reasonable to believe, on evolutionary grounds, that the brain &quot;took over&quot; the existing visual structures for language understanding and production. This argument may well be true, but at present there is no independent evidence that would count for or against it.</Paragraph> <Paragraph position="7"> The &quot;ontogenetic form&quot; of the argument - in the individual, that is - is that one first learns words essentially through the visual channel, and so again our linguistic knowledge is essentially dependent upon visual criteria and experience.</Paragraph> <Paragraph position="8"> The best quick answer is to turn to the sort of word often used as a semantic primitive in AI language understanding systems: STUFF (=substance), ATRANS (=changing the ownership of an entity), CAUSE (=preceding and necessitating an event).</Paragraph> <Paragraph position="9"> It is highly dubious that such very general concepts are, or can be, taught by visual/ostensive methods. Can one point at substance as such? One may want, or mean, to, but can one in fact reliably do so? One structure for many purposes This is a widespread view in AI that has been argued for explicitly by Minsky (1975) and Rieger (1976), among others.
Roughly speaking, it is that implemented systems should use a single knowledge structure for a range of purposes: language understanding, problem solving, etc.</Paragraph> <Paragraph position="10"> It is an additional assumption that human beings do function in this way.</Paragraph> <Paragraph position="11"> The thesis can be expressed at many levels, and at a sufficiently general level it is almost certainly true. But it might then mean no more than that a single programming language could express general sub-routines for parsing, noise reduction etc. for a number of input channels.</Paragraph> <Paragraph position="12"> At a more specific level was the thesis, not now widely supported, that language and vision in some sense shared the same &quot;grammar&quot;, in the sense of Chomsky's transformational grammar (Clowes 1972). Striking evidence from the parallelism between visual and linguistic ambiguity was found, and the fact that Chomsky's grammars no longer seem such plausible candidates for such a role does not mean that the thesis itself is false at that level.</Paragraph> <Paragraph position="13"> Let us concentrate for a moment on two more specific levels. First, consider the well-known contrast between such sentences as: The paper moved. The dog moved. Linguists who differ about much else would want to ascribe a notion of agency to the subject of the second sentence but not the first. Many in AI working on natural language would agree, and add that the notion of agency is essential if other important inferences are to be made. But surely no one would argue that agency is, in any useful sense, ascribed on visual criteria that could be reduced to the visual differences of paper and dogs. It is in fact a complex theoretical notion dependent upon our beliefs and theories about the world: we do not now attribute agency to trees, though some fellow humans do.
But this difference is a theoretical (including linguistic) one, not one of difference of visual perception.</Paragraph> <Paragraph position="14"> Secondly, we may return to general semantic primitives of the sort already mentioned (and similar inventories may be found in (Bierwisch 1970) and (Leech 1974)).</Paragraph> <Paragraph position="15"> There are many possible ways in which one might seek to justify such primitives (see Wilks 1977), and Bierwisch (1970) has gone on record as saying that they do denote, and are to that extent dependent upon, visually observable entities. I suggested above that that may not be so: one may point at treacle, water or elephant meat, but it is not so clear one can point at SUBSTANCE. Yet this notion has a role to play in language understanding, for how, without it, can one economically express such axioms as &quot;A quantity 1 of a substance plus a quantity 2 of it yield a quantity 3 of it&quot;? This axiom is not true of physical objects, as distinct from substances. A well-known confusion must be avoided here: it may well be true, as the model theoretic semanticists like Montague claim, that any contentful notion, primitive or not, refers to a function of sets. In that sense move might be said to refer to a set of entities that move.</Paragraph> <Paragraph position="16"> However, this point about logical reference has no consequences for the point about whether or not such primitives denote entities in the real world.</Paragraph> <Paragraph position="17"> Visual and spatial imagery Finally, it is sometimes argued that the structures underlying language must depend upon those underlying vision if only because natural language is so full of visual imagery.
In whatever sense &quot;visual imagery&quot; is taken, this fact is, I believe, irrelevant to any precise assertion under discussion, by which I mean any of: I) Language understanding processes in humans depend, either as to primitive elements or structure, on visual experience and the mechanisms that interpret it.</Paragraph> <Paragraph position="18"> II) The specification of language in humans has no significant overlap, in terms of primitive elements or structure, with that of other faculties, like vision.</Paragraph> <Paragraph position="19"> III) Visual processes in humans depend, either as to elements or structure, on linguistic experience and the mechanisms that interpret and produce (sic) it.</Paragraph> <Paragraph position="20"> For all three theses only anecdotal evidence is available, though thesis I would be strengthened by empirical evidence that the blind from birth were less able to understand the use of visual imagery in language. Those with a predilection for motor theories should be tempted to consider the Whorfian thesis III (Whorf, remember, believed we might perceive, say, lightning as an entity, rather than an activity or process, because we denoted it by a member of the theoretical category NOUN, rather than VERB) since, as the structural difference of I and III makes clear, language is an activity in a way vision is not.</Paragraph> <Paragraph position="21"> Thesis II will be agreeable to those who are impressed by the way in which confusion can arise when one tries to bring together information on the same topic, but obtained via different channels, as when one refers to two cities whose mutual relation of position one knows from a map, between which one can drive &quot;without thinking&quot;, and about both of which one also has a great deal of textual/factual information. Readers of (Fillmore 1977) will recall his attempt to describe the relation of a text-based frame and an experientially-based scene to the same situation.
I think AI workers at this particular interface could profit from considering the extent to which such possible inconsistencies can be matters of theory rather than superficial fact: an observer who is asked whether two sides of a long railway line meet at the furthest point he can see will give an answer not independent of his abstract (possibly linguistically based) theory of parallel lines.</Paragraph> <Paragraph position="22"> In conclusion, this note has tried to do no more than ward off certain confusions, and to stress how many points of view are still open, since the evidence for and against them is no more than anecdotal, even when the anecdotes come from Psychology labs. The choice between theses I/II/III is a metaphysical one, in the more red-blooded sense of that overtired word: it cannot be made on empirical grounds now, but it can have important practical consequences about where one chooses to look for answers.</Paragraph> </Section></Paper>