<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0511">
  <Title>Steps Toward Deep Lexical Acquisition</Title>
  <Section position="2" start_page="0" end_page="91" type="metho">
    <SectionTitle>
1 Where We Are Now
</SectionTitle>
    <Paragraph position="0"> The present Machine Learning Paradigm Much of computational linguistics has converged onto a machine learning paradigm that provides us with soothing clarity. The machine learning approach defines a problem as a mapping problem - map some acoustic stream onto a list of word tokens, map a list of word tokens onto a parse tree, map a parse tree onto a set of semantic roles or &quot;logical form&quot;, map each word in a tree onto its best sense, and so on.</Paragraph>
    <Paragraph position="1"> We then develop a learning algorithm to accomplish the desired mapping. Multiple groups describe how well their algorithm maps various test sets given various training sets, and establish a &quot;result&quot; to improve upon. The clarity provided by this paradigm is so soothing, one gets the sense we can turn a crank, and indeed, in many cases, progress has been made proceeding precisely along these lines.</Paragraph>
    <Paragraph position="2"> Turning the crank on deep lexical acquisition, however, we might feel something is missing. What is it? Underlying any model of deep lexical acquisition is a theory of the human conceptual apparatus. Unlike our handle on acoustic streams, word lists, and parse trees, our handle on a suitable &amp;quot;output&amp;quot; for the space of word meanings is remarkably poor.</Paragraph>
    <Paragraph position="3"> Somehow, via experience (of some kind or another), children acquire a mapping from a space of vocabulary items to a space of lexicalizable concepts - the lexicon; our task as modelers is to figure out how this mapping can occur. Many models for the space of lexicalizable concepts exist: concepts are points in Rn, concepts are Jackendoff's lexical conceptual structures, concepts are FrameNet's frame elements, concepts are Schankian script activators, concepts are distributions over syntactic frames, concepts are grounded in sensorimotor statistics, or all of the above. Almost everyone nowadays reports how their algorithm accomplished some mapping to one or more of these models of concepts. They have to, because the de facto idea of what constitutes a &quot;result&quot; according to today's machine learning paradigm is to do exactly this.</Paragraph>
    <Paragraph position="4"> The Golden Oldies formed our concept models Our models of conceptual spaces did not originate from computational linguists following the machine learning paradigm. They were proposed by linguists, psychologists and philosophers back in earlier eras - what we will call the Golden Oldies - when the idea of a &quot;result&quot; was somewhat different. There are too many to recall: Quine (1960) argued that the linguist watching the natives uttering Gavagai! in the context of a rabbit would necessarily require far more constraints than met the eye. Brown (1957) showed that children used syntactic cues to disambiguate between possible meanings; Landau and Gleitman (1985) followed on these insights, showing just how deep the problem could be: even blind children could learn look and see, basing their mapping on syntactic constraints. Chomsky's (1965) notion of &quot;deep structure&quot; - proposed to account for commonplace syntactic phenomena - motivated many insights explored in Gruber (1965)'s thesis, Fillmore (1968)'s classical thematic roles, and Jackendoff (1983)'s lexical conceptual structures. Hale and Keyser and many linguists labored under the MIT Lexicon Project in the 1980s to determine the fundamental features of the lexicon; many of these hard-earned observations appear in Levin (1993). Schank (1972)'s conceptual dependency theory and Minsky (1975)'s frames were proposed for the broader goals of capturing commonsense knowledge. Quillian (1968)'s semantic memory model and Miller et al (1990)'s WordNet were not intended as models of lexical acquisition or as databases for computational linguistics, but as models of human semantic memory. Many other Golden Oldies exist, and our debt to them is quite large. Ask what motivates our collection of subcategorization statistics or what drives the quest for semantic roles, and the roots are found in the science questions of the Golden Oldies.</Paragraph>
    <Paragraph position="5"> The present Myopic Learning Paradigm It would have been extremely myopic to take any one of these classical results and accuse their authors of not demonstrating a learning algorithm, not evaluating them on large corpora, and not getting together in workshops to share the results on test sets. The standard for what constituted a result back then consisted of none of these things, because today's machine learning paradigm was just not present then.</Paragraph>
    <Paragraph position="6"> The questions were: * Question (1): What is a lexicalizable concept? * Question (2): How can a word-concept mapping be learned from evidence? But for reasons that no one really talks about, somehow, the standard of what constitutes a result changed from some balance of Question (1) and (2) to a machine learning paradigm essentially focused on Question (2). The dependency between Question (1) and (2) is quite well-understood, but do we have an adequate answer to (1)? We tell ourselves: We've gotta build better parsers, speech recognizers, search engines, machine translation systems, so...</Paragraph>
    <Paragraph position="7"> let's take shortcuts on Question (1) so as to make progress on Question (2). For many, that shortcut consists of semantic role labels and learning from frame distributions. These shortcuts don't answer Question (1), unfortunately.</Paragraph>
  </Section>
  <Section position="3" start_page="91" end_page="92" type="metho">
    <SectionTitle>
2 Where We Need to Go
</SectionTitle>
    <Paragraph position="0"> While the Golden Oldies were used as the foundations of today's lexical acquisition, psychology began to sing a new tune, still balancing Questions (1) and (2).</Paragraph>
    <Paragraph position="1"> Children have naive theories Developmental psychology after the Golden Oldies has shown just how deep our &quot;deep lexical acquisition&quot; theories have to be. On this view, word meanings are couched in changing naive theories of how the world works. The model of the child is that the child possesses a naive theory T[?] changing state from T1 to T2, and that the space of concepts accessible from T1 is substantively different from the space of concepts accessible from T2. A learner undergoes radical conceptual change. Developmental psychology has not been explicit about the precise form of T[?], nor has it characterized how T[?] relates to lexicalizable concepts. But its contributions inform us about the fundamental ingredients of concepts (Question (1)) and about what deep lexical acquisition must consist of (Question (2)).</Paragraph>
    <Paragraph position="2"> A few examples must suffice in place of a review (c.f. Gopnik and Meltzoff (1997)). Keil (1989)'s transformation studies illustrate theory change in the domain of biology. First, children are shown a picture of a skunk; then they are told a story - that the animal received either (A) surgery or (B) a shot in infancy - and then they are shown a picture of a raccoon.</Paragraph>
    <Paragraph position="3"> Young preschool children judge that the animal is a raccoon, as if they base their judgements on superficial features. Children between 7 and 9 (T2), on the other hand, judge that the raccoon-looking figure in (A) is still a skunk. Adults (T3) judge that the raccoon-looking figure in both conditions is still a skunk. Apparently, preschoolers' theory T1 lacks the belief that an animal's kind is determined at birth, but this becomes part of the adult's T3.</Paragraph>
    <Paragraph position="4"> Similarly, preschool children at T1 have a concept of death involving a belief in a continued existence in an alternate location (like sleep); when asked whether dead people dream, eat, defecate, and move, 4 to 6 year olds will say that dead people do all of these, except move (Slaughter et al, 2001). Missing in T1 are the causes of death (a total breakdown of bodily functions) and that death is an irreversible, inevitable end. Between 4 and 6, children become superficially aware of the general function of various body parts (e.g. &quot;You need a heart to live&quot;). Other phenomena make the same point: the child at T1 thinks uncle means friendly middle-aged man, and at T2 thinks it means parent's brother. The child at T1 thinks island means a beachy territory and at T2 thinks it means body of land surrounded by water (Keil 1989). And &quot;theory of mind&quot; concepts/words such as belief, desire, wonder, pretend (Wellman and Bartsch 1995, Leslie 2000) are similarly situated. How &quot;theory-like&quot; T1 and T2 are is subject to considerable debate (diSessa 1993, Leslie 2000).</Paragraph>
    <Paragraph position="5"> diSessa (1993) describes a large number of causal &quot;p-prims&quot; that are highly context specific and considerably larger in number than what Carey (1985) describes; these are shown to apply to everyday physical phenomena - &quot;force as mover&quot;, &quot;vacuums impel&quot;, &quot;overcoming&quot;, &quot;springiness&quot;, &quot;bigger means lower pitch (or slower)&quot;, to name a few. Each of these has a FrameNet-like causal syntax, with some unknown mapping to vocabulary items. Similarly, Rozenblit and Keil (2003) show that non-expert adults have a remarkably superficial notion of how common mechanisms work - such as how a helicopter changes from hovering to forward flight.</Paragraph>
    <Paragraph position="6"> Theories may be suspiciously weak.</Paragraph>
    <Paragraph position="7"> Students have alternative frameworks Educational psychologists have characterized T[?] by asking a different, more practical question: why is it difficult for science students to learn certain scientific concepts (weight, density, force, heat, ...) when they come to class? The broad insight is this: students come to class not as blank slates but with alternative pre-conceptions that must be understood.</Paragraph>
    <Paragraph position="8"> Data on their pre-conceptions yields clues as to contents of T[?], well before they walk into science class. Again, a few examples illustrate the point.</Paragraph>
    <Paragraph position="9"> Many studies on physics misconceptions have observed deeply held views on the motion of projectiles (McCloskey 1983, Halloun and Hestenes 1985). Ask students to predict what happens when a projectile is thrown upward at an angle, and their answers will typically be consistent with one of (a-c). These answers are consistent with an &quot;impetus&quot; theory of motion, where an object's motion is exclusively dominated by whatever &quot;impetus&quot; the thrower provides it. Medieval scientists such as Buridan also held similar beliefs; Newtonian mechanics, of course, shows that the answer is a parabola. diSessa (1993) reports a wider array of these types of physics misconceptions in a theoretical framework.</Paragraph>
    <Paragraph position="10"> Likewise, ask students for their knowledge of how their eyes work, and they reveal an &quot;extramission&quot; belief: something somehow shoots out from the eye and reaches the objects (Winer et al 2002); they also say that the eye is the sole organ in the body responsible for vision. Plato and da Vinci shared these same beliefs. Systematic catalogues of these sorts of observations have been compiled for just about every domain - e.g. megaphones create sounds, heat is a substance, eggs are not alive, the moon and sun are the same size, and so forth (AAAS 1993).</Paragraph>
  </Section>
  <Section position="4" start_page="92" end_page="96" type="metho">
    <SectionTitle>
3 What Steps We Must Take
</SectionTitle>
    <Paragraph position="0"> Consider these fascinating phenomena from the Best of Today, and the comfort of the grammar-generates-sentence relation will be replaced by queasiness: the terms theory, concept, and change are most unclear, as many developmental psychologists freely admit.</Paragraph>
    <Paragraph position="1"> But computational linguists may contribute significantly to rendering new clarity: if the Golden Oldies drove the efforts on today's shallow lexical acquisition, the Best of Today's Psychology may drive the results of tomorrow's progress in deep lexical acquisition.
[Figure 1 caption: ...Theory Model of Concepts: necessary for deep lexical acquisition]
The new framework: Universal Theory We have much progress to make: we can describe naive theories precisely; we can describe how theory acquisition occurs; we can describe the map from naive theories to a set of lexicalizable concepts; and we can describe how vocabulary acquisition occurs.</Paragraph>
    <Paragraph position="2"> Figure 1(a) shows the Golden Oldies model of concepts that we must abandon: a Vocabulary Acquisition Device receives a fixed hypothesis space of possible concepts completely determined by a fixed set of primitives. Figure 1(b) shows the Universal Theory Model of Concepts that we must take steps towards: a Theory Acquisition Device (TAD) outputs a state T[?] that describes a learner's naive theory; a Concept Generator G maps T[?] to a set of lexicalizable concepts G(T[?]); a Vocabulary Acquisition Device (VAD) uses G(T[?]) to learn a lexicon. The theory of the TAD states is Universal Theory (UT); a UT metalanguage enables an abstract characterization of possible theories - each possible theory describes a system of kinds, attributes, relations, part-whole relations, and causal mechanisms. Within this Universal Theory Model of Concepts, we can begin to answer the following core questions:
1. what is the initial state of the TAD?
2. what are possible final states of the TAD?
3. how can the TAD change state?
4. how can the TAD use T[?] to parse experience?
5. how does the concept generator G map T[?] onto a set of lexicalizable concepts G(T[?])?
6. how can the VAD use G(T[?])?
We have made progress on these core questions Many of these questions have been addressed already in computational models where a candidate UT metalanguage and theory T[?] are latent. diSessa (1993) catalogs sets of p-prims in naive physics.</Paragraph>
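    <Paragraph> The TAD-G-VAD architecture of Figure 1(b) can be rendered as a minimal sketch in code. This is a hypothetical illustration, not the paper's implementation: the Theory class, concept_generator, vad_learn, and the law names are all assumptions introduced here to make the pipeline concrete.

```python
# Minimal, hypothetical sketch of the Universal Theory Model of Concepts
# (Figure 1b). All names are illustrative assumptions, not the paper's code:
# a TAD state is a Theory, G is concept_generator, and the VAD maps words
# onto concepts accessible in G(T).

from dataclasses import dataclass

@dataclass(frozen=True)
class Theory:
    """A TAD state: a system of kinds and causal laws (attributes,
    relations, and part-whole relations omitted for brevity)."""
    kinds: frozenset
    laws: frozenset

def concept_generator(theory):
    """G: map a theory state T onto a set of lexicalizable concepts G(T).
    Simplistically, every kind and every causal law is lexicalizable."""
    return theory.kinds | theory.laws

def vad_learn(word_evidence, theory):
    """VAD: map each word to its evidenced concept, but only if that
    concept is accessible in G(theory)."""
    concepts = concept_generator(theory)
    return {w: c for w, c in word_evidence.items() if c in concepts}

# Theory change from T1 (one kind, one law) to T2 (four kinds, three laws):
t1 = Theory(kinds=frozenset({"block"}), laws=frozenset({"law_q"}))
t2 = Theory(kinds=frozenset({"A", "B", "C", "D"}),
            laws=frozenset({"law_ab", "law_c", "law_d"}))

evidence = {"gorp": "law_ab", "pilk": "law_c", "seb": "law_d"}
print(vad_learn(evidence, t1))  # {} - no verb is learnable under T1
print(vad_learn(evidence, t2))  # all 3 verbs map onto distinct concepts
```

The point of the sketch is question (6) above: the same evidence yields an empty lexicon under T1 and a full one under T2, because G(T[?]) changes when T[?] does.</Paragraph>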
    <Paragraph position="3"> Atran (1995) describes a theory of family structure. Gopnik et al (2004) use Bayesian networks to model preschoolers' causal reasoning about blickets. McClelland and Rogers (2004) describe connectionist models of some of Carey (1985)'s classic results. In my own work, I have been situating the elements of the Universal Theory Model of Concepts in a microgenesis study, where adult subjects undergo a T1 to T2 transition (Niyogi 2005). The transition can be understood with a minimal UT metalanguage needed to characterize a set of possible theories: T[?] is characterized by interrelated sets of kinds, attributes, relations, and causal laws. T1 and T2 are described in that UT metalanguage, and the simplest concept generator G is described that mechanically maps T1 and T2 onto G(T1) and G(T2). Subjects undergo theory change in a Blocksworld universe (see Figure 2(a)) while learning 3 verbs (gorp, pilk, seb) that refer to the causal mechanisms governing the universe. Subjects interact with a set of 29 blocks, some of which activate other blocks on contact. On activation, subjects are shown a transitive verb frame (&quot;Z is gorping L&quot;, &quot;U is sebbing F&quot;, &quot;D is pilking Y&quot;) in a Word Cue Area. Unbeknownst to subjects, each block belongs to 1 of 4 kinds (A, B, C or D) and 3 activation mechanisms exist between them: lawab: As activate Bs, lawc': Cs activate Cs, and lawd: Ds activate Ds; each of the 3 verbs refers to one of the 3 mechanisms. Subjects are probed for the naming conditions on each of the 3 verbs.</Paragraph>
    <Paragraph position="4"> Subjects' responses indicate that their TAD state changes from T[?] = T1 (there is 1 kind of block governed by 1 causal mechanism lawq) to T[?] = T2 [Figure 2 caption: (a) ...dropping blocks onto each other. Cues to the meaning of the 3 verbs (gorp, pilk and seb) are given in a Word Cue Area. Shown is how two kinds of subjects - T2 Subjects and T1 Subjects - clustered the blocks; the clusters for the kinds A, B, C and D (boxed) are clear for T2 Subjects, but no such differentiation is apparent for T1 Subjects. (b) When T[?] = T1, all 3 verbs can only be mapped to a single concept in G(T1) = {Q} (dashed arrows); when T[?] = T2, gorp, pilk and seb can be mapped to 3 new concepts AB, Cprime and D in G(T2) (solid arrows).]</Paragraph>
    <Paragraph position="5"> (there are 4 kinds of blocks governed by 3 distinct causal mechanisms, lawab, lawc' and lawd). But this is not true for all subjects: some remain &quot;T1 subjects&quot; while others move on to become &quot;T2 subjects&quot;. Critically, when T[?] = T1, the verbs can only be mapped to a single concept in G(T1) = {Q}; when T[?] = T2, the verbs can be mapped to 3 distinct concepts in G(T2) = {AB, Cprime, D} (see Figure 2(b)). Once T[?] = T2, subjects can &quot;parse&quot; the activation and infer the hidden kind and causal mechanism involved. Critically, subjects cannot learn to distinguish the 3 verbs until T[?] = T2, when the 3 new concepts emerge in G(T[?]). Then gorp, pilk and seb may be mapped onto those 3 new concepts.</Paragraph>
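    <Paragraph> The parsing step just described - inferring the hidden kind and causal mechanism from an observed activation - can be sketched as toy code. This is a hypothetical reconstruction, not the experiment's software; the verb-mechanism mapping (gorp = AB, pilk = Cprime, seb = D) is from the text, while the hidden kind assignments are assumptions chosen to match the quoted cues.

```python
# Toy reconstruction (hypothetical) of how a T2 subject "parses" an
# activation in the Blocksworld study. Hidden kinds are assumptions chosen
# to match the cues "Z is gorping L", "D is pilking Y", "U is sebbing F".
# Note that the block named "D" is distinct from the kind D.

HIDDEN_KIND = {"Z": "A", "L": "B", "D": "C", "Y": "C", "U": "D", "F": "D"}

# lawab: As activate Bs; lawc': Cs activate Cs; lawd: Ds activate Ds
LAWS = {("A", "B"): "law_ab", ("C", "C"): "law_c", ("D", "D"): "law_d"}
VERB_FOR_LAW = {"law_ab": "gorp", "law_c": "pilk", "law_d": "seb"}

def parse_activation(actor, patient, kinds):
    """Under T2, infer the causal mechanism behind an activation event and
    return the verb that names it (None if no law licenses the event)."""
    law = LAWS.get((kinds[actor], kinds[patient]))
    if law is None:
        return None
    return VERB_FOR_LAW[law]

# Under T1 all activations collapse to one concept Q; under T2 they differ:
print(parse_activation("Z", "L", HIDDEN_KIND))  # gorp
print(parse_activation("D", "Y", HIDDEN_KIND))  # pilk
print(parse_activation("U", "F", HIDDEN_KIND))  # seb
```

A T1 subject, lacking the kind distinctions, cannot run this parse at all - which is exactly why the three verbs remain indistinguishable until theory change occurs.</Paragraph>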
    <Paragraph position="6"> These verbs are thus theory-laden in the same way as death, uncle and island.</Paragraph>
    <Paragraph position="7"> This UT architecture concretely dissolves the Puzzle of Concept Acquisition (Laurence and Margolis 2002): how can a person ever acquire a &quot;new&quot; concept, when a fixed set of primitives exhaustively spans the space of possible concepts? Taking the viewpoint of the learner's VAD at a specific moment in time with a specific T[?], it has access to just those concepts in G(T[?]) - acquisition of a new concept is possible if T[?] changes. Taking the viewpoint of the learner's species across all possible times, the species has access to the union of G(T[?]) over all possible TAD states - thus a &quot;new&quot; concept for the species is impossible. Which viewpoint one takes is a matter of perspective. Critically, the Golden Oldies model of concepts does not expose the TAD state revealed in the UT model of concepts (Fig. 1a,b).</Paragraph>
    <Paragraph position="8"> Universal Theory and the Linguistic Analogy Computational linguists can progress on these questions, because naive theories are like grammars. Just as a grammar generates a set of possible sentences, a theory T[?] generates a set of possible worlds. Just as the space of possible grammars is restricted, so is the space of possible theories. Just as learning a grammar consists of picking a point from a space of possible grammars, learning a theory consists of picking a point from the space of possible theories. The task of writing a naive theory is like writing a grammar. The task of characterizing the space of possible theories requires a theory meta-language just as characterizing the space of possible grammars requires a grammar metalanguage.</Paragraph>
    <Paragraph position="9"> Moreover, research into naive theories does not proceed separately from the program of research in grammar. The two programs are bridged by the concept generator G: T[?] generates G(T[?]), a set of lexicalizable concepts. An adequate account of G would generate concepts present in a particular language, for every language, and for every possible T[?].</Paragraph>
    <Paragraph position="10"> Miller et al (1990) distinguish between a constructive and a differential lexicon. In a differential theory of the lexicon, meanings can be represented by any symbols that enable a theorist to distinguish among them; in a constructive theory of the lexicon, the representation should &quot;contain sufficient information to support an accurate construction of the concept (by either a person or a machine)&quot;.</Paragraph>
    <Paragraph position="11"> The conceptual analyst who desires to produce a constructive theory of the lexicon has four kinds of accounts to provide (see Niyogi 2005):
* an explanatory account of the space of possible theories, for all persons P
* an explanatory account of the space of possible concepts, for all persons P, for all possible theories
* a descriptive account of a specific theory T[?] held by a representative person P (e.g. of a 3-year old or of a 10-year old)
* a descriptive account of a specific lexicon L held by a representative person P (e.g. a 3-year old Chinese speaker, 3-year old English speaker, 10-year old Chinese speaker, 10-year old English speaker)
We may envision a &quot;theory-based lexicon&quot; that would capture the two key state variables in Figure 1(b), the two descriptive accounts above: (1) T[?] for an idealized human; (2) a set of vocabulary items mapped to points in G(T[?]). Very limited instances of a theory-based lexicon can be constructed already for subjects at the end of the experiment - such a theory-based lexicon has (1) T2 in the UT metalanguage; (2) the mapping in L to G(T2): gorp = AB, pilk = Cprime, seb = D. This constructive theory-based lexicon would be in stark contrast to differential lexicons such as WordNet and FrameNet.</Paragraph>
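    <Paragraph> A very limited theory-based lexicon of the kind just described can be sketched as a pair of state variables. The encoding below is a hypothetical illustration; only the gorp/pilk/seb mapping is taken from the text.

```python
# Hypothetical sketch of a "theory-based lexicon" for a T2 subject at the
# end of the experiment: state variable (1) is T2 in a toy UT metalanguage;
# state variable (2) maps vocabulary items to points in G(T2). The encoding
# is an illustrative assumption, not the paper's notation.

theory_T2 = {
    "kinds": {"A", "B", "C", "D"},
    "laws": {"law_ab": ("A", "B"), "law_c": ("C", "C"), "law_d": ("D", "D")},
}

# G(T2): the lexicalizable concepts generated from T2's causal mechanisms
G_T2 = {"AB", "Cprime", "D"}

# The lexicon maps vocabulary items to points in G(T2) (from the text:
# gorp = AB, pilk = Cprime, seb = D)
lexicon = {"gorp": "AB", "pilk": "Cprime", "seb": "D"}

# A constructive lexicon may only use concepts the theory makes available:
assert set(lexicon.values()) == G_T2
```

Unlike a differential lexicon, every entry here points into a structure (theory_T2) from which the concept could in principle be reconstructed, which is the sense in which it is constructive.</Paragraph>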
    <Paragraph position="12"> Grounding language in perception is insufficient Many have proposed deep lexical acquisition by &amp;quot;grounding language in perception&amp;quot; (Siskind 1996, Regier 1996, Roy and Pentland 2002, Yu and Ballard 2004), constructing systems that can learn to utter, e.g. red, banana, hit and triangle in contexts where there are, e.g., three triangles hitting red bananas. Such systems also propose a space of possible concepts exhausted by a fixed set of primitives, as in the Golden Oldies model. The initial state of the TAD (T[?](t = 0)) can explicitly incorporate all these attributes and relations (contact, luminance, ...); but then, the TAD can further change state to yield new kinds, attributes, relations, and causal mechanisms not present in the initial state, but motivated by the data (see Gopnik and Meltzoff 1997).</Paragraph>
    <Paragraph position="13"> As such, vague appeal to grounding is insufficient; associative processes that may work on red, hit, banana, eye, three are extremely challenging to generalize to color, kind, wonder, pilk, seb, telescope, maybe and uninvented groobles that cannot be perceived. Again, developmental psychology provides some insight on what theoretical innovations would be required for a suitable interface to sensorimotor apparatus (c.f. Mandler 2004).</Paragraph>
    <Paragraph position="14"> Commonsense AI gives UT foundations Primitives well beyond the sensory apparatus have been developed to describe physical systems qualitatively (Regier 1975, Forbus 1984). They show us some of the possibilities of what T[?] and candidate UT metalanguages may look like (quantity spaces, kinds, attributes, relations, part-whole relations, and causal mechanisms that interrelate these sets). Regier (1975)'s description of a toilet appears particularly close to Rozenblit and Keil (2003)'s helicopter. Later qualitative AI frameworks of Forbus (1984) and Kuipers (1994) may be applied to McCloskey (1982)'s intuitive physics and diSessa's (1993) p-prims. Except for the work of Hobbs, Pustejovsky and their colleagues, few have mapped commonsense theories onto the lexicon.</Paragraph>
    <Paragraph position="15"> Similar domain-general elements of naive math and causality are present in the works of Hobbs et al (1987), Kennedy and McNally (2002)'s degree representations for gradable predicates, Talmy (1988)'s force dynamics, and the quantity spaces of Kuipers (1994) and Forbus (1984). These disparate frameworks provide foundational elements for a UT metalanguage. Shortcuts on UT foundations will not work We must resist the urge to take shortcuts on these foundations. Simply creating slots for foundational phenomena will impede progress. Pustejovsky (1995)'s observations for co-composition have clearly illustrated how much flexibility our interpretation systems must have, e.g. in He enjoyed the beer/movie. But specifying the telic role of beer and movie to be drink and watch does not constitute an adequate theory - we require constraints that relate to the state space of the human conceptual apparatus. Pustejovsky (1995)'s telic, formal, constitutive, agentive roles may be mapped onto T[?]'s characterization of artifacts, materials, and so on. We require nothing less than absolute conceptual transparency. We must bridge UT to analogy Lakoff and Johnson (1980) and subsequent cognitive linguistics work have catalogued a stunning level of metaphoric usage of language. Lexical extensions of items such as illuminate, as in Analogies illuminate us on theory acquisition, are couched in terms of conceptual metaphors such as &quot;ideas are light&quot;. Significant steps have been taken to model analogical mapping (c.f. Falkenhainer et al 1989, Bailey et al 1997) and conceptual blending (Fauconnier and Turner 1998). These processes may motivate TAD state changes. In most cases, the underlying predicates in the source and target domains are constructed ad hoc; a natural source of these predicates may be the sets internal to T[?] (kinds, attributes, relations, causal mechanisms); similarity between domains may be determined by the structural properties of the UT metalanguage and G. If T[?] incorporates the common causal mechanisms behind ideas and light transmission, for example, then one may strive for a shorter lexicon where the vocabulary item illuminate happens to be used in both domains with &quot;one&quot; core entry. An adequate theory of this process would obviously reduce the number of so-called &quot;senses&quot; in word sense disambiguation.</Paragraph>
  </Section>
  <Section position="5" start_page="96" end_page="97" type="metho">
    <SectionTitle>
4 What We Assumed Wrong
</SectionTitle>
    <Paragraph position="0"> Modern computational linguistics appears to have made a set of assumptions that deserve reanalysis, given the availability of other options.</Paragraph>
    <Paragraph position="1"> Assumption: A fixed alphabet of meaning components exists, and we know what it is A key assumption dating to the Golden Oldies is that the meaning of a sentence is adequately captured by a &amp;quot;logical form&amp;quot; (LF) characterized by a fixed alphabet of meaning components (e.g. thematic roles, lexical semantic primitives, conceptual dependency primitives). Today's computational linguistics program uses this assumption to demonstrate systems that answer &amp;quot;who did what to whom, where, why, ...&amp;quot; questions, given sentences like: John saw the man with the telescope.</Paragraph>
    <Paragraph position="2"> John hit the man with the umbrella.</Paragraph>
    <Paragraph position="3"> Is the computational linguist expected to be satisfied when systems can answer Who saw the man with the telescope? or Who did John hit with the umbrella? This year's CoNLL Shared Task, mapping sentences onto semantic roles, assumes the above.</Paragraph>
    <Paragraph position="4"> But try these: Does John have eyes? Were they ever open when he was looking through the telescope? Could John know whether the man was wearing underwear? Did the umbrella move? Did John move? Did the man feel anything when he was hit? Was John alive? Was the man alive? Why would John need a telescope to see the man, when he has eyes? Why would John use an umbrella when his hands would do? Something is missing in these systems.</Paragraph>
    <Paragraph position="5"> We should be more accountable. Developmental psychology showed that theory change and conceptual change are possible, showing that this assumption is wrong: the alphabet behind sentence meaning is a varying set of lexicalizable concepts G(T[?]).</Paragraph>
    <Paragraph position="6"> What is missing in today's systems - systems that attach AGENT (or FrameNet's Perceiver passive, or Impactor) to John and INSTRUMENT to umbrella and telescope - is T[?], and a mapping of the lexical items to G(T[?]). What T[?] must contain, in some as yet unknown form, is a T of physics described by McCloskey and diSessa (1993), a T of vision studied by Landau and Gleitman (1985) and Winer et al (2002), a T of body studied by Carey (1985), a T of materials and artifacts studied by Hobbs et al (1987) and Pustejovsky (1995). This T[?], when mapped via G, forms the alphabet of the above 2 sentences.</Paragraph>
    <Paragraph position="7"> Assumption: The machine learning paradigm can treat deep lexical acquisition.</Paragraph>
    <Paragraph position="8"> If we reject the assumption that there is some &quot;meaning&quot; of a sentence spanned by a set of meaning primitives, the soothing clarity of the machine learning paradigm is no longer available. We cannot map parse trees onto sentence meanings. The possibility of &quot;Putting Meaning in Your Trees&quot; (Palmer 2004) completely disappears. We may still use the machine learning paradigm to parse, disambiguate and recognize speech. But these results are of little use for modeling theory, concept, and lexical acquisition, because there is no output representation where a suitable training set could be collected. The human conceptual apparatus is not that simple: the VAD requires G(T[?]) (which changes, as T[?] changes), and for that we need explanatory accounts of UT and G, and must recognize the diverse ways the TAD may change state.</Paragraph>
    <Paragraph position="9"> Assumption: Paths from shallow to deep lexical acquisition exist The Golden Oldies model of concepts (Figure 1a) and the Universal Theory model of concepts (Figure 1b) are incommensurable. The path from the shallow to the deep cannot be declared to exist by fiat. Wishful thinking is inappropriate, because one architecture is more powerful than the other: the Golden Oldies model did not expose the TAD state space. Instead, lexical semantics results obtained under the Golden Oldies model require translation into the UT model: the privileged syntactic positions that motivated thematic roles and lexical semantics primitives, the bi-partite event structure revealed through adverbial modification, and so on.</Paragraph>
    <Paragraph position="10"> This translation is mediated in G, and will not yield a notational variant of what we started with.</Paragraph>
    <Paragraph position="11"> Assumption: Verb classes determine meanings We must distinguish between a representation of verb meanings determined by the distribution of subcategorization frames and cued by these frames.</Paragraph>
    <Paragraph position="12"> Landau and Gleitman (1990) showed that a verb's participation in some frames but not others is a cue that a child uses to constrain verb meaning. Levin and Rappaport-Hovav (1998) explicitly distinguish structural and idiosyncratic components of meaning. But neither claims that verb classes or statistical distributions of subcategorization frames determine verb meaning. Yet VerbNet maps verbs to predicates in precisely this way (Kingsbury et al 2002):</Paragraph>
    <Paragraph position="13">
cure, rob, ...: Verbs of Inalienable Possession
    cause(Agent, E), location(start(E), Theme, Source)
marry, divorce, ...: Verbs of Social Interaction
    social interaction(...)
The distinction between cure and rob, or between marry and divorce, is not astonishing to the English speaker. Causal mechanisms behind disease, possession, and the marital practices that were labeled idiosyncratic by the lexical semanticist must be captured in T[?].</Paragraph>
    <Paragraph position="14"> Assumption: Language is separate from general systems of knowledge and belief This &amp;quot;defining&amp;quot; assumption helped for the Golden Oldies, but innovations in developmental psychology motivate dropping this assumption. The bridge is provided by the concept generator G: it maps a naive theory T[?] (general systems of knowledge and belief) to G(T[?]), used by the VAD (language).</Paragraph>
    <Paragraph position="15"> Assumption: Real-world knowledge is Bad The absence of the soothing clarity of the machine learning paradigm and the presence of real-world knowledge in T[?] brings forth 2 associations:
Early Schank/Cyc = Much Knowledge = UT research = Bad
Statistics = Little Knowledge = shallow semantics = Good
The associations lead to the inference that Universal Theory research will suffer the same fate as the 70s Schankian program and the Cyc program (Schank 1972, Lenat and Guha 1990). However, this inference is incorrect. The 70s Schankian program and Cyc efforts did not carefully consider the constraints of syntactic phenomena or developmental psychology. Schank and his colleagues stimulated research in qualitative physics and explanation-based learning that addressed many of these deficiencies, but there is much work to be done to bridge today's efforts in deep lexical acquisition to this work.</Paragraph>
    <Paragraph position="16"> Assumption: Others will provide us the answers Lexical semanticists now rely on cognitive explanations far more heavily than ever before. Jackendoff (2002) concludes: &quot;someone has to study all these subtle frameworks of meaning - so why not linguists?&quot; Levin and Rappaport-Hovav (2003), addressing denominal verbs such as mop and butter, now freely point to &quot;general cognitive principles&quot; rather than situate knowledge in the lexicon. Rather than merely consuming the lexical semantics of the Golden Oldies, we can draw upon our toolbox to answer Question (1) again: &quot;what is a lexicalizable concept?&quot;</Paragraph>
  </Section>
</Paper>