<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0507"> <Title>Item-based Constructions and the Logical Problem</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. The Logical Problem </SectionTitle> <Paragraph position="0"> Chomsky (1957, 1980) has argued that the child's acquisition of grammar is 'hopelessly underdetermined by the fragmentary evidence available.' He attributed this indeterminacy to two major sources.</Paragraph> <Paragraph position="1"> The first is the degenerate nature of the input. According to Chomsky, the sentences heard by the child are so full of retracing, error, and incompletion that they provide no clear indication of the possible sentences of the language. Coupled with this problem of input degeneracy is the problem of the unavailability of negative evidence. On this view, children have a hard time knowing which forms of their language are acceptable and which are unacceptable, because parents fail to provide consistent evidence regarding the ungrammaticality of unacceptable sentences. Worse still, when such evidence is provided, children appear to ignore it.</Paragraph> <Paragraph position="2"> Chomsky's (1957) views about the degeneracy of the input did not stand up well to the test of time. As Newport, Gleitman & Gleitman (1977) reported, 'the speech of mothers to children is unswervingly well-formed.' More recently, Sagae, Lavie & MacWhinney (2004) examined several of the corpora in the CHILDES database and found that adult input to children can be parsed with an accuracy level parallel to that for corpora such as the Wall Street Journal database.</Paragraph> <Paragraph position="3"> This evidence for the well-formedness of the input did not lead to the collapse of the 'argument from poverty of stimulus' (APS). However, it did place increased weight on the remaining claims regarding the absence of relevant evidence. 
The overall claim is that, given the absence of appropriate positive and negative evidence, no child can acquire language without guidance from a rich set of species-specific innate hypotheses. Some refer to the argument from poverty of stimulus as the 'logical problem of language acquisition' (Baker, 1979), while others have called it 'Plato's Problem.'</Paragraph> </Section> <Section position="3" start_page="0" end_page="53" type="metho"> <SectionTitle> 2. Absence of Negative Evidence </SectionTitle> <Paragraph position="0"> In the 1970s, generativist analyses of learnability (Wexler & Hamburger, 1973) relied primarily on an analysis presented by Gold (1967). Gold's analysis contrasted two different language-learning situations: text presentation and informant presentation. With informant presentation, the language learner can receive feedback from an infallible informant regarding the grammaticality of every candidate sentence. This corrective feedback is called 'negative evidence' and it only requires that ungrammatical strings be clearly identified as unacceptable. Whenever the learner formulates an overly general guess about some particular linguistic structure, the informant will label the resulting structure as ungrammatical and the learner will use this information to restrict the developing grammar. Based on initial empirical results reported by Brown & Hanlon (1970), Gold argued that negative evidence is not available to the child and that language learning cannot be based on informant presentation.</Paragraph> <Paragraph position="1"> Marcus (1993) has argued that the feedback that parents provide does not discriminate consistently between grammatical and ungrammatical constructions. As a result, children cannot rely on simple, overt negative evidence for recovery from overgeneralization. 
Although I will argue that parents provide positive evidence in a form that solves the logical problem (Bohannon et al., 1990), I agree with the observation that this evidence does not constitute overt grammatical correction of the type envisioned by Gold.</Paragraph> </Section> <Section position="4" start_page="53" end_page="57" type="metho"> <SectionTitle> 3. Absence of Positive Evidence </SectionTitle> <Paragraph position="0"> Beginning about 1980, generative analyses of learnability began to shift away from an emphasis on the unavailability of negative evidence to arguments based on the unavailability of positive evidence. This conceptual shift led to a relative decline in attention to recovery from overgeneralization and an increase in attention to reported cases of error-free learning. For example, Chomsky's (1980) statement of the logical problem relies on the notion of error-free learning without positive evidence. The argument here is that, if a structure is never encountered in the input, correct use of this structure would have to indicate innate knowledge.</Paragraph> <Paragraph position="1"> Researchers have claimed that the child exhibits error-free learning without receiving positive evidence for structures such as: structural dependency, c-command, the binding conditions, subjacency, negative polarity items, that-trace deletion, nominal compound formation, control, auxiliary phrase ordering, and the empty category principle.</Paragraph> <Paragraph position="2"> In each of these cases, it is necessary to assume that the underlying universal is a part of the nonparameterized core of universal grammar. If the dimension involved were parameterized, there would be a need for some form of very early parameter setting (Wexler, 1998), which could itself introduce some error. Thus, we would expect error-free learning to occur primarily for those aspects of the grammar that are completely universal and not parameterized. 
Parameterized features, such as subject pro-drop, could still be guided by universal grammar. However, their learning would not necessarily be error-free.</Paragraph> <Section position="1" start_page="53" end_page="55" type="sub_section"> <SectionTitle> 3.1. Structural dependency </SectionTitle> <Paragraph position="0"> The paradigm case of error-free learning is the child's obedience to the Structural Dependency condition, as outlined by Chomsky in his formal discussion with Jean Piaget (Piattelli-Palmarini, 1980). Chomsky notes that children learn early on to move the auxiliary to initial position in questions, such as 'Is the man coming?' One formulation of this rule is that it stipulates the movement of the first auxiliary to initial position. This formulation would be based on surface order, rather than structural relations. However, if children want to question the proposition given in (1), they will never produce a movement such as (2). Instead, they will always produce (3).</Paragraph> <Paragraph position="1"> 1. The man who is running is coming.</Paragraph> <Paragraph position="2"> 2. Is the man who _ running is coming? 3. Is the man who is running _ coming? In order to produce (3), children must be basing the movement on structure, rather than surface order. Thus, according to Chomsky, they must be innately guided to formulate rules in terms of structure. In the theory of barriers (Chomsky, 1986), the repositioning of the auxiliary in the tree and then in surface structure involves a movement of INFL to COMP that is subject to the head movement constraint. In (2) the auxiliary would need to move around the N' of 'man' and the CP and COMP of the relative clause, but this movement would be blocked by the head movement constraint (HMC).</Paragraph> <Paragraph position="3"> No such barriers exist in the main clause. In addition, if the auxiliary moves as in (2), it leaves a gap that will violate the empty category principle (ECP). 
Chomsky's discussion with Piaget does not rely on these details. Chomsky simply argues that the child has to realize that phrasal structure is somehow involved in this process and that one cannot formulate the rule of auxiliary movement as 'move the first auxiliary to the front.' Chomsky claims that, 'A person might go through much or all of his life without ever having been exposed to relevant evidence, but he will nevertheless unerringly employ the structure-dependent generalization, on the first relevant occasion.' A more general statement of this type is provided by Hornstein & Lightfoot (1981), who claim that, 'People attain knowledge of the structure of their language for which no evidence is available in the data to which they are exposed as children.' In order to evaluate these claims empirically, we need to know when children first produce such sentences and whether they have been exposed to relevant examples in the input prior to this time. In searching for instances of relevant input as well as first uses, we should include two types of sentences. First, we want to include sentences such as (3) in which the moved verb was a copula in the relative clause, as well as sentences with auxiliaries in both positions, such as 'Will the boy who is wearing a Yankee's cap step forward?' The auxiliaries do not have to be lexically identical, since Chomsky's argument from poverty of stimulus would also apply to a child who was learning the movement rule on the basis of lexical class, as opposed to surface lexical form.</Paragraph> <Paragraph position="4"> Examining the TreeBank structures for the Wall Street Journal in the Penn TreeBank, Pullum & Scholz (2002) estimate that adult corpora contain up to 1% of such sentences. However, the presence of such structures in formal written English says little about their presence in the input to the language-learning child. 
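One way to test such claims directly is to scan child-directed speech for the critical pattern. The sketch below is a minimal illustration: the mini-corpus and the deliberately crude regular expression are both assumptions for exposition, not the query actually used in the corpus studies discussed here.

```python
import re

# Hedged sketch: a crude search for polar questions whose subject NP
# contains a relative clause with its own auxiliary, e.g.
# "Is the man who is running coming?"  A real search would need to
# rule out matches where the relative clause follows the main verb.
PATTERN = re.compile(
    r"^(is|are|was|were|will|can|does|did)\b"  # fronted auxiliary
    r".*\bwho\b"                               # relative pronoun in the subject NP
    r".*\b(is|are|was|were|will|can)\b",       # auxiliary inside the relative
    re.IGNORECASE,
)

def count_critical_questions(lines):
    """Count utterances matching the aux-fronting-over-relative pattern."""
    return sum(1 for line in lines if PATTERN.search(line.strip()))

corpus = [
    "Is the man who is running coming?",
    "Where is the dog that you like?",
    "The man who is running is coming.",
    "Will the boy who is wearing a cap step forward?",
]
print(count_critical_questions(corpus))  # the two fronted-auxiliary examples
```

The point of such a count is simply to estimate how often the structure appears per million utterances of input.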
A search by Lewis & Elman (2001) of the input to English-speaking children in the CHILDES database (MacWhinney, 2000) turned up only one case of this structure out of approximately 3 million utterances. Since CHILDES includes good sampling of target children up to age 5;0, we can safely say that positive evidence for this particular structure is seldom encountered in the language addressed to children younger than 5;0.</Paragraph> <Paragraph position="5"> Because children do not produce sentences of this type themselves, it is difficult to use production data to demonstrate the presence of the constraint. Crain & Nakayama (1987) attempted to get around this problem by eliciting these forms from children directly. They asked children (3;2 to 5;1) to 'Ask Jabba if the boy who is watching Mickey is happy.' Children responded with a variety of structures, none of which involved the movement of the auxiliary from the relative clause.</Paragraph> <Paragraph position="6"> Unfortunately, this elicitation procedure encourages children to treat the relative clause ('the boy who is watching Mickey') as an imitated chunk.</Paragraph> <Paragraph position="7"> Despite the serious methodological limitation in this particular study, it seems reasonable to believe that four-year-old children are beginning to behave in accordance with the Structural Dependency condition for sentences like (2) and (3). But does this mean that they reach this point without learning? There is another type of sentence that provides equally useful positive evidence regarding auxiliary movement. These are wh-questions with embedded relative clauses. It turns out that there are hundreds of input sentences of this type in the CHILDES corpus. Most of these have the form of (4), but some take the form of (5).</Paragraph> <Paragraph position="8"> 4. Where is the dog that you like? 5. Which is the dog that is clawing at the door? 
In (5) the child receives clear information demonstrating that moved auxiliaries derive from the main clause and not the relative clause. Using evidence of the type provided in (4), the child simply learns that moved auxiliaries and the wh-words that accompany them are arguments of the verb of the main clause. Sentences like (4) and (5) are highly frequent in the input to children and both types instruct the child in the same correct generalization. Based on evidence from the main clause, the child could formulate the rule as the placement, after the wh-word, of the auxiliary that is conceptually related to the verb being questioned. In other words, it is an attachment to the wh-word of an argument of the main verb. This is a complex application of the process of item-based construction generation proposed in MacWhinney (1975, 1982).</Paragraph> <Paragraph position="9"> This formulation does not rely on barriers, ECP, HMC, INFL, COMP, or movement. It does rely on the notion of argument structure, but only as it emerges from the application of item-based constructions. Given this formulation, a few simple yes-no questions would be enough to demonstrate the pattern. When children hear 'is the baby happy' they can learn that the initial copula auxiliary 'is' takes a subject argument in the next slot and a predicate argument in the following slot.</Paragraph> <Paragraph position="10"> They will learn similar frames for each of the other fronted auxiliaries. When they then encounter sentences such as (4) and (5), they will further elaborate the item-based auxiliary frames to allow for positioning of the initial wh-words and for attachment of the auxiliaries to these wh-words.</Paragraph> <Paragraph position="11"> One might argue that this learning scenario amounts to a restatement of Chomsky's claim, since it requires the child to pay attention to relational patterns, rather than serial order as calculated from the beginning of the sentence. 
However, if the substance of Chomsky's claim is that children learn to fill argument slots with compound constituents, then his analysis seems indistinguishable from that of MacWhinney (1975, 1987a).</Paragraph> </Section> <Section position="2" start_page="55" end_page="55" type="sub_section"> <SectionTitle> 3.2 Auxiliary phrases </SectionTitle> <Paragraph position="0"> Kimball (1973) presented perhaps the first example of a learnability problem based on poverty of positive evidence. He noted that children are exposed to scores of sentences with zero, one, or two auxiliaries as in (6)-(13). However, his searches of a million sentences in early machine-readable corpora located not a single instance of a structure such as (13).</Paragraph> <Paragraph position="1"> 6. It rains.</Paragraph> <Paragraph position="2"> 7. It may rain.</Paragraph> <Paragraph position="3"> 8. It may have rained.</Paragraph> <Paragraph position="4"> 9. It may be raining.</Paragraph> <Paragraph position="5"> 10. It has rained.</Paragraph> <Paragraph position="6"> 11. It has been raining.</Paragraph> <Paragraph position="7"> 12. It is raining.</Paragraph> <Paragraph position="8"> 13. It may have been raining.</Paragraph> <Paragraph position="9"> Kimball argued that, despite the absence of positive data for (13), children are still able to infer its grammaticality from the data in (6) to (12). He took this as evidence that children have innate knowledge of structural compositionality. The empirical problem with Kimball's analysis is that sentences like (13) are not nearly as rare as his corpus analysis suggests. My search of the CHILDES database for the string 'might have been' located 27 instances in roughly 3 million sentences. In addition there were 24 cases of 'could have been', 15 cases of 'should have been', and 70 cases of 'would have been.' Thus, there seems to be little shortage of positive evidence for the direct learning of this pattern. 
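A count of this kind can be sketched as a simple string search. The mini-corpus below is invented for illustration; a real count would run over the CHILDES transcripts themselves.

```python
import re
from collections import Counter

# Hedged sketch: tally modal + "have been" sequences over a list of
# utterances, one utterance per string, as in the corpus count above.
MODALS = ("might", "could", "should", "would", "may")

def count_modal_perfect_progressives(utterances):
    """Return a Counter of modal + 'have been' sequences found."""
    counts = Counter()
    for u in utterances:
        for m in MODALS:
            if re.search(rf"\b{m} have been\b", u, re.IGNORECASE):
                counts[m] += 1
    return counts

corpus = [
    "It might have been raining.",
    "You should have been sleeping.",
    "That could have been the mailman.",
    "It might have been the wind.",
]
print(count_modal_perfect_progressives(corpus))
```

Per-modal counts matter here, since the argument turns on whether the rarity Kimball observed was specific to 'may'.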
Perhaps Kimball's findings to the contrary arose from focusing exclusively on 'may', since a search for 'may have been' turned up only 5 cases.</Paragraph> </Section> <Section position="3" start_page="55" end_page="56" type="sub_section"> <SectionTitle> 3.3 The complex-NP constraint </SectionTitle> <Paragraph position="0"> The complex-NP constraint blocks movement of a noun from a relative clause, as in (14) and (15).</Paragraph> <Paragraph position="1"> 14. *Who did John believe the man that kissed _ arrived? 15. Who did John believe _ kissed his buddy? This same constraint also blocks movement from prepositional phrases and other complex NPs, as in (16) - (18): 16. *Who did pictures of __ surprise you? 17. *What did you see a happy __ ? 18. *What did you stand between the wall and __ ? The constraint in (18) has also been treated as the coordinated-NP constraint in some accounts. Although it appears that most children obey these constraints, there are some exceptions. Wilson & Peters (1988) list these violations of the complex NP constraint from Wilson's son Seth between the ages of 3;0 and 5;0.</Paragraph> <Paragraph position="2"> 19. What am I cooking on a hot _ ? (stove) 20. What are we gonna look for some _ ? (houses) 21. What is this a funny _ , Dad? 22. What are we gonna push number _ ? (9) 23. Where did you pin this on my _ ? (robe) 24. What are you shaking all the _ ? (batter and milk) 25. What is this medicine for my _ ? (cold) These seven violations all involve separation of a noun from its modifiers. Two other examples illustrate violation of the complex-NP constraint in other environments: 26. What did I get lost at the _ , Dad? 27. What are we gonna go at Auntie and _ ? Here, the prohibited raising involves prepositional phrases and a conjoined noun phrase. 
Violations of the latter type are particularly rare, but still do occur occasionally.</Paragraph> <Paragraph position="3"> One might object that a theory of universal grammar should not be rejected on the basis of a few violations from a single child. However, other observers have reported similar errors. In the recordings from my sons Ross and Mark, I observed a few such violations. One occurred when my son Mark (at 5;4.4) said, 'Dad, next time when it's Indian Guides and my birthday, what do you think a picture of __ should be on my cake?' Catherine Snow reports that at age 10;10, her son Nathaniel said, 'I have a fever, but I don't want to be taken a temperature of.' Most researchers would agree that violations of the complex-NP constraint are rare, but certainly not nonexistent. At the same time, the structures or meanings that might trigger these violations are also very rare, as is the input that would tell the child how to handle these structures. Given this, it seems to me that these patterns cannot reasonably be described as cases of error-free learning. Instead, we should treat them as instances of 'low-error constructions.' In this regard, they resemble errors such as stative progressives ('I am knowing') and double-object violations ('He recommended the library the book'). As soon as we shift from error-free learning to low-error learning, we need to apply a very different form of analysis, since we now have to explain how children recover from making these overgeneralization errors, once they have produced them. This then induces us to again focus on the availability of negative evidence. Of course, we could assume that the violation of the complex-NP constraint was a transient performance error and that, once the relevant performance factors are eliminated, the constraints of UG operate to block further wh-raising from complex noun phrases. 
But the important point here is that we now need to consider specific mechanisms for allowing for recovery from overgeneralization, even for what have been offered as the clearest cases of the application of universal constraints.</Paragraph> </Section> <Section position="4" start_page="56" end_page="57" type="sub_section"> <SectionTitle> 3.4 Binding conditions </SectionTitle> <Paragraph position="0"> Binding theory (Chomsky, 1981) offers three proposed universal conditions on the binding of pronouns and reflexives to referents. Sentence (28) illustrates two of the constraints. In (28), 'he' cannot be coreferential with 'Bill' because 'Bill' does not c-command the pronoun. At the same time, 'himself' must be coreferential with 'Bill' because 'Bill' is a clausemate and does c-command 'himself.' 28. He said that Bill hurt himself.</Paragraph> <Paragraph position="1"> When attempting to relate the logical problem to the study of the binding constraints, it is important to remember that the sentences produced or interpreted are fully grammatical. However, the interpretation in which the pronoun is coreferential with the full NP is disallowed by the binding principles.</Paragraph> <Paragraph position="2"> This means that, to study the imposition of the constraints, researchers must rely on comprehension studies, often with very young children.</Paragraph> <Paragraph position="3"> It is well known that children often fail to apply these principles, even in carefully controlled experiments (O'Grady, 1997). Various accounts have been offered to reconcile these facts with the supposed universality of the constraint. However, one possibility that has seldom been explored is the idea that the binding conditions are learned on the basis of positive data. To illustrate the role that learning can play in this area, consider a study of long-distance movement of adjuncts by De Villiers, Roeper & Vainikka (1990). Children were divided into two age groups: 3;7 to 5;0 and 5;1 to 6;1. 
They were given sentences such as: 29. When did the boy say he hurt himself? 30. When did the boy say how he hurt himself? 31. Who did the boy ask what to throw? For (29), 44% of the children gave long-distance interpretations, associating 'when' with 'hurt himself', rather than 'say.' For (30), with a medial wh-phrase blocking a long-distance interpretation, only 6% gave long-distance responses. This shows that children were sensitive to the conditions on traces, in accord with P&P (Chomsky & Lasnik, 1993) theory. However, the fact that sensitivity to this contrast increases markedly across the two age groups indicates that children are learning this pattern. In the youngest group, children had trouble even understanding sentences with medial arguments like (31). The fact that this ability improves over time again points to learning of the possible interpretations of these structures.</Paragraph> <Paragraph position="4"> Children can learn to interpret these sentences correctly by applying conservative learning principles that rely on positive data. First, they learn short-distance interpretations that attach the wh-word to the main clause. Then, when they hear sentences with medial &quot;how&quot; they add the additional possibility of the long-distance interpretation. However, they do this in a conservative item-based manner, limiting the new interpretation to sentences like (30) with medial &quot;how.&quot; P&P theory can also provide an account of this development in terms of the setting of parameters.</Paragraph> <Paragraph position="5"> First, children must realize that their language allows movement, unlike Chinese. Next they must decide whether the movement can be local, as in German, or both local and distant as in English.</Paragraph> <Paragraph position="6"> Finally, they must decide whether the movement is indexed by pronouns, traces, or both. 
However, once a parameter-setting account is detailed in a way that requires careful attention to complex cue patterns over time (Buttery, 2004; Sakas & Fodor, 2001), it can be difficult to distinguish it from a learning account. Using positive evidence, children can first learn that some movement can occur.</Paragraph> <Paragraph position="7"> Next, they can learn to move locally, and finally they can acquire the cues to linking the moved argument to its original argument position, one by one.</Paragraph> </Section> <Section position="5" start_page="57" end_page="57" type="sub_section"> <SectionTitle> 3.5 Learnability or learning? </SectionTitle> <Paragraph position="0"> What have we learned from our examination of these four examples? First, we have seen that the application of universal constraints is not error-free. This is particularly true in the case of the binding conditions. Because the binding conditions involve parameter setting, it is perhaps not surprising that we see errors in this domain. However, we also find errors in the application of the nonparameterized constraint against raising from complex noun phrases. Only in the case of the structural dependency condition do we find no errors.</Paragraph> <Paragraph position="1"> However, for that structure there is also no usage at all by either parents or children, unless we consider attachment of auxiliaries to wh-words, which is quite frequent. It is possible that error-free learning exists in various other corners of syntactic, semantic, or lexical learning. But there is no evidence that error-free learning occurs in association with an absence of positive evidence. This is the crucial association that has been claimed in the literature and it is the one that we have shown to be false.</Paragraph> <Paragraph position="2"> Second, for each of the four learnability problems we examined, we have seen that there are effective learning methods based on available positive evidence. 
This learning involves mechanisms of conservative, item-based learning followed by later generalization.</Paragraph> </Section> </Section> <Section position="5" start_page="57" end_page="64" type="metho"> <SectionTitle> 4. Multiple Solutions </SectionTitle> <Paragraph position="0"> Having now briefly surveyed the role of the logical problem in generative theory, we turn next to a consideration of seven factors that, operating together, allow the child to solve the logical problem.</Paragraph> <Paragraph position="1"> Of these seven factors, the first two are simply formal considerations that help us understand the scope of the problem. The last five are processes that can actually guide the child during acquisition.</Paragraph> <Section position="1" start_page="57" end_page="57" type="sub_section"> <SectionTitle> 4.1 Limiting the class of grammars </SectionTitle> <Paragraph position="0"> The first solution to the logical problem addresses the Gold analysis directly by showing how language can be generated from finite-state grammars (Reich, 1969). For example, Hausser (1999) has developed an efficient parser for left-associative grammars. He has shown that left-associative grammar can be expressed as a finite automaton that orders words in terms of part-of-speech categories. Because we know that finite automata can be identified from positive evidence (Hopcroft & Ullman, 1979), this means that children should be able to learn left-associative grammars directly without triggering a logical problem. Given the fact that these grammars can parse sentences in a time-linear and psycholinguistically plausible fashion, they would seem to be excellent candidates for further exploration by child language researchers.</Paragraph> <Paragraph position="1"> A formal solution to the logical problem also arises in the context of the theory of categorial grammar. 
Kanazawa (1998) shows that a particular class of categorial grammars known as the k-valued grammars can be learned on positive data.</Paragraph> <Paragraph position="2"> Moreover, he shows that most of the customary versions of categorial grammar discussed in the linguistic literature can be included in this k-valued class. Shinohara (1994) and Jain, Osherson, Royer & Sharma (1999) examine still further classes of complex non-finite languages that can be learned on the basis of positive data alone. These attempts to recharacterize the nature of human language by revised formal analysis all stand as useful approaches to the logical problem. By characterizing the target language in a way that makes it learnable by children, linguists help bridge the gap between linguistic theory and child language studies.</Paragraph> </Section> <Section position="2" start_page="57" end_page="58" type="sub_section"> <SectionTitle> 4.2 Revised end-state criterion </SectionTitle> <Paragraph position="0"> The second solution to the logical problem involves resetting our notion of what it means to acquire an end-state grammar. Horning (1969) showed that, if language identification is allowed to involve a stochastic probability of identification, rather than an absolute guarantee of no further error ever, then language can be identified on positive evidence alone. It is surprising that this solution has not received more attention, since this analysis undercuts the core logic of the logical problem, as it applies to the learning of all rule systems up to the level of context-sensitive grammars. If learning were deterministic, children would go through a series of attempts to hypothesize the 'correct' grammar for the language. Once they hit on the correct identification, they would then never abandon this end-state grammar. 
The fact that adults make speech errors and differ in their judgments regarding at least some syntactic structures suggests that this criterion is too strong and that the view of grammar as stochastic is more realistic.</Paragraph> </Section> <Section position="3" start_page="58" end_page="59" type="sub_section"> <SectionTitle> 4.3 Conservative Item-based Learning </SectionTitle> <Paragraph position="0"> The third solution to the logical problem emphasizes the conservative nature of children's language learning. The most direct way for a language learner to solve Gold's problem is to avoid formulating overly general grammars in the first place. If the child never overgeneralizes, there is no problem of recovery from overgeneralization and no need for negative evidence or corrective feedback. Taking this basic idea one step further, let us imagine that grammars are ordered strictly in terms of their relative generative power. If this is true, then the forms generated by a grammar are a subset of the next slightly larger grammar. This is known as the Subset Principle. If the child always chooses the least powerful grammar that is consistent with the input data, then the problem of the unavailability of negative evidence disappears and learning can be based simply on positive evidence.</Paragraph> <Paragraph position="1"> The Subset Principle has often been used to argue for abstract relations between grammars. For example, Fodor & Crain (1987) argue that the child learns the periphrastic dative ('give the book to John') for each new verb and only assumes that the double object construction ('give John the book') can be applied if it is attested in the input.</Paragraph> <Paragraph position="2"> In this particular case, the grammar with only the periphrastic is ordered as a subset of the grammar with both constructions. 
This follows from the principles for expansion of curly braces in GPSG.</Paragraph> <Paragraph position="3"> Conservatism can control acquisition of these structures without invoking the Subset Principle.</Paragraph> <Paragraph position="4"> The theory of item-based acquisition (MacWhinney, 1975, 1982, 1987a; Tomasello, 2000) holds that syntactic learning is driven by the induction and combination of item-based constructions. Each item-based construction specifies a set of slots for arguments. Initially, these slots encode features that are specific to the first words encountered in this slot during comprehension. For example, the item 'more' has a slot for a following argument. If the first combinations the child picks up from comprehension are 'more cookies' and 'more milk', then this slot will initially be limited to foods. However, as the child hears 'more' used in additional combinations, the semantics of the slot filler will extend to any mass noun or plural.</Paragraph> <Paragraph position="5"> This learning is based entirely on generalization from positive evidence.</Paragraph> <Paragraph position="6"> When learning the item-based construction for 'give', children encounter sentences such as 'Bill gives John the book.' From this, they learn the double-object construction: giver + 'give' + recipient + gift. They also learn the competing item-based construction of giver + 'give' + gift + 'to' + recipient. There is no need to invoke the Subset Principle to explain this learning, since item-based constructions are inherently conservative and provide their own constraints on the form of grammars. Having acquired these two basic constructions, children can then join them into a single item-based finite automaton that operates on narrowly defined lexical categories.</Paragraph> <Paragraph position="7"> Children can learn this item-based grammar fragment on the basis of simple positive data. 
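As a sketch, the two give-frames can be joined into one small finite-state automaton over role-labelled slots. The role names (GIVER, GIFT, RECIPIENT), the state names, and the transition table are illustrative assumptions, not a claim about the actual representation.

```python
# Hedged sketch: the double-object frame (give + recipient + gift) and the
# periphrastic frame (give + gift + 'to' + recipient) joined into one
# automaton.  Transitions: state -> {symbol: next_state}.
GIVE_FSA = {
    "S":     {"GIVER": "V"},
    "V":     {"give": "ARG1"},
    "ARG1":  {"RECIPIENT": "GIFT2",   # double object: give John the book
              "GIFT": "TO"},          # periphrastic: give the book to John
    "GIFT2": {"GIFT": "END"},
    "TO":    {"to": "REC2"},
    "REC2":  {"RECIPIENT": "END"},
}

def accepts(symbols):
    """True if the role sequence is licensed by either give-frame."""
    state = "S"
    for sym in symbols:
        nxt = GIVE_FSA.get(state, {}).get(sym)
        if nxt is None:
            return False
        state = nxt
    return state == "END"

print(accepts(["GIVER", "give", "RECIPIENT", "GIFT"]))        # double object
print(accepts(["GIVER", "give", "GIFT", "to", "RECIPIENT"]))  # periphrastic
print(accepts(["GIVER", "give", "to", "RECIPIENT"]))          # not attested
```

Because the automaton only ever gains arcs that positive instances license, unattested orders such as the last sequence are simply never generated.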
This example uses the formalism of a finite-state automaton to annotate the use of positive data.</Paragraph> <Paragraph position="8"> However, in the Competition Model and other connectionist accounts, the two verb frames compete probabilistically with the outcome of the competition being determined by further cues such as focusing or topicalization.</Paragraph> <Paragraph position="9"> Item-based learning involves an ongoing process of generalization for the semantic features of the arguments. During these processes of generalization, to minimize the possibility of error, the child has to be conservative in three ways: * The child needs to formulate each syntactic combination as an item-based construction.</Paragraph> <Paragraph position="10"> * Each item-based construction needs to record the exact semantic status of each positive instance of an argument in a particular grammatical configuration (MacWhinney, 1987a).</Paragraph> <Paragraph position="11"> * Attempts to use the item-based construction with new arguments must be closely guided by the semantics of previously encountered positive instances.</Paragraph> <Paragraph position="12"> If the child has a good memory and applies this method cautiously, overgeneralization will be minimized and there will be no need to recover from overgeneralization.</Paragraph> <Paragraph position="13"> Each item-based construction is linked to a specific lexical item. This item must be a predicate. There are no item-based constructions for nouns.</Paragraph> <Paragraph position="14"> Predicates can have up to three arguments. Item-based constructions for verbs can also include the verbs of embedded clauses as possible arguments.</Paragraph> <Paragraph position="15"> Item-based constructions for prepositions and auxiliaries include both a phrase-internal head (endohead) and a head for the phrase attachment (exohead).
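The conservative slot-generalization process described above can be made concrete with a minimal sketch. This is not the authors' implementation: the class name, the feature sets, and the intersection-based generalization rule are illustrative assumptions only.

```python
# Hypothetical sketch of conservative item-based learning: each
# construction is tied to one lexical item, and its argument slot starts
# with the semantic features of the first filler encountered, then
# generalizes to the intersection of features shared by all positively
# attested fillers. Feature names are invented for illustration.

class ItemBasedConstruction:
    def __init__(self, item):
        self.item = item           # the lexical predicate, e.g. 'more'
        self.slot_features = None  # features currently required of fillers

    def learn(self, filler_features):
        """Record a positive instance; generalize by feature intersection."""
        if self.slot_features is None:
            self.slot_features = set(filler_features)
        else:
            self.slot_features &= set(filler_features)

    def licenses(self, filler_features):
        """Conservative test: a new filler is accepted only if it carries
        every feature still required by the slot."""
        return (self.slot_features is not None
                and self.slot_features <= set(filler_features))

more = ItemBasedConstruction('more')
more.learn({'noun', 'food', 'mass'})    # 'more milk'
more.learn({'noun', 'food', 'plural'})  # 'more cookies'
# The slot now requires only {'noun', 'food'}, so it extends to new foods
# but still rejects fillers outside the attested semantic range.
print(more.licenses({'noun', 'food', 'mass'}))  # 'more juice' -> True
print(more.licenses({'noun', 'vehicle'}))       # 'more car'   -> False
```

On this sketch, the licensed class widens only as positive evidence strips away features, which is the conservative behavior the text describes.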
For details on the implementation of this grammatical relations model through a parser see Sagae, MacWhinney, and Lavie (2004). In section 4.6, we will see how item-based constructions are generalized to feature-based constructions in accord with the account of MacWhinney (1987a). Conservatism also applies to non-local movement patterns. For example, Wolfe Quintero (1992) has shown that conservatism can be used to account for L2 acquisition of the wh-movement patterns. She notes that L2 learners acquire these positive contexts for wh-movement in this order: 32. What did the little girl hit _ with the block today? 33. What did the boy play with _ behind his mother? 34. What did the boy read a story about _ this morning? Because they are proceeding conservatively, learners never produce forms such as (35): 35. *What did the boy with __ read a story this morning? They never hear this structure in the input and never hypothesize a grammar that includes it. As a result, they never make overgeneralizations and never attempt wh-movement in this particular context. Data from Maratsos, Kuczaj, Fox & Chalkley (1979) show that this same analysis applies to first language learners.</Paragraph> </Section> <Section position="4" start_page="59" end_page="62" type="sub_section"> <SectionTitle> 4.4 Competition </SectionTitle> <Paragraph position="0"> Conservatism is a powerful mechanism for addressing the logical problem. However, children will eventually go 'beyond the information given' and produce errors (Jespersen, 1922). When the child produces errors, some mechanism must force recovery. The four processes that have been proposed by emergentist theory are: competition, cue construction, monitoring, and indirect negative evidence. Each of these processes can work to correct overgeneralization.
These processes are important for addressing the version of the logical problem that emphasizes the poverty of negative evidence.</Paragraph> <Paragraph position="1"> The fourth solution to the problem of poverty of negative evidence relies on the mechanism of competition. Of the four mechanisms for promoting recovery from overgeneralization, competition is the most basic, general, and powerful. Psychological theories have often made reference to the notion of competition. In the area of language acquisition, MacWhinney (1978) used competition to account for the interplay between 'rote' and 'analogy' in learning morphophonology. Competition was later generalized to all levels of linguistic processing in the Competition Model. In the 1990s, specific aspects of learning in the Competition Model were formulated through both neural network theory and the ACT-R production system.</Paragraph> <Paragraph position="2"> The Competition Model views overgeneralizations as arising from two types of pressures. The first pressure is the underlying analogic force that produces the overgeneralization. The second pressure is the growth in the rote episodic auditory representation of a correct form. This representation slowly grows in strength over time, as it is repeatedly strengthened through encounters with the input data. These two forces compete for the control of production. Consider the case of '*goed' and 'went'. The overgeneralization 'goed' is supported by analogy. It competes against the weak rote form 'went,' which is supported by auditory memory. As the strength of the rote auditory form for 'went' grows, it begins to win out in the competition against the analogic form '*goed'. Finally, the error is eliminated. This is the Competition Model account for recovery from overgeneralization. The competition between two candidate forms is governed by the strength of their episodic auditory representations.
In the case of the competition between '*goed' and 'went', the overgeneralized form has little episodic auditory strength, since it is heard seldom if at all in the input. Although '*goed' lacks auditory support, it has strong analogic support from the general pattern for past tense formation. In the Competition Model, analogic pressure stimulates overgeneralization and episodic auditory encoding reins it in. The analogic pressure hypothesized in this account has been described in detail in several connectionist models of morphophonological learning. The models that most closely implement the type of competition being described here are the models of MacWhinney and Leinbach (1991) for English and MacWhinney, Leinbach, Taraban & McDonald (1989) for German. In these models, there is a pressure for regularization according to the general pattern that produces forms such as '*goed' and '*ranned'. In addition, there are weaker gang effects that lead to overgeneralizations such as '*stang' for the past tense of 'sting'.</Paragraph> <Paragraph position="3"> Competition implements the notion of blocking developed first by Baker (1979) and later by Pinker (1994). Blocking is more limited than competition because it requires either strict rule-ordering or all-or-none competition. The assumption that forms are competing for the same meaning is identical to the Principle of Uniqueness postulated by Pinker (1994). Competition is also the general case of the Direct Contrast noted by Saxton (1997).</Paragraph> <Paragraph position="4"> Competition goes beyond the analyses offered by Baker, Pinker, and Saxton by emphasizing the fact that the child is continually internalizing adult forms in episodic memory. Recent evidence for the power of episodic memory in infant audition (Aslin et al., 1999) has underscored the power of neural mechanisms for storing linguistic input and extracting patterns from this input without conscious processing.
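The competitive dynamic between rote and analogy described above can be illustrated with a toy simulation. The numbers here (the analogic support level, the strength gain per encounter) are invented for illustration and are not parameters from the Competition Model literature.

```python
# Illustrative sketch of Competition Model recovery: the rote form 'went'
# gains episodic auditory strength with every encounter in the input,
# while '*goed' is essentially never heard and has only fixed analogic
# support. Once rote strength exceeds analogic support, 'went' wins the
# competition for production and the error disappears.

def simulate(encounters, analogic_support=0.5, gain=0.1):
    """Return the form produced after each successive encounter with 'went'."""
    rote_strength = 0.0
    history = []
    for _ in range(encounters):
        # Strength grows with diminishing returns toward a ceiling of 1.0.
        rote_strength += gain * (1.0 - rote_strength)
        winner = 'went' if rote_strength > analogic_support else '*goed'
        history.append(winner)
    return history

outputs = simulate(12)
print(outputs[0], '->', outputs[-1])  # early error, later recovery
```

Early in learning the weak rote form loses and '*goed' is produced; with continued positive input the rote form dominates, which is the recovery trajectory the text describes.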
The Competition Model assumes that children are continually storing traces of the words and phrases they hear along with tags that indicate that these phrases derive directly from adult input. When the child then comes to produce a spontaneous form, these stored forms function as an 'oracle' or 'informant', providing delayed negative evidence that corresponds (because of competition or Uniqueness) to the currently generated productive form. The ultimate source of this negative evidence is the input. Children do not use this evidence when it is initially presented. It is only later when the information is retrieved in the context of productive combinations that it provides negative evidence. This can only happen if it is clear that stored adult forms compete directly (Saxton, 1997) with productive child forms. The crucial claim of the Competition Model is that the same retrieval cues that trigger the formation of the overgeneralized productive form also trigger the retrieval of the internalized negative evidence.</Paragraph> <Paragraph position="5"> When these assumptions hold, there is a direct solution to the logical problem through the availability of internalized negative evidence.</Paragraph> <Paragraph position="6"> To gain a better understanding of the range of phenomena that can be understood in terms of competition, let us look at examples from morphology, lexical semantics, and syntactic constructions. Bowerman (1987) argued that recovery from overgeneralizations such as '*unsqueeze' is particularly problematic for a Competition Model account. She holds that recovery depends on processes of semantic reorganization that lie outside the scope of competition. To make her example fully concrete, let us imagine that '*unsqueeze' is being used to refer to the voluntary opening of a clenched fist. Bowerman holds that there is no obvious competitor to '*unsqueeze.'
However, when presented with this concrete example, most native speakers will say that both 'release' and 'let go' are reasonable alternatives. The Competition Model claim is that, because there is no rote auditory support for '*unsqueeze,' forms like 'release' or 'let go' will eventually compete against and eliminate this particular error.</Paragraph> <Paragraph position="7"> Several semantic cues support this process of recovery. In particular, inanimate objects such as rubber balls and sponges cannot be '*unsqueezed' in the same way that they can be 'squeezed.' Squeezing is only reversible if we focus on the action of the body part doing the squeezing, not the object being squeezed. It is possible that, at first, children do not fully appreciate these constraints on the reversibility of this particular action. However, it is equally likely that they resort to using '*unsqueeze' largely because of the unavailability of more suitable competitors such as 'release.' An error of this type is equivalent to production of 'falled' when the child is having trouble remembering the correct form 'fell.' Or consider the competition between '*unapproved' and its acceptable competitor 'disapproved'. We might imagine that a mortgage loan application that was initially approved could then be subsequently '*unapproved.' We might have some uncertainty about the reversibility of the approval process, but the real problem is that we have not sufficiently solidified our notion of 'disapproved' in order to have it apply in this case. The flip side of this coin is that many of the child's extensional productions of reversives will end up being acceptable. For example, the child may produce 'unstick' without ever having encountered the form in the input. In this case, the form will survive.
Although it will compete with 'remove', it will also receive occasional support from the input and will survive long enough for it to begin to carve out further details in the semantic scope of verbs that can be reversed with the prefix 'un-' (Li & MacWhinney, 1996).</Paragraph> <Paragraph position="8"> The same logic that can be used to account for recovery from morphological overgeneralizations can be used to account for recovery from lexical overgeneralizations. For example, a child may overgeneralize the word 'kitty' to refer to tigers and lions. The child will eventually learn the correct names for these animals and restrict the overgeneralized form. The same three forces are at work here: analogic pressure, competition, and episodic encoding. Although the child has never actually seen a 'kitty' that looks like a tiger, there are enough shared features to license the generalization. If the parent supplies the name 'tiger,' there is a new episodic encoding that then begins to compete with the analogic pressure. If no new name is supplied, the child may still begin to accumulate some negative evidence, noting that this particular use of 'kitty' is not yet confirmed in the input.</Paragraph> <Paragraph position="9"> Merriman (1999) has shown how the linking of competition to a theory of attentional focusing can account for the major empirical findings in the literature on Mutual Exclusivity (the tendency to treat each object as having only one name). By treating this constraint as an emergent bias, we avoid a variety of empirical problems. Since competition is probabilistic, it only imposes a bias on learning, rather than a fixed innate constraint. The probabilistic basis for competition allows the child to deal with hierarchical category structure without having to enforce major conceptual reorganization.</Paragraph> <Paragraph position="10"> Competition may initially lead a child to avoid referring to a 'robin' as a 'bird,' since the form 'robin' would be a better direct match.
However, sometimes 'bird' does not compete directly with 'robin.' This occurs when referring to a collection of different types of birds that may include robins, when referring to an object that cannot be clearly identified as a robin, or when making anaphoric reference to an item that was earlier mentioned as a 'robin.' Overgeneralizations in syntax arise when a feature-based construction common to a group or 'gang' of verbs is incorrectly overextended to a new verb.</Paragraph> <Paragraph position="11"> This type of overextension has been analyzed in both distributed networks (Miikkulainen & Mayberry, 1999) and interactive activation networks (Elman et al., 2005; MacDonald et al., 1994; MacWhinney, 1987b). These networks demonstrate the same gang effects and generalizations found in networks for morphological forms (Plunkett & Marchman, 1993) and spelling correspondences (Taraban & McClelland, 1987). If a word shares a variety of semantic features with a group of other words, it will be treated syntactically as a member of the group.</Paragraph> <Paragraph position="12"> Consider the example of overgeneralizations of dative movement. Verbs like 'give', 'send', and 'ship' all share a set of semantic features involving the transfer of an object through some physical medium. In this regard, they are quite close to a verb like 'deliver' and the three-argument verb group exerts strong analogic pressure on the verb 'deliver'. However, dative movement only applies to certain frequent, monosyllabic transfer verbs and not to multisyllabic, Latinate forms with a less transitive semantics such as 'deliver' or 'recommend.' When children overgeneralize and say, 'Tom delivered the library the book,' they are obeying analogic pressure from the group of transfer verbs that permit dative movement. In effect, the child has created a new argument frame for the verb 'deliver.' The first argument frame only specifies two arguments - a subject or 'giver' and an object or 'thing transferred.'
The new lexical entry specifies three arguments. These two homophonous entries for 'deliver' are now in competition, just as '*goed' and 'went' were in competition. Like the entry for '*goed', the three-place entry for 'deliver' has good analogic support, but no support from episodic encoding derived from the input. Over time, it loses in its competition with the two-argument form of 'deliver' and its progressive weakening along with strengthening of the competing form leads to recovery from overgeneralization. Thus, the analysis of recovery from 'Tom delivered the library the book' is identical to the analysis of recovery from '*goed'.</Paragraph> <Paragraph position="13"> 4.4.4 Modeling construction strength It may be useful to characterize the temporal course of competitive item-based learning in slightly more formal terms. To do this, we can say that a human language is generated by the application of a set of constructions that map arguments to predicates. For each item-based construction (IC), there is a correct mapping (CM) from arguments to its predicate and any number of incorrect mappings (IM). The IMs receive support from analogical relations to groups of CMs with similar structure. From these emerge feature-based constructions (FC). The CMs receive support from positive input, as well as analogical relations to other CMs and FCs. Each positive input increases the strength S of a matching CM by amount A.</Paragraph> <Paragraph position="14"> Learning of an IC occurs when the S of the CM exceeds the S of the strongest competing IM by some additional amount.
This is the dominance strength or DS.</Paragraph> <Paragraph position="15"> To model language learning within this framework, we need to understand the distribution of the positive data and the sources of analogical support.</Paragraph> <Paragraph position="16"> From database searches and calculation of ages of learning of CMs, we can estimate the number of positive input examples (P) needed to bring a CM to strength DS. For each CM, if the input has included P cases by time T, we can say that a particular CM reaches DS monotonically in time T.</Paragraph> <Paragraph position="17"> At this point, IC is learned. Languages are learnable if their component ICs can be learned in time T. To measure learning to various levels, we can specify learning states in which there remain certain specified slow constructions (SC) that have not yet reached DS. Constructions learned by this time can be called NC or normal constructions.</Paragraph> <Paragraph position="18"> Thus, at time T, the degree of completion of the learning of L can be expressed as NC/(NC + SC).</Paragraph> <Paragraph position="19"> This is a number that approaches 1.0 as T increases. The residual presence of a few SCs, as well as occasional spontaneous declines in the DS of CMs, will lead to deviations from 1.0. The study of the SCs requires a model of analogic support from FCs. In essence, the logical problem of language acquisition is then restated as the process of understanding how analogical pressures lead to learning courses that deviate from what is predicted by simple learning on positive exemplars for individual item-based constructions.</Paragraph> </Section> <Section position="5" start_page="62" end_page="63" type="sub_section"> <SectionTitle> 4.5 Cue construction </SectionTitle> <Paragraph position="0"> The fifth solution to the logical problem and the second of the solutions that promotes recovery from overgeneralization is cue construction. Most recovery from overgeneralization relies on competition.
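Before examining the limits of competition, the construction-strength bookkeeping of section 4.4.4 above can be made concrete with a short sketch. The parameter values A and DS and the per-construction input counts below are invented for illustration; only the definitions (strength A per positive input, learning at dominance margin DS, completion NC/(NC + SC)) come from the text.

```python
# Minimal formalization sketch of section 4.4.4: each correct mapping
# (CM) gains strength A per positive input; its item-based construction
# (IC) counts as learned when the CM exceeds its strongest incorrect
# mapping (IM) by the dominance strength DS. Overall completion of the
# language L is NC / (NC + SC). Parameter values are illustrative only.

A = 0.05   # strength gained by a CM per positive input
DS = 0.3   # dominance margin required for learning

def cm_strength(positive_inputs):
    return A * positive_inputs

def is_learned(positive_inputs, strongest_im):
    return cm_strength(positive_inputs) >= strongest_im + DS

def completion(constructions):
    """constructions: list of (positive_inputs, strongest_im) pairs, one per IC.
    Returns NC / (NC + SC), the degree of completion of learning."""
    nc = sum(1 for p, im in constructions if is_learned(p, im))
    sc = len(constructions) - nc
    return nc / (nc + sc)

# Three ICs with differing amounts of positive input and analogic competition:
grammar = [(20, 0.2), (12, 0.1), (4, 0.4)]
print(completion(grammar))  # fraction that approaches 1.0 as input accumulates
```

The third construction plays the role of a slow construction (SC): heavy analogic support for its IM and sparse positive input keep it below the dominance margin, so completion stays below 1.0.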
However, competition will eventually encounter limits in its ability to deal with the fine details of grammatical patterns. To illustrate these limits, consider the case of recovery from resultative overgeneralizations such as '*I untied my shoes loose'. This particular extension receives analogic support from verbs like 'shake' or 'kick' which permit 'I shook my shoes loose' or 'I kicked my shoes loose.' It appears that the child is not initially tuned in to the fine details of these semantic classifications. Bowerman (1988) has suggested that the process of recovery from overgeneralization may lead the child to construct new features to block overgeneralization. We can refer to this process as 'cue construction.' Recovering from other resultative overgeneralizations may also require cue construction. For example, an error such as '*The gardener watered the tulips flat' can be attributed to the operation of a feature-based construction which yields three-argument verbs from 'hammer' or 'rake', as in 'The gardener raked the grass flat.' Source-goal overgeneralization can also fit into this framework.</Paragraph> <Paragraph position="1"> Consider, '*The maid poured the tub with water' instead of 'The maid poured water into the tub' and '*The maid filled water into the tub' instead of 'The maid filled the tub with water.' In each case, the analogic pressure from one group of words leads to the establishment of a case frame that is incorrect for a particular verb. Although this competition could be handled just by the strengthening of the correct patterns, it seems likely that the child also needs to clarify the shape of the semantic features that unify the 'pour' verbs and the 'fill' verbs. Bowerman (personal communication) provides an even more challenging example. One can say 'The customers drove the taxi driver crazy,' but not '*The customers drove the taxi driver sad.' The error involves an overgeneralization of the exact shape of the resultative adjective.
A connectionist model of the three-argument case frame for 'drive' would determine not only that certain verbs license a third possible argument, but also what the exact semantic shape of that argument can be. In the case of the standard pattern for verbs like 'drive,' the resultant state must be terminative, rather than transient. To express this within the Competition Model context, we would need to have a competition between a confirmed three-argument form for 'drive' and a looser overgeneral form based only on analogic pressure. A similar competition account can be used to account for recovery from an error such as, '*The workers unloaded the truck empty' which contrasts with 'The workers loaded the truck full'. In both of these cases, analogic pressure seems weak, since examples of such errors are extremely rare in the language learning literature.</Paragraph> <Paragraph position="2"> The actual modeling of these competitions in a neural network will require detailed lexical work and extensive corpus analysis. A sketch of the types of models that will be required is given in MacWhinney (1999).</Paragraph> </Section> <Section position="6" start_page="63" end_page="64" type="sub_section"> <SectionTitle> 4.6 Monitoring </SectionTitle> <Paragraph position="0"> The sixth solution to the logical problem involves children's abilities to monitor and detect their own errors. The Competition Model holds that, over time, correct forms gain strength from encounters with positive exemplars and that this increasing strength leads them to drive out incorrect forms. If we make further assumptions about uniqueness, this strengthening of correct forms can guarantee the learnability of language. However, by itself, competition does not fully account for the dynamics of language processing in real social interactions. Consider a standard self-correction such as 'I gived, uh, gave my friend a peach.'
Here the correct form 'gave' is activated in real time just after the production of the overgeneralization.</Paragraph> <Paragraph position="1"> MacWhinney (1978) and Elbers & Wijnen (1993) have treated this type of self-correction as involving 'expressive monitoring' in which the child listens to her own output, compares the correct weak rote form with the incorrect overgeneralization, and attempts to block the output of the incorrect form. One possible outcome of expressive monitoring is the strengthening of the weak rote form and weakening of the analogic forms. Exactly how this is implemented will vary from model to model. In general, retraced false starts move from incorrect forms to correct forms, indicating that the incorrect forms are produced quickly, whereas the correct rote forms take time to activate. Kawamoto (1994) has shown how a recurrent connectionist network can simulate exactly these timing asymmetries between analogic and rote retrieval. For example, Kawamoto's model captures the experimental finding that incorrect regularized pronunciations of 'pint' to rhyme with 'hint' are produced faster than correct irregular pronunciations.</Paragraph> <Paragraph position="2"> An even more powerful learning mechanism is what MacWhinney (1978) called 'receptive monitoring.' If the child shadows input structures closely, he will be able to pick up many discrepancies between his own productive system and the forms he hears. Berwick (1987) found that syntactic learning could arise from the attempt to extract meaning during comprehension. Whenever the child cannot parse an input sentence, the failure to parse can be used as a means of expanding the grammar. The kind of analysis through synthesis that occurs in some parsing systems can make powerful use of positive instances to establish new syntactic frames. Receptive monitoring can also be used to recover from overgeneralization.
The child may monitor the form 'went' in the input and attempt to use his own grammar to match that input.</Paragraph> <Paragraph position="3"> If the result of the receptive monitoring is '*goed', the child can use the mismatch to reset the weights in the analogic system to avoid future overgeneralizations. Neural network models that rely on back-propagation assume that negative evidence is continually available for every learning trial. For this type of model to make sense, the child would have to depend heavily on both expressive and receptive monitoring. It is unlikely that these two mechanisms operate as continuously as would be required for a mechanism such as back-propagation.</Paragraph> <Paragraph position="4"> However, not all connectionist models rely on the availability of negative evidence. For example, Kohonen's self-organizing feature map model (Miikkulainen, 1993) learns linguistic patterns simply using co-occurrences in the data with no reliance on negative evidence.</Paragraph> </Section> <Section position="7" start_page="64" end_page="64" type="sub_section"> <SectionTitle> 4.7 Indirect negative evidence </SectionTitle> <Paragraph position="0"> The seventh solution to the logical problem of language acquisition relies on the computation of indirect negative evidence. This computation can be illustrated with the error '*goed.' To construct indirect negative evidence in this case, children need to track the frequency of all verbs and the frequency of the past tense as marked by the regular '-ed.' Then they need to compute regular '-ed' as a percentage of all verbs. Next they need to track the frequency of the verb 'go' in all of its uses and the frequency of '*goed.' To gain a bit more certainty, they should also calculate the frequency of a verb like 'jump' and the frequency of 'jumped.'
With these ratios in hand, the child can then compare the ratio for 'go' with those for 'jump' or verbs in general and conclude that the attested cases of '*goed' are fewer than would be expected on the basis of evidence from verbs like 'jump.' They can then conclude that '*goed' is ungrammatical. Interestingly, they can do this without receiving overt correction.</Paragraph> <Paragraph position="1"> The structures for which indirect negative evidence could provide the most useful accounts are ones that are learned rather late. These typically involve low-error constructions of the type that motivate the strong form of the logical problem.</Paragraph> <Paragraph position="2"> For example, children could compute indirect negative evidence that would block wh-raising from object-modifying relatives in sentences such as (37).</Paragraph> <Paragraph position="3"> 36. The police arrested the thieves who were carrying the loot.</Paragraph> <Paragraph position="4"> 37. *What did the police arrest the thieves who were carrying? 38. To do this, they would need to track the frequency of sentences such as: 39. Bill thought the thieves were carrying the loot.</Paragraph> <Paragraph position="5"> 40. What did Bill think the thieves were carrying? Noting that raising from predicate complements occurs fairly frequently, children could reasonably conclude that the absence of raising from object modification position means that it is ungrammatical. Coupled with conservatism, indirect negative evidence can be a useful mechanism for avoiding overgeneralization of complex syntactic structures. The item-based acquisition component of the Competition Model provides a framework for computing indirect negative evidence. The indirect negative evidence tracker could note that, although 'squeeze' occurs frequently in the input, '*unsqueeze' does not.
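The frequency-ratio comparison described above can be sketched in a few lines. The counts and the suspicion threshold below are invented purely for illustration; the logic (compare the attested rate of a predicted form against the rate expected from regular verbs) follows the text.

```python
# Sketch of the indirect-negative-evidence computation: estimate the
# expected past-tense rate from a regular verb like 'jump', then check
# whether the attested rate of '*goed' falls far below that expectation.
# A large shortfall serves as indirect evidence of ungrammaticality,
# with no overt correction required. All counts here are invented.

def past_tense_rate(past_count, total_count):
    """Proportion of a verb's tokens that appear in the tracked past form."""
    return past_count / total_count

def is_suspect(attested_rate, expected_rate, threshold=0.1):
    """Flag a predicted form whose attested rate falls far below the
    rate expected from regular verbs (threshold is an assumed margin)."""
    return attested_rate < threshold * expected_rate

expected = past_tense_rate(300, 1000)  # 'jumped' tokens / all 'jump' tokens
attested = past_tense_rate(0, 800)     # '*goed' tokens  / all 'go' tokens
print(is_suspect(attested, expected))  # True: '*goed' is flagged as ungrammatical
```

The same comparison covers the '*unsqueeze' case: 'squeeze' supplies a high expected rate for the predicted derived form, while the attested count of '*unsqueeze' stays near zero.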
This mechanism works through the juxtaposition of a form receiving episodic support ('squeeze') with a predicted inflected form ('unsqueeze').</Paragraph> <Paragraph position="6"> This mechanism uses analogic pressure to predict the form '*unsqueeze.' This is the same mechanism as used in the generation of '*goed.' However, the child does not need to actually produce '*unsqueeze,' only to hypothesize its existence. This form is then tracked in the input. If it is not found, the comparison of the near-zero strength of the unconfirmed form 'unsqueeze' with the confirmed form 'squeeze' leads to the strengthening of competitors such as 'release' and blocking of any attempts to use 'unsqueeze.' Although this mechanism is plausible, it is more complicated than the basic competition mechanism and places a greater requirement on memory for tracking of nonoccurrences. Since the end result of this tracking of indirect negative evidence is the same as that of the basic competition mechanism, it is reasonable to imagine that learners use this mechanism only as a fall-back strategy, relying on simple competition to solve most problems requiring recovery from overgeneralization.</Paragraph> </Section> </Section> </Paper>