<?xml version="1.0" standalone="yes"?> <Paper uid="P82-1032"> <Title>A Model of Early Syntactic Development</Title> <Section position="4" start_page="145" end_page="146" type="metho"> <SectionTitle> START </SectionTitle> <Paragraph position="0"> If you want to describe node1, and node2 is in relation to node1, then describe node2.</Paragraph> <Paragraph position="1"> Matching first against the main goal node, this rule selects one of the nodes below it in the tree and creates a subgoal to describe that node. This rule continues to establish lower level goals until paraphrases of the actual PRISM productions. All variables are italicized; these may match against any symbol, but all occurrences of a variable -&quot; ~'. ~,~atch to the same element. a terminal node is reached. At this point, a second production (the speak rule) is matched; this rule may be stated: SPEAK If you want to describe a conceptt and word is the word for concept, then say word and note that concept has been described.</Paragraph> <Paragraph position="2"> This production retrieves the word for the concept AMBER wants to describe, actually says this word, and marks the terminal goal as satisfied. Once this has been done, the third and final performance production becomes true. This rule matches whenever a subgoal has been satisfied, and attempts to mark the supergoal as satisfied; it may be paraphrased as:</Paragraph> </Section> <Section position="5" start_page="146" end_page="146" type="metho"> <SectionTitle> STOP </SectionTitle> <Paragraph position="0"> If you want to describe node1, and node2 is in re/ation to nodel, and node2 has already been described, then note that node1 has been described. Since the stop rule is stronger 3 than the start rule (which would like to create another subgoal), it moves back up the tree, marking each of the active goals as satisfied (including the main goal). As a result, AMBER believes it has successfully described an event after it has uttered only a single word. Thus, although the model starts with the potential for producing multi.word utterances, it must learn additional rules (and make them stronger than the stop rule) before it can generate multiple content words in the correct order.</Paragraph> <Paragraph position="1"> In general, AMBER learns by comparing adult sentences to the sentences it would produce in the same situations. These predictions reveal two types of mistakes - errors of omission and errors of commission. These errors are detected by additional/earning productions that are responsible for creating new performance rules. Thus, AMBER is an example of what Waterman (1975) has called an adaptive production system, which modifies its own behavior by inserting new condition-action rules. Below I discuss AMBER'S response to errors of omission, since these are the first to occur and thus lead to the system's first steps beyond the one-word stage. I consider the omission of content words first, and then the omission of grammatical morphemes. Finally, I discuss the importance of errors of commission in discovering conditions on the production of morphemes.</Paragraph> <Paragraph position="2"> 3. Learning Preferences and Orders AMBER'S initial self-modifications result from tile failure to predict content words. Given its initial ability to say one word at a time, the system can make two types of content word omissions - it can fail to predict a word before a correctly predicted one, or it can omit a word after a correctly predicted one. 
Rather different rules are created in each case. For example, imagine that Daddy is bouncing a ball, and suppose that AMBER predicted only the word &quot;ball&quot;, while hearing the sentence &quot;Daddy is bounce ing the ball&quot;. In this case, one of the system's learning rules would note the omitted content word &quot;Daddy&quot; before the content word &quot;ball&quot;, and an agent production would be created:</Paragraph> <Paragraph position="3"> 3 The notion of strength plays an important role in AMBER'S explanation of language learning. When a new rule is created, it is given a low initial strength, but this is increased whenever that rule is relearned. And since stronger productions are preferred to their weaker competitors, rules that have been learned many times determine behavior.</Paragraph> </Section> <Section position="6" start_page="146" end_page="146" type="metho"> <SectionTitle> AGENT </SectionTitle> <Paragraph position="0"> If you want to describe event1, and agent1 is the agent of event1, then describe agent1.</Paragraph> <Paragraph position="1"> Although I do not have the space to describe the responsible learning rule in detail, I can say that it matches against situations in which one content word is omitted before another, and that it always constructs new productions with the same form as the agent rule described above. In this case, it would also create a similar rule for describing actions, based on the omitted &quot;bounce&quot;. Note that these new productions do not give AMBER the ability to say more than one word at a time. They merely increase the likelihood that the program will describe the agent or action of an event instead of the object.</Paragraph> <Paragraph position="2"> However, as AMBER begins to prefer agents to actions and actions to objects, the probability of the second type of error (omitting a word after a correctly predicted one) increases. For example, suppose that Daddy is again bouncing a ball, and the system says &quot;Daddy&quot; while it hears &quot;Daddy is bounce ing the ball&quot;. In this case, a slightly different production is created that is responsible for ordering the creation of goals. Since the agent relation was described but the object was omitted, an agent-object rule is constructed:</Paragraph> </Section> <Section position="7" start_page="146" end_page="146" type="metho"> <SectionTitle> AGENT-OBJECT </SectionTitle> <Paragraph position="0"> If you want to describe event1, and agent1 is the agent of event1, and you have described agent1, and object1 is the object of event1, then describe object1.</Paragraph> <Paragraph position="1"> Together with the agent rule shown above, this production lets AMBER produce utterances such as &quot;Daddy ball&quot;. Thus, the model provides a simple explanation of why children omit some content words in their early multi-word utterances. Such rules must be constructed many times before they become strong enough to have an effect, but eventually they let the system produce telegraphic sentences containing all relevant content words in the standard order and lacking only grammatical morphemes.</Paragraph> </Section> <Section position="8" start_page="146" end_page="147" type="metho"> <SectionTitle> 4. Learning Suffixes and Prefixes </SectionTitle> <Paragraph position="0"> Once AMBER begins to correctly predict content words, it can learn rules for saying grammatical morphemes as well.
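Before turning to morphemes, the content-word case of the preceding section can be summarized in a small sketch. The function below is a paraphrase of the two learning rules just described, not AMBER's actual adaptive productions; the list-of-relations input format is an assumption made for the example.

# Relations are listed in the order the adult sentence expresses them, e.g.
# ["agent", "action", "object"] for "Daddy is bounce ing the ball".

def rules_from_omissions(adult_relations, predicted_relations):
    """Paraphrase the new production built for each omitted content word."""
    new_rules = []
    for i, rel in enumerate(adult_relations):
        if rel in predicted_relations:
            continue                                   # predicted correctly; nothing to learn
        omitted_before = any(r in predicted_relations for r in adult_relations[i + 1:])
        described_earlier = [r for r in adult_relations[:i] if r in predicted_relations]
        if omitted_before:
            # Case 1: omission before a predicted word -> a preference rule (like AGENT).
            new_rules.append((rel.upper(),
                f"If you want to describe event1, and {rel}1 is the {rel} of event1, "
                f"then describe {rel}1."))
        elif described_earlier:
            # Case 2: omission after a predicted word -> an ordering rule (like AGENT-OBJECT).
            prev = described_earlier[-1]
            new_rules.append((f"{prev.upper()}-{rel.upper()}",
                f"If you want to describe event1, and {prev}1 is the {prev} of event1, "
                f"and you have described {prev}1, and {rel}1 is the {rel} of event1, "
                f"then describe {rel}1."))
    return new_rules

# Heard "Daddy is bounce ing the ball" but predicted only "ball":
for name, rule in rules_from_omissions(["agent", "action", "object"], ["object"]):
    print(name, "-", rule)          # AGENT and ACTION preference rules

# Heard the same sentence but predicted only "Daddy":
for name, rule in rules_from_omissions(["agent", "action", "object"], ["agent"]):
    print(name, "-", rule)          # AGENT-ACTION and AGENT-OBJECT ordering rules

Each rule returned here would be created with a low initial strength and would govern behavior only after being relearned many times, as noted above.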
As with content words, such rules are created when the system hears a morpheme but fails to predict it in that position. For example, suppose the program hears the sentence &quot;Daddy * is bounce ing * the ball&quot;, 4 but predicts only &quot;Daddy bounce ball&quot;. In this case, the following rule is generated: ING-1 If you have described action1, and action1 is the action of event1, then say ING.</Paragraph> <Paragraph position="1"> Once it has gained sufficient strength, this rule will say the morpheme &quot;ing&quot; after any action word. As stated, the production is overly general and will lead to errors of commission. I consider AMBER'S response to such errors in the following section.</Paragraph> <Paragraph position="2"> 4 Asterisks represent pauses in the adult sentence. These cues are necessary for AMBER to decide that a morpheme like &quot;is&quot; is a prefix for &quot;bounce&quot; instead of a suffix for &quot;Daddy&quot;. The omission of prefixes leads to very similar rules. In the above example, the morpheme &quot;is&quot; was omitted before &quot;bounce&quot;, leading to the creation of a prefix rule for producing the missing function word: IS-1 If you want to describe action1, and action1 is the action of event1, then say IS.</Paragraph> <Paragraph position="3"> Note that this rule will become true before an action has been described, while the rule ing-1 can apply only after the goal to describe the action has been satisfied. AMBER uses such conditions to control the order in which morphemes are produced.</Paragraph> <Paragraph position="4"> Figure 1 shows AMBER'S mean length of utterance as a function of the number of sample sentences (taken in groups of five) seen by the program.5 As one would expect, the system starts with an average of around one word per utterance, and the length slowly increases with time. AMBER moves through a two-word and then a three-word stage, until it eventually produces sentences lacking only grammatical morphemes. Finally, the morphemes are included, and adult-like sentences are produced. The incremental nature of the learning curve results from the piecemeal way in which AMBER learns rules for producing sentences, and from the system's reliance on the strengthening process.</Paragraph> </Section> <Section position="9" start_page="147" end_page="148" type="metho"> <SectionTitle> 5. Recovering from Errors of Commission </SectionTitle> <Paragraph position="0"> Errors of commission occur when AMBER predicts a morpheme that does not occur in the adult sentence. These errors result from the overly general prefix and suffix rules that we saw in the last section. In response to such errors, AMBER calls on a discrimination routine in an attempt to generate more conservative productions with additional conditions.6 Earlier, I considered a rule (is-1) for producing &quot;is&quot; before the action of an event. As stated, this rule would apply in inappropriate situations as well as correct ones. For example, suppose that AMBER learned this rule in the context of the sentence &quot;Daddy is bounce ing the ball&quot;. Now suppose the system later uses this rule to predict the same sentence, but that it instead hears the sentence &quot;Daddy was bounce ing the ball&quot;.</Paragraph> <Paragraph position="1"> 5 AMBER is implemented on a PDP KL-10
in PRISM (Langley and Neches, 1981), an adaptive production system language designed for modeling learning phenomena; the run summarized in Figure 1 took approximately 2 hours of CPU time.</Paragraph> <Paragraph position="2"> At this point, AMBER would retrieve the rule responsible for predicting &quot;is&quot; and lower its strength; it would also retrieve the situation that led to the faulty application, passing this information to the discrimination routine. Comparing the earlier good case to the current bad case, the discrimination mechanism finds only one difference - in the good example, the action node was marked present, while no such marker occurred during the faulty application. The result is a new production that is identical to the original rule, except that an additional condition has been included: IS-2 If you want to describe action1, and action1 is the action of event1, and action1 is in the present, then say IS.</Paragraph> <Paragraph position="3"> This new condition will let the variant rule fire only when the action is marked as occurring in the present. When first created, the is-2 production is too weak to be seriously considered.</Paragraph> <Paragraph position="4"> However, as it is learned again and again, it will eventually come to mask its predecessor. This transition is aided by the weakening of the faulty is-1 rule each time it leads to an error. Once the variant production has gained enough strength to apply, it will produce its own errors of commission. For example, suppose AMBER uses the is-2 rule to predict &quot;The boy s is bounce ing the ball&quot;, while the system hears &quot;The boy s are bounce ing the ball&quot;. This time the difference is more complicated. The fact that the action had an agent in the good situation is no help, since an agent was present during the faulty firing as well. However, the agent was singular in the first case but not during the second. Accordingly, the discrimination mechanism creates a second variant: IS-3 If you want to describe action1, and action1 is the action of event1, and action1 is in the present, and agent1 is the agent of event1, and agent1 is singular, then say IS.</Paragraph> <Paragraph position="5"> The resulting rule contains two additional conditions, since the learning process was forced to chain through two elements to find a difference. Together, these conditions keep the production from saying the morpheme &quot;is&quot; unless the agent of the current action is singular in number.</Paragraph> <Paragraph position="6"> Note that since the discrimination process must learn these sets of conditions separately, an important prediction results: the more complex the conditions on a morpheme's use, the longer it will take to master. For example, three sets of conditions are required for the &quot;is&quot; rule, while only a single condition is needed for the &quot;ing&quot; production. As a result, the former is mastered after the latter, just as found in children's speech. Table 1 presents the order of acquisition for the six classes of morpheme learned by AMBER, and the order in which the same morphemes were mastered by Brown's children. The number of sample sentences the model required before mastery is also included.</Paragraph> <Paragraph position="7"> 6 Anderson's ALAS (1981) system uses a very similar process to recover from overly general morpheme rules. AMBER and ALAS have much in common, both having grown out of discussions between Anderson and the author.
Although there is considerable overlap, ALAS generally accounts for later developments in children's speech than does AMBER.</Paragraph> <Paragraph position="8"> The general trend is very similar for the children and the model, but two pairs of morphemes are switched. For AMBER, the plural construction was mastered before &quot;ing&quot;, while in the observed data the reverse was true. However, note that AMBER mastered the progressive construction almost immediately after the plural, so this difference does not seem especially significant. Second, the model mastered the articles &quot;the&quot;, &quot;a&quot;, and &quot;some&quot; before the construction for past tense. However, Brown has argued that the notions of &quot;definite&quot; and &quot;indefinite&quot; may be more complex than they appear on the surface; thus, AMBER'S representation of these concepts as single features may have oversimplified matters, making articles easier to learn than they are for the child.</Paragraph> <Paragraph position="9"> Thus, the discrimination process provides an elegant explanation for the observed correlation between a morpheme's complexity and its order of acquisition. Observe that if the conditions on a morpheme's application were learned through a process of generalization such as that proposed by Winston (1970), exactly the opposite prediction would result. Since generalization operates by removing conditions which differ in successive examples, simpler rules would be finalized later than more complex ones. Langley (1982) has discussed the differences between generalization-based and discrimination-based approaches to learning in more detail.</Paragraph> </Section> <Section position="10" start_page="148" end_page="148" type="metho"> <SectionTitle> Table 1: Children's order, AMBER's order, and learning time </SectionTitle> <Paragraph position="0"> Some readers will have noted the careful crafting of the above examples, so that only one difference occurred in each case.</Paragraph> <Paragraph position="1"> This meant that the relevant conditions were obvious, and the discrimination mechanism was not forced to consider alternate corrections. In order to more closely model the environment in which children learn language, AMBER was presented with randomly generated sentence/meaning pairs. Thus, it was usually impossible to determine the correct discrimination that should be made from a single pair of good and bad situations.</Paragraph> <Paragraph position="2"> AMBER'S response to this situation is to create all possible discriminations, but to give each of the variants a low initial strength. Correct rules, or rules containing at least some correct conditions, are learned more often than rules containing spurious conditions. And since AMBER strengthens a production whenever it is relearned, variants with useful conditions come to be preferred over their competitors. Thus, AMBER may be viewed as carrying out a breadth-first search through the space of possible rules, considering many alternatives at the same time, and selecting the best of these for further attention. Only variants that exceed a certain threshold (generally those with correct conditions) lead to new errors of commission and additional variants. Eventually, this search process leads to the correct rule, even in the presence of many irrelevant features.</Paragraph> <Paragraph position="3"> Figure 2 presents the learning curves for the &quot;ing&quot; morpheme.
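Before turning to those curves, the discrimination step itself can be made concrete. The sketch below uses an assumed data format (a context as a set of attribute/value facts) rather than PRISM working-memory elements, and it handles only the simple case of directly comparable facts, not the chaining through intermediate elements described for is-3; the names and numeric strengths are invented for the example.

def discriminate(rule_name, conditions, good_context, bad_context, strengths):
    """Weaken the faulty rule and propose variants with one extra condition each."""
    strengths[rule_name] = strengths.get(rule_name, 1.0) * 0.9   # weaken on each error
    variants = []
    for fact in good_context - bad_context:          # facts true only in the good case
        name = f"{rule_name}+{fact[0]}"
        variants.append((name, conditions + [fact]))
        strengths.setdefault(name, 0.1)              # every variant starts too weak to apply
    return variants

strengths = {"IS-1": 1.0}
good = {("action", "bounce"), ("tense", "present"), ("agent", "daddy")}   # "Daddy is bounce ing ..."
bad  = {("action", "bounce"), ("tense", "past"),    ("agent", "daddy")}   # "Daddy was bounce ing ..."

variants = discriminate("IS-1",
                        [("goal", "describe action1"), ("action1", "action of event1")],
                        good, bad, strengths)
print(variants)    # one variant with the added condition ("tense", "present"), like is-2
print(strengths)   # IS-1 weakened; the variant begins with low strength

With randomly generated sentence/meaning pairs several facts usually differ at once, so many such variants are proposed; those whose conditions are genuinely relevant are re-created (and so strengthened) more often, which is the breadth-first search through the space of rules described above.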
Since AMBER initially lacks an &quot;ing&quot; rule, errors of omission abound at the outset, but as this production and its variants are strengthened, such errors decrease. In contrast, errors of commission are absent at the beginning, since AMBER lacks an &quot;ing&quot; rule to make false predictions. As the morpheme rule becomes stronger, errors of commission grow to a peak, but they disappear as discrimination takes effect. By the time it has seen 63 sample sentences, the system has mastered the present progressive construction.</Paragraph> </Section> <Section position="11" start_page="148" end_page="150" type="metho"> <SectionTitle> 6. Directions for Future Research </SectionTitle> <Paragraph position="0"> In the preceding pages, we have seen that AMBER offers explanations for a number of phenomena observed in children's early speech. These include the omission of content words and morphemes, the gradual manner in which these omissions are overcome, and the order in which grammatical morphemes are mastered. As a psychological model of early syntactic development, AMBER constitutes an improvement over previous language learning programs. However, this does not mean that the model cannot be improved, and in this section I outline some directions for future research efforts.</Paragraph> <Section position="1" start_page="148" end_page="150" type="sub_section"> <SectionTitle> 6.1. Simplicity and Generality </SectionTitle> <Paragraph position="0"> One of the criteria by which any scientific theory can be judged is simplicity, and this is one dimension along which AMBER could stand some improvement. In particular, some of AMBER'S learning heuristics for coping with errors of omission incorporate considerable knowledge about the task of learning a language. For example, AMBER knows the form of the rules it will learn for ordering goals and producing morphemes. Another questionable piece of information is the distinction between major and minor meanings that lets AMBER treat content words and morphemes as completely separate entities. One might argue that the child is born with such knowledge, so that any model of language acquisition should include it as well. However, until such innateness is proven, any model that can manage without such information must be considered simpler, more elegant, and more desirable than a model that requires it to learn a language.</Paragraph> <Paragraph position="1"> In contrast to these domain-specific heuristics, AMBER'S strategy for dealing with errors of commission incorporates an apparently domain-independent learning mechanism - the discrimination process. This heuristic can be applied to any domain in which overly general rules lead to errors, and can be used on a variety of representations to discover the conditions under which such rules should be selected. In addition to language development, the discrimination process has been applied to concept learning (Anderson, Kline, and Beasley, 1979; Langley, 1982) and strategy acquisition (Brazdil, 1978; Langley, 1982). Langley (1982) has discussed the generality and power of discrimination-based approaches to learning in greater detail.</Paragraph> <Paragraph position="2"> As we shall see below, this heuristic may provide a more plausible explanation for the learning of word order. Moreover, it opens the way for dealing with some aspects of language acquisition that AMBER has so far ignored - the learning of word/concept links and the mastering of irregular constructions. 6.2.
Learning Word Order Through Discrimination AMBER learns the order of content words through a two-stage process, first learning to prefer some relations (like agent) over others (like action or object), and then learning the relative orders in which such relations should be described. The adaptive productions responsible for these transitions contain the actual form of the rules that are learned; the particular rules that result are simply instantiations of these general forms.</Paragraph> <Paragraph position="3"> Ideally, future versions of AMBER should draw on more general learning strategies to acquire ordering rules.</Paragraph> <Paragraph position="4"> Let us consider how the discrimination mechanism might be applied to the discovery of such rules. In the existing system, the generation of &quot;ball&quot; without a preceding &quot;Daddy&quot; is viewed as an error of omission. However, it could as easily be viewed as an error of commission in which the goal to describe the object was prematurely satisfied. In this case, one might use discrimination to generate a variant version of the start rule: If you want to describe node1, and node2 is the object of node1, and node3 is the agent of node1, and you have described node3, then describe node2.</Paragraph> <Paragraph position="5"> This production is similar to the start rule, except that it will set up goals only to describe the object of an event, and then only if the agent has already been described. In fact, this rule is identical to the agent-object rule discussed in an earlier section; the important point is that it is also a special case of the start rule that might be learned through discrimination when the more general rule fires inappropriately. The same process could lead to variants such as the agent rule, which express preferences rather than order information. Rather than starting with knowledge of the forms of rules at the outset, AMBER would be able to determine their form through a more general learning heuristic.</Paragraph> <Paragraph position="6"> 6.3. Major and Minor Meanings The current version of AMBER relies heavily on the representational distinction between major meanings and modulations of those meanings. Unfortunately, some languages express through content words what others express through grammatical morphemes. Future versions of the system should lessen this distinction by using the same representation for both types of information. In addition, the model might employ a single production for learning to produce both content words and morphemes; thus, the program would lack the speak rule described earlier, but would construct specific versions of this production for particular words and morphemes. This would also remedy the existing model's inability to learn new connections between words and concepts. Although the resulting rules would probably be overly general, AMBER would be able to recover from the resulting errors by additional use of the discrimination mechanism.</Paragraph> <Paragraph position="7"> The present model also makes a distinction between morphemes that act as prefixes (such as &quot;the&quot;) and those that act as suffixes (such as &quot;ing&quot;).
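Footnote 4 described the pause cues that license this prefix/suffix distinction. The sketch below is an assumption about how such cues could be used to segment an adult sentence, not a description of AMBER's documented input processing; the function name and the set-of-content-words argument are invented for illustration.

def segment(adult_sentence, content_words):
    """Attach each morpheme to a neighbouring content word, using pauses ('*') as boundaries."""
    groups, attachments = [[]], {}
    for token in adult_sentence.split():
        if token == "*":
            groups.append([])          # a pause closes the current group
        else:
            groups[-1].append(token)
    for group in groups:
        anchors = [w for w in group if w in content_words]
        if not anchors:
            continue                   # no content word in this group to attach to
        anchor = anchors[0]
        i = group.index(anchor)
        attachments[anchor] = {"prefixes": group[:i], "suffixes": group[i + 1:]}
    return attachments

print(segment("Daddy * is bounce ing * the ball", {"Daddy", "bounce", "ball"}))
# {'Daddy': {'prefixes': [], 'suffixes': []},
#  'bounce': {'prefixes': ['is'], 'suffixes': ['ing']},
#  'ball': {'prefixes': ['the'], 'suffixes': []}}

Within each pause-delimited group, morphemes before the content word are treated as its prefixes and those after it as its suffixes, which is the kind of information the two learning rules discussed next rely on.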
Two separate learning rules are responsible for recovering from function word omissions, and although they are very similar, the conditions under which they apply and the resulting morpheme rules are different.</Paragraph> <Paragraph position="8"> Presumably, if a single adaptive production for learning words and morphemes were introduced, it would take over the functions of both the prefix and suffix rules. If this approach can be successfully implemented, then the current reliance on pause information can be abandoned as well, since the pauses serve only to distinguish suffixes from prefixes.</Paragraph> <Paragraph position="9"> Such a reorganization would considerably simplify the theory, but it would also lead to two complications. First, the resulting system would tend to produce utterances like &quot;Daddy ed&quot; or &quot;the bounce&quot;, before it learned the correct conditions on morphemes through discrimination. (This problem is currently avoided by including information about the relation when a morpheme rule is first built, but this requires domain-specific knowledge about the language learning task.) Since children very seldom make such errors, some other mechanism must be found to explain their absence, or the model's ability to account for the observed phenomena will suffer. Second, if pause information (and the ability to take advantage of such information) is removed, the system will sometimes decide a prefix is a suffix and vice versa. For example, AMBER might construct a rule to say &quot;ing&quot; before the object of an event is described, rather than after the action has been mentioned.</Paragraph> <Paragraph position="10"> However, such variants would have little effect on the system's overall performance, since they would be weakened if they ever led to deviant utterances, and they would tend to be learned less often than the desired rules in any case. Thus, the strengthening and weakening processes would tend to direct search through the space of rules toward the correct segmentation, even in the absence of pause information.</Paragraph> <Paragraph position="11"> 6.4. Mastering Irregular Constructions Another of AMBER'S limitations lies in its inability to learn irregular constructions such as &quot;men&quot; and &quot;ate&quot;. However, by combining discrimination and the approach to learning word/concept links described above, future implementations should fare much better along this dimension. For example, consider the irregular noun &quot;foot&quot;, which forms the plural &quot;feet&quot;. Given a mechanism for connecting words and concepts, AMBER might initially form a rule connecting the concept *foot to the word &quot;foot&quot;. After gaining sufficient strength, this rule would say &quot;foot&quot; whenever seeing an example of the concept *foot. Upon encountering an occurrence of &quot;feet&quot;, the system would note the error of commission and call on discrimination. This would lead to a variant rule that produced &quot;foot&quot; only when a singular marker was present. Also, a new rule connecting *foot to &quot;feet&quot; would be created. Eventually, this new rule would also lead to errors of commission, and a variant with a plural condition would come to replace it.</Paragraph> <Paragraph position="12"> Dealing with the rule for producing the plural marker &quot;s&quot; would be somewhat more difficult.
Although AMBER might initially learn to say &quot;foot&quot; and &quot;feet&quot; under the correct circumstances, it would eventually learn the general rule for saying &quot;s&quot; after plural agents and objects. This would lead to constructions such as &quot;feet s&quot;, which have been observed in children's utterances. The system would have no difficulty in detecting such errors of commission, but the appropriate response is not so clear. Conceivably, AMBER could create variants of the &quot;s&quot; rule which stated that the concept to be described must not be *foot. However, a similar condition would also have to be included for every situation in which irregular pluralization occurred (deer, man, cow, and so on). Similar difficulties arise with irregular constructions for the past tense. A better solution would have AMBER construct a special rule for each irregular word, which &quot;imagined&quot; that the inflection had already been said. Once these productions became stronger than the &quot;s&quot; and &quot;ed&quot; rules, they would prevent the latter's application and bypass the regular constructions in these cases.</Paragraph> <Paragraph position="13"> Overly general constructions like &quot;foot s&quot; constitute a related form of error. Although AMBER would generate such mistakes before the irregular form was mastered, it would not revert to the overgeneral regular construction at a later point, as do many children. The area of irregular constructions is clearly a phenomenon that deserves more attention in the future.</Paragraph> </Section> </Section> </Paper>