<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1308">
  <Title>Modelling syntactic development in a cross-linguistic context</Title>
  <Section position="3" start_page="53" end_page="54" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Children acquiring the syntax of their native language are faced with a task of considerable complexity, which they must solve using only noisy and potentially inconsistent input. Mainstream linguistic theory has addressed this learnability problem by proposing the nativist hypothesis that children come into the world with rich innate knowledge about language and grammar (Chomsky, 1981; Piattelli-Palmarini, 2002; Pinker, 1984).</Paragraph>
    <Paragraph position="1"> However, there is also strong empirical evidence that the amount of information present in the input is considerably greater than has traditionally been assumed by the nativist approach. In particular, computer simulations have shown that a distributional analysis of the statistics of the input can provide a significant amount of syntactic information (Redington &amp; Chater, 1997).</Paragraph>
    <Paragraph position="2"> One limitation of the distributional approach is that analyses have rarely been done with naturalistic input (e.g., mothers' child-directed speech) and have so far not been linked to the detailed analysis of a linguistic phenomenon found in human data (e.g., Christiansen &amp; Chater, 2001). Indeed, neither the nativist nor the distributional approach has been developed to the point of providing detailed and quantitative predictions about the developmental dynamics of the acquisition of language. In order to remedy this weakness, our group has recently been exploring a different approach. This approach, which we think is a more powerful way of understanding how children acquire their native language, has involved developing a computational model (MOSAIC; Model Of Syntax Acquisition In Children) that learns from naturalistic input, and produces utterances that can be directly compared with the utterances of language-learning children.</Paragraph>
    <Paragraph position="3"> This makes it possible to derive quantitative predictions about empirical phenomena observed in children learning different languages and about the developmental dynamics of these phenomena.</Paragraph>
    <Paragraph position="4"> MOSAIC, which is based upon a simple distributional analyser, has been used to simulate a number of phenomena in language acquisition.</Paragraph>
    <Paragraph position="5"> These include: the verb-island phenomenon (Gobet &amp; Pine, 1997; Jones, Gobet, &amp; Pine, 2000); negation errors in English (Croker, Pine, &amp; Gobet, 2003); patterns of pronoun case marking error in English (Croker, Pine, &amp; Gobet, 2001); patterns of subject omission error in English (Freudenthal, Pine, &amp; Gobet, 2002b); and the optional-infinitive phenomenon (Freudenthal, Pine, &amp; Gobet, 2001, 2002a, 2003). MOSAIC has also been used to simulate data from three different languages (English, Dutch, and Spanish), which has helped us to understand how these phenomena are affected by differences in the structure of the language that the child is learning.</Paragraph>
    <Paragraph position="6"> In this paper, we illustrate our approach by showing how MOSAIC can account in detail for the optional-infinitive phenomenon in two languages (English and Dutch) and its quasi-absence in a third language (Spanish). This phenomenon is of particular interest as it has generally been taken to reflect innate grammatical knowledge on the part of the child (Wexler, 1994, 1998).</Paragraph>
    <Paragraph position="7"> We begin by highlighting the theoretical challenges faced in applying our model to data from three different languages. Then, after describing the optional-infinitive phenomenon, we describe MOSAIC, with an emphasis on the mechanisms that will be crucial in explaining the empirical data. We then consider the data from the three languages, and show to what extent the same model can simulate these data. When dealing with English, we describe the methods used to collect and analyse children's data in some detail. While these details may seem out of place in a conference on computational linguistics, we emphasise that they are critical to our approach: first, our approach requires fine-grained empirical data, and, second, the analysis of the data produced by the model is as close as possible to that used with children's data.</Paragraph>
    <Paragraph position="8"> We conclude by discussing the implications of our approach for developmental psycholinguistics.</Paragraph>
    <Paragraph position="9"> 2 Three languages: three challenges The attempt to use MOSAIC to model data in three different languages involves facing up to a number of challenges, each of which is instructive for different reasons. An obvious problem when modelling English data is that English has an impoverished system of verb morphology that makes it difficult to determine which form of the verb a child is producing in any given utterance. This problem militates against conducting objective quantitative analyses of children's early verb use and has resulted in there being no detailed quantitative description of the developmental patterning of the optional-infinitive phenomenon in English (in contrast to other languages like Dutch). We have addressed this problem by using exactly the same (automated) methods to classify the utterances produced by the child and by the model.</Paragraph>
    <Paragraph position="10"> These methods, which do not rely on the subjective judgment of the coder (e.g., on Bloom's, 1970, method of rich interpretation), proved to be sufficiently powerful to capture the development of the optional infinitive in English, and to do so at a relatively fine level of detail.</Paragraph>
    <Paragraph position="11"> One potential criticism of these simulations of English is that we may have tuned the model's parameters in order to optimise the goodness of fit to the human data. An obvious consequence of over-fitting the data in this way would be that MOSAIC's ability to simulate the phenomenon would break down when the model was applied to a new language. The simulations of Dutch show that this is not the case: with this language, which has a richer morphology than English, the model was still able to reproduce the key characteristics of the optional-infinitive stage.</Paragraph>
    <Paragraph position="12"> Spanish, the syntax of which is quite different to English and Dutch, offered an even more sensitive test of the model's mechanisms. The Dutch simulations relied heavily on the presence of compound finites in the child-directed speech used as input.</Paragraph>
    <Paragraph position="13"> However, although Spanish child-directed speech has a higher proportion of compound finites than Dutch, children learning Spanish produce optional-infinitive errors less often than children learning Dutch. Somewhat counter-intuitively, the simulations correctly reproduce the relative scarcity of optional-infinitive errors in Spanish, showing that the model is sensitive to subtle regularities in the way compound finites are used in Dutch and Spanish.</Paragraph>
  </Section>
</Paper>