<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1053">
  <Title>A Word-Order Database for Testing Computational Models of Language Acquisition</Title>
  <Section position="2" start_page="0" end_page="1" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The exact process by which a child acquires the grammar of his or her native language is one of the most beguiling open problems of cognitive science. There has been recent interest in computer simulation of the acquisition process and the interrelationship between such models and linguistic and psycholinguistic theory. The hope is that through computational study, certain bounds can be established which may be brought to bear on pivotal issues in developmental psycholinguistics.</Paragraph>
    <Paragraph position="1"> Simulation research is a significant departure from standard learnability models that provide results through formal proof (e.g., Bertolo, 2001; Gold, 1967; Jain et al., 1999; Niyogi, 1998; Niyogi &amp; Berwick, 1996; Pinker, 1979; Wexler &amp; Culicover, 1980, among many others). Although research in learnability theory is valuable and ongoing, formal modeling of language acquisition has several disadvantages: * Certain proofs may involve impractically many steps for large language domains (e.g., those involving Markov methods).</Paragraph>
    <Paragraph position="2"> * Certain paradigms are too complex to readily lend themselves to deductive study (e.g., connectionist models; although see Niyogi, 1998, for some insight).</Paragraph>
    <Paragraph position="3"> * Simulations provide data on intermediate stages, whereas formal proofs typically establish whether a domain is (or, more often, is not) learnable a priori, before any specific trials.</Paragraph>
    <Paragraph position="4"> * Proofs generally require simplifying assumptions which are often distant from natural language. However, simulation studies are not without disadvantages and limitations. Perhaps most notable is that, out of practicality, simulations are typically carried out on small, severely circumscribed domains, usually just large enough to allow the researcher to home in on how a particular model (e.g., a connectionist network or a principles &amp; parameters learner) handles a few grammatical features (e.g., long-distance agreement and/or topicalization), often, though not always, in a single language. So although there have been many successful studies that demonstrate how one algorithm or another is able to acquire some aspect of grammatical structure, there is little doubt that the question of what mechanism children actually employ during the acquisition process is still open.</Paragraph>
    <Paragraph position="5"> This paper reports the development of a large, multilingual database of sentence patterns, grammars and derivations that may be used to test computational models of syntax acquisition from widely divergent paradigms. The domain is generated from grammars that are linguistically motivated by current syntactic theory, and the sentence patterns have been validated as psychologically/developmentally plausible by checking their frequency of occurrence in corpora of child-directed speech. We report here the structure of the domain, its interface and a case study that demonstrates how the domain has been used to test the feasibility of several different acquisition strategies. The domain is currently publicly available on the web via http://146.95.2.133 and it is our hope that it will prove to be a valuable resource for investigators interested in computational models of natural language acquisition.</Paragraph>
  </Section>
</Paper>