<?xml version="1.0" standalone="yes"?>
<Paper uid="J94-3007">
  <Title>The Acquisition of Stress: A Data-Oriented Approach</Title>
  <Section position="2" start_page="0" end_page="423" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Metrical Phenomena and Theory
</SectionTitle>
      <Paragraph position="0"> Machine learning of metrical phenomena is an interesting domain for exploring the potential of particular machine learning techniques. First of all, the assignment of stress in monomorphemic words, the subject of this paper, has been fairly well studied in metrical phonology. Within this framework, the stress patterns of numerous languages have been described in considerable detail. Thus, a solid theoretical framework as well as elaborate descriptions of the linguistic data are available. Moreover, learning metrical phenomena has been cast in terms of the Principles and Parameters approach (Chomsky 1981), which both provides the basic parameters along which possible stress systems may vary and makes strong claims about the allegedly innate knowledge of the natural language learner.</Paragraph>
      <Paragraph position="1"> Secondly, the domain of metrical phenomena can be studied as a (relatively) independent problem domain (unlike other domains such as, for instance, linguistic pragmatics, that typically have multiple dependencies with other domains like syntactic and/or semantic phenomena). [Footnotes: * Institute for Language Technology and AI (ITK), Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands. E-mail: walter.daelemans@kub.nl. † Center for Dutch Language and Speech, University of Antwerp, UIA, Universiteitsplein 1, 2610 Wilrijk, Belgium. E-mail: steven.gillis@uia.ac.be. ‡ Center for Dutch Language and Speech, University of Antwerp, UIA, Universiteitsplein 1, 2610 Wilrijk, Belgium. E-mail: gert.durieux@uia.ac.be. © 1994 Association for Computational Linguistics. Computational Linguistics, Volume 20, Number 3.]</Paragraph>
      <Paragraph position="2"> Thirdly, metrical phenomena exhibit a number of interesting characteristics that make them well suited for testing the capacity of machine learning algorithms to generalize as well as to handle irregularities. On the one hand, stress assignment appears to be governed by a number of solid generalizations. For instance, we found that in a lexicon of 4868 Dutch polysyllabic monomorphemic words (for details see Section 2.1), approximately 80% are regular according to a generally accepted metrical analysis (Trommelen and Zonneveld 1989, 1990). The remaining 20% have to be dealt with in terms of idiosyncratic marking (such as, for instance, exception features or simply a marking of the irregular pattern in the lexicon). On the other hand, the domain exhibits a large number of local ambiguities; in other words, it can be said to be noisy. For instance, a metrical encoding of the items in the aforementioned lexicon (using syllable weights; see below) revealed that only 44 of the 89 attested combinations of syllable weights were unambiguous with respect to stress assignment.</Paragraph>
      <Paragraph position="3"> In sum, it can readily be seen that the microcosm of metrical phonology is endowed with generalizations as well as irregularities, a phenomenon characteristic of the macrocosm of the linguistic system in general.</Paragraph>
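      The notion of an unambiguous combination of syllable weights (44 of the 89 attested combinations, as reported above) can be illustrated with a minimal sketch. The toy lexicon, the weight labels "L" and "H", and the function name are hypothetical and are not taken from the paper's data or encoding; the sketch only shows one way such a count might be computed, assuming each word is reduced to a weight pattern and a stress position.

```python
from collections import defaultdict

# Hypothetical toy data: (syllable-weight pattern, index of the stressed syllable).
# The labels "L" (light) and "H" (heavy) and all entries are illustrative only.
TOY_LEXICON = [
    (("L", "H"), 1),
    (("L", "H"), 1),
    (("H", "L"), 0),
    (("H", "L"), 1),
    (("L", "L", "H"), 2),
]


def count_unambiguous(lexicon):
    """Return (number of unambiguous patterns, number of attested patterns)."""
    stresses_by_pattern = defaultdict(set)
    for pattern, stress in lexicon:
        stresses_by_pattern[pattern].add(stress)
    # A pattern is unambiguous when every word sharing it has the same stress position.
    unambiguous = sum(1 for s in stresses_by_pattern.values() if len(s) == 1)
    return unambiguous, len(stresses_by_pattern)


print(count_unambiguous(TOY_LEXICON))  # prints (2, 3)
```

      In this toy lexicon the pattern ("H", "L") occurs with two different stress positions, so it counts as ambiguous, while the other two patterns each map to a single position.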
    </Section>
    <Section position="2" start_page="0" end_page="423" type="sub_section">
      <SectionTitle>
1.2 Machine Learning of Metrical Phenomena
</SectionTitle>
      <Paragraph position="0"> Recently, computational learning models that specifically address the problem of learning the regularities of stress assignment have been proposed. These include Gupta and Touretzky (1994), Dresher and Kaye (1990), Dresher (1992), and Nyberg (1991). We will briefly review these models in this section.</Paragraph>
      <Paragraph position="1"> Dresher and Kaye (1990) and Nyberg (1991) approach the learning problem from the angle of the Principles and Parameters framework (Chomsky 1981), and they explicitly incorporate the constructs of that theory into their models. It is assumed in this approach that the learner comes to the task of language learning equipped with a priori knowledge incorporated in a universal grammar that constrains him or her to entertain only useful generalizations. More specifically, the a priori knowledge consists of a finite set of parameters, the values of which have to be fixed by the learner. Starting from a finite set of parameters, each with a finite set of possible values, the number of possible grammars that can be developed by the learner is restricted to a finite set.</Paragraph>
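      The finiteness argument above can be made concrete with a small sketch: if every parameter has a finite value set, the number of candidate grammars is the product of the value-set sizes. The parameter names and values below are hypothetical placeholders chosen for illustration; they are not the parameter inventory of Dresher and Kaye (1990) or Nyberg (1991).

```python
from itertools import product
from math import prod

# Hypothetical parameters with hypothetical value sets, for illustration only.
PARAMETERS = {
    "foot_headedness": ("left", "right"),
    "quantity_sensitive": (True, False),
    "word_tree_direction": ("left", "right"),
}


def count_grammars(parameters):
    """Number of distinct settings: the product of the value-set sizes."""
    return prod(len(values) for values in parameters.values())


def enumerate_grammars(parameters):
    """Yield every complete parameter setting as a dict."""
    names = list(parameters)
    for combo in product(*(parameters[name] for name in names)):
        yield dict(zip(names, combo))


print(count_grammars(PARAMETERS))  # prints 8
```

      With three binary parameters the learner's hypothesis space contains 2 x 2 x 2 = 8 grammars; adding parameters multiplies the space but never makes it infinite.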
      <Paragraph position="2"> Computational models such as Dresher and Kaye's (1990) add a learning theory to the (linguistic) notion of universal grammar. This learning theory specifies which aspects of the input data are relevant to each parameter, and it determines how the data processed by the learner are to be used to set the values of the parameters.</Paragraph>
      <Paragraph position="3"> Eventually, the learner will be able to stress input words, and in doing so will build metrical structures and perform the structure-sensitive operations defined by metrical theory.</Paragraph>
      <Paragraph position="4"> Gupta and Touretzky (1994) tackle the problem of learning linguistic stress from a different angle: a simple two-layer perceptron is used as the learning device. In their perceptron model there is no explicit representation of the notion of parameter or the process of parameter setting in any sense. Their system does not aim at setting the correct values of parameters given a learning theory especially designed to do so: &quot;the learning theory employed consists of one of the general learning algorithms common in connectionist modelling&quot; (p. 4). Moreover, their system does not build metrical representations in the sense proposed in metrical theory when determining the stress pattern of a particular word. Thus, learning in the perceptron is not related in any obvious way to setting the values of parameters that specify the precise geometry of metrical trees. Nor is producing the stress pattern of a particular word related in any obvious way to the construction of a metrical tree and to structure-sensitive metrical operations. [Running head: Walter Daelemans, Gert Durieux, and Steven Gillis, The Acquisition of Stress]</Paragraph>
      <Paragraph position="5"> The learning material for Gupta and Touretzky's perceptron consists of the stress patterns of 19 languages [footnote 1]. It appears that the learning times for the stress patterns vary according to several dimensions: they describe six dimensions that act as determinants of learnability. For instance, it will take longer for the perceptron to learn the stress pattern of a language that incorporates the factor 'inconsistent primary stress' than to learn a language that does not show that feature. These factors, or so to speak 'parameters', do not coincide with the parameters proposed in metrical theory.</Paragraph>
      <Paragraph position="6"> However, it is pointed out that there is a close correspondence between ease of learning in the perceptron (as measured by learning times) and some of the markedness and (un)learnability predictions of metrical theory.</Paragraph>
      <Paragraph position="7"> The simulations of Gupta and Touretzky show that data-oriented acquisition of stress assignment is possible. Moreover, in observing the perceptron learn stress systems, a number of factors are discovered that appear to determine the learning process. This account of the behavior of the model is termed a 'pseudo-linguistic' theory, and some interesting parallels with metrical phonology are drawn. The crucial point is, however, that the perceptron is not equipped with a priori knowledge about the domain, nor with a specifically designed learning theory.</Paragraph>
      <Paragraph position="8"> There are some drawbacks to the simulations presented by both Dresher and Kaye and Gupta and Touretzky. One of the main objections is that they use highly simplified versions of the linguistic data, i.e., small samples encoded using syllable weight only, and without attention to irregularities. Such highly stylized characterizations of stress systems may well capture the core of a language system, but a processing model that aims at learning the stress system of a language should go further. It should also deal with the noise in the actual linguistic data, the irregularities, and the plain exceptions. Gupta and Touretzky (1994:27) appear to be aware of this limitation in their approach: &quot;It could be argued that a theoretical account is a descriptive formalism, which serves to organize the phenomena by abstracting away from the exceptions in order to reveal an underlying regularity, and that it is therefore a virtue rather than a failing of the theoretical analysis that it ignores 'performance' considerations. However, it becomes difficult to maintain this with respect to a processing model that uses the descriptive formalism as its basis: the processing or learning account still has to deal with actual data and actual performance phenomena.&quot; The research reported in this paper aims at exploring the potential of a learning algorithm that shares the data-oriented (empiricist) mode of learning with the perceptron used in the simulation experiments discussed above, instead of adopting the nativist approach exemplified by the research of Dresher and Kaye (1990). The learning material consists of a lexicon that contains a substantial number of the attested monomorphemic multisyllabic words of Dutch (see Section 2.1). [Footnote 1: These are the stress patterns of the languages also used by Dresher and Kaye (1990). They represent a selection of the possible stress systems along a variety of metrical dimensions.]</Paragraph>
      <Paragraph position="9"> In this learning material, the details of the stress system are not simplified to arrive at a regularized description of the system. Instead, it actually contains the patterns we may expect a language learner to be confronted with.</Paragraph>
      <Paragraph position="10"> First, we show that a data-driven alternative to the Principles and Parameters approach is feasible, given a set of examples of a language, in this case Dutch. It is shown that (i) the major generalizations governing main stress assignment can be acquired, as well as the major classes of subregularities; and (ii) the kind of a priori knowledge assumed in the Principles and Parameters approach appears to be unnecessary, even to the extent that the less 'theoretical bias' is encoded in the input, the better the learning results are. More specifically, experimental results unequivocally indicate that a phonemic input encoding yields superior results to an encoding in which only the phonological notion of syllable weight is represented.</Paragraph>
      <Paragraph position="11"> Secondly, the correspondences of our learning results with metrical theory will be studied: the results of the simulations reveal interesting correlations between learnability by the artificial learner, and markedness in a metrical framework.</Paragraph>
      <Paragraph position="12"> Finally, the algorithm's own classification of the test words is analyzed. The algorithm discovers subregularities in the data that are not expressible in metrical terms. Instead, it uses the phonemic material presented to form subcategories that act as homogeneous classes with respect to stress assignment. This finding suggests that metrical theory could benefit from incorporating segmental information in order to arrive at a more complete description of the data.</Paragraph>
      <Paragraph position="13"> The remainder of the paper is organized as follows: we first present the most relevant facts about the stress system of Dutch, together with a metrical analysis of that system. Next, the artificial learning algorithm is introduced, followed by a discussion of the experimental results.</Paragraph>
    </Section>
  </Section>
</Paper>