<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-4003">
<Title>Learning Bias and Phonological-Rule Induction</Title>
<Section position="2" start_page="0" end_page="499" type="abstr">
<SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Nativist models suggest that learning in a complex domain like natural language requires that the learning mechanism either have some previous knowledge about language or some learning bias that helps direct the formation of correct generalizations. In linguistics, theories of such prior knowledge are referred to as Universal Grammar (UG); nativist linguistic models of learning assume, implicitly or explicitly, that some kind of prior knowledge that contributes to language learning is innate, a product of evolution. Despite sharing this assumption, nativist researchers disagree strongly about the exact constitution of this Universal Grammar. Many models, for example, assume that much of the prior knowledge that children bring to bear in learning language is not linguistic at all, but derives from constraints imposed by our general cognitive architecture. Others, such as the influential Principles and Parameters model (Chomsky 1981), assert that what is innate is linguistic knowledge itself, and that the learning process consists mainly of searching for the values of a relatively small number of parameters. Such nativist models of phonological learning include, for example, Dresher and Kaye's (1990) model of the acquisition of stress-assignment rules, and Tesar and Smolensky's (1993) model of learning in Optimality Theory.</Paragraph>
<Paragraph position="1"> Other scholars have argued that a purely nativist, parameterized learning algorithm is incapable of dealing with the noise, irregularity, and great variation of human language data, and that a more empiricist learning paradigm is possible. Such data-driven models include the stress acquisition models of Daelemans, Gillis, and Durieux (1994) (an application of Instance-based Learning [Aha, Kibler, and Albert 1991]) and Gupta and Touretzky (1994) (an application of Error Back-Propagation), as well as Ellison's (1992) Minimum-Description-Length-based model of the acquisition of the basic concepts of syllabicity and the sonority hierarchy. In each of these cases, a general, domain-independent learning rule (BP, IBL, MDL) is used to learn directly from the data.</Paragraph>
<Paragraph position="2"> In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. Such biases are implicit, for example, in the work of Riley (1991) and Withgott and Chen (1993), who induced decision trees to predict the realization of a phone in its context.</Paragraph>
<Paragraph position="3"> By initializing the decision-tree inducer with a set of phonological features, they essentially gave it a priori knowledge about the kind of phonological generalizations that the system might be expected to learn.</Paragraph>
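To make the feature-initialized decision-tree idea just described concrete, here is a minimal toy sketch, not the actual Riley (1991) or Withgott and Chen (1993) systems: each occurrence of an underlying phone is encoded with a small hand-chosen set of phonological features of its context, so the inducer's splits range over natural classes rather than raw phone symbols. The feature inventory, the tiny data set, and the use of scikit-learn as the inducer are all illustrative assumptions.

```python
# Toy sketch (assumed features and data, not the original systems):
# predict the surface realization of an underlying /t/ from phonological
# features of its neighbours, so that learned splits correspond to
# natural classes such as "preceded by a stressed vowel".
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ["left_is_vowel", "left_is_stressed",
            "right_is_vowel", "right_is_stressed"]

# Each row encodes the context of one /t/ token; labels are its surface form.
X = [
    [1, 1, 1, 0],   # stressed vowel _ unstressed vowel   (e.g., "butter")
    [1, 1, 1, 0],   # same context, another token
    [1, 0, 1, 1],   # unstressed vowel _ stressed vowel   (e.g., "attack")
    [0, 0, 1, 1],   # consonant _ stressed vowel
]
y = ["dx", "dx", "t", "t"]      # "dx" = flap

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=FEATURES))
```

The point of the encoding is the bias it builds in: the tree can only split on phonological properties, so whatever generalization it finds is stated in roughly the vocabulary a phonologist would use.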
<Paragraph position="4"> Our idea is that abstract biases from the domain of phonology, whether innate (i.e., part of UG) or merely learned prior to the learning of rules, can be used to guide a domain-independent empirical induction algorithm. We test this idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules (Chomsky and Halle 1968), beginning by representing phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. Johnson (1972) first observed that traditional phonological rewrite rules can be expressed as regular (finite-state) relations if one accepts the constraint that no rule may reapply directly to its own output. This means that finite-state transducers (FSTs) can be used to represent phonological rules, greatly simplifying the problem of parsing the output of phonological rules in order to obtain the underlying, lexical forms (Koskenniemi 1983; Karttunen 1993; Pulman and Hepple 1993; Bird 1995; Bird and Ellison 1994). The fact that the weaker generative capacity of FSTs makes them easier to learn than arbitrary context-sensitive rules has allowed the development of a number of learning algorithms, including those for deterministic finite-state automata (FSAs) (Freund et al. 1993) and deterministic transducers (Oncina, Garcia, and Vidal 1993), as well as nondeterministic (stochastic) FSAs (Stolcke and Omohundro 1993; Stolcke and Omohundro 1994; Ron, Singer, and Tishby 1994). Like the empiricist models discussed above, these algorithms are all general-purpose; none includes any domain knowledge about phonology, or indeed natural language; at most they include a bias toward simpler models (like the MDL-inspired algorithms of Ellison [1992]).</Paragraph>
<Paragraph position="5"> Our experiments were based on the OSTIA algorithm (Oncina, Garcia, and Vidal 1993), which learns general subsequential finite-state transducers (SFSTs; formally defined in Section 2). We presented pairs of underlying and surface forms to OSTIA and examined the resulting transducers. Although OSTIA is capable of learning arbitrary SFSTs in the limit, large dictionaries of actual English pronunciations did not provide enough samples to induce phonological rules correctly.</Paragraph>
<Paragraph position="6"> We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology and are assumed, explicitly or implicitly, by every theory of phonology: faithfulness (underlying segments tend to be realized similarly on the surface), community (similar segments behave similarly), and context (phonological rules need access to variables in their context). These biases are so fundamental to generative phonology that they are left implicit in many theories. But explicitly modifying the OSTIA algorithm with these biases allowed it to learn more compact, accurate, and general transducers, and our implementation successfully learns a number of rules from English and German. The algorithm is also successful in learning the composition of multiple rules applied in series. The more difficult problem of decomposing the learned underlying/surface correspondences into simple, individual rules remains unsolved.</Paragraph>
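As a concrete illustration of the transducer representation introduced above, the following is a minimal sketch (ours, not the paper's implementation and not OSTIA itself) of a heavily simplified American English flapping rule, "t becomes a flap between a stressed vowel and an unstressed vowel", written as a subsequential transducer that maps underlying phone sequences to surface forms. The ARPAbet-style phone symbols and the vowel inventories are illustrative assumptions; pairs produced this way are the kind of underlying/surface training data presented to OSTIA.

```python
# Minimal sketch of simplified flapping as a subsequential transducer.
# States: 0 = default, 1 = just saw a stressed vowel, 2 = 't' output pending.
# The vowel inventories below are assumed, illustrative subsets.
STRESSED_VOWELS = {"aa1", "ae1", "eh1", "iy1", "uw1"}
UNSTRESSED_VOWELS = {"ax", "er0", "ih0"}

def flap(underlying):
    """Map an underlying phone sequence to its surface form."""
    out, state = [], 0
    for phone in underlying:
        if state == 2:
            # Resolve the pending 't' using its right-hand context.
            out.append("dx" if phone in UNSTRESSED_VOWELS else "t")
            state = 0
        if phone == "t" and state == 1:
            state = 2           # delay output: flapping depends on the next phone
        else:
            out.append(phone)
            state = 1 if phone in STRESSED_VOWELS else 0
    if state == 2:
        out.append("t")         # word-final 't' never flaps in this sketch
    return out

# Underlying/surface pairs of the kind a learner would be trained on:
assert flap(["b", "ae1", "t", "er0"]) == ["b", "ae1", "dx", "er0"]   # flapped
assert flap(["ax", "t", "ae1", "k"]) == ["ax", "t", "ae1", "k"]      # unchanged
```

The delayed output on 't' is what makes the relation subsequential rather than sequential: the correct surface symbol cannot be emitted until the right-hand context has been seen, which is also why the context bias described above matters for induction.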
<Paragraph position="7"> Our transducer induction algorithm is not intended as a cognitive model of human phonological learning. First, for reasons of simplicity, we base our model on simple segmental SPE-style rules; it is not clear how these rules correspond formally to the more recent theoretical machinery of phonology (e.g., optimality constraints). Second, we assume that a cognitive model of automaton induction would be more stochastic, and hence more robust, than the OSTIA algorithm underlying our work. 1 Rather, our model is intended to suggest the kinds of biases that may be added to empiricist induction models to build a learning model for phonological rules that is cognitively and computationally plausible. Furthermore, our model is not necessarily nativist; these biases may be innate, but they may also be the product of some other, earlier learning algorithm, as the results of Ellison (1992) and Brown et al. (1992) suggest (see Section 5.2). Our results thus suggest that building into the system some very general and fundamental properties of phonological knowledge (whether innate or previously learned) and learning others empirically may provide a basis for future learning models.</Paragraph>
<Paragraph position="8"> Ellison (1994), for example, has shown how to map the optimality constraints of Prince and Smolensky (1993) to finite-state automata; given this result, models of automaton induction enriched in the way we suggest may contribute to the current debate on optimality learning. This may obviate the need to build in every phonological constraint, as, for example, nativist models of OT learning suggest (Prince and Smolensky 1993; Tesar and Smolensky 1993; Tesar 1995). We hope in this way to begin to assess the role of computational phonology in answering the general question of the necessity and nature of linguistic innateness in learning. 1 Although our assumption of the simultaneous presentation of surface and underlying forms to the learner may seem at first glance to be unnatural as well, it is quite compatible with certain theories of word-based morphology. For example, in the word-based morphology of Aronoff (1976), word-formation rules apply only to already existing words; thus the underlying form for any morphological rule must be a word of the language. Even if this word-based morphology assumption holds only for a subset of the language (see, e.g., Orgun [1995]), it is not unreasonable to assume that part of the learning process will involve previously identified underlying/surface pairs.</Paragraph>
<Paragraph position="9"> The next sections (2 and 3) introduce the idea of representing phonological rules with transducers and describe the OSTIA algorithm for inducing such transducers.</Paragraph>
<Paragraph position="10"> Section 4 shows that the unaugmented OSTIA algorithm is unable to induce the correct transducer for the simple flapping rule of American English. Section 5 then describes each of the augmentations to OSTIA, based on the faithfulness, community, and context principles. We conclude with some observations about computational complexity and the inherent bias of the context-sensitive rewrite-rule formalism.</Paragraph>
</Section>
</Paper>