<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3203">
  <Title>Learning Quantity Insensitive Stress Systems via Local Inference</Title>
  <Section position="2" start_page="0" end_page="24" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The central premise of this research is that phonotactic patterns are have properties which reflect properties of the learner. This paper illustrates this approach for quantity-insensitive (QI) stress systems (see below).</Paragraph>
    <Paragraph position="1"> I present an unsupervised batch learner that correctly learns every one of these languages. The learner succeeds because there is a universal prop-erty of QI stress systems which I refer to as neighborhood-distinctness (to be defined below).</Paragraph>
    <Paragraph position="2"> This property, which is a structural notion of locality, is used by the learning algorithm to successfully infer the target pattern from samples.</Paragraph>
    <Paragraph position="3"> A learner is a function from a set of observations to a grammar. An observation is some linguistic sign, in this case a word-sized sequence of stress values. A grammar is some device that must at least respond Yes or No when asked if a linguistic sign is a possible sign for this language (Chomsky and Halle, 1968; Halle, 1978).</Paragraph>
    <Paragraph position="4">  The remainder of the introduction outlines the typology of the QI stress systems, motivates representing phonotactics with regular languages, and examines properties of the attested patterns. In SS2, I define the class of neighborhood-distinct languages. The learning algorithm is presented in two stages. SS3 introduces a basic version of the learner the learner, which successfully acquires just under 90% of the target patterns. In SS4, one modification is made to this learner which consequently succeeds on all target patterns. SS5 discusses predictions made by these learning algorithms. The appendix summarizes the target patterns and results.</Paragraph>
    <Section position="1" start_page="1" end_page="21" type="sub_section">
      <SectionTitle>
1.1 Quantity-Insensitive Stress Systems
</SectionTitle>
      <Paragraph position="0"> Stress assignment in QI languages is indifferent to the weight of a syllable. For example, Latin is quantity-sensitive (QS) because stress assignment depends on the syllable type: if the penultimate syllable is heavy (i.e. has a long vowel or coda) then it receives stress, but otherwise the antepenult does.</Paragraph>
      <Paragraph position="1"> The stress systems under consideration here, unlike Latin, do not distinguish syllable types.</Paragraph>
      <Paragraph position="2">  In this respect, this work departs from (or is a special case of) gradient phonotactic models (Coleman and Pierrehumbert, 1997; Frisch et al., 2000; Albright, 2006; Hayes and Wilson, 2006)  There are 27 types of QI stress systems found in Gordon's (2002) typology. Gordon adds six plausibly attestable QI systems by considering the behavior of all-light-syllabled words from QS systems. These 33 patterns are divided into four kinds: single, dual, binary and ternary. Single systems have one stressed syllable per word, and dual systems up to two. Binary and ternary systems stress every second (binary) or third (ternary) syllable.</Paragraph>
      <Paragraph position="3"> The choice to study QI stress systems was made for three reasons. First, they are well studied and the typology is well established (Hayes, 1995; Gordon, 2002). Secondly, learning of stress systems has been approached before (Dresher and Kaye, 1990; Gupta and Touretzky, 1991; Goldsmith, 1994; Tesar, 1998) making it possible to compare learners and results.</Paragraph>
      <Paragraph position="4"> Third, these patterns have been analyzed with adjacency restrictions (e.g. no clash), as disharmony (e.g. a primary stress may not be followed by another), and with recurrence requirements (e.g. build trochaic feet iteratively from the left). Thus the patterns found in the QI stress systems are representative of other phonotactic domains that the learner should eventually be extended to.</Paragraph>
      <Paragraph position="5"> The 33 types are shown in Table 1. See Gordon (2002) and Hayes (1995) for details, examples, and original sources. Note that some patterns have a minimal word condition (Prince, 1980; Mc-Carthy and Prince, 1990; Hayes, 1995), banning either monosyllables or light monosyllables. For example, Cayuvava bans all monosyllables, whereas Hopi bans only light monosyllables. Because this paper addresses QI stress patterns I abstract away from the internal structure of the syllable. For convenience, when stress patterns are explicated in this paper I assume (stressed) monosyllables are permitted. The learning study, however, includes each stress pattern both with and without stressed monosyllables. Predictions our learner makes with respect to the minimal word condition are given in SS5.2.</Paragraph>
      <Paragraph position="6">  We use the (first) language name to exemplify the stress pattern. The number in parentheses is an index to the language Gordon's 2003 appendix. All stress representations follow Gordon's notation, who uses the metrical grid (Liberman and Prince, 1977; Prince, 1983). Thus, primary stress is indicated by 2, secondary stress by 1, and no stress by 0.</Paragraph>
    </Section>
    <Section position="2" start_page="21" end_page="22" type="sub_section">
      <SectionTitle>
1.2 Phonotactics as Regular Languages
</SectionTitle>
      <Paragraph position="0"> I represent phonotactic descriptions as regular sets, accepted by finite-state machines. A finite state machine is a 5-tuple (S,Q,q  ,F,d) where S is a finite alphabet, Q is a set of states, q  [?] Q is the start state, F [?] Q is a set of final states, and d is a set of transitions. Each transition has an origin and a terminus and is labeled with a symbol of the alphabet; i.e. a transition is a 3-tuple (o,a,t) where o,t [?] Q and a [?] S.</Paragraph>
      <Paragraph position="1"> Empirically, it has been observed that most phonological phenomena are regular (Johnson, 1972; Kaplan and Kay, 1981; Kaplan and Kay, 1994; Ellison, 1994; Eisner, 1997; Karttunen, 1998). This is especially true of phonotactics: reduplication and metathesis, which have higher complexity, are not phonotactic patterns as they involve alternations.  Formally, regular languages are widely studied in computer science, and their basic properties are well understood (Hopcroft et al., 2001). Also, a learning literature exists. E.g. the class of regular languages is not exactly identifiable in the limit (Gold, 1967), but certain subsets of it are (Angluin, 1980; Angluin, 1982). Thus it is becomes possible to ask: What subset of the regular languages delimits the class of possible human phonotactics and can properties of this class be exploited by a learner? This perspective also connects to finite state models of Optimality Theory (OT) (Prince and Smolensky, 1993). Riggle (2004) shows that if OT constraints are made finite-state, it is possible to build a transducer that takes any input to a grammatical output. Removing from this transducer the input labels and hidden structural symbols (such as foot boundaries) in the output labels yields a phonotactic acceptor for the language, a target for our learner. Consider Pintupi, #26 in Table 1, which exemplifies a binary stress pattern. Its phonotactic grammar is given in Figure 1. The hexagon indicates the start state, and final states are marked by the double perimeter.</Paragraph>
      <Paragraph position="2"> This machine accepts the Pintupi words, but not other words of the same length. Also, the Pintupi grammar accepts an infinite number of wordsjust like the grammars in Hayes (1995) and Gordon  See Albro (1998; 2005) for restricted extensions to regular languages.</Paragraph>
      <Paragraph position="3">  Single Systems 1. (1) Chitimacha 20000000 2000000 200000 20000 2000 200 20 2 2. (2) Lakota 02000000 0200000 020000 02000 0200 020 02 2 3. (3) Hopi (qs) 02000000 0200000 020000 02000 0200 020 20 2 4. (4) Macedonian 00000200 0000200 000200 00200 0200 200 20 2 5. (5) Nahuatl / Mohawk  (2002), who take the observed forms as instances of a pattern that extends to longer words. The learner's task is to take the Pintupi words in Table 1 and return the pattern represented by Figure 1.</Paragraph>
    </Section>
    <Section position="3" start_page="22" end_page="24" type="sub_section">
      <SectionTitle>
1.3 Properties of QI Stress Patterns
</SectionTitle>
      <Paragraph position="0"> The deterministic acceptor with the fewest states for a language is called the language's canonical acceptor. Therefore, let us ask what properties the canonical acceptors for the 33 stress types have in common that might be exploited by a learner.</Paragraph>
      <Paragraph position="1"> One property shared by all grammars except Estonian is that they have exactly one loop (Estonian has two). Though this restriction is nontrivial, it is insufficient for learning to be guaranteed.</Paragraph>
      <Paragraph position="2">  A second shared property is slenderness. A machine is slender iff it accepts only one word of length n. The only exceptions to this are Walmatjari and Estonian, which have free variation in longer words (see Table 1). I focus in this paper on another property which are shared by all machines without exception. In 29 of the canonical acceptors, each state can be uniquely identified by its incoming symbol set, its outgoing symbol set, and whether it is final or non-final. These items make up the neighborhood of a state, which will be formally defined in the next section.</Paragraph>
      <Paragraph position="3"> The other four stress systems have non-canonical acceptors wherein each state can also be uniquely identified by its neighborhood. This property I call neighborhood-distinctness. Thus, neighborhood-distinctness is a universal property of QI stress systems, and it is this property that the learner will exploit. null  The proof is similar to the one used to show the cofinite languages are not learnable (Osherson et al., 1986).  otherwise, I = {a|[?]o [?] Q, (o,a,q) [?] d}, and O = {a|[?]t [?] Q, (q,a,t) [?] d}  Thus the neighborhood of state can be determined by looking solely at whether or not it is final, the set of symbols labeling the transitions which reach that state, and the set of symbols labeling the transitions which depart that state. For example in Figure 2, states p and q have the same neighborhood because they are both nonfinal, can both be reached by some element of {a,b}, and because each state can only be exited by observing a member of {c,d}.</Paragraph>
      <Paragraph position="4">  Neighborhood-distinct acceptors are defined in (2).</Paragraph>
      <Paragraph position="5"> (2) An acceptor is said to be neighborhood-distinct iff no two states have the same neighborhood.</Paragraph>
      <Paragraph position="6"> This class of acceptors is finite: there are 2 2|S|+1 neighborhoods, i.e. types of states. Since each state in a neighborhood-distinct machine has a unique neighborhood, this becomes an upper bound on machine size.</Paragraph>
      <Paragraph position="7">  The notion of neighborhood can be generalized to neighborhoods of size k, where sets I and O are defined as the incoming and outgoing paths of length k. However, this paper is only concerned with neighborhoods of size 1.</Paragraph>
      <Paragraph position="8">  For some acceptor, the notion of neighborhood lends itself to an equivalence relation R</Paragraph>
      <Paragraph position="10"> q iff p and q have the same neighborhood. Therefore, R N partitions Q into blocks, and neighborhood-distinct machines are those where this partition equals the trivial partition.</Paragraph>
    </Section>
    <Section position="4" start_page="24" end_page="24" type="sub_section">
      <SectionTitle>
2.2 Neighborhood-Distinct Languages
</SectionTitle>
      <Paragraph position="0"> The class of neighborhood-distinct languages is defined in (3).</Paragraph>
      <Paragraph position="1"> (3) The neighborhood-distinct languages are those for which there is an acceptor which is neighborhood-distinct.</Paragraph>
      <Paragraph position="2"> The neighborhood-distinct languages are a (finite) proper subset of the regular languages over an alphabet S: all regular languages whose smallest acceptors have more than 2 2|S|+1 states cannot be neighborhood-distinct (since at least two states would have the same neighborhood).</Paragraph>
      <Paragraph position="3"> The canonically neighborhood-distinct languages are defined in (4).</Paragraph>
      <Paragraph position="4"> (4) The canonically neighborhood-distinct languages are those for which the canonical acceptor is neighborhood-distinct.</Paragraph>
      <Paragraph position="5"> The canonically neighborhood-distinct languages form a proper subset of the neighborhood-distinct languages. For example, the canonical acceptor shown in Figure 3 of Lower Sorbian (#9 in Table 1) is not neighborhood-distinct (states 2 and 3 have the same neighborhood). However, there is a non-canonical (because non-deterministic) neighborhood-distinct acceptor for this language, as shown in Figure 4.</Paragraph>
      <Paragraph position="6">  Neighborhood-distinctness is a universal property of the patterns under consideration. Additionally, it is a property which a learner can use to induce a grammar from surface forms.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>