<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1002">
  <Title>Automatic Induction of Finite State Transducers for Simple Phonological Rules</Title>
  <Section position="3" start_page="0" end_page="9" type="intro">
    <SectionTitle>
2 The OSTIA Algorithm
</SectionTitle>
    <Paragraph position="0"> Our phonological-rule induction algorithm is based on augmenting the Onward Subsequential Transducer Inference Algorithm (OSTIA) of Oncina et al. (1993). This section outlines the OSTIA algorithm to provide background for the modifications that follow.</Paragraph>
    <Paragraph position="1"> OSTIA takes as input a training set of input-output pairs. The algorithm begins by constructing a tree transducer which covers all the training samples. The root of the tree is the transducer's initial state, and each leaf of the tree corresponds to the end of an input sample.</Paragraph>
    <Paragraph position="2"> The output symbols are then placed as near the root of the tree as possible, while avoiding conflicts in the output of any given arc; this is the "onward" form referred to in OSTIA's name. An example of the result of this initial tree construction is shown in Figure 2.</Paragraph>
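The initial tree construction can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the representation used by Oncina et al.: samples are pairs of symbol tuples, each state is named by the input prefix that reaches it, and the samples are assumed consistent (one output per input). The helper names build_tree_transducer, common_prefix, make_onward and transduce are hypothetical. make_onward works bottom-up, moving the prefix shared by all outputs below each non-root state onto the arc entering that state, so a symbol only moves toward the root when every sample passing through that arc agrees on it. The phone strings are toy data, not the paper's training set.

```python
from collections import defaultdict


def build_tree_transducer(samples):
    """Build a tree transducer that covers exactly the (input, output) samples.

    States are named by the input prefix that reaches them, so () is the
    initial state (the root). Arcs start with empty output; each sample's
    whole output is attached to the state where its input ends.
    """
    arcs = defaultdict(dict)  # state -> {input symbol: [arc output, next state]}
    final = {}                # state -> output emitted when the input ends there
    for inp, out in samples:
        state = ()
        for sym in inp:
            nxt = state + (sym,)
            arcs[state].setdefault(sym, [(), nxt])
            state = nxt
        final[state] = tuple(out)
    return arcs, final


def common_prefix(seqs):
    """Longest common prefix of a non-empty list of tuples."""
    first, n = seqs[0], 0
    while all(len(s) > n and s[n] == first[n] for s in seqs):
        n += 1
    return first[:n]


def make_onward(arcs, final, state=()):
    """Move output symbols as near the root as possible, bottom-up.

    For every non-root state, the prefix shared by all outputs below it is
    removed there and returned, so the caller appends it to the output of the
    arc entering that state. Only symbols all samples agree on move up, so no
    arc ever receives conflicting output.
    """
    for arc in arcs.get(state, {}).values():
        arc[0] = arc[0] + make_onward(arcs, final, arc[1])
    outs = [arc[0] for arc in arcs.get(state, {}).values()]
    if state in final:
        outs.append(final[state])
    if state == () or not outs:
        return ()  # the root has no incoming arc to push output onto
    prefix = common_prefix(outs)
    k = len(prefix)
    for arc in arcs.get(state, {}).values():
        arc[0] = arc[0][k:]
    if state in final:
        final[state] = final[state][k:]
    return prefix


def transduce(arcs, final, inp):
    """Apply the transducer to an input; None if the input is not covered."""
    state, out = (), ()
    for sym in inp:
        if sym not in arcs.get(state, {}):
            return None
        arc_out, state = arcs[state][sym]
        out += arc_out
    return out + final[state] if state in final else None


# Toy samples (hypothetical): "bat", "batter" with a flap, and "band".
samples = [
    (("b", "ae", "t"), ("b", "ae", "t")),
    (("b", "ae", "t", "er"), ("b", "ae", "dx", "er")),
    (("b", "ae", "n", "d"), ("b", "ae", "n", "d")),
]
arcs, final = build_tree_transducer(samples)
make_onward(arcs, final)
print(transduce(arcs, final, ("b", "ae", "t", "er")))  # ('b', 'ae', 'dx', 'er')
print(transduce(arcs, final, ("b", "ae", "n")))        # None
```

On these toy pairs the arc leaving the root ends up emitting b ae, because every sample's output starts that way, while the flap dx stays on the arc for er, the first point at which the outputs disagree. transduce reproduces each training pair and rejects any input, such as b ae n, that the tree does not cover.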
    <Paragraph position="3"> At this point, the transducer covers all and only the strings of the training set. OSTIA now attempts to generalize the transducer by merging some of its states together. For each pair of states (s, t) in the transducer, the algorithm attempts to merge s with t, building a new state with all of the incoming and outgoing transitions of s and t.</Paragraph>
    <Paragraph position="5"> [Figure caption, truncated in extraction: Labels on arcs are of the form (input symbol):(output symbol). Labels with no colon indicate the same input and output symbols. 'V' indicates any unstressed vowel, 'v' any stressed vowel, 'dx' a flap, and 'C' any consonant other than 't', 'r' or 'dx'. '#' is the …] However, when trying to learn phonological rules from linguistic data, the necessary training set may not be available. In particular, systematic phonological constraints such as syllable structure may rule out the necessary strings. OSTIA also lacks the language bias that would allow it to avoid linguistically unnatural transducers.</Paragraph>
    <Paragraph position="7"> The result of the first merging operation on the transducer of Figure 2 is shown in Figure 3, and the end result of the OSTIA algorithm is shown in Figure 4.</Paragraph>
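The merge step can be sketched in Python as below. This is a simplified illustration, not the full OSTIA procedure: the dictionary representation and the names merge and transduce are our own, and integer state names with 0 as the initial state are assumptions. The merged state receives all incoming and outgoing arcs of s and t. Where s and t both have an arc on the same input symbol, this sketch requires the two arcs to carry identical output and then merges their target states too ("folding"), so the result stays deterministic; otherwise the merge is rejected. The full OSTIA also tries to reconcile differing outputs (its "pushback" step) before giving up, and it fixes the order in which state pairs are tried; neither is modeled here.

```python
def merge(arcs, final, s, t, start=0):
    """Merge state t into state s; return the new transducer or None on conflict.

    arcs: {state: {input symbol: (output tuple, next state)}}
    final: {state: output tuple emitted when the input ends in that state}
    The merged state gets all incoming and outgoing arcs of s and t. Where
    both have an arc on the same input symbol, the outputs must match and the
    two target states are merged in turn, keeping the result deterministic.
    """
    arcs = {q: dict(out_arcs) for q, out_arcs in arcs.items()}  # work on copies
    final = dict(final)
    rep = {}  # absorbed state -> state it was merged into

    def find(q):
        while q in rep:
            q = rep[q]
        return q

    pending = [(s, t)]
    while pending:
        a, b = pending.pop()
        a, b = find(a), find(b)
        if a == b:
            continue
        if b == start:
            a, b = b, a  # the initial state always keeps its name
        rep[b] = a
        if b in final:
            if a in final and final[a] != final[b]:
                return None  # different outputs at the end of the input
            final[a] = final.pop(b)
        for sym, (out, nxt) in arcs.pop(b, {}).items():
            a_arcs = arcs.setdefault(a, {})
            if sym not in a_arcs:
                a_arcs[sym] = (out, nxt)
            elif a_arcs[sym][0] != out:
                return None  # same input symbol, different output
            else:
                pending.append((a_arcs[sym][1], nxt))  # fold the two targets
    # Redirect every arc to the surviving name of its target state.
    new_arcs = {q: {sym: (out, find(nxt)) for sym, (out, nxt) in out_arcs.items()}
                for q, out_arcs in arcs.items()}
    return new_arcs, final


def transduce(arcs, final, inp, start=0):
    """Apply the transducer; None if the input is not accepted."""
    state, out = start, ()
    for sym in inp:
        if sym not in arcs.get(state, {}):
            return None
        out_sym, state = arcs[state][sym]
        out += out_sym
    return out + final[state] if state in final else None


# Tree transducer mapping the empty input to nothing, a to x, and aa to xx.
arcs = {0: {"a": (("x",), 1)}, 1: {"a": (("x",), 2)}}
final = {0: (), 1: (), 2: ()}
merged_arcs, merged_final = merge(arcs, final, 0, 1)
print(transduce(merged_arcs, merged_final, ("a", "a", "a")))  # ('x', 'x', 'x')
```

Merging the root with its child folds the whole tree into a single state with an a:x loop, so the merged transducer accepts aaa, an input longer than any training sample. This is the generalization that state merging provides. Because a merge only succeeds when the outputs agree, every training pair is still transduced as before.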
  </Section>
</Paper>