<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2240">
  <Title>Discovering Phonotactic Finite-State Automata by Genetic Search</Title>
  <Section position="6" start_page="1472" end_page="1473" type="evalu">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> Toy Data Set. The data used in the first set of tests was generated with the grammar S → cA, A → c1v1B, A → c2v2B, B → c. Here, c abbreviates a set of 4 consonants, c1 a set of 2 sharped consonants, c2 a set of 2 non-sharped consonants, v1 a set of 2 front vowels, and v2 a set of 2 non-front vowels. The grammar (generating a total of 128 strings) is a simple version of a phonotactic constraint in Russian, where non-sharped consonants cannot be followed by front vowels (see e.g. Halle (1971)).</Paragraph>
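    The count of 128 strings can be checked by enumerating the toy grammar directly. The following is a minimal Python sketch, not the authors' code. The phoneme symbols are hypothetical placeholders (the section does not list the actual symbols), and the sketch assumes that the 4-consonant set c is the union of c1 and c2. The count does not depend on either choice.

```python
from itertools import product

# Hypothetical placeholder symbols; the paper does not list the actual phonemes.
C1 = ["T", "S"]   # c1: 2 sharped consonants (capitalised here by assumption)
C2 = ["t", "s"]   # c2: 2 non-sharped consonants
V1 = ["i", "e"]   # v1: 2 front vowels
V2 = ["a", "u"]   # v2: 2 non-front vowels
C = C1 + C2       # c: 4 consonants (assumed to be the union of c1 and c2)


def generate_language():
    """Enumerate every string derivable from S in the toy grammar."""
    # A -> c1 v1 B | c2 v2 B: the two permitted consonant-vowel pairings.
    middles = ["".join(p) for p in product(C1, V1)]
    middles += ["".join(p) for p in product(C2, V2)]
    # S -> c A and B -> c: any consonant may open and close the string.
    strings = {first + mid + last for first, mid, last in product(C, middles, C)}
    return sorted(strings)


language = generate_language()
print(len(language))  # 4 * (2*2 + 2*2) * 4 = 128
```

    Each generated string has the shape consonant, consonant, vowel, consonant, and none of them contains a non-sharped consonant directly followed by a front vowel, which is the constraint the grammar encodes.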
    <Paragraph position="1"> Tests were carried out for different-size randomly selected language subsets. Different combinations of weights for the three fitness criteria were tested. Table 1 gives results for the best weight combination averaged over 10 runs differing only in random population initialisation (and results averaged over all other weight combinations in brackets). The first column indicates how many successful runs there were out of 10 for the best weight combination (and the average for all weight combinations in brackets). A run was deemed successful if the target automaton was found before convergence. The last column shows how many accurate automata were in final populations, and in brackets how many of these also matched the target FSA in size.</Paragraph>
    <Paragraph position="2"> The general effects of reducing the size of D+ are that successful runs become less likely, and that the weight assigned to the degree of overgeneration becomes more important and tends to have to be increased. The larger the data set, the more similar the results that can be achieved with different weights.</Paragraph>
    <Paragraph position="3"> Russian Noun Data Set. The data used in the second series of tests were bisyllabic feminine Russian nouns ending in -a. The alphabet consisted of 36 phonemic symbols. The training set contained 100 strings, and a related set of 100 strings was used as a test set.</Paragraph>
    <Paragraph position="4"> Results are shown in Table 2. The target degree of overgeneration was set to 100 times the size of the data set. Tests were carried out for different weights assigned to the overgeneration criterion. Results are given for the best automaton found in 10 runs for the best weight settings, and for the corresponding averages for all 10 runs.</Paragraph>
    <Paragraph position="5"> Figure 1 shows the fittest automaton from Table 1. Phonemes are grouped together (as label sets on arcs) in several linguistically useful ways. Vowels and consonants are separated completely. Vowels are separated into the set of those that can be preceded by sharped consonants (capitalised symbols) and those that cannot. Correspondingly, sharped consonants tend to be separated from their non-sharped equivalents. The phonemes k, r, 1: are singled out (arc 4 → 5) because they combine only with non-sharped consonants to form stem-final clusters. The groupings S (0 → 6) and L,M,P,R</Paragraph>
    <Paragraph position="7"> These groupings are typical of the automata discovered in all 10 runs. Occasionally ka (the feminine diminutive ending) was singled out as a separate ending, and the stem vowels were frequently grouped together more efficiently.</Paragraph>
    <Paragraph position="8"> Different degrees of generalisation were achieved with different weight settings. The automaton shown here corresponds most closely to (Halle, 1971).</Paragraph>
  </Section>
</Paper>