<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2240"> <Title>Discovering Phonotactic Finite-State Automata by Genetic Search</Title> <Section position="5" start_page="0" end_page="1472" type="metho"> <SectionTitle> 3 Search Method </SectionTitle> <Paragraph position="0"> By direct analogy with natural evolution, genetic algorithms (GAs) work with a population of individuals, each of which represents a candidate solution to the given problem. Each individual is assigned a fitness score, on the basis of which it is selected to 'mate' and produce the next generation. This process is typically iterated until the population has converged, i.e. until individuals have reached a degree of similarity beyond which further improvement becomes impossible. Characteristics that form part of good solutions are passed on through the generations and begin to combine in the offspring to approach global optima, an effect that has been explained in terms of the building block hypothesis (Goldberg, 1989). Unlike other search methods, GAs sample different areas of the search space simultaneously, and are therefore able to escape local optima and to avoid areas of low fitness.</Paragraph> <Paragraph position="1"> The main issues in GA design are encoding the candidate solutions (individuals) as data structures for the GA to work on, defining a fitness function that accurately expresses the goodness of candidate solutions, and designing genetic operators that combine and alter the genetic material of good solutions (parents) to produce new, better solutions (offspring).</Paragraph> <Paragraph position="2"> In the present GA, the state-transition matrices of FSAs are directly converted into genotypes. Mutation randomly adds or deletes one transition in each FSA, and a variant of uniform crossover tends to preserve the general structure of the fitter parent while adding some sub-structures from the weaker parent (offspring can be larger or smaller than both parents). 
Fitness is evaluated according to three fitness criteria. The first two follow directly from the task description: (1) size of S (smallness), and (2) ability to parse strings in D+ (consistency), where ability to partially parse strings is also rewarded. Used on their own, however, these criteria lead search in the direction of universal automata that produce all strings x ∈ Σ* up to the length of the longest string in D+. To avoid this, (3) an overgeneration criterion is added that requires automata to achieve a given degree of generalisation, such that the size of L(A) is equal to the size of the target language (where the target language is not known, its size is estimated). Transitions that are not required to parse any string in D+ are eliminated. Fitness criteria 1-3 are weighted (reflecting their relative importance to fitness evaluation). These weights can be manipulated to directly affect the structural and functional properties of automata.</Paragraph> </Section> </Paper>