<?xml version="1.0" standalone="yes"?>
<Paper uid="J01-1003">
  <Title>Machine Learning</Title>
  <Section position="7" start_page="81" end_page="83" type="evalu">
    <SectionTitle>
7. Performance Issues
</SectionTitle>
    <Paragraph position="0"> Once the descriptive data is given, a morphological analyzer can be generated very quickly. Each paradigm can be processed within tens of seconds on a fast workstation, including the few tens of iterations of rule learning from the examples. A new version of the analyzer can be generated within minutes and tested rapidly on any test data. Thus, none of the processes described in this paper constitutes a bottleneck in the elicitation process. Figure 5 provides some relevant information from the runs on the first Polish paradigm described above. The top graph shows, for different runs, the number of distinct rules generated from the aligned pairs of segmented and surface forms derived from the examples provided, using a rule format with at most five symbols in each of the left and right contexts. The bottom graph shows, for different runs, the total number of rules generated and generalized, again with the same context size.</Paragraph>
    <Paragraph position="1"> There are a few interesting things about these graphs. As expected, when more examples are added, the number of rules and the number of iterations needed for convergence usually increase. All curves have a steeper initial segment and a steeper final segment. The steep initial segments result from the initial selection of rules that fix the largest number of &quot;errors&quot; between the segmented and surface forms. Once those rules are found, the curves flatten as a number of morphographemic rules are selected, each dealing with a very small number of errors. Finally, when all the morphographemic changes are accounted for, the segmentation rules kick in and each such rule fixes a large number of segmentation &quot;errors,&quot; so that a few general rules deal with all such cases.</Paragraph>
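The greedy selection behind the steep initial segment can be illustrated with a toy learner. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the English example pairs, the use of "0" as a null symbol in the aligned surface forms, and the restriction to equal-width left and right contexts are all hypothetical simplifications. At each iteration it proposes single-symbol rewrite rules at mismatched positions, scores each rule by the number of errors left after applying it, and keeps the best one.

```python
def count_errors(current, target):
    """Number of aligned positions where current and target forms differ."""
    return sum(c != t for cur, tgt in zip(current, target) for c, t in zip(cur, tgt))

def apply_rule(rule, form):
    """Rewrite old as new wherever the left and right contexts match the input form."""
    left, old, new, right = rule
    out = list(form)
    for i, ch in enumerate(form):
        if ch == old and form[:i].endswith(left) and form[i + 1:].startswith(right):
            out[i] = new
    return "".join(out)

def candidate_rules(current, target, max_context):
    """Propose (left, old, new, right) rules at every mismatched position."""
    rules = set()
    for cur, tgt in zip(current, target):
        for i, (c, t) in enumerate(zip(cur, tgt)):
            if c != t:
                # Simplification: left and right contexts share the same width k.
                for k in range(max_context + 1):
                    rules.add((cur[max(0, i - k):i], c, t, cur[i + 1:i + 1 + k]))
    return rules

def learn(segmented, surface, max_context=5):
    """Greedy transformation-based learning; returns (rule, net errors fixed) pairs."""
    current = list(segmented)
    errors = count_errors(current, surface)
    learned = []
    while errors > 0:
        best_rule, best_errors = None, errors
        # Sorting makes tie-breaking deterministic in this sketch.
        for rule in sorted(candidate_rules(current, surface, max_context)):
            remaining = count_errors([apply_rule(rule, f) for f in current], surface)
            if best_errors > remaining:
                best_rule, best_errors = rule, remaining
        if best_rule is None:
            break  # no candidate reduces the error count
        current = [apply_rule(best_rule, f) for f in current]
        learned.append((best_rule, errors - best_errors))
        errors = best_errors
    return learned

# Hypothetical aligned pairs: "+" marks a morpheme boundary, "0" a null symbol.
SEGMENTED = ["happy+est", "try+ed", "play+ed", "fly+es"]
SURFACE = ["happi0est", "tri0ed", "play0ed", "fli0es"]

for rule, fixed in learn(SEGMENTED, SURFACE):
    print(rule, fixed)
```

On this toy data the net number of errors fixed per iteration falls from 4 to 2 to 1, which mirrors the flattening described above. The sketch does not model the late segmentation rules that produce the steep final segment.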
    <Paragraph position="2"> [Running head: Oflazer, Nirenburg, and McShane, Bootstrapping Morphological Analyzers; Computational Linguistics, Volume 27, Number 1] [Figure: Rules generated in each iteration of the learner in sequential runs] We have presented the highlights of our approach for automatically generating finite-state morphological analyzers from information elicited from human informants. Our approach uses transformation-based learning to induce morphographemic rules from examples and combines these rules with the lexicon information elicited to compile the morphological analyzer. There are other opportunities for using machine learning in this process. For instance, one of the important issues in wholesale acquisition of open-class items is that of determining which paradigm a given citation form belongs to. From the examples given during the acquisition phase, it is possible to induce a classifier that can perform this selection to aid the language informant.</Paragraph>
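One way such a paradigm classifier might look is sketched below. The paper does not specify a method, so everything here is an assumption: the suffix-voting technique, the function names, the paradigm labels, and the tiny training set are hypothetical and serve only to make the idea concrete. The classifier counts which paradigms co-occur with each word-final suffix and, at prediction time, backs off from the longest known suffix to shorter ones.

```python
from collections import Counter, defaultdict

def train_suffix_classifier(examples, max_suffix=3):
    """Count, for each word-final suffix up to max_suffix symbols, how often each paradigm occurs."""
    table = defaultdict(Counter)
    for form, paradigm in examples:
        for n in range(1, min(max_suffix, len(form)) + 1):
            table[form[-n:]][paradigm] += 1
    return table

def predict_paradigm(table, form, max_suffix=3):
    """Return the majority paradigm of the longest known suffix, or None if no suffix is known."""
    for n in range(min(max_suffix, len(form)), 0, -1):
        counts = table.get(form[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return None

# Hypothetical (citation form, paradigm label) pairs standing in for elicited examples.
EXAMPLES = [
    ("mapa", "fem-a"),
    ("lampa", "fem-a"),
    ("okno", "neut-o"),
    ("kino", "neut-o"),
    ("dom", "masc-cons"),
]

table = train_suffix_classifier(EXAMPLES)
print(predict_paradigm(table, "rampa"))
print(predict_paradigm(table, "wino"))
```

In an elicit-generate-test loop, a prediction like this would only be a suggestion for the informant to confirm or correct, and a None result would signal that no similar citation form has been seen yet.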
    <Paragraph position="3"> We believe that we have presented a viable approach to the automatic generation of a natural language processor. Since this approach involves a human informant working in an elicit-generate-test loop, the noise and opaqueness of other induction schemes can be avoided.</Paragraph>
    <Paragraph position="4"> We also feel that the task of analyzing a set of incorrectly generated forms and automatically offering a diagnosis of what may have gone wrong and what additional examples can be supplied as remedies is, in itself, an important aspect of this work. Although we have only scratched the surface of this topic here, we consider it a fruitful extension of the work described in this paper.</Paragraph>
  </Section>
</Paper>