<?xml version="1.0" standalone="yes"?> <Paper uid="J96-1003"> <Title>Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction</Title> <Section position="4" start_page="86" end_page="87" type="concl"> <SectionTitle> 5. Conclusions </SectionTitle> <Paragraph position="0"> This paper has presented an algorithm for error-tolerant finite-state recognition that enables a finite-state recognizer to recognize strings that deviate mildly from some string in the underlying regular set. Results of its application to error-tolerant morphological analysis and candidate generation in spelling correction were also presented. The approach is very fast and applicable to any language with a list of root and inflected forms, or with a finite-state transducer recognizing or analyzing its word forms. It differs from previous error-tolerant finite-state recognition algorithms in that it uses a given finite-state machine, and is more suitable for applications where the number of patterns (or the finite-state machine) is large and the string to be matched is small.</Paragraph> <Paragraph position="1"> In some cases, however, the proposed approach may not be efficient and may be augmented with language-specific heuristics: For instance, in spelling correction, users (at least in Turkey, as indicated by our error model [Oflazer and Güzey 1994]) usually replace non-ASCII characters with their nearest ASCII equivalents because of inconveniences such as nonstandard keyboards, or having to input the non-ASCII characters using a sequence of keystrokes. In the last spelling correction experiment for Turkish, almost all incorrect forms with an edit distance of 3 or more had three or more non-ASCII Turkish characters, all of which were rendered with the nearest ASCII version (e.g., yaşgünümüzde (on our birthday) was written as yasgunumuzde). 
These forms could surely be found with appropriate edit distance thresholds, but at the cost of generating many words containing more substantial errors. Under these circumstances, one may use language-specific heuristics first, before resorting to error-tolerant recognition, along the lines suggested by morphological-analysis-based approaches (Aduriz et al. 1993; Bowden and Kiraz 1995).</Paragraph> <Paragraph position="2"> Although the method described here does not handle erroneous cases where omission of space characters causes joining of otherwise correct forms (such as inspite of), such cases may be handled by augmenting the final state(s) of the recognizers with a transition for space characters and ignoring all but one of such space characters in the edit distance computation.</Paragraph> <Paragraph position="3"> implemented some of the algorithms. I would like to thank the anonymous reviewers for suggestions and comments that contributed to the improvement of the paper in many respects.</Paragraph> </Section> </Paper>