<?xml version="1.0" standalone="yes"?> <Paper uid="J96-1003"> <Title>Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction</Title> <!-- Footnote: * Department of Computer Engineering and Information Science, Bilkent University, Ankara, TR-06533, Turkey. © 1996 Association for Computational Linguistics. Computational Linguistics, Volume 22, Number 1. --> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Error-tolerant finite-state recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite-state recognizer. For example, suppose we have a recognizer for the regular set over {a, b} described by the regular expression (aba + bab)*, and we would like to recognize inputs that may be slightly corrupted, for example, abaaaba may be matched to abaaba (correcting for a spurious a), or babbb may be matched to babbab (correcting for a deletion), or ababba may be matched to either abaaba (correcting a b to an a) or to ababab (correcting the reversal of the last two symbols). Error-tolerant recognition can be used in many applications that are based on finite-state recognition, such as morphological analysis, spelling correction, or even tagging with finite-state models (Voutilainen and Tapanainen 1993; Roche and Schabes 1995). The approach presented in this paper uses the finite-state recognizer built to recognize the regular set, but relies on a very efficiently controlled recognition algorithm based on depth-first searching of the state graph of the recognizer. In morphological analysis, misspelled input word forms can be corrected and morphologically analyzed concurrently. 
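The depth-first search over the recognizer's state graph can be illustrated with a minimal sketch. It is not the paper's implementation: the DFA encoding of (aba + bab)*, the names edit_distance, cutoff_distance, and error_tolerant_matches, the unit cost for every edit, and the prefix-based pruning rule are all assumptions made for illustration, and distances are recomputed from scratch for clarity rather than speed.

```python
from typing import Dict, List, Tuple

# Hypothetical DFA for (aba + bab)*; state 0 is both the start and the only final state.
TRANSITIONS: Dict[int, Dict[str, int]] = {
    0: {"a": 1, "b": 3},
    1: {"b": 2},
    2: {"a": 0},
    3: {"a": 4},
    4: {"b": 0},
}
START, FINALS = 0, {0}


def edit_distance(x: str, y: str) -> int:
    """Unit-cost distance with insertion, deletion, substitution, adjacent transposition."""
    rows, cols = len(x) + 1, len(y) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition: the last two symbols are swapped.
            if i > 1 and j > 1 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def cutoff_distance(x: str, y: str, t: int) -> int:
    """Smallest distance between candidate prefix y and any input prefix of similar length."""
    lo, hi = max(0, len(y) - t), min(len(x), len(y) + t)
    if lo > hi:
        return t + 1  # the candidate is already too long to end up within t
    return min(edit_distance(x[:k], y) for k in range(lo, hi + 1))


def error_tolerant_matches(x: str, t: int) -> List[Tuple[str, int]]:
    """Depth-first search of the state graph, pruning candidates beyond threshold t."""
    results: List[Tuple[str, int]] = []
    stack: List[Tuple[int, str]] = [(START, "")]
    while stack:
        state, candidate = stack.pop()
        if state in FINALS:
            dist = edit_distance(x, candidate)
            if t >= dist:
                results.append((candidate, dist))
        for symbol, nxt in TRANSITIONS[state].items():
            extended = candidate + symbol
            if t >= cutoff_distance(x, extended, t):
                stack.append((nxt, extended))
    return sorted(results)
```

Under these assumptions, error_tolerant_matches("ababba", 1) returns both abaaba and ababab, each at distance 1, which corresponds to the two corrections described for that input. The pruning step is what keeps the search finite even though the regular set itself is infinite.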
In the context of spelling correction, error-tolerant recognition can universally be applied to the generation of candidate correct forms for any language, provided it has a word list comprising all inflected forms, or its morphology has been fully described by automata such as two-level finite-state transducers (Karttunen and Beesley 1992; Karttunen, Kaplan, and Zaenen 1992). The algorithm for error-tolerant recognition is very fast and applicable to languages that have productive compounding, or agglutination, or both, as word formation processes.</Paragraph> <Paragraph position="1"> There have been a number of approaches to error-tolerant searching. Wu and Manber (1991) describe an algorithm for fast searching, allowing for errors. This algorithm (called agrep) relies on a very efficient pattern matching scheme whose steps can be implemented with arithmetic and logical operations. It is most efficient when the size of the pattern is limited to 32 to 64 symbols, though it allows for an arbitrary number of insertions, deletions, and substitutions. It is particularly suitable when the pattern is small and the sequence to be searched is large. Myers and Miller (1989) describe algorithms for approximate matching to regular expressions with arbitrary costs, but like the algorithm described in Wu and Manber, these are best suited to applications where the pattern or the regular expression is small and the sequence is large. Schneider, Lim, and Shoaff (1992) present a method for imperfect string recognition using fuzzy logic. Their method is for context-free grammars (hence, it can be applied to finite-state recognition as well), but it relies on introducing new productions to allow for errors; this may increase the size of the grammar substantially.</Paragraph> </Section> </Paper>