<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1010"> <Title>Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction</Title> <Section position="3" start_page="71" end_page="72" type="intro"> <SectionTitle> 2 Methodology </SectionTitle> <Paragraph position="0"> Each method will be described in terms of its operation on a single confusion set C = {w1, ..., wn}; that is, we will say how the method disambiguates occurrences of words w1 through wn. The methods handle multiple confusion sets by applying the same technique to each confusion set independently.</Paragraph> <Paragraph position="1"> Each method involves a training phase and a test phase. We trained each method on 80% (randomly selected) of the Brown corpus (Kučera and Francis, 1967) and tested it on the remaining 20%. All methods were run on a collection of 18 confusion sets, which were largely taken from the list of &quot;Words Commonly Confused&quot; in the back of Random House (Flexner, 1983). The confusion sets were selected because they occur frequently in Brown and represent a variety of error types, including homophone confusions (e.g., {peace, piece}) and grammatical mistakes (e.g., {among, between}). A few confusion sets not in Random House were added, representing typographical errors (e.g., {begin, being}). The confusion sets appear in Table 1.</Paragraph> </Section> </Paper>