<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2167"> <Title>Machine Aided Error-Correction Environment for Korean Morphological Analysis and Part-of-Speech Tagging</Title> <Section position="3" start_page="0" end_page="1015" type="intro"> <SectionTitle> 2 Related Works </SectionTitle> <Paragraph position="0"> Automatic tagging is prone to errors that are unavoidable given the lack of overall linguistic information. To model the automatic error-detection process, a statistical approach to detecting tagging errors has been developed (Foster, 1991). In this section, we describe some rule-based error-correction methods for Korean part-of-speech (hereafter, &quot;POS&quot;) tagging systems.</Paragraph> <Section position="1" start_page="0" end_page="1015" type="sub_section"> <SectionTitle> 2.1 Transformation-Based Part-of-Speech Tagging System </SectionTitle> <Paragraph position="0"> Lim et al. (1996) proposed a tagging system that uses word-tag transformation rules to handle the agglutinative characteristics of Korean, and extended the tagger with specific transformation rules that consider the lexical information of mistagged words.</Paragraph> <Paragraph position="1"> The general training algorithm for transformation rules (Brill, 1993) is as follows: 1. Train an initial tagger on the initial training corpus C0.</Paragraph> <Paragraph position="2"> 2. Build a confusion matrix by comparing the current training corpus Ci (initially, i = 0) with C~, the manually annotated version of C0.</Paragraph> <Paragraph position="3"> 3. Extract the rules that best correct the errors in the confusion matrix.</Paragraph> <Paragraph position="4"> 4. Apply the extracted tagging rules to the training corpus Ci to generate an improved version Ci+1.</Paragraph> <Paragraph position="5"> 5. Save the rules and increment i.</Paragraph> <Paragraph position="6"> 6. 
Repeat steps 2 to 5 until the frequency of error corrections made by the rules found in the previous step falls below a threshold.</Paragraph> </Section> <Section position="2" start_page="1015" end_page="1015" type="sub_section"> <SectionTitle> 2.2 Rule-based Error Correction </SectionTitle> <Paragraph position="0"> This method (Lee and Lee, 1996) is based on Eric Brill's tagging model (Brill, 1993).</Paragraph> <Paragraph position="1"> The tagging system is a hybrid that uses both statistical and rule-based training. Rule-based training is performed only on the errors made by the statistical tagger. The rules are learned by comparing the correctly tagged corpus with the output of the tagger. The training is thus used to learn the error-correction rules.</Paragraph> </Section> </Section> </Paper>