<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1101">
  <Title>Detecting Errors in Corpora Using Support Vector Machines</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> Although a precise comparison is difficult, our approach achieved relatively high precision compared with conventional probabilistic approaches to corpus error detection. Using a probabilistic approach, Murata et al. (2000) detected morpheme errors in a corpus with a precision of 70-80%, and Eskin (2000) detected errors with a precision of 69%, whereas our approach achieved more than 80%. Probabilistic methods cannot handle infrequent events or compare events with similar probabilities, because the probabilities cannot be estimated or compared with enough confidence; our method can handle such infrequent events.</Paragraph>
    <Paragraph position="1"> SVMs are similar to boosting, and our approach uses the weights assigned by SVMs in a manner similar to that studied by Abney et al. (1999).</Paragraph>
    <Paragraph position="2"> However, we introduced a post-processing step that extracts inconsistent similar examples, which improved both the precision of detection and usability. Ma et al. (2001) studied corpus error detection by finding conflicting elements using min-max modular neural networks.</Paragraph>
    <Paragraph position="3"> Compared with their method, ours has the advantage that the detected errors can be sorted by their attached weights, so a human annotator can check the most likely errors first.</Paragraph>
    <Paragraph position="4"> In the experiment, our method achieved high precision but low recall. These values can be controlled by tuning the features for the SVMs as well as the threshold value β; detecting more errors in a corpus remains future work.</Paragraph>
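    <Paragraph position="5"> The two points above, that candidates can be ranked by their attached weights and that a threshold trades precision against recall, can be illustrated with a minimal Python sketch. It is not the implementation used in the experiment: the function names, the toy weights, and the assumption that the threshold is applied directly to each example's weight are all hypothetical and chosen only for illustration.

```python
def rank_candidates(weights, threshold):
    """Return indices of examples whose weight is at least `threshold`,
    ordered from the largest weight to the smallest.

    Assumption: a larger weight means the example is more likely an error.
    """
    flagged = [i for i, w in enumerate(weights) if w >= threshold]
    return sorted(flagged, key=lambda i: weights[i], reverse=True)


def precision_recall(flagged, true_errors):
    """Precision and recall of the flagged indices against known error indices."""
    hits = len(set(flagged).intersection(true_errors))
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(true_errors) if true_errors else 0.0
    return precision, recall


# Toy data, invented for illustration only (not results from the paper).
weights = [0.1, 1.0, 0.4, 0.9, 0.0, 0.6]
true_errors = {1, 3, 5}

# A strict threshold flags few examples; a loose one flags more.
strict = rank_candidates(weights, threshold=0.8)
loose = rank_candidates(weights, threshold=0.3)

print(strict, precision_recall(strict, true_errors))
print(loose, precision_recall(loose, true_errors))
```

On this toy data the strict threshold flags only true errors but misses one, while the loose threshold finds all of them at the cost of one false alarm. In both cases a human checker reading the list from the top sees the highest-weight candidates first.</Paragraph>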
  </Section>
</Paper>