<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1311">
  <Title>Detection of Language (Model) Errors</Title>
  <Section position="1" start_page="0" end_page="88" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Bigram language models are popular in many language processing applications, for both Indo-European and Asian languages.</Paragraph>
    <Paragraph position="1"> However, when the language model for Chinese is applied to a novel domain, its accuracy drops significantly, from 96% to 78% in our evaluation. We apply pattern recognition techniques (i.e. Bayesian, decision tree and neural network classifiers) to detect language model errors. We examine two general types of features: model-based and language-specific features. In our evaluation, the Bayesian classifier achieves the best recall (80%) but low precision (60%). The neural network achieves good recall (75%) and precision (80%), but both the Bayesian and the neural network classifiers have a low skip ratio (65%). The decision tree classifier achieves the best precision (81%) and skip ratio (76%), but the lowest recall (73%).</Paragraph>
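The recall and precision figures above can be read as standard detection metrics if each output flagged by the classifier as an error counts as a positive prediction. The sketch below illustrates that reading under this assumption; the function name `detection_scores` and its set-of-positions inputs are hypothetical, and the skip ratio is not computed because its definition is not given in this section.

```python
def detection_scores(flagged, actual_errors):
    """Precision and recall of a language-model error detector.

    flagged: set of output positions the classifier marks as errors (hypothetical input).
    actual_errors: set of positions where the language model output is really wrong.
    """
    # True positives: flagged positions that are genuine errors.
    true_positives = len([pos for pos in flagged if pos in actual_errors])
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual_errors) if actual_errors else 0.0
    return precision, recall


# Toy data: 5 positions flagged, 4 of them real errors, 5 real errors in total.
flagged = {2, 7, 11, 15, 20}
actual_errors = {2, 7, 11, 15, 30}
print(detection_scores(flagged, actual_errors))  # (0.8, 0.8)
```

High recall means few errors slip through unflagged; high precision means human post-editors waste little effort checking correct output.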
    <Paragraph position="2"> Introduction Language models are important post-processing modules that improve the recognition accuracy of a wide variety of input, namely speech recognition (Bahl et al., 1983), handwriting recognition (Elliman and Lancaster, 1990) and printed character recognition (Sun, 1991), for many human languages. They can also be used for text correction (Ron et al., 1994) and part-of-speech tagging.</Paragraph>
    <Paragraph position="3"> For Indo-European languages, the word-bigram language model is used in speech recognition (Jelinek, 1989) and handwriting recognition (Nathan et al., 1995). Various ways to improve language models have been reported. First, the model has been extended with longer dependencies (e.g. trigrams; Jelinek, 1991) and with non-contiguous dependencies, such as trigger pairs (Rosenfeld, 1994) or long-distance n-gram language models (Huang et al., 1993). Second, for better probability estimation, the model has been extended to work with (hidden) word classes (Brown et al., 1992; Ward and Issar, 1996). A more error-driven approach is the use of hybrid language models, in which a detection mechanism (e.g. perplexity measures [Keene and O'Kane, 1996] or topic detection [Mahajan et al., 1999]) selects a more appropriate language model or combines the model with one.</Paragraph>
    <Paragraph position="4"> For Asian languages written with ideographic characters (e.g. Chinese, Japanese and Korean), language models are widely used for computer text entry because these languages have large character sets (thousands of characters) for which the conventional keyboard is not designed. Apart from speech and handwriting recognition for text entry, language models for Asian languages can be used for sentence-based keyboard input (e.g. Lochovsky and Chung, 1997) and for detecting improper writing (e.g. dialect-specific words or expressions).</Paragraph>
    <Paragraph position="5"> Unlike in Indo-European languages, words in these Asian languages are not delimited by spaces, and the conventional approximate string matching techniques (Wagner and Fischer, 1974; Oommen and Zhang, 1974) used in handwriting recognition are seldom used in Asian language models. Instead, a widely used and reported Asian language model is the character-bigram language model (Jin et al., 1995; Xia et al., 1996) because (1) it achieves high recognition accuracy (around 90-96%), (2) its parameters are easy to estimate, (3) it can be processed quickly and (4) it is relatively easy to implement.</Paragraph>
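Used as a recognition post-processor, a character-bigram language model picks, from the candidate characters a recognizer proposes for each position, the sequence with the highest product of bigram probabilities. The sketch below shows this selection with the Viterbi algorithm in log space. It is a minimal illustration, not the specific models of the works cited above: the start symbol `BOS`, the floor probability for unseen bigrams and the toy probability table are all hypothetical, and recognizer confidence scores are ignored for simplicity.

```python
import math

BOS = "BOS"     # hypothetical sentence-start symbol
UNSEEN = 1e-6   # hypothetical floor probability for unseen bigrams


def viterbi_select(candidates, bigram_prob):
    """Choose one character per position, maximising the product of bigram probabilities.

    candidates: list of lists, the recognizer's candidate characters per position.
    bigram_prob: dict mapping (previous_char, char) to P(char | previous_char).
    """
    # best maps the last character of each surviving path to (log score, path).
    best = {BOS: (0.0, [])}
    for position in candidates:
        new_best = {}
        for char in position:
            # Extend every surviving path with char and keep the highest-scoring one.
            new_best[char] = max(
                ((score + math.log(bigram_prob.get((prev, char), UNSEEN)), path + [char])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy table (hypothetical values): the recognizer confuses 中/申 and 国/囯.
probs = {("BOS", "中"): 0.5, ("BOS", "申"): 0.1, ("中", "国"): 0.6,
         ("申", "国"): 0.01, ("中", "囯"): 0.01}
print(viterbi_select([["中", "申"], ["国", "囯"]], probs))  # ['中', '国']
```

Keeping only the best path per last character is what makes the search linear in sentence length, which is one reason the bigram model "can be processed quickly".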
    <Paragraph position="6"> The improvements made to language models for Indo-European languages can also be applied to Asian languages, but words need to be identified first. For Asian languages, the model has been integrated with syntactic rules (Chien, Chen and Lee, 1993).</Paragraph>
    <Paragraph position="7"> A class-based language model (Lee and Tung, 1995) has also been examined, but its classes are based on semantically related words. A new approach using segments expressed by prefix and suffix trees has been reported (Yang et al., 1998), but its comparison is based on perplexity measures, which may not correlate well with recognition improvement (Iyer et al., 1997).</Paragraph>
    <Paragraph position="8"> While attempts to improve the (bigram) language models were quite successful, a recognition accuracy of about 96% is still not adequate for professional data entry services, which typically require an error rate lower than 1 in 1,000. As part of their quality control, these services estimate their error rate by sampling, and they identify and correct errors manually to achieve the required quality. Faced with a large volume of text, the ability to automatically identify where the errors are is perhaps more important in post-editing than automatically correcting them, because (1) manual correction is more reliable than automatic correction, (2) manual error sampling can still be carried out and (3) error identification requires more manual effort than error correction, given the large volume of text. For example, if 97% of the errors are identified and all identified errors are corrected without mistakes, then the accuracy improves from 96% to 99.9% after error correction.</Paragraph>
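The 96% to 99.9% figure follows if every identified error is corrected and only the unidentified errors remain. A worked check of the arithmetic, assuming accuracy and identification rate are both measured per character:

```latex
\begin{gathered}
e_{\text{before}} = 1 - 0.96 = 0.04 \\
e_{\text{after}} = e_{\text{before}} \times (1 - 0.97) = 0.04 \times 0.03 = 0.0012 \\
\text{accuracy}_{\text{after}} = 1 - 0.0012 = 0.9988 \approx 99.9\%
\end{gathered}
```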
    <Paragraph position="9"> In typical applications, the accuracy of the bigram language model may be lower than that reported in the literature because the data may be of a different genre from the training data. In our evaluation, we tested a bigram language model on text from a novel domain, and its accuracy dropped significantly from 96% to 78%, which is similar to what has been reported for English (Mahajan et al., 1999). The robustness of the bigram language model across genres therefore needs to be improved, and several approaches based on detecting language model errors are available. One (adaptive) approach is to identify the errors automatically and correct them manually. The corrections are then used to improve the bigram language model; for example, its bigram probabilities can be re-estimated and updated with the corrected data. In this way, future occurrences of these errors are reduced.</Paragraph>
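The adaptive approach can be pictured as a count-based bigram model whose counts are incremented with manually corrected text, so that the re-estimated probabilities reflect the new domain. The sketch below uses plain maximum-likelihood estimates without smoothing; the class and method names are hypothetical, and a practical model would need smoothing for unseen bigrams.

```python
from collections import Counter


class AdaptiveBigramModel:
    """Hypothetical character-bigram model re-estimated from (corrected) text by counting."""

    def __init__(self):
        self.bigram_counts = Counter()   # counts of (previous_char, char)
        self.history_counts = Counter()  # counts of previous_char as a left context

    def update(self, text):
        # Count each adjacent character pair and its left context.
        for prev, char in zip(text, text[1:]):
            self.bigram_counts[(prev, char)] += 1
            self.history_counts[prev] += 1

    def prob(self, prev, char):
        # Maximum-likelihood estimate of P(char | prev); 0.0 if prev was never seen.
        if self.history_counts[prev] == 0:
            return 0.0
        return self.bigram_counts[(prev, char)] / self.history_counts[prev]


model = AdaptiveBigramModel()
model.update("abab")          # toy stand-in for the original training text
print(model.prob("a", "c"))   # 0.0
model.update("ac")            # toy stand-in for manually corrected new-domain output
print(model.prob("a", "c"))   # 0.3333333333333333
```

Because the update only adds counts, corrections can be folded in incrementally as post-editors fix each batch of text.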
    <Paragraph position="10"> Another (hybrid) approach uses a second language model to correct the identified errors. This language model can be computationally more expensive than the bigram language model because it is applied only to the identified errors. Topic detection (Mahajan et al., 1999) and language model selection (Keene and O'Kane, 1996) can also be applied to those areas to find a more appropriate language model, because topic-dependent words are usually the ones that cause errors.</Paragraph>
    <Paragraph position="11"> A third (integrative) approach improves accuracy by using more sophisticated recognizers instead of a complementary language model. The more sophisticated recognizer may produce a different set of candidates, to which the bigram language model is re-applied, or it may simply output the recognized character. This integrates well with the coarse-fine recognition architecture, which dates back to the 1960s (Nagy, 1988): coarse recognition provides the candidates from which the language model selects, and fine, expensive recognition is carried out only where the language model fails. Finally, it is possible to combine all of these approaches (i.e. adaptive, hybrid and integrative).</Paragraph>
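The coarse-fine flow described above can be sketched as a pipeline in which the expensive recognizer runs only at positions flagged as likely language model errors. This is a minimal illustration, not the architecture from Nagy (1988): all four callables are hypothetical stand-ins, and the sketch takes the variant in which the fine recognizer simply outputs the recognized character rather than a new candidate set.

```python
def coarse_fine_recognize(images, coarse, select, is_error, fine):
    """Hypothetical coarse-fine pipeline; every callable is a stand-in.

    coarse(image) returns a list of candidate characters (cheap recognizer).
    select(candidate_lists) returns one chosen character per position (e.g. a bigram LM).
    is_error(chosen, i) returns True if position i is flagged as a likely LM error.
    fine(image) returns a single character (expensive recognizer).
    """
    candidate_lists = [coarse(image) for image in images]
    chosen = select(candidate_lists)
    fine_calls = 0
    for i, image in enumerate(images):
        if is_error(chosen, i):
            chosen[i] = fine(image)  # expensive recognition only where flagged
            fine_calls += 1
    return chosen, fine_calls


images = ["img0", "img1", "img2"]  # placeholders for character images
result, fine_calls = coarse_fine_recognize(
    images,
    coarse=lambda img: ["x", "y"],
    select=lambda cands: [c[0] for c in cands],
    is_error=lambda chosen, i: i == 1,
    fine=lambda img: "z",
)
print(result, fine_calls)  # ['x', 'z', 'x'] 1
```

The cost saving comes from `fine_calls` scaling with the number of flagged positions rather than with the length of the text, which is why detection precision matters as much as recall here.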
    <Paragraph position="12"> Despite the importance of detecting language model errors, there is little work in this area, perhaps because these errors were considered random and therefore hard to detect. However, users can detect errors quickly. We suspect that some of these errors are systematic, arising from the properties of the language model used or from language-specific properties.</Paragraph>
    <Paragraph position="13"> We adopt a pattern recognition approach to detecting errors of the bigram language model for Chinese. Each output is assigned either to the class of correct outputs or to the class of errors, based on a set of features. We explore a number of features for detecting errors, which fall into two groups: model-based features and language-specific features.</Paragraph>
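Treating each output character as an instance described by a few discrete features, a Bayesian classifier assigns it to whichever class (correct or error) has the higher posterior probability. The sketch below is a minimal categorical naive Bayes with add-one smoothing. The feature names (a binned bigram probability and a single-character-word flag) and the training data are invented for illustration and are not the paper's actual features or results.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (feature_dict, label) pairs, label 'correct' or 'error'."""
    class_counts = Counter(label for _, label in examples)
    # value_counts[label][feature][value] = number of training examples with that value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen for each feature
    for features, label in examples:
        for name, value in features.items():
            value_counts[label][name][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values


def classify(model, features):
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in features.items():
            # Add-one (Laplace) smoothed likelihood P(value | label).
            seen = value_counts[label][name][value]
            score += math.log((seen + 1) / (count + len(feature_values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; "bigram_prob" is a binned model-based feature.
training = [
    ({"bigram_prob": "low", "single_char_word": True}, "error"),
    ({"bigram_prob": "low", "single_char_word": False}, "error"),
    ({"bigram_prob": "high", "single_char_word": False}, "correct"),
    ({"bigram_prob": "high", "single_char_word": True}, "correct"),
    ({"bigram_prob": "high", "single_char_word": False}, "correct"),
]
model = train_naive_bayes(training)
print(classify(model, {"bigram_prob": "low", "single_char_word": True}))  # error
```

Decision tree and neural network classifiers would consume the same feature vectors; only the decision function changes.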
    <Paragraph position="14"> The proposed approach can also work with Indo-European languages at the word-bigram level, although language-specific features have to be discovered for each particular language. In addition, this approach can be adapted to other n-gram language models. In principle, the model-based features can be found or evaluated in the same way as for the bigram language model. For example, if the trigram probability (instead of the bigram probability) is low, then the likelihood of a language model error is high.</Paragraph>
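The trigram example above can be sketched as a simple model-based feature: a position is flagged as a likely language model error when its trigram probability falls below a cut-off. The probability table, the threshold of 0.01 and the padding symbol are hypothetical. In the classifier-based approach described here, such a flag would be one feature among several rather than a detector on its own.

```python
PAD = "#"         # hypothetical padding symbol for the first two positions
THRESHOLD = 0.01  # hypothetical cut-off


def low_trigram_positions(text, trigram_prob, threshold=THRESHOLD):
    """Return positions i whose trigram probability P(c_i | c_{i-2}, c_{i-1}) is low.

    trigram_prob: dict mapping (c_{i-2}, c_{i-1}, c_i) to a probability;
    missing trigrams are treated as probability 0.0.
    """
    padded = PAD * 2 + text
    flagged = []
    for i, char in enumerate(text):
        context = (padded[i], padded[i + 1])  # the two characters before text[i]
        p = trigram_prob.get(context + (char,), 0.0)
        if threshold > p:  # probability below the cut-off
            flagged.append(i)
    return flagged


probs = {("#", "#", "a"): 0.5, ("#", "a", "b"): 0.4, ("a", "b", "c"): 0.001}
print(low_trigram_positions("abc", probs))  # [2]
```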
    <Paragraph position="15"> This paper is organized as follows. Section 1 discusses various features and a preliminary evaluation of their suitability for error identification. Section 2 describes the three types of classifiers used. Section 3 reports our evaluation. Finally, we conclude.</Paragraph>
  </Section>
</Paper>