File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-1311_intro.xml
Size: 3,483 bytes
Last Modified: 2025-10-06 14:00:59
<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1311">
<Title>Detection of Language (Model) Errors</Title>
<Section position="3" start_page="91" end_page="91" type="intro">
<SectionTitle>2 Classifiers</SectionTitle>
<Paragraph position="0">One of the problems with using individual features is that recall and precision are not very high, except for the language-specific features. It is also difficult to set a detection threshold because of the precision-recall trade-off. In addition, detection performance may improve if features are combined. We therefore adopt a pattern recognition approach to error detection.</Paragraph>
<Paragraph position="1">Several classifiers are applied to the error identification decision because we do not know in advance which features work well with which classifiers, as each classifier makes different assumptions about the classification problem. Three types of classifiers are examined: Bayesian, decision tree, and neural network.</Paragraph>
<Section position="1" start_page="91" end_page="91" type="sub_section">
<SectionTitle>2.1 Bayesian Classifier</SectionTitle>
<Paragraph position="0">The Bayesian classifier is simple to implement and is compatible with the model-based features.</Paragraph>
<Paragraph position="1">Given the feature vector $\mathbf{x}$, the Bayesian detection scheme assigns either the correct class $\omega_c$ or the error class $\omega_e$ using the following rule: if $g_c(\mathbf{x}) > g_e(\mathbf{x})$, assign $\omega_c$; otherwise assign $\omega_e$, where $g_c(\cdot)$ and $g_e(\cdot)$ are
$$g_c(\mathbf{x}) = -(\mathbf{x} - \mu_c)^T \Sigma_c^{-1} (\mathbf{x} - \mu_c) - \log |\Sigma_c| + 2 \log p(\omega_c)$$
$$g_e(\mathbf{x}) = -(\mathbf{x} - \mu_e)^T \Sigma_e^{-1} (\mathbf{x} - \mu_e) - \log |\Sigma_e| + 2 \log p(\omega_e)$$
Here $\mu_c$ and $\mu_e$ are the mean vectors of the classes $\omega_c$ and $\omega_e$, respectively; $\Sigma_c$ and $\Sigma_e$ are the corresponding covariance matrices; and $|\cdot|$ denotes the determinant.</Paragraph>
</Section>
<Section position="2" start_page="91" end_page="91" type="sub_section">
<SectionTitle>2.2 Decision Tree</SectionTitle>
<Paragraph position="0">Originally, we tried to use the support vector machine (SVM) (Vapnik, 1995), but training did not converge. Instead, we used the decision tree algorithm C4.5 (Quinlan, 1993). Decision trees are known to produce good classification when the clusters can be bounded by hyper-rectilinear regions. We trained C4.5 with the set of feature vectors described in Section 1.3.</Paragraph>
</Section>
<Section position="3" start_page="91" end_page="91" type="sub_section">
<SectionTitle>2.3 Neural Network</SectionTitle>
<Paragraph position="0">We use the multi-layer perceptron (MLP) because it can perform non-linear classification. The MLP has three layers of nodes: input, hidden, and output.</Paragraph>
<Paragraph position="1">Nodes in the input layer are fully connected to those in the hidden layer, and nodes in the hidden layer are likewise fully connected to the output layer. For our application, each input node corresponds to one of the features in Section 1.3, and the feature value is the input value of the node. Two output nodes indicate whether the current character is correct or erroneous. The number of hidden nodes is 2-4, calculated according to (Fujita, 1998).</Paragraph>
<Paragraph position="2">The output of each node in the MLP is the weighted sum of its inputs, transformed by a sigmoid function. The weights are initialized with small random numbers and adjusted by gradient descent with learning rate 0.05 and momentum 0.1.</Paragraph>
</Section>
</Section>
</Paper>
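To make the discriminant rule in Section 2.1 concrete, here is a minimal NumPy sketch of the quadratic Bayesian classifier. The class means, covariances, and priors are estimated from training feature vectors; the synthetic data and the 0.9/0.1 priors below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix from rows of X."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def discriminant(x, mu, sigma, prior):
    """g(x) = -(x - mu)^T Sigma^{-1} (x - mu) - log|Sigma| + 2 log p(w)."""
    d = x - mu
    return (-d @ np.linalg.inv(sigma) @ d
            - np.log(np.linalg.det(sigma))
            + 2.0 * np.log(prior))

def classify(x, params_c, params_e):
    """Assign the correct class if g_c(x) > g_e(x); otherwise the error class."""
    return "correct" if discriminant(x, *params_c) > discriminant(x, *params_e) else "error"

# Illustrative usage on synthetic 3-dimensional feature vectors (not the paper's data).
rng = np.random.default_rng(0)
X_correct = rng.normal(0.0, 1.0, size=(200, 3))  # features of correct characters
X_error = rng.normal(1.5, 1.0, size=(200, 3))    # features of erroneous characters
params_c = (*fit_gaussian(X_correct), 0.9)       # assumed prior p(w_c) = 0.9
params_e = (*fit_gaussian(X_error), 0.1)         # assumed prior p(w_e) = 0.1
print(classify(np.array([1.2, 1.4, 1.1]), params_c, params_e))
```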
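C4.5 is distributed as a standalone program, so as a rough modern stand-in for the Section 2.2 setup, the sketch below trains scikit-learn's CART-based DecisionTreeClassifier with the entropy criterion (the information-gain measure that C4.5 splits on). The feature vectors and labels are placeholders for the Section 1.3 features, not the paper's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))              # placeholder feature vectors (cf. Section 1.3)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # synthetic labels: 1 = erroneous, 0 = correct

# criterion="entropy" mirrors C4.5's information-gain splitting;
# sklearn's tree is CART-based, so this approximates rather than reproduces C4.5.
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5)
tree.fit(X, y)
print(tree.predict(X[:5]))
```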
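A minimal sketch of the Section 2.3 MLP: one input node per feature, a small hidden layer, two sigmoid output nodes (correct vs. erroneous), and weights updated by gradient descent with the stated learning rate 0.05 and momentum 0.1. Bias terms are omitted for brevity, the batch gradient is averaged for stability, and the training data are synthetic; these are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """3-layer perceptron: input -> hidden -> 2 output nodes (correct / error)."""

    def __init__(self, n_in, n_hidden=3, lr=0.05, momentum=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights, as described in the text.
        self.W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, size=(n_hidden, 2))
        self.V1 = np.zeros_like(self.W1)   # momentum terms
        self.V2 = np.zeros_like(self.W2)
        self.lr, self.momentum = lr, momentum

    def forward(self, X):
        # Each node outputs the sigmoid of the weighted sum of its inputs.
        self.h = sigmoid(X @ self.W1)
        self.o = sigmoid(self.h @ self.W2)
        return self.o

    def train_step(self, X, T):
        """One gradient-descent update on squared error, with momentum."""
        O = self.forward(X)
        delta_o = (O - T) * O * (1 - O)                   # output-layer error
        delta_h = (delta_o @ self.W2.T) * self.h * (1 - self.h)
        self.V2 = self.momentum * self.V2 - self.lr * (self.h.T @ delta_o) / len(X)
        self.V1 = self.momentum * self.V1 - self.lr * (X.T @ delta_h) / len(X)
        self.W2 += self.V2
        self.W1 += self.V1

# Illustrative usage with synthetic data (not the paper's features).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
labels = (X[:, 0] > 0).astype(int)
T = np.eye(2)[labels]                      # one-hot targets for the two output nodes
net = MLP(n_in=4)
for _ in range(2000):
    net.train_step(X, T)
pred = net.forward(X).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```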