<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2033">
<Title>POS tagger combinations on Hungarian text</Title>
<Section position="4" start_page="191" end_page="192" type="metho">
<SectionTitle> 2 The TBL tagger </SectionTitle>
<Paragraph position="0"> Transformation Based Learning (TBL) was introduced by Brill (Brill, 1995) for the task of POS tagging. Brill's implementation consists of two processing steps. In the first step, a lexical tagger calculates the POS tags based on lexical information only (word forms). The result of the lexical tagger is used as a first guess in the second run, where both the word forms and the current POS tags are used by the contextual tagger. Both the lexical and the contextual tagger make use of the TBL concept.</Paragraph>
<Paragraph position="1"> During training, TBL performs a greedy search in a rule space in order to find the rules that best improve the correctness of the current tagging. The rule space contains rules that change the POS tag of some words according to their environments. From these rules, an ordered list is created. In the tagging phase, the rules on the rule list are applied one after another, in the order given by the list. After the last rule has been applied, the current tag sequence is returned as the result.</Paragraph>
<Paragraph position="2"> For the Hungarian language, Megyesi (Megyesi, 1998) applied this technique initially with moderate success. The weak part of her first implementation was the lexical module of the tagger, as described in (Megyesi, 1999). With the use of extended lexical templates, TBL produced a much better performance, but it still lagged behind the statistical taggers.</Paragraph>
<Paragraph position="3"> We chose a different approach that is similar to (Kuba et al., 2004). The first guess of the TBL tagger is the result of the baseline tagger. For the second run, the contextual tagger implementation we used is based on the fnTBL learner module (Ngai and Florian, 2001). We used the standard parameter settings included in the fnTBL package.</Paragraph>
<Section position="1" start_page="192" end_page="192" type="sub_section">
<SectionTitle> 2.1 Baseline Tagger </SectionTitle>
<Paragraph position="0"> The baseline tagger relies on an external morphological analyzer to get the list of possible POS tags of each word. If the word occurs in the training data, it gets its most frequent POS tag in the training data. If the word does not appear in the training data, but representatives of its ambiguity class (words with the same set of possible POS tags) are present, then the most frequent tag of all these words is selected. Otherwise, the word gets the first tag from the list of possible POS tags.</Paragraph>
<Paragraph position="1"> Some results produced by the baseline tagger and the improvements achieved by the TBL tagger are given in Table 2.</Paragraph>
</Section>
</Section>
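The three-level fallback of the baseline tagger described in Section 2.1 can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal Python illustration in which the analyzer argument is a hypothetical callable standing in for the external morphological analyzer, and the training data are plain lists of (word, tag) pairs.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences, analyzer):
    """Collect the statistics the baseline tagger needs.

    tagged_sentences: iterable of [(word, tag), ...] lists
    analyzer: callable returning the list of possible POS tags of a word
              (a stand-in for the external morphological analyzer)
    """
    word_tags = defaultdict(Counter)    # word -> tag frequencies
    class_tags = defaultdict(Counter)   # ambiguity class -> tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            ambiguity_class = tuple(sorted(analyzer(word)))
            class_tags[ambiguity_class][tag] += 1
    return word_tags, class_tags

def baseline_tag(word, analyzer, word_tags, class_tags):
    """Tag one word using the three-level fallback of Section 2.1."""
    possible = analyzer(word)
    if word in word_tags:                 # 1) most frequent tag of the word itself
        return word_tags[word].most_common(1)[0][0]
    ambiguity_class = tuple(sorted(possible))
    if ambiguity_class in class_tags:     # 2) most frequent tag of the ambiguity class
        return class_tags[ambiguity_class].most_common(1)[0][0]
    return possible[0]                    # 3) first tag offered by the analyzer
```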
<Section position="5" start_page="192" end_page="194" type="metho">
<SectionTitle> 3 Classifier Combinations </SectionTitle>
<Paragraph position="0"> The goal of designing pattern recognition systems is to achieve the best possible classification performance for the specified task. This objective traditionally led to the development of different classification schemes for the recognition problems the user would like solved. Experiments show that although one of the designs would yield the best performance, the sets of patterns misclassified by the different classifiers do not necessarily overlap. These observations motivated the relatively recent interest in combining classifiers. The main idea behind this is not to rely on the decision of a single classifier. Rather, all of the inducers or their subsets are employed for decision-making by combining their individual opinions to produce a final decision.</Paragraph>
<Section position="1" start_page="192" end_page="193" type="sub_section">
<SectionTitle> 3.1 Bagging </SectionTitle>
<Paragraph position="0"> The Bagging (Bootstrap aggregating) algorithm (Breiman, 1996) applies majority voting (Sum Rule) to aggregate the classifiers generated from different bootstrap samples. A bootstrap sample is generated by uniformly sampling m instances from the training set with replacement. T bootstrap samples B_1, B_2, ..., B_T are generated, and a classifier C_i is built from each bootstrap sample B_i.</Paragraph>
<Paragraph position="2"> For a given bootstrap sample, an instance in the training set has a probability of 1 - (1 - 1/m)^m of being selected at least once among the m instances that are randomly picked from the training set. For large m, this is about 1 - 1/e ≈ 63.2%. This perturbation causes different classifiers to be built if the inducer is unstable (e.g. ANNs, decision trees), and the performance may improve if the induced classifiers are uncorrelated. However, Bagging can slightly degrade the performance of stable algorithms (e.g. kNN), since effectively smaller training sets are used for training.</Paragraph>
</Section>
<Section position="2" start_page="193" end_page="194" type="sub_section">
<SectionTitle> 3.2 Boosting </SectionTitle>
<Paragraph position="0"> Boosting (Freund and Schapire, 1996) was introduced by Schapire as a method for improving the performance of a weak learning algorithm. AdaBoost changes the weights of the training instances provided as input to each inducer based on the classifiers that were previously built. The final decision is made using a weighted majority voting scheme, where the weight of each classifier depends on its performance on the training set used to build it.</Paragraph>
<Paragraph position="2"> The boosting algorithm requires a weak learning algorithm whose error is bounded by a constant strictly less than 1/2. In the case of multi-class classification this condition might be difficult to guarantee, and various techniques may need to be applied to get around this restriction.</Paragraph>
<Paragraph position="3"> There is an important issue that relates to the construction of weak learners. At step t, the weak learner is constructed based on the weighting d_t. Basically, there are two approaches for taking this weighting into account. In the first approach, we assume that the learning algorithm can operate with reweighted examples. For instance, when the learner minimizes a cost function, one can construct a revised cost function which assigns a weight to each of the examples. However, not all learners can be easily adapted to such an inclusion of the weights. The other approach is based on resampling the data with replacement. This approach is more general, as it is applicable to all kinds of learners.</Paragraph>
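Both combination strategies discussed above ultimately reduce to drawing training instances with replacement: uniformly for Bagging, and according to the current instance weights when boosting is realized by resampling. The sketch below is not from the paper; it is a minimal Python illustration of the two sampling schemes, of majority voting, and of the roughly 63.2% coverage of a uniform bootstrap sample.

```python
import random
from collections import Counter

def bootstrap_sample(instances):
    """Uniform sampling with replacement, as used by Bagging."""
    m = len(instances)
    return [random.choice(instances) for _ in range(m)]

def weighted_resample(instances, weights):
    """Sampling with replacement according to instance weights,
    as used when a boosting round cannot reweight the learner directly."""
    m = len(instances)
    return random.choices(instances, weights=weights, k=m)

def majority_vote(predictions):
    """Sum Rule / majority voting over the outputs of the ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

if __name__ == "__main__":
    data = list(range(10000))
    sample = bootstrap_sample(data)
    coverage = len(set(sample)) / len(data)
    # Expected coverage of one bootstrap sample is 1 - (1 - 1/m)^m,
    # which approaches 1 - 1/e (about 63.2%) for large m.
    print(f"unique instances in one bootstrap sample: {coverage:.1%}")
```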
<SectionTitle> 4 Boosted results for TBL </SectionTitle>
<Paragraph position="4"> TBL belongs to the group of learners that generate abstract information, i.e. only the class label of the source instance. Although it is possible to transcribe the output format to a confidence type, this limitation restricts the range of applicable combination schemes.</Paragraph>
<Paragraph position="5"> The Min, Max and Prod Rules cannot produce a competitive classifier ensemble, while the Sum Rule and Borda Count are equivalent to majority voting. From the set of available boosting algorithms, we may only apply those methods that do not require the modification of the learner.</Paragraph>
<Paragraph position="6"> For the experiments we chose Bagging and a modified AdaBoost.M1 algorithm for boosting. Since the learner is incapable of handling instance weights, individual training datasets were generated by bootstrapping (i.e. resampling with replacement). The original AdaBoost.M1 algorithm requires that the weighted error stay below 50%; when this condition is violated, the modified algorithm reinitializes the instance weights and goes on with the processing.</Paragraph>
<Paragraph position="7"> The boosting algorithm is based on weighting the independent instances according to the classification error of the previously trained learners. TBL operates on words, but words are not treated as independent instances: their context, i.e. their position in the sentence, affects the building of the classifier. Thus, instead of the straightforward selection of words, the boosting method handles the sentences as instance samples. The classification error of an instance is calculated as the arithmetic mean of the classification errors of the words in the corresponding sentence. Despite this, the combined final error is expressed as the relative number of misclassified words.</Paragraph>
<Paragraph position="8"> The training and testing datasets were chosen from the business news domain of the Szeged Corpus. The training database contained about 128,000 annotated words in 5,700 sentences. The remaining part of the domain, the test database, has 13,300 words in 600 sentences.</Paragraph>
<Paragraph position="9"> The training and testing error rates are shown in Fig. 1 (dashed: Bagging, solid: Boosting). The classification error of the stand-alone TBL algorithm on the test dataset was 1.74%. Boosting is capable of decreasing it to below 1.31%, which means a 24.7% relative error reduction. As the graphs show, boosting achieves this during the first 20 iterations, so further processing steps cannot make much difference to the classification accuracy. It can also be seen that the training error does not converge to a zero-error level. This behavior is due to the fact that the learner cannot maintain the 50% weighted error limit condition. Bagging achieved only a moderate gain in accuracy, its relative error reduction rate being 18%.</Paragraph>
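A rough sketch of the sentence-level procedure described in this section is given below. It is not the authors' implementation: train_tbl and tag are hypothetical stand-ins for the fnTBL-based learner, and the exponential weight update for real-valued sentence errors is one plausible choice, since the paper does not spell out that formula. The sketch resamples whole sentences according to their weights, measures a sentence's error as the mean word error, and reinitializes the weights whenever the weighted error reaches 50%, as in the modified AdaBoost.M1.

```python
import math
import random

def boost_tbl(sentences, gold_tags, rounds, train_tbl, tag):
    """Modified AdaBoost.M1 over sentences.

    Hypothetical helpers: train_tbl(pairs) -> model,
                          tag(model, sentence) -> predicted tag list.
    """
    n = len(sentences)
    weights = [1.0 / n] * n
    models, alphas = [], []
    for _ in range(rounds):
        # Resample sentences with replacement according to the current weights,
        # since the learner itself cannot handle instance weights.
        sample_idx = random.choices(range(n), weights=weights, k=n)
        model = train_tbl([(sentences[i], gold_tags[i]) for i in sample_idx])
        # Sentence error = arithmetic mean of the word-level errors.
        errors = []
        for sent, gold in zip(sentences, gold_tags):
            pred = tag(model, sent)
            errors.append(sum(p != g for p, g in zip(pred, gold)) / len(sent))
        weighted_error = sum(w * e for w, e in zip(weights, errors))
        if weighted_error >= 0.5:
            # Weak-learner condition violated: reinitialize weights and continue.
            weights = [1.0 / n] * n
            continue
        eps = max(weighted_error, 1e-10)
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        models.append(model)
        alphas.append(alpha)
        # One possible confidence-style update: badly tagged sentences gain weight.
        weights = [w * math.exp(alpha * (2.0 * e - 1.0)) for w, e in zip(weights, errors)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return models, alphas
```
</Section>
</Section>
</Paper>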