<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1023">
<Title>Decision Tree Models Applied to the Labeling of Text with Parts-of-Speech</Title>
<Section position="7" start_page="118" end_page="119" type="evalu">
<SectionTitle> 6. Experimental Results </SectionTitle>
<Paragraph position="0"> In this section we report on two experiments in part-of-speech labeling using decision trees. In the first experiment, we created a model for tagging text using a portion of the Lancaster treebank. In the second experiment, we tagged a portion of the Brown corpus using a model derived from the University of Pennsylvania corpus of hand-corrected labeled text. In each case we compared the standard HMM model to a maximum entropy model of the form</Paragraph>
<Paragraph position="2"> where the parameters P(t_n | t_{n-2}, t_{n-1}) were obtained using the usual HMM method, and the parameters P(t_n | w_{n-2}, w_{n-1}, w_n, w_{n+1}, w_{n+2}) were obtained from a smoothed decision tree as described above. The trees were grown to have from 30,000 to 40,000 leaves.</Paragraph>
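<Paragraph> The equation defining the combined model is not reproduced in this extraction. As an illustration only, the following Python sketch assumes the tag-context and word-context conditionals are fused as a renormalized product over the tag set; the function and variable names are hypothetical, and the paper's exact maximum entropy formulation may differ.

# Illustrative sketch: fuse a tag-context distribution P(t_n | t_{n-2}, t_{n-1})
# with a word-context distribution P(t_n | w_{n-2}, ..., w_{n+2}) by taking
# their product and renormalizing over the tag set. This is an assumed
# combination rule, not necessarily the paper's maximum entropy model.

def combine_tag_scores(p_tag_context, p_word_context):
    """Return a renormalized product of two distributions over the same tag set."""
    tags = set(p_tag_context) | set(p_word_context)
    combined = {t: p_tag_context.get(t, 0.0) * p_word_context.get(t, 0.0) for t in tags}
    z = sum(combined.values())
    if z == 0.0:
        # Fall back to the tag-context model if the product vanishes everywhere.
        return dict(p_tag_context)
    return {t: p / z for t, p in combined.items()}

# Hypothetical usage with made-up numbers for a three-tag set:
hmm_marginal = {"NN": 0.5, "VB": 0.3, "JJ": 0.2}    # from tag-trigram counts
tree_marginal = {"NN": 0.7, "VB": 0.1, "JJ": 0.2}   # from a decision-tree leaf
print(combine_tag_scores(hmm_marginal, tree_marginal))
</Paragraph>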
<Paragraph position="3"> The relevant data for the experiments are tabulated in Tables 2 and 3. The word and tag vocabularies were derived from the data, as opposed to being obtained from on-line dictionaries or other sources. In the case of the Lancaster treebank, however, the original set of approximately 350 tags, many of which were special tags for idioms, was compressed to a set of 163 tags. A rough categorization of these parts-of-speech appears in Table 1.</Paragraph>
<Paragraph position="4"> For training the model we had at our disposal approximately 1.9 million words of hand-labeled text. This corpus is approximately half AP newswire text and half English Hansard text, and was labeled by the team of Lancaster linguists. To construct our model, we divided the data into three sections, to be used for training, smoothing, and testing. We created an initial lexicon with the word-tag pairs that appear in the training, smoothing, and test portions of this data. We then filled out this lexicon using a statistical procedure which combines information from word spellings with information derived from word bigram statistics in English text. This technique can be used both to discover parts-of-speech for words which do not occur in the hand-labeled text and to discover additional parts-of-speech for those that do. In both experiments multiword expressions, such as &quot;nineteenth-century&quot; and &quot;stream-of-consciousness,&quot; which were assigned a single tag in the hand-labeled text, were broken up into single words in the training text, with each word receiving no tag.</Paragraph>
<Paragraph position="5"> The parameters of the HMM model were estimated from the training section of the hand-labeled text, without any use of the forward-backward algorithm. Subsequently, we used the smoothing section of the data to construct an interpolated model as described by Merialdo [4, 6].</Paragraph>
<Paragraph position="6"> We evaluated the performance of the interpolated hidden Markov model by tagging the 2000 sentences which make up the testing portion of the data. We then compared the resultant tags with those produced by the Lancaster team, and found the error rate to be 3.03%.</Paragraph>
<Paragraph position="7"> We then grew and smoothed a decision tree using the same division of training and smoothing data, and combined the resulting marginals for predicting tags from the word context with the marginals for predicting tags from the tag context derived from the HMM model. The resulting error rate was 2.61%, a 14% reduction from the error rate of the HMM model alone.</Paragraph>
<Paragraph position="8"> In the case of the experiment with the UPenn corpus, the word vocabulary and dictionary were derived from the training and smoothing data only, and the dictionary was not statistically filled out. Thus, there were unknown words in the test data. The tag set used in the second experiment consisted of the 48 tags chosen by the UPenn project. For training the model we had at our disposal approximately 4.4 million words of hand-labeled text, using approximately half the Brown corpus, with the remainder coming from the Wall Street Journal texts labeled by the UPenn team. For testing the model we used the remaining half of the Brown corpus, which was not used for any other purpose. To construct our model, we divided the data into a training section of 4,113,858 words and a smoothing section of 292,731 words. The error rate on 8,000 sentences from the Brown corpus test set was found to be 4.57%. The corresponding error rate for the model using a decision tree grown only on the Brown corpus portion of the training data was 4.37%, representing only a 4.31% reduction in the error rate.</Paragraph>
</Section>
</Paper>