<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1023">
  <Title>Decision Tree Models Applied to the Labeling of Text with Parts-of-Speech</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper we describe work which uses decision trees to estimate probabilities of words appearing with various parts-of-speech, given the context in which the words appear. In principle, this approach affords the optimal solution to the problem of predicting the correct sequence of parts-of-speech. In practice, the method is limited by the lack of large, hand-labeled training corpora, as well as by the difficulty inherent in constructing a set of questions to be used in the decision procedure. Nevertheless, decision trees provide a powerful mechanism for tackling the problem of modeling long-distance dependencies.</Paragraph>
    <Paragraph position="1"> The following sentence is typical of the difficulties facing a tagging program: The new energy policy announced in December by the Prime Minister will guarantee sufficien~ oil supplies at one price only.</Paragraph>
    <Paragraph position="2"> structed a complete set of binary questions to be asked of words, using a mutual information clustering procedure \[2\]. We then extracted a set of events from a 2-million word corpus of hand-labeled text. Using an algorithm similar to that described in \[1\], the set of contexts was divided into equivalence classes using a decision procedure which queried the binary questions, splitting the data based upon the principle of maximum mutual information between tags and questions. The resulting tree was then smoothed using the forward-backward algorithm \[6\] on a set of held-out events, and tested on a set of previously unseen sentences from the hand-labeled corpus.</Paragraph>
    <Paragraph position="3"> The results showed a modest improvement over the usual hidden Markov model approach. We present explanations and examples of the results obtained and suggest ideas for obtaining further improvements.</Paragraph>
  </Section>
class="xml-element"></Paper>