<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0607">
  <Title>Applying Extrasentential Context To Maximum Entropy Based Tagging With A Large Semantic And Syntactic Tagset</Title>
  <Section position="3" start_page="0" end_page="46" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> It seems intuitive that information from earlier sentences in a document ought to help reduce uncertainty about a word's correct part-of-speech tag. This is especially so for a large semantic and syntactic tagset such as the ATR General English Tagset, which has roughly 3,000 tags (Black et al., 1996; Black et al., 1998). Indeed, Black et al. (1998) demonstrate a significant "tag trigger-pair" effect: given that certain "triggering" tags have already occurred in a document, the probability of occurrence of specific "triggered" tags is raised significantly relative to the unigram tag probability model. Table 1, taken from (Black et al., 1998), provides examples of the tag trigger-pair effect.</Paragraph>
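As a rough illustration of the trigger-pair idea, the sketch below compares a tag's unigram probability with its relative frequency in sentences that follow an earlier sentence (in the same document) containing a given triggering tag. It is a minimal sketch under several assumptions: the tag names (TAG_A, TAG_B, ...) and the toy documents are hypothetical placeholders rather than ATR tagset labels, the trigger is restricted to earlier sentences (the extrasentential case), and the simple ratio printed at the end is an illustrative statistic, not necessarily the measure used by Black et al. (1998).

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of sentences,
# and each sentence is a list of tags. Tag names are placeholders.
DOCS = [
    [["TAG_A", "TAG_C"], ["TAG_B", "TAG_C"]],
    [["TAG_C", "TAG_D"], ["TAG_D", "TAG_C"]],
    [["TAG_A", "TAG_D"], ["TAG_B", "TAG_B"]],
]


def unigram_prob(docs, tag):
    """Relative frequency of `tag` over all tokens in all documents."""
    counts = Counter(t for doc in docs for sent in doc for t in sent)
    return counts[tag] / sum(counts.values())


def triggered_prob(docs, trigger, target):
    """Relative frequency of `target` among tokens of sentences that come
    after an earlier sentence in the same document containing `trigger`."""
    hits = total = 0
    for doc in docs:
        seen_trigger = False  # reset at each document boundary
        for sent in doc:
            if seen_trigger:
                total += len(sent)
                hits += sent.count(target)
            # Update only after counting, so the trigger must be extrasentential.
            if trigger in sent:
                seen_trigger = True
    return hits / total if total else 0.0


p_uni = unigram_prob(DOCS, "TAG_B")
p_trig = triggered_prob(DOCS, "TAG_A", "TAG_B")
print(p_uni, p_trig, p_trig / p_uni)  # 0.25 0.75 3.0
```

In this toy data the hypothetical trigger TAG_A triples the probability of TAG_B relative to its unigram estimate, which is the kind of raised probability the trigger-pair effect describes.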
    <Paragraph position="1"> Yet, it is one thing to show that extrasentential context yields a gain in information with respect to a unigram tag probability model.</Paragraph>
    <Paragraph position="2"> But it is another thing to demonstrate that extrasentential context supports an improvement in perplexity over a part-of-speech tagging model that employs state-of-the-art techniques, such as the model underlying a maximum entropy tag-n-gram-based tagger.</Paragraph>
    <Paragraph position="3"> The present paper undertakes just such a demonstration. Both the model underlying a standard tag-n-gram-based tagger and the same model augmented with extrasentential contextual information are trained on the 850,000-word ATR General English Treebank (Black et al., 1996) and then tested on the accompanying 53,000-word test treebank. Performance differences are measured, with the result that semantic information from previous sentences within a document is shown to help significantly in improving the perplexity of tagging.</Paragraph>
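As a reminder of how the evaluation measure behaves, here is a minimal sketch of per-token perplexity, assuming the usual definition: the exponential of the mean negative log probability that a model assigns to the correct tag at each position. The two probability lists are invented purely for illustration and are not figures from this paper; they only show that a model assigning higher probability to the correct tags has lower perplexity.

```python
import math


def perplexity(probs):
    """Per-token perplexity: exp of the mean negative log probability
    assigned to the correct tag at each position."""
    if not probs:
        raise ValueError("need at least one probability")
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(nll)


# Hypothetical per-token probabilities of the correct tag under two models
# (illustrative values only, not measured results).
baseline = [0.5, 0.25, 0.5, 0.125]
with_context = [0.5, 0.5, 0.5, 0.25]

print(perplexity(baseline), perplexity(with_context))
```

Here the hypothetical context-augmented model yields the lower perplexity (about 2.38 versus about 3.36), since it assigns the correct tags higher probability on average. Natural logarithms are used; any base gives the same perplexity as long as the matching exponential is used.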
  </Section>
</Paper>