<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0607"> <Title>Applying Extrasentential Context To Maximum Entropy Based Tagging With A Large Semantic And Syntactic Tagset</Title> <Section position="3" start_page="0" end_page="46" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Intuitively, information from earlier sentences in a document ought to help reduce uncertainty about a word's correct part-of-speech tag. This is especially so for a large semantic and syntactic tagset such as the roughly-3000-tag ATR General English Tagset (Black et al., 1996; Black et al., 1998). And in fact, (Black et al., 1998) demonstrate a significant &quot;tag trigger-pair&quot; effect: given that certain &quot;triggering&quot; tags have already occurred in a document, the probability of occurrence of specific &quot;triggered&quot; tags is raised significantly with respect to the unigram tag probability model. Table 1, taken from (Black et al., 1998), provides examples of the tag trigger-pair effect.</Paragraph> <Paragraph position="1"> Yet, it is one thing to show that extrasentential context yields a gain in information with respect to a unigram tag probability model.</Paragraph> <Paragraph position="2"> It is another thing to demonstrate that extrasentential context supports an improvement in perplexity vis-a-vis a part-of-speech tagging model which employs state-of-the-art techniques, such as the model of a maximum entropy tag-n-gram-based tagger.</Paragraph> <Paragraph position="3"> The present paper undertakes just such a demonstration. 
Both the model underlying a standard tag-n-gram-based tagger, and the same model augmented with extrasentential contextual information, are trained on the 850,000-word ATR General English Treebank (Black et al., 1996), and then tested on the accompanying 53,000-word test treebank. Performance differences are measured, with the result that semantic information from previous sentences within a document is shown to help significantly in improving the perplexity of tagging.</Paragraph> </Section></Paper>