<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1064">
  <Title>A SNoW based Supertagger with Application to NP Chunking</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Supertagging and NP Chunking
</SectionTitle>
    <Paragraph position="0"> In (Srinivas, 1997) trigram models were used for supertagging, in which Good-Turing discounting technique and Katz's back-off model were employed.</Paragraph>
    <Paragraph position="1"> The supertag for a word was determined by the lexical preference of the word, as well as by the contextual preference of the previous two supertags. The model was tested on WSJ section 20 of PTB, and trained on section 0 through 24 except section 20.</Paragraph>
    <Paragraph position="2"> The accuracy on the test data is a2a19a9a21a5a17a15a16a22a4a11 2. In (Srinivas, 1997), supertagging was used for NP chunking and it achieved an F-score of a2a4a3a6a5a8a7a23a11 . (Chen, 2001) reported a similar result with a tri-gram supertagger. In their approaches, they first supertagged the test data and then uesd heuristic rules to detect NP chunks. But it is hard to say whether it is the use of supertags or the heuristic rules that makes their system achieve the good results.</Paragraph>
    <Paragraph position="3"> As a first attempt, we use fast TBL (Ngai and Florian, 2001), a TBL program, to repeat Ramshaw and Marcus' experiment on the standard dataset. Then we use Srinivas' supertagger (Srinivas, 1997) to supertag both the training and test data. We run the fast TBL for the second round by using supertags instead of POS tags in the dataset. With POS tags we achieve an F-score of a2a4a3a6a5a14a13a24a9a12a11 , but with supertags we only achieve an F-score of a2a19a9a21a5a17a20a4a20a16a11 . This is not surprising becuase Srinivas' supertag was only trained with a trigram model. Although supertags are able to encode long distance dependence, supertaggers trained with local information in fact do not take full advantage of their strong capability. So we must use long distance dependencies to train supertaggers to take full advantage of the information in supertags.</Paragraph>
    <Paragraph position="4">  A few supertags were grouped into equivalence classes for evaluation null The trigram model often fails in capturing the co-occurrence dependence between a head word and its dependents. Consider the phrase &amp;quot;will join the board as a nonexecutive director&amp;quot;. The occurrence of join has influence on the lexical selection of as.</Paragraph>
    <Paragraph position="5"> But join is outside the window of trigram. (Srinivas, 1997) proposed a head trigram model in which the lexical selection of a word depended on the supertags of the previous two head words , instead of the supertags of the two words immediately leading the word of interest. But the performance of this model was worse than the traditional trigram model because it discarded local information.</Paragraph>
    <Paragraph position="6"> (Chen et al., 1999) combined the traditional tri-gram model and head trigram model in their trigram mixed model. In their model, context for the current word was determined by the supertag of the previous word and context for the previous word according to 6 manually defined rules. The mixed model achieved an accuracy of a2a19a9a21a5a25a22a21a2a16a11 on the same dataset as that of (Srinivas, 1997). In (Chen et al., 1999), three other models were proposed, but the mixed model achieved the highest accuracy. In addition, they combined all their models with pairwise voting, yielding an accuracy of a2a4a3a6a5a26a9a27a2a16a11 .</Paragraph>
    <Paragraph position="7"> The mixed trigram model achieves better results on supertagging because it can capture both local and long distance dependencies to some extent.</Paragraph>
    <Paragraph position="8"> However, we think that a better way to find useful context is to use machine learning techniques but not define the rules manually. One approach is to switch to models like PMM, which can not only take advantage of generative models with the Viterbi algorithm, but also utilize the information in a larger contexts through flexible feature sets. This is the basic idea guiding the design of our supertagger.</Paragraph>
  </Section>
class="xml-element"></Paper>