<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2208">
  <Title>Tagging English by Path Voting Constraints</Title>
  <Section position="4" start_page="1278" end_page="1279" type="metho">
    <SectionTitle>
3 Results from Tagging English
</SectionTitle>
    <Paragraph position="0"> We evaluated our approach using l 1-fold cross validation on the Wall Street Journal Corpus and 10-fold cross validation on a portion of the Brown Corpus from the Penn Treebank CD.</Paragraph>
    <Paragraph position="1"> We used two classes of constraints: (i) we extracted a set of tag k-grams from a training corpus and used them as constraint rules with votes assigned as described below, and (ii) we hand-crafted a set rules mainly incorporating negative constraints (demoting impossible or unlikely situations), or lezicalized positive constraints. These were constructed by observing the failures of the statistical constraints on the training corpus.</Paragraph>
    <Paragraph position="2"> Rules derived from the training corpus For the statistical constraint rules, we extract tag k-grams from the tagged training corpus for k = 2, and k = 3. For each tag k-gram, we compute a vote which is essentially very similar to the rule strength used by Tzoukermann et al. (1995) except that we do not use their notion of genotypes exactly in the same way. Given a tag k-gram tl,t2,...tk, let n = count(t1 E Tags(wi),t2 E Tags(wi+l),...,tk E Tags(wi+k-1)) for all possible i's in the training corpus, be the number of possible places the tags sequence can possibly occur, footnoteTags(wi) is the set of tags associated with the token wi. Let f be the number of times the tag sequence tl,t2,...tk actually occurs in the tagged text, that is, f = count(tl,t~,...tk). We smooth fin by defining /+0.5 so that neither p nor 1 -p is zero. The P&amp;quot;- n+l uncertainty of p is then given as ~/p(1- p)/n (Tzoukermann et al., 1995). We then computed the vote for this k-gram as</Paragraph>
    <Paragraph position="4"> This formulation thus gives high votes to k-grams which are selected most of the time they are &amp;quot;selectable.&amp;quot; And, among the k-grams which are equally good (same f/n), those with a higher n (hence less uncertainty) are given higher votes.</Paragraph>
    <Paragraph position="5"> After extracting the k-grams as described above for k = 2 and k = 3, we ordered each group by decreasing votes and conducted an initim set of experiments to select a small group of constraints performing satisfactorily. We selected the first 200 (with highest votes) of the 2gram and the first 200 of the 3-gram constraints, as the set of statistical constraints. It should be noted that the constraints obtained this way are purely constraints on tag sequences and do not use any lexical or genotype information.</Paragraph>
    <Paragraph position="6"> Hand-crafted rules In addition to these statistical constraint rules, we introduced 824 hand-crafted constraint rules. Most of the hand-crafted constraints imposed negative constraints (with large negative votes) to rule out certain tag sequences that we encountered in the Wall Street Journal Corpus. Another set of rules were lexicahzed rules involving the tokens as well as the tags. A third set of rules for idiomatic constructs and collocations was also used. The votes for negative and positive hand-crafted constraints are selected to override any vote a statisticM constraint may have.</Paragraph>
    <Paragraph position="7"> Initial Votes To reflect the impact of lexical frequencies we initialize the totM vote of each path with the sum of the lexical votes for the token and tag combinations on it. These lexical votes for the parse ti,j of token wi are obtained from the training corpus in the usuM way, i.e., as count(wi,ti,j)/count(w~), and then are normahzed to between 0 and 100.</Paragraph>
    <Paragraph position="8"> Experiments on WSJ and Brown Corpora We tested our approach on two English Corpora  from the Penn Treebank CD. We divided a 5500 sentence portion of the Wall Street Journal Corpus into 11 different sets of training texts (with about 118,500 words on the average), and corresponding testing texts (with about 11,800 words on the average), and then tagged these texts using the statistical rules and hand-crafted constraints. The hand-crafted rules were obtained from only one of the training text portions, and not from all, but for each experiment the 400 statistical rules were obtained from the respective training set.</Paragraph>
    <Paragraph position="9"> We also performed a similar experiment with a portion of the Brown Corpus. We used 4000 sentences (about 100,000 words) with 10-fold cross validation. Again we extracted the statistical rules from the respective training sets, but the hand-crafted rules were the ones developed from the Wall Street Journal training set. For each case we measured the accuracy by counting the correctly disambiguated tokens. The manual rules used for Brown Corpus were the rules derived the from Wall Street Journal data. The results of these experiments are shown in Table  1.</Paragraph>
    <Paragraph position="10"> WSJ Brown Const. Tra. Test Tra. Test Set Acc. Acc. Acc. Acc.</Paragraph>
    <Paragraph position="11">  We feel that the results in the last row of Table 1 are quite satisfactory and warrant further extensive investigation. On the Wall Street Journal Corpus, our tagging approach is on par or even better than stochastic taggers making closed vocabulary assumption. Weischedel et al. (1993) report a 96.7% accuracy with 1,000,000 words of training corpus. The performance of  test set with some tokens left ambiguous our system with Brown corpus is very close to that of Brill's transformation-based tagger, which can reach 97.2% accuracy with closed vocabulary assumption and 96.5% accuracy with open vocabulary assumption with no ambiguity (Brill, 1995). Our tagging speed is also quite high. With over 1000 constraint rules (longest spanning 5 tokens) loaded, we can tag at about 1600 tokens/sec on a Ultrasparc 140, or a Pentium 200.</Paragraph>
    <Paragraph position="12"> It is also possible for our approach to allow for some ambiguity. In the procedure given earlier, in line 4.3, if one selects all (partial) paths whose accumulated vote is within p (0 &lt; p &lt;__ 1) of the (partial) path with the largest vote, then a certain amount of ambiguity can be introduced, at the expense of a slowdown in tagging speed and an increase in memory requirements.</Paragraph>
    <Paragraph position="13"> In such a case, instead of accuracy, one needs to use ambiguity, recall, and precision (Voutilainen, 1995a). Table 2 presents the recall, precision and ambiguity results from tagging .one of the Wall Street Journal test sets using the same set of constraints but with p ranging from 0.91 to 0.99. These compare quite favorably with the k-best results of Brill(1995), but reduction in tagging speed is quite noticeable, especially for lower p's. Any improvements in single tag per token tagging (by additional hand crafted constraints) will certainly be reflected to these results also.</Paragraph>
  </Section>
  <Section position="5" start_page="1279" end_page="1280" type="metho">
    <SectionTitle>
4 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have presented an approach to constraint-based tagging that relies on constraint rules vot- null ing on sequences of tokens and tags. This approach can combine both statistically and manually derived constrMnts, and relieves the rule developer from worrying about rule ordering, as removal of tags is not immediately committed but only after all rules have a say. Using positive or negative votes, we can promote meaningful sequences of tags or collocations, or demote impossible sequences. Our approach is quite general and is applicable to any language. Our results from the Wall Street Journal Corpus indicate that with 400 statistically derived constraint rules and about 800 hand-crafted constraint rules, we can attain an average accuracy of 9Z89~ on the training corpus and an average accuracy of 9Z50~ on the testing corpus. Our future work involves extending to open vocabulary case and evaluating unknown word performance. null</Paragraph>
  </Section>
class="xml-element"></Paper>