<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1011">
  <Title>Learning and Application of Differential Grammars</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
9 Results
</SectionTitle>
    <Paragraph position="0"> Table 1 presents results from an experiment using 100Meg of training text (TIPSTER, 1994) and three test texts of similar size but different character, in which Differential Grammars are trained and used to grammar check the test texts, which are also checked by two commercial systems. Our methodology is summarized generally in Fig. 1.</Paragraph>
    <Paragraph position="1"> We trained Differential Grammars for 78 confusion pairs using 161 eigentokens and a 95% significance level and tested the grammar checker at the default 75% likelihood threshold. Performance was comparable with that of the two commercial systems, but all three systems showed individual coverage characteristics. The confusion pair 'its/it's' was responsible for our poorer performance on the newsgroup corpus (SFB), but we demonstrated that  better). Our system could not resolve 'its/~t's' which was the most common error in SFB so the final -its column shows the results with these errors discounted. The three corpora were chosen to be as similar as possible, including one published computer-related work (THC), science fiction genre text written by a member of our team (SFK), and text of the same genre taken from a newgroup (SFB).</Paragraph>
    <Paragraph position="2"> Identify/model potential confusion pairs Build significantlDGs for them Ensure sufficient instances of pair Collect legal eigenunit environments Analyze contexts of size one to limit if not significant data abort if useful store and continue Scan and correct PSext sample For each potential  the statistics for this pair invalidated our assumption of ergodicity across the three different 12000 line test corpora used. Also our initial prototype could not distinguish 'a/an' correctly. Conversely, on supposedly correct published text, we found six errors which had been missed by human proofreaders and commercial systems alike. For the record, these errors consisted of four 'was/were', one 'affect/effect', and one 'are/our' substitution. The first two errors are clear syntactic errors where the semantics is essentiMly the same. The second is a very common phono-frequens where, as different parts of speech, resolution is again straightforward. Note that a 95% precision setting should have been sufficient to find them, but would have eliminated around 80% of the false errors. The most difficult of these errors to resolve is the 'was/were' error because of the higher likelihood of a parenthetic intervention, which also contributes to the problem with 'its/it's'.</Paragraph>
    <Paragraph position="3"> An example from (THC) demonstrating an unlikely usage of 'its', which requires a context of more than ten words to resolve, illustrates the problem of parenthetic intervention: Its specialty magazines, such as *Telephony,* *AT&amp;T Technical Journal,* *Telephone Engineer and Management,* are decades old; they make computer publications like *Macworld* and *PC Week* look like amateur j ohnny-come-latelies.</Paragraph>
    <Paragraph position="4"> Another factor which causes severe problems with 'its/it's' is the extreme sensitivity of its differential grammar to contexts. Even the raw counts illustrate this quite clearly, and a far more representative training corpus will be needed to resolve the question of whether an adequate differential grammar can be built for this case: see Table 2.</Paragraph>
    <Paragraph position="5"> The method we used to cope with the 'a/an' pair is simple and effective, but increases the number of additional affix classes from 13 to 26 as each is split according to whether it starts with a vowel or not.</Paragraph>
    <Paragraph position="6"> This increase the size of the eigenset to 174, but in addition we added 20 h-words and 2 y-words which take 'an', giving a total of 196 eigenunits. We illustrate what the eigenset now looks like in Table 3, where we present the top 15 eigenunits and their occurrence counts.</Paragraph>
    <Paragraph position="7"> The affix information in Table 3 is equivalent to the cross-product of 26 prefixes with 13 suffixes (counting the 0-morph) and would have tripled the number of classes required if we hadn't made the preclassification into consonant and vowel. This is relevant as we go on to consider how our affix information could be derived automatically.</Paragraph>
    <Paragraph position="8"> One of stated our aims was to seek to learn the syntactic information we use, but, in fact, we have used a set of 12 hand-chosen syntactically significant suffixes in the grammar checking discussed above, along with 150 words chosen on the basis of frequency, to which we have now added a pair of phonologically motivated features. We have therefore experimented with the automated discovery of an appropriate set of words and affixes.</Paragraph>
    <Paragraph position="9"> For this purpose we sought to derive a set of maximal Ngrams which were significant but were not part of any larger significant Ngrams. Allowing Ngrams of different sizes means we are double counting some strings, and it is thus usual to deduct from a given Ngram prefix the frequencies of all N+l-grams which it prefixes, and similarly for suffixes. Using these as significance measures, however, tends to lead to us picking up not only frequent words and affixes, but frequent phrases and all proper substrings of each of these. Furthermore the last character of a suffix may well be involved in many other words and suffixes and thus tends to appear more significant.</Paragraph>
    <Paragraph position="10">  that are not matched as words or with specific suffixes like 'C-s C-ed'. The corpus (RSV) was selected to be topically focussed and of convenient size (757523 words).</Paragraph>
    <Paragraph position="11"> We therefore used a related heuristic in which we required that a unit be significant in both contexts in order to be treated as significant, and achieved this by double discounting - subtracting counts for both prefixes and suffixes. Although this method was intended as only a rough ranking for examining the results, it did indeed provide more useful information than either of the more principled discounts or their maximum or sum, for which again frequent words were represented multiply. With our double discount, words which are almost always used as part of a bigger significant string will end up heavily negatively weighted, and thus the heuristic is likely to prefer to embed it in a larger string - see Table 4.</Paragraph>
    <Paragraph position="12">  sought to discover the eigenunits of RSV as discounted Ngrams. Significance threshold was set at the 99.99% level, and contexts were discounted by the frequency of any significant contexts which extended them to the right, to the left, or either.</Paragraph>
    <Paragraph position="13"> While our eigenwords and hand-selected suffixes tended to be proposed relatively quickly, it will be observed that many actually occurred as part of strings which crossed word boundaries. Moreover, without some segmentation information the technique is sensitive to the significance threshold, which has a direct influence on the length of the Ngrams proposed. Limiting to maximal space-bounded 'words' is however reasonable in this application, but since we need to include punctuation and numbers in our eigenclass, we do not to filter these out. The top 75 candidates then consist almost entirely of Unix eigenwords, plus corpus eigenwords 'god' and 'lord', some punctuation, some standard affixes, some combinations of punctuation and affixes, and some unexpected candidate affixes. In fact some of these candidates, '-e -es', are not at all unreasonable: '-es' is a variant of '-s' and both can fit in the same slot as '-ed'. But others, 'bo- ba- ne-', are harder to make sense of. The next 75 strings are similar with a higher proportion of affixes, both syntactic (6/12 now covered) and non-syntactic (20), as well as two unclassifiable sequences ('rai ob').</Paragraph>
    <Paragraph position="14"> Thus, it is clearly easy to obtain a fair approximation to our list of eigenunits, and the fact that 10 or 20% of them may not satisfy our syntactic expectations does not preclude them from being useful and will not necessarily worsen the results. For example, we note that our 24 prefixes handle resp. 33% and 20% of the 'a/an' cases covered by our 'V*' and 'C*' classes. As long as we are not overwhelmed by poor candidates, our eigenset will still be able to meet its goal.</Paragraph>
    <Paragraph position="15"> An automatically generated eigenset, of the same size as our original 172 eigenunit version, included 80 of our original eigenwords which covered 54% (the Unix 150 covers 60%) of the corpus, and included 7 of our original suffixes covering an additional 12% (our handpicked 12 cover 13%). On the other hand, it proved that one of our hand-selected suffixes was not very significant in the corpus ('-ic') and occurred only 92 times (the .0001 threshold sets significance at 75 occurences). The last of the other suffixes ('ble') to be proposed had rank 503, again because of Powers 94 Differential Grammars larger significant contexts '-able ble-', which caused it to be discounted as a suffix in its own right.</Paragraph>
    <Paragraph position="16"> Forcing a word-boundary between words and punctuation increases the rate at which eigenunits are found, as combinations of letters and punctations constitute the majority of the dross. Word-internal apostrophe (but not hyphen) is treated as a letter for this purpose.</Paragraph>
  </Section>
class="xml-element"></Paper>