<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1035"> <Title>Spelling and Grammar Correction for Danish in SCARRIE</Title> <Section position="7" start_page="258" end_page="260" type="concl"> <SectionTitle> 5 Evaluation and Conclusion </SectionTitle> <Paragraph position="0"> The project's access to a set of parallel unedited and proofread texts has made it possible to automate the evaluation of the system's linguistic functionality. A tool has been implemented to compare the results obtained by the system with the corrections suggested by the publisher's human proofreaders in order to derive measures telling us how well the system performed on recall (lexical coverage as well as coverage of errors), precision (percentage of correct flaggings), as well as suggestion adequacy (hits, misses and no suggestions offered). The reader is referred to (Paggio and Music, 1998) for more details on the evaluation methodology.</Paragraph> <Paragraph position="1"> The automatic procedure was used to evaluate the system during development, and in connection with the user validation. Testing was done on constructed test suites displaying examples of the errors targeted in the project and with text excerpts from the parallel corpora~ Figure 2 shows error recall and suggestion adequacy figures for the various error types represented in the test suites. These figures are very positive, especially with regard to the treatment of grammar errors. To make a comparison with a commercial product, the Danish version of the spelling and grammar checker provided by Microsoft Word does not flag any of the grammar errors.</Paragraph> <Paragraph position="2"> Figure 3 shows how the system performed on one of the test corpora. The corpus was assembled by mixing short excerpts containing relevant grammar errors and randomly chosen text.</Paragraph> <Paragraph position="3"> Since unlike test suites, the corpus also contains correct text, the figure this time also shows lexical coverage and precision figures. The corpus consists of 278 sentences, with an average length of 15.18 words per sentence. It may be surprising to see that it contains a limited number of errors, but it must be remembered that the texts targeted in the project are written by experienced journalists.</Paragraph> <Paragraph position="4"> The corpus was processed in 58 cpu-seconds on an HP 9000/B160. As expected, the system performs less well than on the test suites, and in general precision is clearly too low. However, we still consider these results encouraging given the relatively small resources the project has been able to spend on grammar development, and we We regard error coverage as quite satisfactory for a research prototype. In a comparative test made on a similar (slightly smaller) corpus, SCARR/E obtained 58.1% error coverage, and Word 53.5%. To quote a figure from another recently published test (Martins et al., 1998), the ReGra system is reported to miss 48.1% real errors. It is worth noting that ReGra has much more extensive linguistic resources available than SCARRIE, i.e. a dictionary of 1.5 million words and a grammar of 600 production rules. Most of the errors not found by SCARRIE in the test have to do with punctuation and other stylistic matters not treated in the project. There are also, however, agreement errors which go unnoticed. These failures are due to one of two reasons: either that no parse has been produced for the sentence in question, or that the grammar has produced a wrong analysis. 
<Paragraph position="1"> The automatic procedure was used to evaluate the system during development and in connection with the user validation. Testing was done on constructed test suites displaying examples of the errors targeted in the project, and with text excerpts from the parallel corpora. Figure 2 shows error recall and suggestion adequacy figures for the various error types represented in the test suites. These figures are very positive, especially with regard to the treatment of grammar errors. For comparison with a commercial product, the Danish version of the spelling and grammar checker provided by Microsoft Word does not flag any of the grammar errors.</Paragraph>
<Paragraph position="2"> Figure 3 shows how the system performed on one of the test corpora. The corpus was assembled by mixing short excerpts containing relevant grammar errors with randomly chosen text.</Paragraph>
<Paragraph position="3"> Since, unlike the test suites, the corpus also contains correct text, the figure this time also shows lexical coverage and precision figures. The corpus consists of 278 sentences, with an average length of 15.18 words per sentence. It may be surprising that it contains only a limited number of errors, but it must be remembered that the texts targeted in the project are written by experienced journalists.</Paragraph>
<Paragraph position="4"> The corpus was processed in 58 cpu-seconds on an HP 9000/B160. As expected, the system performs less well than on the test suites, and in general precision is clearly too low. However, we still consider these results encouraging given the relatively small resources the project has been able to spend on grammar development, and we regard error coverage as quite satisfactory for a research prototype. In a comparative test made on a similar (slightly smaller) corpus, SCARRIE obtained 58.1% error coverage, and Word 53.5%. To quote a figure from another recently published test (Martins et al., 1998), the ReGra system is reported to miss 48.1% of real errors. It is worth noting that ReGra has much more extensive linguistic resources available than SCARRIE, i.e. a dictionary of 1.5 million words and a grammar of 600 production rules. Most of the errors not found by SCARRIE in the test have to do with punctuation and other stylistic matters not treated in the project. There are also, however, agreement errors which go unnoticed. These failures are due to one of two reasons: either no parse has been produced for the sentence in question, or the grammar has produced a wrong analysis. The precision obtained is, at least at first sight, much too low. On the same test corpus, however, Word only reached 15.9% precision. On closer inspection, 72 of the bad flags produced by SCARRIE turned out to be due to unrecognised proper names. Disregarding those, precision goes up to 34.9%. As was mentioned earlier, SCARRIE has a facility for guessing unknown proper names on the basis of their frequency of occurrence in the text. But since the test corpus consists of short unrelated excerpts, a large number of proper names only occur once or twice. To get an impression of how the system would perform in a situation where the same proper names and unknown words had a higher frequency of occurrence, we doubled the test corpus by simply repeating the same text twice. As expected, precision increased: the system produced 178 flags, 60 of which were correct (39.7%). This compares well with the 40% precision reported for instance for ReGra.</Paragraph>
<Paragraph position="5"> In addition to the problem of unknown proper names, false flags are related to unrecognised acronyms and compounds (typically forms containing acronyms or dashes), and to an imprecise treatment of capitalisation. Only 13 false flags are due to wrong grammar analyses, caused either by the fragment approach or by the grammar's limited coverage. In particular, genitive phrases, which are not treated at the moment, are responsible for most of these false alarms.</Paragraph>
<Paragraph position="6"> In conclusion, we consider the results obtained so far promising, and the problems revealed by the evaluation tractable within the current system design. In particular, future development should focus on treating stylistic matters such as capitalisation and punctuation, which have not been in focus in the current prototype. The coverage of the grammar, in particular the treatment of genitive phrases, should also be further developed. The data provided by the evaluation reported on in this paper, however, are much too limited to base further development on. Therefore, more extensive testing and debugging should also be carried out.</Paragraph>
<Paragraph position="7"> In addition, two aspects of the system that have only been touched on in this paper would be worth further attention: one is the mechanism for the treatment of split-ups and run-ons, which, as mentioned earlier, is not well integrated at the moment; the other is the weight adjustment process, which is currently done manually, and for which the adoption of a semi-automatic tool could be considered.</Paragraph>
</Section>
</Paper>