File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/95/m95-1007_concl.xml
Size: 3,373 bytes
Last Modified: 2025-10-06 13:57:27
<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1007">
<Title>UNIVERSITY OF DURHAM: DESCRIPTION OF THE LOLITA SYSTEM AS USED IN MUC-6.</Title>
<Section position="6" start_page="84" end_page="84" type="concl">
<SectionTitle>CONCLUSIONS</SectionTitle>
<Paragraph position="0">The LOLITA system was entered in MUC-6 mainly because of the importance of evaluation. There is no clear methodology for evaluation in the NLP field; however, a well-established and well-known event such as MUC presents an excellent challenge and provides important resources for evaluation. We wanted to see if our system could be adapted to perform the MUC tasks, and then to see how well it could do them. Clearly, our general-purpose `deep analysis' approach to the tasks did not produce scores that compare well with the best systems; however, there are some general reasons why we believe this is the case. Firstly, the use of a system which aims to be general purpose rather than generic means that it is not possible to start from a `clean slate' and populate the system with a set of rules ideally suited to just the MUC evaluation. Any modifications to rule bases in the system's core must be carried out with a careful view to their effect on all aspects of the core. Given that the MUC tasks test only certain aspects of the core system, much effort is expended on issues that do not affect MUC performance. Secondly, the nature of the MUC-6 tasks is such that only a small percentage of the marks are available for `deep' analysis, and so such an analysis is counter-productive unless it achieves an extremely high level of robustness. We are working towards such a level of robustness, but our MUC-6 results make it clear that we are not there yet.</Paragraph>
<Paragraph position="1">As well as providing impetus to develop the core system, the experience has taught us much about testing and evaluation, which will help in subsequent development. Code-wise, several major extensions have been added and much existing code has been improved. Very little of this work is MUC-specific, so the amount of reuse is high. Evaluation-wise, we now have a set of measures with which to evaluate at least some aspects of our progress.</Paragraph>
<Paragraph position="2">We do not see the scores as a refutation of our approach. As is to be expected in a system of LOLITA's size and complexity, we see the effects of several small bugs in the analysis which obscure the potential scores: witness our recent improvement on the walk-through article. It is clear that increasing robustness, for example by providing backup strategies when the main analysis fails, is a good idea. We also plan to improve our parsing and grammar techniques. During development, we have seen several examples of good scores being obtained when the system works to its full potential, and we are much encouraged by this.</Paragraph>
<Paragraph position="3">In summary, we are pleased with our first participation in MUC. Not only have we successfully implemented all four tasks on our first attempt at MUC, but we have also managed to produce a deep analysis of a good part of the text in the formal evaluation set. Despite the hard work, MUC has been an extremely useful and enjoyable experience, and we look forward to MUC-7.</Paragraph>
</Section>
</Paper>