<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1011">
<Title>DESCRIPTION OF THE UMASS SYSTEM AS USED FOR MUC-6</Title>
<Section position="8" start_page="439" end_page="439" type="concl">
<SectionTitle>CONCLUSIONS</SectionTitle>
<Paragraph position="0"> In the time that has passed since MUC-5 we have been exploring trainable information extraction technologies in a number of different application areas [1, 7]. We have rewritten all of our software in order to enhance cross-platform portability. In addition, we have exploited tagged text documents of the type used in MUC-6. Our experiences with previous MUC evaluations gave us a clear understanding of the hurdles that lie ahead. We made progress in some areas and ignored others completely.</Paragraph>
<Paragraph position="1"> Over the years we have come to appreciate the significant difficulties of software evaluation and the unique problems associated with language processing evaluations. We have pondered the lessons of previous MUCs and worried about the wisdom of a research agenda dominated by score reports. Do performance evaluations succeed in bringing theoretical ideas closer to real-world systems? Or do they inadvertently create a wedge between basic research and applied research? These big questions are easier to ask than answer, but we have always found the MUC evaluations to be valuable for our own system development efforts. There are always lessons to be learned and a few genuinely new ideas to ponder as well. Two of this year's lessons are painfully obvious to us. Most notably, it pays to engineer the best possible string specialists. If you fail to recognize half of the organizations in a test set, it will be difficult to do well on extraction tasks involving organizations. It is also a mistake to rely on a trainable system component that is not given enough training. We learned that 100 documents do not provide enough training for our system.</Paragraph>
<Paragraph position="2"> Other lessons are more subtle and were not immediately obvious (at least to us). Most notably, the TE task for MUC-6 could be effectively tackled without the benefit of a sentence analyzer. Noun phrase analysis was very important, and coreference resolution played a role, but we saw no benefit from CRYSTAL's dictionary for TE. If the CN definitions had been operating at a finer level of granularity, we might have been able to acquire useful extraction rules for noun phrase analysis. Although our current version of CRYSTAL does not operate at this level, we are currently developing a version of CRYSTAL to learn finer-grained CN definitions.</Paragraph>
<Paragraph position="3"> BADGER's CN output was put to better use in ST, where WRAP-UP used CN patterns in order to induce relations between entities. Unfortunately, there was not enough training to produce effective decision trees, so WRAP-UP didn't exactly get a fair trial here. We also noticed that CRYSTAL's dictionary was too sparse to cover all the useful morphological variants for important verbs and verb phrases. In previous MUC evaluations, where 1,000 or more training documents were provided, our dictionary construction tool picked up the more important morphological variants because the training corpus contained examples of them. With only 100 ST training documents this year, our dictionary was missing important verb constructions. For example, the training corpus contained a past tense instance for a specific verb but no present progressive instance.
It appears that 100 training texts are not enough for CRYSTAL's dictionary induction algorithm. Or at the very least, a CRYSTAL dictionary based on so little training should be strengthened with recognition routines for morphological variants. These problems do not reflect any essential weakness on the part of CRYSTAL's learning algorithm: they merely illustrate the importance of adequate training materials for machine learning algorithms.</Paragraph>
<Paragraph position="4"> In spite of these difficulties, we are quite pleased with the portability of our trainable system components. The one-month time frame for ST was not a problem for us as far as CRYSTAL, WRAP-UP, and RESOLVE were concerned. CRYSTAL and WRAP-UP could have trained on 1,000 texts as easily as 100, and RESOLVE would have been in the same boat if it had been able to train from the MUC-annotated documents directly.</Paragraph>
<Paragraph position="5"> The manual text annotations required for RESOLVE provide us with our final observation about MUC-6 and our MUC-6 system. When a trainable system relies on annotated texts for training, some annotations are more useful than others. Because RESOLVE was designed without any concern for the task definition of the MUC-6 CO task, the training annotations we developed for RESOLVE were not deducible from the annotation system used in MUC-6. As other trainable text processing technologies emerge and develop independently of MUC-6, it may be impossible to create an annotation system that is equally accommodating to all. The inevitable politics of such a situation will be difficult to mediate unless all sites agree to follow the lead of MUC annotation conventions for their own internal development efforts. Although this would ease the problem of diverging systems, it might also suppress imaginative new perspectives on the coreference problem. This conundrum has all the earmarks of a no-win situation.</Paragraph>
<Paragraph position="6"> In summary, our greatest concern after MUC-6 is that the preparation of adequate training corpora may be too expensive or too labor-intensive to enable a fair evaluation of trainable text processing technologies. This will either drive those technologies "underground" (at least with respect to MUC meetings), or it may discourage a whole line of research which we feel holds great promise. In our experience, annotated text documents are much less difficult to generate than the key templates used in previous MUC evaluations. It therefore seems ironic to find such a small collection of training documents available to MUC-6 participants this year. We hope that the decision to minimize the training corpus will be reconsidered for future evaluations.</Paragraph>
</Section>
</Paper>