<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1024">
<Title>MUC-7 TEST SCORES</Title>
<Section position="1" start_page="0" end_page="0" type="abstr">
<SectionTitle> MUC-7 TEST SCORES </SectionTitle>
<Paragraph position="0"> This appendix contains the summary score reports for each system in the MUC-7 evaluation. The reports are ordered by task (Template Element, Template Relation, Scenario Template, Named Entity, and Coreference) and secondarily by site and language (in alphabetical order).</Paragraph>
<Paragraph position="1"> A brief introduction to reading the MUC-7 score reports is presented here. First, the common aspects of the score reports will be presented, followed by the task-specific aspects. Then references will be given for more detailed information on the metrics, scoring algorithms, and statistical significance testing.</Paragraph>
<Paragraph position="2"> The first column in the report, SLOT, contains one of the following types of label: a slot name (e.g., ent_name) or an object name (e.g., entity). The tallies for the test set are presented in the left-hand columns of the score report: POS (possible), ACT (actual), COR (correct), PAR (partially correct), INC (incorrect), SPU (spurious), MIS (missing), and NON (noncommittal). The first row at the bottom of the report is marked as the ALL SLOTS row and gives the official overall tallies and the overall REC, PRE, UND, OVG, SUB, and ERR. The tallies in the POS, ACT, COR, PAR, INC, SPU, MIS, and NON columns add up to the ALL SLOTS values, but not all rows are counted toward those totals; the rows that are not counted are the OBJ SCORES rows (i.e., the section where the entries in the SLOT column are object names). The scores for the metrics in the ALL SLOTS row are not computed by averaging the scores in the individual slot rows; instead, they are computed from the tally totals contained in the ALL SLOTS row itself. The remaining row at the bottom of the report, F-MEASURES, contains F-measure scores, i.e., weighted combinations of the recall and precision scores in the ALL SLOTS row. F-measure scores corresponding to three different weightings of precision relative to recall are computed: the P&R value weights precision the same as recall, the 2P&R value gives twice the weight to precision, and the P&2R value weights precision half as much as recall. The general formula is F = ((β^2 + 1) * PRE * REC) / ((β^2 * PRE) + REC), where β = 1 for P&R, β = 0.5 for 2P&R, and β = 2 for P&2R.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> Information Extraction Scenario Template </SectionTitle>
<Paragraph position="0"> In the Scenario Template task, there is one template object per article, and the content slot may or may not be filled depending on whether the text contains a relevant event. The ability of the system to judge relevancy is reflected in the TEXT-FILTERING row of the score report.</Paragraph>
<Paragraph position="1"> The TEXT-FILTERING row reports text filtering scores (recall and precision for document detection). The score category columns record the tallies for these relevance decisions across the test set as follows: COR (the system decides relevant and relevant is correct), SPU (the system decides relevant but nonrelevant is correct), MIS (the system decides nonrelevant but relevant is correct), and NON (the system decides nonrelevant and nonrelevant is correct). The text filtering formulas are therefore as follows: REC = COR/(COR + MIS); PRE = COR/(COR + SPU); UND = MIS/(COR + MIS); OVG = SPU/(COR + SPU).</Paragraph>
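<Paragraph> For readers who want to check the arithmetic, the following short Python sketch computes the text filtering measures and the three F-measures from a set of tallies. It is a minimal illustration, not the official MUC-7 scoring software; the function names and the tally values are hypothetical.

# Minimal sketch of the MUC-7 text-filtering and F-measure arithmetic.
# Tally values are hypothetical; this is not the official scorer.

def text_filtering_scores(cor, spu, mis):
    """REC, PRE, UND, and OVG from the relevance-decision tallies."""
    rec = cor / (cor + mis)   # recall
    pre = cor / (cor + spu)   # precision
    und = mis / (cor + mis)   # undergeneration
    ovg = spu / (cor + spu)   # overgeneration
    return rec, pre, und, ovg

def f_measure(pre, rec, beta=1.0):
    """Weighted combination of precision and recall.

    beta = 1.0 gives P&R, beta = 0.5 gives 2P&R (precision weighted
    twice as heavily), and beta = 2.0 gives P&2R (recall weighted
    twice as heavily).
    """
    return ((beta ** 2 + 1.0) * pre * rec) / ((beta ** 2 * pre) + rec)

# Hypothetical test set: 70 texts correctly judged relevant,
# 10 judged relevant that were not, 20 relevant texts missed.
rec, pre, und, ovg = text_filtering_scores(cor=70, spu=10, mis=20)
print(f"REC={rec:.2f} PRE={pre:.2f} UND={und:.2f} OVG={ovg:.2f}")
print(f"P&R={f_measure(pre, rec):.2f} "
      f"2P&R={f_measure(pre, rec, beta=0.5):.2f} "
      f"P&2R={f_measure(pre, rec, beta=2.0):.2f}")
</Paragraph>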
<Paragraph position="2"> Template Element and Template Relation: The variation on the Scenario Template (ST) score report for Template Element (TE) and Template Relation (TR) is minor; neither report contains a template object or a text filtering score, because TE and TR objects are generated independently of any domain (scenario) relevance criteria.</Paragraph>
<Paragraph position="3"> Named Entity: For the Named Entity (NE) task, the reporting is slightly more extensive. The first table is the same as for the Information Extraction tasks, i.e., it contains subtask scores. The second table reports the scores according to the part of the document in which the named entity appeared. The remaining tables are the same as those in the Information Extraction tasks.</Paragraph>
<Paragraph position="4"> Coreference: The Coreference (CO) task is quite different from the other four tasks, and its output is also different. The number of coreference chains per document is given for both the key and the response. The scoring method looks at the linkages that define equivalence classes; the central measure is how many linkages must be added or removed to obtain the correct equivalence classes. The score reports contain recall, precision, and F-measure scores. Please note that the recall and precision scores for CO are reported both as ratios and as percentages.</Paragraph>
</Section>
</Section>
</Paper>