<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1018">
  <Title>NYU: Description of the MENE Named Entity System as Used in MUC-7</Title>
  <Section position="7" start_page="2" end_page="2" type="evalu">
    <SectionTitle>
RESULTS
</SectionTitle>
    <Paragraph position="0"> MENE's maximum entropy training algorithm gives it reasonable performance with moderate-sized training corpora or few information sources, while allowing it to really shine when more training data and information sources are added. Table 2 shows MENE's performance on the within-domain corpus from MUC-7's dry run as well as the out-of-domain data from MUC-7's formal run. All systems shown were trained on 350 aviation disaster articles #28this training corpus consisted of about 270,000 words, which our system turned into 321,000 tokens#29.</Paragraph>
    <Paragraph position="1">  Note the smooth progression of the dry run scores as more information is added to the system. Also note that, when combined under MENE, the three weakest systems, MENE, Proteus, and Manitoba outperform the strongest single system of the group, IsoQuest's. Finally, the top dry-run score of 97.12 from combining all three systems seems to be competitive with human performance. According to results published elsewhere in this volume, human performance on the MUC-7 formal run data was in a range of 96.95 to 97.60. Even better is the score of 97.38 shown in table 3 below which weachieved by adding an additional 75 articles from the formal-run test corpus into our training data. In addition to being an outstanding result, this #0Cgure shows MENE's responsiveness to good training material.</Paragraph>
    <Paragraph position="2"> Theformalevaluationinvolvedashift in topicwhichwasnotcommunicatedtothe participantsbeforehand#7B the training data focused on airline disasters while the test data was on missile and rocket launches. MENE faired much more poorly on this data than it did on the dry run data. While our performance was still reasonably good, we feel that it is necessary to view this number as a cross-domain portability result rather than an indicator of how the system can do on unseen data within its training domain. In addition, the progression of scores of the combined systems was less smooth. Although MENE improved the Manitoba and Proteus scores dramatically, it left the IsoQuest score essentially unchanged. This mayhave been due to the tremendous gap between the MENE- and IsoQuest-only scores. Also, there was no improvementbetween the MENE + Proteus + IsoQuest score and the score for all four systems. We suspect that this was due to the relatively low precision of the Manitoba system on formal-run data.</Paragraph>
    <Paragraph position="3"> We also did a series of runs to examine how the systems performed on the dry run corpus with di#0Berent amounts of training data. These experiments are summarized in table 3.</Paragraph>
    <Paragraph position="4">  A few conclusions can be drawn from this data. First of all, MENE needs at least 20 articles of tagged training data to get acceptable performance on its own. Secondly, there is a minimum amount of training data which is needed for MENE to improve an external system. For Proteus and the Manitoba system, this number seems to be around 80 articles. Since the IsoQuest system was stronger to start with, MENE required 150 articles to show an improvement.</Paragraph>
    <Paragraph position="5"> MENE has also been run against all-uppercase data. On this we achieved formal run F-measures of 77.98 and 82.76 and dry run F-measures of 88.19 for the MENE-only system and 91.38 for the MENE + Proteus system. The formal run numbers su#0Bered from the same problems as the mixed-case system, but the combined dry run number matches the best currently published result #5B1#5D on all-caps data. Wehave put very little e#0Bort into optimizing MENE on this type of corpus and believe that there is room for improvement here.</Paragraph>
  </Section>
class="xml-element"></Paper>