<?xml version="1.0" standalone="yes"?> <Paper uid="M98-1018"> <Title>NYU: Description of the MENE Named Entity System as Used in MUC-7</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> This paper describes a new system called &quot;Maximum Entropy Named Entity&quot; or &quot;MENE&quot; (pronounced &quot;meanie&quot;) which was NYU's entrant in the MUC-7 named entity evaluation. By working within the framework of maximum entropy theory and utilizing a flexible object-based architecture, the system is able to make use of an extraordinarily diverse range of knowledge sources in making its tagging decisions. These knowledge sources include capitalization features, lexical features, and features indicating the current type of text (i.e. headline or main body). It makes use of a broad array of dictionaries of useful single or multi-word terms such as first names, company names, and corporate suffixes. These dictionaries required no manual editing and were either downloaded from the web or were simply &quot;obvious&quot; lists entered by hand.</Paragraph> <Paragraph position="1"> This system, built from off-the-shelf knowledge sources, contained no hand-generated patterns and achieved a result on dry run data which is comparable with that of the best statistical systems. Further experiments showed that when combined with handcoded systems from NYU, the University of Manitoba, and IsoQuest, Inc., MENE was able to generate scores which exceeded the highest scores thus far reported by any system on a MUC evaluation.</Paragraph> <Paragraph position="2"> Given appropriate training data, we believe that this system is highly portable to other domains and languages and have already achieved state-of-the-art results on upper-case English. 
We also feel that there are plenty of avenues to explore in enhancing the system's performance on English-language newspaper text.</Paragraph> <Paragraph position="3"> Although the system was ranked fourth out of the 14 entries in the N.E. evaluation, we were disappointed with our performance on the formal evaluation, in which we got an F-measure of 88.80. We believe that the deterioration in performance was mostly due to the shift in domains caused by training the system on airline disaster articles and testing it on rocket and missile launch articles.</Paragraph> </Section> </Paper>