<?xml version="1.0" standalone="yes"?> <Paper uid="M95-1020"> <Title>STERLING SOFTWARE : AN NLTOOLSET-BASED SYSTEM FOR MUC-6</Title> <Section position="5" start_page="259" end_page="260" type="concl"> <SectionTitle> RESULTS AND CONCLUSIONS </SectionTitle> <Paragraph position="0"> The overall results (see Table 2) were obtained with 4 person-weeks of effort. We lifted some pattern and code ideas from the ATS, which worked on a very different set of message types, and lost a few days to the ST task and to filling in date templates. These results show that our semantic-pattern-based approach to entity detection and templating is a very good one, and one that can be brought to bear on a new application quickly.</Paragraph> <Paragraph position="1"> As we have noted, dramatic improvements in the worst numbers (timex in NE, org locale and country in TE) would have been obtained with very minor changes in the patterns: literally, a couple of hours' worth of work. The org locale fix would actually have given us the highest f-measure on that category: 61.3. Despite that &quot;couple of hours&quot; estimate, our greatest limiting factor was time: time to test more thoroughly and to isolate the causes of the biggest problems. Slowness of the system was a problem but not a major one, as it took only a minute or two per article.</Paragraph> <Paragraph position="2"> After those two improvements, we turn to the problem of org descriptors. Although we had the highest f-measure, it was only 43.6, which shows that there is still room for improvement. Here, the solutions are less obvious. One step to take is to extend the patterns to allow modifier phrases after the head noun in a descriptor noun phrase, such as &quot;the agency with billings of $400 million&quot;.
More exploration is needed on this, especially since both recall and precision were low.</Paragraph> <Paragraph position="3"> Another area where we would like to make changes is the order of the reduction stages. For example, the system currently does all person reductions after organization reductions. This meant we had to prevent the secondary organization reduction from matching what are clearly person names (e.g., primary &quot;Schecter Group&quot; -/-> secondary &quot;Mr. Schecter&quot;). The solution, clearly, is to apply some of the person patterns before the organization patterns.</Paragraph> <Paragraph position="4"> Since all the processing occurs without any regard to the types of events discussed in the articles, the system we have developed here is easily portable across domains. If a domain required a different set of template slots than those used for MUC-6, the patterns would be unchanged, but the reduction code that fills the slots, and the postprocessing code that reports them, would have to be modified slightly.</Paragraph> <Paragraph position="5"> We have demonstrated, on MUC-6 and on CDIS, that we have an excellent approach to both entity and event extraction on a range of document types. We hope to have the opportunity to continue this work, as funding permits.</Paragraph> </Section> </Paper>