<?xml version="1.0" standalone="yes"?> <Paper uid="M93-1025"> <Title>USC: DESCRIPTION OF THE SNAP SYSTEM USED FOR MUC-5</Title> <Section position="13" start_page="315" end_page="315" type="concl"> <SectionTitle> PERFORMANCE AND DISCUSSION </SectionTitle> <Paragraph position="0"> Summary of Score Results: During the development of the system, the 186 dry-run messages were used as a test set. For the final run, 251 new messages were processed. Part of the summary of score results for the final run is shown in Tables 2 and 3. In the error-based metrics, ERR = 84%; in the recall-precision-based metrics, the recall for all objects was 19%, the precision was 36%, and the F-measure was 24.67.</Paragraph> <Paragraph position="1"> While developing the system, we tried various options to improve performance. The most important option was changing the semantic pattern knowledge base. To detect the TIE-UP relationship, we used simple string-matching rules, domain-specific FP-structures, and a relaxed version of the FP-structures. Table 4 shows the results obtained with the different knowledge base options.</Paragraph> <Paragraph position="2"> The basic system includes only the simple string-matching rules. These rules are too primitive, so the recall was only 10%. With the FP-structures, the recall improved by 9%, the precision decreased slightly, and the overall F-measure increased by more than 8%. The third row of Table 4 is the result when relaxed patterns are included. In a relaxed pattern, the semantic constraints are removed, so any noun group can be mapped to an ENTITY slot. The relaxed patterns were tested because many company names were not detected (they are not in the company name list). Due to the relaxed constraints, more TIE-UP relations were detected, so the recall increased by 3%. 
However, the relaxed patterns also produced many incorrect results, and the precision decreased by 10%, resulting in a slightly decreased F-measure. By carefully selecting only part of the relaxed patterns, the overall performance would increase, since unknown company names would be detected without producing a large number of incorrect matches.</Paragraph> <Paragraph position="3"> Amount of Effort: The development of the whole system, including the knowledge base, took six months for our team of one faculty member and five graduate students. Since the skeleton of the memory-based parser used for MUC-4 was reusable, most of the time was spent on constructing the knowledge base, updating the dictionary entries, and coding the template generation rules. The total effort for MUC-5 can be estimated as follows: Regarding the limiting factors in the performance of the system, we have noticed that: (1) inferencing rules to detect unknown company names are needed, (2) more semantic patterns are needed, (3) our discourse processing capability was insufficient, (4) the parser does not address enough linguistic problems, and (5) the semantic tagging in dictionary entries was not complete.</Paragraph> <Paragraph position="4"> The burden on the parser and template generator was greatly reduced by moving a large part of the processing into the preprocessor. The preprocessor not only performs the dictionary look-up, but also detects various multi-word nouns and provides semantic tags for important words, which are necessary for later processing stages. The use of domain-specific semantic patterns to detect TIE-UP relationships is efficient and effective. The memory-based parser performs fast and efficient semantic parsing by using those patterns. The lexical acquisition system was effective in constructing the knowledge base of patterns. It also provides portability to other domains. 
The main weaknesses lie in the lack of discourse processing and the lack of a module that performs inference to detect company names. A large portion of the dry-run and final-run messages did not produce a correct TIE-UP relationship only because we failed to detect the company names. Reusability: Assuming that the domain and the required output are changed, approximately 75% of the lexicon and concept hierarchy is reusable. None of the inferencing rules for filling templates or the domain-specific patterns are reusable. However, the patterns can be easily acquired for a new domain by using the lexical acquisition system.</Paragraph> <Section position="1" start_page="315" end_page="315" type="sub_section"> <SectionTitle> What was Learned </SectionTitle> <Paragraph position="0"> Through the development of the SNAP system for MUC-5, we have learned that the information extraction task needs different approaches from the Natural Language Understanding task, although a full natural language understanding capability is necessary for perfect performance. Full syntactic analysis and semantic interpretation are not necessary for information extraction. If sufficient semantic information is provided for each lexical entry, a phrasal pattern matching approach can achieve reasonable performance in a very efficient way. Overall, our experience with MUC-5 has been useful and rewarding.</Paragraph> </Section> </Section> </Paper>