<?xml version="1.0" standalone="yes"?> <Paper uid="M91-1005"> <Title>Category Total Labor Hours Domain Independent MUC-3 Specific Profiler 0 180 Parser / Analyzer Grammar rules Lexical entries</Title>
<Section position="3" start_page="49" end_page="49" type="metho"> <SectionTitle> CAUCUS KB </SectionTitle>
<Paragraph position="0"> Most of the suggested solutions to these failures will improve recall while reducing the precision of this portion of the system. By increasing the profiler's recall at the expense of precision, we add to the number of sentences that must be analyzed by the parser. RUBRIC is designed so that we can easily experiment with this trade-off, but we did not do so because we expected that improvements to the parser at this stage in its development would achieve higher payoffs in our MUC-3 score than experiments of this sort.</Paragraph>
<Paragraph position="1"> Two failure types, the misspelled keyword and the unusual term, point out what might be considered a flaw in the filter-before-parsing approach. One could, of course, apply the reverse of spelling correction to all words not found in the parser's lexicon to see whether they can be made to match keywords, or one could apply a spelling-mess-up program to the profiler keywords to catch potentially misspelled keywords in the text, but these are likely to be high-cost, low-payoff solutions. Another solution might be to parse all sentences containing unrecognized words. This approach might have a higher payoff because it is likely to uncover sentences that list perpetrators or victims without mention of any event keywords.</Paragraph>
<Paragraph position="2"> Detecting events described with unusual terms is a problem for all keyword-based concept detection techniques, especially if the knowledge bases are developed automatically through statistical techniques.
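The "reverse of spelling correction" idea described above can be sketched as matching words unknown to the parser's lexicon against the profiler's keyword list by string similarity. This is a hypothetical illustration, not the ADS implementation; the keyword list, threshold, and function name are all invented for the example.

```python
# Hypothetical sketch of "reverse spelling correction": match words that
# are not in the parser's lexicon against the profiler's event keywords,
# so a misspelled keyword in the text can still trigger a profile.
import difflib

# Invented sample keyword list for illustration only.
EVENT_KEYWORDS = ["bombing", "assassination", "kidnapping", "attack"]

def match_unknown_word(word, keywords=EVENT_KEYWORDS, cutoff=0.8):
    """Return the keyword a possibly-misspelled word resembles, or None."""
    hits = difflib.get_close_matches(word.lower(), keywords, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(match_unknown_word("asassination"))  # assassination
print(match_unknown_word("cat"))           # None
```

As the text notes, running a similarity check like this over every unknown word is exactly the kind of high-cost, low-payoff step the authors were wary of, since most unknown words are simply out-of-lexicon terms rather than misspelled keywords.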
However, we have an idea for developing concept knowledge bases based on the structure of the parser's semantic network and lexicon that may result in a more complete profiler knowledge base than would be feasible using statistical techniques alone.</Paragraph> </Section>
<Section position="4" start_page="49" end_page="49" type="metho"> <SectionTitle> MUC-3 PREPARATION </SectionTitle>
<Paragraph position="0"> Effort expended for MUC-3 involved: (1) development of profiler rules, (2) development of grammar, lexicon, and semantics for the parser/analyzer, (3) system integration and testing, and (4) general administration. Approximate staff-hours of effort expended on these tasks over the MUC-3 evaluation cycle (i.e., December 1990 through May 1991) are shown in Table 4, separated into domain-independent and MUC-3-specific activities.</Paragraph>
<Paragraph position="1"> Other than the administrative cost of hosting the MUC-3 interim conference, the largest MUC-3-specific tasks were developing the back-end procedures for extracting template fillers from the parser and profiler results. We have ideas for reducing or eliminating these application-specific tasks, which we hope to implement in the next year. Other domain-specific knowledge engineering tasks involved relatively minor additions to the lexicon and semantic network. The bulk of the effort for MUC-3 went into expanding the capacity of CAUCUS for lexical processing. Under system engineering, we added a lexical analyzer and a facility for storing and accessing compiled lexical entries in disk files, as well as a few tools for partially automating lexical acquisition. As can be seen in Table 5, our core lexicon grew from a few hundred entries to about 10,000. Our grammar grew by about 20%, adding relative clauses and conjunction of clauses and adjectival phrases.
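The facility described above for storing and accessing compiled lexical entries in disk files can be illustrated with a disk-backed key/value store. This is a minimal sketch under invented names and entry formats, not the actual CAUCUS code, whose compiled entries would be parser-internal structures rather than simple dictionaries.

```python
# Hypothetical sketch of a disk-backed compiled lexicon: entries are
# "compiled" once, written to a disk file, and looked up on later runs
# without recompilation. Entry format and names are invented.
import os
import shelve
import tempfile

def compile_entry(pos, features):
    # Stand-in for compilation; a real system would build
    # parser-internal structures here.
    return {"pos": pos, "features": sorted(features)}

path = os.path.join(tempfile.mkdtemp(), "lexicon")

with shelve.open(path) as lex:      # build the lexicon on disk
    lex["bomb"] = compile_entry("N", {"count", "concrete"})
    lex["attack"] = compile_entry("V", {"transitive"})

with shelve.open(path) as lex:      # reopen: entries persist across runs
    print(lex["bomb"]["pos"])       # N
```

The design point is the one the text implies: with roughly 10,000 entries, recompiling the lexicon on every run becomes expensive, so compiled entries are persisted and fetched by key on demand.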
We also added a few semantic net nodes to handle some new types of violence and some new speech acts.</Paragraph> </Section>
<Section position="5" start_page="49" end_page="53" type="metho"> <SectionTitle> LESSONS LEARNED </SectionTitle>
<Paragraph position="0"> CODEX was designed to scale up to large, realistic problems such as MUC-3. The C/Lisp implementation used in MUC-3 was designed to facilitate knowledge engineering and experimentation with parameters and algorithms that will at least partially automate adaptation to new domains and applications. The Profiler component of the system uses the relatively mature RUBRIC technology, but CAUCUS, the Parser/Analyzer component of the system, is still in its infancy. Prior to MUC-3, we had implemented CAUCUS' Generalized Composition Grammar, grammar compiler, chart parser with prioritized agenda, and a semantic network and lexicon that covered several small domains in which we had worked in the past. During MUC-3, most of our effort was expended in upgrading CAUCUS' facilities for processing lexemes and adding roughly 10,000 lexical entries. Although we were not, in the end, able to get this facility up in time to test the parser on MUC-3 TST2, all of this work will be generally useful to future applications. For ADS, then, MUC-3 served as a catalyst in the development of generally applicable language processing facilities for message data extraction applications.</Paragraph>
<Paragraph position="1"> Though the timing was not quite right for us, our analysis of MUC-3 results shows that this type of testing is useful for assessing the capabilities and weaknesses of a message understanding system as a whole and for showing where future efforts will achieve the highest payoff in improving the system's performance. ADS' results validate our CODEX approach to the degree that it was implemented for MUC-3.</Paragraph> </Section>
</Paper>