File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/98/x98-1008_abstr.xml
Size: 5,220 bytes
Last Modified: 2025-10-06 13:49:38
<?xml version="1.0" standalone="yes"?> <Paper uid="X98-1008"> <Title>THE SRI TIPSTER III PROJECT</Title> <Section position="2" start_page="0" end_page="39" type="abstr"> <SectionTitle> CPSL </SectionTitle> <Paragraph position="0"> One step toward ease of use by nonexperts was the development, reported in Phase II [1], of SRI's FastSpec language, which made it easier to generate and modify the syntactic and semantic patterns needed to identify pertinent data. This was a motivating factor for establishing the Common Pattern Specification Language (CPSL) Working Group, devoted to formulating a CPSL so that different IE systems could share pattern libraries (footnote 2).</Paragraph> <Section position="1" start_page="0" end_page="39" type="sub_section"> <SectionTitle> Open Domain System </SectionTitle> <Paragraph position="0"> A central goal of SRI's open domain work was to allow users to develop their own customized IE systems, called Fastlets, for different topics of interest. SRI analyzed a large corpus of business news and developed a basic ontology of concepts. This ontology provides the basis for specifying patterns in a wide range of business news applications. Overall, the concepts and patterns cover the 150 most common verbs and nominalizations in the Wall Street Journal.</Paragraph> <Paragraph position="1"> The patterns are currently being implemented in SRI's IE system, FASTUS.</Paragraph> <Paragraph position="2"> Footnote 1: This material has been reviewed by the CIA. That review neither constitutes CIA authentication of information nor implies CIA endorsement of the author's views.</Paragraph> <Paragraph position="3"> Footnote 2: SRI's Doug Appelt has proven the feasibility of CPSL by developing TextPro, a Mac-based IE system written in CPSL. (www.ai.sri.com/-appelt) As part of a collateral effort, SRI built Fastlets for 23 of the 47 TREC-6 [2] topics; these Fastlets were used in two joint TREC-6 submissions with GE. 
In the first, SRI re-ranked the top 2000 documents returned by the GE routing query. In the second submission, GE used the topic-relevant text identified by the SRI Fastlets for query expansion. The results were encouraging and indicate that the Fastlet approach to document detection is feasible.</Paragraph> </Section> <Section position="2" start_page="39" end_page="39" type="sub_section"> <SectionTitle> Merging </SectionTitle> <Paragraph position="0"> The previous FASTUS merging algorithm was simple: it merged data structures newly extracted from a document with previously extracted ones, starting with the most recent. The results were good, but error analysis showed that improving this process would increase overall system accuracy. SRI's approach was to apply machine learning (ML) techniques to learn merging strategies automatically, an essential capability for enabling nonexperts to port the system to a new domain.</Paragraph> <Paragraph position="1"> SRI employed two supervised learning techniques: decision trees and maximum entropy. Both approaches were highly successful at classifying merges, but neither improved the overall accuracy of the end-to-end system. The likely reason is that correct merges improve the system's score much more than bad merges hurt it. Therefore, although the supervised-learning-based merging made more correct decisions than before, the incorrect blocking of a few correct merges reduced performance. SRI also experimented with unsupervised and weakly supervised learning of merging strategies. These results were reported at AAAI-98 [3].</Paragraph> </Section> <Section position="3" start_page="39" end_page="39" type="sub_section"> <SectionTitle> Coreference Resolution </SectionTitle> <Paragraph position="0"> SRI's coreference module in FASTUS at the start of TIPSTER III performed close to the best published results for handling third person pronouns and reflexives. 
SRI worked on porting coreference improvements to the TIPSTER III system and, at the same time, began work on coreferential elements to be handled for the first time: bare nominals, possessed nominals, and indefinites. It was thought that overall system degradation reflects the lack of semantic and discourse information available to the FASTUS processing modules, and that WordNet might be a source of knowledge to provide this information.</Paragraph> <Paragraph position="1"> WordNet: In another related effort, SRI performed experiments in using WordNet (WN) as a knowledge source for IE. SRI developed mechanisms for automatically adding relations on top of WN with respect to nominalizations. There are nominalizations in WN, but they are not linked to the word sense of the verb that they nominalize. Because information such as the fact that &quot;manager&quot; is an AGENT nominalization of &quot;manage&quot; is useful to the open domain approach, SRI developed routines to identify the appropriate word sense. SRI also analyzed how WN might improve coreference resolution. Although the potential payoff of using WN is still unclear, it is clear that using WN is not a simple matter of plug-and-play.</Paragraph> </Section> </Section> </Paper>