<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1072">
  <Title>The Automated Acquisition of Topic Signatures for Text Summarization</Title>
  <Section position="3" start_page="0" end_page="495" type="intro">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> In the late 1970s, DeJong (DeJong, 1982) developed a system called FRUMP (Fast Reading Understanding and Memory Program) to skim newspaper stories and extract the main details. FRUMP uses a data structure called sketchy script to organize its world knowledge. Each sketchy script is what FRUMP knows about what can occur in particular situations such as demonstrations, earthquakes, labor strikes, and so on. FRUMP selects a particular sketchy script based on clues to styled events in news articles. In other words, FRUMP selects an empty template 1 whose slots will be filled on the fly as FRUMP reads a news article. A summary is generated based on what has been captured or filled in the template.</Paragraph>
    <Paragraph position="1"> The recent success of information extraction research has encouraged the FRUMP approach. The SUMMONS (SUMMarizing Online News articles) system (McKeown and Radev, 1999) takes template outputs of information extraction systems developed for the MUC conference and generates summaries of multiple news articles. FRUMP and SUMMONS both rely on prior knowledge of their domains. However, acquiring such prior knowledge is labor-intensive and time-consuming. For example, the University of Massachusetts CIRCUS system used in the MUC-3 (SAIC, 1998) terrorism domain required about 1500 person-hours to define extraction patterns 2 (Riloff, 1996). In order to make them practical, we need to reduce the knowledge engineering bottleneck and improve the portability of FRUMP or SUMMONS-like systems.</Paragraph>
    <Paragraph position="2"> Since the world contains thousands, or perhaps millions, of complex concepts, it is important to be able to learn sketchy scripts or extraction patterns automatically from corpora; no existing knowledge base contains nearly enough information. (Riloff and Lorenzen, 1999) present a system AutoSlog-TS that generates extraction patterns and learns lexical constraints automatically from preclassified text to alleviate the knowledge engineering bottleneck mentioned above. Although Riloff applied AutoSlog-TS 1 We viewed sketchy scripts and templates as equivalent constructs in the sense that they specify high level entities and relationships for specific topics.</Paragraph>
    <Paragraph position="3"> 2 An extraction pattern is essentially a case frame that contains its trigger word, enabling conditions, variable slots, and slot constraints. CIRCUS uses a database of extraction patterns to parse texts (Riloff, 1996).</Paragraph>
    <Paragraph position="4">  to text categorization and information extraction, the concept of relevancy signatures introduced by her is very similar to the topic signatures we proposed in this paper. Relevancy signatures and topic signatures are both trained on preclassified documents of specific topics and used to identify the presence of the learned topics in previously unseen documents. The main differences to our approach are: relevancy signatures require a parser. They are sentence-based and applied to text categorization.</Paragraph>
    <Paragraph position="5"> On the contrary, topic signatures only rely on corpus statistics, are document-based and used in text summarization.</Paragraph>
    <Paragraph position="6"> In the next section, we describe the automated text summarization system SUMMARIST that we used in the experiments to provide the context of discussion. We then define topic signatures and detail the procedures for automatically constructing topic signatures. In Section 5, we give an overview of the corpus used in the evaluation. In Section 6 we present the experimental results and the possibility of enriching topic signatures using an existing ontology. Finally, we end this paper with a conclusion.</Paragraph>
  </Section>
</Paper>