<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0711">
  <Title>A Scalable Summarization System Using Robust NLP</Title>
  <Section position="3" start_page="66" end_page="69" type="metho">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"> Our summarization system DlmSum consmts of the Summarization Server and the Summarlzatzon Chent The Server extracts features (the Feature Extractor) from a document using various robust NLP techmques, described In Sectzon 2 1, and combines these features (the Feature Combiner) to basehne multiple combinations of features, as described m Section 2 2 Our work m progress to automattcally tram the Feature Combiner based upon user and apphcatlon needs m presented in Section 2 2 2 The Java-based Chent, which wdl be dmcnssed In Section 4, provides a graphical user interface (GUI) for the end user to cnstomlze the summamzatlon preferences and see multiple views of generated sum-</Paragraph>
    <Section position="1" start_page="66" end_page="67" type="sub_section">
      <SectionTitle>
2.1 Extracting Summarization Features
</SectionTitle>
      <Paragraph position="0"> In this section, we describe how we apply robust NLP technology to extract summarization features Our goal IS to add more mtelhgence to frequency-based approaches, to acqmre domain knowledge In a more automated fashion, and to apprommate text structure by recogmzing sources of dmcourse cohesion and coherence  Frequency-based summarization systems typically use a single word stnng as a umt for counting frequencies Whde such a method IS very robust, it ignores the semantic content of words and their potential membership m multi-word phrases For example, zt does not dmtmgumh between &amp;quot;bill&amp;quot; m &amp;quot;Bdl Table 1 Collocations with &amp;quot;chlps&amp;quot; {potato tortdla corn chocolate b~gle} chips {computer pentmm Intel macroprocessor memory} chips {wood oak plastlc} cchlps bsrgmmng clups blue clups mr chips Clmton&amp;quot; and &amp;quot;bill&amp;quot; in &amp;quot;reform bill&amp;quot; This may introduce noise m frequency counting as the same strmgs are treated umformly no matter how the context may have dmamblguated the sense or regardless of membership in multl-word phrases For DlrnSum, we use term frequency based on tf*Idf (Salton and McGdl, 1983, Brandow, Mitze, and Rau, 1995) to derive ssgnature words as one of the summarization features If single words were the sole basra of countmg for our summarization application, nome would be introduced both m term frequency and reverse document frequency However, recent advances in statmtlcal NLP and information extraction make it possible to utilize features which go beyond the single word level Our approach is to extract multi-word phrases automatlcally with high accuracy and use them as the basic unit in the summarization process, including frequency calculation Ftrst, just as word association methods have proven effective m lemcal analysis, e g (Church and Hanks, 1990), we are exploring whether frequently occurring Collocatlonal reformation can improve on simple word-based approaches We have pre-processed about 800 MB of LA tlmes/Wastnngton Post newspaper articles nsmg a POS tagger (Bnll, 1993) and derived two-word noun collocations using mutual information The. 
      <Paragraph position="3"> Secondly, as the recent Message Understanding Conference (MUC-6) showed (Adv, 1995), the accuracy and robustness of name extraction has reached a mature level, equaling the level of human performance in accuracy (mid-90%) and exceeding human speed by many thousands of times. We employed SRA's NameTag TM (Krupka, 1995) to tag the aforementioned corpus with names of people, entities, and places, and derived a baseline database for tf*idf calculation. In the database, we not only treated multi-word names (e.g., "Bill Clinton") as single tokens but also disambiguated the semantic types of names so that, for instance, the company "Ford" is treated separately from President "Ford." Our approach is thus different from (Kupiec, Pedersen, and Chen, 1995), where only capitalization information was used to identify and group various types of proper names.</Paragraph>
      <Paragraph position="4"> In knowledge-based summarization approaches, the biggest challenge is to acquire enough domain knowledge to create conceptual representations for a text. Though summarization from conceptual representations has many advantages (as discussed in Section 1), extracting such representations constrains a system to domain dependency and is too knowledge-intensive for our approach. Instead, we took an automatic and robust approach where we acquire some domain knowledge from a large corpus and incorporate that knowledge as summarization features in the system. We incorporated corpus knowledge in three ways: by using a large corpus baseline to calculate idf values for selecting signature words, by deriving collocations statistically from a large corpus, and by creating a word association index derived from a large corpus (Jing and Croft, 1994). With this method, the system can automatically adapt to each distinct domain, like newspapers vs. legal documents, without manually developed domain knowledge. Domain knowledge is embraced in signature words, which indicate key concepts of a given document; in collocation phrases, which provide richer key concepts than single-word key concepts (e.g., "appropriations bill," "omnibus bill," "brady bill," "reconciliation bill," "crime bill," "stopgap bill," etc.); and in their associated words, which are clusters of domain-related terms (e.g., "Bayer" and "aspirin," "Columbia River" and "gorge," "Dead Sea" and "scrolls").</Paragraph>
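Signature-word selection of the kind described here can be sketched as a tf*idf ranking against the large-corpus baseline. In this sketch, multi-word names and collocations are assumed to have been collapsed into single tokens upstream, and the add-one smoothing in the idf term is our assumption.

```python
import math
from collections import Counter

def signature_words(doc_tokens, doc_freq, num_docs, top_n=10):
    """doc_tokens: tokens of one document, where a token may be a
    multi-word name ('Bill Clinton') or a collocation ('crime bill');
    doc_freq: corpus-wide document frequencies from the baseline database."""
    tf = Counter(doc_tokens)
    scores = {
        # tf*idf with add-one smoothing for tokens unseen in the baseline
        tok: count * math.log(num_docs / (1 + doc_freq.get(tok, 0)))
        for tok, count in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```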
      <Paragraph position="5"> 2.1.3 Recognizing Sources of Discourse Cohesion and Coherence
Past research (Paice, 1990) has described the negative impact on abstract quality of failing to perform some type of discourse processing. Since discourse knowledge (e.g., effective anaphora resolution and text segmentation) cannot currently be acquired automatically with high accuracy and robustness, heuristic techniques are often employed in summarization systems to suppress sentences with interdependent cohesive markers. However, there are several shallower but robust methods we can employ now to acquire some discourse knowledge. Namely, we exploit the discourse features of lexical items within a text by using name aliases, synonyms, and morphological variants.</Paragraph>
      <Paragraph position="6"> Within a document, subsequent references to full names are often aliases. Thus, linking name aliases provides some indication as to which sentences are interrelated, as shown below:
The Institutional Revolutionary Party, or PRI, capped its landmark assembly to reform itself with a flourish of pomp and promises. Among the measures coming out of the assembly's fiercest public debate, in which party members rose up against their leadership Saturday night, are new requirements for future PRI presidential candidates, qualifications that neither Zedillo nor any of Mexico's previous four presidents would have met.
The NameTag name extraction tool discussed in the previous section performs linking of name aliases within a document, such as "Albright" to "Madeleine Albright," "U.S." to "United States," and "IBM" to "International Business Machines." We used this tool to link full names and their aliases so that term frequency can be more accurately reflected, i.e., "IBM" and "International Business Machines" are counted as two occurrences of the same term.</Paragraph>
      <Paragraph position="7"> Another overt clue for discourse cohesion and coherence is synonymous words. When a theme of an article is developed throughout the text, synonymous words often appear as variants in the text. In the example below, for instance, "pictures" and "images" are used interchangeably:
A new medical imaging technique may someday be able to detect lung cancer and diseases of the brain earlier than conventional methods, according to doctors at the State University of New York, Stony Brook, and Princeton University. If doctors want to take pictures of the lungs, he noted, they have to use X-ray machines, exposing their patients to doses of radiation in the process. The new technique uses an anesthetic, xenon gas, instead of water to create images of the body.
Although synonym sets have not proven effective in information retrieval for query expansion (Voorhees, 1994), we are using WordNet (Miller et al., 1990) to link synonymous words in an article. In the IR task, a query term is expanded with its synonyms without disambiguating the senses of the term. Thus, semantically irrelevant query terms are added, and the system typically retrieves more irrelevant documents, decreasing precision. Our summarization approach, in contrast, attempts to exploit WordNet synonym sets of only the signature terms in a single document. Our hypothesis is that if a synonym of a signature term exists in the article, the term has been disambiguated by the context of the article, and the "correct" synonym, not a synonym of the term in a different sense, is likely to co-occur in the same document. In addition, morphological analysis allows us to link morphological variants of the same word within a document. Morphological variants are often used to refer to the same concept throughout a document, providing discourse clues.</Paragraph>
      <Paragraph position="2"> providing discourse clues In the above example, &amp;quot;lma~ng&amp;quot; and &amp;quot;Images&amp;quot; are morphologically linked Like synonyms, morphology or stemming has not proven to be useful for &amp;quot;lmprowng information retrieval (Salton and Lesk, 1968, Harman,~1991) However, the recent work by (Church, 1995) showed that effectiveness of morphology, or correlations among morphological variants within a document, vanes from word to word A word hke &amp;quot;hostage&amp;quot; has a large correlation with its variant Uhostages&amp;quot; while a word like &amp;quot;await&amp;quot; does not According to his experiments, good keywords like &amp;quot;hostage&amp;quot; and its variants are likely to be repeated more than chance within a document and highly correlate with variant forms Tins implies that important signature words we use for summarization are likely to appear In a single document multiple times using their variant forms</Paragraph>
    </Section>
    <Section position="3" start_page="68" end_page="69" type="sub_section">
      <SectionTitle>
2.2 Combining Summarization Features
</SectionTitle>
      <Paragraph position="0"> The DlmSum summarizer exploits our flexible definition of a signature word and sources of domain  ated multiple baseline databases based upon multiple deflmtions of the signature words Signature words are flexibly defined as collections of features Presently, we derive databases cousustmg of  (a) terms alone, (b) terms plus multi-word names, (c) stemmed terms plus muti-words names, and (d) terms plus multi-word names and collocations  The duscourse features, 1 e, synonyms, morphological variants or name ahases, for s~gnature words, on the other hand, can affect the term frequency (tf) values Using these discourse features boosts the term frequency score within a text .when they are . treated as var!ants of signature words Having multiple baseline databases available makes it easy to test the contribution of each feature or combination of features  In order to select sentences for a summary, each sentence in the document us scored using different combinations of signature word features and discourse features Currently, every token m a document us assigned a score based on its tf*ldf value The token score us used, in turn, to calculate the score of each sentence in the document More specifically, the score of a sentence is calculated as the average of the scores of the tokens contained m that sentence with the exception that certain types of  tokens can be ehmmated from the sentence as discussed That m, the DlmSum system can Ignore any combination of name types (1 e, person, place, entity) from a ~ven document for sconng (cf Section 3 for more details) After every sentence is assigned a score, the top n tnghest scoring sentences are chosen as a summary of the content of the document Currently, the Dun-Sum system chooses the number of sentences equal to a power k (between zero and one) of the total number of sentences Thus, the system can vary the length of a summary accordmg to ~ For instance, if 0 5 is chosen as the power, and the document consists of 100 sentences, the output summary would contain 10 sentences Thus scheme has an advantage over choosing a given percentage of document size as it yields more information for longer documents while keeping summary size manageable We use the results of thus method as the baseline summary performance (; e, without any training), and report them m Section 3  As our goal is to make our summarization system trainable to different user and application needs, we are currently workmg on learning the best feature combination method from a tralmng corpus automatically For training and evaluating our summa~ nzatlon system, we had a user create extract summaries by selecting relevant sentences m articles In order to compare with the results of a trainable summanzer reported by (Kuplec, Pedersen, and Chen, 1995), we first use Bayes' rules to learn the best scoring method Then, we will use an inductive learning algorithm such as the decusion tree algorithm (Qumlan, 1993) to learn summarization rules which can deal with feature dependencies across sentences</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="69" end_page="71" type="metho">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"> Much research has been devoted to assessing correspondence between human and machine abstracts because of the complexity of analyzing &amp;quot;ahoutness&amp;quot; as illustrated in (Hahn, 1990) As a result, most of the prehmmary evaluations of summarizatlon systerns have been developer-based A common aFproach IS to compare correspondence between automatlc performance and human performance (Rath, Restock, and Savage, 1961, Edmundson, 1969, Kuplec, Pedersen, and Chen, 1995) or summary accep~ ability (Brandow, Mltze, and Ran, 1995) Others have been task-based, comparing abstract and full text on~nals m terms of the browsing and search time (Mnke et al, 1994, Sumlta, Ono, and Mllke, 1993) or recall and precision m-document retrieval (Brandow, Mltze, and Ran, 1995) Our evaluation methodology us two-pronged First, we evaluate the system by scoring for correspondence with human generated extracts (Seco tlon 3 1) Second, m our future work we are collaborating with the Umverslty of Massachusetts to evaluate retrieval effectiveness for system-generated and human-generated summaries (Section 3 2)</Paragraph>
    <Section position="1" start_page="69" end_page="70" type="sub_section">
      <SectionTitle>
3.1 Developer-based Evaluation
</SectionTitle>
      <Paragraph position="0"> The DlmSum development envtronment software incorporates automatic sconng software to calculate system recall and precision for any user's training or test data ThLs allows us to evaluate system performance for any user and for variatl0us m summary preferences We performed an informal experiment in which 6 users created summary extract versions of the same set of 15 texts These versions varied considerably among users, winch supports our view that a summarlzation system should he trained for user preference Then, we ran the DlmSum system over these 15 texts using multiple feature combinations (l e, combmatlous among names, synonyms, and morphologtcal variants), and scored against the six versions of summary extracts Though correspondence between the DlmSum summaries and user suminarles was low (ranging between 14% and 31% F-measures), clearly some feature sets were more effective for some users than for others For example, the best feature c0mbmatlon for the best-case correspondence between the user and DlmSum (l e, 31% case) was the combination of name, synonym and morphological mforinatlon On the other hand, the best combination for the worst-case correspondence between the user and DlmSum (l e, 14% case) was the combination of name and synonym reformation Some summary extracts, however, were not affected by different combinations of features The second step was to obtain a &amp;quot;bottom-hne&amp;quot; score for a singl e user We ran the DlmSum system over a set of 86 texts using multiple feature combinatlous The features were combined by taking an average Of tf*ldf, tf or ldf scores of each token m a-sentence No training was performed We vaned the length of summaries (by changing/~ from 0 5 to 1 0), use of different types Of names (l e, person, place, and entity), use of abases, and use of synonyms for different parts of speech (l e, adjective, adverb, noun, and verb) Table 2 shows the top three F-measure scores (13), and the score for using the simplest baseline (4) For the best summary (1), place names were used winle person and entity ~ames were recognized but removed for sentence sconng Synonyms were also used The /c value was set to 0 65 (about 20-25% of a document as a summary) Use of aliases and synonyms chdn't make much difference m the scores (2-3) However, they all scored shghtly ingher than * the summary which &amp;dn't use any of these features, i e, a summary which didn't use names, synonyms, or aliases (4) It Is interesting that using name tagging in a re- null verse way, 1 e , recogmzmg and then deleting person names from * sentence scoring, made a slgmficantly positive effect on summarization The best summary score with the person name used m sentence scoring was 38 6% (5) The.reason why person names made negative contnbutlous to*the summary seems ' to be because personal names were often mentioned as passing references (e g, names of spokespeople) m the corpus, but they had ingh ldf values Finally, m every feature combination, taking tf*ldf scores of each word outperformed the ldf-based calculation, and the latter m turn outperformed the if-based score calculation These results further motivate us to apply automated learning to combine summarization features The fact that humans vary m summarization suggests that recall/preclmon evaluation.is not meanmgful unless a summarization system Is trainable to a particular summary style Our current work is to identify through training what feature combinations produce an optimal summary for a 
      <Paragraph position="1"> We anticipate that the summary performance will improve with training as DimSum learns automatically how or whether these different signature word definitions contribute to the summary. The current design does not incorporate paragraph/sentence location information or genre-specific indicator phrases. We are exploring whether these features can be indirectly subsumed by the derived features we have already identified. Also, a cursory look at the DimSum summaries shows that a system-generated summary may provide the same information as the summary provided by the user even though the sentences were chosen differently. This happens because the same information can be conveyed by different sentences within the same document. This motivates us to conduct a more task-oriented summarization evaluation, which is discussed below.</Paragraph>
    </Section>
    <Section position="2" start_page="70" end_page="71" type="sub_section">
      <SectionTitle>
3.2 Task-based Evaluation
</SectionTitle>
      <Paragraph position="0"> As a more task-oriented evaluation, we. are collaborating with the Umverslty of Massachusetts to evaluate retrieval etfectiveness for DlmSum system-generated .and human-generated extracts for topics from TREC-5 (Text REtrieval Conference-5) We have selected 30 topics, five assessed as difficult, five assessed as easy (Harman, 1996), and the remaining 15 randomly The top 50 documents judged relevant by the INQUERY system m TRECC-5 for each topic have been identified For each document, two extract versions are being manually created One extract m based on the topic description, wtule the second L9 generated independent of the topic description In addition, the DlmSum system will automatlcally.generate two versions (query dependent and generic) for each of the texts With the TREC-5 full text results as a baseline, multiple lteratlous of the  the human and machine generated extracts to compare retrieval effectiveness</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="71" end_page="71" type="metho">
    <SectionTitle>
4 Multi-dimensional Summary Views
</SectionTitle>
    <Paragraph position="0"> Views .</Paragraph>
    <Paragraph position="1"> The DlmSum Summarization Clientprovldes a summary of a document in multiple dlmeuslons through a graphical user interface (GUI) to smt dflferent users' needs In contrast to a static view of a document, the system brings the contributing hngnmtie and other resources to the desktop and the user chooses the view he wants As shown m Flgnre 1, the GUI is divided mto the Lint Box on the left and the Text Viewer on the right When a user asks for a summary of a text, extracted summary sentences are hlghhghted m the Text Viewer The user can dynarmeally control a percentage of sentences to highlight for a summary In addition, the Client can automatically color-code top keywords m different colors for different types (1 e, person, entity, place and other) for quack and easy browsing In the Lint Box, the user can explore two different summary views of a text First, the user can choose the &amp;quot;Name Mode,&amp;quot; and all the names of people, entities, and places which were recognized by the name extraction tool are sorted and displayed m the List Box (el Figure 1) The user can also select a subset of name types (e g, only person and entity, but not place) to d~play Aliases of a name are indented and hsted under their full names In the &amp;quot;Keyword Mode,&amp;quot; the top keywords, or signature words, (including names) axe dmplayed in the Last Box Analogous to the name aliases, for each keyword its synonyms and morphological variants, if exast, are indented and hsted below it (cf Figure 2) The user can choose the score threshold or percentage to vary the number of keywords for display In both modes, the names and signature words in the List Box can be sorted alphabetically, by frequency, or by the tf*ldf score Choking on a term in the Lint Box also causes the first occurrence of the term to be hlghhghted in the Text Viewer From there, the user can use the FIRST, PREVIOUS, NEXT, or LAST button at the bottom of the GUI to track the other occurrences of the term, including its ahases, synonyms, and morphological variants This provides the user with a way to track themes of the text lnteractively</Paragraph>
  </Section>
class="xml-element"></Paper>