File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/w97-0707_intro.xml
Size: 4,410 bytes
Last Modified: 2025-10-06 14:06:21
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0707"> <Title>I I I I I</Title> <Section position="3" start_page="41" end_page="42" type="intro"> <SectionTitle> 3 Experiment </SectionTitle> <Paragraph position="0"> Several automaUc exwacuon schemes, including the above, have been proposed earher (Salton et al, 19961>, Salton et al, 1996a) General features of the extracts produced by these chfferent algonthrns have been noted, based on manually exarmmng some of the extracts However, objective evaluauon of these algorithms has always been problematsc In (Salton and Smghal, 1995), an attempt was made to evaluate the summaries based on</Paragraph> <Paragraph position="2"> telccommumca~om mdusUy as a whole because ff two dewces use different standards they arc unable to conmmmcato properly Standards me developed m Table 4 Text for augmented segmented bushy path for article Telecommumcatwns ranked retrieval Since relevance judgments were not avadable for passages or extracts, the avadable relevance judgments for full documents were extrapolated to the extracts However, the pomon of a document that is relevant to a query may well get left out of a passage, and so, results obtmned from such an evaluauon are unrehable Since the goal of our summarization schemes ~s to automate a process that has trachUonally been done manually, a comparison of automaucally generated extracts with those produced by humans would prowde a reasonable evaluauon of these methods We assume that a human would be able to identify the most Important * paragraphs In an amcle most effectwely If the set of paragraphs selected by an automatic extraction method has a Ingh overlap with the human-generated extract, the automauc method should be regarded as effective Thus, our evaluation method takes the following form a user submits a document to the system for summanzatson, in one case, the system presents a summary generated by another person, m the other, It produces an automatically generated extract The user compares the two summaries manual and automauc -- to Ins/her own notion of an ideal extract To evaluate the automatic methods, we compare the user's 'sausfacuon' m the two cases Such an evaluation methodology has its shurtcormngs, for example It does not account for the readabihty aspect of a summary, It also ignores the fact that user satisfaction Is related to whether a user has seen the full-arucle or not Unfortunately, given the lack of a good testhed for evaluaung automatic summarization, xt ts the best we can do Fifty articles were selected from the Funk and WagnallsEncyclopedla(PunkandWagnalls, 1979) Foreach arucle, two extracts were constructed manually One of these extracts was used as the manual summary The otherone, winch then becomes a user's 0deal) smnmary, is used as the oracle to compare the performance of the manual summary and an automatic summary .The following instructions were given to those who constructed the manual extracts Please read through the articles Determine -, wh!ch n paragraphs are the most tmportant for summarizing tins amcle n = MAX(5, l/5th the total number of paragraphs (round to the next Ingher number for fracuons)) Mark the paragraphs winch you chose The resulting database of 100 manual summanes (two for each of the fifty arUcles) was used m the final evaluation of the automaUc methods Summaries were then automatically generated for the amcles, using each of the four methods descnbed above In each caseJhe automauc and manual extracts had the same number of paragraphs 3 In manual 
In manual summarization by paragraph extraction, there are certain paragraphs in a text that clearly belong in a summary extract, but there are also many paragraphs whose importance is subjectively judged by the individual doing the extraction. To reduce the effect of the arbitrariness introduced by individuals' subjective notions, for very short articles, we asked our subjects to extract at least five paragraphs, hoping that the intersection of the two manual summaries will indeed yield the most important paragraphs in an article. The articles used in our evaluation had anywhere between thirteen and forty-eight content paragraphs. The current implementation of the Smart system also considers the section headings, etc. as individual paragraphs. Such paragraphs were marked as non-content and were ignored in the summarization process.</Paragraph>
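That intersection rationale can also be illustrated with a small sketch; the annotator selections below are invented for illustration, not data from the fifty-article study.

```python
# Paragraphs selected by two independent human summarizers
# (hypothetical indices). Their intersection approximates the
# paragraphs that are genuinely important, as opposed to those
# reflecting one individual's subjective judgment.
first_annotator = {2, 3, 8, 11, 15}
second_annotator = {2, 5, 8, 11, 19}

core = first_annotator & second_annotator
print(sorted(core))                 # [2, 8, 11]
```
</Section> </Paper>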