<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2166"> <Title>Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences</Title> <Section position="5" start_page="987" end_page="987" type="evalu"> <SectionTitle> 4 Results and Discussion </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="987" end_page="987" type="sub_section"> <SectionTitle> 4.1 Automatically created abstracts </SectionTitle> <Paragraph position="0"> Table 1 shows the precision/recall values for the tf*idf method described in section ?? and for a default method that selects just the first N sentences from the beginning of each article (&quot;lead&quot; method). Whereas the lead method most likely provides higher readability (see Brandow et al.</Paragraph> <Paragraph position="1"> (?)), the data clearly indicates that the tf*idf method is superior to this default approach in terms of relevance. The computation of these precision/recall values is based on the sentences which were chosen by the human subjects in the experiment, i.e., an average was taken over the precision/recall between the machine system and each individual subject.</Paragraph> </Section> <Section position="2" start_page="987" end_page="987" type="sub_section"> <SectionTitle> 4.2 Abstracts produced by human subjects </SectionTitle> <Paragraph position="0"> The global analysis shows a surprisingly good correlation across the human subjects for the sentence scores of all six articles (see table ??): in the Pearson-r correlation matrix, 71 coefficients are significant at the 0.01 level (***), 5 at the 0.05 level (*), and only 2 are not significant (n.s.). 
This result indicates that there is good inter-subject agreement about the relative &quot;relevance&quot; of sentences in these texts.</Paragraph> </Section> <Section position="3" start_page="987" end_page="987" type="sub_section"> <SectionTitle> 4.3 Comparison of machine-made and human-made abstracts </SectionTitle> <Paragraph position="0"> We computed precision/recall for every human subject, compared to all the other 12 subjects (taking the average precision/recall). From these individual precision/recall values, the average was computed to yield a global measure for inter-human precision/recall. Depending on the article, these values range from 0.43/0.43 to 0.58/0.58, the mean being 0.49/0.49. As we can see, these results are in the same range as the results for the machine system discussed previously (0.46/0.55, for abstracts with 6 sentences). This means that if we compare the output of the automatic system to the output of an average human subject in the experiment, there is no noticeable difference in terms of precision/recall: the machine performs as well as human subjects do, given the task of selecting the most relevant sentences from a text.</Paragraph> </Section> <Section position="4" start_page="987" end_page="987" type="sub_section"> <SectionTitle> 5.1 Dealing with multi-topical texts </SectionTitle> <Paragraph position="0"> It can be argued that so far we have only dealt with short texts about a single topic. 
It is not clear how well the system would be able to handle texts where multiple threads of content occur; possibly, one could employ the method of text tiling here (see e.g., (?)), which helps determine coherent sections within a text and thus could &quot;guide&quot; the abstracting system in that it would be able to track a sequence of multiple topics in a text.</Paragraph> </Section> <Section position="5" start_page="987" end_page="987" type="sub_section"> <SectionTitle> 5.2 On-line abstracting </SectionTitle> <Paragraph position="0"> While our system currently produces abstracts offline, it is feasible to extend it in a way where it uses the user's query in an IR environment to determine the relevant sentences of the retrieved documents. Here, instead of producing a &quot;general abstract&quot;, the resulting on-line abstract would reflect more of the &quot;user's perspective&quot; on the respective text. However, it would have to be investigated how much weight increase the words from the user's query should get in order not to bias the resulting output too strongly.</Paragraph> <Paragraph position="1"> Further issues concerning the human-machine interface are: * highlighting passages containing the query words * listing of top-ranked keywords in the retrieved text(s) * indicating the relative position of the extracted sentences in the text * allowing for scrolling in the main text, starting at an arbitrary position within the abstract</Paragraph> </Section> </Section> </Paper>
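The inter-human precision/recall computation described in section 4.3 (score each subject's sentence selection against every other subject, average per subject, then average globally) can be sketched as follows. This is a minimal illustration, not the authors' code: the sentence selections are made-up toy data, and the function names are hypothetical.

```python
# Sketch of the inter-human precision/recall averaging from section 4.3.
# Each subject's abstract is modeled as a set of selected sentence indices.

def precision_recall(selected, reference):
    """Precision/recall of one selection measured against a reference selection."""
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall

def inter_subject_scores(selections):
    """Average each subject's precision/recall against all other subjects,
    then average the per-subject figures into one global precision/recall."""
    per_subject = []
    for i, own in enumerate(selections):
        others = [precision_recall(own, ref)
                  for j, ref in enumerate(selections) if j != i]
        p = sum(pr[0] for pr in others) / len(others)
        r = sum(pr[1] for pr in others) / len(others)
        per_subject.append((p, r))
    global_p = sum(p for p, _ in per_subject) / len(per_subject)
    global_r = sum(r for _, r in per_subject) / len(per_subject)
    return global_p, global_r

# Toy example: three subjects each pick 3 of 10 sentences.
subjects = [{0, 2, 5}, {0, 2, 7}, {2, 5, 7}]
p, r = inter_subject_scores(subjects)
```

Because every subject selects the same number of sentences in this toy setup, precision equals recall, which mirrors the paired figures (e.g., 0.49/0.49) reported in the paper.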