<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1112"> <Title>by NSF award #IRI-9618797 STIMULATE: Generating Coherent Summaries of On-Line Documents: Combining</Title> <Section position="6" start_page="684" end_page="685" type="concl"> <SectionTitle> 4 Results and Future Work. </SectionTitle> <Paragraph position="0"> Basis for WordNet and EVCA comparison. This paper reports results from two approaches, one using WordNet and the other based on EVCA classes. However, the basis for comparison must be made explicit. In the case of WordNet, all verb tokens (n = 10K) were considered in all senses, whereas in the case of EVCA, a subset of less ambiguous verbs was manually selected. As reported above, we covered 56% of the verbs by token. Indeed, when we attempted to add more verbs to EVCA categories, at the 59% mark it became difficult to add new verbs because of ambiguity (e.g., verbs such as get). Thus, although our results using EVCA are revealing in important ways, it must be emphasized that the comparison has an imbalance that puts WordNet in an unnaturally negative light. In order to compare the two approaches accurately, we would need to process either the same less ambiguous verb subset with WordNet, or the full set of all verbs in all senses with EVCA. Although the results reported in this paper permitted the validation of our hypothesis, unless a fair comparison between resources is performed, conclusions about WordNet as a resource versus EVCA class distinctions should not be inferred.</Paragraph> <Paragraph position="1"> Verb Patterns. In addition to considering verb type frequencies in texts, we have observed that verb distribution and patterns might also reveal subtle information in text. Verb class distribution within the document and within particular sub-sections also carries meaning.
For example, we have observed that when sentences with movement verbs such as rise or fall are followed by sentences with cause and then a telic aspectual verb such as reach, this indicates that a value rose to a certain point due to the actions of some entity. Identification of such sequences will enable us to assign functions to particular sections of contiguous text in an article, in much the same way that a text segmentation program seeks to identify topics from distributional vocabulary (Hearst, 1994; Kan et al., 1998). We can also use specific sequences of verbs to help in determining methods for performing semantic aggregation of individual clauses in text generation for summarization.</Paragraph> <Paragraph position="2"> Future Work. Our plans are to extend the current research in terms of verb coverage and in terms of article coverage. For verbs, we plan to (1) extend the set of verbs that we cover to include phrasal verbs; (2) increase coverage of verbs by categorizing additional high frequency verbs into EVCA classes; and (3) examine the effects of increased coverage on determining article type.</Paragraph> <Paragraph position="3"> For articles, we plan to explore a general parser so we can test our hypothesis on additional texts and examine how our conclusions scale up. Finally, we would like to combine our techniques with other indicators to form a more robust system, such as that envisioned in Biber (1989) or suggested in Kessler et al. (1997).</Paragraph> <Paragraph position="4"> Conclusion. We have outlined a novel approach to document analysis for news articles which permits discrimination of the event profile of news articles. The goal of this research is to determine the role of verbs in document analysis, keeping in mind that event profile is one of many factors in determining text type. Our results show that Levin's EVCA verb classes provide reliable indicators of article type within the news domain.
We have applied the algorithm to WSJ data and have discriminated articles with five EVCA semantic classes into categories such as features, opinions, and announcements. This approach to document type classification using verbs has not been explored previously in the literature. Our results on verb analysis, coupled with what is already known about NP identification, convince us that future combinations of information will be even more successful in categorization of documents. Results such as these are useful in applications such as passage retrieval, summarization, and information extraction.</Paragraph> </Section></Paper>