<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2164"> <Title>A Method for Abstracting Newspaper Articles by Using Surface Clues</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The rapid expansion of the Internet enables us to easily access a great many information sources around the world.</Paragraph> <Paragraph position="1"> The ability to browse information quickly is therefore a very important feature of an information retrieval and navigation system. Abstracting a document is one useful tool for quickly browsing textual information.</Paragraph> <Paragraph position="2"> Generally, an abstract can be considered a concise text giving an outline of the original text. Creating an abstract requires deep semantic processing with broad knowledge, and the strategy for generating an abstract depends on the type of target text. Abstracts created by humans tend to differ according to their creators' background knowledge and interests. Furthermore, as stated in [6], the same person is likely to create different abstracts of the same text at different times. Simulating this human process is clearly beyond the scope of current computational linguistics. There are, however, some cases in which an abstract can be created by using surface clues to infer which portions are the most important, without deep semantic processing. The most practical way to create an abstract is thus to determine the most important portions by using surface clues. There are two lines of research based on this approach: one analyzes some aspect of a text's structure, such as its rhetorical structure [7], and selects sentences according to this structure [5, 3]; the other analyzes surface features of each sentence in a given text and selects the most important sentences according to some heuristics [6, 1, 9]. 
In methods of the former type, the rhetorical structure is appropriate for a relatively small set of sentences, such as a paragraph, but it does not provide enough information to create an abstract of a large set of sentences. In methods of the latter type, the validity of the heuristics is uncertain when the type of target text changes. Therefore, this paper proposes a method for selecting important sentences by using an equation based on surface features and their weights, together with a method for determining these weights by multiple-regression analysis of abstracts created by humans. The target texts of this method are Japanese newspaper articles.</Paragraph> </Section> </Paper>
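The proposed method scores each sentence with an equation over surface features and their weights, and fits those weights by multiple-regression analysis of human-created abstracts. The following is a minimal sketch of that idea under stated assumptions, not the paper's actual formulation: it assumes the equation is a linear weighted sum with an intercept, that the regression is ordinary least squares, and that each training sentence has a numeric importance target derived from human abstracts (for instance, how often humans included it). The three features (relative position, title-word count, scaled length), the data, and the names `fit_weights`, `score`, and `select_sentences` are all hypothetical illustrations.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so each row operation updates both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest remaining entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Zero out this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    # The left block is now diagonal, so each unknown is a single division.
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Multiple regression by ordinary least squares; returns [intercept, w1, ..., wk]."""
    # Prepend a constant 1.0 column so the first weight acts as the intercept.
    x = [[1.0, *row] for row in features]
    k = len(x[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return _solve(xtx, xty)


def score(feature_row: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical importance score: intercept plus weighted sum of surface features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_row))


def select_sentences(features: Sequence[Sequence[float]], weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring sentences, returned in article order."""
    ranked = sorted(range(len(features)), key=lambda i: score(features[i], weights), reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical sentences: (relative position, title-word count, length / 100).
    train = [[0.0, 2, 0.4], [0.25, 1, 0.6], [0.5, 0, 0.3],
             [0.75, 1, 0.5], [1.0, 0, 0.8], [0.5, 2, 0.7]]
    # Hypothetical human-derived importance targets (exactly linear here, for illustration).
    targets = [1.3, 0.95, 0.6, 0.65, 0.3, 1.0]
    weights = fit_weights(train, targets)
    print(select_sentences(train, weights, 2))  # prints [0, 5]
```

Solving the normal equations directly keeps the sketch dependency-free; with many or highly correlated features, a library least-squares routine such as `numpy.linalg.lstsq` would be more numerically robust.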