<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1213"> <Title>Automatically generating hypertext in newspaper articles by computing semantic relatedness</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A survey, reported in Outing (1996), found that there were 1,115 commercial newspaper online services worldwide, 94% of which were on the World-Wide Web (WWW). Of these online newspapers, 73% were in North America. Outing predicted that the number of newspapers online would increase to more than 2,000 by the end of 1997.</Paragraph> <Paragraph position="1"> The problem is that these services are not making full use of the hypertext capabilities of the WWW. The user may be able to navigate to a particular article in the current edition of an online paper by using hypertext links, but they must then read the entire article to find the information that interests them. These databases are &quot;shallow&quot; hypertexts; the documents that are being retrieved are dead ends in the hypertext, rather than offering starting points for explorations. In order to truly reflect the hypertext nature of the Web, links should be placed within and between the documents.</Paragraph> <Paragraph position="2"> As Westland (1991) has pointed out, manually creating and maintaining the sets of links needed for a large-scale hypertext is prohibitively expensive. This is especially true for newspapers, given the volume of articles produced every day. (Footnote: Work done at the Department of Computer Science of the University of Toronto.) This could certainly account for the state of current WWW newspaper efforts. Aside from the time-and-money aspects of building such large hypertexts manually, humans are inconsistent in assigning hypertext links between the paragraphs of documents (Ellis et al., 1994; Green, 1997). 
That is, different linkers disagree with each other as to where to insert hypertext links into a document.</Paragraph> <Paragraph position="3"> The cost and inconsistency of manually constructed hypertexts do not necessarily mean that large-scale hypertexts can never be built. It is well known in the IR community that humans are inconsistent in assigning index terms to documents, but this has not hindered the construction of automatic indexing systems intended to be used for very large collections of documents. Similarly, we can turn to automatically constructed hypertexts to address the issues of cost and inconsistency.</Paragraph> <Paragraph position="4"> In this paper, we will describe a novel method for building hypertext links within and between newspaper articles. We have selected newspaper articles for two main reasons. First, as we stated above, there is a growing number of services devoted to providing this information in a hypertext environment. Second, many newspaper articles have a standard structure that we can exploit in building hypertext links.</Paragraph> <Paragraph position="5"> Most of the proposed methods for automatic hypertext construction rely on term repetition. The underlying philosophy of these systems is that texts that are related will tend to use the same terms. Our system is based on lexical chaining and the philosophy that texts that are related will tend to use related terms.</Paragraph> </Section> </Paper>