I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
Automatically generating hypertext in newspaper articles by computing 
semantic relatedness 
Stephen J. Green 
Microsoft Research Institute 
School of Mathematics, Physics, Computing and Electronics* 
Macquarie University 
Sydney, NSW 2109 
Australia 
sj green@mri.mq.edu.au 
Abstract 
We discuss an automatic method for the construction of 
hypertext links within and between newspaper articles. 
The method comprises three steps: determining the lexical 
chains in a text, building links between the paragraphs of 
articles, and building links between articles. Lexical chains 
capture the semantic relations between words that occur 
throughout a text. Each chain is a set of related words that 
captures a portion of the cohesive structure of a text. By 
considering the distribution of chains within an article, we 
can build links between the paragraphs. By computing the 
similarity of the chains contained in two different articles, 
we can decide whether or not to place a link between them. 
We also describe the results of an evaluation performed to 
test the methodology. 
1 Introduction 
A survey, reported in Outing (1996), found that there 
were 1,115 commercial newspaper online services world- 
wide, 94% of which were on the World-Wide Web 
(WWW). Of these online newspapers, 73% are in North 
America. Outing predicted that the number of newspa- 
pers online would increase to more than 2,000 by the end 
of 1997. 
The problem is that these services are not making full 
use of the hypertext capabilities of the WWW. The user 
may be able to navigate to a particular article in the cur- 
rent edition of an online paper by using hypertext links, 
but they must then read the entire article to find the in- 
formation that interests them. These databases are "shal- 
low" hypertexts; the documents that are being retrieved 
are dead ends in the hypertext, rather than offering start- 
ing points for explorations. In order to truly reflect the 
hypertext nature of the Web, links should to be placed 
within and between the documents. 
As Westland (1991) has pointed out, manually creat- 
ing and maintaining the sets of links needed for a large- 
scale hypertext is prohibitively expensive. This is espe- 
cially true for newspapers, given the volume of articles 
Work done at the Department of Computer Science of the Univer- 
sity of Toronto 
produced every day. This could certainly account for the 
state of current WWW newspaper efforts. Aside from the 
time-and-money aspects of building such large hypertexts 
manually, humans are inconsistent in assigning hypertext 
links between the paragraphs of documents (Ellis et al., 
1994; Green, 1997). That is, different linkers disagree 
with each other as to where to insert hypertext links into 
a document. 
The cost and inconsistency of manually constructed 
hypertexts does not necessarily mean that large-scale hy- 
pertexts can never be built. It is well known in the IR 
community that humans are inconsistent in assigning in- 
dex terms to documents, but this has not hindered the 
construction of automatic indexing systems intended to 
be used for very large collections of documents. Simi- 
larly, we can turn to automatically constructed hypertexts 
to address the issues of cost and inconsistency. 
In this paper, we will describe a novel method for 
building hypertext links within and between newspaper 
articles. We have selected newspaper articles for two 
main reasons. First, as we stated above, there is a grow- 
ing number of services devoted to providing this informa- 
tion in a hypertext environment. Second, many newspa- 
per articles have a standard structure that we can exploit 
in building hypertext links. 
Most of the proposed methods for automatic hypertext 
construction rely on term repetition. The underlying phi- 
losophy of these systems is that texts that are related will 
tend to use the same terms. Our system is based on lexi- 
cal chaining and the philosophy that texts that are related 
will tend to use related terms. 
2 Lexical chains 
A lexical chain (Morris and Hirst, 1991) is a sequence of 
semantically related words in a text. For example, ifa text 
contained the words apple and fruit, they would appear in 
a chain together, since apple is a kind of fruit. Each word 
in a text may appear in only one chain, but a document 
will contain many chains, each of which captures a por- 
tion of the cohesive structure of the document. Cohesion 
Green 101 Automatically generating hypertext 
Stephen J. Green (1998) Automatically generating hypertext in newspaper articles by computing semantic relatedness. In 
D.M.W. Powers (ed.) NeMLaP3/CoNLL98: New Methods in Language Processing and Computational Natural Language 
is what, as Halliday and Hasan (1976) put it, helps a text 
"hang together as a whole". The lexical chains contained 
in a text will tend to delineate the parts of the text that are 
"about" the same thing. Morris and Hirst showed that the 
organization of the lexical chains in a document mirrors, 
in some sense, the discourse structure of that document. 
The lexical chains in a text can be identified using any 
lexical resource that relates words by their meaning. Our 
current lexical chainer (based on the one described by St- 
Onge, 1995) uses the WordNet database (Beckwith et al., 
199 I). The WordNet database is composed of synonym 
sets or synsets. Each synset contains one or more words 
that have the same meaning. A word may appear in many 
synsets, depending on the number of senses that it has. 
Synsets can be connected to each other by several dif- 
ferent types of links that indicate different relations. For 
example, two synsets can be connected by a hypernym 
link, which indicates that the words in the source synset 
are instances of the words in the target synset. 
For the purposes of chaining, each type of link between 
WordNet synsets is assigned a direction of up, down, or 
horizontal. Upward links correspond to generalization: 
for example, an upward link from apple to fruit indicates 
that fruit is more general than apple. Downward links 
correspond to specialization: for example, a link from 
fruit to apple would have a downward direction. Hori- 
zontal links are very specific specializations. For exam- 
ple, the antonymy relation in WordNet is given a direc- 
tion of horizontal, since it specializes the sense of a word 
very accurately, that is, if a word and its antonym appear 
in a text, the two words are very likely being used in the 
senses that are antonyms. 
Given these types of links, three kinds of relations are 
built between words: 
Extra strong An exwa strong relation is said to exist be- 
tween repetitions of the same word: i.e., term repe- 
tition. 
Strong A strong relation is said to exist between words 
that are in the same WordNet synset (i.e., words that 
are synonymous). Strong relations are also said to 
exist between words that have synsets connected by 
a single horizontal link or words that have synsets 
connected by a single IS-A or INCLUDES relation. 
Regular A regular relation is said" to exist between two 
words when there is at least one allowable path 
between a synset containing the first word and a 
synset containing the second word in the WordNet 
database. A path is allowable if it is short (less than 
n links, where n is typically 3 or 4) and adheres to 
three rules: 
1. No other direction may precede an upward 
link. 
2. No more than one change of direction is al- 
lowed. 
3. A horizontal link may be used to move from 
an upward to a downward direction. 
When a word is processed during chaining, it is ini- 
tially associated with all of the synsets of which it is a 
member. When the word is added to a chain, the chainer 
attempts to find connections between the synsets associ- 
ated with the new word and the synsets associated with 
words that are already in the chain. Synsets that can 
be connected are retained and all others are discarded. 
The result of this processing is that, as the chains are 
built, the words in the chains are progressively sense- 
disambiguated. When an article has been chained, a de- 
scription of the chains contained in the document is writ- 
ten to a file. Table 1 shows some of the chains that were 
recovered from an article about the trend towards "virtual 
parenting" (Shellenbarger, 1995). In this table, the num- 
bers in parentheses show the number of occurrences of a 
particular word. 
The process of lexical chaining is not perfect, but if 
we wish to process articles quickly, then we must ac- 
cept some errors or at least bad decisions. In our sam- 
ple article, for example, chain 1 is a conglomeration of 
words that would have better been separated into differ- 
ent chains. This is a side effect of the current implemen- 
tation of the lexical chainer, but even with these difficul- 
ties, we are able to perform useful tasks. We expect to 
address some of these problems in subsequent versions 
of the chainer, hopefully with no loss in efficiency. 
3 Building links within an article 
3.1 Analyzing the iexicai chains 
Newspaper articles are written so that one may stop read- 
ing at the end of any paragraph and feel as though one 
has read a complete unit. For this reason, it is natural to 
choose to use paragraphs as the nodes in our hypertext. 
Table 1 showed the lexical chains recovered from a news 
article about the trend towards "virtual parenting". Figure 
1 shows the second and eighth paragraphs of this article 
with the words that participate in lexical chains tagged 
with their chain numbers. We will use this particular arti- 
cle to illustrate the process of building intra-article links. 
The first step in the process is to determine how im- 
portant each chain is to each paragraph in an article. We 
judge the importance of a chain by calculating the frac- 
tion of the content words of the paragraph that are in that 
chain. We refer to this fraction as the density of that chain 
in that paragraph. The density of chain c in paragraph p, 
dc,p, is defined as: 
dc,p ~ Wc,p wp 
Green 102 Automatically generating hypertext 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
II 
II 
II 
II 
/ 
/ 
/ 
l 
/ 
/ 
/ 
/ 
/ 
/ 
/ 
/ 
/ 
/ 
/ 
II 
l 
Table I: 
Word Syn 
working (5) 40755 
ground (I) 58279 
field (1) 57992 
antarctica (I) 58519 
michigan (I) 57513 
feed (I) 53429 
chain (I) 57822 
hazard (1) 77281 
risk ( 1 ) 77281 
young (2) 24623 
need (1) 58548 
parent (7) 62334 
kid (3) 60256 
child (1) 60256 
baby (1) 59820 
wife (1) 63852 
adult (I) 59073 
traveller (3) 59140 
substitute (1) 63327 
backup (1) 63327 
computer(l) 60118 
Some lexical chains from the virtual parenting article. 
4 
10 
C Word 
expert(I) 
mark ( 1 ) 
worker (I) 
speaker (1) 
advertiser (I) 
entrepreneur (I) 
engineer (1) 
sitter (I) 
consultant (2) 
management_consultant ( I ) 
man (1) 
flight_attendant (I) 
folk (1) 
family (4) 
management (2) 
professor (i) 
conference (1) 
meeting (I) 
school (I) 
university (I) 
company (!) 
Sya 
59108 
60270 
59145 
63258 
59643 
60889 
59101 
59827 
59644 
61903 
619O2 
63356 
54362 
54362 
55578 
62638 
55372 
55371 
55261 
55299 
54918 
C Word 
12 giving (I) 
pushing (!) 
push (1) 
high-tech (2) 
19 planning (1) 
arranging ( 1 ) 
21 good_night(l) 
wish (l) 
22 phone (2) 
cellular.phone (I) 
fax (2) 
gear (1) 
joint (2) 
junction (1) 
network (I) 
system (2) 
audiotape (1) 
gadget (I) 
23 feel (I) 
kissing (I) 
Syn 
19911 
20001 
2000 I 
19957 
23089 
23127 
48074 
48061 
40017 
33808 
35302 
32030 
36574 
36604 
37247 
32196 
39983 
32428 
22808 
22806 
Although no one is pushing 12 virtual-reality headgear 16 as a substitute I for parents I, many I 
technical ad campaigns 13 are promoting cellular phones ~, faxes ~ , computers I and pagers to" l 
working I parents ! as a way of bridging separations 17 from their kids I . A recent promotion 13 
by A T & T and Residence 2 Inns 7 in the United States 6 , for example 3, suggests that business 3 
travellers I with young j children use video 3 and audio tapes ~, voice 3 mail 3, videophones and 
E-mail to stay 3 connected, including kissing ~ the kids I good night 21 by phone 22. 
More advice 3 from advertisers t: Business s travellers I can dine with their kids t by speakerL 
phone or "tuck them in" by cordless phone z2. Separately, a management I0 newsletter 24 rec- 
ommends faxing your child I when you have to break 17 a promise 3 to be home 2 or giving 12 a 
young I child I a beeper to make him feel ~ more secure when left "s alone. 
Figure 1: Two portions of a text tagged with chain numbers. 
where wc,p is the number of words from chain c that 
appear in paragraph p and w v is the number of content 
words (i.e., words that are not stop words) in p. For ex- 
ample, if we consider paragraph two of our sample arti- 
cle, we see that there are 9 words from chain 1. We also 
note that there are 48 content words in the paragraph. So, 
in this case the density of chain 1 in paragraph I, dr,z, is 
9 4-g = 0.19. 
The result of these calculations is that each paragraph 
in the article has associated with it a vector of chain den- 
sities, with an element for each of the chains in the article. 
Table 2 shows these chain density vectors for the chains 
shown in table I. Note that an empty element indicates a 
density of 0. 
3.2 Determining paragraph links 
As we said earlier, the parts of a document that are about 
the same thing, and therefore related, will tend to contain 
the same lexical chains. Given the chain density vectors 
that we described above, we need to develop a method to 
determine the similarity of the sets of chains contained in 
each paragraph. The second stage of paragraph linking, 
therefore, is to compute the similarity between the para- 
graphs of the article by computing the similarity between 
the chain density vectors representing them. We can com- 
pute these similarities using any one of 16 similarity co- 
efficients that we have taken from Ellis et al. (1994). 
This similarity is computed for each pair of chain den- 
sity vectors, giving us a symmetric p x p matrix of simi- 
laritie s, where p is the number of paragraphs in the arti- 
cle. From this matrix we can calculate the mean and the 
standard deviation of the paragraph similarities. 
The next step is to decide which paragraphs should be 
linked, on the basis of the similarities computed in the 
previous step. We make this decision by looking at how 
the similarity of two paragraphs compares to the mean 
paragraph similarity across the entire article. Each sim- 
ilarity between two paragraphs i and j, si,j, is converted 
Green 103 Automatically generating hypertext 
Table 2: Some chain density vectors for the virtual parenting article. 
Chain 
1 
4 
10 
12 
19 
21 
22 
23 
1 
0.14 
0.07 
Chain Words 8 
Content 14 
Density 0.57 
Paragraph 
2 3 4 5 6 7 8 / 9 10 
0.19 0.07 0.16 0.28 0.18 0.10 0.25 \[ 0.24 0.13 
0.11 0.05 0.03 0.03 
0.07 0.05 0.11 0.04 0.03 
0.02 0.04 0.05 0.04 0.03 
0.04 0.06 
I1 
0.33 
0.02 0.05 
0.08 0.04 0.05 0.I1 0.07 0.07 0.08 0.03 
0.02 0.04 
30 15 15 10 15 16 19 20 15 6 
48 27 19 18 28 29 28 38 30 9 
0.62 0.56 0.79 0.56 0.54 0.55 0.68 0.53 0.5'0 0.67 
Table 3: Adjacency matrix for the virtual parenting arti- 
cle. 
Par 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
1 2 3 4 5 6 7 8 9 10 11 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 1 0 0 I I 1 0 
0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 1 I 0 0 
0 0 0 0 1 0 
0 0 0 1 0 
0 1 0 0 
0 0 I 
0 0 
0 
to a z-score, zi,j. If two paragraphs are more similar than 
a threshold given in terms of a number of standard de- 
viations, then a link is placed between them. The result 
is a symmetric adjacency matrix where a 1 indicates that 
a link should be placed between two paragraphs. Figure 
3 shows the adjacency matrix that is produced when a z- 
score threshold of 1.0 is used to compute the links for our 
virtual parenting example. 
Once we have decided which paragraphs should be 
linked, we need to be able to produce a representation 
of the hypertext that can be used for browsing. In the 
current system, there are two ways to output the HTML 
representation of an article. The first simply displays all 
of the links that were computed during the last stage of 
the process described above. The second is more compli- 
cated, showing only some of the links. The idea is that 
links between physically adjacent paragraphs should be 
omitted so that they do not clutter the hypertext. 
4 Building links between articles 
While it is useful to be able to build links within articles, 
for a large scale hypertext, links also need to be placed 
between articles. You will recall from section 2 that the 
output of the lexical chainer is a list of chains, each chain 
consisting of one or more words. Each word in a chain 
has associated with it one or more synsets. These synsets 
indicate the sense of the word as it is being used in this 
chain. An example of the kind of output produced by 
the ehainer is shown in table 4, which shows a portion of 
the chains extracted from an article (Gadd, 1995b) about 
cuts in staffat children's aid societies due to a reduction 
in provincial grants. Table 5 shows a portion of another 
set of chains, this time from an article (Gadd, 1995a) de- 
scribing the changes in child-protection agencies, due in 
part to budget cuts. 
It seems quite clear that these two articles are related, 
and that we would like to place a link from one to the 
other. It is also clear that the words in these two articles 
display both of the linguistic factors that affect IR per- 
formance, namely synonymy and polysemy. For exam- 
ple, the first set of chains contains the word abuse, while 
the second set contains the synonym maltreatment. Sim- 
ilarly, the first set of chains includes the word kid, while 
the second contains child. The word abuse in the first ar- 
ticle has been disambiguated by the lexieal chainer into 
the "cruel or inhuman treatment" sense, as has the word 
maltreatment from the second article. We once again note 
that the lexieal chaining process is not perfect: for exam- 
ple, both texts contain the word abuse, but it has been 
d.isambiguated into different senses-- in the first article, 
it is meant in the sense of "ill-treatment", while in the 
second it is meant in the sense of "verbal abuse". 
Although the articles share a large number of words, 
by missing the synonyms or by making incorrect (or no) 
judgments about different senses, a traditional IR system 
might miss the relation between these documents or rank 
them as less related than they really are. Aside from the 
problems of synonymy and polysemy, we can see that 
there are also more-distant relations between the words of 
these two articles. For example, the second set of chains 
Green 104 Automatically generating hypertext 
II 
1 
Ii 
1 
1 
l 
1 
II 
1 
II 
II 
Ii 
II 
II 
l 
/ 
/ 
I 
I 
l 
I 
I 
I 
/ 
I 
I 
I 
/ 
/ 
I 
Table 4: Some lexical chains from an articl Word Syn C Word 
society (7) 54351 annual (1) 
group (I) 19698 5 ontario (I) 
mother(l) 62088 canadian (I) 
parent (4) 62334 
kid (1) 60256 burlington (1) 
recruit (!) 62769 union (3) 
employee (2) 60862 10 saying (1) 
worker (2) 59145 interview (2) 
computer(l) 60118 27 try(1) 
teen-ager (2) 59638 seeking (1) 
provincial (3) 62386 acting (1) 
face (I) 59111 services (I) 
spokesman (I) 63287 work (3) 
insolvent (I) 59869 risk (2) 
about cuts in children's aid societies. Syn C Word Syn' 
64656 care (I) 22204 
56918 social_work (l) 24180 
58424 slowdown (1) 23640 
59296 abuse (3) 21214 
57612 child..abuse ( l ) 21215 
57424 neglect ( 1 ) 21235 
50294 28 living (I) 7562~ 
50268 standing (I) 75573 
22561 complaint ( I ) 76270 
22571 agency (I) 75786 
21759 stress (1) 76799 
21922 76906 
21919 32 executive_director (2) 60922 
22613 manager (l) 59634 
Table 5: Some lexical chains from Word Syn C Word 
wit (I) 48647 guardian (1) 
play ( I ) 48668 official (I) 
abuse (4) 48430 worker (1) 
cut (4) 48431 neighbour (1) 
criticism (1) 48406 youngster (1) 
recommendation (I) 48310 kid (2) 
case (1) 48682 natural (1) 
problem (I) 48680 lawyer (2) 
question (3) 48679 professional (I) 
child ( 1 O) 60256 prostitute ( 1 ) 
parent (9) 62334 provincial (2) 
mother (3) 62088 welfare_worker (1) 
daughter (1) 60587 lorelei (1) 
foster.home (I) 54374 god (I) 
society (5) 54351 4 protection (2) 
at_home (i) 55170 care (5) 
social (1) 55184 preservation (2) 
function (1) 55154 judgment (I) 
expert (3) 59108 act (1) 
human (1) 19677 behaviour (I) 
related article. 
Syn 
59099 
62223 
59145 
62152 
60255 
60255 
62139 
61725 
62636 
62660 
62386 
63220 
61833 
58615 
22672 
22721 
22676 
22881 
19697 
24235 
C Word 
making (1) 
calling (I) 
services (2) 
prevention (l) 
supply (1) 
providing (3) 
maltrea~'nent (2) 
child.abuse (2) 
investigation (I) 
research (I) 
investigating ( 1 ) 
work (1) 
aid (9) 
social.work ( 1 ) 
risk (1) 
dispute (1) 
intervention (1) 
fail (1) 
Syn 
24236 
23076 
21911 
21922 
23683 
23596 
23596 
21214 
21215 
22142 
22143 
22142 
21885 
22204 
24180 
22613 
24051 
24317 
19811 
contains the word maltreatment while the first set con- 
tains the related word child abuse (a kind of maltreat- 
ment) as well as the repetition of child abuse. 
We can build these inter-article links by determining 
the similarity of the two sets of chains contained in two 
articles. In essence, we wish to perform a kind of cross- 
document chaining. 
4.1 Synset weight vectors 
We can represent each document in a database by two 
vectors. Each vector will have an element for each synset 
in WordNet. An element in the first vector will contain 
a weight based on the number of occurrences of that par- 
ticular synset in the words of the chains contained in the 
document. An element in the second vector will contain 
a weight based on the number of occurrences of that par- 
ticular synset when it is one link away from a synset as- 
sociated with a word in the chains. We will call these 
vectors the member and linked synset vectors, or simply 
the member and linked vectors, respectively. 
The weight of a particular synset in a particular docu- 
ment is not based solely on the frequency of that synset 
in the document, but also on how frequently that term ap- 
pears throughout the database. The synsets that are the 
most heavily weighted in a document are the ones that 
appear frequently in that document but infrequently in 
the entire database. The weights are calculated using the 
standard ff-idf weighting function: 
Wik =- sf ik" log(N/nk) 
~/Y~= t (sf ij) 2. (log(N lnj) )2 
where sfik is the frequency of synset k in document i, N 
is the size of the document collection, n, is the number 
of documents in the collection that contain synset k, and 
s is the number of synsets in all documents. Note that 
this equation incorporates the normalization of the synset 
weight vectors. 
The weights are calculated independently for the mem- 
ber and linked vectors. We do this because the linked 
vectors introduce a large number of synsets that do not 
necessarily appear in the original chains of an article, and 
should therefore not influence the frequency counts of the 
member synsets. Thus, we make a distinction between 
Green 105 Automatically generating hypertext 
strong links that occur due to synonymy, and strong links 
that occur due to IS-A or INCLUDES relations. The simi- 
larity between two documents, DI and/32, is then deter- 
mined by calculating three cosine similarities: 
1. The similarity of the member vectors of DI and/)2; 
2. The similarity of the member vector of Dl and 
linked vector olD2; and 
3. The similarity of the linked vector of Di and the 
member vector of D2. 
Clearly, the first similarity measure (the member- 
member similarity) is the most important, as it will cap- 
ture extra-strong relations as well as strong relations be- 
tween synonymous words. The last two measures (the 
member-linked similarities) are less important as they 
capture strong relations that occur between synsets that 
are one link away from each other. If we enforce a thresh- 
old on these measures of relatedness, then we ensure that 
there are several connections between two articles, since 
each element of the vectors will contribute only a small 
part of the overall similarity. 
4.2 Building inter-article finks 
Once we have built a set of synset weight vectors for a 
collection of documents, the process of building links be- 
tween articles is relatively simple. Given an article that 
we wish to build links from, we can compute the simi- 
larity between the article's symet weight vectors and the 
vectors of all other documents. Documents whose mem- 
ber vectors exceed a given threshold of similarity will 
have a link placed between them. Our preliminary work 
shows that a threshold of 0.15 will include most related 
documents while excluding many unrelated documents. 
This is almost exactly the methodology used in vector- 
space IR systems such as SMART, with the difference 
being that for each pair of documents we are calculating 
three separate similarity measures. The best way to cope 
with these multiple measurements seems to be to rank 
related documents by the sum of the three similarities. 
The sum of the three similarities can lie, theoretically, 
anywhere between 0 and 3. In practice, the sum is usually 
less than 1. For example, the average sum of the three 
similarities when running the vectors of a single article 
against 5,592 other articles is 0.039. 
5 Evaluation 
In the evaluation that we conducted, the basic question 
that we asked was: Is our hypertext linking methodology 
superior to other methodologies that have been proposed 
(e.g., that of Allan, 1995)? The obvious way to answer 
the question was to test whether the links generated by 
our methodology lead to better performance when they 
were used in the context of an appropriate IR task. 
We selected a question-answering task for our study. 
We made this choice because it appears that this kind 
of task is well suited to the browsing methodology that 
hypertext links are meant to support. This kind of task 
is also useful because it can be performed easily using 
only hypertext browsing. This is necessary because in the 
interface used for our experiment, no query engine was 
provided for the subjects. 
We used the "Narrative" section of three TREC topics 
(Harman, 1994) to build three questions for our subjects 
to answer. There were approximately 1996 documents 
that were relevant to the topics from which these ques- 
tions were created. We read these documents and pre- 
pared lists of answers for the questions. Our test database 
consisted of these articles combined randomly with ap- 
proximately 29,000 other articles selected randomly from 
the TREC corpus. The combination of these articles pro- 
vided us with a database that was large enough for a 
reasonable evaluation and yet small enough to be easily 
manageable. 
5.1 The test system 
We considered two possible methods for generating inter- 
article hypertext links. The first is our own method, de- 
scribed above. The second method uses a vector space IR 
system called Managing Gigabytes (MG) (Witten et al., 
1994) to generate links by calculating a document simi- 
laxity that is based strictly on term repetition. We used the 
MG system to generate links in a way very similar to that 
presented in Allan (1995). For simplicity's sake, we will 
call the links generated by our technique HT links and the 
links generated by the MG system MG links. 
Figure 2 shows the interface of the test system used. 
The main part of the screen showed the text of a single 
article. The subjects could navigate through the article 
by using the intra-article links, a scroll bar, or the page 
up and down keys. The Previous Article and Next Article 
buttons could be used for navigating through the set of ar- 
ticles that had been visited and the Back button returned 
the user to the point from which an intra-article link was 
taken. Each search began on a "starter" page that con- 
mined the text of the appropriate TREC topic as the "ar- 
ticle" and the list of articles related to the topic shown 
(this was computed by using the text of the topic as the 
initial "query" to the database). Subjects were expected 
to traverse the links, writing down whatever answers they 
could find. 
At each stage during a subject's browsing, a set of 
inter-article links was generated by combining the set of 
I-IT links and the set of MG links. By using this strat- 
egy, the subjects "vote" for the system that they prefer by 
choosing the links generated by that system. Of course, 
the subjects are not aware of which system generated the 
links that they are following -- they can only decide to 
Green 106 Automatically generating hypertext 
1 
1 
1 
1 
II 
1 
1 
II 
1 
II 
II 
II 
1 
1 
1 
II 
1 
II 
II 
il 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
I! 
II 
File Article Help 
Previous Article 
I Next Article 
Back 
I Jurno to 
Reloted I 
I Arh'clet=. 
Here is the Headline of the Article 
Here is a subheading 
The text 0t the arlJcle thal you're viewing goes here. If you're looking at 
it and you decide that it's relevant to the query that you're trying to 
ans',tc, r, then you should write down the answer! 
• Here is a link that will,.. • This is another rink... 
Headline 
Here is the headline of an article that you can jump to. 
Try clicking on me to jump to a new article! 
Figure 2: The interface of the evaluation system. 
follow a link by considering the article headlines dis- 
played as anchors. We can, however, determine which 
system they "voted" for by considering their success in 
answering the questions they were asked. If we can show 
that their success was greater when they followed more 
I-IT links, then we can say that they have "voted" for the 
superiority of HT links. A similar methodology has been 
used previously by Nordhausen et al. (1991) in their com- 
parison of human and machine-generated hypertext links. 
The two sets of inter-article links can be combined by 
simply taking the unique links from each set, that is, the 
links that we take are those that appear in only one of 
the sets of links. Of Course, we would expect the two 
methods to have many links in common, but it is diffi- 
cult to tell how these links should be counted in the "vot- 
ing" procedure. By leaving them out, we test the differ- 
ences between the methods rather than their similarities. 
Of course, by excluding the links that the methods agree 
on we are reducing the ability of the subjects to find an- 
swers to the questions that we have posed for them. In 
fact, we found that nearly 40% of the links found were 
found by both methods. It does seem, however, that the 
users could find enough answers to give some interesting 
results. 
5.2 Experimental results 
The number of both inter- and intra-articte links followed 
was, on average, quite small and variable (full data are 
given in Green, 1997). The number of correct answers 
found was also low and variable, which we believe is due 
partly to the methodology and partly to the time restric- 
tions placed on the searches (15 minutes). On average, 
the subjects showed a slight bias for HT links, choosing 
47.9% MG links and 52.1% HT links. This is interesting, 
especially in light of the fact that, for all the articles the 
subjects visited, 50.4% of the links available were MG 
links, while 49.6% were HT links. A paired t-test, how- 
ever indicates that this difference is not significant. 
For the remainder of the discussion, we will use the 
variable LHT tO refer to the number of HT links that a 
subject followed, LMG to refer to the number of MG links 
followed, and L/to refer to the number of intra-article 
links followed. The variable Ans will refer to the number 
of correct answers that a subject found. We can combine 
LHr and LMG into a ratio, LR = ~u-'~G" If LR > 1, then a 
" W . M~ . subject folio ed more HT links than MG hnks. An inter- 
esting question to ask is: did subjects with significantly 
higher values for LR find more answers? With 23 subjects 
each answering 3 questions, we have 69 values for LR. If 
we sort these values in decreasing order and divide the 
resulting list at the median, we have two groups with a 
significant difference in LR. An unpaired t-test then tells 
us that the differences in Ans for these two groups are 
significant at the 0. I level. 
So it seems that there may be some relationship be- 
tween the number and kinds of links that a subject fol- 
lowed and his or her success in finding answers to the 
questions pose. We can explore this relationship using 
two different regression analyses, one incorporating only 
inter-article links and another incorporating both inter- 
and intra-article links. These analyses will express the 
relationship between the number of links followed and 
the number of correct answers found. 
5.2.1 Inter-article links 
A model incorporating only the inter-article links that 
our subjects followed gives us the following equation: 
Green 107 Automatically generating hypertext 

II 
II 
il 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
16 
14 
12 
10 
8 
6 
I I I 
A 
-- t 44 A 
• 4 • 
-- • AA 4. 4&A 
~ A •• • 
4 A • 
0 "'|" I , 
0 1 2 
I I I 
, Data Aria 
= 3.65 + 0.56- LR 
/ 
J 
I I I I 
3 4 5 6 
LR 
Figure 3: Data and regression line for a two-dimensional model. 
we can see a set of subjects (the High Web group) who 
found significantly more answers and followed signifi- 
cantly more I-IT links, indicating the advantage of HT 
links over MG links. 
5.2.4 Viewed answers 
In the analyses that we've performed to this point, we 
have been using the number of correct answers that the 
subjects provided as our dependent variable. Part of the 
reason we are using this dependent variable is that the 
subjects were limited in the amount of time that they 
could spend on each search, and so they could only find a 
certain number of answers, no matter how many answers 
there were to find. We can mitigate this effect by intro- 
ducing a new dependent variable, Ansv, or the number of. 
viewed answers. 
The number of viewed answers for a particular ques- 
tion is simply the number of answers that were contained 
in articles that a subject visited while attempting to an- 
swer a question. These answers need not have been writ- 
ten down. We are merely saying that, given more time, 
the subjects might have been able to read the article more 
fully and find these answers. This idea is analogous to the 
use of judged and viewed recall by Golovchinsky (1997) 
in his studies. 
When we consider Ansi, as our dependent variable, the 
model for the High Web group is still not significant, and 
there is still a high probability that the coefficient of L/ 
is 0. For our Low Web group, who followed signifi- 
cantly more intra-article links than the High Web group, 
the model that results is significant and has the following 
equation: 
Ansv = 0.58.L,~r + 0.21 .LMG + 0.21 "L1 (R 2 = 0.41) 
Table 9: 
model using viewed answers. 
Parameter Value t 
Ltcr 0.58 4.37 
LMG 0.21 1.62 
L! 0.21 2.19 
95% confidence intervals for coefficients in a 
p Low High 
0.00 0.31 0.85 
0.06 -0.05 0.47 
0.02 0.01 0.40 
Table 9 shows the 95% confidence intervals for this 
model. We see that the coefficient of Lt is always pos- 
itive, indicating some effect on Ansv from intra-article 
links. We also see that the probability that this coeffi- 
cient is 0 is less than 0.02. We note, however, that for 
this model we earmot claim that the coefficient of LHr is 
always greater than the coefficient of LMG. This is not 
too surprising in light of the fact that the High Web group 
chose significantly more HT links than did the Low Web 
group. 
6 Conclusions and future work 
Our evaluation shows that we cannot reject our null hy- 
pothesis that there is no difference in the two methods for 
generating inter-article links. Having said this, we can 
demonstrate a partition of the subjects such that the only 
significant differences between them are the number of 
HT links followed and the number of answers found. Fur- 
thermore, we determined that the probability of obtaining 
results such as these by chance is less than 0.1. Our in- 
ability to achieve a significant result may be due to several 
implementation factors, described in Green (1997). Thus, 
we conclude that we need to replicate the experiment in 
order to gain further information about the relationship 
between the two kinds of inter-article links. 
Green 109 Automatically generating hypertext 

References
James Allan. Automatic hypertext construction. PhD thesis, Cornell. 1995.
Richard Beckwith, Christiane Fellbaum, Derek Gross, and George A. Miller. WordNet: A lexical database organized on psycholinguistic principles. In Uri Zernik, ed., Lexical acquisition: Exploiting on-line resources to build a lexicon, 1991.
David Ellis, Jonathan Furner-Hines, and Peter Willett. The creation of hypertext linkages in full-text documents: Parts I and II. Tech Report, 1994.
Jane Gadd. Child aid on double-edged sword. The Globe and Mail, 1995.
Jane Gadd. Children's aid societies plan staff, services cuts. The Globe and Mail, 1995.
Gene Golvchinsky. From information retrieval to hypertext and back again: The role of interaction in the information exploration interface. PhD thesis, Univ. of Toronto, 1997.
Stephen J. Green. Automaticallu generating hypertext by computing semantic similarity. PhD thesis, Univ. of Toronto, 1997.
M. A. K. Halliday and Ruqaiya Hasan. Cohesion in English. Longman, 1976.
Donna Harman. Overview of the third Text Retrieval Conference (TREC-3). TREC 1994.
Jane Morris and Graeme Hirst. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 1991. 
Bernd Nordhausen, Mark H. Chignell, and John Waterworth. The missing link? Comparison of manual and automated linking in hypertext engineering. Proc. of Human Factors Society 35th annual meeting, 1991.
Steve Outing. Newspapers online: the latest statistics. Editor and Publisher Interactive, 1996. online.
Sue Shellenbarger. High-tech parenting virtually a finger tip away. The Globe and Mail, 1995.
David St-Onge. Detecting and correcting malapropisms with lexical chains. Masters thesis, Univ. of Toronto, 1995.
J. Christopher Westland. Economic constraints in hypertext. JASIS, 1991.
Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing gigabytes: compressing and indexing documents and images. Van Nostrand Reinhold. 1994.
