<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1213"> <Title>Automatically generating hypertext in newspaper articles by computing semantic relatedness</Title> <Section position="7" start_page="22806" end_page="22806" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> In the evaluation that we conducted, the basic question that we asked was: Is our hypertext linking methodology superior to other methodologies that have been proposed (e.g., that of Allan, 1995)? The obvious way to answer the question was to test whether the links generated by our methodology led to better performance when they were used in the context of an appropriate IR task.</Paragraph> <Paragraph position="1"> We selected a question-answering task for our study.</Paragraph> <Paragraph position="2"> We made this choice because this kind of task appears well suited to the browsing methodology that hypertext links are meant to support. It is also useful because it can be performed easily using only hypertext browsing, which was necessary because the interface used for our experiment provided no query engine for the subjects.</Paragraph> <Paragraph position="3"> We used the &quot;Narrative&quot; section of three TREC topics (Harman, 1994) to build three questions for our subjects to answer. There were approximately 1996 documents that were relevant to the topics from which these questions were created. We read these documents and prepared lists of answers for the questions. Our test database consisted of these articles combined randomly with approximately 29,000 other articles selected randomly from the TREC corpus. 
The combination of these articles provided us with a database that was large enough for a reasonable evaluation and yet small enough to be easily manageable.</Paragraph> <Section position="1" start_page="22806" end_page="22806" type="sub_section"> <SectionTitle> 5.1 The test system </SectionTitle> <Paragraph position="0"> We considered two possible methods for generating inter-article hypertext links. The first is our own method, described above. The second method uses a vector space IR system called Managing Gigabytes (MG) (Witten et al., 1994) to generate links by calculating a document similarity that is based strictly on term repetition. We used the MG system to generate links in a way very similar to that presented in Allan (1995). For simplicity's sake, we will call the links generated by our technique HT links and the links generated by the MG system MG links.</Paragraph> <Paragraph position="1"> Figure 2 shows the interface of the test system used.</Paragraph> <Paragraph position="2"> The main part of the screen showed the text of a single article. The subjects could navigate through the article by using the intra-article links, a scroll bar, or the page up and down keys. The Previous Article and Next Article buttons could be used for navigating through the set of articles that had been visited, and the Back button returned the user to the point from which an intra-article link was taken. Each search began on a &quot;starter&quot; page that contained the text of the appropriate TREC topic as the &quot;article&quot; and showed the list of articles related to the topic (this list was computed by using the text of the topic as the initial &quot;query&quot; to the database). Subjects were expected to traverse the links, writing down whatever answers they could find.</Paragraph> <Paragraph position="3"> At each stage during a subject's browsing, a set of inter-article links was generated by combining the set of HT links and the set of MG links. 
By using this strategy, the subjects &quot;vote&quot; for the system that they prefer by choosing the links generated by that system. Of course, the subjects are not aware of which system generated the links that they are following; they can only decide to follow a link by considering the article headlines displayed as anchors.</Paragraph> <Paragraph position="4"> [Figure 2: The interface of the test system, showing an article's headline, subheading, and text, its intra-article links, and the headlines of articles that the subject can jump to.]</Paragraph> <Paragraph position="5"> We can, however, determine which system they &quot;voted&quot; for by considering their success in answering the questions they were asked. If we can show that their success was greater when they followed more HT links, then we can say that they have &quot;voted&quot; for the superiority of HT links. A similar methodology has been used previously by Nordhausen et al. (1991) in their comparison of human- and machine-generated hypertext links. The two sets of inter-article links can be combined by simply taking the unique links from each set, that is, the links that appear in only one of the two sets. Of course, we would expect the two methods to have many links in common, but it is difficult to tell how these links should be counted in the &quot;voting&quot; procedure. By leaving them out, we test the differences between the methods rather than their similarities. By excluding the links that the methods agree on, we do reduce the ability of the subjects to find answers to the questions that we have posed for them. In fact, we found that nearly 40% of the links were found by both methods. 
It does seem, however, that the users could find enough answers to give some interesting results.</Paragraph> </Section> <Section position="2" start_page="22806" end_page="22806" type="sub_section"> <SectionTitle> 5.2 Experimental results </SectionTitle> <Paragraph position="0"> The number of both inter- and intra-article links followed was, on average, quite small and variable (full data are given in Green, 1997). The number of correct answers found was also low and variable, which we believe is due partly to the methodology and partly to the time restrictions placed on the searches (15 minutes). On average, the subjects showed a slight bias for HT links, choosing 47.9% MG links and 52.1% HT links. This is interesting, especially in light of the fact that, for all the articles the subjects visited, 50.4% of the links available were MG links, while 49.6% were HT links. A paired t-test, however, indicates that this difference is not significant.</Paragraph> <Paragraph position="1"> For the remainder of the discussion, we will use the variable L_HT to refer to the number of HT links that a subject followed, L_MG to refer to the number of MG links followed, and L_I to refer to the number of intra-article links followed. The variable Ans will refer to the number of correct answers that a subject found. We can combine L_HT and L_MG into a ratio, L_R = L_HT / L_MG. If L_R > 1, then a subject followed more HT links than MG links. An interesting question to ask is: did subjects with significantly higher values for L_R find more answers? With 23 subjects each answering 3 questions, we have 69 values for L_R. If we sort these values in decreasing order and divide the resulting list at the median, we have two groups with a significant difference in L_R. An unpaired t-test then tells us that the differences in Ans for these two groups are significant at the 0.1 level.</Paragraph>
 <Paragraph position="2"> So it seems that there may be some relationship between the number and kinds of links that a subject followed and his or her success in finding answers to the questions posed. We can explore this relationship using two different regression analyses, one incorporating only inter-article links and another incorporating both inter- and intra-article links. These analyses will express the relationship between the number of links followed and the number of correct answers found.</Paragraph> <Paragraph position="3"> A model incorporating only the inter-article links that our subjects followed gives us the following equation: [...] we can see a set of subjects (the High Web group) who found significantly more answers and followed significantly more HT links, indicating the advantage of HT links over MG links.</Paragraph> <Paragraph position="4"> In the analyses that we've performed to this point, we have been using the number of correct answers that the subjects provided as our dependent variable. Part of the reason we are using this dependent variable is that the subjects were limited in the amount of time that they could spend on each search, and so they could only find a certain number of answers, no matter how many answers there were to find. We can mitigate this effect by introducing a new dependent variable, Ans_v, the number of viewed answers.</Paragraph> <Paragraph position="5"> The number of viewed answers for a particular question is simply the number of answers that were contained in articles that a subject visited while attempting to answer a question. These answers need not have been written down. We are merely saying that, given more time, the subjects might have been able to read the article more fully and find these answers. 
This idea is analogous to the use of judged and viewed recall by Golovchinsky (1997) in his studies.</Paragraph> <Paragraph position="6"> When we consider Ans_v as our dependent variable, the model for the High Web group is still not significant, and there is still a high probability that the coefficient of L_I is 0. For our Low Web group, who followed significantly more intra-article links than the High Web group, the model that results is significant and has the following equation:</Paragraph> <Paragraph position="8"> Table 9 shows the 95% confidence intervals for this model. We see that the coefficient of L_I is always positive, indicating some effect on Ans_v from intra-article links. We also see that the probability that this coefficient is 0 is less than 0.02. We note, however, that for this model we cannot claim that the coefficient of L_HT is always greater than the coefficient of L_MG. This is not too surprising in light of the fact that the High Web group chose significantly more HT links than did the Low Web group.</Paragraph> </Section> </Section> </Paper>