<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1040"> <Title>Intelligent Access to Text: Integrating Information Extraction Technology into Text Browsers</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. IE AND INFORMATION SEEKING IN LARGE ENTERPRISES </SectionTitle> <Paragraph position="0"> While TRESTLE aims to support information workers in the pharmaceutical industry, most of the functionality it embodies is required in any large enterprise. Our analysis of user requirements at GlaxoSmithKline has led us to distinguish various categories of information seeking. At the highest level we must distinguish requirements for current awareness from those for retrospective search. Current awareness requirements can be further split into general updating (what's happened in the industry news today/this week) and entity or event-based tracking (e.g. what's happened concerning a specific drug or what regulatory decisions have been made).</Paragraph> <Paragraph position="1"> Retrospective search tends to break down into historical tracking of entities or events of interest (e.g. where has a specific person been reported before, what is the clinical trial history of a particular drug) and search for a specific event or a remembered context in which a specific entity played a role.</Paragraph> <Paragraph position="2"> Notice that both types of information seeking require the identification of entities and events in the news - precisely the functionality that IE systems are intended to deliver.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. 
THE TRESTLE SYSTEM </SectionTitle> <Paragraph position="0"> tracking (a minor modification of the MUC-6 management succession scenario), clinical trials experimental results (drug, phase of trial, experimental parameters/outcomes) and regulatory announcements (drugs approved, rejected by various agencies).</Paragraph> <Paragraph position="1"> After the IE system outputs the NE-tagged texts and scenario templates, an indexing process is run to update indices which are keyed by entity type (person, drug, disease, etc.) and date, and by scenario type and date.</Paragraph> <Paragraph position="2"> The on-line component of TRESTLE is a dynamic web page creation process which responds to the users' information seeking behaviour, expressed as clicks on hypertext links in a browser-based interface, by generating web pages from the information held in the indexed IE results and the original Scrip texts. A basic Information Retrieval component has also been plugged into TRESTLE to give users seamless query access to the Scrip texts, i.e., access not confined to the pre-defined named entities in the index.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Interface Overview </SectionTitle> <Paragraph position="0"> The interface allows four ways of accessing Scrip: by headline, by named entity category, by scenario summary, and by freetext search. For the first three access routes, the date range of accessible Scrip articles may be set to the current day, previous day, last week, last four weeks or full archive.</Paragraph> <Paragraph position="1"> The interface is a browser whose main window is divided into three independently scrollable frames (see Figure 2). An additional frame (the &quot;head frame&quot;) is located at the top, displaying the date range options as well as information about where the user currently is in the system.
Down the full length of the left side of the window is the &quot;access frame&quot;, in which text access options are specified. The remainder of the main window is split horizontally into two frames, the upper of which is used to display the automatically generated index information (the &quot;index frame&quot;) and the lower of which is used to present the Scrip articles themselves (the &quot;text frame&quot;).</Paragraph> <Paragraph position="2"> Headline access is the traditional way GSK Scrip users access text, and is retained as the initial default presentation in TRESTLE.</Paragraph> <Paragraph position="3"> In the index frame a list of Scrip headlines is presented in reverse chronological order. Each headline is a clickable link to the full text of the article; clicking on one displays the full text in the text frame (like Figure 2, only without the second column in the index frame).</Paragraph> <Paragraph position="4"> Named entity and scenario access are the novel IE-based techniques TRESTLE supports.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 NEAT: Named Entity Access to Text </SectionTitle> <Paragraph position="0"> From the access frame a user selects a category, for example, drugs. The index frame then displays an alphabetically ordered list of drug names extracted from the Scrip texts by the IE engine (Figure 2). To the right of each drug name is the title of the article from which the name was extracted (if a name occurs in multiple texts, there are multiple lines in the index frame). Once again the title is a hyperlink to the text; if it is followed, the full text is displayed in the text frame.</Paragraph> <Paragraph position="1"> When a text is displayed in the text frame, every occurrence of every name which has been identified as a named entity of any category is displayed as a clickable link; furthermore, each name category is displayed in a different colour.
Clicking on a name, say a company name (e.g. Warner-Lambert in Figure 2) occurring in a text which was accessed initially via the drug index, updates the index frame with the subset of entries from the index for that name only - in our example, all entries for the selected company.</Paragraph> <Paragraph position="2"> In addition to listing the full drug index alphabetically, the user may also enter a specific drug name in the Index Look-up box in the access frame, and the index frame will then list the titles of all articles containing that drug name.</Paragraph> <Paragraph position="3"> NEAT allows rapid text navigation by named entity. A user with a watching brief on, say, diabetes can start by reviewing recent articles mentioning diabetes, but then follow up all recent references to companies or drugs mentioned in these articles, extending the search back in time as necessary, and at any point branching off to pursue related entities.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 SCAT: Scenario Access to Text </SectionTitle> <Paragraph position="0"> While NEAT allows navigation by named entity, the user still derives information by reading the original Scrip texts. Scenario access to text (SCAT) utilises summaries generated from templates extracted by the scenario template filling component of the IE system to provide access to the source texts. It is based on the observation that many scenarios of interest can be expressed via single sentence summaries. For example, regulatory announcements in the pharmaceutical industry can be captured in a template and summarised via one or more simple sentence schemas such as &quot;Agency approves/rejects/considers Company's Drug for Disease in Jurisdiction&quot;. To use SCAT a user selects one of the tracking options (keeping track) from the access frame of the interface.
A list of one line summaries, one per extracted scenario, is then presented in the index frame. Along with each summary is a link to the source text, which allows the user to confirm the correctness of the summary, or to follow up for more detail/context. Clicking on this link causes the source text to appear in the text frame (see Figure 3).</Paragraph> <Paragraph position="1"> The presence of a summary in a Scrip article is also presented to the user through coloured tracking flags next to the article headline (see Figure 2). This feature can be viewed as a shortcut to the summary facility; clicking the flag gives the generated summary in the text frame together with the link to the source. Of course sufficient information may have been gleaned from the summary alone, obviating the need to read the full text.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. PRELIMINARY USER EVALUATION </SectionTitle> <Paragraph position="0"> Although input from users has informed each stage of the design process from the conceptual non-interactive mock-ups to the development of the web-based prototype, this section reports on a preliminary evaluation of user testing of the first fully functional prototype. The aim was to elicit feedback on the presentation and usability of NEAT and SCAT and the overall interface design. The objectives were two-fold. Firstly, and more broadly, to assess to what extent the interface conformed to principles of good usability design such as simplicity, consistency, predictability, and flexibility [7]. 
Secondly, and more importantly, to focus on the interaction issues presented by NEAT and SCAT: (i) procedurally, in terms of users' ability to move between different search options in a logical and seamless fashion; and (ii) conceptually, in terms of users' awareness and understanding of the respective functions for exploiting current and retrospective Scrip headlines and full text.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Evaluation Methodology </SectionTitle> <Paragraph position="0"> A group of eight users consisting of postgraduate students and research staff was recruited from the Department of Information Studies at the University of Sheffield. The subjects had different interfaces, searching for information online and some knowledge of alerting/current awareness bulletins.</Paragraph> <Paragraph position="1"> The focus of the exercise was to observe user-system interactions in 'real time' to gain insight into: (i) ease of use and learnability of the system; (ii) preferred strategies for accessing text; (iii) problems in interpreting the interface.</Paragraph> <Paragraph position="2"> In addition, user perceptions of the interface were elicited to help explain searcher behaviour. A combination of instruments was thus used, including a usability questionnaire, verbal protocols and observational notes. Note that this evaluation was a purely quantitative exercise aimed at gaining an understanding of how the users responded to the novel functions offered by the interface. A further evaluation will take place in an operational setting with real end users from GSK.</Paragraph> <Paragraph position="3"> After a brief introduction to the purpose of the evaluation and a brief overview of the system, users were asked to explore the system in an undirected manner, asking questions and providing comments as they proceeded.
Following this, they were asked to carry out a number of tasks from a set of tasks that simulated typical information needs characteristic of real end-users at GSK and were instructed to identify a 'relevant' article for each task. The tasks were designed to exploit the full functionality of the prototype; an example of the task is given below: You've heard that one of your colleagues, Mr Garcia, has recently accepted an appointment at another pharmaceutical company. You want to find out which company he will be moving to and what post he has taken up.</Paragraph> <Paragraph position="4"> The number of tasks completed by each subject varied according to the subject's thoroughness in exploring the system. The order in which the tasks were assigned was random.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Access Strategies </SectionTitle> <Paragraph position="0"> Access to named entities was made available in three ways: 1. by clicking directly on a list of four categories; 2. through the index look up query box; 3. through the free-text keyword search option. The optimal strategy differed for the different assigned tasks. Most subjects tended to use the index look-up as a first attempt irrespective of its appropriateness for the task in hand. Preference for the use of the index look-up as opposed to selecting more general entity categories may be explained by the fact that users knew what they were looking for (i.e. an artefact of the assigned task). Moreover the query box for the index look-up option may have been a more familiar feature which encouraged searchers to adopt a searching strategy as opposed to browsing named entities. The preference for using the index look-up option over free text searching may have been influenced by the order of presentation as well as the prominence of the text entry box in the access frame. 
In addition for assigned tasks where the choice was between any of the three entity access strategies, or using the tracking options, the majority of users opted for the entity access via the index look-up. The novelty of the tracking options appeared to be a contributory factor.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 User Perceptions 4.3.0.1 Colour Coding. </SectionTitle> <Paragraph position="0"> The colour coding of the named entities was highly noticeable, although there was some disagreement on its usefulness. Of those subjects that found the colour coding unhelpful, it was the choice of colours that they objected to rather than the function of the colour per se. Although subjects claimed that coloured entity links were distracting when reading full news items, the majority indicated that the linking to previous Scrip items was very useful. The distraction often had a positive effect in leading to useful and related articles. The overall integration of the current awareness and retrospective searching functions through named entities was thus widely appreciated.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.3.0.2 Index Look-up. </SectionTitle> <Paragraph position="0"> All subjects except one found the index look-up function useful, once they discovered that it was a quick way of accessing pre-defined named entity categories. The fact that the approach only provided exact string matching was judged to be limiting.</Paragraph> <Paragraph position="1"> 4.3.0.3 Scenario Tracking.</Paragraph> <Paragraph position="2"> The keeping track option was not as easily understood as the named entity options. The label &quot;keeping track&quot; was misinterpreted by some subjects as a search history function or an alerting service based on user profiles. 
After having used the tracking facility half of the subjects did, however, correctly understand the function. One problem that arose was the differentiation between summaries presented in SCAT and the actual Scrip headlines. Although the header informed searchers that they were viewing Scrip summaries, the display of the summaries in the same frame where the headlines were normally presented as well as the similarity in content led to confusion.</Paragraph> <Paragraph position="3"> The coloured flags next to the headlines, which were meant to serve as a tracking label to allow users to move seamlessly from headlines to scenario summaries, raised another problem. Not only was the meaning of the flag symbol poorly understood, but also subjects did not realise that they could click on it. Moreover when they clicked on the flag they expected to see a full news item rather than a summary. Hence, the scenario access was both procedurally and conceptually confusing.</Paragraph> </Section> </Paper>