<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1038"> <Title>Integrated Feasibility Experiment for Bio-Security: IFE-Bio A TIDES Demonstration</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. The IFE-Bio System </SectionTitle> <Paragraph position="0"> The current demonstration (March 2001) highlights the basic functionality required by an analyst, including: * Capture of sources, including e-mail, digital library material, news groups, and web-based resources; * Categorizing of the sources into multiple orthogonal hierarchies useful to the analyst, e.g., disease, region, news source, language; * Processing of the information through various stages, including &quot;zoning&quot; of the text to select the relevant portions for processing, named entity detection, event detection, extraction of temporal information, summarization, and translation from Spanish, Portuguese, and Chinese into English; * Access to the information through use of any mail and news group reader, which allows the analyst to organize, save, and share the information in a familiar, readily accessible environment; * Display of the information in alternate forms, including color-tagged documents, tables, summaries, graphs, and geospatial, map-based displays.</Paragraph> <Paragraph position="1"> Figure 1 below shows the overall functionality envisioned for the IFE-Bio system, including capture, categorizing, processing, access, and display.</Paragraph> <Paragraph position="2"> Collection capability for the current IFE-Bio system includes e-mail, news groups, journals, and Web resources. We have a complete copy of the ProMED mailings (a moderated source tracking global infectious disease outbreaks), and are routinely collecting other information sources from the World Health Organization and CDC. In addition, we are collecting several general global news feeds. 
Current volume is around 2000 messages per day; we estimate capacity for the current system at around 4500 messages per day. Once we have integrated a filtering capability, we expect the volume of messages saved in IFE-Bio to drop significantly, since many of the global news services report on a wide range of events and not all need to be passed on to IFE-Bio analysts. Sources are categorized based on the message header. The header is synthesized by extracting key information about the disease name, the country, and other relevant information such as type of victim and source of information, as well as the date of message receipt.</Paragraph> <Paragraph position="3"> The processing for the current demonstration system uses a limited subset of the Catalyst architecture capabilities and a number of in-house linguistic modules. The linguistic modules in the current demonstration system include tokenization, sentence segmentation, part-of-speech tagging, named entity detection, temporal extraction (Mani and Wilson 2000), and source-specific event detection. In addition, we have incorporated the CyberTrans embedded machine translation system, which &quot;wraps&quot; available machine translation engines to make them available via an e-mail or Web interface (Reeder 2000). Single-document summarization is performed by the MITRE WebSumm system (Mani and Bloedorn 1999).</Paragraph> <Paragraph position="4"> We carefully chose a lightweight interface mechanism for delivery of the information to the analyst. By treating the incoming streams of data as feeds to a news server, the analyst can inspect and organize the information using a familiar news and e-mail browser. The analyst can subscribe to areas of interest, flag important messages, watch specific threads, and create tailored filters for monitoring outbreaks. 
The stories are cross-posted to multiple relevant newsgroups, based on the information in the header, e.g., a story on Ebola in Africa would be cross-posted to the Africa regional newsgroup and to the Ebola disease newsgroup. Search by subject and date allows the analyst to select subsets of the messages for further processing, annotation, or sharing. The news client provides notification of incoming messages. In later versions, we plan to integrate topic detection and tracking capabilities, to provide improved filtering and routing of messages, as well as detection of new topics. The use of this simple delivery mechanism provides a familiar environment with almost no learning curve, and it avoids issues of platform and operating system dependence.</Paragraph> <Paragraph position="5"> Finally, the system makes use of several different devices to display the information appropriately. Figure 2 shows the layout of the Netscape news browser interface. It includes the list of newsgroups that have been subscribed to (on the left), the list of messages from the chosen newsgroup (on top), and a particular message with color-coded named entities (including disease terms displayed in red, so that they are easy to spot in the message). [Figure text: sample question &quot;What is the status of the current Ebola outbreak?&quot; and a sample message on the Ebola epidemic in the Gulu district; http://tides2000.mitre.org/] There are multiple display modalities available. The message in Figure 2 contains a short tabular display at the beginning, identifying disease, region, and victim type. 
Below that is a URL to a document summary, created by MITRE's WebSumm system (see Figure 3 for a sample summary). If an incoming message is in a language other than English, then CyberTrans is called to run code set and language identification modules, and the message is translated into English for further processing. Figure 4 below shows a sample translated message; note that there are a number of untranslated words, but it is still possible to get the gist of the message.</Paragraph> <Paragraph position="6"> In addition, we are working on a mechanism to provide geographic and, eventually, temporal display of outbreak information. Figure 5 shows the stages of processing involved. Stage 1 shows named entity and temporal tagging to identify the items of interest. These are combined into disease events by further linguistic processing; the result is shown in the table in Stage 2. This spreadsheet of events serves as input for a map-based display, shown in Stage 3. The graph plots the number of new cases and the cumulative number of cases over time. In the map, the size of the outer dot represents the total number of cases to date, and the inner dot represents new cases. This allows the analyst to visualize the spread of the disease, as well as the stage of the outbreak (spreading or subsiding).</Paragraph> </Section> </Paper>