<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1308">
  <Title>IntEx: A Syntactic Role Driven Protein-Protein Interaction Extractor for Bio-Medical Text</Title>
  <Section position="6" start_page="58" end_page="59" type="evalu">
    <SectionTitle>
6 Evaluation &amp; discussion
</SectionTitle>
    <Paragraph position="0"> We evaluated the performance of our system against two state-of-the-art systems: BioRAT (Corney, Buxton et al. 2004) and GeneWays (Rzhetsky, Iossifov et al. 2004).</Paragraph>
    <Paragraph position="1"> Blaschke and Valencia (Valencia 2001) recommend the DIP (Xenarios, Rice et al. 2000) dataset as a benchmark for evaluating biomedical information extraction systems. The first evaluation of the IntEx system was performed on the same dataset that was used for the BioRAT evaluation. For the BioRAT evaluation, the authors identified 389 interactions from the DIP database such that both proteins participating in each interaction had SwissProt entries. These interactions correspond to 229 abstracts from PubMed, and the BioRAT system was evaluated using these 229 abstracts. The interactions extracted by the system were then manually examined by a domain expert for precision and recall. Precision is a measure of the correctness of the system, calculated as the ratio of true positives to the sum of true positives and false positives. The sensitivity of the system is given by the recall measure, calculated as the ratio of true positives to the sum of true positives and false negatives. We have also limited our protein name dictionary to the SwissProt entries. Tables 2 and 3 present the evaluation results as compared with the BioRAT system. A detailed analysis of the sources of all types of errors is shown in Figure 6.</Paragraph>
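The precision and recall measures used in this evaluation can be illustrated with a minimal sketch. The function names and the counts below are hypothetical and chosen only for illustration; they are not the figures reported for IntEx or BioRAT. Recall is taken in its standard form, true positives over the sum of true positives and false negatives.

```python
def precision(tp: int, fp: int) -> float:
    """Ratio of true positives to all extracted interactions (TP + FP)."""
    extracted = tp + fp
    return tp / extracted if extracted else 0.0


def recall(tp: int, fn: int) -> float:
    """Ratio of true positives to all benchmark interactions (TP + FN)."""
    expected = tp + fn
    return tp / expected if expected else 0.0


# Hypothetical counts, for illustration only (not results from the paper).
tp, fp, fn = 8, 2, 12
print(f"precision={precision(tp, fp):.2f} recall={recall(tp, fn):.2f}")
# prints: precision=0.80 recall=0.40
```

Returning 0.0 when the denominator is zero is a convenience assumption of this sketch, not something the evaluation specifies.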
    <Paragraph position="2"> The dataset was obtained from Dr. David Corney through personal communication.</Paragraph>
    <Paragraph position="4"> DIP contains protein interactions from both abstracts and full text. Since our extraction system was tested only on the abstracts, it missed some interactions that were present only in the full text of the articles.</Paragraph>
    <Paragraph position="5"> The second evaluation of the IntEx system tested its recall using an article that was also used by the GeneWays (Rzhetsky, Iossifov et al. 2004) system. The performance of both systems was tested using the full text of the article (Friedman, Kra et al. 2001). The GeneWays system achieves a recall of 65%, whereas IntEx extracted a total of 44 interactions, corresponding to a recall of 66%.</Paragraph>
    <Paragraph position="6"> The dataset was obtained from Dr. Andrew Rzhetsky through personal communication.</Paragraph>
    <Paragraph position="7"> Conclusion. In this paper, we present a fully automated extraction system for identifying gene and protein interactions from biomedical text. The source code and documentation of the IntEx system, as well as all experimental documents and extracted interactions, are available online at our Web site at http://cips.eas.asu.edu/textmining.htm. Our extraction system handles complex sentences and extracts multiple nested interactions specified in a sentence. Experimental evaluations of the IntEx system against the state-of-the-art semi-automated systems BioRAT and GeneWays, using their datasets, indicate that our system performs better without the labor-intensive rule engineering requirement. We have shown that a syntactic role-based approach, combined with linguistically sound interpretation rules applied to the full parse of the sentence, can achieve better performance than existing systems based on manually engineered patterns, which are costly to develop and less scalable than the automated mechanisms presented in this paper.</Paragraph>
  </Section>
</Paper>