<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3307"> <Title>Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline</Title> <Section position="12" start_page="53" end_page="54" type="evalu"> <SectionTitle> 8 Experimental Results </SectionTitle> <Paragraph position="0"> The results for the HPRD corpus-level extraction are shown in Figure 1. Overall, the integrated model performs more consistently, with a gain in precision mostly at recall levels past 40%. The SSK.Max and HG models both exhibit a sudden decrease in precision at around the 5% recall level. While SSK.Max then returns to a higher precision level, the HG model begins to recover only late, at 70% recall.</Paragraph> <Paragraph position="1"> A surprising result in this experiment is the behavior of the HG model, which is significantly outperformed by PMI and which performs only marginally better than a simple baseline that considers all pairs to be interacting.</Paragraph> <Paragraph position="2"> We also compared the two methods, PMI and HG, on corpus-level extraction from the entire Medline, using the shared protein function benchmark. As before, we considered only protein pairs occurring in the same sentence, with a minimum frequency count of 5. The resulting 47,436 protein pairs were ranked according to their PMI and HG scores, with the pairs most likely to be interacting placed at the top. For each ranking, the LLR score was computed for the top N protein pairs, where N varied in increments of 1,000.</Paragraph> <Paragraph position="3"> The comparative results for PMI and HG are shown in Figure 2, together with the scores for three human-curated databases: HPRD, BIND and Reactome.
On the top 18,000 protein pairs, PMI outperforms HG substantially, after which both converge to the same value for all the remaining pairs.</Paragraph> <Paragraph position="4"> Figure 3 shows a comparison of the four aggregation operators on the same HPRD corpus, which confirms that, overall, max is the most appropriate operator for integrating corpus-level results.</Paragraph> </Section> </Paper>