<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3322">
  <Title>Extracting Protein-Protein interactions using simple contextual features</Title>
  <Section position="3" start_page="0" end_page="120" type="intro">
    <SectionTitle>
3 Experiments
</SectionTitle>
    <Paragraph position="0"> Each possible combination of proteins and iWords in a sentence was generated as a candidate relation 'triple', which combines the relation extraction task with the additional task of finding the iWord that describes each relation. 3400 such triples occur in the data. After the classifiers assign each instance a probability, the highest-scoring instance for each protein pairing is compared to a threshold to decide the outcome. (Footnote 2, iWords: A limited set of words that have been determined to be informative of when a PPI occurs, such as interact, bind, inhibit, phosphorylation. See footnote 1 for the complete list.)</Paragraph>
    <Paragraph position="1"> Correct triples are those that match the iWord assigned to a PPI by the annotators.</Paragraph>
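The triple generation and thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (token_index, text) tuple representation, the treatment of protein pairs as unordered, and the default threshold of 0.5 are all assumptions made for the example.

```python
from itertools import combinations


def generate_triples(proteins, iwords):
    """Return every (protein1, protein2, iword) candidate for one sentence.

    proteins and iwords are lists of (token_index, text) tuples.
    Pairs are treated as unordered here, which is an assumption.
    """
    return [(p1, p2, w) for p1, p2 in combinations(proteins, 2) for w in iwords]


def decide(scored_triples, threshold=0.5):
    """Keep, per protein pair, the best-scoring triple if it reaches the threshold.

    scored_triples is an iterable of ((p1, p2, iword), probability) items.
    The threshold value is hypothetical; the source does not state one.
    """
    best = {}
    for (p1, p2, w), prob in scored_triples:
        pair = (p1, p2)
        # Remember only the highest-probability iWord for this pair.
        if pair not in best or prob > best[pair][1]:
            best[pair] = (w, prob)
    # A pair is reported as interacting only if its best triple passes.
    return {pair: w for pair, (w, prob) in best.items() if prob >= threshold}
```

A sentence with three proteins and two iWords yields six candidate triples under these assumptions, and at most three of them (one per protein pair) can survive the threshold step.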
    <Paragraph position="2"> For each instance, a list of features was used to construct a 'generic' model:
      interindices: The combination of the indices of the proteins of the interaction; &quot;P1-position:P2-position&quot;
      interwords: The combination of the lexical forms of the proteins of the interaction; &quot;P1:P2&quot;
      p1prevword, p1currword, p1nextword: The lexical form of P1, and the two words surrounding it
      p2prevword, p2currword, p2nextword: The lexical form of P2, and the two words surrounding it
      p2pdistance: The distance, in tokens, between the two proteins
      inbetween: The number of other identified proteins between the two proteins
      iWord: The lexical form of the iWord
      iWordPosTag: The POS tag of the iWord
      iWordPlacement: Whether the iWord is between, before or after the proteins
      iWord2ProteinDistance: The distance, in words, between the iWord and the protein nearest to it
      A second model incorporates more domain-specific features, in addition to those of the 'generic' model:
      patterns: The 22 syntactic patterns used in (Plake et al., 2005) are each used as boolean features (see footnote 3).
      lemmas and stems: Lemma and stem information was used instead of surface forms, using a system developed for the biomedical domain.</Paragraph>
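As a rough illustration of the 'generic' feature set listed above, the sketch below builds one feature dictionary per triple. Everything beyond the feature names is an assumption: the function name, the single-token protein mentions, the "NONE" padding symbol, and the exact distance conventions (plain index differences) are hypothetical choices, not details given in the source.

```python
def generic_features(tokens, pos_tags, p1, p2, iword, protein_positions):
    """Build the 'generic' feature dict for one (P1, P2, iWord) triple.

    tokens and pos_tags are parallel lists; p1, p2 and iword are token
    indices with p1 before p2; protein_positions lists the indices of all
    identified proteins in the sentence (single-token mentions assumed).
    """
    def tok(i):
        # Neighbours outside the sentence get a hypothetical padding symbol.
        return tokens[i] if i in range(len(tokens)) else "NONE"

    if iword in range(p1 + 1, p2):
        placement = "between"
    elif p1 > iword:
        placement = "before"
    else:
        placement = "after"

    return {
        "interindices": f"{p1}:{p2}",
        "interwords": f"{tokens[p1]}:{tokens[p2]}",
        "p1prevword": tok(p1 - 1),
        "p1currword": tok(p1),
        "p1nextword": tok(p1 + 1),
        "p2prevword": tok(p2 - 1),
        "p2currword": tok(p2),
        "p2nextword": tok(p2 + 1),
        # Distance taken as the index difference (an assumed convention).
        "p2pdistance": p2 - p1,
        # Other identified proteins strictly between P1 and P2.
        "inbetween": sum(1 for q in protein_positions if q in range(p1 + 1, p2)),
        "iWord": tokens[iword],
        "iWordPosTag": pos_tags[iword],
        "iWordPlacement": placement,
        "iWord2ProteinDistance": min(abs(iword - p1), abs(iword - p2)),
    }
```

A dictionary of named features like this can be passed to most classifier toolkits after one-hot encoding of the string-valued entries; the domain-specific model would add the pattern booleans and swap surface forms for lemmas or stems.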
  </Section>
</Paper>