<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0311">
  <Title>Utilizing Text Mining Results: The PastaWeb System</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 IE and its Application to Biomedical
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Texts
</SectionTitle>
      <Paragraph position="0"> Perhaps not surprisingly, the identification of biomedical terms in scientific texts has proved to be the easiest extraction task. Performance on it has reached acceptable levels, not far from the best results achieved in the named entity (NE) task in the MUC competitions, despite differences between the domains (i.e., names of persons, organisations, etc. in MUC vs. terms identifying proteins, genes, drugs, etc. in biomedical domains). The techniques used for this task range from rule-based methods (Fukuda et al., 1998; Humphreys et al., 2000) to statistical methods (Collier et al., 2000) and statistical-rule-based hybrids (Proux et al., 1998).</Paragraph>
      <Paragraph position="1"> More complex IE tasks involving the extraction of relational information have also been addressed by the bioinformatics community. These include the extraction of protein or gene interactions (Sekimizu et al., 1998; Thomas et al., 2000; Pustejovsky et al., 2002), relations between genes and drugs (Rindflesch et al., 2000) and the identification of metabolic pathways (Humphreys et al., 2000). The techniques used in these systems vary considerably, but most require more sophisticated NLP methods, including part-of-speech tagging, phrasal or syntactic parsing and (for some systems) semantic analysis and discourse processing.</Paragraph>
      <Paragraph position="2"> To date, IE researchers working on biological texts have concentrated on building or porting systems to work in biological domains. This paper addresses the issue of utilising the IE results. The next section first describes the underlying PASTA extraction system: what it is designed to extract, how it works, and how well it fares in blind evaluation using conventional evaluation metrics.</Paragraph>
    </Section>
  </Section>
</Paper>