<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1221">
  <Title>Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> As the wealth of biomedical knowledge in the form of literature increases, there is a rising need for effective natural language processing tools to assist in organizing, curating, and retrieving this information. To that end, named entity recognition (the task of identifying words and phrases in free text that belong to certain classes of interest) is an important first step for many of these larger information management goals.</Paragraph>
    <Paragraph position="1"> In recent years, much attention has been focused on the problem of recognizing gene and protein mentions in biomedical abstracts. This paper presents a framework for simultaneously recognizing occurrences of PROTEIN, DNA, RNA, CELL-LINE, and CELL-TYPE entity classes using Conditional Random Fields with a variety of traditional and novel features. I show that this approach can achieve an overall F1 measure around 70, which seems to be the current state of the art.</Paragraph>
    <Paragraph position="2"> The system described here was developed as part of the BioNLP/NLPBA 2004 shared task.</Paragraph>
    <Paragraph position="3"> Experiments were conducted on a training and evaluation set provided by the task organizers.</Paragraph>
  </Section>
</Paper>