<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3318">
  <Title>Recognizing Nested Named Entities in GENIA corpus</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Named Entity Recognition (NER) is a key task in biomedical text mining, as biomedical named entities usually represent biomedical concepts of research interest (e.g., proteins, genes, and viruses).</Paragraph>
    <Paragraph position="1"> Nested NEs (also called embedded NEs or cascade NEs) are a notable phenomenon in biomedical literature. For example, &quot;human immunodeficiency virus type 2 enhancer&quot; is a DNA domain, while the embedded &quot;human immunodeficiency virus type 2&quot; is a virus. For simplicity, we call the former the outmost entity (provided it is not itself inside another entity) and the latter the inner entity (which may in turn contain another entity).</Paragraph>
    <Paragraph position="2"> Nested NEs account for 16.7% of all entities in the GENIA corpus (Kim, 2003). Moreover, they often represent important relations between entities (Nedadic, 2004), as in the example above.</Paragraph>
    <Paragraph position="3"> However, few results have been reported on recognizing them. Many studies consider only the outmost entities, as in the BioNLP/NLPBA 2004 Shared Task (Kim, 2004).</Paragraph>
    <Paragraph position="4"> In this work, we use a machine learning method to recognize nested NEs in the GENIA corpus. We view the task as a classification problem for each token in a given sentence, and train an SVM model.</Paragraph>
    <Paragraph position="5"> We note that nested NEs make it hard to treat the task as a single multi-class problem, because a token inside nested entities carries more than one class label. We therefore treat it as a set of binary classification problems, using a one-vs-rest scheme.</Paragraph>
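    <Paragraph position="6"> As a rough illustration of why a one-vs-rest decomposition fits nested NEs, the sketch below turns the nested example from this introduction into one binary label column per entity class. It is a minimal sketch, not the labelling scheme actually used in this work: the helper name binary_labels, the span encoding (start, end, class) with an exclusive end, and the class names "DNA" and "virus" are assumptions made for illustration, and entity boundaries (e.g., begin/inside tags) are deliberately ignored.

```python
# Illustrative sketch only. The helper name, the span format and the class
# names are assumptions for this example, not details taken from the paper.

def binary_labels(tokens, entities):
    """Return {class_name: [0 or 1 per token]} for possibly nested spans.

    entities is a list of (start, end, class_name) tuples, end exclusive.
    A token covered by several nested entities gets a 1 in several columns,
    which is why each class is handled as its own binary problem.
    """
    classes = sorted({cls for _, _, cls in entities})
    labels = {cls: [0] * len(tokens) for cls in classes}
    for start, end, cls in entities:
        for i in range(start, end):
            labels[cls][i] = 1
    return labels


tokens = ["human", "immunodeficiency", "virus", "type", "2", "enhancer"]
entities = [
    (0, 6, "DNA"),    # outmost entity: the whole phrase
    (0, 5, "virus"),  # inner entity nested inside it
]

labels = binary_labels(tokens, entities)
for cls, column in labels.items():
    print(cls, column)
```

    Each column could then serve as the target of a separate binary classifier, one per entity class; the first five tokens are positive in both columns, which a single multi-class label per token could not express.</Paragraph>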
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Related Work
</SectionTitle>
      <Paragraph position="0"> Overall, our work applies machine learning methods to biomedical NER. While most earlier approaches rely on handcrafted rules or dictionaries, many recent works adopt machine learning approaches, e.g., SVM (Lee, 2003), HMM (Zhou, 2004), Maximum Entropy (Lin, 2004), and CRF (Settles, 2004), and achieve state-of-the-art performance, helped by the availability of annotated corpora such as GENIA. We know of only one work (Zhou, 2004) that handles nested NEs to improve overall NER performance. However, that approach is basically rule-based, and the authors did not report how well the nested NEs are recognized.</Paragraph>
    </Section>
  </Section>
</Paper>