<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2038">
  <Title>A Comparison of Tagging Strategies for Statistical Information Extraction</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form that is suitable for automatic querying and processing.</Paragraph>
    <Paragraph position="1"> IE requires a predefined output representation (target structure) and only searches for facts that fit this representation. Simple target structures define just a number of slots to be filled with a string extracted from a text (slot filler).</Paragraph>
    <Paragraph position="2"> For this simple kind of information extraction, statistical approaches that model IE as a token classification task have proved very successful.</Paragraph>
    <Paragraph position="3"> These systems split a text into a series of tokens and invoke a trainable classifier to decide for each token whether or not it is part of a slot filler of a certain type. To re-assemble the classified tokens into multi-token slot fillers, various tagging strategies can be used.</Paragraph>
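    <Paragraph> As a rough illustration of the re-assembly step, the following sketch decodes one commonly used tagging scheme (IOB2, where B- marks the first token of a slot filler, I- a continuing token, and O a token outside any filler). It is a minimal sketch only: the function name decode_iob2, the slot names and the example sentence are assumptions made here for illustration and are not taken from the system described in this paper.

```python
from typing import List, Optional, Tuple


def decode_iob2(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Re-assemble per-token IOB2 tags into (slot_type, filler) pairs."""
    fillers: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open filler, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            fillers.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new filler.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open filler.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open filler:
            # close any open filler and treat this token as outside.
            flush()
    flush()
    return fillers


# Hypothetical example sentence and slot names (speaker, stime).
tokens = ["Talk", "by", "John", "Smith", "at", "3", "pm"]
tags = ["O", "O", "B-speaker", "I-speaker", "O", "B-stime", "I-stime"]
print(decode_iob2(tokens, tags))
```

In this sketch an I- tag that does not continue an open filler of the same type is simply discarded; other repair policies are possible, and this is one of the points on which tagging strategies can differ.</Paragraph>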
    <Paragraph position="4"> So far, each classification-based IE approach combines a specific tagging strategy with a specific classification algorithm and specific other parameter settings, making it hard to detect how each of these choices influences the results.</Paragraph>
    <Paragraph position="5"> To allow systematic research into these choices, we have designed a generalized IE system that allows any tagging strategy to be combined with any classification algorithm. This makes it possible to compare strategies or algorithms in an identical setting. In this paper, we describe the tagging strategies that can be found in the literature and evaluate them in the context of our framework. We also introduce a new strategy, called Begin/After tagging or BIA, and show that it is competitive with the best other strategies. While various approaches employ a classification algorithm with one of the tagging strategies described below, to the best of our knowledge no other comparative analysis of tagging strategies exists yet.</Paragraph>
    <Paragraph position="6"> In the next section, we describe how IE can be modeled as a token classification task and explain the tagging strategies that can be used for this purpose. In Sec. 3 we describe the IE framework and the experimental setup used for comparing the various tagging strategies. In Sec. 4 we list and analyze the results of the comparison.</Paragraph>
  </Section>
</Paper>