<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1060">
  <Title>Factorizing Complex Models: A Case Study in Mention Detection</Title>
  <Section position="2" start_page="0" end_page="473" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Information extraction is a crucial step toward understanding and processing natural language data, its goal being to identify and categorize important information conveyed in a discourse. Examples of information extraction tasks are identification of the actors and the objects in written text, the detection and classification of the relations among them, and the events they participate in. These tasks have applications in, among other fields, summarization, information retrieval, data mining, question answering, and language understanding. null One of the basic tasks of information extraction is the mention detection task. This task is very similar to named entity recognition (NER), as the objects of interest represent very similar concepts.</Paragraph>
    <Paragraph position="1"> The main difference is that the latter will identify, however, only named references, while mention detection seeks named, nominal and pronominal references. In this paper, we will call the identified references mentions - using the ACE (NIST, 2003) nomenclature - to differentiate them from entities whichare the real-world objects (the actual person, location, etc) to which the mentions are referring to1.</Paragraph>
    <Paragraph position="2"> Historically, the goal of the NER task was to find named references to entities and quantity references - time, money (MUC-6, 1995; MUC-7, 1997).</Paragraph>
    <Paragraph position="3"> In recent years, Automatic Content Extraction evaluation (NIST, 2003; NIST, 2004) expanded the task to also identify nominal and pronominal references, and to group the mentions into sets referring to the same entity, making the task more complicated, as it requires a co-reference module. The set of identified properties has also been extended to include the mention type of a reference (whether it is named, nominal or pronominal), its subtype (a more specific type dependent on the main entity type), and its genericity (whether the entity points to a specific entity, or a generic one2), besides the customary main entity type. To our knowledge, little research has been done in the natural language processing context or otherwise on investigating the specific problem of how such multiple labels are best assigned. This article compares three methods for such an assignment.</Paragraph>
    <Paragraph position="4"> The simplest model which can be considered for the task is to create an atomic tag by &amp;quot;gluing&amp;quot; together the sub-task labels and considering the new label atomic. This method transforms the problem into a regular sequence classification task, similar to part-of-speech tagging, text chunking, and named entity recognition tasks. We call this model the all-in-one model. The immediate drawback of this model is that it creates a large classification space (the cross-product of the sub-task classification spaces) and that, during decoding, partially similar classifications will compete instead of cooperate - more details are presented in Section 3.1. Despite (or maybe due to) its relative simplicity, this model obtained good results in several instances in the past, for POS tagging in morphologically rich languages (Hajic and Hladk'a, 1998)  and mention detection (Jing et al., 2003; Florian et al., 2004).</Paragraph>
    <Paragraph position="5"> At the opposite end of classification methodology space, one can use a cascade model, which performs the sub-tasks sequentially in a predefined order. Under such a model, described in Section 3.3, the user will build separate models for each subtask. For instance, it could first identify the mention boundaries, then assign the entity type, subtype, and mention level information. Such a model has the immediate advantage of having smaller classification spaces, with the drawback that it requires a specific model invocation path.</Paragraph>
    <Paragraph position="6"> In between the two extremes, one can use a joint model, which models the classification space in the same way as the all-in-one model, but where the classifications are not atomic. This system incorporates information about sub-model parts, such as whether the current word starts an entity (of any type), or whether the word is part of a nominal mention.</Paragraph>
    <Paragraph position="7"> The paper presents a novel contrastive analysis of these three models, comparing them on several datasets in three languages selected from the ACE 2003 and 2004 evaluations. The methods described here are independent of the underlying classifiers, and can be used with any sequence classifiers. All experiments in this article use our in-house implementation of a maximum entropy classifier (Florian et al., 2004), which we selected because of its flexibility of integrating arbitrary types of features. While we agree that the particular choice of classifier will undoubtedly introduce some classifier bias, we want to point out that the described procedures have more to do with the organization of the search space, and will have an impact, one way or another, on most sequence classifiers, including conditional random field classifiers.3 The paper is organized as follows: Section 2 describes the multi-task classification problem and prior work, Section 3.3 presents and contrasts the threemeta-classificationmodels. Section4outlines the experimental setup and the obtained results, and Section 5 concludes the paper.</Paragraph>
  </Section>
class="xml-element"></Paper>