A Linear Programming Formulation for Global Inference in Natural Language Tasks

1 Introduction

Natural language decisions often depend on the outcomes of several different but mutually dependent predictions. These predictions must respect constraints that arise from the nature of the data or from domain- or task-specific conditions. For example, in part-of-speech tagging, a sentence must have at least one verb and cannot have three consecutive verbs; these facts can be used as constraints. In named entity recognition, "no entities can overlap" is a common constraint used in various works (Tjong Kim Sang and De Meulder, 2003).

Efficient solutions to problems of this sort have been given when the constraints on the predictors are sequential (Dietterich, 2002). These solutions can be categorized into two frameworks. Learning global models trains a probabilistic model under the constraints imposed by the domain; examples include variations of HMMs, conditional models, and sequential variations of Markov random fields (Lafferty et al., 2001). The other framework, inference with classifiers (Roth, 2002), views maintaining constraints and learning classifiers as separate processes. Various local classifiers are trained without knowledge of the constraints; their predictions are then taken as input by an inference procedure, which finds the best global prediction. In addition to being conceptually simple, this approach also seems to perform better experimentally (Tjong Kim Sang and De Meulder, 2003).

Typically, efficient inference procedures in both frameworks rely on dynamic programming (e.g., Viterbi), which works well for sequential data. In many important problems, however, the structure is more general, and inference becomes computationally intractable. Problems of this sort have been studied in computer vision, where inference is generally performed over low-level measurements rather than over higher-level predictors (Levin et al., 2002; Boykov et al., 2001).

This work develops a novel inference-with-classifiers approach. Rather than being restricted to sequential data, we study a fairly general setting. The problem is defined in terms of a collection of discrete random variables representing binary relations and their arguments; we seek an optimal assignment to the variables in the presence of constraints on the binary relations between variables and on the relation types.

The key insight behind our solution comes from recent techniques developed for approximation algorithms (Chekuri et al., 2001). Following this work, we model inference as an optimization problem and show how to cast it as a linear program. Using existing numerical packages, which are able to solve very large linear programming problems in a very short time [1], inference can be done very quickly.

[1] For example, one existing package has solved a linear programming problem of 13 million variables within 5 minutes.
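To make the inference-as-linear-program idea concrete, here is a minimal sketch, not the paper's actual formulation (that is given in Section 3). The label sets and classifier scores are invented for illustration; the one constraint shown, that a kill relation requires person arguments, mirrors the example discussed in Section 1.1. Indicator variables select one label per entity or relation, and the objective maximizes the total local classifier score.

```python
# Minimal sketch: global inference over local classifier scores,
# cast as a linear program. Scores and labels are invented.
import numpy as np
from scipy.optimize import linprog

# Indicator x[i] = 1 iff the i-th (variable, label) pair is chosen.
# Two entities (E1, E2) with labels {person, location}, and one
# relation R(E1, E2) with labels {kill, none}.
labels = [("E1", "person"), ("E1", "location"),
          ("E2", "person"), ("E2", "location"),
          ("R", "kill"), ("R", "none")]

# Hypothetical local classifier scores (higher = more likely).
score = np.array([2.0, 0.5, 1.5, 1.0, 1.8, 0.3])
c = -score  # linprog minimizes, so negate to maximize total score

# Each variable takes exactly one label.
A_eq = np.array([
    [1, 1, 0, 0, 0, 0],  # E1
    [0, 0, 1, 1, 0, 0],  # E2
    [0, 0, 0, 0, 1, 1],  # R
])
b_eq = np.ones(3)

# "kill requires person arguments", linearized as
# x[R,kill] <= x[E1,person] and x[R,kill] <= x[E2,person].
A_ub = np.array([
    [-1, 0, 0, 0, 1, 0],
    [0, 0, -1, 0, 1, 0],
])
b_ub = np.zeros(2)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * len(labels))
assignment = [labels[i] for i, v in enumerate(res.x) if v > 0.5]
print(assignment)  # [('E1','person'), ('E2','person'), ('R','kill')]
```

In this toy case the relaxed solution is already integral; in general the indicator variables are 0/1, and enforcing integrality may require integer programming rather than the plain LP relaxation used above.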
Our approach can be contrasted with other approaches to sequential inference or with general Markov random field approaches (Lafferty et al., 2001; Taskar et al., 2002). The key difference is that in those approaches the model is learned globally, under the constraints imposed by the domain. In our approach, predictors do not need to be learned in the context of the decision tasks; rather, they can be learned in other contexts or incorporated as background knowledge. This way, our approach allows constraints to be incorporated into decisions dynamically and can therefore support task-specific inference. The significance of this is clearly shown in our experimental results.

We develop our model in the context of natural language inference and evaluate it here on the problem of simultaneously recognizing named entities and relations between them.

1.1 Entity and Relation Recognition

This is the problem of recognizing the kill(KFJ, Oswald) relation in the sentence "J. V. Oswald was murdered at JFK after his assassin, R. U. KFJ..." The task requires making several local decisions, such as identifying the named entities in the sentence, in order to support the relation identification. For example, it may be useful to identify that Oswald and KFJ are people and that JFK is a location. This, in turn, may help to identify that the kill action is described in the sentence. At the same time, the relation kill constrains its arguments to be people (or at least, not to be locations) and helps to enforce that Oswald and KFJ are likely to be people, while JFK is not.

In our model, we first learn a collection of "local" predictors, e.g., entity and relation identifiers. At decision time, given a sentence, we produce a global decision that optimizes over the suggestions of the classifiers that are active in the sentence, the known constraints among them, and, potentially, domain- or task-specific constraints relevant to the current decision.

Although a brute-force algorithm may seem feasible for short sentences, the computation becomes intractable very quickly as the number of entity variables grows. Given n entities in a sentence, there are O(n^2) possible relations between them. Assume that each variable (entity or relation) can take l labels ("none" is one of these labels). With n entity variables and n(n-1) relation variables, there are n^2 variables in total, and hence l^(n^2) possible assignments, which is too large even for a small n.

When evaluated on simultaneous learning of named entities and relations, our approach not only provides a significant improvement in the predictors' accuracy; more importantly, it provides coherent solutions. While many statistical methods make "stupid" mistakes (i.e., mutually inconsistent predictions of a kind no human would ever make), our approach, as we show, also significantly improves the quality of the inference.

The rest of the paper is organized as follows. Section 2 formally defines our problem, and Section 3 describes the computational approach we propose. Experimental results are given in Section 4, followed by some discussion and conclusions in Section 5.
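As a back-of-the-envelope check of the blow-up noted in Section 1.1 (the values of n and l below are illustrative, not taken from the paper's experiments), even a short sentence already rules out exhaustive search:

```python
# Count the joint assignments a brute-force search would face:
# n entity variables plus n*(n-1) relation variables is n**2
# variables in total, each taking one of l labels.
n, l = 5, 5                      # illustrative values only
num_variables = n + n * (n - 1)  # = n**2 = 25
print(l ** num_variables)        # 5**25 = 298023223876953125, ~3e17
```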