<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2401">
  <Title>A Linear Programming Formulation for Global Inference in Natural Language Tasks</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Relational Inference Problem
</SectionTitle>
    <Paragraph position="0"> We consider the relational inference problem within the reasoning with classifiers paradigm, and study a specific but fairly general instantiation of this problem, motivated by the problem of recognizing named entities (e.g., persons, locations, organization names) and relations between them (e.g., work for, located in, live in). We consider a set V which consists of two types of variables, V = E ∪ R. The first set of variables E = {E1, E2, …, En} ranges over the label set LE. The value (called "label") assigned to Ei ∈ E is denoted fEi ∈ LE. The second set of variables</Paragraph>
    <Paragraph position="2"> R = {Rij | 1 ≤ i, j ≤ n; i ≠ j} is viewed as (binary) relations over E. Specifically, for each pair of entities Ei and Ej, i ≠ j, we use Rij and Rji to denote the (binary) relations (Ei, Ej) and (Ej, Ei) respectively. The set of labels of relations is LR and the label assigned to relation Rij ∈ R is fRij ∈ LR.</Paragraph>
    <Paragraph position="3"> Clearly, there exist constraints on the labels of corresponding relation and entity variables. For instance, if the relation is live in, then the first entity should be a person and the second entity should be a location.</Paragraph>
    <Paragraph position="4"> The correspondence between the relation and entity variables can be represented by a bipartite graph. Each relation variable Rij is connected to its first entity Ei and its second entity Ej. We use N1 and N2 to denote the entity variables of a relation Rij. Specifically, Ei = N1(Rij) and Ej = N2(Rij).</Paragraph>
    <Paragraph position="5"> In addition, we define a set of constraints on the outcomes of the variables in V. C1 : LR × LE → {0,1} constrains the values of the first argument of a relation. C2 is defined similarly and constrains the second argument a relation can take. For example, (born in, person) is in C1 but not in C2, because the first entity of relation born in has to be a person, while the second entity can only be a location, not a person. Note that while we define the constraints here as Boolean, our formalism in fact allows for stochastic constraints. Also note that we can define a large number of constraints, such as</Paragraph>
    <Paragraph position="7"> CR : LR × LR → {0,1}, which constrains types of relations, etc. In fact, as will be clear in Sec. 3, the language for defining constraints is very rich: linear (in)equalities over V.</Paragraph>
    <Paragraph position="8"> We exemplify the framework using the problem of simultaneous recognition of named entities and relations in sentences. Briefly speaking, we assume a learning mechanism that can recognize entity phrases in sentences, based on local contextual features. Similarly, we assume a learning mechanism that can recognize the semantic relation between two given phrases in a sentence.</Paragraph>
    <Paragraph position="9"> We seek an inference algorithm that can produce a coherent labeling of entities and relations in a given sentence. The labeling should follow, as closely as possible, the recommendations of the entity and relation classifiers, but it should also satisfy natural constraints on whether specific entities can be the arguments of specific relations, whether two relations can occur together, or any other information that might be available at inference time (e.g., suppose it is known that entities A and B represent the same location; one may wish to incorporate an additional constraint that prevents an inference of the type: "C lives in A; C does not live in B"). We note that a large number of problems can be modeled this way. Examples include chunking sentences (Punyakanok and Roth, 2001), coreference resolution, and sequencing problems in computational biology. In fact, each of the components of our problem here, the separate tasks of recognizing named entities in sentences and of recognizing semantic relations between phrases, can be modeled this way. However, our goal is specifically to consider interacting problems at different levels, resulting in more complex constraints among them, and to exhibit the power of our method.</Paragraph>
    <Paragraph position="10"> The most direct way to formalize our inference problem is via the formalism of Markov Random Field (MRF) theory (Li, 2001). Rather than doing that, for computational reasons, we first use a fairly standard transformation of MRF to a discrete optimization problem (see (Kleinberg and Tardos, 1999) for details). Specifically, under weak assumptions we can view the inference problem as the following optimization problem, which aims to minimize the objective function that is the sum of the following two cost functions.</Paragraph>
    <Paragraph position="11"> Assignment cost: the cost of deviating from the assignment of the variables V given by the classifiers. The specific cost function we use is defined as follows: Let l be the label assigned to variable u ∈ V. If the marginal probability estimate is p = P(fu = l), then the assignment cost cu(l) is −log p.</Paragraph>
    <Paragraph position="12"> Constraint cost: the cost imposed by breaking constraints between neighboring nodes. The specific cost function we use is defined as follows: Consider two entity nodes Ei, Ej and their corresponding relation node Rij; that is, Ei = N1(Rij) and Ej = N2(Rij). The constraint cost indicates whether the labels are consistent with the constraints. In particular, we use: d1(fEi,fRij) is 0 if (fRij,fEi) ∈ C1; otherwise, d1(fEi,fRij) is ∞2. Similarly, we use d2 to force the consistency of the second argument of a relation.</Paragraph>
    <Paragraph position="13"> 2In practice, we use a very large number (e.g., 915).</Paragraph>
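    <Paragraph> Both cost functions are simple to compute directly. A minimal Python sketch, using hypothetical label names and a hypothetical constraint set C1 (none of these identifiers come from the paper):

```python
import math

# Hypothetical constraint set C1: the (relation label, first-entity label)
# pairs that are consistent. Invented for illustration.
C1 = {("live_in", "person"), ("born_in", "person"), ("located_in", "location")}
INF = float("inf")  # stands in for the "very large number" of footnote 2

def assignment_cost(p):
    """Cost of assigning label l to u when the classifier estimates P(fu = l) = p."""
    return -math.log(p)

def d1(f_e, f_r):
    """Constraint cost on the first argument: 0 if consistent with C1, else infinite."""
    return 0.0 if (f_r, f_e) in C1 else INF
```

In practice a large finite constant would replace INF, as footnote 2 notes, so that the objective stays numerically well-behaved.</Paragraph>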
    <Paragraph position="14"> Since we are seeking the most probable global assignment that satisfies the constraints, the overall cost function we optimize, for a global labeling f of all variables, is:</Paragraph>
    <Paragraph position="16"> C(f) = Σu∈V cu(fu) + ΣRij∈R [d1(fEi,fRij) + d2(fEj,fRij)]   (1)</Paragraph>
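    <Paragraph> On a tiny instance, the labeling minimizing this overall cost can be found by exhaustive enumeration. A self-contained sketch; the variable names, label sets, marginals, and constraint sets below are invented for illustration, not the paper's data:

```python
import math
from itertools import product

# Toy instance: two entities E1, E2 and one relation R12 with E1 = N1(R12)
# and E2 = N2(R12).
L_E = ["person", "location"]
L_R = ["live_in", "no_rel"]
P = {
    "E1":  {"person": 0.9, "location": 0.1},
    "E2":  {"person": 0.2, "location": 0.8},
    "R12": {"live_in": 0.7, "no_rel": 0.3},
}
C1 = {("live_in", "person"), ("no_rel", "person"), ("no_rel", "location")}
C2 = {("live_in", "location"), ("no_rel", "person"), ("no_rel", "location")}
INF = float("inf")

def cost(f):
    """Overall cost of a global labeling f: assignment costs plus d1 and d2."""
    c = sum(-math.log(P[u][f[u]]) for u in f)       # assignment cost
    c += 0.0 if (f["R12"], f["E1"]) in C1 else INF  # d1: first argument
    c += 0.0 if (f["R12"], f["E2"]) in C2 else INF  # d2: second argument
    return c

def best_labeling():
    """Exhaustive minimization of the overall cost (feasible only for toys)."""
    cands = [dict(zip(("E1", "E2", "R12"), fs)) for fs in product(L_E, L_E, L_R)]
    return min(cands, key=cost)
```

Enumeration is exponential in the number of variables, which is exactly why the next section develops an LP formulation instead.</Paragraph>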
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 A Computational Approach to
Relational Inference
</SectionTitle>
    <Paragraph position="0"> Unfortunately, it is not hard to see that the combinatorial problem (Eq. 1) is computationally intractable even when placing assumptions on the cost function (Kleinberg and Tardos, 1999). The computational approach we adopt is to develop a linear programming (LP) formulation of the problem, and then solve the corresponding integer linear programming (ILP) problem. Our LP formulation is based on the method proposed by (Chekuri et al., 2001).</Paragraph>
    <Paragraph position="1"> Since the objective function (Eq. 1) is not a linear function in terms of the labels, we introduce new binary variables to represent different possible assignments to each original variable; we then represent the objective function as a linear function of these binary variables.</Paragraph>
    <Paragraph position="2"> Let x{u,i} be a {0,1}-variable, defined to be 1 if and only if variable u is labeled i, where u ∈ E, i ∈ LE or u ∈ R, i ∈ LR. For example, x{E1,2} = 1 when the label of entity E1 is 2; x{R23,3} = 0 when the label of relation R23 is not 3. Let x{Rij,r,Ei,e1} be a {0,1}-variable indicating whether relation Rij is assigned label r and its first argument, Ei, is assigned label e1. For instance, x{R12,1,E1,2} = 1 means the label of relation R12 is 1 and the label of its first argument, E1, is 2. Similarly, x{Rij,r,Ej,e2} = 1 indicates that Rij is assigned label r and its second argument, Ej, is assigned label e2. With these definitions, the optimization problem can be represented as the following ILP problem (Figure 1).</Paragraph>
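    <Paragraph> The change of variables can be sketched for a single entity variable: once each candidate label gets its own {0,1} indicator, the assignment cost becomes a linear function of those indicators. The label set and marginals below are invented for illustration:

```python
import math

# One entity variable E1 with a hypothetical label set {0, 1, 2}.
L_E = [0, 1, 2]
P = {0: 0.2, 1: 0.5, 2: 0.3}           # hypothetical classifier marginals
c = {i: -math.log(P[i]) for i in L_E}  # assignment costs c_{E1}(i)

def indicators(label):
    """x_{E1,i} = 1 iff E1 is labeled i: one {0,1} variable per candidate label."""
    return {i: 1 if i == label else 0 for i in L_E}

def linear_objective(x):
    """The assignment cost, now a linear function of the binary variables."""
    return sum(c[i] * x[i] for i in L_E)
```

Constraint (2) below corresponds to requiring that the indicators for E1 sum to exactly 1, so that exactly one label is chosen.</Paragraph>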
    <Paragraph position="3"> Equations (2) and (3) require that each entity or relation variable be assigned exactly one label. Equations (4) and (5) ensure that the assignment to each entity or relation variable is consistent with the assignment to its neighboring variables. (6), (7), and (8) are the integrality constraints on these binary variables.</Paragraph>
    <Paragraph position="4"> There are several advantages to representing the problem in an LP formulation. First of all, linear (in)equalities are fairly general and are able to represent many types of constraints (e.g., the decision time constraint in the experiment in Sec. 4). More importantly, an ILP problem at this scale can be solved very quickly using current commercial LP/ILP packages, like (Xpress-MP, 2003) or (CPLEX, 2003). We now introduce the general strategies for solving an ILP problem.</Paragraph>
    <Paragraph position="6"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Linear Programming Relaxation (LPR)
</SectionTitle>
      <Paragraph position="0"> To solve an ILP problem, a natural idea is to relax the integrality constraints. That is, we replace (6), (7), and (8) with the requirement that each indicator variable take a real value in the interval [0,1].</Paragraph>
      <Paragraph position="2"> If LPR returns an integer solution, then it is also the optimal solution to the ILP problem. If the solution is non-integer, then at least it gives a lower bound on the value of the cost function, which can be used in modifying the problem and getting closer to deriving an optimal integer solution. A direct way to handle a non-integer solution is called rounding, which finds an integer point that is close to the non-integer solution. Under some conditions on the cost functions, which do not hold here, a well-designed rounding algorithm can be shown to produce a rounded solution that is a good approximation of the optimal solution (Kleinberg and Tardos, 1999; Chekuri et al., 2001). Nevertheless, in general, the outcome of the rounding procedure may not even be a legal solution to the problem.</Paragraph>
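      <Paragraph> A naive rounding step of this kind can be sketched as follows, with a made-up fractional LPR solution; rounding each variable independently this way may violate the consistency constraints (4) and (5), which is exactly the failure mode noted above:

```python
# Made-up fractional LPR solution for entity E1 and relation R12; values are
# the relaxed indicator variables, one per candidate label, summing to 1.
x_E1  = {"person": 0.6, "location": 0.4}
x_R12 = {"live_in": 0.7, "no_rel": 0.3}

def round_argmax(x):
    """Naive rounding: keep the label with the largest fractional value."""
    return max(x, key=x.get)
```

Here rounding yields E1 = person and R12 = live_in independently; nothing in the procedure itself guarantees that the resulting pair satisfies C1.</Paragraph>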
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Branch &amp; Bound and Cutting Plane
</SectionTitle>
      <Paragraph position="0"> Branch and bound is a method that divides an ILP problem into several LP subproblems. It uses LPR as a subroutine to generate dual (upper and lower) bounds that reduce the search space, and it finds the optimal solution as well. When LPR finds a non-integer solution, it splits the problem on a non-integer variable. For example, suppose variable xi is fractional in a non-integer solution to the ILP problem min{cx : x ∈ S, x ∈ {0,1}n}, where S is the feasible region defined by the linear constraints. The ILP problem can be split into two LPR subproblems, min{cx : x ∈ S ∩ {xi = 0}} and min{cx : x ∈ S ∩ {xi = 1}}. Since any feasible solution provides an upper bound and any LPR solution generates a lower bound, the search tree can be effectively pruned.</Paragraph>
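      <Paragraph> The branch-and-bound loop can be sketched for a generic 0/1 minimization with a single covering constraint; the bound used here (cost of the fixed prefix plus the most optimistic completion) is a dependency-free stand-in for the LPR lower bound described in the text:

```python
def branch_and_bound(c, a, b):
    """Minimize c.x over binary x subject to a.x >= b (a toy covering ILP)."""
    n = len(c)
    best = [None, float("inf")]  # incumbent solution and its cost (upper bound)

    def lower_bound(prefix):
        # Optimistic bound: cost of the fixed prefix, plus every remaining
        # variable set to whichever value is cheaper (a stand-in for LPR).
        fixed = sum(ci * xi for ci, xi in zip(c, prefix))
        return fixed + sum(min(0.0, ci) for ci in c[len(prefix):])

    def search(prefix):
        if lower_bound(prefix) >= best[1]:
            return  # prune: this branch cannot beat the incumbent
        if len(prefix) == n:
            if sum(ai * xi for ai, xi in zip(a, prefix)) >= b:  # feasible?
                best[0] = prefix
                best[1] = sum(ci * xi for ci, xi in zip(c, prefix))
            return
        for v in (0, 1):  # branch on the next variable: xi = 0 and xi = 1
            search(prefix + [v])

    search([])
    return best[0], best[1]
```

Each feasible leaf tightens the upper bound, and any partial assignment whose lower bound already exceeds the incumbent is pruned, mirroring the dual-bound pruning described above.</Paragraph>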
      <Paragraph position="1"> Another strategy for dealing with non-integer points, which is often combined with branch &amp; bound, is called cutting plane. When a non-integer solution is given by LPR, it adds a new linear constraint that makes the non-integer point infeasible, while still keeping the optimal integer solution in the feasible region. As a result, the feasible region moves closer to the ideal polyhedron, which is the convex hull of the feasible integer solutions. The most famous cutting plane algorithm is Gomory's fractional cutting plane method (Wolsey, 1998), for which it can be shown that only a finite number of additional constraints are needed.</Paragraph>
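      <Paragraph> The simplest valid cut of this family is the Chvátal–Gomory rounding step: with integer coefficients, rounding down the right-hand side of a constraint preserves every integer solution but can exclude a fractional LPR point. A toy sketch (the constraint and the fractional point are invented; this is not Gomory's full method):

```python
import math

def cg_cut(a, b):
    """Chvatal-Gomory rounding: if a has integer entries, every integer point
    satisfying a.x ≤ b also satisfies a.x ≤ floor(b)."""
    return a, math.floor(b)

# Original constraint x1 + x2 ≤ 1.5 admits the fractional point (0.75, 0.75);
# the cut x1 + x2 ≤ 1 excludes that point but keeps all integer solutions.
a_cut, b_cut = cg_cut([1, 1], 1.5)
```

Adding such cuts shaves fractional vertices off the relaxed polyhedron, moving it toward the convex hull of integer solutions as described above.</Paragraph>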
      <Paragraph position="2"> Moreover, researchers have developed different cutting plane algorithms for different types of ILP problems. One example is (Wang and Regan, 2000), which focuses only on binary ILP problems.</Paragraph>
      <Paragraph position="3"> Although, in theory, a search-based strategy may need many steps to find the optimal solution, LPR always generates integer solutions in our experiments. This phenomenon may be linked to the theory of unimodularity.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Unimodularity
</SectionTitle>
      <Paragraph position="0"> When the coefficient matrix of a given linear program in its standard form is unimodular, it can be shown that the optimal solution to the linear program is in fact integral (Schrijver, 1986). In other words, LPR is guaranteed to produce an integer solution.</Paragraph>
      <Paragraph position="1"> Definition 3.1 A matrix A of rank m is called unimodular if all the entries of A are integers, and the determinant of every square submatrix of A of order m is 0, +1, or −1.</Paragraph>
      <Paragraph position="2"> Theorem 3.1 (Veinott &amp; Dantzig) Let A be an (m,n) integral matrix with full row rank m. Then the polyhedron {x | x ≥ 0; Ax = b} is integral for every integral vector b, if and only if A is unimodular.</Paragraph>
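      <Paragraph> Definition 3.1 can be checked mechanically on small matrices by enumerating all order-m column submatrices and testing their determinants; a brute-force sketch, practical only for tiny A:

```python
from itertools import combinations

def det(M):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def is_unimodular(A):
    """Definition 3.1 for an integer m x n matrix A of full row rank m."""
    m = len(A)
    for cols in combinations(range(len(A[0])), m):
        sub = [[row[j] for j in cols] for row in A]
        if abs(det(sub)) > 1:
            return False
    return True
```

For instance, [[1,0,1],[0,1,1]] passes the check, while [[1,0,1],[0,1,2]] fails on the order-2 submatrix with determinant 2.</Paragraph>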
      <Paragraph position="3"> Theorem 3.1 indicates that if a linear programming problem is in its standard form, then, regardless of the cost function and the integral vector b, the optimal solution is integral if and only if the coefficient matrix A is unimodular.</Paragraph>
      <Paragraph position="4"> Although the coefficient matrix in our problem is not unimodular, LPR still produced integer solutions in all of the (thousands of) cases we experimented with. This may be due to the fact that the coefficient matrix shares many properties of a unimodular matrix, so that most of the vertices of the polyhedron are integer points. Another possible reason is that, given the cost function we have, the optimal solution is always integral. Because of the availability of very efficient LP/ILP packages, we defer the exploration of this direction for now.</Paragraph>
    </Section>
  </Section>
</Paper>