LP: Duality

A PDF of Eric’s handwritten notes is available here.

LP General Formulation

Recall the standard form for linear programs (LP’s).  The LP is specified by the following input parameters:

  • Variables: x = (x_1, x_2, \dots, x_n)^T
    (The transpose means that this is a column vector.)
  • Objective function: c=(c_1, c_2, \dots, c_n)^T
  • Constraints: m\times n matrix A and b = (b_1, b_2, \dots, b_m)^T

Standard form for LP:

max c^Tx
subject to:

Ax \leq b
x \geq 0

An alternative form is known as slack form for LP.  Here we use equality (instead of inequalities \leq) for the constraints.

Slack form for LP:

max c^Tx
subject to:

Ax = b
x \geq 0

The two forms are equivalent, and it is straightforward to convert between them.

Slack to Standard:  Let’s first see how to convert a slack constraint a_1x_1 + \dots + a_nx_n = b into standard form.  Note the constraint is equivalent to the pair of constraints: a_1x_1 + \dots + a_nx_n \leq b and a_1x_1 + \dots + a_nx_n \geq b.  And the latter constraint can be converted into standard form by multiplying both sides by -1 to yield: -a_1x_1 - \dots - a_nx_n \leq -b.

Standard to Slack: To convert from standard form to slack form, consider a standard constraint a_1x_1 + \dots + a_nx_n \leq b.  Create a new variable s (we avoid the letter c since that denotes the objective vector).  We will use s to denote the slack between the left and right hand sides.  Thus if s = b - a_1x_1 - \dots - a_nx_n then we want that s\geq 0.  Therefore we replace the original constraint by this pair:  a_1x_1 + \dots + a_nx_n + s = b and s\geq 0.
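The standard-to-slack direction can be sketched in a few lines of numpy: each row of A gains its own slack variable, which amounts to appending an identity block to the constraint matrix.  (The function name and the small example are our own illustration.)

```python
import numpy as np

def standard_to_slack(A):
    """Augment an m x n constraint matrix A (for Ax <= b) with an identity
    block, one slack variable per row, so that [A | I](x, s) = b with
    s >= 0 is the equivalent slack form."""
    A = np.asarray(A, dtype=float)
    m, _ = A.shape
    return np.hstack([A, np.eye(m)])

# Example: x_1 + 2x_2 <= 4 and 3x_1 + x_2 <= 6 become
# x_1 + 2x_2 + s_1 = 4 and 3x_1 + x_2 + s_2 = 6.
A_slack = standard_to_slack([[1, 2], [3, 1]])
print(A_slack)   # 2 x 4 matrix [A | I]
```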

LP Geometry

Recall, the feasible region for a LP is the set of points which satisfy all of the constraints (regardless of their objective function value).  Formally, the feasible region is the set:

F =  { x: Ax\leq b, x\geq 0 }.

Feasible region: Each equation a_1x_1 + \dots + a_nx_n = b defines a hyperplane in n dimensions (the generalization of a plane to higher dimensions).  Hence each constraint a_1x_1 + \dots + a_nx_n \leq b defines a half-space in n dimensions, i.e., all of the points on one side of the hyperplane.  Therefore the feasible region is the intersection of these n+m half-spaces (the m constraints plus the n non-negativity constraints).  It is a convex polyhedron in n dimensions.

Vertices: The vertices of the feasible region are the “corners” of this polyhedron; they are the points in the feasible region which satisfy at least n of the constraints with equality (they lie on these n hyperplanes).  Note that there can be a huge number of vertices, namely {{n+m}\choose{n}}.  We refer to a pair of vertices as neighbors if they share n-1 constraints that they satisfy with equality.  Hence from a vertex v we choose one constraint to remove and one to add in (and then check if this is a valid point).  Therefore, a vertex of the feasible region has O(n(n+m)) neighbors.
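To make this counting concrete, here is a small sketch (numpy; the tiny 2-variable region is our own illustration) that enumerates all {{n+m}\choose{n}} candidate vertices by intersecting each choice of n constraints and keeping only the feasible intersection points:

```python
import itertools
import numpy as np

# All constraints written as Gx <= h: the rows of Ax <= b together with
# -x <= 0, for the toy region  x_1 + x_2 <= 4,  x_1 <= 3,  x_1, x_2 >= 0.
G = np.array([[1.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
h = np.array([4.0, 3.0, 0.0, 0.0])
n = 2

vertices = set()
for rows in itertools.combinations(range(len(G)), n):   # n constraints held tight
    Gs, hs = G[list(rows)], h[list(rows)]
    if abs(np.linalg.det(Gs)) < 1e-9:
        continue                                  # hyperplanes don't meet in a point
    x = np.linalg.solve(Gs, hs)
    if np.all(G @ x <= h + 1e-9):                 # keep it only if ALL constraints hold
        vertices.add(tuple(round(float(v), 9) + 0.0 for v in x))

print(sorted(vertices))    # the four corners of the region
```

Of the {4 \choose 2} = 6 candidate intersections, one pair of constraints is parallel and one intersection point is infeasible, leaving the four true vertices.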

Convexity: Since the feasible region is convex, if we are at a vertex v and all of its neighbors have smaller objective value, then the entire feasible region must be below v (where below is with respect to the objective function); otherwise the objective would have to go down at the neighbors and then come back up, which would contradict convexity.  Therefore a vertex which is locally optimal is also globally optimal.  Now there may be optimal points which are not vertices, but in this case there is still a vertex which is also optimal (has the same objective function value).  For example, in 2 dimensions our feasible region is a convex polygon; a point on the edge of the polygon could be optimal, but in this case the entire edge is optimal and thus both endpoints (which are vertices) are also optimal.

Feasible region empty?  Are there any points satisfying all of the LP constraints? In other words, is the feasible region empty or not?  How do we check that?  And if it is non-empty, how do we find a feasible point to start our algorithm from?  If our LP is in standard form we can make a new LP which always has a trivial feasible point.  Then by finding an optimal point for this new LP we can determine if the original LP has a non-empty feasible region.

Consider a LP in standard form with n variables x = (x_1, x_2, \dots, x_n)^T:

max c^Tx
subject to:

Ax \leq b
x \geq 0

Feasibility LP: For the new LP, which we’ll call the feasibility LP, we add one new variable z.  Then we replace the constraint Ax\leq b with Ax + z \leq b.  Note z is a single variable, not a vector, so every constraint adds in the same variable z.  The key is that all of these constraints are clearly satisfied for x=0 and z sufficiently negative.  In fact, we don’t need to take z=-\infty; it suffices to take z=b_{min} where b_{min} = \min_i b_i is the smallest entry in the vector b.  Thus, here is the feasibility LP:

max z
subject to:

Ax + z \leq b
x \geq 0

A few notes on the feasibility LP:
–First off we’ve changed the objective function.  Why?  Because we are no longer interested in finding the optimal solution to the original LP, just a feasible point regardless of its objective function value.  We know that x=0 with small enough z is a feasible point for the new LP.  If we find a feasible point for the new LP where z\geq 0 then this x is also feasible for the original LP because
Ax \leq Ax + z \leq b when z\geq 0.  So our goal is simply to determine if there is a feasible point for the new LP where z\geq 0.  Hence we can run the simplex algorithm (which we’ll describe momentarily) starting from x=0, z=b_{min} and then stop when we find a feasible point with z\geq 0 or when we find that the optimal value has z<0 (and hence the original LP has an empty feasible set).
–Note there is no non-negativity constraint on z.  This is still a valid LP, and it can be converted into standard form (e.g., by writing z as the difference of two non-negative variables).  We kept it in this simpler form to simplify the presentation.

Simplex algorithm: The simplex algorithm walks on vertices of the feasible region.  It starts at a vertex, and then chooses a neighbor which has higher objective value; if there are multiple such neighbors that are better than the current vertex then which one we choose depends on the variant of the simplex algorithm (random, greedy, etc.).  If we end at a vertex that is better than all of its neighbors then we know that it is an optimal point.    In the worst case the simplex algorithm takes exponential time.  However it does well on many instances, and is widely used for many huge LP’s.

One note: how do we find a starting feasible point for the simplex algorithm?  We simply run the simplex algorithm on our feasibility LP.  That LP has a trivial starting point and then we will either find a feasible point for our original LP or we’ll determine that the original LP is infeasible.
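This feasibility check is easy to sketch with an off-the-shelf LP solver standing in for the simplex algorithm.  Below we use scipy.optimize.linprog (which minimizes, so we minimize -z); the helper name is our own, and the infeasible example is the pair of constraints x_1+x_2 \geq 2 together with x_1+x_2 \leq 1/2.

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A, b):
    """Decide whether {x : Ax <= b, x >= 0} is non-empty by solving the
    feasibility LP  max z  s.t.  Ax + z <= b, x >= 0  (z unconstrained)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    obj = np.zeros(n + 1)
    obj[-1] = -1.0                                # linprog minimizes, so minimize -z
    A_ub = np.hstack([A, np.ones((m, 1))])        # append the same z to every row
    bounds = [(0, None)] * n + [(None, None)]     # x >= 0, z free
    res = linprog(obj, A_ub=A_ub, b_ub=np.asarray(b, dtype=float), bounds=bounds)
    return res.status == 0 and float(-res.fun) >= 0   # optimal z >= 0 iff feasible

print(is_feasible([[1, 1]], [2]))                    # True: e.g. x = (0, 0) works
print(is_feasible([[-1, -1], [1, 1]], [-2, 0.5]))    # False: x1+x2 >= 2 and <= 1/2
```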

Poly-time LP algorithms: There are algorithms that are guaranteed to run in polynomial time for all LP’s; these are based on the ellipsoid method or interior point methods.  In practice, algorithms based on the simplex method are widely used, and algorithms based on interior point methods are also popular.

Types of LP’s:  In the examples we’ve considered so far there is a point in the feasible region which is optimal, and hence there is a vertex of the feasible region which is also optimal.   This might not be the case in some situations.  More precisely, the optimal point is at a vertex of the feasible region, except:

  • Infeasible LP: the feasible region is empty so there are no valid points.  Here is a simple example:
    • max x_1 + 5x_2, such that
    • x_1+x_2 \geq 2
    • x_1+x_2 \leq 1/2.

Note, whether or not a LP is infeasible just depends on the constraints, the objective function doesn’t matter (so, for a given set of constraints, it is true for all objective functions or none).

  • Unbounded LP: the optimal value of the objective function is unbounded.  So the feasible region is non-empty but in the direction of the objective function the feasible region is unbounded so we can achieve arbitrarily large objective function. Here is a simple example:
    • max x_1 + 5x_2, such that
    • x_1-x_2 \leq 1
    • x_1+x_2 \geq 3

Note, the feasible region can be an unbounded set, but for some objective functions the LP is bounded and other objective functions it may be unbounded.

When these two exceptions do not occur then we refer to the LP as a feasible LP: the feasible region is non-empty and the objective function is bounded.

LP Duality

Now we are ready to explore LP duality.  Let’s recall one of our examples from last class.  Here is the example:

\max x_1+6x_2+10x_3
subject to:

x_1 \leq 300  (1)  (this is simply a labeling of this constraint)
x_2 \leq 200 (2)
x_1 + 3x_2 + 2x_3 \leq 1000  (3)
x_2 + 3x_3 \leq 500  (4)
x_1,x_2,x_3 \geq 0   (5)

Last class we ran the simplex algorithm on this LP and found that the point (200,200,100) was an optimal point with objective value 2400.  Now suppose we are simply given this point and we want to verify that it’s optimal.  How can we do that?  Can we prove that no feasible point has an objective value >2400?  Any feasible point satisfies each of the constraints, so it also satisfies any non-negative linear combination of the constraints.

Example of dual vector: Let’s take a linear combination of the constraints defined by the vector y=(y_1,y_2,y_3,y_4).  There are 4 non-trivial constraints (we’re excluding the non-negativity constraints) so let’s look at y times these 4 constraints.  We’ll require every coordinate y_i to be \geq 0, so when we take the linear combination we don’t need to flip any signs: the result is still a \leq constraint.  Taking a linear combination of the 4 constraints yields:

y_1\times (1) + y_2 \times (2) + y_3 \times (3) + y_4 \times (4)

Consider the vector y = (y_1, y_2, y_3, y_4) = (0, \frac{1}{3}, 1, \frac{8}{3}).  For this choice of y we have:

0\times x_1 + \frac{1}{3}x_2 + (x_1 + 3x_2 + 2x_3) + \frac{8}{3}(x_2 + 3x_3) \leq 0\times 300 + \frac{1}{3}\times 200 + 1000 + \frac{8}{3}\times 500.

Let’s collect terms and this is equivalent to:

x_1 + 6x_2 + 10x_3 \leq 2400.

Note the left-hand-side is the objective function of our original LP.  Thus we’ve shown that we can never achieve a value bigger than 2400.  In other words, the profit is \leq 2400.  Hence the point (200,200,100) is optimal since it achieves the maximum possible profit of 2400.
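This certificate argument is easy to check numerically.  Here is a sketch (numpy only) that verifies the claimed point is feasible, evaluates its objective value, and confirms that the combination y reproduces the objective coefficients and the 2400 bound:

```python
import numpy as np

# Data of the LP above.
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 3, 2],
              [0, 1, 3]], dtype=float)
b = np.array([300, 200, 1000, 500], dtype=float)
c = np.array([1, 6, 10], dtype=float)

x = np.array([200, 200, 100], dtype=float)      # claimed optimal point
y = np.array([0, 1/3, 1, 8/3])                  # multipliers for the 4 constraints

assert np.all(A @ x <= b + 1e-9) and np.all(x >= 0)   # x is feasible
print(c @ x)        # 2400.0, the objective value at x
print(y @ A)        # ≈ [1, 6, 10]: the combination reproduces the objective
print(y @ b)        # ≈ 2400: so no feasible point can do better than x
```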

Finding dual vector: How did we find this y?  For this LP with 4 constraints we had:

y_1\times (1) + y_2 \times (2) + y_3 \times (3) + y_4 \times (4)

For this LP this is equivalent to:

x_1y_1 + x_2y_2 + x_1y_3 + 3x_2y_3 + 2x_3y_3 + x_2y_4 + 3x_3y_4 \leq 300y_1 + 200y_2 + 1000y_3 + 500y_4.

Collecting terms we have:

x_1(y_1+y_3) + x_2(y_2 + 3y_3 + y_4) + x_3(2y_3 + 3y_4) \leq 300y_1 + 200y_2 + 1000y_3 + 500y_4.

We want to obtain the objective function on the left-hand-side, or something bigger.  Thus we want that all of the following hold:

  • y_1 + y_3 \geq 1
  • y_2 + 3y_3 + y_4 \geq 6
  • 2y_3 + 3y_4 \geq 10
  • y_1, y_2, y_3, y_4 \geq 0

In addition we are trying to find the best upper bound on the objective function.  The best upper bound means we want the smallest possible right-hand-side.  Thus we want to minimize the right-hand-side, or in other words we want to minimize 300y_1 + 200y_2 + 1000y_3 + 500y_4.  Thus we have the dual LP:

\min 300y_1+200y_2+1000y_3 + 500y_4

subject to:

y_1 + y_3 \geq 1
y_2 + 3y_3 + y_4 \geq 6
2y_3 + 3y_4 \geq 10
y_1,y_2,y_3,y_4 \geq 0

We refer to the original LP as the primal LP, and this new LP we just defined as its dual LP.

Primal LP \rightarrow Dual LP: In general, how do we convert a primal LP into its dual LP?  Assume the primal LP is in standard form.  First off, the dual LP has a variable for each constraint in the primal LP (except for the non-negativity constraints).  Thus y is a vector of size m.  And there is a constraint in the dual LP for each variable in the primal LP.  Thus the dual LP has n constraints.  So if the primal LP has n variables and m constraints (plus the n non-negativity constraints), then the dual LP has m variables and n constraints (plus the m non-negativity constraints).

In our example, we are trying to find a linear combination of the constraints that minimizes the right-hand-side (yielding the smallest possible upper bound on the profit).  Thus, the dual objective function is defined by the right-hand-side of the constraints in the primal LP.  More specifically, the objective function is min \ b^Ty.

The constraints of the dual LP specify that the left-hand-side of the linear combination of the constraints is at least the primal’s objective function.  Thus, the right-hand-side of the dual constraints are defined by the primal’s objective function.  And the left-hand-side of the dual constraints are defined by the columns of the primal’s constraint matrix A.  Thus, the dual LP has constraints A^Ty \geq c.

General form of dual LP: Putting it all together for a primal LP with variables x = (x_1, x_2, \dots, x_n)^T and m\times n constraint matrix A in standard form:

max c^Tx
subject to:

Ax \leq b
x \geq 0

Its dual LP has variables y = (y_1, y_2, \dots, y_m)^T and is the following:

min b^Ty
subject to:

A^Ty \geq c
y \geq 0
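In code, forming the dual is just a transpose and a swap of the roles of b and c.  The sketch below (the helper name is our own) builds the dual data for the running example, and also checks that dualizing twice — after negating to rewrite the min as a standard-form max — recovers the primal data:

```python
import numpy as np

def dual_lp(c, A, b):
    """Given a primal in standard form  max c^T x  s.t.  Ax <= b, x >= 0,
    return the data (objective, matrix, rhs) of its dual:
    min b^T y  s.t.  A^T y >= c,  y >= 0."""
    A = np.asarray(A, dtype=float)
    return np.asarray(b, dtype=float), A.T, np.asarray(c, dtype=float)

# The running example from this lecture.
c = np.array([1.0, 6.0, 10.0])
A = np.array([[1, 0, 0], [0, 1, 0], [1, 3, 2], [0, 1, 3]], dtype=float)
b = np.array([300.0, 200.0, 1000.0, 500.0])

obj_d, M_d, rhs_d = dual_lp(c, A, b)        # dual: min b^T y s.t. A^T y >= c

# Dual of the dual: rewrite the dual in standard form as
# max (-b)^T y  s.t.  (-A^T) y <= -c, then dualize again.
obj_dd, M_dd, rhs_dd = dual_lp(-obj_d, -M_d, -rhs_d)

# Negating the result recovers the primal data (c, A, b).
print(np.array_equal(-obj_dd, c), np.array_equal(-M_dd, A), np.array_equal(-rhs_dd, b))
```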

Dual(Dual) = Primal:  As a sanity check, when we take the dual of the dual we should end up back with the primal LP.  Let’s check that this is the case.  Start with a primal LP in standard form: max c^Tx subject to Ax \leq b, x\geq 0.  Its dual LP is: min b^Ty subject to A^Ty \geq c, y\geq 0.  To take the dual of this dual LP, first convert it into standard form: max (-b)^Ty subject to -A^Ty \leq -c, y\geq 0.  Taking the dual of this standard-form LP yields: min (-c)^Tz subject to -Az \geq -b, z\geq 0.  Converting this LP, which is the dual of the dual, back into standard form gives: max c^Tz subject to Az \leq b, z\geq 0, which is exactly the original primal LP.

Weak Duality: Our construction of the dual LP was designed so that it gave an upper bound on the value of the primal LP’s objective function.  More precisely, any feasible point y for the dual LP gives an upper bound on the value of the primal LP’s objective function.  Thus, for any feasible point x for the primal LP its value of the objective function is at most that of the dual’s objective function at point y.  Therefore we have the following theorem:

Weak Duality Theorem: For a feasible point x of the primal LP and a feasible point y of the dual LP, it holds that:

c^Tx \leq b^Ty

At this point it should be quite intuitive that the theorem holds.  For completeness let’s formally prove it since the proof is fairly elementary.

Proof of the Weak Duality Theorem:

Since y is feasible for the dual LP we know that A^Ty \geq c.  Take the transpose of both sides and we have: y^TA \geq c^T.  Now multiply both sides on the right by x; since x \geq 0, the direction of the inequality is preserved, and we have: c^Tx \leq y^TAx.  To summarize we did the following chain:

 A^Ty \geq c \quad \Rightarrow \quad y^TA \geq c^T \quad \Rightarrow \quad c^Tx \leq y^TAx

Note the right-hand-side y^TAx is simply a number.  Thus taking the transpose will yield the same number, therefore:

y^TAx = (y^TAx)^T = x^TA^Ty.

So we have shown that:  c^Tx \leq x^TA^Ty.

Now consider x.  Since it is feasible for the primal LP we have that Ax \leq b.  Again taking the transpose and then multiplying both sides on the right by y (valid since y \geq 0) we get the following:

Ax \leq b \quad \Rightarrow \quad x^TA^T \leq b^T \quad \Rightarrow \quad x^TA^Ty \leq b^Ty

Combining these two results we have shown that:

c^Tx \leq x^TA^Ty \leq b^Ty.

Ignoring the middle term we have the desired result, which completes the proof of the weak duality theorem.

Consequences of Weak Duality:   There are a few important implications of the weak duality theorem.  First off, the theorem just shows that the primal’s optimal is at most the dual’s optimal.  If we find a pair of feasible points x and y where we achieve equality (as we did in the beginning example from this lecture) then we’ve proven that they are both optimal:

Corollary 1:  Given a primal feasible x^* and dual feasible y^* where c^Tx^* = b^Ty^*, then x^* and y^* are optimal for the primal and dual, respectively.

Suppose the optimal value of the primal LP is unbounded.  Any feasible point y of the dual LP would give a finite upper bound on the primal’s objective value, which is impossible.  So the only option is that the dual is infeasible (there are no feasible points):

Corollary 2: If the primal LP is unbounded then the dual LP is infeasible.
Similarly, if the dual LP is unbounded then the primal LP is infeasible.
(Note, the reverse implications do not hold.  For example, if the dual LP is infeasible then the primal LP can be unbounded and it can also be infeasible.)

Strong Duality:  Corollary 1 says that if there is a primal feasible x and dual feasible y with equality then they are both optimal.  In our earlier example we found such a pair: x=(200,200,100) and y= (0, \frac{1}{3}, 1, \frac{8}{3}).  Does there always exist such a pair achieving equality?  Yes!  First off we need that both the primal and dual LP’s are feasible (i.e., the feasible regions are non-empty and the optimal values are bounded).  And then we have that the optimal points have equality.

Strong Duality Theorem: 

  1. The primal LP is feasible with bounded optima iff the dual LP is feasible with bounded optima.
    (By feasible with bounded optima we mean that the LP has a non-empty feasible region and the optimal solution has bounded value.)
  2. Moreover, for a feasible primal LP with optimal solution x^* and its dual LP with optimal solution y^*, the objective functions match:

c^Tx^* = b^Ty^*.
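We can observe strong duality numerically on the running example by solving both LP’s with scipy.optimize.linprog (a sketch; linprog minimizes, so the primal max is encoded as min -c^Tx, and A^Ty \geq c is rewritten as -A^Ty \leq -c):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1, 0, 0], [0, 1, 0], [1, 3, 2], [0, 1, 3]], dtype=float)
b = np.array([300, 200, 1000, 500], dtype=float)
c = np.array([1, 6, 10], dtype=float)

# Primal: max c^T x  s.t.  Ax <= b, x >= 0  (linprog minimizes, so use -c).
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 3)

# Dual: min b^T y  s.t.  A^T y >= c, y >= 0, rewritten as -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 4)

print(-primal.fun, dual.fun)   # both ≈ 2400: the optimal values match
```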

Max-flow = Min-cut via LP Duality: Let us now review how to express max-flow as a LP.  Then we’ll see the dual LP corresponds to the capacity of the min-st-cut.  Hence we’ll get an alternative proof of the max-flow = min-cut theorem.

Max-flow as LP:  The straightforward way to express max-flow as a LP is to have a variable f_e for each edge e\in E.  We then have constraints along edges for the capacity constraints and constraints along vertices (except s and t) for the conservation of flow constraints.  Here is the LP:

\max \sum_{(s,v) \in E} f_{sv} s.t.

0 \leq f_e \leq c_e for all e \in E
\sum_{(w,v) \in E}f_{wv} = \sum_{(v,z) \in E}f_{vz} for all v \in V-\{s,t\}

Alternative Max-flow LP: An equivalent LP makes a variable x_p for each path p from s to t.  Hence let P be the set of all paths from s to t.  Note that typically |P| is exponentially large.  So the resulting LP may be huge (too large to write down efficiently).  But we are simply using that it is equivalent to max-flow and we will consider its dual.

In this new LP the original flow conservation constraints are no longer needed since we will specify an amount of flow x_p that is sent along the entire path from s to t, and none of it is lost along the way.  We still need constraints for the capacity constraints, and to do this we need to sum over all paths through an edge e.  Here is this new LP:

max \sum_{p \in P}x_p s.t.

\sum_{p \in P: e \in p} x_p \leq c_e for all e \in E
x_p \geq 0 for all p \in P

Example Max-flow LP: Here is an example flow network: the vertices are s, a, b, t, and the edges (with capacities) are s\rightarrow a (capacity 3), a\rightarrow b (capacity 1), a\rightarrow t (capacity 1), s\rightarrow b (capacity 2), and b\rightarrow t (capacity 3).
The variables in the corresponding LP will be  P = \{p_a, p_{ab}, p_{b}\} for the 3 paths from s to t:
p_a = s \rightarrow a \rightarrow t
p_{ab} = s \rightarrow a \rightarrow b \rightarrow t
p_b = s \rightarrow b \rightarrow t

The max-flow LP (primal LP) is the following:

\max p_a+p_{ab}+p_b s.t.

p_a + p_{ab} \leq 3
p_{ab} \leq 1
p_a \leq 1
p_b \leq 2
p_{ab} + p_{b} \leq 3
p_a,p_{ab},p_b \geq 0

The dual LP is the following:

\min 3y_{sa}+y_{ab}+y_{at}+2y_{sb}+3y_{bt} s.t.

y_{sa} + y_{at} \geq 1 (edges on p_a)
y_{sa} + y_{ab} + y_{bt} \geq 1 (edges on p_{ab})
y_{sb} + y_{bt} \geq 1 (edges on p_b)
y_{sa},y_{ab},y_{at},y_{sb},y_{bt} \geq 0

One should think of the variable y_{vw} as indicating whether this edge is counted in a st-cut (L,R) where v \in L, w \in R.
The dual LP finds the capacity of the min st-cut.

Dual of Max-flow LP: Here is the general form of the dual of the max-flow LP.  We are taking the dual of the LP with a variable x_p for each path p.  In the dual there is a variable y_e for each edge e \in E.

The dual LP is the following:

\min \sum_{e \in E}c_ey_e s.t.:

\sum_{e \in p} y_e \geq 1 for all p \in P
(every path must cross the cut at least once, and each crossing edge has y_e = 1)

y_e \geq 0 for all e \in E

The dual LP is equivalent to the min-st-cut problem.

Lemma 1: For a st-cut (L,R), there is a feasible y where c^Ty=capacity(L,R).

Hence, the dual LP optimal value is at most the minimum capacity of a st-cut.  Since by strong duality the primal and dual LP’s have matching optimal values, we have the following:

\mbox{size(max-flow)} = \mbox{primal LP optimal} = \mbox{dual LP optimal} \leq  \mbox{capacity(min st-cut)}.

Proof of Lemma 1: Fix a st-cut (L,R).
Set y_e = 1 if e crosses from L to R, and 0 otherwise.
This y is feasible since every p \in P crosses the cut at least once.
And it has value: \sum_{e \in E} c_ey_e = capacity(L, R), which completes the proof of the lemma.
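The construction in this proof can be sketched in code for the example network above (edge capacities and path lists are the ones we assumed for that example; single-character vertex names let us read an edge’s endpoints off its label):

```python
# Edges of the example network with capacities, and the three s-t paths.
edges = {"sa": 3, "ab": 1, "at": 1, "sb": 2, "bt": 3}
paths = {"p_a": ["sa", "at"], "p_ab": ["sa", "ab", "bt"], "p_b": ["sb", "bt"]}

def cut_to_y(L):
    """Given the s-side L of an st-cut, set y_e = 1 on edges crossing
    from L to the t-side and 0 elsewhere (the construction above).
    e[0] and e[1] are the edge's endpoints, read off the label."""
    return {e: 1 if e[0] in L and e[1] not in L else 0 for e in edges}

y = cut_to_y({"s"})                          # the cut L = {s}, R = {a, b, t}
# y is dual-feasible: every path crosses the cut at least once.
assert all(sum(y[e] for e in p) >= 1 for p in paths.values())
print(sum(edges[e] * y[e] for e in edges))   # its value = c_sa + c_sb = 5
```

This particular cut gives value 5, an upper bound on the max-flow but not the minimum cut (which, as the LP solution shows, has capacity 4).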

To prove that the size of the max-flow equals the min capacity of a st-cut we also need to show that the optimal value of the dual is at least the min capacity of a st-cut.

Lemma 2:
For an optimal y of the dual LP, there is a st-cut (L,R) where capacity(L,R) \leq c^Ty.

Hence, Dual LP optimal \geq capacity(min st-cut).  Therefore we have that: |max-flow| = |min st-cut|.

Lemma 2 is harder to prove so we omit the proof from this overview.