NP: Knapsack

PDF of Eric’s handwritten notes are here.

Recall that our DP (dynamic programming) algorithm for Knapsack takes O(nB) time, where n was the number of items and B was the capacity of our bag. This takes exponential time in the size of the input. We will now show that Knapsack (search version) is NP-complete.

The key will be to show that the following problem, known as the Subset Sum problem, is NP-complete.  The knapsack problem is a generalization of Subset Sum so it’ll follow as an easy corollary that knapsack-search is NP-complete.

Subset-sum (SS)
Input: n numbers a_1,...,a_n and a parameter t
Output: A subset S of objects \{1,...,n\} where \sum_{i\in S} a_i = t and NO if no such S exists.

We assume that each a_i is at most t in magnitude. Then the input size is O(n \log t) because each of the n items takes \log t bits.

Exercise: Find a DP algorithm for this problem that takes O(nt) time.

Lemma: Subset-sum is NP-complete

a) First show that subset-sum is in NP:
Given input \{a_1,...,a_n, t\} and a solution S, it takes O(n\log t) time to compute \sum_{i\in S} a_i and check that this equals t.

b) Next, we reduce 3SAT to SS.
Take a 3SAT input f with variables x_1,...x_n and clauses c_1,...,c_m.

Our input for SS will 2n + 2m numbers and some t:
We make 2n numbers for the literals: v_1,v_1',v_2,v_2',...,v_n,v_n'
We make 2m numbers for the clauses: s_1,s_1',...,s_m,s_m'

We will write each of our numbers and t as an n + m digit number in base 10. Note that the magnitude of each number could be as large as 10^{n+m}.

We want that v_i \in S \iff x_1 = T (so v_i \in S \iff x_1 = F). Thus, we want S to include exactly one of \{v_i,v_i'\}.

For each variable x_i, we will set digit i of v_i and v_i' to 1.
For clause C_j, if x_i \in C_j, set digit n+j of v_i to 1, else if \bar{x_i} \in C_j, set digit n+j of v_i' to 1.

We set digits 1 to n of t to 1. We set digits n+1 to m of t to 3.

Now we set our S_i, S_i' variables as “slacks” to complete the reduction. For each clause C_j, set digit n + j of slack variables S_j,S_j' to 1.

Since the first n digits of t are 1, then for each i, we can only choose at most one of v_i, v_i' (because if we choose both, our numbers will sum to 2 in that column instead of 1).

Since the last m digits of t are 3, then for each clause C_j, we will need to choose at least one number corresponding to a literal in C_j. For instance, if C_j is x_2 \vee \bar{x_4} \vee x_7, then the only numbers that are non-zero in the (n+j)^{th} digit are v_2,v_4',v_7,s_j, and s_j'. To get this digit to sum to 3, we need at least one of v_2, v_4', v_7 to be included in S.

Suppose that we have the following input to 3SAT:

f = (\bar{x_1} \vee \bar{x_2} \vee \bar{x_3} ) \wedge  (\bar{x_1} \vee \bar{x_2} \vee x_3 ) \wedge  (x_1 \vee \bar{x_2} \vee x_3) \wedge (x_1 \vee x_2)

Then this table shows how we create the numbers for our input to SS.


Lemma: SS has a solution S \iff 3SAT f has a satisfying assignment

(\Rightarrow) Take solution S for SS. We know that S includes v_i or v_i' (but not both). If v_i \in S, then set x_i = T. If v_i' \in S then set x_i = F. Thus, we have an assignment.

If we look at digit n+j, there will be at least 1 literal in C_j being satisfied by construction.

(\Leftarrow) Take a satisfying assignment for f. If x_i = T, then add v_i to S, else if x_i = F, add v_i' to S. Then digit i of our numbers will sum to 1, as desired.

Look at digit n+j. The assignment must satisfy at least one literal of C_j. If exactly one such literal is satisfied, then add s_j and s_j' to S. If exactly two literals in C_j are satisfied, then add s_j to S. If exactly three literals in C_j are satisfied, then do not add s_j or s_j' to S.

Next, we show that Knapsack-search is NP-complete.


Inputn objects with integer weights w_1,...,w_n and integer values v_1,...,v_n, total capacity B, and goal V.

Output: Subset S of \{1,..,n\} where \sum_{i\in S} w_i \le B and \sum_{i\in S} v_i \ge V.

Lemma: Knapsack-search is NP-complete:

a) Knapsack-search is in NP. We showed this in a previous class.

b) SS \rightarrow Knapsack-search. Take your input to SS: a_1,...,a_n, t. We define the following input to Knapsack-search. For all i, we set w_i = v_i = a_i, and we set $B = V = t$.

A solution S to Knapsack-search must obey:
\sum_{i\in S} w_i \le B \iff  \sum_{i\in S} a_i \le t
\sum_{i\in S} v_i \ge V \iff \sum_{i\in S} a_i \ge t

Thus, \sum_{i\in S} a_i = t, so S is a solution to Knapsack-search iff S is a solution to the SS instance.