CS141 BB: HardnessOfApproximation

Motivation

If you know that the set cover problem is NP-hard, and you know there's a ln(n)-approximation algorithm for set cover, you might ask whether there's, say, a constant-factor approximation algorithm for set cover. The answer is no, unless P=NP. In fact, there is no o(log n)-approximation algorithm, unless P=NP.

Hardness of approximation results refine traditional NP-hardness results. Typically, an NP-hard problem falls into one of the following categories (in order of increasing intractability):

it has a PTAS (see KnapsackByCoarsening),
the best approximation factor possible is a constant (see VertexCoverByDuality),
the best approximation factor possible is polylogarithmic in the input size (see SetCoverByGreedy),
the best approximation factor possible is n^ε for some fixed ε > 0 (e.g. MAX CLIQUE).

It is useful to know which problems are easier to approximate and which are harder to approximate for several reasons. First, one often has a choice of what theoretical problem to use to formulate a particular real-life problem. It's probably better to choose a problem that allows a better approximation factor. Second, in searching for practical solutions to a particular problem that you have to solve, it is useful to know what to look for. If you know the problem has no polynomial-time algorithm that guarantees better than an n^ε approximation ratio, then you know you will have to resort to heuristics that take advantage of the structure of the particular instances you want to solve.

Example

We sketch some of the fundamental ideas at the core of hardness of approximation results. We present a simple result showing that MAX CLIQUE is hard to approximate. The result we show is not the strongest that can be shown.

We will use the following theorem about probabilistically checkable proofs:

THM: NP ⊆ PCP(log n, 1)

(We won't cover the proof of this theorem, although we will describe a proof of a weaker version.)

What this theorem means is the following. For every language L in NP, there is a PCP-verifier V with the following properties:

V takes as input an X, a bit-string P of size n^c, and a bit-string R of size c*log(n), where n=|X|. Here X is a possible member of L, P is a possible PCP, R is a random string, and c is some fixed constant.
V runs in time polynomial in n.
V examines only c bits of P. (Which bits it examines depends on X and R.)
If X∈ L, then there exists a P such that V(X,P,R) accepts for every R.
If X∉ L, then for any bit string P, V(X,P,R) accepts for fewer than half of the possible values of R.
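The last two properties are usually called completeness and soundness. As a compact restatement, assuming R is drawn uniformly at random from the bit-strings of length c*log(n) (the probability notation is ours, not part of the theorem statement above, and \text requires amsmath):

```latex
\[
X \in L \;\Longrightarrow\; \exists P:\ \Pr_{R}\bigl[V(X,P,R)\text{ accepts}\bigr] = 1,
\qquad
X \notin L \;\Longrightarrow\; \forall P:\ \Pr_{R}\bigl[V(X,P,R)\text{ accepts}\bigr] < \tfrac{1}{2}.
\]
```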

This theorem underlies most of the known hardness of approximation results, much as Cook's theorem underlies most NP-hardness results. To illustrate how this theorem relates to hardness of approximation, we show the following theorem:

THM: Unless P=NP, MAX CLIQUE has no (1/2)-approximation algorithm.

PROOF SKETCH:

Let L be any NP-complete language. By the PCP theorem, L has a PCP verifier V as described above. Since this verifier V for L exists, the following algorithm A also exists. A is called a "gap producing reduction".

Algorithm A

: input: X (a possible member of L)
: output: graph G=(V,E) and integer K such that X∈ L → G has a clique of size K, but X∉ L → G has no clique of size K/2.

On input X, consider running the verifier V on X with some (unknown) PCP P of length n^c.
Build the graph G=(V,E) as follows:
For each bit-string R of length c*log(n), simulate verifier V on input (X,?,R) (without specifying the PCP P). Record which c bits of P the verifier V looks at. For each way W of assigning values to those bits that leads to acceptance by V, add a vertex (R,W) to the graph G. (The possible ways W can be determined in polynomial time because there are only 2^c possibilities. We assume here that which bits the verifier looks at depends only on R and X, not on P.)
For each pair of vertices (R1,W1) and (R2,W2) where R1≠ R2, add an edge between them if W1 and W2 are consistent, meaning that, if they assign values to the same bits of P, they do it in the same way.
Return the graph G together with K, the number of possible random strings R.
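The construction can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the callables `queries` and `accepts` are a hypothetical interface standing in for the verifier V, and the toy verifier at the bottom (which checks one "these two proof bits differ" constraint per random string) is invented purely so the sketch can be run.

```python
from itertools import combinations, product

def build_gap_graph(x, num_random_bits, queries, accepts):
    """Sketch of Algorithm A.

    queries(x, r)       -> tuple of distinct proof positions read on random string r
    accepts(x, r, bits) -> True if the verifier accepts when those positions hold bits
    Both callables are hypothetical stand-ins for the PCP verifier.
    """
    vertices = []
    for r in product((0, 1), repeat=num_random_bits):
        positions = queries(x, r)
        # Try every way W of filling in the queried bits (2^c possibilities).
        for bits in product((0, 1), repeat=len(positions)):
            if accepts(x, r, bits):
                vertices.append((r, dict(zip(positions, bits))))
    edges = set()
    for i, j in combinations(range(len(vertices)), 2):
        (r1, w1), (r2, w2) = vertices[i], vertices[j]
        # Consistent: W1 and W2 agree on every proof position they share.
        consistent = all(w1[p] == w2[p] for p in w1.keys() & w2.keys())
        if r1 != r2 and consistent:
            edges.add((i, j))
    return vertices, edges, 2 ** num_random_bits  # K = number of random strings

# Toy verifier (an assumption for illustration): x is a list of constraints
# (p, q, want_equal); the random string r selects one constraint to check.
def toy_queries(x, r):
    p, q, _ = x[int("".join(map(str, r)), 2)]
    return (p, q)

def toy_accepts(x, r, bits):
    _, _, want_equal = x[int("".join(map(str, r)), 2)]
    return (bits[0] == bits[1]) == want_equal

# Four "neighbouring bits differ" constraints around a 4-cycle of proof bits.
cycle = [(0, 1, False), (1, 2, False), (2, 3, False), (3, 0, False)]
vertices, edges, k = build_gap_graph(cycle, 2, toy_queries, toy_accepts)
```

In this toy instance the proof 0101 satisfies all four constraints, so the four vertices that agree with it form a clique of size K = 4.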

Suppose some proof string P makes V(X,P,R) accept for k of the possible bit-strings R. For each such R, let W be the values that P assigns to the bits V examines on R. These k vertices (R,W) have distinct R's and are pairwise consistent, since they all agree with P, so the graph G contains a clique of size k.

Conversely, suppose G contains a clique of size k, say, {(R1,W1), (R2,W2), ..., (Rk,Wk)}. By the construction, the Ri's must be distinct. Furthermore, the ways W1, W2, ..., Wk are pairwise consistent. This means that there is at least one proof string P whose bits agree with each of these ways Wi. For this proof string P, V(X,P,R) accepts for at least k of the possible bit-strings R.

From the previous two paragraphs, it follows that the maximum clique size in G equals the maximum (over all proofs P) of the number of strings R causing V(X,P,R) to accept.

Let K = (the number of possible bit strings R) = 2^{c log(n)} = n^c, taking logarithms base 2.

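Thus, if X∈ L, then some proof P is accepted for every R, so the maximum clique size is K. But if X∉ L, then every proof P is accepted for fewer than half of the strings R, so the maximum clique size is less than K/2.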
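In symbols, writing ω(G) for the maximum clique size of G (this notation is our addition, and \text requires amsmath), the gap produced by algorithm A can be summarized as:

```latex
\[
\omega(G) \;=\; \max_{P}\ \bigl|\{\,R : V(X,P,R)\text{ accepts}\,\}\bigr|,
\qquad
X \in L \;\Longrightarrow\; \omega(G) = K,
\qquad
X \notin L \;\Longrightarrow\; \omega(G) < \frac{K}{2}.
\]
```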

Now, suppose that there were a polynomial-time (1/2)-approximation algorithm for MAX CLIQUE. Under this assumption we will show P=NP.

We claim the following polynomial time algorithm would decide L (the NP-complete language):

Given an input X, run algorithm A to produce the graph G and the integer K.
Run the (1/2)-approximation algorithm on G to find a clique of size at least OPT/2.
If the clique is of size at least K/2, then say "X is in L".
Otherwise, say "X is not in L".
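The four steps above can be sketched as a short Python function. Everything here is illustrative: `reduce_to_clique` stands in for algorithm A and `approx_clique` for the assumed (1/2)-approximation algorithm, neither of which is implemented. The stubs below exist only so the sketch runs; in particular, the brute-force clique search is exact and exponential-time, not the hypothetical polynomial-time algorithm.

```python
from itertools import combinations

def decide_membership(x, reduce_to_clique, approx_clique):
    """Decide whether x is in L, given a gap-producing reduction and an
    approximation algorithm for MAX CLIQUE (both passed in as callables)."""
    graph, k = reduce_to_clique(x)   # step 1: run algorithm A
    clique = approx_clique(graph)    # step 2: clique of size >= OPT/2
    return 2 * len(clique) >= k      # steps 3-4: compare against K/2

def brute_force_clique(graph):
    """Stand-in for the approximation algorithm: returns a maximum clique,
    which trivially has size >= OPT/2. graph maps vertex -> set of neighbours."""
    nodes = list(graph)
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if all(b in graph[a] for a, b in combinations(cand, 2)):
                return set(cand)
    return set()

def stub_reduction(x):
    """Hypothetical stand-in for algorithm A with K = 4: a complete graph
    (clique of size K) for a yes-instance, an edgeless graph (largest clique
    has size 1 < K/2) for a no-instance."""
    nodes = range(4)
    if x == "yes":
        graph = {v: {u for u in nodes if u != v} for v in nodes}
    else:
        graph = {v: set() for v in nodes}
    return graph, 4

answer_yes = decide_membership("yes", stub_reduction, brute_force_clique)
answer_no = decide_membership("no", stub_reduction, brute_force_clique)
```

The comparison is written as 2 * len(clique) >= k to avoid fractional arithmetic when K is odd.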

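This algorithm would run in polynomial time, because both A and the approximation algorithm do. It would also decide L correctly. If X∈ L, then the maximum clique size is K, so the clique found has size at least K/2. If X∉ L, then every clique in G has size less than K/2, so the clique found does too. Since L is NP-complete, it would follow that P=NP.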

QED

Reading:

Chapter on "Hardness of Approximation" in Approximation Algorithms by Vazirani.