# SetCoverByGreedy


### Set Cover (unweighted version)

The Set Cover problem is, given a collection of sets, to choose a minimum number of those sets so that every element is in at least one of the chosen sets.

The greedy algorithm for set cover is the following:

```
1. Repeat until all elements are covered by chosen sets:
2.     Choose a set containing a maximum number of not-yet-covered elements.
3. Return the chosen sets.
```
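The steps above can be sketched in Python. This is a minimal illustration, not a tuned implementation: the function name `greedy_set_cover`, the representation of the input as a list of Python sets, and the choice to break ties by taking the first maximizing set are all assumptions made for the example.

```python
def greedy_set_cover(universe, sets):
    """Return the indices of the chosen sets, in the order they were chosen."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Choose a set containing a maximum number of not-yet-covered elements
        # (ties go to the lowest index; this tie-breaking rule is an assumption).
        best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
        if not uncovered & sets[best]:
            raise ValueError("the sets do not cover every element")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical instance with 7 elements and 5 sets.
universe = range(1, 8)
sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6, 7}, {7}]
cover = greedy_set_cover(universe, sets)
print(cover)  # [0, 3]
```

On this instance the first iteration picks the 4-element set, and the second picks {5, 6, 7}, which covers everything that remains.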

Theorem: The greedy algorithm is a (1+ln(n))-approximation algorithm, where n is the number of elements.

Proof: Let OPT denote the size of the minimum cover.

At each iteration of the algorithm, the set chosen covers at least a fraction 1/OPT of the not-yet-covered elements. (This is because, collectively, the OPT sets in the optimal solution cover all of the remaining elements, so at least one set in the optimal solution must cover at least a fraction 1/OPT of the remaining elements. The set chosen covers at least this many.)

Thus, after k iterations, the number of elements left uncovered is at most
n(1-1/OPT)^k ≤ n exp(-k/OPT).
(Using the inequality 1+x < exp(x) for x ≠ 0.)

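Thus, after k = ⌈ OPT ln(n) ⌉ iterations, fewer than one element is left uncovered, so every element is covered. Hence (for n ≥ 2) the algorithm chooses at most ⌈ OPT ln(n) ⌉ ≤ OPT ln(n) + 1 ≤ (1+ln(n)) OPT sets.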
QED
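As a numeric sanity check of the bound in the proof, the sketch below plugs in one hypothetical pair of values, n = 1000 and OPT = 5. The helper name `uncovered_bound` and the chosen values are assumptions for illustration only.

```python
import math


def uncovered_bound(n, opt, k):
    """Upper bound n * (1 - 1/opt)**k on the number of uncovered elements after k iterations."""
    return n * (1 - 1 / opt) ** k


n, opt = 1000, 5                   # hypothetical instance size and optimum
k = math.ceil(opt * math.log(n))   # k = ceil(OPT * ln(n)), here 35

# The geometric bound is dominated by the exponential one, and both drop below 1.
assert uncovered_bound(n, opt, k) <= n * math.exp(-k / opt) < 1
# The iteration count stays within the approximation guarantee.
assert k <= (1 + math.log(n)) * opt
```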

This analysis differs from those in Approximation Algorithms by Vazirani.

### Weighted set cover

The weighted set cover problem is, given a collection of sets each with a non-negative weight, to choose a set cover minimizing the total weight of the chosen sets.

The greedy algorithm for weighted set cover is the following:

```
1. Repeat until all elements are covered by chosen sets:
2.     Choose a set S maximizing (number of not-yet-covered elements in S)/weight(S).
3. Return the chosen sets.
```
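A possible Python rendering of the weighted rule follows. It is a sketch under stated assumptions: the names `density` and `greedy_weighted_set_cover` are made up for the example, ties go to the lowest index, and a zero-weight set that covers something new is treated as having infinite ratio (the pseudocode leaves that case open).

```python
import math


def density(uncovered, members, weight):
    """Newly covered elements per unit weight for one candidate set."""
    gain = len(uncovered & members)
    if gain == 0:
        return 0.0
    # Assumption: a zero-weight set that covers something new is always taken first.
    return math.inf if weight == 0 else gain / weight


def greedy_weighted_set_cover(universe, sets, weights):
    """Return the indices of the chosen sets, in the order they were chosen."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)),
                   key=lambda i: density(uncovered, sets[i], weights[i]))
        if not uncovered & sets[best]:
            raise ValueError("the sets do not cover every element")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical instance: one expensive set covering everything, two cheap halves.
universe = range(1, 7)
sets = [{1, 2, 3, 4, 5, 6}, {1, 2, 3}, {4, 5, 6}]
weights = [10, 2, 3]
cover = greedy_weighted_set_cover(universe, sets, weights)
total = sum(weights[i] for i in cover)
print(cover, total)  # [1, 2] 5
```

Here the ratios in the first iteration are 0.6, 1.5 and 1.0, so the set of weight 2 is chosen first, followed by the set of weight 3.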

Theorem: The greedy algorithm is a (1+ln(n))-approximation algorithm, where n is the number of elements.

Proof: Let OPT denote the total weight of the sets in the minimum-weight cover.

Consider the start of an iteration of the algorithm. For any set S, let |S| denote the number of elements in S not yet covered by chosen sets. Let U denote the number of elements not yet covered by chosen sets at the start of the iteration, and let U' denote the number not yet covered at the end of the iteration. Since it is possible to cover all U elements using sets of total weight at most OPT, there exists a set S' such that weight(S')/|S'| ≤ OPT/U. (Otherwise, every set S' in the optimal cover with |S'| > 0 would have weight(S') > |S'| OPT/U; summing over those sets, which together contain all U uncovered elements, would give total weight greater than OPT.) Thus, the chosen set S satisfies
weight(S)/|S| ≤ OPT/U.
By algebra this implies
U' = U - |S| ≤ (1-weight(S)/OPT) U ≤ exp(-weight(S)/OPT) U.
By induction, it follows that the algorithm maintains the invariant
U ≤ exp(-(total weight of sets chosen so far)/OPT) n.
Thus, before the last set is chosen, when U ≥ 1, the total weight of the chosen sets is at most ln(n) OPT. Since the last set chosen costs at most OPT (because weight(S)/|S| ≤ OPT/U and |S| ≤ U), the total weight of all chosen sets is at most (1+ln(n)) OPT.
QED
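To see the guarantee on a concrete case, the sketch below compares the greedy cost with a brute-force optimum on a tiny hypothetical instance where greedy is not optimal. The function names, the instance, and the restriction to positive weights are assumptions for illustration; the brute-force search is exponential and only suitable for very small inputs.

```python
import math
from itertools import combinations


def greedy_cost(universe, sets, weights):
    """Total weight of the cover built by the greedy rule (assumes positive weights)."""
    uncovered, total = set(universe), 0
    while uncovered:
        best = max(range(len(sets)),
                   key=lambda i: len(uncovered & sets[i]) / weights[i])
        if not uncovered & sets[best]:
            raise ValueError("the sets do not cover every element")
        total += weights[best]
        uncovered -= sets[best]
    return total


def optimal_cost(universe, sets, weights):
    """Minimum cover weight by exhaustive search over all sub-collections."""
    target, best = set(universe), math.inf
    for r in range(1, len(sets) + 1):
        for combo in combinations(range(len(sets)), r):
            if set().union(*(sets[i] for i in combo)) >= target:
                best = min(best, sum(weights[i] for i in combo))
    return best


# Hypothetical instance: greedy prefers the dense 4-element set, then must pay for {5, 6}.
universe = range(1, 7)
sets = [{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4}, {5, 6}]
weights = [4, 2, 3]
g = greedy_cost(universe, sets, weights)    # 2 + 3 = 5
opt = optimal_cost(universe, sets, weights)  # 4, using the first set alone
assert opt <= g <= (1 + math.log(len(universe))) * opt
```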

### Exercises

1. The partial set cover problem is, given a collection of sets and an integer K, to find a minimum number of sets covering all but some K of the elements. Prove that this problem has a (1+ln(n))-approximation algorithm, where n is the number of elements.

2. Given a set of vertices V and a collection of sets of edges E1, E2, …, Em, find a minimum-size collection of the edge sets so that the union of the chosen sets contains a spanning tree. Give a (1+ln(n))-approximation algorithm, where n is the number of vertices.

3. Extend your analyses above to the weighted versions of the problems.

References:
• Vazirani, Approximation Algorithms, Chapters 2, 13, 14.
• Nemhauser and Wolsey, Integer and Combinatorial Optimization (see: minimizing a linear function subject to a submodular constraint).
