# Set cover problem

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 problems shown to be NP-complete in 1972.

It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms.[1]

Given a set of elements ${\displaystyle \{1,2,\ldots ,n\}}$ (called the universe) and a collection ${\displaystyle S}$ of ${\displaystyle m}$ sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of ${\displaystyle S}$ whose union equals the universe. For example, consider the universe ${\displaystyle U=\{1,2,3,4,5\}}$ and the collection of sets ${\displaystyle S=\{\{1,2,3\},\{2,4\},\{3,4\},\{4,5\}\}}$. The union of all four sets in ${\displaystyle S}$ is ${\displaystyle U}$, but two sets already suffice to cover every element: ${\displaystyle \{\{1,2,3\},\{4,5\}\}}$. No single set in ${\displaystyle S}$ equals ${\displaystyle U}$, so this cover is a smallest one.

More formally, given a universe ${\displaystyle {\mathcal {U}}}$ and a family ${\displaystyle {\mathcal {S}}}$ of subsets of ${\displaystyle {\mathcal {U}}}$, a cover is a subfamily ${\displaystyle {\mathcal {C}}\subseteq {\mathcal {S}}}$ of sets whose union is ${\displaystyle {\mathcal {U}}}$. In the set covering decision problem, the input is a pair ${\displaystyle ({\mathcal {U}},{\mathcal {S}})}$ and an integer ${\displaystyle k}$; the question is whether there is a set covering of size ${\displaystyle k}$ or less. In the set covering optimization problem, the input is a pair ${\displaystyle ({\mathcal {U}},{\mathcal {S}})}$, and the task is to find a set covering that uses the fewest sets.
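The two versions of the problem can be illustrated with an exhaustive search. The following is a minimal Python sketch, not an efficient method: it tries sub-collections in order of increasing size, so its running time is exponential in the number of sets. The function names `min_set_cover` and `has_cover_of_size` are chosen here for illustration only.

```python
from itertools import combinations


def min_set_cover(universe, sets):
    """Optimization version: return a smallest sub-collection covering `universe`.

    Sub-collections are tried in order of increasing size, so the first
    cover found uses the fewest sets. Returns None if no cover exists.
    """
    universe = frozenset(universe)
    for k in range(len(sets) + 1):
        for combo in combinations(sets, k):
            if frozenset().union(*combo) == universe:
                return list(combo)
    return None


def has_cover_of_size(universe, sets, k):
    """Decision version: is there a cover that uses at most k sets?"""
    best = min_set_cover(universe, sets)
    return best is not None and len(best) <= k


# The example instance from the introduction.
U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]

cover = min_set_cover(U, S)
print(len(cover), cover)
print(has_cover_of_size(U, S, 1), has_cover_of_size(U, S, 2))
```

On the example instance the search finds a cover of two sets, and the decision version answers "no" for ${\displaystyle k=1}$ and "yes" for ${\displaystyle k=2}$.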

The decision version of set covering is NP-complete, and the optimization/search version of set cover is NP-hard.[2]

If each set is assigned a cost, the problem becomes the weighted set cover problem, in which the task is to find a cover of minimum total cost.

## Integer linear program formulation

The minimum set cover problem can be formulated as the following integer linear program (ILP).[3]

minimize ${\displaystyle \sum _{S\in {\mathcal {S}}}x_{S}}$ (minimize the number of sets)

subject to ${\displaystyle \sum _{S\colon e\in S}x_{S}\geqslant 1}$ for all ${\displaystyle e\in {\mathcal {U}}}$ (cover every element of the universe)

${\displaystyle x_{S}\in \{0,1\}}$ for all ${\displaystyle S\in {\mathcal {S}}}$ (every set is either in the set cover or not)
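As a worked illustration, the program can be written out for the example instance ${\displaystyle U=\{1,2,3,4,5\}}$ from the introduction. The labels ${\displaystyle S_{1}=\{1,2,3\}}$, ${\displaystyle S_{2}=\{2,4\}}$, ${\displaystyle S_{3}=\{3,4\}}$, ${\displaystyle S_{4}=\{4,5\}}$ and the variable names ${\displaystyle x_{1},\ldots ,x_{4}}$ are introduced here only for this example. There is one constraint per element:

minimize ${\displaystyle x_{1}+x_{2}+x_{3}+x_{4}}$

subject to ${\displaystyle x_{1}\geqslant 1}$ (element 1)

${\displaystyle x_{1}+x_{2}\geqslant 1}$ (element 2)

${\displaystyle x_{1}+x_{3}\geqslant 1}$ (element 3)

${\displaystyle x_{2}+x_{3}+x_{4}\geqslant 1}$ (element 4)

${\displaystyle x_{4}\geqslant 1}$ (element 5)

${\displaystyle x_{1},x_{2},x_{3},x_{4}\in \{0,1\}}$

The constraints for elements 1 and 5 force ${\displaystyle x_{1}=x_{4}=1}$, which already satisfies the other three constraints, so the optimum is ${\displaystyle x=(1,0,0,1)}$ with objective value 2.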

This ILP belongs to the more general class of ILPs for covering problems. The integrality gap of this ILP is at most ${\displaystyle \log n}$, so its relaxation gives a factor-${\displaystyle \log n}$ approximation algorithm for the minimum set cover problem (where ${\displaystyle n}$ is the size of the universe).[4]

In weighted set cover, the sets are assigned weights. Denote the weight of set ${\displaystyle S\in {\mathcal {S}}}$ by ${\displaystyle w_{S}}$. Then the integer linear program describing weighted set cover is identical to the one given above, except that the objective function to minimize is ${\displaystyle \sum _{S\in {\mathcal {S}}}w_{S}x_{S}}$.

## Hitting set formulation

Set covering is equivalent to the hitting set problem. This can be seen by viewing an instance of set covering as an arbitrary bipartite graph, with sets represented by vertices on the left, elements of the universe represented by vertices on the right, and edges representing the inclusion of elements in sets. The task is then to find a minimum-cardinality subset of left-vertices that covers all of the right-vertices. In the hitting set problem, the objective is to cover the left-vertices using a minimum subset of the right-vertices. Converting from one problem to the other is therefore achieved by interchanging the two sets of vertices.
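The interchange of the two vertex sets can be sketched in a few lines of Python. This is an illustrative sketch under our own naming (`to_hitting_set`, `is_hitting_set`): sets are identified by their index in the input list, and each element of the universe is turned into the set of indices of the sets that contain it.

```python
def to_hitting_set(universe, sets):
    """Swap the two sides of the bipartite incidence graph.

    The new ground set consists of the indices of the original sets.
    The new collection has one set per original element e, containing
    the indices of all original sets that include e.
    """
    ground = set(range(len(sets)))
    collection = [
        {i for i, s in enumerate(sets) if e in s} for e in sorted(universe)
    ]
    return ground, collection


def is_hitting_set(candidate, collection):
    """True if `candidate` shares at least one member with every set."""
    return all(candidate & t for t in collection)


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
ground, collection = to_hitting_set(U, S)

# Indices {0, 3} correspond to the cover {{1, 2, 3}, {4, 5}}.
print(is_hitting_set({0, 3}, collection))
print(is_hitting_set({0}, collection))
```

A set of indices hits every set in the new collection exactly when the corresponding original sets cover the universe, so optimal solutions of the two instances have the same size.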

## Greedy algorithm

There is a greedy algorithm for polynomial-time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. It can be shown[5] that this algorithm achieves an approximation ratio of ${\displaystyle H(n)}$, where ${\displaystyle n}$ is the size of the set to be covered. In other words, it finds a covering that is at most ${\displaystyle H(n)}$ times as large as the minimum one, where ${\displaystyle H(n)}$ is the ${\displaystyle n}$-th harmonic number:

${\displaystyle H(n)=\sum _{k=1}^{n}{\frac {1}{k}}\leq \ln {n}+1}$
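The greedy rule can be sketched in Python as follows. This is a minimal illustration rather than a tuned implementation: the names `greedy_set_cover` and `harmonic` are ours, and ties between equally good sets are broken by taking the first one in the list, a choice the rule itself leaves open.

```python
import math


def greedy_set_cover(universe, sets):
    """Return indices of the chosen sets, in the order they were picked."""
    sets = [set(s) for s in sets]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set that contains the most still-uncovered elements.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def harmonic(n):
    """n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(U, S))
print(harmonic(len(U)), math.log(len(U)) + 1)
```

On the example instance from the introduction the greedy choice happens to be optimal: it first picks ${\displaystyle \{1,2,3\}}$ and then ${\displaystyle \{4,5\}}$.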

This greedy algorithm actually achieves an approximation ratio of ${\displaystyle H(s^{\prime })}$, where ${\displaystyle s^{\prime }}$ is the cardinality of the largest set in ${\displaystyle S}$. For ${\displaystyle \delta }$-dense instances, however, there exists a ${\displaystyle c\ln {m}}$-approximation algorithm for every ${\displaystyle c>0}$.[6]

There is a standard example on which the greedy algorithm achieves an approximation ratio of roughly ${\displaystyle \log _{2}(n)/2}$. The universe consists of ${\displaystyle n=2^{(k+1)}-2}$ elements. The set system consists of ${\displaystyle k}$ pairwise disjoint sets ${\displaystyle S_{1},\ldots ,S_{k}}$ with sizes ${\displaystyle 2,4,8,\ldots ,2^{k}}$ respectively, as well as two additional disjoint sets ${\displaystyle T_{0},T_{1}}$, each of which contains half of the elements from each ${\displaystyle S_{i}}$. On this input, the greedy algorithm takes the sets ${\displaystyle S_{k},\ldots ,S_{1}}$, in that order, for a total of ${\displaystyle k}$ sets, while the optimal solution consists only of ${\displaystyle T_{0}}$ and ${\displaystyle T_{1}}$. An example of such an input for ${\displaystyle k=3}$ is pictured on the right.
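The construction can be checked mechanically. The sketch below builds the instance for a given ${\displaystyle k}$ and runs a plain greedy routine on it. The element encoding as pairs `(i, j)` and the names `tight_instance` and `greedy_set_cover` are assumptions made for this illustration only.

```python
def greedy_set_cover(universe, sets):
    """Greedy rule: repeatedly take the set covering most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def tight_instance(k):
    """Build S_1..S_k (indices 0..k-1), then T_0 (index k) and T_1 (index k+1)."""
    # S_i has 2**i elements, encoded as pairs (i, j).
    S = [{(i, j) for j in range(2 ** i)} for i in range(1, k + 1)]
    universe = set().union(*S)
    # T_0 takes the first half of every S_i, T_1 takes the other half.
    T0 = {(i, j) for i in range(1, k + 1) for j in range(2 ** (i - 1))}
    T1 = universe - T0
    return universe, S + [T0, T1]


k = 3
universe, sets = tight_instance(k)
picked = greedy_set_cover(universe, sets)
print(len(universe), picked)
```

For ${\displaystyle k=3}$ the universe has 14 elements. Greedy picks ${\displaystyle S_{3}}$ (8 elements, against 7 in each ${\displaystyle T_{j}}$), then ${\displaystyle S_{2}}$, then ${\displaystyle S_{1}}$, so it uses 3 sets where 2 suffice.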

Under plausible complexity assumptions, inapproximability results show that the greedy algorithm is essentially the best possible polynomial-time approximation algorithm for set cover, up to lower-order terms (see Inapproximability results below). A tighter analysis of the greedy algorithm shows that its approximation ratio is exactly ${\displaystyle \ln {n}-\ln {\ln {n}}+\Theta (1)}$.[7]

## Low-frequency systems

If each element occurs in at most ${\displaystyle f}$ sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of ${\displaystyle f}$ using LP relaxation.

If the constraint ${\displaystyle x_{S}\in \{0,1\}}$ is replaced by ${\displaystyle x_{S}\geq 0}$ for all ${\displaystyle S}$ in ${\displaystyle {\mathcal {S}}}$ in the integer linear program shown above, then it becomes a (non-integer) linear program ${\displaystyle L}$. The algorithm can be described as follows:

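1. Find an optimal solution ${\displaystyle O}$ for the program ${\displaystyle L}$ using some polynomial-time method of solving linear programs.
2. Pick all sets ${\displaystyle S}$ for which the corresponding variable ${\displaystyle x_{S}}$ has value at least ${\displaystyle 1/f}$ in the solution ${\displaystyle O}$.[8]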
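The rounding step can be sketched as follows. Each element lies in at most ${\displaystyle f}$ sets and its constraint forces the corresponding variables to sum to at least 1, so at least one of them is at least ${\displaystyle 1/f}$ and the picked sets form a cover. Each picked set has value at least ${\displaystyle 1/f}$, so the number of picked sets is at most ${\displaystyle f}$ times the LP optimum, which is itself at most the size of a minimum cover. In this sketch the fractional solution is supplied by hand instead of being computed by an LP solver, and the name `round_fractional_cover` is ours.

```python
from fractions import Fraction


def round_fractional_cover(universe, sets, x):
    """Step 2 of the algorithm: keep every set with LP value at least 1/f.

    `x[i]` is the LP value of `sets[i]`. Returns the chosen indices and f.
    """
    # f = largest number of sets that any single element belongs to.
    f = max(sum(1 for s in sets if e in s) for e in universe)
    threshold = Fraction(1, f)
    chosen = [i for i, value in enumerate(x) if value >= threshold]
    return chosen, f


# Hypothetical instance: every element lies in exactly two sets.
U = {1, 2, 3}
S = [{1, 2}, {2, 3}, {1, 3}]

# An optimal LP solution for this instance: adding the three covering
# constraints gives 2 * (x_0 + x_1 + x_2) >= 3, so the LP value is 3/2.
x = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)]

chosen, f = round_fractional_cover(U, S, x)
print(f, chosen)
```

Here ${\displaystyle f=2}$ and all three sets are picked, while a minimum cover uses two sets, so the result is within the guaranteed factor of 2.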

## Inapproximability results

When ${\displaystyle n}$ refers to the size of the universe, Lund & Yannakakis (1994) showed that set covering cannot be approximated in polynomial time to within a factor of ${\displaystyle {\tfrac {1}{2}}\log _{2}{n}\approx 0.72\ln {n}}$, unless NP has quasi-polynomial time algorithms. Feige (1998) improved this lower bound to ${\displaystyle {\bigl (}1-o(1){\bigr )}\cdot \ln {n}}$ under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. Raz & Safra (1997) established a lower bound of ${\displaystyle c\cdot \ln {n}}$, where ${\displaystyle c}$ is a certain constant, under the weaker assumption that ${\displaystyle {\mathsf {P}}\neq {\mathsf {NP}}}$. A similar result with a higher value of ${\displaystyle c}$ was later proved by Alon, Moshkovitz & Safra (2006). Dinur & Steurer (2013) showed optimal inapproximability by proving that set cover cannot be approximated to within a factor of ${\displaystyle {\bigl (}1-o(1){\bigr )}\cdot \ln {n}}$ unless ${\displaystyle {\mathsf {P}}={\mathsf {NP}}}$.
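The constant 0.72 in the Lund and Yannakakis bound comes from a change of logarithm base:

${\displaystyle {\tfrac {1}{2}}\log _{2}{n}={\frac {\ln {n}}{2\ln {2}}}\approx {\frac {\ln {n}}{1.386}}\approx 0.72\ln {n}}$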

## Weighted set cover

Relaxing the integer linear program for weighted set cover stated above, one may use randomized rounding to get an ${\displaystyle O(\log n)}$-factor approximation. The corresponding analysis for nonweighted set cover is outlined in the article on randomized rounding (section "Randomized-rounding algorithm for set cover") and can be adapted to the weighted case.[9]
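A rough sketch of the rounding idea is given below, under several assumptions that are ours and not taken from the cited analysis: the fractional solution `x` is supplied by hand, each set is included independently with probability equal to its LP value, this is repeated for about ${\displaystyle 2\ln n}$ rounds, and the whole experiment is retried if some element is still uncovered. The weights affect only the linear program that produces `x`, so they do not appear in the rounding step itself.

```python
import math
import random


def randomized_rounding_cover(universe, sets, x, seed=None, max_tries=1000):
    """Round a fractional cover x (x[i] in [0, 1]) to a set cover.

    Returns the sorted indices of the chosen sets.
    """
    rng = random.Random(seed)
    universe = set(universe)
    # Number of independent rounds; the constant 2 is an illustrative choice.
    rounds = max(1, math.ceil(2 * math.log(max(len(universe), 2))))
    for _ in range(max_tries):
        chosen = set()
        for _ in range(rounds):
            # Include set i in this round with probability x[i].
            chosen.update(i for i, xi in enumerate(x) if rng.random() < xi)
        covered = set().union(*(sets[i] for i in chosen))
        if covered >= universe:
            return sorted(chosen)
    raise RuntimeError("no cover found; is x a feasible LP solution?")


# Hypothetical instance with a feasible fractional solution of value 3/2.
U = {1, 2, 3}
S = [{1, 2}, {2, 3}, {1, 3}]
x = [0.5, 0.5, 0.5]

cover = randomized_rounding_cover(U, S, x, seed=0)
print(cover)
```

The expected number of sets chosen in one attempt is at most the number of rounds times the LP value, which is where the logarithmic factor in the guarantee comes from.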

## Related problems

• Hitting set is an equivalent reformulation of set cover.
• Vertex cover is a special case of hitting set.
• Edge cover is a special case of set cover.
• Geometric set cover is a special case of set cover in which the universe is a set of points in ${\displaystyle \mathbb {R} ^{d}}$ and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles).
• Set packing
• Maximum coverage problem is to choose at most ${\displaystyle k}$ sets to cover as many elements as possible.
• Dominating set is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The dominating set problem was shown to be NP-complete through a reduction from set cover.
• Exact cover problem is to choose a set cover with no element included in more than one covering set.

## Notes

1. Vazirani (2001, p. 15)
2. Korte & Vygen (2012, p. 414)
3. Vazirani (2001, p. 108)
4. Vazirani (2001, pp. 110–112)
5. Chvátal, V. (1979), "A Greedy Heuristic for the Set-Covering Problem", Mathematics of Operations Research, 4 (3): 233–235
6. Karpinski & Zelikovsky (1998)
7. Slavík, Petr (1996), "A tight analysis of the greedy algorithm for set cover", STOC '96, pp. 435–441, doi:10.1145/237814.237991
8. Vazirani (2001, pp. 118–119)
9. Vazirani (2001, Chapter 14)

## References

• Alon, Noga; Moshkovitz, Dana; Safra, Shmuel (2006), "Algorithmic construction of sets for k-restrictions", ACM Trans. Algorithms, 2 (2): 153–177, CiteSeerX 10.1.1.138.8682, doi:10.1145/1150334.1150336, ISSN 1549-6325.
• Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), Introduction to Algorithms, Cambridge, Mass.: MIT Press and McGraw-Hill, pp. 1033–1038, ISBN 978-0-262-03293-3
• Feige, Uriel (1998), "A threshold of ln n for approximating set cover", Journal of the ACM, 45 (4): 634–652, CiteSeerX 10.1.1.70.5014, doi:10.1145/285055.285059, ISSN 0004-5411.
• Karpinski, Marek; Zelikovsky, Alexander (1998), Approximating dense cases of covering problems, 40, pp. 169–178, ISBN 9780821870846
• Lund, Carsten; Yannakakis, Mihalis (1994), "On the hardness of approximating minimization problems", Journal of the ACM, 41 (5): 960–981, doi:10.1145/185675.306789, ISSN 0004-5411.
• Raz, Ran; Safra, Shmuel (1997), "A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP", STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, ACM, pp. 475–484, ISBN 978-0-89791-888-6.
• Dinur, Irit; Steurer, David (2013), "Analytical approach to parallel repetition", STOC '14: Proceedings of the forty-sixth annual ACM symposium on Theory of computing, ACM, pp. 624–633.
• Vazirani, Vijay V. (2001), Approximation Algorithms (PDF), Springer-Verlag, ISBN 978-3-540-65367-7
• Korte, Bernhard; Vygen, Jens (2012), Combinatorial Optimization: Theory and Algorithms (5 ed.), Springer, ISBN 978-3-642-24487-2
• Cardoso, Nuno; Abreu, Rui (2014), An Efficient Distributed Algorithm for Computing Minimal Hitting Sets (PDF), Graz, Austria, doi:10.5281/zenodo.10037