We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method performs classification of rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a collection of arrayed clones with a single oligonucleotide probe. This experiment produces analog signals, one for each clone, which then need to be classified, that is, converted into binary values 1 and 0 that represent hybridization and non-hybridization events. In addition to the sample rRNA gene clones, the array contains a number of control clones needed to calibrate the classification procedure of the hybridization signals. These control clones must be selected with care to optimize the classification process. We formulate this as a combinatorial optimization problem called Balanced Covering. We prove that the problem is NP-hard, and we show some results on hardness of approximation. We propose approximation algorithms based on randomized rounding, and we show that, with high probability, our algorithms approximate well the optimum solution. The experimental results confirm that the algorithms find high quality control clones. The algorithms have been implemented and are publicly available as part of the software package called CloneTools.
Journal version of [2007