TOSS

TOSS is a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. It relies on the observation that, when l is sufficiently large, most l-mers occur in only one genome. The first phase produces clusters of l-mers, with each cluster belonging to a single genome. In the second phase, clusters are merged based on l-mer repeat information, and the final clusters are used to assign reads to genomes. The algorithm can handle very short reads and sequencing errors. It was initially designed for genomes with similar abundance levels and was then extended to handle arbitrary abundance ratios.
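The observation about l-mer uniqueness can be illustrated with a small experiment. The sketch below is not part of TOSS. It uses two random synthetic sequences as stand-ins for genomes (an assumption; real genomes contain repeats and related regions), and the helper names `lmers` and `shared_fraction` are hypothetical. It shows only how the fraction of l-mers shared between two sequences changes with l.

```python
import random


def lmers(seq, l):
    """Return the set of all length-l substrings (l-mers) of seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def shared_fraction(genome_a, genome_b, l):
    """Fraction of distinct l-mers, over both sequences, that occur in both."""
    a, b = lmers(genome_a, l), lmers(genome_b, l)
    return len(a & b) / len(a | b)


# Two random sequences used as toy "genomes" (an illustrative assumption).
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(5000))
genome_b = "".join(rng.choice("ACGT") for _ in range(5000))

if __name__ == "__main__":
    for l in (4, 8, 12, 20):
        print(l, round(shared_fraction(genome_a, genome_b, l), 4))
```

For small l almost every l-mer appears in both sequences, while for large l almost none are shared, which is the property that lets l-mers be grouped by genome in the first place.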

Reference: Olga Tanaseichuk, James Borneman and Tao Jiang. Separating Metagenomic Short Reads into Genomes via Clustering (Extended Abstract). 11th Workshop on Algorithms for Bioinformatics (WABI 2011). Download

TOSS 1.0: Download