MetaPhyl



Introduction

MetaPhyl is a supervised classification method for metagenomic samples that takes advantage of the natural structure of microbial community data encoded by phylogenetic trees.

Downloads

Binaries: Linux, Mac OS X.

The C++ source code is available here.

The script to generate synthetic datasets for MetaPhyl is here.

Instructions

Input data files and formats

OTU count data

Data file that has OTU count information for each sample.

1 x1_1 x1_2 ... x1_n
2 x2_1 x2_2 ... x2_n
...
N xN_1 xN_2 ... xN_n

Here N is the number of samples, n is the number of OTUs, and xi_j is the number of reads in the i-th sample that belong to the j-th OTU. Each line begins with the sample number i, followed by the n counts for that sample.
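The layout above can be illustrated with a short Python sketch that writes and reads a count file. This is not part of MetaPhyl: the function names are hypothetical, and the sketch assumes that fields are separated by whitespace, that counts are integers, and that samples are numbered from 1 in file order.

```python
def format_otu_counts(counts):
    """Render per-sample OTU count rows, one sample per line.

    Assumption: the first column is the 1-based sample number and the
    remaining columns are whitespace-separated integer read counts.
    """
    lines = []
    for i, row in enumerate(counts, start=1):
        lines.append(" ".join([str(i)] + [str(x) for x in row]))
    return "\n".join(lines) + "\n"


def parse_otu_counts(text):
    """Parse the same layout back into {sample number: [counts]}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows[int(fields[0])] = [int(x) for x in fields[1:]]
    return rows


# Two samples (N = 2) and three OTUs (n = 3); the values are made up.
counts = [[5, 0, 12], [3, 7, 0]]
text = format_otu_counts(counts)
parsed = parse_otu_counts(text)
```

Every row should have the same number of counts, n, which must match the number of OTUs in the phylogenetic tree.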

Class labels

Data file that contains class labels for each sample.

> 1
s1_1 s1_2 ...
> 2
s2_1 s2_2 ...
...
> K
sK_1 sK_2 ...

Here K is the number of classes, and sk_1 sk_2 ... is the list of samples that belong to the k-th class.
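As a rough illustration of the class labels layout, the Python sketch below writes and reads such a file. The function names are hypothetical and not part of MetaPhyl. The sketch assumes that classes and samples are identified by integers, that sample identifiers are the sample numbers used in the OTU count file, and that identifiers on a line are separated by whitespace.

```python
def format_labels(classes):
    """Render {class number: [sample numbers]} as '> k' header lines,
    each followed by one line listing the samples in that class."""
    lines = []
    for k in sorted(classes):
        lines.append(f"> {k}")
        lines.append(" ".join(str(s) for s in classes[k]))
    return "\n".join(lines) + "\n"


def parse_labels(text):
    """Parse the same layout back into {class number: [sample numbers]}."""
    classes = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = int(line[1:].strip())
            classes[current] = []
        else:
            if current is None:
                raise ValueError("sample list found before any '> k' header")
            classes[current].extend(int(s) for s in line.split())
    return classes


# Two classes (K = 2) over four samples; the assignment is made up.
labels = {1: [1, 3], 2: [2, 4]}
labels_text = format_labels(labels)
labels_back = parse_labels(labels_text)
```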

Phylogenetic tree

Data file that contains the phylogenetic tree for the n OTUs in Newick format. OTUs must be numbered from 0 to n-1.
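For example, a tree over four OTUs could be written as ((0:0.1,1:0.2):0.3,(2:0.1,3:0.1):0.2); (the topology and branch lengths here are made up). The Python sketch below checks the numbering requirement before a tree is passed to MetaPhyl. It is a simplified check, not a full Newick parser: the function names are hypothetical, and it assumes unquoted leaf labels and no bracketed comments.

```python
import re


def newick_leaf_labels(newick):
    """Return leaf labels from a Newick string.

    A leaf label is taken to be a token that directly follows '(' or ','.
    Internal node labels follow ')' and are therefore not matched.
    Assumption: labels are unquoted and the string has no [comments].
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def leaves_are_numbered(newick, n):
    """True if the leaves are exactly the labels 0, 1, ..., n-1."""
    expected = sorted(str(i) for i in range(n))
    return sorted(newick_leaf_labels(newick)) == expected


tree = "((0:0.1,1:0.2):0.3,(2:0.1,3:0.1):0.2);"
leaf_labels = newick_leaf_labels(tree)
ok = leaves_are_numbered(tree, 4)
```

A tree whose leaves are numbered from 1, such as ((1,2),(3,4));, fails this check for n = 4.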

Command Line Options

MetaPhyl can be run in two modes: training and testing (classification of new samples).

OPTION     DESCRIPTION
-help      Print help message.
-train     Training mode.
-test      Testing mode.
-d         Data file that has OTU count information for each sample (training and testing modes).
-l         Class labels file (training mode).
-t         Phylogenetic tree file in Newick format (training mode).
-o         Output file (training and testing modes).
-c         Input file for the testing mode, produced during the training phase.
-w         Weight parameter (training mode), value from 0 to 1.
-lambda    Regularization parameter (training mode).

Examples

To train the model:

./MetaPhyl -train -d example/samples.txt -t example/tree.tre -l example/labels.txt -w 0.5 -lambda 1 -o example/out.txt

To classify new samples:

./MetaPhyl -test -d example/samples.txt -c example/out.txt -o example/result.txt
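When MetaPhyl is driven from a script, the two commands above can be assembled programmatically. The Python sketch below only builds the argument lists from the options documented above and does not run the binary. The function names are hypothetical, and the range check on -w simply mirrors the "value from 0 to 1" description.

```python
def train_command(data, tree, labels, output, weight, lam, binary="./MetaPhyl"):
    """Build the argument list for training mode."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("-w must be a value from 0 to 1")
    return [binary, "-train", "-d", data, "-t", tree, "-l", labels,
            "-w", str(weight), "-lambda", str(lam), "-o", output]


def classify_command(data, model, output, binary="./MetaPhyl"):
    """Build the argument list for testing mode.

    'model' is the output file written by a previous training run.
    """
    return [binary, "-test", "-d", data, "-c", model, "-o", output]


train = train_command("example/samples.txt", "example/tree.tre",
                      "example/labels.txt", "example/out.txt",
                      weight=0.5, lam=1)
classify = classify_command("example/samples.txt", "example/out.txt",
                            "example/result.txt")
```

Either list can then be passed to subprocess.run(train, check=True) from the Python standard library, assuming the MetaPhyl binary is in the current directory.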