Introduction
MetaPhyl is a supervised classification method for metagenomic samples that takes advantage of the natural structure of microbial community data encoded by phylogenetic trees.
Downloads
Binaries: Linux, Mac OSX.
The C++ source code is available here.
The script to generate synthetic datasets for MetaPhyl is here.
Instructions
Input data files and formats
OTU count data
Data file that has OTU count information for each sample.
1 x1_1 x1_2 ... x1_n
2 x2_1 x2_2 ... x2_n
...
N xN_1 xN_2 ... xN_n
Here N is the number of samples,
n is the number of OTUs, and
xi_j is the number of reads in the i-th sample that belong to the j-th OTU.
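As a quick sanity check before training, the count file can be parsed and validated with a few lines of Python. This is only a minimal sketch: it assumes the fields are whitespace-separated and that the first field on each line is the sample number, as the layout above suggests. The function name `parse_otu_counts` is hypothetical and is not part of MetaPhyl.

```python
def parse_otu_counts(text):
    """Parse lines of the form '<sample> x_1 x_2 ... x_n' into a dict.

    Assumption: fields are whitespace-separated and the first field
    is the sample number (not confirmed by the MetaPhyl documentation).
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id = int(fields[0])
        counts[sample_id] = [int(x) for x in fields[1:]]
    # Every sample should report a count for the same n OTUs.
    row_lengths = {len(row) for row in counts.values()}
    if len(row_lengths) > 1:
        raise ValueError("samples have differing numbers of OTU counts")
    return counts


# Two samples (N = 2) with three OTUs each (n = 3).
example_counts = "1 5 0 12\n2 3 7 0\n"
otu_table = parse_otu_counts(example_counts)
print(otu_table)
```

Rejecting rows of unequal length catches truncated lines early, before the file is passed to the tool.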
Class labels
Data file that contains class labels for each sample.
> 1
s1_1 s1_2 ...
> 2
s2_1 s2_2 ...
...
> K
sK_1 sK_2 ...
Here K is the number of classes, and
sk_1 sk_2 ... is the list of samples that belong to the k-th class.
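The label layout above can be read into a mapping from class to samples as sketched below. The sketch assumes that class headers are lines starting with `>`, that sample identifiers are integers, and that a class's sample list may span several lines. The function name `parse_class_labels` is hypothetical and is not part of MetaPhyl.

```python
def parse_class_labels(text):
    """Parse '> k' headers followed by sample lists into {class: [samples]}.

    Assumption: sample identifiers are whitespace-separated integers
    (the MetaPhyl documentation does not state this explicitly).
    """
    labels = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = int(line[1:].strip())
            labels[current] = []
        elif current is None:
            raise ValueError("sample list found before any '> k' header")
        else:
            labels[current].extend(int(s) for s in line.split())
    return labels


# Two classes (K = 2), each with two samples.
example_labels = "> 1\n1 3\n> 2\n2 4\n"
class_members = parse_class_labels(example_labels)
print(class_members)
```

Inverting the result (sample to class) is a one-line dictionary comprehension if a per-sample lookup is more convenient.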
Phylogenetic tree
Data file that contains the phylogenetic tree for the n OTUs in Newick format.
OTUs must be numbered from 0 to n-1.
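The numbering requirement can be checked before training with a small script such as the one below. It is a rough sketch rather than a full Newick parser: it assumes unquoted leaf labels and a tree with at least one pair of parentheses, and it treats any name that directly follows `(` or `,` as a leaf. The helper names `newick_leaf_labels` and `check_otu_numbering` are hypothetical.

```python
import re


def newick_leaf_labels(newick):
    """Return leaf labels of a Newick string (unquoted labels only).

    A leaf label directly follows '(' or ','; internal node labels
    follow ')' and are therefore not matched.
    """
    return re.findall(r"(?<=[(,])\s*([^(),:;\s]+)", newick)


def check_otu_numbering(newick, n_otus):
    """True if the leaves are exactly the integers 0 .. n_otus-1."""
    try:
        leaves = sorted(int(name) for name in newick_leaf_labels(newick))
    except ValueError:
        return False  # a leaf label is not an integer
    return leaves == list(range(n_otus))


example_tree = "((0:0.1,1:0.2):0.05,2:0.3);"
print(newick_leaf_labels(example_tree))
print(check_otu_numbering(example_tree, 3))
```

A tree whose leaves are numbered from 1 rather than 0 fails this check, which is the most likely numbering mistake given that samples in the other input files are numbered from 1.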
Command Line Options
MetaPhyl can be run in two modes: training and testing (classification of new samples).
OPTION | DESCRIPTION
-help | Print help message.
-train | Training mode.
-test | Testing mode.
-d | Data file that has OTU count information for each sample (training and testing modes).
-l | Class labels file (training mode).
-t | Phylogenetic tree file in Newick format (training mode).
-o | Output file (training and testing modes).
-c | Input file for the testing mode, produced during the training phase.
-w | Weight parameter (training mode), a value from 0 to 1.
-lambda | Regularization parameter (training mode).
Examples
To train the model:
./MetaPhyl -train -d example/samples.txt -t example/tree.tre -l example/labels.txt -w 0.5 -lambda 1 -o example/out.txt
To classify new samples:
./MetaPhyl -test -d example/samples.txt -c example/out.txt -o example/result.txt
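When many runs are needed, for example to try several values of -w or -lambda, the two commands above can be assembled from a script. The sketch below only builds the argument lists, using the flags documented on this page. The function names and the default executable path `./MetaPhyl` are assumptions made for illustration.

```python
import shlex


def train_command(data, tree, labels, out, weight=0.5, lam=1, exe="./MetaPhyl"):
    """Build the argument list for a training run."""
    if not 0 <= weight <= 1:
        raise ValueError("-w must be a value from 0 to 1")
    return [exe, "-train", "-d", data, "-t", tree, "-l", labels,
            "-w", str(weight), "-lambda", str(lam), "-o", out]


def classify_command(data, model, out, exe="./MetaPhyl"):
    """Build the argument list for a testing (classification) run."""
    return [exe, "-test", "-d", data, "-c", model, "-o", out]


train_cmd = train_command("example/samples.txt", "example/tree.tre",
                          "example/labels.txt", "example/out.txt")
classify_cmd = classify_command("example/samples.txt", "example/out.txt",
                                "example/result.txt")
print(shlex.join(train_cmd))
print(shlex.join(classify_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` to execute the run. Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.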